Neural spectrahedra and semidefinite lifts: global convex optimization of degree-two polynomial activation neural networks in polynomial-time
Publication Type:
Article
Authors:
Bartan, Burak; Pilanci, Mert
Affiliation:
Stanford University
Journal:
MATHEMATICAL PROGRAMMING
ISSN/ISBN:
0025-5610
DOI:
10.1007/s10107-024-02153-5
Publication Date:
2025
Pages:
737-769
Keywords:
cut
approximation
algorithms
Abstract:
The training of two-layer neural networks with nonlinear activation functions is an important non-convex optimization problem with numerous applications and promising performance in layerwise deep learning. In this paper, we develop exact convex optimization formulations for two-layer neural networks with degree-two polynomial activations based on dual relaxations and semidefinite programming. Remarkably, we show that our semidefinite relaxations are always tight. Therefore, the computational complexity of global optimization is polynomial in the input dimension and sample size for all input data. The developed convex formulations are proven to achieve the same globally optimal solution set as their non-convex counterparts. Specifically, globally optimal two-layer neural networks with degree-two polynomial activations can be found by solving a semidefinite program (SDP) and decomposing the solution using a procedure we call Neural Decomposition. Moreover, the choice of regularizer plays a crucial role in the computational tractability of neural network training: we show that the standard weight decay formulation is NP-hard, whereas other simple convex penalties render the problem tractable in polynomial time via convex programming. The techniques extend beyond the fully connected architecture to various neural network structures, including vector-output and convolutional architectures. We provide extensive numerical simulations showing that the standard backpropagation approach often fails to reach the global optimum of the training loss, while the proposed approach obtains better test accuracy significantly faster than standard backpropagation.
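To make the abstract's construction concrete, below is a minimal Python/cvxpy sketch of the convex SDP formulation for a scalar-output two-layer network with activation sigma(t) = a*t^2 + b*t + c and squared loss. This is not the authors' implementation: the function name train_sdp, the trace penalty standing in for the paper's tractable convex regularizer, and the block layout of the lifted variable are illustrative assumptions, and the Neural Decomposition step that recovers neuron weights from the SDP solution is omitted.

# Illustrative sketch of the SDP lift described in the abstract.
# Predictions are linear in a lifted matrix variable Z split into
# PSD parts Zp and Zm, so the training problem is convex.
import numpy as np
import cvxpy as cp

def train_sdp(X, y, a=1.0, b=1.0, c=0.0, beta=0.1):
    """Fit f(x) = a*x'Z1x + b*z2'x + c*z4 with Z = Zp - Zm, Zp, Zm PSD."""
    n, d = X.shape
    Zp = cp.Variable((d + 1, d + 1), PSD=True)  # positive part of the lift
    Zm = cp.Variable((d + 1, d + 1), PSD=True)  # negative part of the lift
    Z = Zp - Zm
    Z1, z2, z4 = Z[:d, :d], Z[:d, d], Z[d, d]
    # Row i of (X @ Z1) multiplied elementwise with row i of X and
    # summed gives the quadratic term x_i' Z1 x_i.
    quad = cp.sum(cp.multiply(X @ Z1, X), axis=1)
    preds = a * quad + b * (X @ z2) + c * z4
    # Trace penalty on the PSD parts is a stand-in convex regularizer;
    # the abstract notes that plain weight decay makes the problem NP-hard.
    obj = cp.sum_squares(preds - y) + beta * (cp.trace(Zp) + cp.trace(Zm))
    cp.Problem(cp.Minimize(obj)).solve()
    return Z.value  # Neural Decomposition of Z would recover the neurons

# Example usage with random data:
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
y = rng.standard_normal(50)
Z = train_sdp(X, y)

Because the lifted variable enters the predictions linearly and the PSD constraints are convex, any SDP solver returns a global optimum, in contrast to backpropagation on the original non-convex parameterization.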
Source URL: