
Stationary Points of a Shallow Neural Network with Quadratic Activations and the Global Optimality of the Gradient Descent Algorithm

Document type:

Article

Authors:

Gamarnik, David; Kizildag, Eren C.; Zadik, Ilias

Affiliations:

Massachusetts Institute of Technology (MIT); Columbia University; Yale University

Journal:

MATHEMATICS OF OPERATIONS RESEARCH

ISSN/ISBN:

0364-765X

DOI:

10.1287/moor.2021.0082

Publication date:

2025

Keywords:

recovery; approximation; equations; bounds

Abstract:

We consider the problem of training a shallow neural network with quadratic activation functions and the generalization power of such trained networks. Assuming that the samples are generated by a full-rank matrix W* of the hidden network node weights, we obtain the following results. We establish that all full-rank approximately stationary solutions of the risk minimization problem are also approximate global optimums of the risk (in-sample and population). As a consequence, we establish that, when trained on polynomially many samples, the gradient descent algorithm converges to the global optimum of the risk minimization problem regardless of the width of the network when it is initialized at some value nu*, which we compute. Furthermore, the network produced by the gradient descent has a near zero generalization error. Next, we establish that initializing the gradient descent algorithm below nu* is easily achieved when the weights of the ground truth matrix W* are randomly generated and the matrix is sufficiently overparameterized. Finally, we identify a simple necessary and sufficient geometric condition on the size of the training set under which any global minimizer of the empirical risk has necessarily zero generalization error.
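
For orientation, the setting the abstract describes can be sketched as follows. The abstract itself gives no formulas, so the notation here is an assumption for illustration: the input dimension d, the width m, the sample size N, the squared loss, and the symbols f, X_i, Y_i and the two risk functionals are not taken from this record.

```latex
% Sketch under stated assumptions; notation is illustrative, not quoted from the record.
% A shallow network of width m with quadratic activations and weight matrix
% W \in \mathbb{R}^{m \times d} (rows W_1, \dots, W_m) maps an input X \in \mathbb{R}^d to
\[
  f(W; X) \;=\; \sum_{j=1}^{m} \langle W_j, X \rangle^{2}
          \;=\; X^{\top} W^{\top} W X .
\]
% Labels are assumed to be generated by the ground-truth matrix W^*:
\[
  Y_i \;=\; f(W^{*}; X_i), \qquad i = 1, \dots, N .
\]
% Empirical (in-sample) risk under squared loss, the objective of gradient descent:
\[
  \widehat{\mathcal{L}}(W) \;=\; \frac{1}{N} \sum_{i=1}^{N}
      \bigl( Y_i - f(W; X_i) \bigr)^{2} .
\]
% Population risk, i.e. the generalization error of W:
\[
  \mathcal{L}(W) \;=\; \mathbb{E}\Bigl[ \bigl( f(W^{*}; X) - f(W; X) \bigr)^{2} \Bigr] .
\]
```

Under this sketch the output depends on W only through the matrix W^T W, so zero generalization error corresponds to matching W*^T W* on the support of the input distribution rather than recovering W* itself.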
