Explaining neural scaling laws
Publication type:
Article
Authors:
Bahri, Yasaman; Dyer, Ethan; Kaplan, Jared; Lee, Jaehoon; Sharma, Utkarsh
Affiliations:
Alphabet Inc.; DeepMind; Johns Hopkins University
Journal:
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
ISSN/ISBN:
0027-8424
DOI:
10.1073/pnas.2311878121
Publication date:
2024-07-02
Keywords:
integral-operators
eigenvalues
Abstract:
The population loss of trained deep neural networks often follows precise power-law scaling relations with either the size of the training dataset or the number of parameters in the network. We propose a theory that explains the origins of and connects these scaling laws. We identify variance-limited and resolution-limited scaling behavior for both dataset and model size, for a total of four scaling regimes. The variance-limited scaling follows simply from the existence of a well-behaved infinite data or infinite width limit, while the resolution-limited regime can be explained by positing that models are effectively resolving a smooth data manifold. In the large width limit, this can be equivalently obtained from the spectrum of certain kernels, and we present evidence that large width and large dataset resolution-limited scaling exponents are related by a duality. We exhibit all four scaling regimes in the controlled setting of large random feature and pretrained models and test the predictions empirically on a range of standard architectures and datasets. We also observe several empirical relationships between datasets and scaling exponents under modifications of task and architecture aspect ratio. Our work provides a taxonomy for classifying different scaling regimes, underscores that there can be different mechanisms driving improvements in loss, and lends insight into the microscopic origin and relationships between scaling exponents.
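As a hedged illustration of the power-law scaling relations summarized in the abstract, the minimal Python sketch below fits L(D) ≈ c · D^(−α_D) to loss-versus-dataset-size measurements in log-log space. The dataset sizes, loss values, and the exponent used to generate them are hypothetical placeholders, not results from the paper; a real study would substitute measured test losses.

```python
# Minimal sketch (assumed setup, not the paper's code): estimate a
# dataset-size scaling exponent alpha_D from a power-law fit
#   L(D) ~ c * D**(-alpha_D)
# by linear regression in log-log coordinates.
import numpy as np

# Hypothetical measurements: training-set sizes and corresponding test losses,
# generated here from an assumed exponent of 0.35 plus small noise.
rng = np.random.default_rng(0)
dataset_sizes = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
test_losses = 2.5 * dataset_sizes ** -0.35 * np.exp(rng.normal(0.0, 0.02, size=5))

# In log-log space the power law becomes a line: log L = log c - alpha_D * log D.
slope, intercept = np.polyfit(np.log(dataset_sizes), np.log(test_losses), 1)
alpha_D = -slope
print(f"estimated scaling exponent alpha_D ~ {alpha_D:.3f}")
```

The same log-log fit can be applied with the number of model parameters on the horizontal axis to estimate a model-size exponent; the abstract's variance-limited versus resolution-limited distinction concerns which mechanism sets the value of such exponents, not the fitting procedure itself.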