SPARSE HIGH-DIMENSIONAL LINEAR REGRESSION. ESTIMATING SQUARED ERROR AND A PHASE TRANSITION
成果类型:
Article
署名作者:
Gamarnik, David; Zadik, Ilias
署名单位:
Massachusetts Institute of Technology (MIT)
刊物名称:
ANNALS OF STATISTICS
ISSN/ISSBN:
0090-5364
DOI:
10.1214/21-AOS2130
发表日期:
2022
页码:
880-903
关键词:
information-theoretic limits
superposition codes
local algorithms
signal recovery
variable selection
support recovery
gene-expression
model selection
dense
sets
摘要:
We consider a sparse high-dimensional regression model where the goal is to recover a k-sparse unknown binary vector beta* from n noisy linear observations of the form Y = X beta* + W is an element of R-n where X is an element of R-n has i.i.d. N(0, 1) entries and W is an element of R-n has i.i.d. N(0, sigma(2)) entries. In the high signal-to-noise ratio regime and sublinear sparsity regime, while the order of the sample size needed to recover the unknown vector information-theoretically is known to be n* := 2k log p/log (k/sigma(2) + 1), no polynomial-time algorithm is known to succeed unless n > n(alg) := (2k + sigma(2)) log p. In this work, we offer a series of results investigating multiple computational and statistical aspects of the recovery task in the regime n is an element of [n *, n(alg)]. First, we establish a novel information-theoretic property of the MLE of the problem happening around n = n* samples, which we coin as an all-ornothing behavior: when n > n* it recovers almost perfectly the support of beta*, while if n < n* it fails to recover any fraction of it correctly. Second, at an attempt to understand the computational hardness in the regime n is an element of r we prove that at order n(al)g samples there is an Overlap Gap is an element of [n*, n(alg)], Property (OGP) phase transition occurring at the landscape of the MLE: for constants c, C > 0 when n < cn(alg) OGP appears in the landscape of MLE while if n > Cn(alg) OGP disappears. OGP is a geometric disconnectivity property, which initially appeared in the theory of spin glasses and is known to suggest algorithmic hardness when it occurs. Finally, using certain technical results obtained to establish the OGP phase transition, we additionally establish various novel positive and negative algorithmic results for the recovery task of interest, including the failure of LASSO with access to n < cn(alg) samples and the success of a simple local search method with access to n > Cn(alg) samples.