DEBIASING THE LASSO: OPTIMAL SAMPLE SIZE FOR GAUSSIAN DESIGNS

Publication type:
Article
Authors:
Javanmard, Adel; Montanari, Andrea
Affiliations:
University of Southern California; Stanford University; Stanford University
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/17-AOS1630
Publication date:
2018
Pages:
2593-2622
Keywords:
high-dimensional inference; confidence intervals; variable selection; phase transitions; Dantzig selector; regression; prediction; sparsity; recovery; regions
Abstract:
Performing statistical inference in high-dimensional models is challenging because of the lack of precise information on the distribution of high-dimensional regularized estimators. Here, we consider linear regression in the high-dimensional regime $p \gg n$ and the Lasso estimator: we would like to perform inference on the parameter vector $\theta^* \in \mathbb{R}^p$. Important progress has been achieved in computing confidence intervals and p-values for single coordinates $\theta^*_i$, $i \in \{1, \dots, p\}$. A key role in these new inferential methods is played by a certain debiased estimator $\hat{\theta}^{\mathrm{d}}$. Earlier work establishes that, under suitable assumptions on the design matrix, the coordinates of $\hat{\theta}^{\mathrm{d}}$ are asymptotically Gaussian provided the true parameter vector $\theta^*$ is $s_0$-sparse with $s_0 = o(\sqrt{n}/\log p)$. The condition $s_0 = o(\sqrt{n}/\log p)$ is considerably stronger than the one for consistent estimation, namely $s_0 = o(n/\log p)$. In this paper, we consider Gaussian designs with known or unknown population covariance. When the covariance is known, we prove that the debiased estimator is asymptotically Gaussian under the nearly optimal condition $s_0 = o(n/(\log p)^2)$. The same conclusion holds if the population covariance is unknown but can be estimated sufficiently well. For intermediate regimes, we describe the trade-off between sparsity in the coefficients $\theta^*$ and sparsity in the inverse covariance of the design. We further discuss several applications of our results beyond high-dimensional inference. In particular, we propose a thresholded Lasso estimator that is minimax optimal up to a factor $1 + o_n(1)$ for i.i.d. Gaussian designs.
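The debiased estimator referenced in the abstract has the standard form $\hat{\theta}^{\mathrm{d}} = \hat{\theta} + \frac{1}{n} M X^\top (y - X\hat{\theta})$, where $M$ approximates the inverse population covariance. The following minimal sketch illustrates this construction in the simplest setting the paper covers, a known identity covariance (so $M = \Sigma^{-1} = I$), with the Lasso solved by a plain ISTA loop; the problem sizes, noise level, and tuning parameter are illustrative choices, not values from the paper.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/(2n))||y - X theta||^2 + lam*||theta||_1 via ISTA."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the smooth part
    theta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / n
        theta = soft_threshold(theta - grad / L, lam / L)
    return theta

def debiased_lasso(X, y, theta_hat, M):
    """Debiasing step: theta^d = theta_hat + (1/n) M X^T (y - X theta_hat)."""
    n = X.shape[0]
    return theta_hat + M @ X.T @ (y - X @ theta_hat) / n

# Illustrative p >> n instance with an s0-sparse signal (assumed values).
rng = np.random.default_rng(0)
n, p, s0, sigma = 200, 400, 5, 0.5
X = rng.standard_normal((n, p))          # i.i.d. Gaussian design, Sigma = I
theta_star = np.zeros(p)
theta_star[:s0] = 1.0
y = X @ theta_star + sigma * rng.standard_normal(n)

lam = sigma * np.sqrt(2 * np.log(p) / n)  # usual Lasso tuning scale
theta_hat = lasso_ista(X, y, lam)
M = np.eye(p)                             # known covariance Sigma = I => M = I
theta_d = debiased_lasso(X, y, theta_hat, M)
```

Under the conditions in the abstract, each coordinate of `theta_d` is approximately Gaussian around the truth, which is what makes coordinatewise confidence intervals and p-values possible; the Lasso output `theta_hat` alone is shrunken and non-Gaussian.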