DEBIASING THE LASSO: OPTIMAL SAMPLE SIZE FOR GAUSSIAN DESIGNS

Publication type:
Article
Authors:
Javanmard, Adel; Montanari, Andrea
Affiliations:
University of Southern California; Stanford University; Stanford University
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/17-AOS1630
Publication date:
2018
Pages:
2593-2622
Keywords:
high-dimensional inference; confidence intervals; variable selection; phase transitions; Dantzig selector; regression; prediction; sparsity; recovery; regions
Abstract:
Performing statistical inference in high-dimensional models is challenging because of the lack of precise information on the distribution of high-dimensional regularized estimators. Here, we consider linear regression in the high-dimensional regime $p \gg n$ and the Lasso estimator: we would like to perform inference on the parameter vector $\theta^* \in \mathbb{R}^p$. Important progress has been achieved in computing confidence intervals and p-values for single coordinates $\theta^*_i$, $i \in \{1, \dots, p\}$. A key role in these new inferential methods is played by a certain debiased estimator $\hat{\theta}^{\mathrm{d}}$. Earlier work establishes that, under suitable assumptions on the design matrix, the coordinates of $\hat{\theta}^{\mathrm{d}}$ are asymptotically Gaussian provided the true parameter vector $\theta^*$ is $s_0$-sparse with $s_0 = o(\sqrt{n}/\log p)$. The condition $s_0 = o(\sqrt{n}/\log p)$ is considerably stronger than the one for consistent estimation, namely $s_0 = o(n/\log p)$. In this paper, we consider Gaussian designs with known or unknown population covariance. When the covariance is known, we prove that the debiased estimator is asymptotically Gaussian under the nearly optimal condition $s_0 = o(n/(\log p)^2)$. The same conclusion holds if the population covariance is unknown but can be estimated sufficiently well. For intermediate regimes, we describe the trade-off between sparsity in the coefficients $\theta^*$ and sparsity in the inverse covariance of the design. We further discuss several applications of our results beyond high-dimensional inference. In particular, we propose a thresholded Lasso estimator that is minimax optimal up to a factor $1 + o_n(1)$ for i.i.d. Gaussian designs.
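The debiased estimator referenced in the abstract has the standard form $\hat{\theta}^{\mathrm{d}} = \hat{\theta} + \frac{1}{n} M X^\top (y - X\hat{\theta})$, where $M$ approximates the inverse population covariance. The following minimal sketch illustrates this construction in the simplest setting the paper covers, a known identity covariance (so $M = \Sigma^{-1} = I$), with the Lasso solved by a plain ISTA loop; the problem sizes, noise level, and tuning parameter are illustrative choices, not values from the paper.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/(2n))||y - X theta||^2 + lam*||theta||_1 via ISTA."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the smooth part
    theta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ theta - y) / n
        theta = soft_threshold(theta - grad / L, lam / L)
    return theta

def debiased_lasso(X, y, theta_hat, M):
    """Debiasing step: theta^d = theta_hat + (1/n) M X^T (y - X theta_hat)."""
    n = X.shape[0]
    return theta_hat + M @ X.T @ (y - X @ theta_hat) / n

# Illustrative p >> n instance with an s0-sparse signal (assumed values).
rng = np.random.default_rng(0)
n, p, s0, sigma = 200, 400, 5, 0.5
X = rng.standard_normal((n, p))          # i.i.d. Gaussian design, Sigma = I
theta_star = np.zeros(p)
theta_star[:s0] = 1.0
y = X @ theta_star + sigma * rng.standard_normal(n)

lam = sigma * np.sqrt(2 * np.log(p) / n)  # usual Lasso tuning scale
theta_hat = lasso_ista(X, y, lam)
M = np.eye(p)                             # known covariance Sigma = I => M = I
theta_d = debiased_lasso(X, y, theta_hat, M)
```

Under the conditions in the abstract, each coordinate of `theta_d` is approximately Gaussian around the truth, which is what makes coordinatewise confidence intervals and p-values possible; the Lasso output `theta_hat` alone is shrunken and non-Gaussian.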