Semisupervised inference for explained variance in high dimensional linear regression and its applications
Document type:
Article
Authors:
Cai, T. Tony; Guo, Zijian
Affiliations:
University of Pennsylvania; Rutgers University System; Rutgers University New Brunswick
Journal:
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY
ISSN/ISBN:
1369-7412
DOI:
10.1111/rssb.12357
Publication date:
2020
Pages:
391-419
Keywords:
optimal adaptive estimation
confidence intervals
Dantzig selector
Lasso
Abstract:
The paper considers statistical inference for the explained variance $\beta^{\mathsf{T}}\Sigma\beta$ under the high dimensional linear model $Y = X\beta + \epsilon$ in the semisupervised setting, where $\beta$ is the regression vector and $\Sigma$ is the design covariance matrix. A calibrated estimator, which efficiently integrates both labelled and unlabelled data, is proposed. It is shown that the estimator achieves the minimax optimal rate of convergence in the general semisupervised framework. The optimality result characterizes how the unlabelled data contribute to the estimation accuracy. Moreover, the limiting distribution for the proposed estimator is established and the unlabelled data have also proved useful in reducing the length of the confidence interval for the explained variance. The method proposed is extended to semisupervised inference for the unweighted quadratic functional $\|\beta\|_2^2$. The inference results obtained are then applied to a range of high dimensional statistical problems, including signal detection and global testing, prediction accuracy evaluation and confidence ball construction. The numerical improvement of incorporating the unlabelled data is demonstrated through simulation studies and an analysis of estimating heritability for a yeast segregant data set with multiple traits.
Source URL: