PREDICTION WHEN FITTING SIMPLE MODELS TO HIGH-DIMENSIONAL DATA

Document type:
Article
Authors:
Steinberger, Lukas; Leeb, Hannes
Affiliations:
University of Freiburg; University of Vienna
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/18-AOS1719
Publication date:
2019
Pages:
1408-1442
Keywords:
selection; inference; distributions; projections
Abstract:
We study linear subset regression in the context of a high-dimensional linear model. Consider y = v + theta' z + epsilon with univariate response y and a d-vector of random regressors z, and a submodel where y is regressed on a set of p explanatory variables that are given by x = M' z, for some d x p matrix M. Here, high-dimensional means that the number d of available explanatory variables in the overall model is much larger than the number p of variables in the submodel. In this paper, we present Pinsker-type results for prediction of y given x. In particular, we show that the mean squared prediction error of the best linear predictor of y given x is close to the mean squared prediction error of the corresponding Bayes predictor E[y | x], provided only that p / log d is small. We also show that the mean squared prediction error of the (feasible) least-squares predictor computed from n independent observations of (y, x) is close to that of the Bayes predictor, provided only that both p / log d and p / n are small. Our results hold uniformly in the regression parameters and over large collections of distributions for the design variables z.
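
Illustrative sketch (not taken from the paper): the setup in the abstract can be simulated in a few lines of Python with NumPy. The function name `simulate_submodel_prediction`, the parameter values, the Gaussian design z ~ N(0, I_d), and the choice of M as a coordinate-selection matrix are all assumptions made here for illustration. Under these assumptions the conditional mean E[y | x] is exactly linear, so the oracle predictor below is both the best linear predictor and the Bayes predictor. The sketch only compares the feasible least-squares predictor with that oracle when p / n is small; it does not reproduce the paper's results for general design distributions.

```python
import numpy as np


def simulate_submodel_prediction(d=200, p=5, n=2000, n_test=10000,
                                 sigma=1.0, seed=0):
    """Compare out-of-sample MSE of least squares on x = M'z with the oracle.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    v = 1.0                                   # intercept of the full model
    theta = rng.normal(size=d) / np.sqrt(d)   # regression parameter, norm near 1
    M = np.eye(d)[:, :p]                      # d x p matrix: keep first p coordinates

    def draw(m):
        # m independent observations of (x, y) with z ~ N(0, I_d) (assumed design)
        z = rng.normal(size=(m, d))
        y = v + z @ theta + sigma * rng.normal(size=m)
        return z @ M, y

    x_train, y_train = draw(n)
    x_test, y_test = draw(n_test)

    # Feasible least-squares predictor (with intercept) fitted on n observations
    design_train = np.column_stack([np.ones(n), x_train])
    coef, *_ = np.linalg.lstsq(design_train, y_train, rcond=None)
    design_test = np.column_stack([np.ones(n_test), x_test])
    mse_ls = float(np.mean((y_test - design_test @ coef) ** 2))

    # Oracle: for z ~ N(0, I_d) and a selection matrix M, the best linear
    # predictor of y given x is v + (M' theta)' x, which here equals E[y | x]
    pred_oracle = v + x_test @ (M.T @ theta)
    mse_oracle = float(np.mean((y_test - pred_oracle) ** 2))
    return mse_ls, mse_oracle


if __name__ == "__main__":
    mse_ls, mse_oracle = simulate_submodel_prediction()
    print(f"least-squares MSE: {mse_ls:.4f}")
    print(f"oracle MSE:        {mse_oracle:.4f}")
```

With p much smaller than n, the two printed errors should be close, which is consistent with the p / n condition stated in the abstract. Replacing the Gaussian draw of z with another distribution would make the oracle line above only the best linear predictor, no longer the Bayes predictor.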