A scalable estimate of the out-of-sample prediction error via approximate leave-one-out cross-validation

Publication type:
Article
Authors:
Rad, Kamiar Rahnama; Maleki, Arian
Affiliations:
City University of New York (CUNY) System; Columbia University
Journal:
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY
ISSN:
1369-7412
DOI:
10.1111/rssb.12374
Publication date:
2020
Pages:
965-996
Keywords:
robust regression; variable selection; randomized GCV; model; lasso; regularization; inference; freedom; choice; map
Abstract:
The paper considers the problem of out-of-sample risk estimation in high-dimensional settings where standard techniques such as K-fold cross-validation suffer from large biases. Motivated by the low bias of the leave-one-out cross-validation method, we propose a computationally efficient closed-form approximate leave-one-out formula (ALO) for a large class of regularized estimators. Given the regularized estimate, calculating ALO requires only a minor computational overhead. Under mild assumptions about the data-generating process, we obtain a finite-sample upper bound for the difference between leave-one-out cross-validation and approximate leave-one-out cross-validation, |LO − ALO|. Our theoretical analysis shows that |LO − ALO| → 0 with overwhelming probability as n, p → ∞, where the dimension p of the feature vectors may be comparable with, or even greater than, the number of observations n. Despite the high dimensionality of the problem, our theoretical results do not require any sparsity assumption on the vector of regression coefficients. Our extensive numerical experiments show that |LO − ALO| decreases as n and p increase, revealing the excellent finite-sample performance of approximate leave-one-out cross-validation. We further illustrate the usefulness of our proposed out-of-sample risk estimation method with an example of real recordings from spatially sensitive neurons (grid cells) in the medial entorhinal cortex of a rat.
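Illustrative sketch (not from the paper): for ridge regression, leave-one-out residuals admit the well-known exact leverage correction e_i / (1 − H_ii), where H is the hat matrix; the paper's ALO extends this style of closed-form correction to a much broader class of regularized estimators. The minimal Python sketch below shows only this classical ridge special case; the function name ridge_alo and the example data are hypothetical.

```python
import numpy as np

def ridge_alo(X, y, lam):
    """Leverage-corrected leave-one-out prediction error for ridge regression.

    For ridge, the correction resid_i / (1 - H_ii) is exact; this merely
    illustrates the flavor of a closed-form leave-one-out approximation.
    """
    n, p = X.shape
    G = X.T @ X + lam * np.eye(p)            # regularized Gram matrix
    H = X @ np.linalg.solve(G, X.T)          # hat matrix X (X'X + lam I)^{-1} X'
    beta = np.linalg.solve(G, X.T @ y)       # ridge estimate on the full data
    resid = y - X @ beta                     # in-sample residuals
    loo_resid = resid / (1.0 - np.diag(H))   # exact leave-one-out residuals
    return np.mean(loo_resid ** 2)           # estimated out-of-sample error

# Hypothetical usage on synthetic data:
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = X @ rng.standard_normal(50) + rng.standard_normal(200)
print(ridge_alo(X, y, lam=1.0))
```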
Source URL: