BOOTSTRAPPING AND SAMPLE SPLITTING FOR HIGH-DIMENSIONAL, ASSUMPTION-LEAN INFERENCE
Document type:
Article
Authors:
Rinaldo, Alessandro; Wasserman, Larry; G'Sell, Max
Affiliation:
Carnegie Mellon University
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/18-AOS1784
Publication date:
2019
Pages:
3438-3469
Keywords:
post-selection inference
confidence-intervals
model-selection
nonlinear statistics
normal approximation
P-values
regression
convergence
regions
bounds
Abstract:
Several new methods have recently been proposed for performing valid inference after model selection. An older method is sample splitting: use part of the data for model selection and the rest for inference. In this paper, we revisit sample splitting combined with the bootstrap (or the Normal approximation). We show that this leads to a simple, assumption-lean approach to inference, and we establish results on the accuracy of the method. In fact, we find new bounds on the accuracy of the bootstrap and the Normal approximation for general nonlinear parameters with increasing dimension, which we then use to assess the accuracy of regression inference. We define new parameters that measure variable importance and that can be inferred with greater accuracy than the usual regression coefficients. Finally, we elucidate an inference-prediction trade-off: splitting increases the accuracy and robustness of inference but can decrease the accuracy of the predictions.
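The sample-splitting recipe described in the abstract (select a model on one part of the data, then do inference on the rest with the bootstrap) can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `split_bootstrap_ci`, the selector (ranking covariates by absolute marginal correlation), the pairs bootstrap, and the sup-norm box used to form simultaneous intervals are all assumptions made for illustration, as is the simulated data.

```python
import numpy as np


def split_bootstrap_ci(X, y, k=3, n_boot=500, alpha=0.1, seed=0):
    """Illustrative sample splitting + pairs bootstrap (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n)
    sel_idx, inf_idx = perm[: n // 2], perm[n // 2:]

    # Step 1 (selection half): choose k covariates. Ranking by absolute
    # marginal correlation is an assumed selector; any rule could be used here.
    Xs, ys = X[sel_idx], y[sel_idx]
    Xc = Xs - Xs.mean(axis=0)
    yc = ys - ys.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    selected = np.sort(np.argsort(score)[-k:])

    # Step 2 (inference half): least squares on the selected covariates only.
    Xi, yi = X[inf_idx][:, selected], y[inf_idx]

    def ols(A, b):
        A1 = np.column_stack([np.ones(len(b)), A])  # add an intercept column
        coef = np.linalg.lstsq(A1, b, rcond=None)[0]
        return coef[1:]  # drop the intercept

    beta_hat = ols(Xi, yi)

    # Step 3: pairs bootstrap, resampling rows of the inference half.
    m = len(yi)
    boot = np.empty((n_boot, k))
    for b in range(n_boot):
        idx = rng.integers(0, m, size=m)
        boot[b] = ols(Xi[idx], yi[idx])

    # Simultaneous box: (1 - alpha) quantile of the sup-norm deviation.
    t = np.quantile(np.max(np.abs(boot - beta_hat), axis=1), 1 - alpha)
    return selected, beta_hat, beta_hat - t, beta_hat + t


# Simulated example (assumed data): only covariate 0 carries signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 20))
y = 2.0 * X[:, 0] + rng.normal(size=400)
selected, beta_hat, lo, hi = split_bootstrap_ci(X, y)
for j, b, l, h in zip(selected, beta_hat, lo, hi):
    print(f"covariate {j}: estimate {b:.3f}, interval [{l:.3f}, {h:.3f}]")
```

Because the selection half never touches the inference half, the intervals are computed as if the selected model had been fixed in advance. The price is that both selection and estimation use only half of the observations, which is the inference-prediction trade-off the abstract mentions.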