Scalable subsampling: computation, aggregation and inference
Publication type:
Article
Author:
Politis, Dimitris
Affiliation:
University of California System; University of California San Diego
Journal:
BIOMETRIKA
ISSN/ISBN:
0006-3444
DOI:
10.1093/biomet/asad021
Publication date:
2024
Pages:
347-354
Keywords:
selection
Abstract:
Subsampling has seen a resurgence in the big data era, where the standard, full-resample size bootstrap can be infeasible to compute. Nevertheless, even choosing a single random subsample of size b can be computationally challenging when both b and the sample size n are very large. This paper shows how a set of appropriately chosen, nonrandom subsamples can be used to conduct effective, and computationally feasible, subsampling distribution estimation. Furthermore, the same set of subsamples can be used to yield a procedure for subsampling aggregation, also known as subagging, that is scalable with big data. Interestingly, the scalable subagging estimator can be tuned to have the same rate of convergence as, or a better one than, $\hat{\theta}_n$. Statistical inference could then be based on the scalable subagging estimator instead of the original $\hat{\theta}_n$.
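The idea in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: the nonrandom subsamples are taken to be consecutive, non-overlapping blocks of size b, the data are one-dimensional, and the estimator converges at the usual square-root rate. The function names (block_subsamples, subagging_estimate, subsampling_roots) are hypothetical and chosen only for illustration; this is not the paper's own code.

```python
import numpy as np


def block_subsamples(x, b):
    """Split 1-D data into q = n // b consecutive, non-overlapping blocks.

    Assumption: the 'nonrandom subsamples' are plain consecutive blocks.
    Any leftover observations at the end (fewer than b) are dropped.
    """
    x = np.asarray(x)
    q = len(x) // b
    return x[: q * b].reshape(q, b)


def subagging_estimate(x, b, estimator=np.mean):
    """Average the estimator over the blocks (subsampling aggregation).

    Returns the aggregated estimate and the per-block estimates.
    """
    blocks = block_subsamples(x, b)
    block_estimates = np.array([estimator(block) for block in blocks])
    return block_estimates.mean(), block_estimates


def subsampling_roots(block_estimates, center, b):
    """Centered, rescaled block estimates: sqrt(b) * (theta_block - center).

    Assumption: a sqrt rate of convergence. The empirical distribution of
    these values serves as the subsampling distribution estimate.
    """
    return np.sqrt(b) * (np.asarray(block_estimates) - center)


# Usage on simulated data (illustrative values only).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=10_000)
theta_bar, block_estimates = subagging_estimate(x, b=100)
roots = subsampling_roots(block_estimates, center=theta_bar, b=100)
```

Each block is read once and only b observations are held at a time per estimate, which is what makes this kind of scheme attractive when n is very large. With the sample mean as the estimator and n divisible by b, the aggregated estimate coincides with the full-sample mean; for other estimators the two generally differ.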
Source URL: