DISTRIBUTED STATISTICAL INFERENCE FOR MASSIVE DATA
Publication type:
Article
Authors:
Chen, Song Xi; Peng, Liuhua
Affiliations:
Peking University; Peking University; University of Melbourne
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/21-AOS2062
Publication date:
2021
Pages:
2851-2869
Keywords:
Edgeworth expansions
Bootstrap methods
Regression
Abstract:
This paper considers distributed statistical inference for general symmetric statistics in the context of massive data with efficient computation. Estimation efficiency and asymptotic distributions of the distributed statistics are provided; they reveal different results for the nondegenerate and degenerate cases and show that the number of data subsets plays an important role. Two distributed bootstrap methods are proposed and analyzed to approximate the underlying distribution of the distributed statistics, with improved computational efficiency over existing methods. The accuracy of the bootstrap distributional approximation is studied theoretically. One of the methods, the pseudo-distributed bootstrap, is particularly attractive when the number of data subsets is large: it directly resamples the subset-based statistics, requires less stringent conditions, and its performance can be improved by studentization.
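The idea of a distributed statistic and of a bootstrap that resamples subset-based statistics can be illustrated with a minimal sketch. It assumes that the distributed statistic is the plain average of the statistics computed on k disjoint data subsets, and that the pseudo-distributed bootstrap draws k of those subset statistics with replacement. The function names (`subset_statistics`, `distributed_statistic`, `pseudo_distributed_bootstrap`) are hypothetical. The studentized option is a generic bootstrap-t construction and not necessarily the authors' exact version. The sample variance (a U-statistic) stands in for a general symmetric statistic purely for illustration. Nothing here reproduces the paper's conditions or accuracy results.

```python
import random
import statistics


def subset_statistics(data, k, stat):
    """Split data into k contiguous, nonempty blocks and apply stat to each."""
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(data)")
    # Integer block boundaries; every block gets floor(n/k) or ceil(n/k) points.
    bounds = [i * n // k for i in range(k + 1)]
    return [stat(data[bounds[i]:bounds[i + 1]]) for i in range(k)]


def distributed_statistic(data, k, stat):
    """Assumed form: the plain average of the k subset-based statistics."""
    return statistics.fmean(subset_statistics(data, k, stat))


def pseudo_distributed_bootstrap(subset_stats, b, seed=0, studentize=False):
    """Resample the k subset statistics with replacement, b times (sketch).

    Each replicate is mean* - mean, or, if studentize=True, the generic
    bootstrap-t form (mean* - mean) / (sd* / sqrt(k)); the latter needs k >= 2.
    """
    rng = random.Random(seed)
    k = len(subset_stats)
    center = statistics.fmean(subset_stats)
    reps = []
    for _ in range(b):
        draw = rng.choices(subset_stats, k=k)  # k draws with replacement
        diff = statistics.fmean(draw) - center
        if studentize:
            sd = statistics.stdev(draw)
            if sd == 0.0:
                continue  # all draws identical: t-ratio undefined, skip
            reps.append(diff / (sd / k ** 0.5))
        else:
            reps.append(diff)
    return reps


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    k = 50
    # Illustrative symmetric statistic: the sample variance.
    stats_k = subset_statistics(data, k, statistics.variance)
    t_n = statistics.fmean(stats_k)
    reps = sorted(pseudo_distributed_bootstrap(stats_k, b=999, seed=2))
    q_lo, q_hi = reps[24], reps[974]  # approx. 2.5% and 97.5% quantiles
    # Basic bootstrap 95% interval for the population variance.
    print(f"T = {t_n:.4f}, 95% CI ~ ({t_n - q_hi:.4f}, {t_n - q_lo:.4f})")
```

The bootstrap step only touches the k subset statistics, never the raw data, which is why its cost does not grow with the full sample size once those statistics are available.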