Renewable estimation and incremental inference in generalized linear models with streaming data sets

成果类型:
Article
署名作者:
Luo, Lan; Song, Peter X-K
署名单位:
University of Michigan System; University of Michigan
刊物名称:
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY
ISSN/ISSBN:
1369-7412
DOI:
10.1111/rssb.12352
发表日期:
2020
页码:
69-97
关键词:
摘要:
The paper presents an incremental updating algorithm to analyse streaming data sets using generalized linear models. The method proposed is formulated within a new framework of renewable estimation and incremental inference, in which the maximum likelihood estimator is renewed with current data and summary statistics of historical data. Our framework can be implemented within a popular distributed computing environment, known as Apache Spark, to scale up computation. Consisting of two data-processing layers, the rho architecture enables us to accommodate inference-related statistics and to facilitate sequential updating of the statistics used in both estimation and inference. We establish estimation consistency and asymptotic normality of the proposed renewable estimator, in which the Wald test is utilized for an incremental inference. Our methods are examined and illustrated by various numerical examples from both simulation experiments and a real world data analysis.