Data Thinning for Poisson Factor Models and its Applications

Document type:
Article; Early Access
Authors:
Wang, Zhijing; Xu, Peirong; Zhao, Hongyu; Wang, Tao
Affiliations:
Shanghai Jiao Tong University; Yale University; Shanghai Jiao Tong University; Shanghai Jiao Tong University
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1080/01621459.2025.2546577
Publication date:
2025
Keywords:
NUMBER components inference
Abstract:
The Poisson factor model is a powerful tool for dimension reduction and visualization of large-scale count datasets across diverse domains. Despite the availability of efficient algorithms for estimating factors and loadings, existing methods either require prior knowledge of the number of factors, or resort to ad hoc criteria for its determination. This article proposes a novel data-driven criterion called Information Criterion via Data Thinning (ICDT), leveraging the thinning property of the Poisson distribution. Unlike traditional data splitting, data thinning partitions the count matrix into training and validation sets while preserving both the distribution and the underlying data structure. Interestingly, the validation error can be decomposed into the training error plus a covariance penalty. A simple estimator of the covariance penalty is obtained, leading to the development of ICDT. The selection consistency of ICDT is derived when both the sample size and the number of variables diverge to infinity. The proposed methodology is extended to dimension reduction in regression by incorporating the response inversely into the Poisson factor model. Extensive simulated examples and two real data applications are used to evaluate the performance of ICDT and compare it with existing criteria. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
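
The thinning property mentioned in the abstract can be illustrated with a minimal sketch. It relies on the standard fact that if X ~ Poisson(λ) and X_train | X ~ Binomial(X, ε), then X_train and X − X_train are independent Poisson variables with means ελ and (1 − ε)λ. The function name `thin_poisson_counts`, the thinning fraction `eps`, and the simulated log-linear factor structure are assumptions made for illustration; they are not taken from the article, and the sketch does not implement ICDT itself.

```python
import numpy as np


def thin_poisson_counts(counts, eps=0.5, rng=None):
    """Split a count matrix into two same-shaped matrices by binomial thinning.

    Hypothetical helper: each entry x is split into Binomial(x, eps) and the
    remainder, so the two parts always sum back to the original entry.
    """
    rng = np.random.default_rng(rng)
    counts = np.asarray(counts)
    train = rng.binomial(counts, eps)  # elementwise binomial draw
    valid = counts - train             # remainder goes to validation
    return train, valid


# Simulated counts from an assumed log-linear Poisson factor structure.
rng = np.random.default_rng(0)
n, p, k = 200, 50, 3
factors = rng.normal(size=(n, k))
loadings = rng.normal(scale=0.5, size=(p, k))
rate = np.exp(factors @ loadings.T)
X = rng.poisson(rate)

X_train, X_valid = thin_poisson_counts(X, eps=0.5, rng=1)
```

Unlike row-wise or entry-wise data splitting, both `X_train` and `X_valid` keep the full n × p shape, so a factor model fitted on one can be evaluated on the other at every entry.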
Source URL: