Generalized Data Thinning Using Sufficient Statistics

Document type:
Article
Authors:
Dharamshi, Ameer; Neufeld, Anna; Motwani, Keshav; Gao, Lucy L.; Witten, Daniela; Bien, Jacob
Affiliations:
University of Washington; University of Washington Seattle; Fred Hutchinson Cancer Center; University of British Columbia; University of Washington; University of Washington Seattle; University of Washington; University of Washington Seattle; University of Southern California
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1080/01621459.2024.2353948
Publication date:
2025
Pages:
511-523
Keywords:
inference changepoint selection
Abstract:
Our goal is to develop a general strategy to decompose a random variable X into multiple independent random variables, without sacrificing any information about unknown parameters. A recent paper showed that for some well-known natural exponential families, X can be thinned into independent random variables $X^{(1)}, \ldots, X^{(K)}$ such that $X = \sum_{k=1}^{K} X^{(k)}$. These independent random variables can then be used for various model validation and inference tasks, including in contexts where traditional sample splitting fails. In this article, we generalize their procedure by relaxing this summation requirement and simply asking that some known function of the independent random variables exactly reconstruct X. This generalization of the procedure serves two purposes. First, it greatly expands the families of distributions for which thinning can be performed. Second, it unifies sample splitting and data thinning, which on the surface seem to be very different, as applications of the same principle. This shared principle is sufficiency. We use this insight to perform generalized thinning operations for a diverse set of families. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
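The two constructions described in the abstract can be illustrated with a short sketch. This is not the authors' reference implementation; the function names poisson_thin and uniform_max_thin are hypothetical. The first function is the Poisson case of summation-based thinning: if X ~ Poisson(lambda), then a Multinomial(X, eps) draw splits X into independent pieces X^(k) ~ Poisson(eps_k * lambda) that sum exactly to X. The second is an illustrative generalized thinning in which the reconstruction function is the maximum instead of the sum. It assumes that X is distributed as the maximum of K i.i.d. Uniform(0, theta) variables. Because the maximum is sufficient for theta, X can be resampled into K i.i.d. Uniform(0, theta) pieces by choosing one piece uniformly at random to equal X and drawing the rest from Uniform(0, X).

```python
import numpy as np


def poisson_thin(x, eps, rng):
    """Summation-based thinning of Poisson counts (illustrative sketch).

    Assumes each x_i ~ Poisson(lam). Conditional on x_i, a draw from
    Multinomial(x_i, eps) gives K pieces that are unconditionally
    independent Poisson(eps_k * lam) variables and that sum to x_i.
    """
    x = np.asarray(x, dtype=np.int64)
    eps = np.asarray(eps, dtype=float)
    # One multinomial split per observation; rows sum exactly to x.
    return np.array([rng.multinomial(xi, eps) for xi in x])


def uniform_max_thin(x, K, rng):
    """Generalized thinning where the reconstruction function is max (sketch).

    Assumes each x_i is distributed as the max of K i.i.d. Uniform(0, theta)
    variables. Given x_i, the index attaining the max is uniform on
    {0, ..., K-1}, and the other K - 1 pieces are i.i.d. Uniform(0, x_i).
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Start with every piece drawn from Uniform(0, x_i).
    parts = rng.uniform(0.0, 1.0, size=(n, K)) * x[:, None]
    # Choose which piece equals x_i, uniformly at random.
    j = rng.integers(0, K, size=n)
    parts[np.arange(n), j] = x
    return parts
```

In both cases the original data are recovered exactly by a known function of the pieces (the row sum or the row maximum), while the pieces are independent and keep the same unknown parameter.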