Statistical Inference for Maximin Effects: Identifying Stable Associations across Multiple Studies
Publication type:
Article
Author:
Guo, Zijian
Affiliation:
Rutgers University System; Rutgers University New Brunswick
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1080/01621459.2023.2233162
Publication date:
2024
Pages:
1968-1984
Keywords:
likelihood ratio tests
confidence-intervals
Covariate Shift
parameter
bootstrap
selection
database
Abstract:
Integrative analysis of data from multiple sources is critical to making generalizable discoveries. Associations consistently observed across multiple source populations are more likely to generalize to target populations with possible distributional shifts. In this article, we model the heterogeneous multi-source data with multiple high-dimensional regressions and make inferences for the maximin effect (Meinshausen and Bühlmann, AoS, 43(4), 1801-1830). The maximin effect provides a measure of stable associations across multi-source data. A significant maximin effect indicates that a variable has commonly shared effects across multiple source populations, and these shared effects may be generalized to a broader set of target populations. Inference for maximin effects is challenging because the point estimator can have a nonstandard limiting distribution. We devise a novel sampling method to construct valid confidence intervals for maximin effects. The proposed confidence interval attains a parametric length. This sampling procedure and the related theoretical analysis are of independent interest for solving other nonstandard inference problems. Using genetic data on yeast growth in multiple environments, we demonstrate that the genetic variants with significant maximin effects have generalizable effects under new environments. The proposed method is implemented in the R package MaximinInfer available from CRAN. Supplementary materials for this article are available online.