Testing a Large Number of Composite Null Hypotheses Using Conditionally Symmetric Multidimensional Gaussian Mixtures in Genome-Wide Studies
Publication type:
Article
Authors:
Sun, Ryan; McCaw, Zachary R.; Lin, Xihong
Affiliations:
University of Texas System; UTMD Anderson Cancer Center; Harvard University; Harvard T.H. Chan School of Public Health; Harvard University
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1080/01621459.2024.2422124
Publication date:
2025
Pages:
605-617
Keywords:
false discovery rate
empirical Bayes
lung cancer
multiple
expression
meta-analysis
Abstract:
Causal mediation, pleiotropy, and replication analyses are three highly popular genetic study designs. Although these analyses address different scientific questions, the underlying statistical inference problems all involve large-scale testing of composite null hypotheses. The goal is to determine whether all null hypotheses in a set of individual tests, as opposed to at least one, should simultaneously be rejected. Recently, various methods have been proposed for each of these situations, including an appealing two-group empirical Bayes approach that calculates local false discovery rates (lfdr). However, lfdr estimation is difficult because it requires multivariate density estimation. Furthermore, the multiple testing rules of the empirical Bayes lfdr approach can disagree with conventional frequentist z-statistics, which is troubling for a field that relies ubiquitously on summary statistics. This work proposes the conditionally symmetric multidimensional Gaussian mixture model (csmGmm), a framework that unifies two-group testing across composite null settings in genetic association studies. The csmGmm is shown to have more robust operating characteristics than recently proposed alternatives. Crucially, the csmGmm also offers interpretability guarantees by harmonizing lfdr and z-statistic testing rules. We extend the base csmGmm to cover each of the mediation, pleiotropy, and replication settings, and we prove that the agreement between lfdr and z-statistic testing rules holds in each situation. We apply the model to a collection of translational lung cancer genetic association studies that motivated this work. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
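The central quantity in the abstract is the lfdr of a composite null: the posterior probability, given one unit's vector of z-statistics, that at least one of its individual effects is zero. The sketch below computes that posterior for a fully specified multidimensional Gaussian mixture whose components are indexed by sign configurations in {-1, 0, 1}^K. It is an illustration under stated assumptions, not the csmGmm itself. The mixture weights and means (`example_weights`, `example_means`) are hypothetical fixed values rather than estimates fitted to data. Every z-statistic is assumed to have unit variance. The conditional-symmetry constraints that give the csmGmm its lfdr/z-statistic agreement are not imposed.

```python
import itertools
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) evaluated at x."""
    u = (x - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def composite_null_lfdr(z, weights, means):
    """Posterior probability that at least one coordinate's effect is zero.

    z       : tuple of K z-statistics for one unit (e.g., one SNP).
    weights : dict mapping each configuration in {-1, 0, 1}^K to its prior
              probability (assumed to sum to 1).
    means   : dict mapping -1 and 1 to the assumed mean of a non-null
              z-statistic with that sign; null coordinates have mean 0.
    Coordinates are treated as independent given the configuration, with
    unit variance (an assumption of this sketch).
    """
    null_mass = 0.0
    total_mass = 0.0
    for config, pi_k in weights.items():
        # Joint density of z under this component, times its prior weight.
        dens = pi_k
        for z_d, s_d in zip(z, config):
            mu = 0.0 if s_d == 0 else means[s_d]
            dens *= normal_pdf(z_d, mu)
        total_mass += dens
        if 0 in config:  # at least one null coordinate: composite null holds
            null_mass += dens
    return null_mass / total_mass


# Hypothetical parameters for K = 2 (e.g., the two z-statistics of a
# mediation analysis); weights depend only on how many coordinates are non-null.
configs = list(itertools.product((-1, 0, 1), repeat=2))
example_weights = {}
for c in configs:
    n_nonnull = sum(s != 0 for s in c)
    example_weights[c] = {0: 0.88, 1: 0.025, 2: 0.005}[n_nonnull]
example_means = {-1: -3.0, 1: 3.0}

print(composite_null_lfdr((4.0, 4.0), example_weights, example_means))
print(composite_null_lfdr((4.0, 0.0), example_weights, example_means))
```

A large z-statistic in only one coordinate leaves the lfdr close to 1, because the composite null (one effect is zero) still explains the data well. Only when both coordinates are large does the lfdr become small. A common empirical Bayes rule then rejects units whose lfdr falls below a threshold. In a real analysis the weights and means would have to be estimated from the data, which is the multivariate density estimation problem the abstract identifies as difficult.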