Integrative Analysis of Microbial 16S Gene and Shotgun Metagenomic Sequencing Data Improves Statistical Efficiency in Testing Differential Abundance

Document Type:
Article; Early Access
Authors:
Yue, Ye; Mao, Yicong; Read, Timothy D.; Fedirko, Veronika; Satten, Glen A.; Chen, Xuan; Zhan, Xiang; Hu, Yi-Juan
Affiliations:
Emory University; Peking University; Emory University; University of Texas System; UTMD Anderson Cancer Center; Emory University; Emory University; Huazhong Agricultural University; Southeast University - China; Peking University; Peking University
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1080/01621459.2025.2516205
Publication Date:
2025
Keywords:
logistic-regression; parameters; gut microbiome; ribosomal-rna; identification; omics
Abstract:
The most widely used technologies for profiling microbial communities are 16S marker-gene sequencing and shotgun metagenomic sequencing. Surprisingly, many microbiome studies have performed both experiments on the same cohort of samples. The two sequencing datasets often reveal consistent patterns of microbial signatures, suggesting that an integrative analysis of both datasets could enhance the power to detect these signatures. However, differential experimental biases, partially overlapping samples, and uneven library sizes pose tremendous challenges when combining the two datasets. In this article, we introduce Com-2seq, the first method of its kind, which combines the two datasets to test differential abundance at both the genus level and the community level while overcoming these difficulties. Our simulation studies demonstrate that Com-2seq substantially improves statistical efficiency over the analysis of a single dataset and outperforms two ad hoc approaches to integrative analysis. In an analysis of real microbiome data, Com-2seq uncovered scientifically plausible findings, namely, the associations of Butyrivibrio, Gemella, and Ignavigranum with prediabetes status, which would have been missed by analyzing either dataset alone. Butyrivibrio failed to reach significance in the analysis of either dataset despite showing a consistent trend in both; Gemella and Ignavigranum did not yield adequate data in the 16S experiment. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
Source URL: