A REWEIGHTED RANDOM FOREST TO PREDICT HEALTH OUTCOMES USING HUMAN MICROBIOME DATA

Publication type:
Article
Authors:
Wang, Tian; Li, Bing; Xu, Huang; Miao, Yuqi; Qian, Min; Wang, Shuang
Affiliations:
Columbia University; Brown University
Journal:
ANNALS OF APPLIED STATISTICS
ISSN/ISBN:
1932-6157
DOI:
10.1214/24-AOAS1973
Publication date:
2025
Pages:
926-942
Keywords:
gut microbiota; regression; classification; selection; UniFrac
Abstract:
Many statistical methods that examine associations between the microbiome and health outcomes, or predict health outcomes from the microbiome, are distance-based: several distance metrics are calculated to capture different aspects of the microbiome, and the optimal one is selected for the final association test or prediction. Studies have suggested that diverse forms of taxa are linked to health outcomes; that is, both abundant taxa in close proximity and rare taxa far apart on the phylogenetic tree could be associated with or predictive of the same outcome. However, existing prediction methods often utilize only one association form. Among popular prediction models, random forest (RF) has been widely applied and has shown satisfactory performance. Here we introduce the reweighted-RF estimate, which reweights sample contributions based on their microbiome similarities, and develop multiple reweighted-RF estimates by adopting different distance metrics to encompass various microbiome-outcome association forms. These reweighted-RF estimates are then ensembled for final predictions that draw on multiple signal types. In simulation studies, the reweighted-RF estimate with weights from the distance metric that captures the true microbiome signal form showed improved prediction performance over the original RF. The ensemble estimate also consistently outperforms the original RF or performs as well as the best reweighted-RF estimate when multiple signal forms exist, and its ensemble weights provide insights into the contributing signal forms. We applied our method to predict binary outcomes (obesity and irritable bowel syndrome) and continuous outcomes (body mass index and age) using both gut and oral microbiome data from the American Gut Project, and observed improved prediction results over those of competing methods.
Source URL:
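
Illustrative sketch (not part of the original record): the abstract describes reweighting sample contributions by microbiome similarity and then ensembling estimates built from different distance metrics, but it does not give the formulas. The Python below is only a minimal sketch of that general idea under stated assumptions, not the authors' estimator. It assumes that a forest's regression prediction can be written as a weighted average of training outcomes, that `base_weights` stands in for those forest weights (supplied by hand here), that similarity is an exponential kernel of the distance with a hypothetical `bandwidth` parameter, and that the ensemble is a normalized weighted average. All function names are hypothetical.

```python
import math


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den > 0 else 0.0


def jaccard(u, v):
    """Jaccard distance on presence/absence of each taxon."""
    union = sum(1 for a, b in zip(u, v) if a > 0 or b > 0)
    inter = sum(1 for a, b in zip(u, v) if a > 0 and b > 0)
    return 1.0 - inter / union if union > 0 else 0.0


def reweighted_predict(x_new, X_train, y_train, base_weights, distance,
                       bandwidth=0.5):
    """Weighted average of training outcomes.

    Assumption: each base weight (a stand-in for the weight a fitted forest
    gives a training sample) is multiplied by exp(-distance / bandwidth),
    and the products are renormalized to sum to one.
    """
    sims = [math.exp(-distance(x_new, x) / bandwidth) for x in X_train]
    raw = [w * s for w, s in zip(base_weights, sims)]
    total = sum(raw)
    return sum(r * y for r, y in zip(raw, y_train)) / total


def ensemble(predictions, ensemble_weights):
    """Normalized weighted average of per-metric predictions (assumed form)."""
    total = sum(ensemble_weights)
    return sum(w * p for w, p in zip(ensemble_weights, predictions)) / total


# Toy data: three samples, three taxa; values are made up for illustration.
X_train = [[10, 0, 0], [9, 1, 0], [0, 0, 10]]
y_train = [1.0, 1.0, 5.0]
base_weights = [1 / 3, 1 / 3, 1 / 3]  # uniform stand-in for forest weights
x_new = [10, 0, 0]

pred_bc = reweighted_predict(x_new, X_train, y_train, base_weights, bray_curtis)
pred_jc = reweighted_predict(x_new, X_train, y_train, base_weights, jaccard)
pred_ens = ensemble([pred_bc, pred_jc], [0.5, 0.5])
```

In this toy setting, the reweighting pulls the prediction for `x_new` toward the outcomes of the training samples whose composition most resembles it, relative to the unweighted mean of `y_train`. In practice the ensemble weights would be learned from data rather than fixed at 0.5 each.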