CLUSTER ANALYSIS OF LONGITUDINAL PROFILES FOR COMPOSITIONAL COUNT DATA
Publication type:
Article
Authors:
Duan, Chenyang; Jiang, Yuan
Affiliations:
AbbVie; Oregon State University
Journal:
ANNALS OF APPLIED STATISTICS
ISSN/ISBN:
1932-6157
DOI:
10.1214/24-AOAS2001
Publication date:
2025
Pages:
986-1005
Keywords:
multinomial regression
variable selection
splines
optimization
coexistence
competition
model
Abstract:
To classify the biological roles of different species in an ecological system, modern studies collect longitudinal compositional counts of DNA sequences from taxonomically diagnostic genetic markers to measure species abundance over time. This analysis poses two major challenges: how to accommodate the complex dependence in this data type and how to model the longitudinal trajectories of species abundances. In this paper we propose a novel method, named COMPARING, that clusters longitudinal profiles of compositional count data to address these challenges. In COMPARING, a generalized estimating equation accounts for both the compositional and longitudinal dependence structures, a nonparametric B-spline approximation models the longitudinal curves, and a pairwise-distance penalization identifies subgroups with similar longitudinal patterns. We establish the convergence rate of the estimated curves and conclude that the true subgroups can be correctly identified with high probability. We also conduct simulation studies to show the advantage of COMPARING over its competitors in clustering longitudinal trajectories from compositional count data. Finally, we apply COMPARING to study the coexistence of blood-borne parasites in African buffalo and demonstrate how the method successfully detects biologically meaningful subgroups of parasites relevant to the competition-colonization trade-off.
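The abstract combines two ingredients that are easy to illustrate in isolation: each species' trajectory is represented by B-spline coefficients, and a penalty on pairwise distances between coefficient vectors pushes similar species toward identical coefficients, so that species with fused coefficients form a subgroup. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the generalized estimating equation fit is omitted entirely, the penalty is assumed to be the convex form lambda * sum over pairs of the Euclidean distance (the abstract does not state which penalty COMPARING uses), and the function names `bspline_basis`, `spline_curve`, `pairwise_penalty`, and `subgroups` are hypothetical.

```python
import math
from itertools import combinations


def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions of `degree` at x (Cox-de Boor).

    Assumes knots[degree] <= x < knots[-degree - 1]; returns
    len(knots) - degree - 1 values, using the 0/0 = 0 convention.
    """
    # Degree 0: indicator of the knot interval that contains x.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    for d in range(1, degree + 1):
        new = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = ((knots[i + d + 1] - x) / right_den * basis[i + 1]
                     if right_den > 0 else 0.0)
            new.append(left + right)
        basis = new
    return basis


def spline_curve(x, knots, degree, beta):
    """Value of the curve sum_k beta_k * B_k(x) for one species."""
    return sum(b * c for b, c in zip(bspline_basis(x, knots, degree), beta))


def pairwise_penalty(coefs, lam):
    """Assumed fusion penalty: lam * sum_{i<j} ||beta_i - beta_j||_2."""
    return lam * sum(math.dist(a, b) for a, b in combinations(coefs, 2))


def subgroups(coefs, tol=1e-6):
    """Group species whose coefficient vectors are fused (distance <= tol).

    Uses union-find so that chains of fused pairs end up in one group.
    """
    parent = list(range(len(coefs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(coefs)), 2):
        if math.dist(coefs[i], coefs[j]) <= tol:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(coefs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical example: cubic B-splines with one interior knot on [0, 1).
knots = [0, 0, 0, 0, 0.5, 1, 1, 1, 1]
coefs = [
    [0.1, 0.5, 0.9, 0.4, 0.2],  # species 0
    [0.1, 0.5, 0.9, 0.4, 0.2],  # species 1: fused with species 0
    [1.0, 0.8, 0.3, 0.1, 0.0],  # species 2: a distinct trajectory
]
print(subgroups(coefs))
print(pairwise_penalty(coefs, lam=0.5))
```

With these made-up coefficients, species 0 and 1 share one curve and species 2 stands alone. In a penalized fit, increasing `lam` would fuse more pairs and yield fewer, larger subgroups; the tolerance `tol` is only a numerical device for reading fused pairs off an estimate.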
Source URL: