On the surprising effectiveness of a simple matrix exponential derivative approximation, with application to global SARS-CoV-2

Document type:
Article
Authors:
Didier, Gustavo; Glatt-Holtz, Nathan E.; Holbrook, Andrew J.; Magee, Andrew F.; Suchard, Marc A.
Affiliations:
Tulane University; University of California System; University of California Los Angeles; University of California System; University of California Los Angeles; University of California System; University of California Los Angeles
Journal:
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
ISSN/ISBN:
0027-8424
DOI:
10.1073/pnas.2318989121
Publication date:
2024-01-16
Keywords:
Bayesian phylogenetic inference DNA-sequences Markov compute trees
Abstract:
The continuous-time Markov chain (CTMC) is the mathematical workhorse of evolutionary biology. Learning CTMC model parameters using modern, gradient-based methods requires the derivative of the matrix exponential evaluated at the CTMC's infinitesimal generator (rate) matrix. Motivated by the derivative's extreme computational complexity as a function of state space cardinality, recent work demonstrates the surprising effectiveness of a naive, first-order approximation for a host of problems in computational biology. In response to this empirical success, we obtain rigorous deterministic and probabilistic bounds for the error accrued by the naive approximation and establish a blessing of dimensionality result that is universal for a large class of rate matrices with random entries. Finally, we apply the first-order approximation within surrogate-trajectory Hamiltonian Monte Carlo for the analysis of the early spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) across 44 geographic regions that comprise a state space of unprecedented dimensionality for unstructured (flexible) CTMC models within evolutionary biology.
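As a rough illustration of the quantities the abstract refers to, the sketch below compares an exact directional derivative of exp(tQ) with a first-order approximation on a toy 3-state rate matrix. It rests on several assumptions that are not taken from this record: the approximation is written as t·exp(tQ)·E (exact whenever Q and E commute), and the paper's precise form may differ; the exact derivative uses the standard block-matrix identity, in which the top-right block of exp(t[[Q, E], [0, Q]]) equals the derivative of exp(t(Q + εE)) at ε = 0; and the matrices Q and E, the time t, and all function names are hypothetical.

```python
import math

def matmul(A, B):
    """Plain dense matrix product for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def expm(A, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)
    # Scale so the matrix norm is at most 0.5, where the series converges fast.
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[r + v for r, v in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):  # undo the scaling by repeated squaring
        result = matmul(result, result)
    return result

def exact_derivative(Q, E, t):
    """d/d(eps) exp(t (Q + eps E)) at eps = 0, read off a 2n x 2n block exponential."""
    n = len(Q)
    block = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            block[i][j] = t * Q[i][j]
            block[i][n + j] = t * E[i][j]
            block[n + i][n + j] = t * Q[i][j]
    return [row[n:] for row in expm(block)[:n]]

def first_order_approximation(Q, E, t):
    """Assumed naive approximation t * exp(tQ) * E (ignores non-commutativity)."""
    P = expm([[t * x for x in row] for row in Q])
    return [[t * x for x in row] for row in matmul(P, E)]

def max_abs_diff(A, B):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

# Hypothetical 3-state generator (rows sum to zero) and a direction
# that perturbs the single rate from state 0 to state 1.
Q = [[-0.3, 0.2, 0.1], [0.4, -0.5, 0.1], [0.2, 0.3, -0.5]]
E = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
t = 0.5

error = max_abs_diff(exact_derivative(Q, E, t), first_order_approximation(Q, E, t))
print("max entrywise error of the first-order approximation:", error)
```

The exact route exponentiates a 2n × 2n matrix for every direction E, whereas the approximation reuses a single n × n exponential. That gap in cost is the kind of saving that motivates the approximation when the state space is large.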