PAIRWISE SEQUENCE ALIGNMENT AT ARBITRARILY LARGE EVOLUTIONARY DISTANCE

Publication type:
Article
Authors:
Legried, Brandon; Roch, Sebastien
Affiliations:
University System of Georgia; Georgia Institute of Technology; University of Wisconsin System; University of Wisconsin Madison
Journal:
ANNALS OF APPLIED PROBABILITY
ISSN/ISSBN:
1050-5164
DOI:
10.1214/23-AAP2009
Publication date:
2024
Keywords:
sample complexity; twilight zone; reconstruction; trees; model; state
Abstract:
Ancestral sequence reconstruction is a key task in computational biology. It consists of inferring a molecular sequence at an ancestral species of a known phylogeny, given descendant sequences at the tips of the tree. In addition to its many biological applications, it has played a key role in elucidating the statistical performance of phylogeny estimation methods. Here we establish a formal connection to another important bioinformatics problem, multiple sequence alignment, where one attempts to best align a collection of molecular sequences under some mismatch penalty score by inserting gaps. Our result is counter-intuitive: we show that perfect pairwise sequence alignment is possible in principle, with high probability, at arbitrarily large evolutionary distances, provided the phylogeny is known and dense enough. We use techniques from ancestral sequence reconstruction in the taxon-rich setting together with the probabilistic analysis of sequence evolution models involving insertions and deletions.