AN IMPOSSIBILITY RESULT FOR PHYLOGENY RECONSTRUCTION FROM k-MER COUNTS
Publication type:
Article
Authors:
Fan, Wai-Tong Louis; Legried, Brandon; Roch, Sébastien
Affiliations:
Indiana University System; Indiana University Bloomington; University of Michigan System; University of Michigan; University of Wisconsin System; University of Wisconsin Madison
Journal:
Annals of Applied Probability
ISSN/ISBN:
1050-5164
DOI:
10.1214/22-AAP1805
Publication date:
2022
Pages:
4893-4913
Keywords:
sample complexity
phase transition
alignment
trees
model
Abstract:
We consider phylogeny estimation under a two-state model of sequence evolution by site substitution on a tree. In the asymptotic regime where the sequence lengths tend to infinity, we show that, for any fixed k, no statistically consistent phylogeny estimation is possible from k-mer counts over the full leaf sequences alone. Formally, we establish that the joint distributions of k-mer counts over the entire leaf sequences on two distinct trees have total variation distance bounded away from 1 as the sequence length tends to infinity. Our impossibility result implies that statistical consistency requires more sophisticated use of k-mer count information, such as block techniques developed in previous theoretical work.
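For readers unfamiliar with the statistic the abstract refers to, the following is a minimal sketch of computing k-mer counts over one full binary sequence. It is not code from the article. The function name `kmer_counts`, the encoding of the two states as the characters `0` and `1`, and the use of overlapping windows are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq: str, k: int) -> dict[str, int]:
    """Count every overlapping length-k window of a two-state sequence.

    Hypothetical helper for illustration: states are assumed to be
    encoded as the characters '0' and '1'.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Slide a window of width k across the whole sequence.
    observed = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Report all 2**k possible k-mers, including those with count zero.
    return {
        "".join(kmer): observed.get("".join(kmer), 0)
        for kmer in product("01", repeat=k)
    }


counts = kmer_counts("0110100", 2)
print(counts)
```

The count vector discards the positions at which each k-mer occurs. In the setting of the abstract, one such vector is computed per leaf over the entire sequence, and the result says that these vectors alone do not permit consistent tree estimation for any fixed k.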