Approximate word matches between two random sequences

Publication type:
Article
Authors:
Burden, Conrad J.; Kantorovitz, Miriam R.; Wilson, Susan R.
Affiliations:
Australian National University; John Curtin School of Medical Research; Australian National University; University of Illinois System; University of Illinois Urbana-Champaign
Journal:
ANNALS OF APPLIED PROBABILITY
ISSN/ISBN:
1050-5164
DOI:
10.1214/07-AAP452
Publication date:
2008
Pages:
1-21
Keywords:
Poisson approximation tag alignment d2-cluster
Abstract:
Given two sequences over a finite alphabet L, the D2 statistic is the number of m-letter word matches between the two sequences. This statistic is used in bioinformatics for expressed sequence tag database searches. Here we study a generalization of the D2 statistic in the context of DNA sequences, under the assumption of strand-symmetric Bernoulli text. For k < m, we look at the count of m-letter word matches with up to k mismatches. For this statistic, we compute the expectation, give upper and lower bounds for the variance, and prove that its distribution is asymptotically normal.
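As an illustration of the statistic described in the abstract, the following is a minimal Python sketch, not the authors' implementation. The function names `approx_word_matches` and `expected_matches` are hypothetical, and the sketch assumes non-overlapping boundary handling (words are not wrapped around the sequence ends), which may differ from the convention used in the paper. The expectation follows from linearity of expectation under the assumption that letters are independent and identically distributed in both sequences (Bernoulli text); with k = 0 both functions reduce to the plain D2 case.

```python
from math import comb


def approx_word_matches(a, b, m, k=0):
    """Count pairs (i, j) where the m-letter word starting at a[i]
    and the m-letter word starting at b[j] differ in at most k positions.
    With k = 0 this is the plain D2 word-match count."""
    count = 0
    for i in range(len(a) - m + 1):
        for j in range(len(b) - m + 1):
            mismatches = sum(x != y for x, y in zip(a[i:i + m], b[j:j + m]))
            if mismatches <= k:
                count += 1
    return count


def expected_matches(n_a, n_b, m, k, letter_probs):
    """Expected count for two independent i.i.d. sequences of lengths
    n_a and n_b (assumption: no wrap-around at the sequence ends).
    letter_probs maps each letter to its probability."""
    # Probability that two independently drawn letters are equal.
    p2 = sum(p * p for p in letter_probs.values())
    # Probability that two independent m-letter words have at most k mismatches.
    per_pair = sum(
        comb(m, t) * p2 ** (m - t) * (1 - p2) ** t for t in range(k + 1)
    )
    # Linearity of expectation over all pairs of word positions.
    return (n_a - m + 1) * (n_b - m + 1) * per_pair


# Uniform DNA letter distribution, which is strand symmetric
# (p_A = p_T and p_C = p_G).
uniform_dna = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
exact = approx_word_matches("ACGT", "ACGA", m=2, k=0)
approx = approx_word_matches("ACGT", "ACGA", m=2, k=1)
mean_k1 = expected_matches(4, 4, m=2, k=1, letter_probs=uniform_dna)
```

The brute-force count costs on the order of len(a) * len(b) * m letter comparisons, which is adequate for small examples but not for database-scale searches.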
Source URL: