A POISSON APPROXIMATION FOR SEQUENCE COMPARISONS WITH INSERTIONS AND DELETIONS

Publication type:
Article
Author(s):
NEUHAUSER, C
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/aos/1176325645
Publication date:
1994
Pages:
1603-1629
Keywords:
LAW
Abstract:
We construct a statistical test for a sequence alignment problem which enables us to decide whether two given sequences are related. Such a test can be used in DNA and protein sequence comparisons. It is based on a comparison of two long sequences of i.i.d. letters taken from a finite alphabet. The test statistic typically employed is the length of the longest matching region between the two sequences in which a certain number of insertions and deletions but no mismatches are allowed. We give a distributional result which enables one to compute P-values, and hence to decide whether or not the two sequences are related. Its proof utilizes the Chen-Stein method for Poisson approximation. The test is based on a greedy algorithm that searches for the longest matching region. We show that this algorithm finds the longest matching region with probability approaching 1 as the lengths of the two sequences go to infinity.
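As a rough illustration of how a Poisson approximation turns a longest-match statistic into a P-value, the sketch below covers only the simplified case with no insertions or deletions. It is not the paper's greedy algorithm or its distributional result. The function names (`longest_exact_match`, `exact_match_rate`, `poisson_p_value`) are hypothetical, and the rate m * n * (1 - p) * p^t is the standard heuristic for exact matching runs between i.i.d. sequences, assumed here purely for illustration.

```python
import math


def longest_exact_match(a: str, b: str) -> int:
    """Length of the longest contiguous region shared by a and b.

    Simplification: no insertions, deletions, or mismatches are allowed,
    unlike the statistic studied in the paper.
    """
    best = 0
    prev = [0] * (len(b) + 1)  # prev[j]: match length ending at previous letter of a and b[j-1]
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1  # extend the diagonal run
                best = max(best, curr[j])
        prev = curr
    return best


def exact_match_rate(m: int, n: int, p: float, t: int) -> float:
    """Assumed Poisson rate for exact matching runs of length >= t.

    m, n: sequence lengths; p: probability that two random letters match.
    This is a heuristic for the zero-indel case, not the paper's formula.
    """
    return m * n * (1.0 - p) * p ** t


def poisson_p_value(lam: float) -> float:
    """P(at least one event) when the event count is roughly Poisson(lam)."""
    return -math.expm1(-lam)  # 1 - exp(-lam), computed stably for small lam


if __name__ == "__main__":
    x, y = "ACGTACGT", "TTACGTAA"
    t = longest_exact_match(x, y)
    lam = exact_match_rate(len(x), len(y), p=0.25, t=t)
    print(t, poisson_p_value(lam))
```

A small P-value suggests the observed match is longer than chance alone would typically produce between unrelated sequences. Allowing insertions and deletions changes both the statistic and the rate, which is the case the paper addresses.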