A Bayesian insertion/deletion algorithm for distant protein motif searching via entropy filtering

Type:
Article
Authors:
Xie, J; Li, KC; Bina, M
Affiliations:
Purdue University System; Purdue University; University of California System; University of California Los Angeles; Purdue University System; Purdue University
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN:
0162-1459
DOI:
10.1198/016214504000000377
Publication date:
2004
Pages:
409-420
Keywords:
dna-binding proteins loop-helix protein transcription factors sequence alignment models myod complexes database FAMILY
Abstract:
Bayesian models have been developed that find ungapped motifs in multiple protein sequences. In this article, we extend the model to allow for deletions and insertions in motifs. Direct generalization of the ungapped algorithm, which is based on Gibbs sampling, proved unsuccessful because the configuration space became much larger. To alleviate the resulting convergence difficulty, we introduce a two-stage procedure. At the first stage, we develop a method called entropy filtering, which quickly searches for good starting points for the alignment without regard to deletion/insertion patterns. At the second stage, we switch to an algorithm that generates both a random vector representing the insertion/deletion pattern and a random variable for the motif locations. After these two stages, gapped-motif alignments are obtained for multiple sequences. When applied to two datasets, one of helix-loop-helix proteins and one of high mobility group proteins, our methods show substantial improvements over methods that produce ungapped alignments.
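The abstract does not spell out the entropy filter or the gapped second-stage sampler, so the following is only a minimal sketch of the ungapped building blocks the article starts from. It assumes a fixed motif width, a known background residue distribution, and a relative-entropy (information content) score against that background. The function names (`column_relative_entropy`, `block_score`, `gibbs_site_sampler`, `best_of_restarts`) and the pseudocount of 0.5 are hypothetical illustrations and not the authors' code. `best_of_restarts` is a simple stand-in for choosing a starting alignment by entropy; it is not the paper's entropy filtering method.

```python
import math
import random
from collections import Counter


def column_relative_entropy(column, background, pseudocount=0.5):
    """Relative entropy (KL divergence, bits) of one alignment column vs. background."""
    counts = Counter(column)
    total = len(column) + pseudocount * len(background)
    score = 0.0
    for residue, q in background.items():
        p = (counts[residue] + pseudocount) / total
        if p > 0:  # 0 * log(0) is taken as 0
            score += p * math.log2(p / q)
    return score


def block_score(seqs, starts, width, background, pseudocount=0.5):
    """Sum of column relative entropies for an ungapped motif block."""
    return sum(
        column_relative_entropy(
            [s[st + k] for s, st in zip(seqs, starts)], background, pseudocount
        )
        for k in range(width)
    )


def gibbs_site_sampler(seqs, width, background, n_iter=200, seed=0, pseudocount=0.5):
    """Ungapped Gibbs site sampler (predictive-update style).

    Assumption: every residue in seqs appears as a key of background.
    """
    rng = random.Random(seed)
    alphabet = list(background)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(n_iter):
        for i, seq in enumerate(seqs):
            # Build the motif profile from all sequences except sequence i.
            others = [(s, st) for j, (s, st) in enumerate(zip(seqs, starts)) if j != i]
            total = len(others) + pseudocount * len(alphabet)
            profile = []
            for k in range(width):
                counts = Counter(s[st + k] for s, st in others)
                profile.append({a: (counts[a] + pseudocount) / total for a in alphabet})
            # Weight each candidate start by its motif/background likelihood ratio.
            weights = []
            for pos in range(len(seq) - width + 1):
                w = 1.0
                for k in range(width):
                    residue = seq[pos + k]
                    w *= profile[k][residue] / background[residue]
                weights.append(w)
            # Resample the motif location in sequence i.
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


def best_of_restarts(seqs, width, background, n_restarts=5, n_iter=100):
    """Hypothetical stand-in: keep the restart whose block has the highest entropy score."""
    best = None
    for seed in range(n_restarts):
        starts = gibbs_site_sampler(seqs, width, background, n_iter, seed)
        score = block_score(seqs, starts, width, background)
        if best is None or score > best[0]:
            best = (score, starts)
    return best


if __name__ == "__main__":
    bg = {a: 0.25 for a in "ACDE"}
    seqs = ["ACDAEEEECADC", "DCAEEEEACDAC", "CADCAEEEEDAC"]  # toy data, motif "EEEE"
    print(best_of_restarts(seqs, 4, bg))
```

In the article's setting, the second stage would additionally sample an insertion/deletion pattern vector alongside the motif locations. That step is omitted here because the abstract gives no details of its form.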