Bayesian models for multiple local sequence alignment and Gibbs sampling strategies

Document type:
Article
Authors:
Liu, JS; Neuwald, AF; Lawrence, CE
Affiliations:
National Institutes of Health (NIH) - USA; NIH National Library of Medicine (NLM); State University of New York (SUNY) System; Wadsworth Center; Harvard University
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.2307/2291508
Publication date:
1995
Pages:
1156-1170
Keywords:
maximum-likelihood; alignment; dna-sequences; algorithm; proteins; sites
Abstract:
A wealth of data concerning life's basic molecules, proteins and nucleic acids, has emerged from the biotechnology revolution. The human genome project has accelerated the growth of these data. Multiple observations of homologous protein or nucleic acid sequences from different organisms are often available. But because mutations and sequence errors misalign these data, multiple sequence alignment has become an essential and valuable tool for understanding structures and functions of these molecules. A recently developed Gibbs sampling algorithm has been applied with substantial advantage in this setting. In this article we develop a full Bayesian foundation for this algorithm and present extensions that permit relaxation of two important restrictions. We also present a rank test for the assessment of the significance of multiple sequence alignment. As an example, we study the set of dinucleotide binding proteins and predict binding segments for dozens of its members.