
Markovian structures in biological sequence alignments

Document type:

Article

Authors:

Liu, JS; Neuwald, AF; Lawrence, CE

Affiliations:

Stanford University; Cold Spring Harbor Laboratory; State University of New York (SUNY) System; Wadsworth Center

Journal:

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION

ISSN/ISBN:

0162-1459

DOI:

10.2307/2669673

Publication date:

1999

Pages:

1-15

Keywords:

maximum-likelihood alignment dna-sequences protein models database identification search

Abstract:

The alignment of multiple homologous biopolymer sequences is crucial in research on protein modeling and engineering, molecular evolution, and prediction in terms of both gene function and gene product structure. In this article we provide a coherent view of the two recent models used for multiple sequence alignment, the hidden Markov model (HMM) and the block-based motif model, to develop a set of new algorithms that have both the sensitivity of the block-based model and the flexibility of the HMM. In particular, we decompose the standard HMM into two components: the insertion component, which is captured by the so-called propagation model, and the deletion component, which is described by a deletion vector. Such a decomposition serves as a basis for rational compromise between biological specificity and model flexibility. Furthermore, we introduce a Bayesian model selection criterion that, in combination with the propagation model, genetic algorithm, and other computational aspects, forms the core of PROBE, a multiple alignment and database search methodology. The application of our method to a GTPase family of protein sequences yields an alignment that is confirmed by comparison with known tertiary structures.