Bayesian Modeling of MPSS Data: Gene Expression Analysis of Bovine Salmonella Infection

成果类型:
Article
署名作者:
Dhavala, Soma S.; Datta, Sujay; Mallick, Bani K.; Carroll, Raymond J.; Khare, Sangeeta; Lawhon, Sara D.; Adams, L. Garry
署名单位:
Texas A&M University System; Texas A&M University College Station; Fred Hutchinson Cancer Center; Texas A&M University System; Texas A&M University College Station
刊物名称:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISSBN:
0162-1459
DOI:
10.1198/jasa.2010.ap08327
发表日期:
2010
页码:
956-967
关键词:
statistical-analysis actin cytoskeleton serial analysis mixture inference
摘要:
Massively Parallel Signature Sequencing (MPSS) is a high-throughput, counting-based technology available for gene expression profiling. It produces output that is similar to Serial Analysis of Gene Expression and is ideal for building complex relational databases for gene expression. Our goal is to compare the in vivo global gene expression profiles of tissues infected with different strains of Salmonella obtained using the MPSS technology. In this article, we develop an exact ANOVA type model for this count data using a zero-inflated Poisson distribution, different from existing methods that assume continuous densities. We adopt two Bayesian hierarchical models one parametric and the other semiparametric with a Dirichlet process prior that has the ability to borrow strength across related signatures, where a signature is a specific arrangement of the nucleotides, usually 16-21 base pairs long. We utilize the discreteness of Dirichlet process prior to cluster signatures that exhibit similar differential expression profiles. Tests for differential expression are carried out using nonparametric approaches, while controlling the false discovery rate. We identify several differentially expressed genes that have important biological significance and conclude with a summary of the biological discoveries. This article has supplementary materials online.