A Statistical Framework for the Analysis of ChIP-Seq Data
成果类型:
Article
署名作者:
Kuan, Pei Fen; Chung, Dongjun; Pan, Guangjin; Thomson, James A.; Stewart, Ron; Keles, Suenduez
署名单位:
University of North Carolina; University of North Carolina Chapel Hill; University of Wisconsin System; University of Wisconsin Madison; University of Wisconsin System; University of Wisconsin Madison; Chinese Academy of Sciences; Guangzhou Institute of Biomedicine & Health, CAS; Chinese Academy of Sciences; University of Wisconsin System; University of Wisconsin Madison; University of Wisconsin System; University of Wisconsin Madison; The Morgridge Institute for Research, Inc.
刊物名称:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISSBN:
0162-1459
DOI:
10.1198/jasa.2011.ap09706
发表日期:
2011
页码:
891-903
关键词:
gene-expression
high-resolution
binding
regions
genome
sequence
reveals
motif
dna
摘要:
Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has revolutionalized experiments for genome-wide profiling of DNA-binding proteins, histone modifications, and nucleosome occupancy. As the cost of sequencing is decreasing, many researchers are switching from microarray-based technologies (ChIP-chip) to ChIP-Seq for genome-wide study of transcriptional regulation. Despite its increasing and well-deserved popularity, there is little work that investigates and accounts for sources of biases in the ChIP-Seq technology. These biases typically arise from both the standard preprocessing protocol and the underlying DNA sequence of the generated data. We study data from a naked DNA sequencing experiment, which sequences noncross-linked DNA after deproteinizing and shearing, to understand factors affecting background distribution of data generated in a ChIP-Seq experiment. We introduce a background model that accounts for apparent sources of biases such as mappability and GC content and develop a flexible mixture model named MOSAiCS for detecting peaks in both one- and two-sample analyses of ChIP-Seq data. We illustrate that our model fits observed ChIP-Seq data well and further demonstrate advantages of MOSAiCS over commonly used tools for ChIP-Seq data analysis with several case studies. This article has supplementary material online.