SCALABLE TEST OF STATISTICAL SIGNIFICANCE FOR PROTEIN-DNA BINDING CHANGES WITH INSERTION AND DELETION OF BASES IN THE GENOME
Publication type:
Article
Authors:
Zhou, Qinyi; Zuo, Chandler; Zhang, Yuannyu; Chen, Min; Xu, Jian; Shin, Sunyoung
Affiliations:
University of Texas System; University of Texas Dallas; St Jude Children's Research Hospital; Pohang University of Science & Technology (POSTECH)
Journal:
ANNALS OF APPLIED STATISTICS
ISSN/ISBN:
1932-6157
DOI:
10.1214/24-AOAS1950
Publication date:
2024
Pages:
3528-3548
Keywords:
open-access database
variable selection
transcription
patterns
discovery
Abstract:
Mutations in noncoding DNA, which represents approximately 99% of the human genome, are crucial to understanding disease mechanisms because they can dysregulate disease-associated genes. One key element of gene regulation that noncoding mutations mediate is the binding of proteins to DNA sequences. Insertions and deletions of bases (InDels) are the second most common type of mutation, after single nucleotide polymorphisms, and may affect protein-DNA binding. However, no existing method can estimate and test the effects of InDels on protein-DNA binding. We develop a novel test of statistical significance, the binding change test (BC test), which uses a Markov model to evaluate the impact of InDels and to identify those that alter protein-DNA binding. The test predicts binding-changing InDels of regulatory significance using an efficient importance sampling algorithm that generates background sequences favoring large binding affinity changes. Simulation studies demonstrate its excellent performance. An application to human leukemia data uncovers, in critical cis-regulatory elements, candidate pathological InDels that modulate transcription factor (TF) binding in leukemic patients. We develop an R package, atIndel, which is available on GitHub.
Source URL: