Multiple hypothesis testing by clustering treatment effects

成果类型:
Article
署名作者:
Dahl, David B.; Newton, Michael A.
署名单位:
Texas A&M University System; Texas A&M University College Station; University of Wisconsin System; University of Wisconsin Madison; University of Wisconsin System; University of Wisconsin Madison
刊物名称:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISSBN:
0162-1459
DOI:
10.1198/016214507000000211
发表日期:
2007
页码:
517-526
关键词:
false discovery rate chain monte-carlo BAYES software
摘要:
Multiple hypothesis testing and clustering have been the subject of extensive research in high-dimensional inference, yet these problems usually have been treated separately. By defining true clusters in terms of shared parameter values, we could improve the sensitivity of individual tests, because more data bearing on the same parameter values are available. We develop and evaluate a hybrid methodology that uses clustering information to increase testing sensitivity and accommodates uncertainty in the true clustering. To investigate the potential efficacy of the hybrid approach, we first study a stylized example in which each object is evaluated with a standard z score but different objects are connected by shared parameter values. We show that there is increased testing power when the clustering is estimated sufficiently well. We next develop a model-based analysis using a conjugate Dirichlet process mixture model. The method is' general, but for specificity we focus attention on microarray gene expression data, to which both clustering and multiple testing methods are actively applied. Clusters provide the means for sharing information among genes, and the hybrid methodology averages over uncertainty in these clusters through Markov chain sampling. Simulations show that the hybrid method performs substantially better than other methods when clustering is heavy or moderate and performs well even under weak clustering. The proposed method is illustrated on microarray data from a study of the effects of aging on gene expression in heart tissue.