Variable selection in clustering via Dirichlet process mixture models

Document type:
Article
Authors:
Kim, Sinae; Tadesse, Mahlet G.; Vannucci, Marina
Affiliations:
Texas A&M University System; Texas A&M University College Station; University of Pennsylvania; Texas A&M University System; Texas A&M University College Station
Journal:
BIOMETRIKA
ISSN/ISBN:
0006-3444
DOI:
10.1093/biomet/93.4.877
Publication date:
2006
Pages:
877-893
Keywords:
Bayesian analysis; sampling methods; unknown number; classification; distributions; inference
Abstract:
The increased collection of high-dimensional data in various fields has raised a strong interest in clustering algorithms and variable selection procedures. In this paper, we propose a model-based method that addresses the two problems simultaneously. We introduce a latent binary vector to identify discriminating variables and use Dirichlet process mixture models to define the cluster structure. We update the variable selection index using a Metropolis algorithm and obtain inference on the cluster structure via a split-merge Markov chain Monte Carlo technique. We explore the performance of the methodology on simulated data and illustrate an application with a DNA microarray study.
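
Illustrative sketch (not taken from the paper): the abstract mentions updating a latent binary variable-selection vector with a Metropolis algorithm. The Python snippet below shows what one such update could look like in its simplest form, using a symmetric single-coordinate flip proposal. The names `metropolis_flip_step`, `run_chain`, `toy_log_post` and `TOY_SCORES` are hypothetical, and the toy log-posterior is a stand-in assumption; in the actual method the target would be the posterior of the selection vector given the data and the current cluster allocation, which this record does not specify.

```python
import math
import random

# Hypothetical per-variable scores: positive values favour inclusion.
# This toy target is an assumption for illustration only.
TOY_SCORES = [3.0, 3.0, -3.0, -3.0]


def toy_log_post(gamma):
    """Unnormalised log-posterior of a binary inclusion vector (toy stand-in)."""
    return sum(g * s for g, s in zip(gamma, TOY_SCORES))


def metropolis_flip_step(gamma, log_post, rng):
    """One Metropolis update: flip one randomly chosen coordinate of gamma.

    The proposal is symmetric, so the acceptance probability reduces to
    min(1, post(proposal) / post(current)).
    """
    j = rng.randrange(len(gamma))
    proposal = list(gamma)
    proposal[j] = 1 - proposal[j]  # add the variable if absent, delete if present
    log_ratio = log_post(proposal) - log_post(gamma)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal  # accept the move
    return list(gamma)   # reject and keep the current state


def run_chain(p, log_post, n_iter, seed=0):
    """Run the chain from the empty model; return per-variable inclusion frequencies."""
    rng = random.Random(seed)
    gamma = [0] * p
    counts = [0] * p
    for _ in range(n_iter):
        gamma = metropolis_flip_step(gamma, log_post, rng)
        for j, g in enumerate(gamma):
            counts[j] += g
    return [c / n_iter for c in counts]


if __name__ == "__main__":
    freq = run_chain(len(TOY_SCORES), toy_log_post, 20000, seed=1)
    print(freq)  # variables with positive scores should be included most of the time
```

In this sketch the inclusion frequencies play the role of estimated marginal posterior probabilities that each variable discriminates between clusters. The cluster-structure update (the split-merge step named in the abstract) is deliberately left out.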