A Framework for Feature Selection in Clustering
Document type:
Article
Authors:
Witten, Daniela M.; Tibshirani, Robert
Affiliations:
Stanford University; Stanford University
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1198/jasa.2010.tm09415
Publication date:
2010
Pages:
713-726
Keywords:
VARIABLE SELECTION
principal-components
objects
number
Abstract:
We consider the problem of clustering observations using a potentially large set of features. One might expect that the true underlying clusters present in the data differ only with respect to a small fraction of the features, and will be missed if one clusters the observations using the full set of features. We propose a novel framework for sparse clustering, in which one clusters the observations using an adaptively chosen subset of the features. The method uses a lasso-type penalty to select the features. We use this framework to develop simple methods for sparse K-means and sparse hierarchical clustering. A single criterion governs both the selection of the features and the resulting clusters. These approaches are demonstrated on simulated and genomic data.
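The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a sparse K-means of this kind could look. It assumes the criterion is a feature-weighted between-cluster sum of squares, maximized subject to an L2 bound of 1 and a lasso-type L1 bound `s` on nonnegative weights `w`. The function names, the farthest-first initialization, the plain NumPy K-means, and the toy data are all illustrative assumptions rather than details taken from this record.

```python
import numpy as np

def _unit(v):
    """Scale v to unit L2 norm; leave an all-zero vector unchanged."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def farthest_first_centers(X, k):
    """Deterministic init (an assumption of this sketch): start at row 0,
    then repeatedly add the row farthest from the centers chosen so far."""
    centers = [X[0]]
    d = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        idx = d.argmax()
        centers.append(X[idx])
        d = np.minimum(d, ((X - X[idx]) ** 2).sum(axis=1))
    return np.array(centers)

def kmeans(X, k, n_iter=50):
    """Plain Lloyd iterations; an empty cluster keeps its previous center."""
    centers = farthest_first_centers(X, k)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

def bcss_per_feature(X, labels):
    """Between-cluster sum of squares per feature (total minus within)."""
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for c in np.unique(labels):
        Xc = X[labels == c]
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return total - within

def update_weights(bcss, s, n_iter=40):
    """Maximize sum_j w_j * bcss_j with ||w||_2 <= 1, ||w||_1 <= s, w >= 0.
    Soft-threshold bcss by delta, then rescale to unit L2 norm; delta is
    found by bisection so that the L1 bound holds. Assumes s >= 1."""
    a = np.maximum(bcss, 0.0)
    w = _unit(a)
    if w.sum() <= s:
        return w  # the L1 bound is not active, no thresholding needed
    lo, hi = 0.0, a.max()
    for _ in range(n_iter):
        delta = (lo + hi) / 2.0
        if _unit(np.maximum(a - delta, 0.0)).sum() > s:
            lo = delta
        else:
            hi = delta
    return _unit(np.maximum(a - hi, 0.0))  # hi is always on the feasible side

def sparse_kmeans(X, k, s, n_outer=10):
    """Alternate between clustering the reweighted data and updating w."""
    p = X.shape[1]
    w = np.full(p, 1.0 / np.sqrt(p))  # start with equal weights
    for _ in range(n_outer):
        labels = kmeans(X * np.sqrt(w), k)
        w = update_weights(bcss_per_feature(X, labels), s)
    return labels, w

# Toy data (made up for illustration): 2 informative features, 8 noise features.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
X[20:, :2] += 8.0
labels, w = sparse_kmeans(X, k=2, s=1.2)
```

In this sketch a smaller `s` drives more weights to exactly zero, so the same objective decides both which features are kept and how the observations are grouped, in the spirit of the single criterion described in the abstract. The published method may choose the tuning parameter and the initialization differently.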