Using Evidence of Mixed Populations to Select Variables for Clustering Very High-Dimensional Data

Publication type:
Article
Authors:
Chan, Yao-ban; Hall, Peter
Affiliation:
University of Melbourne
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1198/jasa.2010.tm09404
Publication date:
2010
Pages:
798-809
Keywords:
excess-mass density contour tests asymptotics set
Abstract:
In this paper we develop a nonparametric approach to clustering very high-dimensional data, designed particularly for problems where the mixture nature of a population is expressed through multimodality of its density. Therefore, a technique based implicitly on mode testing can be particularly effective. In principle, several alternative approaches could be used to assess the extent of multimodality, but in the present problem the excess mass method has important advantages. We show that the resulting methodology for determining clusters is particularly effective in cases where the data are relatively heavy tailed or show a moderate to high degree of correlation, or when the number of important components is relatively small. Conversely, in the case of light-tailed, almost-independent components when there are many clusters, clustering in terms of modality can be less reliable than more conventional approaches. This article has supplementary material online.
Source URL: