您的位置: 首页 > 全球经管学术 > 顶刊追踪 > 顶尖期刊 > 统计学 > Journal of the Royal Statistical Society: Series B > 2017 > 5期

Convex clustering via l1 fusion penalization

成果类型：

Article

署名作者：

Radchenko, Peter; Mukherjee, Gourab

署名单位：

University of Southern California; University of Sydney

刊物名称：

JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY

ISSN/ISSBN：

1369-7412

DOI：

10.1111/rssb.12223

发表日期：

2017

页码：

1527-1546

关键词：

cell mass cytometry REGRESSION SHRINKAGE feature-selection Data set number mixture Lasso

摘要：

We study the large sample behaviour of a convex clustering framework, which minimizes the sample within cluster sum of squares under an l(1) fusion constraint on the cluster centroids. This recently proposed approach has been gaining in popularity; however, its asymptotic properties have remained mostly unknown. Our analysis is based on a novel representation of the sample clustering procedure as a sequence of cluster splits determined by a sequence of maximization problems. We use this representation to provide a simple and intuitive formulation for the population clustering procedure. We then demonstrate that the sample procedure consistently estimates its population analogue and we derive the corresponding rates of convergence. The proof conducts a careful simultaneous analysis of a collection of M-estimation problems, whose cardinality grows together with the sample size. On the basis of the new perspectives gained from the asymptotic investigation, we propose a key post-processing modification of the original clustering framework. We show, both theoretically and empirically, that the resulting approach can be successfully used to estimate the number of clusters in the population. Using simulated data, we compare the proposed method with existing number-of-clusters and modality assessment approaches and obtain encouraging results. We also demonstrate the applicability of our clustering method to the detection of cellular subpopulations in a single-cell virology study.

来源URL：

访问原文