Probably certifiably correct k-means clustering

成果类型:
Article
署名作者:
Iguchi, Takayuki; Mixon, Dustin G.; Peterson, Jesse; Villar, Soledad
署名单位:
Air Force Institute of Technology (AFIT); University of Texas System; University of Texas Austin
刊物名称:
MATHEMATICAL PROGRAMMING
ISSN/ISSBN:
0025-5610
DOI:
10.1007/s10107-016-1097-0
发表日期:
2017
页码:
605-642
关键词:
recovery
摘要:
Recently, Bandeira (C R Math, 2015) introduced a new type of algorithm (the so-called probably certifiably correct algorithm) that combines fast solvers with the optimality certificates provided by convex relaxations. In this paper, we devise such an algorithm for the problem of k-means clustering. First, we prove that Peng and Wei's semidefinite relaxation of k-means Peng and Wei (SIAM J Optim 18(1):186-205, 2007) is tight with high probability under a distribution of planted clusters called the stochastic ball model. Our proof follows from a new dual certificate for integral solutions of this semidefinite program. Next, we show how to test the optimality of a proposed k-means solution using this dual certificate in quasilinear time. Finally, we analyze a version of spectral clustering from Peng and Wei (SIAM J Optim 18(1):186-205, 2007) that is designed to solve k-means in the case of two clusters. In particular, we show that this quasilinear-time method typically recovers planted clusters under the stochastic ball model.