Optimal clustering by Lloyd's algorithm for low-rank mixture model

成果类型:
Article; Early Access
署名作者:
Lyu, Zhongyuan; Xia, Dong
署名单位:
Hong Kong University of Science & Technology
刊物名称:
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY
ISSN/ISSBN:
1369-7412
DOI:
10.1093/jrsssb/qkaf041
发表日期:
2025
关键词:
Community Detection feature-selection matrix networks
摘要:
This article investigates the computational and statistical limits in clustering matrix-valued observations. We propose a low-rank mixture model (LrMM), adapted from the classical Gaussian mixture model (GMM), to handle matrix-valued observations, assuming low-rankness for population centre matrices. A computationally efficient clustering method is designed by integrating Lloyd's algorithm and low-rank approximation. Once well-initialized, the algorithm converges fast and achieves an exponential clustering error rate that is minimax optimal. Meanwhile, we show that a tensor-based spectral method delivers a good initial clustering. Similar to GMM, the minimax optimal clustering error rate is determined by the separation strength, i.e. the minimal distance between population centre matrices. Unlike GMM, however, the computational difficulty of LrMM is characterized by the signal strength, i.e. the smallest non-zero singular values of population centre matrices. Evidence is provided showing that no polynomial-time algorithm is consistent if the signal strength is not strong enough, even if the separation strength is strong. Intriguing differences between estimation and clustering under LrMM are discussed. The merits of low-rank Lloyd's algorithm are confirmed by comprehensive simulation experiments.