Robust Matrix Completion with Heavy-Tailed Noise

Document type:
Article
Authors:
Wang, Bingyan; Fan, Jianqing
Affiliation:
Princeton University
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1080/01621459.2024.2375037
Publication date:
2025
Pages:
922-934
Keywords:
Convex relaxation; Oracle inequalities; Optimal rates; Missing data; Rank; Recovery; Optimization; Minimization; Guarantees; Regression
Abstract:
This article studies noisy low-rank matrix completion in the presence of heavy-tailed and possibly asymmetric noise, where we aim to estimate an underlying low-rank matrix given a set of highly incomplete noisy entries. Though the matrix completion problem has attracted much attention in the past decade, there is still a lack of theoretical understanding when the observations are contaminated by heavy-tailed noise. Prior theory falls short of explaining the empirical results and is unable to capture the optimal dependence of the estimation error on the noise level. In this article, we adopt an adaptive Huber loss to accommodate heavy-tailed noise, which is robust against large and possibly asymmetric errors when the parameter in the Huber loss function is carefully designed to balance the Huberization biases and robustness to outliers. Then, we propose an efficient nonconvex algorithm via a balanced low-rank Burer-Monteiro matrix factorization and gradient descent with robust spectral initialization. We prove that under merely a bounded second-moment condition on the error distributions, rather than the sub-Gaussian assumption, the Euclidean errors of the iterates generated by the proposed algorithm decrease geometrically fast until achieving a minimax-optimal statistical estimation error, which has the same order as that in the sub-Gaussian case. The key technique behind this significant advancement is a powerful leave-one-out analysis framework. The theoretical results are corroborated by our numerical studies. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
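
Illustrative sketch (not from the article): the abstract names three ingredients, namely a Huber loss, a balanced Burer-Monteiro factorization, and gradient descent from a robust spectral initialization. The NumPy code below is a minimal sketch of how such a pipeline could look, not the authors' algorithm. The function names `huber_grad` and `robust_complete`, the fixed threshold `tau`, the step-size rule `eta / s[0]`, the truncation used in the initialization, and the 1/8 weight on the balancing term are all assumptions made for illustration; the article's adaptive choice of the Huber parameter is not reproduced here.

```python
import numpy as np


def huber_grad(r, tau):
    """Derivative of the Huber loss: r on [-tau, tau], clipped to +/-tau outside."""
    return np.clip(r, -tau, tau)


def robust_complete(Y, mask, rank, tau, eta=0.2, iters=500):
    """Hypothetical Huber-loss matrix completion via a factorization X @ Z.T.

    Y    : observed matrix (values outside `mask` are ignored)
    mask : boolean array, True where an entry is observed
    tau  : Huber threshold, fixed by the caller (an assumption of this sketch)
    """
    p = mask.mean()  # empirical sampling rate

    # Assumed robust spectral initialization: truncate observed entries at tau,
    # rescale by 1/p, and keep the top-`rank` singular triplets.
    Y0 = np.where(mask, np.clip(Y, -tau, tau), 0.0) / p
    U, s, Vt = np.linalg.svd(Y0, full_matrices=False)
    X = U[:, :rank] * np.sqrt(s[:rank])
    Z = Vt[:rank].T * np.sqrt(s[:rank])

    step = eta / s[0]  # assumed step size, scaled by the top singular value
    for _ in range(iters):
        R = np.where(mask, X @ Z.T - Y, 0.0)  # residuals on observed entries
        G = huber_grad(R, tau) / p            # clipped residuals bound outlier influence
        B = X.T @ X - Z.T @ Z                 # gradient factor of (1/8)||X'X - Z'Z||_F^2
        X, Z = (X - step * (G @ Z + 0.5 * X @ B),
                Z - step * (G.T @ X - 0.5 * Z @ B))
    return X @ Z.T
```

A possible usage: draw a rank-2 matrix `M`, observe about half of its entries with Student-t noise that has finite variance, and call `robust_complete(Y, mask, rank=2, tau=5.0)`. Clipping the residuals at `tau` is what keeps a few very large noise values from dominating each gradient step.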