ScreeNOT: EXACT MSE-OPTIMAL SINGULAR VALUE THRESHOLDING IN CORRELATED NOISE

Publication type:
Article
Authors:
Donoho, David; Gavish, Matan; Romanov, Elad
Affiliations:
Stanford University; Hebrew University of Jerusalem
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/22-AOS2232
Publication date:
2023
Pages:
122-148
Keywords:
limiting spectral distribution; principal-components-analysis; largest eigenvalue; parallel analysis; matrix; number; PCA
Abstract:
We derive a formula for optimal hard thresholding of the singular value decomposition in the presence of correlated additive noise; although it nominally involves unobservables, we show how to apply it even where the noise covariance structure is not a priori known or is not independently estimable. The proposed method, which we call ScreeNOT, is a mathematically solid alternative to Cattell's ever-popular but vague scree plot heuristic from 1966. ScreeNOT has a surprising oracle property: it typically achieves exactly, in large finite samples, the lowest possible MSE for matrix recovery, on each given problem instance. That is, the specific threshold it selects gives exactly the smallest achievable MSE loss among all possible threshold choices for that noisy data set and that unknown underlying true low rank model. The method is computationally efficient and robust against perturbations of the underlying covariance structure. Our results depend on the assumption that the singular values of the noise have a limiting empirical distribution of compact support; this property, which is standard in random matrix theory, is satisfied by many models exhibiting either cross-row correlation structure or cross-column correlation structure, and also by many situations with more general, interelement correlation structure. Simulations demonstrate the effectiveness of the method even at moderate matrix sizes. The paper is supplemented by ready-to-use software packages implementing the proposed algorithm: package ScreeNOT in Python (via PyPI) and R (via CRAN).
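
Illustration (not part of the original record): the abstract refers to hard thresholding of singular values and to the "oracle" threshold that minimizes squared error against the true low-rank matrix. The NumPy sketch below shows only those two notions on simulated data. It is not the ScreeNOT algorithm and does not use the ScreeNOT package; the function names, matrix sizes, signal strengths, and the AR(1) column-correlated noise model are all assumptions chosen for the example. The oracle here needs the true matrix X, which is unobservable in practice; per the abstract, ScreeNOT aims to match this threshold from the noisy data alone.

```python
import numpy as np


def hard_threshold_svd(Y, theta):
    """Keep only the singular components of Y whose singular value exceeds theta."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    keep = s > theta
    return (U[:, keep] * s[keep]) @ Vt[keep, :]


def oracle_threshold(Y, X):
    """Return (threshold, loss) minimizing squared error against the true X.

    Hypothetical helper for illustration: it requires X, which is unknown
    in any real problem.
    """
    s = np.linalg.svd(Y, compute_uv=False)
    # One candidate above the top singular value (keep nothing), one between
    # each consecutive pair, and one below the smallest (keep everything).
    candidates = np.concatenate(([s[0] + 1.0], (s[:-1] + s[1:]) / 2, [s[-1] / 2]))
    losses = [np.sum((hard_threshold_svd(Y, t) - X) ** 2) for t in candidates]
    best = int(np.argmin(losses))
    return candidates[best], losses[best]


# Simulated example (all values below are assumptions).
rng = np.random.default_rng(0)
n, p, rank = 60, 40, 2

# True low-rank signal with singular values 10 and 8.
U0, _ = np.linalg.qr(rng.standard_normal((n, rank)))
V0, _ = np.linalg.qr(rng.standard_normal((p, rank)))
X = (U0 * np.array([10.0, 8.0])) @ V0.T

# Noise with cross-column correlation: AR(1) covariance, rho = 0.5.
idx = np.arange(p)
C = 0.5 ** np.abs(idx[:, None] - idx[None, :])
L = np.linalg.cholesky(C)
Z = rng.standard_normal((n, p)) @ L.T / np.sqrt(n)

Y = X + Z
theta, loss = oracle_threshold(Y, X)
X_hat = hard_threshold_svd(Y, theta)
print("oracle threshold:", theta)
print("rank kept:", np.linalg.matrix_rank(X_hat))
print("squared error:", loss)
```

By construction, the oracle loss can be no worse than keeping every singular component (returning Y itself) or keeping none (returning the zero matrix), since both choices are among the candidates scanned.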