COMPUTATIONAL AND STATISTICAL BOUNDARIES FOR SUBMATRIX LOCALIZATION IN A LARGE NOISY MATRIX

Document type:
Article
Authors:
Cai, T. Tony; Liang, Tengyuan; Rakhlin, Alexander
Affiliation:
University of Pennsylvania
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/16-AOS1488
Publication date:
2017
Pages:
1403-1430
Keywords:
low-rank approximation; probability inequalities; optimal rates; sparse PCA; algorithms
Abstract:
We study in this paper computational and statistical boundaries for submatrix localization. Given one observation of one or multiple nonoverlapping signal submatrices (each of magnitude λ and size k_m × k_n) embedded in a large noise matrix (of size m × n), the goal is to identify the support of the signal submatrix optimally, both computationally and statistically. Two transition thresholds for the signal-to-noise ratio λ/σ are established in terms of m, n, k_m and k_n. The first threshold, SNR_c, corresponds to the computational boundary. We introduce a new linear time spectral algorithm that identifies the submatrix with high probability when the signal strength is above the threshold SNR_c. Below this threshold, it is shown that no polynomial time algorithm can succeed in identifying the submatrix, under the hidden clique hypothesis. The second threshold, SNR_s, captures the statistical boundary, below which no method can succeed in localization with probability going to one in the minimax sense. The exhaustive search method successfully finds the submatrix above this threshold. In marked contrast to submatrix detection and sparse PCA, the results show an interesting phenomenon: SNR_c is always significantly larger than SNR_s under the sub-Gaussian error model, which implies an essential gap between statistical optimality and computational efficiency for submatrix localization.
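To make the localization task in the abstract concrete, the following is a minimal sketch of a generic spectral approach on a toy instance. It is not the algorithm from the paper: it assumes a single planted block, Gaussian noise, known block dimensions, and a signal far above any threshold, and it uses plain power iteration rather than a linear time method. All function names and parameter values (`make_instance`, `top_singular_pair`, `localize`, the 60 × 80 matrix, `lam=4.0`) are hypothetical choices for illustration.

```python
import math
import random


def make_instance(m, n, k_m, k_n, lam, sigma, rng):
    """Plant one k_m x k_n block of height lam in an m x n Gaussian noise matrix."""
    rows = sorted(rng.sample(range(m), k_m))
    cols = sorted(rng.sample(range(n), k_n))
    row_set, col_set = set(rows), set(cols)
    A = [
        [(lam if i in row_set and j in col_set else 0.0) + rng.gauss(0.0, sigma)
         for j in range(n)]
        for i in range(m)
    ]
    return A, rows, cols


def top_singular_pair(A, rng, iters=50):
    """Approximate the leading singular vectors (u, v) of A by power iteration."""
    m, n = len(A), len(A[0])
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [0.0] * m
    for _ in range(iters):
        # u <- A v, normalized
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        # v <- A^T u, normalized
        v = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
    return u, v


def localize(A, k_m, k_n, rng):
    """Estimate the support as the largest-magnitude entries of u and v.

    Assumption: k_m and k_n are known in advance.
    """
    u, v = top_singular_pair(A, rng)
    rows = sorted(sorted(range(len(u)), key=lambda i: -abs(u[i]))[:k_m])
    cols = sorted(sorted(range(len(v)), key=lambda j: -abs(v[j]))[:k_n])
    return rows, cols


rng = random.Random(0)
A, true_rows, true_cols = make_instance(60, 80, 10, 12, lam=4.0, sigma=1.0, rng=rng)
est_rows, est_cols = localize(A, 10, 12, rng)
print("rows recovered:", est_rows == true_rows)
print("cols recovered:", est_cols == true_cols)
```

In this toy setting the planted block contributes a singular value of roughly λ·sqrt(k_m·k_n), which is well above the noise level of about σ·(sqrt(m) + sqrt(n)), so the leading singular vectors concentrate on the block's rows and columns. Lowering `lam` toward the noise level is a simple way to watch the recovery degrade.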
Source URL: