The Annals of Statistics, 2015, Issue 3

DO SEMIDEFINITE RELAXATIONS SOLVE SPARSE PCA UP TO THE INFORMATION LIMIT?

Publication type:

Article

Authors:

Krauthgamer, Robert; Nadler, Boaz; Vilenchik, Dan

Affiliations:

Weizmann Institute of Science; Ben-Gurion University of the Negev

Journal:

ANNALS OF STATISTICS

ISSN:

0090-5364

DOI:

10.1214/15-AOS1310

Publication date:

2015

Pages:

1300-1322

Keywords:

Principal component analysis; large hidden clique; matrix

Abstract:

Estimating the leading principal components of data, assuming they are sparse, is a central task in modern high-dimensional statistics. Many algorithms have been developed for this sparse PCA problem, from simple diagonal thresholding to sophisticated semidefinite programming (SDP) methods. A key theoretical question is under what conditions such algorithms can recover the sparse principal components. We study this question for a single-spike model with an ℓ0-sparse eigenvector, in the asymptotic regime where the dimension p and the sample size n both tend to infinity. Amini and Wainwright [Ann. Statist. 37 (2009) 2877-2921] proved that for sparsity levels k ≥ Ω(n/log p), no algorithm, efficient or not, can reliably recover the sparse eigenvector. In contrast, for k ≤ O(√(n/log p)), diagonal thresholding is consistent. It was further conjectured that an SDP approach may close this gap between the computational and information limits. We prove that when k ≥ Ω(√n), the proposed SDP approach, at least in its standard usage, cannot recover the sparse spike. In fact, we conjecture that in the single-spike model, no computationally efficient algorithm can recover a spike of ℓ0-sparsity k ≥ Ω(√n). Finally, we present empirical results suggesting that up to sparsity levels k = O(√n), recovery is possible by a simple covariance thresholding algorithm.
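To make the diagonal thresholding baseline concrete, here is a minimal Python sketch (not the authors' code) of a single-spike model and of diagonal thresholding with known sparsity k. The spike construction (equal entries 1/√k on a random support), the signal strength beta, and the helper names sample_spiked_data and diagonal_thresholding are illustrative assumptions. The usage at the end sits well inside the easy regime k ≤ O(√(n/log p)), where the largest sample variances single out the support.

```python
import math
import random


def sample_spiked_data(n, p, k, beta, seed=0):
    """Draw n samples x = sqrt(beta) * u * v + z from a single-spike model.

    Illustrative assumption: v has k nonzero entries, all equal to 1/sqrt(k),
    on a randomly chosen support; u ~ N(0, 1) and z ~ N(0, I_p).
    """
    rng = random.Random(seed)
    support = sorted(rng.sample(range(p), k))
    v = [0.0] * p
    for j in support:
        v[j] = 1.0 / math.sqrt(k)
    data = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        data.append([math.sqrt(beta) * u * v[j] + rng.gauss(0.0, 1.0) for j in range(p)])
    return data, v, support


def diagonal_thresholding(data, k, iters=200):
    """Estimate a k-sparse leading eigenvector by diagonal thresholding.

    Keeps the k coordinates with the largest sample variances, then runs
    power iteration on the k x k sample covariance restricted to them.
    """
    n, p = len(data), len(data[0])
    # Sample variances; the model is zero-mean, so no centering is applied.
    variances = [sum(row[j] ** 2 for row in data) / n for j in range(p)]
    support = sorted(range(p), key=lambda j: variances[j], reverse=True)[:k]
    # Sample covariance restricted to the selected coordinates.
    sub = [[sum(row[a] * row[b] for row in data) / n for b in support] for a in support]
    # Power iteration for the leading eigenvector of the k x k submatrix.
    w = [1.0] * k
    for _ in range(iters):
        w = [sum(sub[a][b] * w[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    # Embed the k-dimensional estimate back into R^p.
    v_hat = [0.0] * p
    for idx, j in enumerate(support):
        v_hat[j] = w[idx]
    return v_hat, sorted(support)


# Usage: a strong, very sparse spike (k = 3) with many samples.
data, v, true_support = sample_spiked_data(n=2000, p=50, k=3, beta=3.0, seed=1)
v_hat, est_support = diagonal_thresholding(data, k=3)
overlap = abs(sum(a * b for a, b in zip(v, v_hat)))  # |<v, v_hat>|, near 1 on success
```

With k = 3 and beta = 3, the support coordinates have variance about 1 + beta/k = 2 while the noise coordinates have variance about 1, so selecting the top-k sample variances should recover the true support. The sketch illustrates only this easy regime; it says nothing about the harder regime k ≈ √n that the abstract addresses.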