STATISTICAL AND COMPUTATIONAL TRADE-OFFS IN ESTIMATION OF SPARSE PRINCIPAL COMPONENTS

成果类型:
Article
署名作者:
Wang, Tengyao; Berthet, Quentin; Samworth, Richard J.
署名单位:
University of Cambridge; California Institute of Technology
刊物名称:
ANNALS OF STATISTICS
ISSN/ISSBN:
0090-5364
DOI:
10.1214/15-AOS1369
发表日期:
2016
页码:
1896-1930
关键词:
power method PCA EIGENVALUE minimization relaxations Consistency cliques
摘要:
extremely popular dimension reduction technique for high-dimensional data. The theoretical challenge, in the simplest case, is to estimate the leading eigenvector of a population covariance matrix under the assumption that this eigenvector is sparse. An impressive range of estimators have been proposed; some of these are fast to compute, while others are known to achieve the mini-max optimal rate over certain Gaussian or sub-Gaussian classes. In this paper, we show that, under a widely-believed assumption from computational complexity theory, there is a fundamental trade-off between statistical and computational performance in this problem. More precisely, working with new, larger classes satisfying a restricted covariance concentration condition, we show that there is an effective sample size regime in which no randomised polynomial time algorithm can achieve the minimax optimal rate. We also study the theoretical performance of a (polynomial time) variant of the well-known semidefinite relaxation estimator, revealing a subtle interplay between statistical and computational efficiency.