HIGH-DIMENSIONAL ANALYSIS OF SEMIDEFINITE RELAXATIONS FOR SPARSE PRINCIPAL COMPONENTS

Publication type:
Article
Authors:
Amini, Arash A.; Wainwright, Martin J.
Affiliations:
University of California System; University of California Berkeley
Journal:
ANNALS OF STATISTICS
ISSN:
0090-5364
DOI:
10.1214/08-AOS664
Publication date:
2009
Pages:
2877-2921
Keywords:
covariance selection; norm
Abstract:
Principal component analysis (PCA) is a classical method for dimensionality reduction based on extracting the dominant eigenvectors of the sample covariance matrix. However, PCA is well known to behave poorly in the "large p, small n" setting, in which the problem dimension p is comparable to or larger than the sample size n. This paper studies PCA in this high-dimensional regime, but under the additional assumption that the maximal eigenvector is sparse, say, with at most k nonzero components. We consider a spiked covariance model in which a base matrix is perturbed by adding a k-sparse maximal eigenvector, and we analyze two computationally tractable methods for recovering the support set of this maximal eigenvector, as follows: (a) a simple diagonal thresholding method, which transitions from success to failure as a function of the rescaled sample size theta_dia(n, p, k) = n/[k^2 log(p - k)]; and (b) a more sophisticated semidefinite programming (SDP) relaxation, which succeeds once the rescaled sample size theta_sdp(n, p, k) = n/[k log(p - k)] is larger than a critical threshold. In addition, we prove that no method, including the best method which has exponential-time complexity, can succeed in recovering the support if the order parameter theta_sdp(n, p, k) is below a threshold. Our results thus highlight an interesting trade-off between computational and statistical efficiency in high-dimensional inference.
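The diagonal thresholding method from the abstract can be illustrated with a short simulation. This is a minimal sketch, not the authors' code: it assumes an identity base matrix, a hypothetical spike strength beta, and takes the indices of the k largest diagonal entries of the sample covariance as the support estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical spiked covariance model: identity base matrix plus a
# rank-one spike beta * z z^T, where z is a k-sparse unit eigenvector.
p, k, n, beta = 200, 5, 4000, 2.0
support = rng.choice(p, size=k, replace=False)
z = np.zeros(p)
z[support] = 1.0 / np.sqrt(k)                 # k-sparse leading eigenvector
Sigma = np.eye(p) + beta * np.outer(z, z)

# Draw n i.i.d. samples and form the sample covariance matrix.
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S = X.T @ X / n

# Diagonal thresholding: on the support the population variance is
# 1 + beta/k > 1, so keep the k largest sample variances.
est_support = np.argsort(np.diag(S))[-k:]

print(set(support.tolist()) == set(est_support.tolist()))
```

Here the rescaled sample size theta_dia = n/[k^2 log(p - k)] is roughly 30, well above the success threshold, so exact support recovery is expected with high probability.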