HIGH-DIMENSIONAL ANALYSIS OF SEMIDEFINITE RELAXATIONS FOR SPARSE PRINCIPAL COMPONENTS

Publication type:
Article
Authors:
Amini, Arash A.; Wainwright, Martin J.
Affiliations:
University of California System; University of California Berkeley
Journal:
ANNALS OF STATISTICS
ISSN:
0090-5364
DOI:
10.1214/08-AOS664
Publication date:
2009
Pages:
2877-2921
Keywords:
covariance selection; norm
Abstract:
Principal component analysis (PCA) is a classical method for dimensionality reduction based on extracting the dominant eigenvectors of the sample covariance matrix. However, PCA is well known to behave poorly in the "large p, small n" setting, in which the problem dimension p is comparable to or larger than the sample size n. This paper studies PCA in this high-dimensional regime, but under the additional assumption that the maximal eigenvector is sparse, say, with at most k nonzero components. We consider a spiked covariance model in which a base matrix is perturbed by adding a k-sparse maximal eigenvector, and we analyze two computationally tractable methods for recovering the support set of this maximal eigenvector, as follows: (a) a simple diagonal thresholding method, which transitions from success to failure as a function of the rescaled sample size theta_dia(n, p, k) = n/[k^2 log(p - k)]; and (b) a more sophisticated semidefinite programming (SDP) relaxation, which succeeds once the rescaled sample size theta_sdp(n, p, k) = n/[k log(p - k)] is larger than a critical threshold. In addition, we prove that no method, including the best method which has exponential-time complexity, can succeed in recovering the support if the order parameter theta_sdp(n, p, k) is below a threshold. Our results thus highlight an interesting trade-off between computational and statistical efficiency in high-dimensional inference.
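The diagonal thresholding method from the abstract can be illustrated with a short simulation. This is a minimal sketch, not the authors' code: it assumes an identity base matrix, a hypothetical spike strength beta, and takes the indices of the k largest diagonal entries of the sample covariance as the support estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical spiked covariance model: identity base matrix plus a
# rank-one spike beta * z z^T, where z is a k-sparse unit eigenvector.
p, k, n, beta = 200, 5, 4000, 2.0
support = rng.choice(p, size=k, replace=False)
z = np.zeros(p)
z[support] = 1.0 / np.sqrt(k)                 # k-sparse leading eigenvector
Sigma = np.eye(p) + beta * np.outer(z, z)

# Draw n i.i.d. samples and form the sample covariance matrix.
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S = X.T @ X / n

# Diagonal thresholding: on the support the population variance is
# 1 + beta/k > 1, so keep the k largest sample variances.
est_support = np.argsort(np.diag(S))[-k:]

print(set(support.tolist()) == set(est_support.tolist()))
```

Here the rescaled sample size theta_dia = n/[k^2 log(p - k)] is roughly 30, well above the success threshold, so exact support recovery is expected with high probability.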