On Consistency and Sparsity for Principal Components Analysis in High Dimensions

Publication type:
Article
Authors:
Johnstone, Iain M.; Lu, Arthur Yu
Affiliation:
Stanford University
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN:
0162-1459
DOI:
10.1198/jasa.2009.0121
Publication date:
2009
Pages:
682-693
Keywords:
feature-selection PCA
Abstract:
Principal components analysis (PCA) is a classic method for reducing the dimensionality of data consisting of n observations (or cases) of a vector of p variables. Contemporary datasets often have p comparable to, or even much larger than, n. In such settings, our main assertions are (a) that some initial reduction in dimensionality is desirable before applying any PCA-type search for principal modes, and (b) that this initial reduction is best achieved by working in a basis in which the signals have a sparse representation. We describe a simple asymptotic model in which the estimate of the leading principal component vector obtained by standard PCA is consistent if and only if p(n)/n → 0. We provide a simple algorithm for selecting a subset of coordinates with the largest sample variances, and show that if PCA is performed on the selected subset, consistency is recovered, even if p(n) ≫ n.
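The select-then-PCA procedure described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the data are assumed to already be expressed in a basis where the signal is sparse, the number of retained coordinates `k` is fixed by hand (the paper's own data-driven selection rule is not reproduced here), and the helper names `leading_pc`, `screened_pca`, and `simulate_single_spike`, together with the single-spike simulation and its parameters, are hypothetical choices made for illustration.

```python
import numpy as np


def leading_pc(X):
    """Leading principal component direction of an (n, m) data matrix."""
    Xc = X - X.mean(axis=0)
    # The right singular vectors of the centered data are the PCA directions.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[0]


def screened_pca(X, k):
    """Hypothetical helper: keep the k coordinates with the largest sample
    variances, run ordinary PCA on them, and embed the result back in R^p."""
    n, p = X.shape
    variances = X.var(axis=0, ddof=1)
    # Indices of the k largest sample variances, sorted for readability.
    selected = np.sort(np.argsort(variances)[-k:])
    v = np.zeros(p)  # estimate is zero outside the selected coordinates
    v[selected] = leading_pc(X[:, selected])
    return v, selected


def simulate_single_spike(n, p, support, strength, rng):
    """Assumed single-spike model: x_i = sqrt(strength) * u_i * rho + z_i,
    with u_i ~ N(0, 1), z_i ~ N(0, I_p), and rho a sparse unit vector."""
    rho = np.zeros(p)
    rho[:support] = 1.0 / np.sqrt(support)
    u = rng.standard_normal(n)
    Z = rng.standard_normal((n, p))
    return np.sqrt(strength) * np.outer(u, rho) + Z, rho


rng = np.random.default_rng(0)
# p = 2000 is much larger than n = 100.
X, rho = simulate_single_spike(n=100, p=2000, support=10, strength=25.0, rng=rng)

# |cosine| between estimate and truth; an eigenvector's sign is arbitrary.
full_overlap = abs(leading_pc(X) @ rho)
v_hat, selected = screened_pca(X, k=10)
screened_overlap = abs(v_hat @ rho)

print(f"standard PCA overlap: {full_overlap:.3f}")
print(f"screened PCA overlap: {screened_overlap:.3f}")
```

Keeping only the high-variance coordinates shrinks the working dimension from p to k, so, roughly speaking, the ratio that matters becomes k/n rather than p/n. In this setup one would expect the screened estimate to align closely with the true direction while standard PCA on all 2000 coordinates falls noticeably short; exact values depend on the seed and parameters.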