HETEROSKEDASTIC PCA: ALGORITHM, OPTIMALITY, AND APPLICATIONS

Document type:
Article
Authors:
Zhang, Anru R.; Cai, T. Tony; Wu, Yihong
Affiliations:
University of Wisconsin System; University of Wisconsin Madison; Duke University; University of Pennsylvania; Yale University
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/21-AOS2074
Publication date:
2022
Pages:
53-80
Keywords:
Covariance matrices; largest eigenvalue; optimal shrinkage; eigenstructure; approximation; asymptotics; completion
Abstract:
A general framework for principal component analysis (PCA) in the presence of heteroskedastic noise is introduced. We propose an algorithm called HeteroPCA, which involves iteratively imputing the diagonal entries of the sample covariance matrix to remove estimation bias due to heteroskedasticity. This procedure is computationally efficient and provably optimal under the generalized spiked covariance model. A key technical step is a deterministic robust perturbation analysis on singular subspaces, which can be of independent interest. The effectiveness of the proposed algorithm is demonstrated in a suite of problems in high-dimensional statistics, including singular value decomposition (SVD) under heteroskedastic noise, Poisson PCA, and SVD for heteroskedastic and incomplete data.
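
The diagonal-imputation idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' reference implementation: the function name hetero_pca_sketch, the parameters rank and n_iter, and the fixed iteration count are all hypothetical choices made here. The sketch discards the diagonal of the sample covariance matrix (the entries biased by heteroskedastic noise), then repeatedly replaces it with the diagonal of the current best rank-r approximation while leaving the off-diagonal entries untouched.

```python
import numpy as np


def hetero_pca_sketch(sample_cov, rank, n_iter=50):
    """Illustrative diagonal-imputation loop (names and defaults are assumptions).

    sample_cov : (p, p) symmetric sample covariance matrix
    rank       : target number of principal components r
    n_iter     : number of imputation passes (fixed here for simplicity)
    Returns a (p, rank) matrix with orthonormal columns.
    """
    work = np.array(sample_cov, dtype=float, copy=True)
    # Drop the diagonal, which carries the heteroskedastic noise variances.
    np.fill_diagonal(work, 0.0)

    for _ in range(n_iter):
        # Best rank-r approximation of the current matrix via truncated SVD.
        u, s, vt = np.linalg.svd(work)
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
        # Impute only the diagonal; off-diagonal entries stay as observed.
        np.fill_diagonal(work, np.diag(low_rank))

    # Leading singular vectors of the imputed matrix estimate the subspace.
    u, _, _ = np.linalg.svd(work)
    return u[:, :rank]


if __name__ == "__main__":
    # Toy input: rank-one signal plus a strongly unequal diagonal.
    p = 4
    cov = 2.0 * np.ones((p, p)) + np.diag([0.1, 5.0, 1.0, 9.0])
    print(hetero_pca_sketch(cov, rank=1).ravel())
```

In the toy input the signal direction is proportional to the all-ones vector, and because the unequal diagonal is discarded before the first pass it has no influence on the estimate. A practical version would likely stop on a convergence criterion rather than a fixed number of passes.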