MWPCR: Multiscale Weighted Principal Component Regression for High-Dimensional Prediction
Type:
Article
Authors:
Zhu, Hongtu; Shen, Dan; Peng, Xuewei; Liu, Leo Yufeng
Affiliations:
University of Texas System; UTMD Anderson Cancer Center; University of North Carolina; University of North Carolina Chapel Hill; State University System of Florida; University of South Florida; State University System of Florida; University of South Florida; Texas A&M University System; Texas A&M University College Station; University of North Carolina; University of North Carolina Chapel Hill
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1080/01621459.2016.1261710
Publication date:
2017
Pages:
1009-1021
Keywords:
models
CLASSIFICATION
variables
TUTORIAL
Abstract:
We propose a multiscale weighted principal component regression (MWPCR) framework for the use of high-dimensional features with strong spatial features (e.g., smoothness and correlation) to predict an outcome variable, such as disease status. This development is motivated by identifying imaging biomarkers that could potentially aid detection, diagnosis, assessment of prognosis, prediction of response to treatment, and monitoring of disease status, among many others. The MWPCR can be regarded as a novel integration of principal components analysis (PCA), kernel methods, and regression models. In MWPCR, we introduce various weight matrices to prewhiten high-dimensional feature vectors, perform matrix decomposition for both dimension reduction and feature extraction, and build a prediction model by using the extracted features. Examples of such weight matrices include an importance score weight matrix for the selection of individual features at each location and a spatial weight matrix for the incorporation of the spatial pattern of feature vectors. We integrate the importance score weights with the spatial weights to recover the low-dimensional structure of high-dimensional features. We demonstrate the utility of our methods through extensive simulations and real data analyses of the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Supplementary materials for this article are available online.
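
The three stages named in the abstract (weighting, matrix decomposition, prediction) can be illustrated with a rough NumPy sketch. This is not the authors' implementation. The abstract does not specify how the weights are computed, so the importance score (absolute marginal correlation with the outcome), the spatial weights (a Gaussian kernel over feature locations), the way the two are combined, and the least-squares prediction step are all assumptions chosen for illustration. The function names `importance_weights`, `spatial_weights`, `fit_weighted_pcr`, and `predict` are hypothetical.

```python
import numpy as np


def importance_weights(X, y):
    """Assumed importance score: |marginal correlation| of each feature with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    # Guard against constant features or a constant outcome.
    return np.abs(Xc.T @ yc) / np.where(denom > 0, denom, 1.0)


def spatial_weights(coords, bandwidth):
    """Assumed spatial weight matrix: Gaussian kernel between feature locations."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def fit_weighted_pcr(X, y, coords, n_components=2, bandwidth=1.0):
    """Weight the features, reduce dimension by SVD, then regress y on the scores."""
    mean = X.mean(axis=0)
    # Assumed combination: scale each feature by its importance score,
    # then smooth across neighboring locations with the spatial kernel.
    Q = np.diag(importance_weights(X, y)) @ spatial_weights(coords, bandwidth)
    Z = (X - mean) @ Q
    # Matrix decomposition step: leading right singular vectors of the weighted data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    V = Vt[:n_components].T
    # Prediction step: ordinary least squares on the extracted component scores.
    design = np.column_stack([np.ones(X.shape[0]), Z @ V])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"mean": mean, "Q": Q, "V": V, "beta": beta}


def predict(model, X):
    """Apply the stored weighting and projection, then the fitted linear model."""
    Z = (X - model["mean"]) @ model["Q"]
    design = np.column_stack([np.ones(X.shape[0]), Z @ model["V"]])
    return design @ model["beta"]
```

Here `X` is an n-by-p feature matrix, `y` is a length-n outcome vector, and `coords` is a p-by-d array of feature locations (for example, voxel coordinates). For a binary outcome such as disease status, the least-squares step could be swapped for a classifier fitted on the same component scores.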