Preconditioning for feature selection and regression in high-dimensional problems
Publication type:
Article
Authors:
Paul, Debashis; Bair, Eric; Hastie, Trevor; Tibshirani, Robert
Affiliations:
University of California System; University of California Davis; Stanford University; Stanford University; Stanford University
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/009053607000000578
Publication date:
2008
Pages:
1595-1618
Keywords:
lasso
Abstract:
We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for variable selection that first estimates the regression function, yielding a preconditioned response variable. The primary method used for this initial regression is supervised principal components. Then we apply a standard procedure such as forward stepwise selection or the LASSO to the preconditioned response variable. In a number of simulated and real data examples, this two-step procedure outperforms forward stepwise selection or the usual LASSO (applied directly to the raw outcome). We also show that under a certain Gaussian latent variable model, application of the LASSO to the preconditioned response variable is consistent as the number of predictors and observations increases. Moreover, when the observational noise is rather large, the suggested procedure can give a more accurate estimate than LASSO. We illustrate our method on some real problems, including survival analysis with microarray data.
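The two-step procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`supervised_pc_fit`, `lasso_cd`, `precondition_then_lasso`), the fixed `n_keep` screening rule, the single principal component, the plain coordinate-descent LASSO solver, and the simulated latent-variable data are all hypothetical choices made for this example.

```python
import numpy as np

def supervised_pc_fit(X, y, n_keep=20, n_components=1):
    """Step 1 (assumed form): screen features by univariate association with y,
    take leading principal components of the kept features, and regress y on them.
    Returns the fitted values, used as the preconditioned response."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Absolute correlation-like score for each feature.
    scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) + 1e-12)
    keep = np.argsort(scores)[-n_keep:]
    # Left singular vectors are the (orthonormal) principal component scores.
    U, _, _ = np.linalg.svd(Xc[:, keep], full_matrices=False)
    Z = U[:, :n_components]
    # Least-squares projection of the centered response onto the components.
    y_hat = y.mean() + Z @ (Z.T @ yc)
    return y_hat, keep

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Step 2: LASSO by cyclic coordinate descent, minimizing
    (1/2n)||y - Xb||^2 + lam * ||b||_1 on centered data."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    resid = y - y.mean()
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Partial-residual correlation for coordinate j.
            rho = Xc[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            resid += Xc[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def precondition_then_lasso(X, y, lam, n_keep=20):
    """Apply the LASSO to the preconditioned response instead of the raw outcome."""
    y_hat, _ = supervised_pc_fit(X, y, n_keep=n_keep)
    return lasso_cd(X, y_hat, lam), y_hat

# Simulated example (assumption): one latent factor drives the first 10 features
# and the outcome; the remaining features are pure noise, and p >> n.
rng = np.random.default_rng(0)
n, p = 50, 200
latent = rng.normal(size=n)
X = rng.normal(size=(n, p))
X[:, :10] += 2.0 * latent[:, None]
y = 3.0 * latent + rng.normal(scale=2.0, size=n)

beta, y_hat = precondition_then_lasso(X, y, lam=0.1)
selected = np.flatnonzero(beta)  # indices of features with nonzero coefficients
```

In practice the screening threshold and the LASSO penalty would be chosen by cross-validation rather than fixed as they are here; the sketch only shows how the preconditioned response replaces the raw outcome in the second step.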