A split-and-merge Bayesian variable selection approach for ultrahigh dimensional regression

Publication type:
Article
Authors:
Song, Qifan; Liang, Faming
Affiliations:
Purdue University System; Purdue University; State University System of Florida; University of Florida
Journal:
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY
ISSN/ISBN:
1369-7412
DOI:
10.1111/rssb.12095
Publication date:
2015
Pages:
947-972
Keywords:
generalized linear models; stochastic approximation Monte Carlo; misspecified models; oracle properties; lasso; regularization; computation; shrinkage; algorithm
Abstract:
We propose a Bayesian variable selection approach for ultrahigh dimensional linear regression based on the strategy of split and merge. The approach proposed consists of two stages: split the ultrahigh dimensional data set into a number of lower dimensional subsets and select relevant variables from each of the subsets, and aggregate the variables selected from each subset and then select relevant variables from the aggregated data set. Since the approach proposed has an embarrassingly parallel structure, it can be easily implemented in a parallel architecture and applied to big data problems with millions or more of explanatory variables. Under mild conditions, we show that the approach proposed is consistent, i.e. the true explanatory variables can be correctly identified by the approach as the sample size becomes large. Extensive comparisons of the approach proposed have been made with penalized likelihood approaches, such as the lasso, elastic net, sure independence screening and iterative sure independence screening. The numerical results show that the approach proposed generally outperforms penalized likelihood approaches: the models selected by the approach tend to be more sparse and closer to the true model.
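The two-stage structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' Bayesian procedure: the per-subset selector here is a plain greedy forward search with an extended-BIC-style penalty, swapped in only to keep the example short. The names `forward_select` and `split_and_merge`, the penalty form, the random split and the default of 10 blocks are all assumptions made for illustration.

```python
import numpy as np


def forward_select(X, y, max_vars=10):
    """Stand-in selector (NOT the paper's Bayesian method): greedy forward
    selection that stops when a penalized criterion no longer improves."""
    n, p = X.shape
    if p == 0:
        return []
    # Assumed penalty per added variable, in the spirit of an extended BIC.
    penalty = np.log(n) + 2.0 * np.log(p)
    selected = []
    centered = y - y.mean()
    best_score = n * np.log(centered @ centered / n)  # intercept-only model
    while len(selected) < min(max_vars, p):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            # Least-squares fit with an intercept plus the candidate set.
            A = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ coef
            rss = float(resid @ resid)
            if rss < best_rss:
                best_j, best_rss = j, rss
        score = n * np.log(best_rss / n) + (len(selected) + 1) * penalty
        if score >= best_score:
            break
        selected.append(best_j)
        best_score = score
    return selected


def split_and_merge(X, y, n_blocks=10, seed=0):
    """Two-stage selection: select within column subsets, then reselect
    from the union of the per-subset selections."""
    rng = np.random.default_rng(seed)
    blocks = np.array_split(rng.permutation(X.shape[1]), n_blocks)

    # Stage 1 (split): each block is handled independently, so this loop
    # could be distributed across workers.
    merged = []
    for block in blocks:
        local = forward_select(X[:, block], y)
        merged.extend(int(block[j]) for j in local)
    merged = sorted(merged)
    if not merged:
        return []

    # Stage 2 (merge): rerun the selector on the aggregated candidates.
    final = forward_select(X[:, merged], y)
    return sorted(merged[j] for j in final)
```

Calling `split_and_merge(X, y)` on an `n x p` design matrix `X` and a response vector `y` returns the sorted column indices retained after the second stage. Because the first-stage loop has no dependence between blocks, it maps directly onto the embarrassingly parallel structure the abstract mentions.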