GIBBS POSTERIOR FOR VARIABLE SELECTION IN HIGH-DIMENSIONAL CLASSIFICATION AND DATA MINING

Publication type:
Article
Authors:
Jiang, Wenxin; Tanner, Martin A.
Affiliation:
Northwestern University
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/07-AOS547
Publication date:
2008
Pages:
2207-2231
Keywords:
logistic-regression; distributions; models; choice
Abstract:
In the popular approach of Bayesian variable selection (BVS), one uses prior and posterior distributions to select a subset of candidate variables to enter the model. A completely new direction will be considered here to study BVS with a Gibbs posterior originating in statistical mechanics. The Gibbs posterior is constructed from a risk function of practical interest (such as the classification error) and aims at minimizing that risk function without modeling the data probabilistically. This can improve the performance over the usual Bayesian approach, which depends on a probability model that may be misspecified. Conditions will be provided to achieve good risk performance, even in the presence of high dimensionality, when the number of candidate variables K can be much larger than the sample size n. In addition, we develop a convenient Markov chain Monte Carlo algorithm to implement BVS with the Gibbs posterior.
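Illustrative sketch (not taken from the article): the abstract describes a posterior built from a risk function rather than a likelihood, sampled by Markov chain Monte Carlo. One common way to write such a Gibbs posterior is as a density proportional to exp(-lam * n * R_n(beta)) times a prior, where R_n is the empirical classification error. The Python code below is a minimal sketch under that assumption. The spike-and-slab prior, the parameter names (lam, prior_incl, prior_sd, step), the linear classification rule without intercept, and the Metropolis moves are all hypothetical choices made for illustration; they should not be read as the authors' algorithm or conditions.

```python
import numpy as np


def empirical_risk(X, y, beta):
    """Misclassification rate of the linear rule 1{x @ beta > 0}, with y in {0, 1}."""
    pred = (X @ beta > 0).astype(int)
    return float(np.mean(pred != y))


def gibbs_posterior_bvs(X, y, lam=1.0, prior_incl=0.1, prior_sd=1.0,
                        step=0.5, n_iter=5000, seed=0):
    """Metropolis sampler targeting exp(-lam * n * R_n(beta)) * prior(beta).

    Assumed prior (illustrative): each coefficient is exactly 0 with
    probability 1 - prior_incl, otherwise drawn from N(0, prior_sd^2).
    Returns the fraction of iterations in which each variable was selected.
    """
    rng = np.random.default_rng(seed)
    n, K = X.shape
    beta = np.zeros(K)                      # start from the empty model
    risk = empirical_risk(X, y, beta)
    log_odds = np.log(prior_incl) - np.log1p(-prior_incl)
    incl_count = np.zeros(K)

    for _ in range(n_iter):
        j = rng.integers(K)                 # coordinate to update
        prop = beta.copy()
        if rng.random() < 0.5:
            # Flip move: add or drop variable j. A new coefficient is drawn
            # from the slab prior, so the slab density cancels in the ratio.
            if beta[j] == 0.0:
                prop[j] = rng.normal(0.0, prior_sd)
                log_prior_ratio = log_odds
            else:
                prop[j] = 0.0
                log_prior_ratio = -log_odds
        elif beta[j] != 0.0:
            # Random-walk move on a coefficient that is already in the model.
            prop[j] = beta[j] + step * rng.normal()
            log_prior_ratio = -0.5 * (prop[j] ** 2 - beta[j] ** 2) / prior_sd ** 2
        else:
            # Random-walk move requested on an excluded variable: stay put.
            incl_count += beta != 0.0
            continue

        prop_risk = empirical_risk(X, y, prop)
        # The risk difference plays the role of a log-likelihood ratio.
        log_accept = -lam * n * (prop_risk - risk) + log_prior_ratio
        if np.log(rng.random()) < log_accept:
            beta, risk = prop, prop_risk
        incl_count += beta != 0.0

    return incl_count / n_iter
```

Calling gibbs_posterior_bvs(X, y) on an n-by-K design matrix X and a 0/1 label vector y returns a length-K vector of selection frequencies. Larger values of lam concentrate the sampler on low-risk models, while smaller values of prior_incl favor sparser models.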