Bayesian Approximate Kernel Regression With Variable Selection
Document type:
Article
Authors:
Crawford, Lorin; Wood, Kris C.; Zhou, Xiang; Mukherjee, Sayan
Affiliations:
Brown University; Duke University; University of Michigan System; University of Michigan
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1080/01621459.2017.1361830
Publication date:
2018
Pages:
1710-1721
Keywords:
prediction
classification
association
model
Abstract:
Nonlinear kernel regression models are often used in statistics and machine learning because they are more accurate than linear models. Variable selection for kernel regression models is a challenge partly because, unlike the linear regression setting, there is no clear concept of an effect size for regression coefficients. In this article, we propose a novel framework that provides an effect size analog for each explanatory variable in Bayesian kernel regression models when the kernel is shift-invariant (for example, the Gaussian kernel). We use function analytic properties of shift-invariant reproducing kernel Hilbert spaces (RKHS) to define a linear vector space that: (i) captures nonlinear structure, and (ii) can be projected onto the original explanatory variables. This projection onto the original explanatory variables serves as an analog of effect sizes. The specific function analytic property we use is that shift-invariant kernel functions can be approximated via random Fourier bases. Based on the random Fourier expansion, we propose a computationally efficient class of Bayesian approximate kernel regression (BAKR) models for both nonlinear regression and binary classification for which one can compute an analog of effect sizes. We illustrate the utility of BAKR by examining two important problems in statistical genetics: genomic selection (i.e., phenotypic prediction) and association mapping (i.e., inference of significant variants or loci). State-of-the-art methods for genomic selection and association mapping are based on kernel regression and linear models, respectively. BAKR is the first method that is competitive in both settings. Supplementary materials for this article are available online.
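
Illustrative sketch (not taken from the article): the abstract relies on two ideas, approximating a shift-invariant kernel with random Fourier bases and projecting the fitted nonlinear function back onto the original explanatory variables. The NumPy example below shows one way these steps might look for a Gaussian kernel. All names (`random_fourier_features`, `gaussian_kernel`, `bandwidth`, `ridge_penalty`, `beta_tilde`) are hypothetical, the ridge fit is a simple non-Bayesian stand-in for the posterior inference in BAKR, and the least-squares projection is an assumed reading of the "effect size analog", not the authors' exact definition.

```python
import numpy as np


def random_fourier_features(X, num_features, bandwidth, rng):
    """Map X (n x p) to random Fourier features whose inner products
    approximate the Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    p = X.shape[1]
    # Frequencies are drawn from the kernel's spectral density (a Gaussian).
    W = rng.normal(scale=1.0 / bandwidth, size=(p, num_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)


def gaussian_kernel(X, bandwidth):
    """Exact Gaussian kernel matrix, used here only for comparison."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth ** 2))


rng = np.random.default_rng(0)
n, p, num_features, bandwidth = 200, 5, 2000, 2.0

# Simulated data: only the first variable drives the (nonlinear) response.
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

# Step 1: approximate the kernel matrix with random Fourier features.
Z = random_fourier_features(X, num_features, bandwidth, rng)
max_err = np.max(np.abs(gaussian_kernel(X, bandwidth) - Z @ Z.T))

# Step 2: fit a linear model in the feature space.
# (Ridge regression here; an assumption standing in for Bayesian inference.)
ridge_penalty = 1.0
theta = np.linalg.solve(Z.T @ Z + ridge_penalty * np.eye(num_features), Z.T @ y)
f_hat = Z @ theta  # fitted nonlinear function values at the observations

# Step 3: project the fitted values onto the original variables
# (assumed form of the effect size analog: least-squares projection).
beta_tilde, *_ = np.linalg.lstsq(X, f_hat, rcond=None)

print("max kernel approximation error:", max_err)
print("effect size analogs:", beta_tilde)
```

In this simulated setting the entry of `beta_tilde` for the first variable should dominate the others, which is the sense in which the projection can play the role of an effect size for variable selection. Increasing `num_features` tightens the kernel approximation at higher computational cost.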