Group SLOPE - Adaptive Selection of Groups of Predictors
Publication type:
Article
Authors:
Brzyski, Damian; Gossmann, Alexej; Su, Weijie; Bogdan, Malgorzata
Affiliations:
Indiana University System; Indiana University Bloomington; Jagiellonian University; Tulane University; University of Pennsylvania; University of Wroclaw
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1080/01621459.2017.1411269
Publication date:
2019
Pages:
419-433
Keywords:
false discovery rate
association
model
optimality
sparsity
Abstract:
Sorted L-One Penalized Estimation (SLOPE; Bogdan et al. 2013, 2015) is a relatively new convex optimization procedure that allows for adaptive selection of regressors under sparse high-dimensional designs. Here, we extend the idea of SLOPE to the situation in which one aims to select whole groups of explanatory variables instead of single regressors. Such groups can be formed by clustering strongly correlated predictors or by collecting the dummy variables corresponding to different levels of the same qualitative predictor. We formulate the respective convex optimization problem, group SLOPE (gSLOPE), and propose an efficient algorithm for its solution. We also define a notion of the group false discovery rate (gFDR) and provide a choice of the sequence of tuning parameters for gSLOPE such that gFDR is provably controlled at a prespecified level if the groups of variables are orthogonal to each other. Moreover, we prove that the resulting procedure adapts to unknown sparsity and is asymptotically minimax with respect to the estimation of the proportions of variance of the response variable explained by regressors from different groups. We also provide a method for choosing the regularizing sequence when variables in different groups are not orthogonal but statistically independent, and we illustrate its good properties with computer simulations. Finally, we illustrate the advantages of gSLOPE in the context of genome-wide association studies. The R package grpSLOPE, which implements our method, is available on the Comprehensive R Archive Network (CRAN).
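To make the two central objects of the abstract concrete, the following minimal Python sketch computes a sorted-L1 penalty on group-level effects and a false group discovery proportion for one selection. It is an illustration under assumptions, not the authors' implementation: the abstract does not state the penalty formula, so the sketch assumes each group's effect is measured by the norm of its fitted contribution, ||X_I b_I||_2, and omits any group weights. The helper names (`group_norms`, `sorted_l1`, `group_fdp`) are hypothetical and do not come from the grpSLOPE package, and the tuning sequence `lam` is an arbitrary example rather than the sequence derived in the paper.

```python
import numpy as np

def group_norms(X, b, groups):
    """Assumed group effect size: ||X_I b_I||_2 for each group I.

    `groups` is a list of column-index lists, one per group.
    """
    return np.array([np.linalg.norm(X[:, idx] @ b[idx]) for idx in groups])

def sorted_l1(values, lam):
    """Sorted L1 norm: the largest |value| is paired with the largest lambda."""
    lam = np.asarray(lam, dtype=float)
    if np.any(lam < 0) or np.any(np.diff(lam) > 0):
        raise ValueError("lam must be nonnegative and nonincreasing")
    v = np.sort(np.abs(values))[::-1]  # magnitudes in decreasing order
    return float(np.dot(lam, v))

def group_fdp(selected, truly_relevant):
    """False group discovery proportion for a single selection:
    (falsely selected groups) / max(selected groups, 1).
    The gFDR is the expectation of this quantity."""
    selected, truly_relevant = set(selected), set(truly_relevant)
    return len(selected - truly_relevant) / max(len(selected), 1)

# Toy example (illustrative values only).
X = np.eye(4)
b = np.array([3.0, 4.0, 0.0, 1.0])
groups = [[0, 1], [2], [3]]
lam = [2.0, 1.0, 0.5]  # hypothetical tuning sequence

norms = group_norms(X, b, groups)  # group effects: 5, 0, 1
penalty = sorted_l1(norms, lam)    # 2*5 + 1*1 + 0.5*0
fdp = group_fdp(selected=[0, 1], truly_relevant=[0, 2])
```

Because the penalty depends on the coefficients only through group-level norms, a group enters or leaves the model as a whole, which is the behavior the abstract describes as selecting whole groups of explanatory variables.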