GROUPED VARIABLE SELECTION WITH DISCRETE OPTIMIZATION: COMPUTATIONAL AND STATISTICAL PERSPECTIVES
成果类型:
Article
署名作者:
Hazimeh, Hussein; Mazumder, Rahul; Radchenko, Peter
署名单位:
Alphabet Inc.; Google Incorporated; Massachusetts Institute of Technology (MIT); University of Sydney
刊物名称:
ANNALS OF STATISTICS
ISSN/ISSBN:
0090-5364
DOI:
10.1214/21-AOS2155
发表日期:
2023
页码:
1-32
关键词:
minimax-optimal rates
sparse linear-models
scalable algorithms
group lasso
regression
reconstruction
RECOVERY
摘要:
We present a new algorithmic framework for grouped variable selection that is based on discrete mathematical optimization. While there exist several appealing approaches based on convex relaxations and nonconvex heuristics, we focus on optimal solutions for the l(0)-regularized formulation, a problem that is relatively unexplored due to computational challenges. Our methodol-ogy covers both high-dimensional linear regression and nonparametric sparse additive modeling with smooth components. Our algorithmic framework con-sists of approximate and exact algorithms. The approximate algorithms are based on coordinate descent and local search, with runtimes comparable to popular sparse learning algorithms. Our exact algorithm is based on a stan-dalone branch-and-bound (BnB) framework, which can solve the associated mixed integer programming (MIP) problem to certified optimality. By ex-ploiting the problem structure, our custom BnB algorithm can solve to opti-mality problem instances with 5 x 10(6) features and 103 observations in min-utes to hours-over 1000 times larger than what is currently possible using state-of-the-art commercial MIP solvers. We also explore statistical proper-ties of the l(0)-based estimators. We demonstrate, theoretically and empiri-cally, that our proposed estimators have an edge over popular group-sparse estimators in terms of statistical performance in various regimes. We provide an open source implementation of our proposed framework.