Online Learning and Decision Making Under Generalized Linear Model with High-Dimensional Data
Publication Type:
Article
Authors:
Wang, Xue; Wei, Mike Mingcheng; Yao, Tao
Affiliations:
Alibaba Group; State University of New York (SUNY) System; University at Buffalo, SUNY; Shanghai Jiao Tong University
Journal:
MANAGEMENT SCIENCE
ISSN/ISBN:
0025-1909
DOI:
10.1287/mnsc.2022.01557
Publication Date:
2025
Keywords:
multiarmed bandits
minimax concave penalty
high-dimensional data
online learning and decision making
generalized linear model
Abstract:
We propose a minimax concave penalized multiarmed bandit algorithm under the generalized linear model (G-MCP-Bandit) for decision-makers facing high-dimensional data in an online learning and decision-making environment. We demonstrate that in the data-rich regime, the G-MCP-Bandit algorithm attains the optimal cumulative regret in the sample size dimension and a tight bound in the covariate dimension and the significant covariate dimension. In the data-poor regime, the G-MCP-Bandit algorithm maintains a tight regret upper bound. In addition, we develop a local linear approximation method, the two-step weighted Lasso procedure, to identify the minimax concave penalty (MCP) estimator for the G-MCP-Bandit algorithm when samples are not independent and identically distributed. Under this procedure, the MCP estimator can match the oracle estimator with high probability and converge to the true parameters at the optimal convergence rate. Finally, through experiments based on both synthetic and real data sets, we show that the G-MCP-Bandit algorithm outperforms other benchmarking algorithms in terms of cumulative regret and that the benefits of the G-MCP-Bandit algorithm increase in the data's sparsity level and the size of the decision set.
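To illustrate the two-step weighted Lasso idea mentioned in the abstract, below is a minimal sketch of one local linear approximation (LLA) step for the MCP estimator, specialized to squared loss (the Gaussian case of the GLM) with i.i.d. data. It is not the authors' G-MCP-Bandit procedure; the solver (plain ISTA), the tuning values `lam` and `gamma`, and all function names are illustrative assumptions. Step 1 runs an ordinary Lasso to get an initial estimate; step 2 re-solves a weighted Lasso whose per-coordinate weights are the MCP derivative at the initial estimate, so large coefficients are left (nearly) unpenalized.

```python
import numpy as np

def mcp_derivative(b, lam, gamma):
    # MCP penalty derivative p'_lam(|b|) = max(lam - |b|/gamma, 0):
    # positive (Lasso-like) for small coefficients, zero beyond gamma*lam.
    return np.maximum(lam - np.abs(b) / gamma, 0.0)

def weighted_lasso(X, y, w, n_iter=500):
    # ISTA (proximal gradient) for (1/2n)||y - Xb||^2 + sum_j w_j |b_j|.
    n, p = X.shape
    b = np.zeros(p)
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - grad / L
        # Coordinate-wise soft-thresholding with per-coordinate weights.
        b = np.sign(z) * np.maximum(np.abs(z) - w / L, 0.0)
    return b

def two_step_weighted_lasso(X, y, lam, gamma=3.0):
    # Step 1: plain Lasso (uniform weights) for an initial estimate.
    b0 = weighted_lasso(X, y, np.full(X.shape[1], lam))
    # Step 2: weighted Lasso with MCP-derivative weights (one LLA step).
    return weighted_lasso(X, y, mcp_derivative(b0, lam, gamma))

# Toy usage: sparse linear signal with 3 significant covariates out of 50.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + 0.1 * rng.standard_normal(n)
b_hat = two_step_weighted_lasso(X, y, lam=0.1)
```

Because the MCP-derivative weights vanish on coefficients that step 1 already estimates as large, step 2 behaves like the oracle least-squares fit on the selected support, which is the intuition behind the oracle-matching property stated in the abstract.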