Online Learning and Decision Making Under Generalized Linear Model with High-Dimensional Data
Publication Type:
Article
Authors:
Wang, Xue; Wei, Mike Mingcheng; Yao, Tao
Affiliations:
Alibaba Group; State University of New York (SUNY) System; University at Buffalo, SUNY; Shanghai Jiao Tong University
Journal:
MANAGEMENT SCIENCE
ISSN/ISBN:
0025-1909
DOI:
10.1287/mnsc.2022.01557
Publication Date:
2025
Keywords:
multiarmed bandits
minimax concave penalty
high-dimensional data
online learning and decision making
generalized linear model
Abstract:
We propose a minimax concave penalized multiarmed bandit algorithm under the generalized linear model (G-MCP-Bandit) for decision-makers facing high-dimensional data in an online learning and decision-making environment. We demonstrate that in the data-rich regime, the G-MCP-Bandit algorithm attains the optimal cumulative regret in the sample size dimension and a tight bound in the covariate dimension and the significant covariate dimension. In the data-poor regime, the G-MCP-Bandit algorithm maintains a tight regret upper bound. In addition, we develop a local linear approximation method, the two-step weighted Lasso procedure, to identify the minimax concave penalty (MCP) estimator for the G-MCP-Bandit algorithm when samples are not independent and identically distributed. Under this procedure, the MCP estimator can match the oracle estimator with high probability and converge to the true parameters at the optimal convergence rate. Finally, through experiments based on both synthetic and real data sets, we show that the G-MCP-Bandit algorithm outperforms other benchmarking algorithms in terms of cumulative regret and that the benefits of the G-MCP-Bandit algorithm increase in the data's sparsity level and the size of the decision set.
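To illustrate the two-step weighted Lasso idea mentioned in the abstract, below is a minimal sketch of one local linear approximation (LLA) step for the MCP estimator, specialized to squared loss (the Gaussian case of the GLM) with i.i.d. data. It is not the authors' G-MCP-Bandit procedure; the solver (plain ISTA), the tuning values `lam` and `gamma`, and all function names are illustrative assumptions. Step 1 runs an ordinary Lasso to get an initial estimate; step 2 re-solves a weighted Lasso whose per-coordinate weights are the MCP derivative at the initial estimate, so large coefficients are left (nearly) unpenalized.

```python
import numpy as np

def mcp_derivative(b, lam, gamma):
    # MCP penalty derivative p'_lam(|b|) = max(lam - |b|/gamma, 0):
    # positive (Lasso-like) for small coefficients, zero beyond gamma*lam.
    return np.maximum(lam - np.abs(b) / gamma, 0.0)

def weighted_lasso(X, y, w, n_iter=500):
    # ISTA (proximal gradient) for (1/2n)||y - Xb||^2 + sum_j w_j |b_j|.
    n, p = X.shape
    b = np.zeros(p)
    L = np.linalg.norm(X, 2) ** 2 / n  # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        z = b - grad / L
        # Coordinate-wise soft-thresholding with per-coordinate weights.
        b = np.sign(z) * np.maximum(np.abs(z) - w / L, 0.0)
    return b

def two_step_weighted_lasso(X, y, lam, gamma=3.0):
    # Step 1: plain Lasso (uniform weights) for an initial estimate.
    b0 = weighted_lasso(X, y, np.full(X.shape[1], lam))
    # Step 2: weighted Lasso with MCP-derivative weights (one LLA step).
    return weighted_lasso(X, y, mcp_derivative(b0, lam, gamma))

# Toy usage: sparse linear signal with 3 significant covariates out of 50.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + 0.1 * rng.standard_normal(n)
b_hat = two_step_weighted_lasso(X, y, lam=0.1)
```

Because the MCP-derivative weights vanish on coefficients that step 1 already estimates as large, step 2 behaves like the oracle least-squares fit on the selected support, which is the intuition behind the oracle-matching property stated in the abstract.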