Online Decision Making with High-Dimensional Covariates
Publication type:
Article
Authors:
Bastani, Hamsa; Bayati, Mohsen
Affiliations:
University of Pennsylvania; Stanford University
Journal:
OPERATIONS RESEARCH
ISSN/ISBN:
0030-364X
DOI:
10.1287/opre.2019.1902
Publication date:
2020
Pages:
276-294
Keywords:
contextual bandits
adaptive treatment allocation
online learning
high-dimensional statistics
Lasso
personalized decision making
Abstract:
Big data have enabled decision makers to tailor decisions at the individual level in a variety of domains, such as personalized medicine and online advertising. Doing so involves learning a model of decision rewards conditional on individual-specific covariates. In many practical settings, these covariates are high dimensional; however, typically only a small subset of the observed features are predictive of a decision's success. We formulate this problem as a K-armed contextual bandit with high-dimensional covariates and present a new efficient bandit algorithm based on the LASSO estimator. We prove that our algorithm's cumulative expected regret scales at most polylogarithmically in the covariate dimension d; to the best of our knowledge, this is the first such bound for a contextual bandit. The key step in our analysis is proving a new tail inequality that guarantees the convergence of the LASSO estimator despite the non-i.i.d. data induced by the bandit policy. Furthermore, we illustrate the practical relevance of our algorithm by evaluating it on a simplified version of a medication dosing problem. A patient's optimal medication dosage depends on the patient's genetic profile and medical records; incorrect initial dosage may result in adverse consequences, such as stroke or bleeding. We show that our algorithm outperforms existing bandit methods and physicians in correctly dosing a majority of patients.
Source URL: