Dynamic Learning and Decision Making via Basis Weight Vectors

成果类型:
Article; Early Access
署名作者:
Zhang, Hao
署名单位:
University of British Columbia
刊物名称:
OPERATIONS RESEARCH
ISSN/ISSBN:
0030-364X
DOI:
10.1287/opre.2021.2240
发表日期:
2022
关键词:
information relaxations Duality
摘要:
This paper presents a new methodology to solve a general model of dynamic decision making with a continuous unknown parameter or state. The methodology centers on the continuation-value functions (mappings from the parameter space to the continuation-value space), created by feasible continuation policies. When the model primitives can be described through a family of basis functions (e.g., polynomials), a continuation-value function retains that property and can be represented by a basis weight vector. The set of efficient basis weight vectors can be constructed through backward induction, which leads to a significant reduction of problem complexity and enables an exact solution for small-sized problems. A set of approximation methods based on the new methodology is developed to tackle larger problems. The methodology is also extended to the multidimensional (multiparameter) setting, which features the problem of contextual multiarmed bandits with linear expected rewards. The approximation algorithm developed in this paper outperforms three benchmark algorithms (epsilon-greedy, Thompson sampling, and LinUCB) in learning situations with many actions and short horizons.