Dynamic Learning and Decision Making via Basis Weight Vectors

Document type:

Article; Early Access

Author:

Zhang, Hao

Affiliation:

University of British Columbia

Journal:

OPERATIONS RESEARCH

ISSN:

0030-364X

DOI:

10.1287/opre.2021.2240

Publication date:

2022

Keywords:

Information relaxations; duality

Abstract:

This paper presents a new methodology to solve a general model of dynamic decision making with a continuous unknown parameter or state. The methodology centers on the continuation-value functions (mappings from the parameter space to the continuation-value space), created by feasible continuation policies. When the model primitives can be described through a family of basis functions (e.g., polynomials), a continuation-value function retains that property and can be represented by a basis weight vector. The set of efficient basis weight vectors can be constructed through backward induction, which leads to a significant reduction of problem complexity and enables an exact solution for small-sized problems. A set of approximation methods based on the new methodology is developed to tackle larger problems. The methodology is also extended to the multidimensional (multiparameter) setting, which features the problem of contextual multiarmed bandits with linear expected rewards. The approximation algorithm developed in this paper outperforms three benchmark algorithms (epsilon-greedy, Thompson sampling, and LinUCB) in learning situations with many actions and short horizons.
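The abstract only outlines the idea, so the following is a minimal Python sketch of it on a hypothetical toy problem, not the paper's algorithm. Assume a two-armed Bernoulli bandit over T periods in which arm 0 pays 1 with known probability p0 and arm 1 pays 1 with unknown probability theta in [0, 1]. The expected reward of any policy, as a function of theta, is a polynomial of degree at most T, so it can be stored as a weight vector on the monomial basis 1, theta, ..., theta**T, and the set of such vectors can be built by backward induction. The toy model, the function names (dominates, prune, times_theta, efficient_vectors, beta_moments, bayes_dp), and the pruning rule (drop a vector when another is at least as large for every theta in [0, 1]) are all illustrative assumptions; the paper's notion of an "efficient" vector may differ. The sketch requires NumPy.

```python
import itertools

import numpy as np
from numpy.polynomial import polynomial as P

# Hypothetical toy model (an assumption, not from the paper): arm 0 pays 1 with
# known probability p0; arm 1 pays 1 with unknown probability theta in [0, 1].
# A continuation-value function theta -> value is stored as its weight vector
# on the monomial basis 1, theta, ..., theta**T (lowest degree first).


def dominates(u, v, tol=1e-12):
    """True if u(theta) >= v(theta) - tol for every theta in [0, 1]."""
    d = u - v
    dc = np.where(np.abs(d) < 1e-12, 0.0, d)  # drop round-off before root finding
    crit = P.polyroots(P.polyder(dc))
    crit = crit[np.abs(crit.imag) < 1e-6].real  # extra points only make the test stricter
    # The minimum of d on [0, 1] sits at an endpoint or an interior critical point.
    pts = np.concatenate(([0.0, 1.0], crit[(crit >= 0) & (crit <= 1)], np.linspace(0, 1, 101)))
    return P.polyval(pts, d).min() >= -tol


def prune(vectors):
    """Assumed efficiency rule: drop vectors pointwise dominated on [0, 1]."""
    kept = []
    for v in vectors:
        if any(dominates(k, v) for k in kept):
            continue
        kept = [k for k in kept if not dominates(v, k)]
        kept.append(v)
    return kept


def times_theta(c):
    """Weights of theta * f(theta); the top weight must be zero (spare degree)."""
    assert c[-1] == 0.0
    return np.concatenate(([0.0], c[:-1]))


def efficient_vectors(T, p0):
    """Backward induction over weight vectors for horizon T >= 1."""
    basis = np.eye(T + 1)
    w_one, w_theta = basis[0], basis[1]      # f(theta) = 1 and f(theta) = theta
    stage = prune([p0 * w_one, w_theta])     # last period: pull arm 0 or arm 1
    for _ in range(T - 1):
        # Arm 0 reveals nothing about theta, so one continuation suffices.
        cand = [p0 * w_one + v for v in stage]
        # Arm 1: theta + theta * V_success + (1 - theta) * V_failure.
        for vs, vf in itertools.product(stage, repeat=2):
            cand.append(w_theta + times_theta(vs) + vf - times_theta(vf))
        stage = prune(cand)
    return stage


def beta_moments(a, b, T):
    """Moments E[theta**k], k = 0..T, of a Beta(a, b) prior."""
    m = np.ones(T + 1)
    for k in range(1, T + 1):
        m[k] = m[k - 1] * (a + k - 1) / (a + b + k - 1)
    return m


def bayes_dp(a, b, n, p0):
    """Reference check: Bayes-optimal value by DP over Beta posterior counts."""
    if n == 0:
        return 0.0
    mean = a / (a + b)
    stay = p0 + bayes_dp(a, b, n - 1, p0)
    pull = mean * (1 + bayes_dp(a + 1, b, n - 1, p0)) + (1 - mean) * bayes_dp(a, b + 1, n - 1, p0)
    return max(stay, pull)


T, p0 = 3, 0.55
S = efficient_vectors(T, p0)
value = max(float(w @ beta_moments(1.0, 1.0, T)) for w in S)
print(len(S), value, bayes_dp(1.0, 1.0, T, p0))
```

In this sketch a prior enters only through its moment vector E[theta**k], so the same set of weight vectors serves every prior: the expected value of a policy is the inner product of its weights with the moments, and the optimal value is the maximum over the set. Removing a pointwise-dominated vector is safe because the recursion only combines vectors with the nonnegative weights theta and 1 - theta. The brute-force posterior DP is included only to cross-check the result on small horizons.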