Analytical Solution to a Discrete-Time Model for Dynamic Learning and Decision Making

Publication Type:
Article
Author(s):
Zhang, Hao
Affiliation:
University of British Columbia
Journal:
MANAGEMENT SCIENCE
ISSN/ISBN:
0025-1909
DOI:
10.1287/mnsc.2021.4194
Publication Date:
2022
Pages:
5924-5957
Keywords:
learning and doing; sequential hypothesis testing; dynamic pricing with demand learning; multiarmed bandits; partially observable Markov decision processes
Abstract:
Problems concerning dynamic learning and decision making are difficult to solve analytically. We study an infinite-horizon discrete-time model with a constant unknown state that may take two possible values. As a special partially observable Markov decision process (POMDP), this model unifies several types of learning-and-doing problems, such as sequential hypothesis testing, dynamic pricing with demand learning, and multiarmed bandits. We adopt a relatively new solution framework from the POMDP literature based on the backward construction of the efficient frontier(s) of continuation-value vectors. This framework accommodates different optimality criteria simultaneously. In the infinite-horizon setting, with the aid of a set of signal quality indices, the extreme points on the efficient frontier can be linked through a set of difference equations and solved analytically. The solution carries structural properties analogous to those obtained under continuous-time models, and it provides a useful tool for making new discoveries through discrete-time models.
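The abstract refers to the backward construction of the efficient frontier of continuation-value vectors. The sketch below is a generic illustration of that idea for a two-state POMDP with a constant hidden state, not a reproduction of the paper's analytical solution: the discount factor, rewards, and observation likelihoods are assumed for illustration, and the pruning step uses a simple grid-based approximation of the upper envelope rather than the paper's characterization via signal quality indices and difference equations.

```python
import numpy as np
from itertools import product

# Minimal sketch of one backward step of value-vector backup and pruning
# for a two-state POMDP whose hidden state never changes.
# All numbers below are illustrative assumptions, not the paper's data.

GAMMA = 0.9                       # discount factor (assumed)
OBS = [0, 1]                      # binary signal from the "learning" action

# Immediate rewards r[a] = (reward if state=0, reward if state=1), assumed:
R = {0: np.array([1.0, -1.0]),    # action 0: stop/decide
     1: np.array([-0.1, -0.1])}   # action 1: continue learning (sampling cost)

# Observation likelihoods P(o | state) under the learning action
# (its "signal quality"), assumed:
P_OBS = {0: np.array([0.7, 0.3]),   # P(o=0 | state=0), P(o=0 | state=1)
         1: np.array([0.3, 0.7])}   # P(o=1 | state=0), P(o=1 | state=1)


def prune(vectors, grid=1001):
    """Keep only vectors on the efficient frontier: with two states each
    vector is a line in the belief p = P(state=1), so we keep the lines
    attaining the upper envelope on a grid of beliefs."""
    vectors = [np.asarray(v, float) for v in vectors]
    ps = np.linspace(0.0, 1.0, grid)
    vals = np.array([(1 - ps) * v[0] + ps * v[1] for v in vectors])
    keep = sorted(set(vals.argmax(axis=0)))
    return [vectors[i] for i in keep]


def backup(frontier):
    """One backward step: build candidate vectors for each action and each
    assignment of a continuation vector to each observation, then prune."""
    candidates = [R[0]]  # terminal action: immediate reward only (assumed)
    for choice in product(frontier, repeat=len(OBS)):
        v = R[1].copy()
        for o, cont in zip(OBS, choice):
            # Constant state => identity transition, so per state s:
            # v(s) += gamma * P(o | s) * cont(s)
            v = v + GAMMA * P_OBS[o] * cont
        candidates.append(v)
    return prune(candidates)


# Usage: iterate backups until the frontier (approximately) stabilizes.
frontier = [np.zeros(2)]
for _ in range(50):
    frontier = backup(frontier)
print(len(frontier), "extreme vectors on the efficient frontier")
for v in frontier:
    print(np.round(v, 3))
```

The printed vectors are the extreme points of the (approximate) efficient frontier; in the paper's discrete-time setting these extreme points are instead linked analytically through difference equations rather than computed by numerical pruning.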