Operations Research, 2020, Issue 5

Optimal Online Learning for Nonlinear Belief Models Using Discrete Priors

Type:

Article

Authors:

Han, Weidong; Powell, Warren B.

Affiliation:

Princeton University

Journal:

OPERATIONS RESEARCH

ISSN:

0030-364X

DOI:

10.1287/opre.2019.1921

Publication date:

2020

Pages:

1538-1556

Keywords:

knowledge-gradient policy global optimization allocation algorithm

Abstract:

We consider an optimal learning problem where we are trying to learn a function that is nonlinear in unknown parameters in an online setting. We formulate the problem as a dynamic program, provide the optimality condition using Bellman's equation, and propose a multiperiod lookahead policy to overcome the nonconcavity in the value of information. We adopt a sampled belief model, which we refer to as a discrete prior. For an infinite-horizon problem with discounted cumulative rewards, we prove asymptotic convergence properties under the proposed policy, a rare result for online learning. We then demonstrate the approach in three different settings: a health setting where we make medical decisions to maximize healthcare response over time, a dynamic pricing setting where we make pricing decisions to maximize the cumulative revenue, and a clinical pharmacology setting where we make dosage controls to minimize the deviation between actual and target effects.
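To make the "discrete prior" idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a finite set of sampled candidate parameter vectors theta_k with probabilities p_k, Gaussian observation noise with known sigma, and a hypothetical pricing model revenue(p; theta) = p * exp(-theta * p). The belief is updated with Bayes' rule, p_k ∝ p_k · N(y; f(x, theta_k), sigma²). Decisions use a one-step knowledge gradient (KG) estimated by Monte Carlo, combined with the common online-KG score mu_x + gamma/(1 - gamma) · KG_x for discounted rewards. That one-step score is a swapped-in simplification. The paper instead proposes a multiperiod lookahead policy precisely because a one-step value of information can be nonconcave and underestimate the value of repeated measurements. All function names below are hypothetical.

```python
import math
import random


def normal_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) at y (assumed Gaussian noise model)."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def update_discrete_prior(probs, thetas, f, x, y, sigma):
    """Bayes' rule over a finite set of candidate parameter vectors theta_k."""
    unnorm = [p * normal_pdf(y, f(x, th), sigma) for p, th in zip(probs, thetas)]
    total = sum(unnorm)
    if total == 0.0:  # every likelihood underflowed: keep the previous belief
        return list(probs)
    return [u / total for u in unnorm]


def posterior_mean(probs, thetas, f, x):
    """Expected response at decision x under the current discrete belief."""
    return sum(p * f(x, th) for p, th in zip(probs, thetas))


def kg_value(probs, thetas, f, x, xs, sigma, n_samples=200, seed=0):
    """Monte Carlo estimate of the one-step knowledge gradient of measuring x."""
    rng = random.Random(seed)  # same seed for every x: common random numbers
    best_now = max(posterior_mean(probs, thetas, f, xp) for xp in xs)
    total = 0.0
    for _ in range(n_samples):
        # Draw y from the predictive mixture: theta_k ~ probs, then add noise.
        th = rng.choices(thetas, weights=probs)[0]
        y = f(x, th) + rng.gauss(0.0, sigma)
        new = update_discrete_prior(probs, thetas, f, x, y, sigma)
        total += max(posterior_mean(new, thetas, f, xp) for xp in xs)
    return total / n_samples - best_now


def online_kg_choice(probs, thetas, f, xs, sigma, gamma, n_samples=200):
    """One-step online KG score for discounted rewards: mu_x + gamma/(1-gamma)*KG_x."""
    def score(x):
        kg = kg_value(probs, thetas, f, x, xs, sigma, n_samples)
        return posterior_mean(probs, thetas, f, x) + gamma / (1.0 - gamma) * kg
    return max(xs, key=score)


# Hypothetical example: revenue(p; theta) = p * exp(-theta * p).
def revenue(p, th):
    return p * math.exp(-th[0] * p)


thetas = [(0.1,), (0.2,), (0.5,)]  # sampled candidate parameters (discrete prior)
prices = list(range(1, 11))        # finite set of candidate prices

if __name__ == "__main__":
    probs = [1 / 3, 1 / 3, 1 / 3]  # uniform prior over the candidates
    truth, sigma = thetas[1], 0.5  # simulated "true" parameter and noise level
    sim = random.Random(42)
    for n in range(15):
        p = online_kg_choice(probs, thetas, revenue, prices, sigma, gamma=0.9)
        y = revenue(p, truth) + sim.gauss(0.0, sigma)
        probs = update_discrete_prior(probs, thetas, revenue, p, y, sigma)
    print("posterior over candidates:", [round(q, 3) for q in probs])
```

Because the belief lives on a fixed finite set, each update is a reweighting of K numbers rather than a nonlinear regression. This is what makes the sampled belief model attractive when f is nonlinear in theta.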
