Weak Signal Asymptotics for Sequentially Randomized Experiments
Publication type:
Article
Authors:
Xu, Kuang; Wager, Stefan
Affiliation:
Stanford University
Journal:
MANAGEMENT SCIENCE
ISSN/ISBN:
0025-1909
DOI:
10.1287/mnsc.2023.4964
Publication year:
2024
Pages:
7024-7041
Keywords:
diffusion approximation
multiarmed bandit
Thompson sampling
Abstract:
We use the lens of weak signal asymptotics to study a class of sequentially randomized experiments, including those that arise in solving multiarmed bandit problems. In an experiment with n time steps, we let the mean reward gaps between actions scale as 1/√n to preserve the difficulty of the learning task as n grows. In this regime, we show that the sample paths of a class of sequentially randomized experiments, adapted to this scaling regime and with arm selection probabilities that vary continuously with the state, converge weakly to a diffusion limit, given as the solution to a stochastic differential equation. The diffusion limit enables us to derive refined, instance-specific characterizations of the stochastic dynamics and to obtain several insights into the regret and belief evolution of a number of sequential experiments, including Thompson sampling (but not the upper confidence bound algorithm, which does not satisfy our continuity assumption). We show that all sequential experiments whose randomization probabilities have a Lipschitz-continuous dependence on the observed data suffer suboptimal regret performance when the reward gaps are relatively large. Conversely, we find that a version of Thompson sampling with an asymptotically uninformative prior variance achieves near-optimal instance-specific regret scaling, including with large reward gaps, but these good regret properties come at the cost of highly unstable posterior beliefs.
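The weak-signal regime described in the abstract can be illustrated with a small simulation: a two-armed Gaussian Thompson sampling experiment whose reward gap is set to Δ/√n so that the learning problem stays comparably hard as n grows. This is a hedged sketch of the setup only, not the paper's analysis; the function name, parameters (delta, sigma, prior_var), and conjugate-Gaussian posterior updates are our own illustrative choices.

```python
import numpy as np

def thompson_two_arm(n, delta=1.0, sigma=1.0, prior_var=1.0, seed=0):
    """Simulate two-armed Gaussian Thompson sampling under the
    weak-signal scaling, where the mean reward gap is delta / sqrt(n).
    Returns (cumulative regret, per-arm pull counts).
    Illustrative sketch; not the paper's exact construction."""
    rng = np.random.default_rng(seed)
    # Arm means: the gap between arms shrinks as 1/sqrt(n).
    mu = np.array([delta / np.sqrt(n), 0.0])
    counts = np.zeros(2)   # number of pulls per arm
    sums = np.zeros(2)     # sum of observed rewards per arm
    regret = 0.0
    for _ in range(n):
        # Conjugate Gaussian posterior per arm: N(0, prior_var) prior,
        # known observation noise sigma^2.
        post_prec = 1.0 / prior_var + counts / sigma**2
        post_mean = (sums / sigma**2) / post_prec
        post_sd = np.sqrt(1.0 / post_prec)
        # Thompson step: sample from each posterior, play the argmax.
        # Note the selection probability varies continuously with the
        # posterior state, matching the paper's continuity assumption.
        arm = int(np.argmax(rng.normal(post_mean, post_sd)))
        reward = mu[arm] + sigma * rng.normal()
        counts[arm] += 1
        sums[arm] += reward
        regret += mu.max() - mu[arm]
    return regret, counts
```

Under this scaling, cumulative regret over n steps stays of constant order as n grows (each suboptimal pull costs O(1/√n), and O(√n)-many such pulls are typical), which is what makes the diffusion limit informative.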