
2-ARMED DIRICHLET BANDITS WITH DISCOUNTING

Publication type:

Article

Author:

CHATTOPADHYAY, MK

Journal:

ANNALS OF STATISTICS

ISSN/ISBN:

0090-5364

DOI:

10.1214/aos/1176325626

Publication date:

1994

Pages:

1212-1221

Keywords:

Abstract:

Sequential selections are to be made from two independent stochastic processes, or "arms." At each stage we choose which arm to observe based on past selections and observations. The observations on arm $i$ are conditionally i.i.d. given their marginal distribution $P_i$, which has a Dirichlet process prior with parameter $\alpha_i$, $i = 1, 2$. Future observations are discounted: at stage $m$, the payoff is $a_m$ times the observation $Z_m$ at that stage. The discount sequence $A_n = (a_1, a_2, \ldots, a_n, 0, 0, \ldots)$ is a nonincreasing sequence of nonnegative numbers, where the "horizon" $n$ is finite. The objective is to maximize the total expected payoff $E\left(\sum_{i=1}^{n} a_i Z_i\right)$. It is shown that optimal strategies continue with an arm when it yields a sufficiently large observation, one larger than a "break-even observation." This generalizes results of Clayton and Berry, who considered two arms with one arm known and assumed $a_m = 1$ for all $m \le n$.
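
As a rough illustration of the quantities named in the abstract, and not of the paper's optimal strategy, the Python sketch below evaluates the realized discounted payoff for a finite-horizon discount sequence and the posterior mean of an arm's next observation under a Dirichlet process prior. The function names are hypothetical, and summarizing the Dirichlet parameter by its total mass and prior mean is an assumption made for this illustration. The posterior-mean formula is the standard Dirichlet process update.

```python
from typing import Sequence


def discounted_payoff(discounts: Sequence[float], observations: Sequence[float]) -> float:
    """Realized payoff sum_i a_i * Z_i over the finite horizon n = len(discounts)."""
    if len(discounts) != len(observations):
        raise ValueError("need exactly one observation per stage")
    if any(a < 0 for a in discounts):
        raise ValueError("discounts must be nonnegative")
    # The abstract requires a nonincreasing discount sequence.
    if any(later > earlier for earlier, later in zip(discounts, discounts[1:])):
        raise ValueError("discount sequence must be nonincreasing")
    return sum(a * z for a, z in zip(discounts, observations))


def dp_posterior_mean(prior_mass: float, prior_mean: float, observed: Sequence[float]) -> float:
    """Posterior mean of the next observation on one arm.

    Assumption for this sketch: the Dirichlet parameter is summarized by its
    total mass (prior_mass > 0) and the mean of its normalized measure.
    """
    if prior_mass <= 0:
        raise ValueError("prior_mass must be positive")
    return (prior_mass * prior_mean + sum(observed)) / (prior_mass + len(observed))
```

For example, `discounted_payoff([1.0, 0.5, 0.25], [2.0, 4.0, 8.0])` weights each of the three observations by its stage discount, and `dp_posterior_mean(2.0, 1.0, [4.0, 4.0])` shifts the prior mean toward the two observed values in proportion to the sample size.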