Reading policies for joins: An asymptotic analysis

Publication type:
Article
Authors:
Russo, Ralph P.; Shyamalkumar, Nariankadu D.
Affiliation:
University of Iowa
Journal:
ANNALS OF APPLIED PROBABILITY
ISSN:
1050-5164
DOI:
10.1214/105051606000000646
Publication date:
2007
Pages:
230-264
Keywords:
Bandit problem
Abstract:
Suppose that m(n) observations are made from the distribution R and n - m(n) from the distribution S. Associate with each pair, x from R and y from S, a nonnegative score phi(x, y). An optimal reading policy is one that yields a sequence m(n) maximizing E(M(n)), the expected sum of the (n - m(n))m(n) observed scores, uniformly in n. The alternating policy, which switches between the two sources, is the optimal nonadaptive policy. In contrast, the greedy policy, which chooses its source to maximize the expected gain on the next step, is shown to be the optimal policy. Asymptotics are provided for the case where the R and S distributions are discrete and phi(x, y) = 1 or 0 according as x = y or not (i.e., the observations match). Specifically, an invariance result is proved which guarantees that, for a wide class of policies including the alternating and the greedy, the variable M(n) obeys the same CLT and LIL. A more delicate analysis of the sequence E(M(n)) and of the sample paths of M(n), for both the alternating and the greedy policies, reveals the slender sense in which the latter is asymptotically superior to the former, as well as a sense in which the two policies are equivalent and the former is robust.
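To make the two policies concrete, the following is a minimal simulation sketch of the matching case (phi(x, y) = 1 if x = y, else 0). It is not code from the paper; the function and parameter names are illustrative, and the distributions p and q are assumed known to the greedy policy, which reads from the source whose next draw has the larger expected number of matches against the opposite pile.

```python
import random
from collections import Counter

def simulate(policy, p, q, n, seed=0):
    """Read n observations from two discrete sources R and S under a policy.

    p, q: dicts mapping value -> probability for R and S (assumed known).
    Returns M(n), the total number of matching (x, y) cross-pairs, where a
    pair scores phi(x, y) = 1 if x == y and 0 otherwise.
    """
    rng = random.Random(seed)
    vals_p, wts_p = zip(*p.items())
    vals_q, wts_q = zip(*q.items())
    r_pile, s_pile = Counter(), Counter()  # counts of values read so far
    matches = 0
    for i in range(n):
        if policy == "alternating":
            read_R = (i % 2 == 0)  # switch sources on every step
        else:  # greedy: maximize the expected match gain of the next read
            gain_R = sum(cnt * p.get(v, 0.0) for v, cnt in s_pile.items())
            gain_S = sum(cnt * q.get(v, 0.0) for v, cnt in r_pile.items())
            read_R = gain_R >= gain_S
        if read_R:
            x = rng.choices(vals_p, wts_p)[0]
            matches += s_pile[x]  # new x pairs with all equal S-values
            r_pile[x] += 1
        else:
            y = rng.choices(vals_q, wts_q)[0]
            matches += r_pile[y]  # new y pairs with all equal R-values
            s_pile[y] += 1
    return matches
```

For degenerate distributions concentrated on one common value, every cross-pair matches, so both policies give M(n) = (n - m(n))m(n) exactly; differences between the policies only emerge for nondegenerate distributions, and then, per the abstract's invariance result, only in a slender asymptotic sense.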