您的位置: 首页 > 全球经管学术 > 顶刊追踪 > 顶尖期刊 > 管理科学与工程 > Mathematics of Operations Research > 2017 > 2期

Long-Term Values in Markov Decision Processes and Repeated Games, and a New Distance for Probability Spaces

成果类型：

Article

署名作者：

Renault, Jerome; Venel, Xavier

署名单位：

Universite de Toulouse; Universite Toulouse 1 Capitole; Toulouse School of Economics; Paris School of Economics

刊物名称：

MATHEMATICS OF OPERATIONS RESEARCH

ISSN/ISSBN：

0364-765X

DOI：

10.1287/moor.2016.0814

发表日期：

2017

页码：

349-376

关键词：

Incomplete information controller EXISTENCE THEOREM chains PROOF

摘要：

We study long-term Markov Decision Processes and Gambling Houses, with applications to any partial observation MDPs with finitely many states and zero-sum repeated games with an informed controller. We consider a decision-maker which is maximizing the weighted sum Sigma(t >= 1) theta(t)r(t), where rt is the expected reward of the t-th stage. We prove the existence of a very strong notion of long-term value called general uniform value, representing the fact that the decision-maker can play well independently of the evaluations (theta(t))(t >= 1) over stages, provided the total variation (or impatience) Sigma(t >= 1) |theta(t+1) - theta(t)| is small enough. This result generalizes previous results of Rosenberg, Solan and Vieille [35] and Renault [31] that focus on arithmetic means and discounted evaluations. Moreover, we give a variational characterization of the general uniform value via the introduction of appropriate invariant measures for the decision problems, generalizing the fundamental theorem of gambling or the Aumann-Maschler cavu formula for repeated games with incomplete information. Apart the introduction of appropriate invariant measures, the main innovation in our proofs is the introduction of a new metric, d(star,) such that partial observation MDP's and repeated games with an informed controller may be associated to auxiliary problems that are non-expansive with respect to d(star). Given two Borel probabilities over a compact subset X of a normed vector space, we define d(star)(u, v) = sup (f is an element of D1) |u(f) - v(f)|, where D-1 is the set of functions satisfying: for all x, y is an element of X, for all a, b >= 0, af (x) - bf (y) <= parallel to ax - by parallel to. The particular case where X is a simplex endowed with the L1- norm is particularly interesting: d(star) is the largest distance over the probabilities with finite support over X which makes every disintegration non-expansive. Moreover, we obtain a Kantorovich-Rubinstein type duality formula for d(star)(u, v) involving couples of measures (alpha, beta) over X x X such that the first marginal of alpha is u and the second marginal of beta is v.

来源URL：

访问原文