Two Time-Scale Stochastic Approximation with Controlled Markov Noise and Off-Policy Temporal-Difference Learning

Type:

Article

Authors:

Karmakar, Prasenjit; Bhatnagar, Shalabh

Affiliation:

Indian Institute of Science (IISC) - Bangalore

Journal:

MATHEMATICS OF OPERATIONS RESEARCH

ISSN/ISBN:

0364-765X

DOI:

10.1287/moor.2017.0855

Publication date:

2018

Pages:

130-151

Keywords:

Abstract:

We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by controlled Markov noise. In particular, the faster and slower recursions have nonadditive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both time scales that are defined in terms of the ergodic occupation measures associated with the controlled Markov processes. Finally, we present a solution to the off-policy convergence problem for temporal-difference learning with linear function approximation, using our results.
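
For orientation, coupled recursions of the kind the abstract describes are commonly written in the generic form below. This is an illustrative sketch rather than a quotation from the paper: the symbols x_n, y_n, h, g, Z_n^{(i)}, M_n^{(i)}, a(n), and b(n) are assumed notation, and the step-size conditions shown are the standard ones for two time-scale schemes, not necessarily the exact assumptions the authors use.

```latex
% Generic two time-scale stochastic approximation (assumed notation).
% x_n: faster iterate, y_n: slower iterate.
% Z^{(i)}_n: controlled Markov noise, entering nonadditively through h and g.
% M^{(i)}_{n+1}: martingale difference noise.
\begin{align*}
  x_{n+1} &= x_n + a(n)\,\bigl[\, h\bigl(x_n, y_n, Z^{(1)}_n\bigr) + M^{(1)}_{n+1} \,\bigr], \\
  y_{n+1} &= y_n + b(n)\,\bigl[\, g\bigl(x_n, y_n, Z^{(2)}_n\bigr) + M^{(2)}_{n+1} \,\bigr],
\end{align*}
% Standard step-size conditions; the last one separates the two time scales.
\[
  \sum_n a(n) = \sum_n b(n) = \infty, \qquad
  \sum_n \bigl( a(n)^2 + b(n)^2 \bigr) < \infty, \qquad
  \frac{b(n)}{a(n)} \to 0 .
\]
```

Because b(n)/a(n) tends to 0, the iterate y_n appears nearly constant from the viewpoint of x_n, while x_n appears nearly equilibrated from the viewpoint of y_n. This separation is what allows each recursion to be related to its own limiting dynamics, which the abstract identifies as differential inclusions defined through ergodic occupation measures.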