Q-learning for risk-sensitive control
Type:
Article
Author(s):
Borkar, VS
Affiliation:
Tata Institute of Fundamental Research (TIFR)
Journal:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN:
0364-765X
DOI:
10.1287/moor.27.2.294.324
Publication date:
2002
Pages:
294-311
Keywords:
stochastic approximation
discrete-time Markov processes
convergence
algorithms
Abstract:
We propose, for risk-sensitive control of finite Markov chains, a counterpart of the popular Q-learning algorithm for classical Markov decision processes. The algorithm is shown to converge with probability one to the desired solution. The proof technique is an adaptation of the o.d.e. approach for the analysis of stochastic approximation algorithms, with most of the work devoted to the analysis of the specific o.d.e.s that arise.
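To give a concrete feel for the kind of recursion the abstract describes, below is a minimal Python sketch of risk-sensitive Q-learning with an exponential (multiplicative) cost criterion and normalization at a fixed reference state-action pair. This is an illustrative assumption-laden sketch, not the paper's exact recursion: the problem data (`cost`, `P`), the reference pair `(i0, a0)`, and the step-size schedule are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small finite MDP (illustrative data, not from the paper).
n_states, n_actions = 4, 2
cost = rng.uniform(0.0, 1.0, size=(n_states, n_actions))          # c(i, a)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # p(j | i, a)

# Multiplicative Q-values, initialized to 1. (i0, a0) is a fixed
# reference pair used to normalize the iterates (a sketch assumption).
Q = np.ones((n_states, n_actions))
i0, a0 = 0, 0

state = 0
for n in range(1, 200_000):
    action = rng.integers(n_actions)                  # off-policy exploration
    next_state = rng.choice(n_states, p=P[state, action])

    step = 1.0 / (1.0 + n // 100)                     # tapering step size
    # Stochastic-approximation update toward the multiplicative
    # dynamic-programming target e^{c(i,a)} * min_b Q(j,b) / Q(i0,a0).
    target = np.exp(cost[state, action]) * Q[next_state].min() / Q[i0, a0]
    Q[state, action] += step * (target - Q[state, action])

    state = next_state

# Greedy (cost-minimizing) policy read off the learned Q-values.
print("greedy policy:", Q.argmin(axis=1))
```

Dividing by the value at the reference pair keeps the iterates bounded and makes the fixed point correspond to the multiplicative Poisson equation of the exponential-cost criterion; convergence arguments for recursions of this type are the subject of the o.d.e. analysis mentioned in the abstract.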