Approximate Constrained Discounted Dynamic Programming With Uniform Feasibility and Optimality

Publication type:
Article
Author:
Chang, Hyeong Soo
Affiliation:
Sogang University
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISBN:
0018-9286
DOI:
10.1109/TAC.2024.3523847
Publication date:
2025
Pages:
4031-4036
Keywords:
Costs; Scheduling; Programming; Probability distribution; Dynamic programming; Approximation algorithms; Trajectory; Throughput; Scheduling algorithms; Reviews; Dynamic programming (DP); Markov decision process; optimality equation; uniform-optimality
Abstract:
An important question about the finite constrained Markov decision process (CMDP) problem is whether there exists a condition under which a uniformly optimal and uniformly feasible policy exists in the set of deterministic, history-independent, stationary policies, i.e., a policy that achieves the optimal value at all initial states, and whether a CMDP problem satisfying the condition can be solved by dynamic programming (DP). This matters because the crux of the unconstrained MDP theory developed by Bellman lies in the answer to the same existence question for MDPs. Although CMDPs have been studied over the years, no directly responsive work has appeared since this open question was raised about three decades ago in the literature. We establish, as a partial answer to this question, that any finite CMDP problem M(c) inherently contains a DP structure in a subordinate CMDP problem M̃(c) induced from the parameters of M(c), and that M̃(c) is DP-solvable. We derive a policy-iteration-type algorithm for solving M̃(c), providing an approximate solution to M(c), or to M(c) with a fixed initial state.
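The paper's algorithm for the subordinate problem M̃(c) is not reproduced in this record; as background for the "policy-iteration-type" machinery the abstract refers to, the following is a minimal sketch of standard policy iteration for an unconstrained finite discounted MDP with costs. The array layout (`P[a, s, s']`, `c[a, s]`) and the function name are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def policy_iteration(P, c, gamma):
    """Standard policy iteration for a finite discounted cost-MDP (sketch).

    P: (A, S, S) array, P[a, s, t] = transition probability s -> t under action a.
    c: (A, S) array of one-step costs; gamma: discount factor in (0, 1).
    Returns a deterministic stationary policy (action per state) and its value.
    """
    A, S, _ = P.shape
    pi = np.zeros(S, dtype=int)              # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = c_pi exactly.
        P_pi = P[pi, np.arange(S)]           # (S, S) rows selected by pi
        c_pi = c[pi, np.arange(S)]           # (S,)
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)
        # Policy improvement: greedy one-step lookahead (minimize cost).
        q = c + gamma * (P @ v)              # (A, S) state-action values
        new_pi = q.argmin(axis=0)
        if np.array_equal(new_pi, pi):       # fixed point: uniformly optimal
            return pi, v
        pi = new_pi
```

The returned policy is deterministic, history-independent, and stationary, and is optimal at every initial state simultaneously; this is exactly the uniform-optimality property whose constrained analogue the paper investigates.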