Optimality of Quasi-Open-Loop Policies for Discounted Semi-Markov Decision Processes
Publication type:
Article
Authors:
Adelman, Daniel; Mancini, Angelo J.
Affiliation:
University of Chicago
Journal:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN/ISBN:
0364-765X
DOI:
10.1287/moor.2015.0775
Publication date:
2016
Pages:
1222-1247
Keywords:
optimization
existence
algorithm
Abstract:
Quasi-open-loop policies consist of sequences of Markovian decision rules that are insensitive to one component of the state space. Given a semi-Markov decision process (SMDP), we distinguish between exogenous and endogenous state components as follows: (i) the decision-maker's actions do not impact the evolution of an exogenous state component, and (ii) between consecutive decision epochs, the exogenous and endogenous state components are conditionally independent given the decision-maker's latest action. For simplicity, we consider an SMDP with one exogenous and one endogenous state component. When transition times between epochs are conditionally independent of the exogenous state given the most recent action, and the exogenous component is a multiplicative compound Poisson process, we provide an almost-everywhere condition on the reward function sufficient for the optimality of a quasi-open-loop policy. After adjusting the discount factor to account for the statistical properties of the exogenous state process, obtaining this policy amounts to solving a reduced SMDP in which the exogenous state is static. Depending on the relationship between the structure of the exogenous state process and the shape of the reward function, we can replace the almost-everywhere condition with one that applies only in expectation. Quasi-open-loop optimality holds even if the times between decision epochs depend on the Poisson process underlying the exogenous state component, and/or the Poisson process is replaced with a generic counting process.
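A minimal sketch of the discount-factor adjustment mentioned in the abstract, under assumptions of our own (a reward that scales linearly in the exogenous component; the paper's actual almost-everywhere condition is more general, and this notation is not necessarily the authors'): let the exogenous state follow the multiplicative compound Poisson process
\[
p_t = p_0 \prod_{i=1}^{N(t)} Y_i,
\]
where \(N(t)\) is a Poisson process with rate \(\lambda\) and the multiplicative jumps \(Y_i\) are i.i.d. and independent of \(N\). The Poisson probability generating function gives
\[
\mathbb{E}[p_t] = p_0\, \mathbb{E}\big[(\mathbb{E}[Y])^{N(t)}\big] = p_0\, e^{\lambda t (\mathbb{E}[Y]-1)},
\]
so for a reward linear in \(p\), continuous-time discounting at rate \(\alpha\) satisfies \(\mathbb{E}[e^{-\alpha t} p_t] = p_0\, e^{-\tilde{\alpha} t}\) with the adjusted rate \(\tilde{\alpha} = \alpha - \lambda(\mathbb{E}[Y]-1)\). Under these assumptions, one could then solve a reduced SMDP in which the exogenous state is frozen at \(p_0\) and the discount rate is \(\tilde{\alpha}\), consistent with the reduction the abstract describes.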