Structural Estimation of Partially Observable Markov Decision Processes
Document type:
Article
Authors:
Chang, Yanling; Garcia, Alfredo; Wang, Zhide; Sun, Lu
Affiliations:
Texas A&M University System; Texas A&M University College Station; Texas A&M University System; Texas A&M University College Station
Journal:
IEEE TRANSACTIONS ON AUTOMATIC CONTROL
ISSN/ISBN:
0018-9286
DOI:
10.1109/TAC.2022.3217908
Publication date:
2023
Pages:
5135-5141
Keywords:
Dynamic Programming
Maximum likelihood estimation
observability
Abstract:
Partially observable Markov decision processes (POMDPs) are a well-developed framework for sequential decision-making under uncertainty and partial information. This article considers the (inverse) structural estimation of the primitives of a POMDP based on data in the form of sequences of observables and implemented actions. We analyze the structural properties of an entropy-regularized POMDP and specify conditions under which the model is identifiable without knowledge of the state dynamics. We consider a soft policy gradient algorithm to compute a maximum likelihood estimator, and illustrate the approach with an equipment replacement problem.
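Illustrative sketch (not taken from the article): the Python fragment below shows, under simplifying assumptions, how the likelihood of an observed action sequence might be evaluated when the agent follows an entropy-regularized (softmax) policy over beliefs. The function names, the linear-in-belief action values `theta`, and the known transition and observation matrices `T` and `O` are all hypothetical choices made for the example; the article itself studies identification without knowledge of the state dynamics and computes the estimator with a soft policy gradient algorithm, which is not reproduced here.

```python
import math


def belief_update(belief, T_a, O_a, obs):
    """Bayes filter step: b'(s') is proportional to
    O_a[s'][obs] * sum_s T_a[s][s'] * b(s)."""
    n = len(belief)
    unnorm = [
        O_a[s2][obs] * sum(T_a[s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    z = sum(unnorm)
    if z == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / z for u in unnorm]


def soft_policy(q_values, temperature=1.0):
    """Softmax over action values, the usual form of an
    entropy-regularized policy. Shifted by the max for stability."""
    m = max(q_values)
    weights = [math.exp((q - m) / temperature) for q in q_values]
    z = sum(weights)
    return [w / z for w in weights]


def log_likelihood(theta, b0, actions, observations, T, O, temperature=1.0):
    """Log-likelihood of the observed actions.

    Assumption (hypothetical): action values are linear in the belief,
    q(b, a) = sum_s b[s] * theta[a][s]. T[a] and O[a] are assumed known.
    """
    belief = list(b0)
    ll = 0.0
    for a, o in zip(actions, observations):
        q = [
            sum(belief[s] * theta[act][s] for s in range(len(belief)))
            for act in range(len(theta))
        ]
        ll += math.log(soft_policy(q, temperature)[a])
        # Propagate the belief using the action taken and the observation seen.
        belief = belief_update(belief, T[a], O[a], o)
    return ll
```

A maximum likelihood estimate would then be obtained by maximizing `log_likelihood` over `theta` (for instance by gradient ascent); in an equipment replacement setting the two hidden states could stand for "good" and "worn" and the two actions for "keep" and "replace", though that mapping is also only an assumption of this sketch.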