Off-line Estimation of Controlled Markov Chains: Minimaxity and Sample Complexity

Document Type:
Article
Authors:
Banerjee, Imon; Honnappa, Harsha; Rao, Vinayak
Affiliations:
Northwestern University; Purdue University System; Purdue University; Purdue University System; Purdue University
Journal:
OPERATIONS RESEARCH
ISSN:
0030-364X
DOI:
10.1287/opre.2023.0046
Publication Date:
2025
Pages:
2281-2295
Keywords:
location parameters; decision-processes; density
Abstract:
In this work, we study a natural nonparametric estimator of the transition probability matrices of a finite controlled Markov chain. We consider an off-line setting with a fixed data set of size m, collected using a so-called logging policy. We develop sample complexity bounds for the estimator and establish conditions for minimaxity. Our statistical bounds depend on the logging policy through its mixing properties. We show that achieving a particular statistical risk bound involves a subtle and interesting trade-off between the strength of the mixing properties and the number of samples. We demonstrate the validity of our results under various examples, such as ergodic Markov chains; weakly ergodic inhomogeneous Markov chains; and controlled Markov chains with nonstationary Markov, episodic, and greedy controls. Lastly, we use these sample complexity bounds to establish concomitant ones for off-line evaluation of stationary Markov control policies.
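The abstract does not spell out the estimator. A common "natural" nonparametric choice, and the one assumed in this sketch, is the empirical transition frequency P_hat(s' | s, a) = N(s, a, s') / N(s, a), computed from a single logged trajectory X_0, A_0, X_1, ..., X_m. The code below is an illustration under that assumption, not the authors' implementation. The function names `simulate_logged_chain` and `estimate_transitions`, the 3-state/2-action transition tensor, the uniform-random logging policy, and the uniform fallback for unvisited (s, a) pairs are all hypothetical choices made for the example.

```python
import numpy as np


def simulate_logged_chain(P, logging_policy, m, x0, rng):
    """Roll out m steps of a controlled Markov chain under a (hypothetical) logging policy.

    P[s, a, s'] is the true transition probability; logging_policy[s, a] is the
    probability of taking action a in state s. Returns states X_0..X_m and
    actions A_0..A_{m-1}.
    """
    n_states, n_actions, _ = P.shape
    states, actions = [x0], []
    for _ in range(m):
        s = states[-1]
        a = int(rng.choice(n_actions, p=logging_policy[s]))
        s_next = int(rng.choice(n_states, p=P[s, a]))
        actions.append(a)
        states.append(s_next)
    return states, actions


def estimate_transitions(states, actions, n_states, n_actions):
    """Empirical-frequency estimate P_hat[s, a, s'] = N(s, a, s') / N(s, a).

    Rows for never-visited (s, a) pairs are set to uniform; this fallback is an
    arbitrary convention for the sketch, not something taken from the paper.
    """
    counts = np.zeros((n_states, n_actions, n_states))
    for t, a in enumerate(actions):
        counts[states[t], a, states[t + 1]] += 1
    visits = counts.sum(axis=2, keepdims=True)  # N(s, a), shape (S, A, 1)
    p_hat = np.where(visits > 0, counts / np.maximum(visits, 1), 1.0 / n_states)
    return p_hat, visits[..., 0]


# Example: 3 states, 2 actions, uniform-random logging policy (all hypothetical).
P = np.array([
    [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]],
    [[0.3, 0.3, 0.4], [0.5, 0.25, 0.25]],
    [[0.2, 0.5, 0.3], [0.4, 0.4, 0.2]],
])
logging_policy = np.full((3, 2), 0.5)
rng = np.random.default_rng(0)
states, actions = simulate_logged_chain(P, logging_policy, m=20_000, x0=0, rng=rng)
p_hat, visits = estimate_transitions(states, actions, n_states=3, n_actions=2)
print("visits N(s, a):\n", visits)
print("max |P_hat - P|:", np.abs(p_hat - P).max())
```

The `visits` array makes the role of the logging policy concrete: each row of P_hat is estimated only from the N(s, a) samples in which the logging policy actually took action a in state s, so a policy that rarely reaches some state-action pair (or a chain that mixes slowly) leaves that row poorly estimated even when m is large. This is in line with the abstract's point that the bounds depend on the logging policy through its mixing properties.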