TOWARD THEORETICAL UNDERSTANDINGS OF ROBUST MARKOV DECISION PROCESSES: SAMPLE COMPLEXITY AND ASYMPTOTICS

Publication Type:
Article
Authors:
Yang, Wenhao; Zhang, Liangyu; Zhang, Zhihua
Affiliations:
Peking University
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/22-AOS2225
Publication Date:
2022
Pages:
3223-3248
Keywords:
Policy evaluation; optimization; bounds
Abstract:
In this paper, we study the nonasymptotic and asymptotic performance of the optimal robust policy and value function of robust Markov Decision Processes (MDPs), where the optimal robust policy and value function are estimated from a generative model. While prior work on the nonasymptotic performance of robust MDPs is restricted to the setting of the KL uncertainty set and the $(s,a)$-rectangular assumption, we improve those results and also consider other uncertainty sets, including the $L_1$ and $\chi^2$ balls. Our results show that under the $(s,a)$-rectangular assumption on the uncertainty sets, the sample complexity is about $\tilde{O}\bigl(|S|^2|A|/(\epsilon^2\rho^2(1-\gamma)^4)\bigr)$. In addition, we extend our results from the $(s,a)$-rectangular assumption to the $s$-rectangular assumption. In this scenario, the sample complexity varies with the choice of uncertainty set and is generally larger than in the $(s,a)$-rectangular case. Moreover, we show that the optimal robust value function is asymptotically normal with the typical $\sqrt{n}$ rate under both the $(s,a)$-rectangular and $s$-rectangular assumptions, from both theoretical and empirical perspectives.
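To make the setting concrete, below is a minimal sketch (not the authors' code) of robust value iteration under an $(s,a)$-rectangular KL uncertainty set, with the nominal transition kernel estimated from samples drawn from a generative model. The worst-case expectation is approximated through the standard scalar dual of the KL-constrained problem; the function names, the grid search over the dual variable, and the toy MDP sizes are illustrative assumptions, not details taken from the paper.

```python
# Sketch: robust value iteration with an (s, a)-rectangular KL uncertainty set,
# nominal model P_hat estimated from a generative model. Illustrative only.
import numpy as np

def kl_worst_case(p_hat, v, rho, lams=np.logspace(-3, 3, 200)):
    """Approximate inf_P E_P[v] over {P : KL(P || p_hat) <= rho} via its scalar
    dual sup_{lam>0} { -lam*rho - lam*log E_{p_hat}[exp(-v/lam)] } (grid search
    over lam gives a conservative, i.e. lower, approximation)."""
    support = p_hat > 0                      # KL ball keeps P within the support of p_hat
    p, u = p_hat[support], v[support]
    u_min = u.min()                          # shift by u_min for a stable log-sum-exp
    vals = [u_min - lam * np.log(np.dot(p, np.exp(-(u - u_min) / lam))) - lam * rho
            for lam in lams]
    return max(max(vals), u_min)             # lam -> 0 limit recovers the support minimum

def robust_value_iteration(P_hat, R, gamma, rho, iters=200):
    """Robust Bellman iteration: V(s) = max_a R[s, a] + gamma * inf_P E_P[V]."""
    S, A, _ = P_hat.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = np.array([[R[s, a] + gamma * kl_worst_case(P_hat[s, a], V, rho)
                       for a in range(A)] for s in range(S)])
        V = Q.max(axis=1)
    return V, Q.argmax(axis=1)

# Toy example: estimate P_hat from n samples per (s, a) drawn from a generative model.
rng = np.random.default_rng(0)
S, A, n, gamma, rho = 5, 3, 2000, 0.9, 0.1
P_true = rng.dirichlet(np.ones(S), size=(S, A))
R = rng.uniform(size=(S, A))
counts = np.array([[rng.multinomial(n, P_true[s, a]) for a in range(A)] for s in range(S)])
P_hat = counts / n
V_robust, pi_robust = robust_value_iteration(P_hat, R, gamma, rho)
print(V_robust, pi_robust)
```

Under the paper's $(s,a)$-rectangular framework, the $L_1$ and $\chi^2$ uncertainty sets would replace `kl_worst_case` with the corresponding worst-case expectation, while the outer robust Bellman iteration stays the same.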