Adaptive Discretization in Online Reinforcement Learning

Publication Type:
Article
Authors:
Sinclair, Sean R.; Banerjee, Siddhartha; Yu, Christina Lee
Affiliation:
Cornell University
Journal:
OPERATIONS RESEARCH
ISSN/ISBN:
0030-364X
DOI:
10.1287/opre.2022.2396
Publication Date:
2023
Pages:
1636-1652
Keywords:
bandits; bounds
Abstract:
Discretization-based approaches to solving online reinforcement learning problems have been studied extensively in applications such as resource allocation and cache management. The two major questions in designing discretization-based algorithms are how to create the discretization and when to refine it. There are several experimental results investigating heuristic approaches to these questions but little theoretical treatment. In this paper, we provide a unified theoretical analysis of model-free and model-based, tree-based adaptive hierarchical partitioning methods for online reinforcement learning. We show how our algorithms take advantage of inherent problem structure by providing guarantees that scale with the zooming dimension instead of the ambient dimension, an instance-dependent quantity measuring the benignness of the optimal Q*_h function. Many applications in computing systems and operations research require algorithms that compete on three facets: low sample complexity, mild storage requirements, and low computational burden for policy evaluation and training. Our algorithms are easily adapted to operating constraints, and our theory provides explicit bounds across each of these three facets.
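To make the "how to create / when to refine" idea concrete, below is a minimal sketch of a tree-based adaptive hierarchical partition of a two-dimensional state-action space. The splitting rule (refine a region once its visit count reaches a threshold tied to its diameter), the constant SPLIT_CONST, and all class and function names are illustrative assumptions, not the paper's exact algorithms or their guarantees.

```python
import itertools
import random

SPLIT_CONST = 4.0  # assumed splitting constant; the paper's exact rule may differ


class Node:
    """A hyperrectangular region of the (state, action) space."""

    def __init__(self, lo, hi, q_init=1.0):
        self.lo, self.hi = lo, hi  # opposite corners of the region
        self.children = []         # empty while this node is a leaf
        self.visits = 0
        self.q = q_init            # optimistic Q-value estimate for the region

    def diameter(self):
        return max(h - l for l, h in zip(self.lo, self.hi))

    def contains(self, x):
        return all(l <= xi <= h for xi, l, h in zip(x, self.lo, self.hi))

    def select_leaf(self, x):
        """Descend the tree to the leaf region covering point x."""
        node = self
        while node.children:
            node = next(c for c in node.children if c.contains(x))
        return node

    def maybe_split(self):
        """Refine once visits ~ 1/diameter^2: the partition gets finer where data accumulate."""
        if self.children or self.visits < SPLIT_CONST / self.diameter() ** 2:
            return
        halves = [((l, (l + h) / 2), ((l + h) / 2, h)) for l, h in zip(self.lo, self.hi)]
        for corners in itertools.product(*halves):  # 2^d children
            lo = [c[0] for c in corners]
            hi = [c[1] for c in corners]
            # children inherit the parent's estimate, so no information is discarded
            self.children.append(Node(lo, hi, q_init=self.q))


def count_leaves(node):
    return 1 if not node.children else sum(count_leaves(c) for c in node.children)


if __name__ == "__main__":
    root = Node([0.0, 0.0], [1.0, 1.0])
    for _ in range(5000):
        x = [random.random() ** 3, random.random()]  # samples concentrate near s = 0
        leaf = root.select_leaf(x)
        leaf.visits += 1
        leaf.maybe_split()
    print("leaves:", count_leaves(root))  # finer partition where visits cluster
```

Informally, this kind of scheme illustrates why storage can stay mild: the number of active regions grows with where the trajectory actually concentrates (and hence, in the paper's analysis, with the zooming dimension) rather than with a uniform grid over the ambient space.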