Learning to Schedule in Multiclass Many-Server Queues with Abandonment

成果类型:
Article; Early Access
署名作者:
Zhong, Yueyang; Birge, John R.; Ward, Amy R.
署名单位:
University of Chicago
刊物名称:
OPERATIONS RESEARCH
ISSN/ISSBN:
0030-364X
DOI:
10.1287/opre.2022.0197
发表日期:
2024
关键词:
algorithms policies single SYSTEM RULE
摘要:
The multiclass many-server queue with abandonment (specifically, the multiclass GI/GI/N+GI queue) is a canonical model for service systems. One key operational question is how to schedule; that is, how to choose the customer that a newly available server will serve. The scheduling question is of fundamental importance because scheduling determines which customer classes have more abandonments. However, even though there is much work on scheduling in queueing systems, there is comparatively less work on scheduling in queueing systems when the model primitives (that is, distributional and parameter information regarding the interarrival, service, and patience times) are unknown and may be learned. Our objective is to determine a scheduling policy that minimizes regret, which is the difference in expected abandonment cost between a proposed policy, which does not have knowledge of model primitives, and a benchmark policy, which has full knowledge of model primitives. The difficulty is that the state space is very complex, because in order for the system to be Markovian, we must track (i) the time elapsed since the last arrival for each class, (ii) the amount of time each customer in service has been in service, and (iii) the amount of time each customer in queue has spent waiting. We propose a policy that uses a Learn-then-Schedule (LTS) algorithm that we develop. We analyze the performance of the LTS policy against the benchmark ay-rule (that prioritizes classes for service in accordance with their cost of abandonment times service rate). The algorithm is composed of a learning phase, during which empirical estimates of the service and abandonment rates and their associated abandonment costs are formed, and an exploitation phase, during which an empirical ay-rule based on those estimates is applied. We show that the LTS policy has regret of order logT (where T is the system run-time), which is the smallest order achievable.