Thompson Sampling for the Multinomial Logit Bandit
Publication type:
Article; Early Access
Authors:
Agrawal, Shipra; Avadhanula, Vashist; Goyal, Vineet; Zeevi, Assaf
Affiliations:
Columbia University; Columbia University
Journal:
MATHEMATICS OF OPERATIONS RESEARCH
ISSN/ISBN:
0364-765X
DOI:
10.1287/moor.2020.0096
Publication date:
2025
Keywords:
Optimization
Abstract:
We consider a dynamic combinatorial optimization problem where at each time step, the decision maker selects a subset of cardinality K from N possible items and observes feedback in the form of the index of one of the items in that subset, or none. Each of the N items is ascribed a certain value (reward), which is collected if the item is chosen. This problem is motivated by that of assortment selection in online retail, where items are products. Akin to that literature, it is assumed that the choice of the item given the subset is governed by a multinomial logit (MNL) choice model whose parameters are a priori unknown. The objective of the decision maker is to maximize the expected cumulative rewards over a finite horizon T or, alternatively, to minimize the regret relative to an oracle that knows the MNL choice model parameters. We formulate this problem as a multiarmed bandit problem that we refer to as the MNL-bandit problem. We present a Thompson sampling-based algorithm for this problem and show that it achieves near-optimal regret as well as attractive empirical performance.
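As a concrete illustration of the setup in the abstract, the sketch below computes MNL choice probabilities (including the no-choice outcome) and the oracle's expected revenue for an assortment. The parameter names `v` (item attraction weights) and `r` (per-item rewards), and the brute-force search over assortments, are illustrative assumptions not taken from the paper; this is a minimal sketch of the standard MNL objective, not the authors' algorithm.

```python
from itertools import combinations

def mnl_choice_probs(subset, v):
    """MNL choice probabilities with an outside (no-purchase) option:
    P(choose i | S) = v[i] / (1 + sum_{j in S} v[j]).
    The remaining probability mass corresponds to choosing nothing."""
    denom = 1.0 + sum(v[i] for i in subset)
    return {i: v[i] / denom for i in subset}

def expected_revenue(subset, v, r):
    """Expected one-step reward of offering `subset` under MNL parameters v."""
    probs = mnl_choice_probs(subset, v)
    return sum(probs[i] * r[i] for i in subset)

def best_assortment(v, r, K):
    """Oracle benchmark: exhaustively search subsets of size <= K.
    Brute force is for illustration only (feasible when N is small)."""
    N = len(v)
    best, best_rev = (), 0.0
    for k in range(1, K + 1):
        for S in combinations(range(N), k):
            rev = expected_revenue(S, v, r)
            if rev > best_rev:
                best, best_rev = S, rev
    return best, best_rev
```

In a Thompson sampling scheme for this problem, the unknown weights `v` would be replaced at each round by a sample from a posterior maintained over them, and the sampled weights fed to the assortment optimization above.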
Source URL: