Authors: El Housni, Omar; Topaloglu, Huseyin
Abstract: We consider a joint assortment optimization and customization problem under a mixture of multinomial logit models. In this problem, a firm faces customers of different types, each making a choice within an offered assortment according to the multinomial logit model with different parameters. The problem takes place in two stages. In the first stage, the firm picks an assortment of products to carry, subject to a cardinality constraint. In the second stage, a customer of a certain type arriv...
Authors: Gu, Haotian; Guo, Xin; Wei, Xiaoli; Xu, Renyuan
Affiliations: University of California System; University of California Berkeley; University of California System; University of California Berkeley; Tsinghua Shenzhen International Graduate School; University of Southern California
Abstract: The dynamic programming principle (DPP) is fundamental for control and optimization, including Markov decision problems (MDPs), reinforcement learning (RL), and, more recently, mean-field controls (MFCs). However, in the learning framework of MFCs, the DPP has not been rigorously established, despite its critical importance for algorithm designs. In this paper, we first present a simple example in MFCs with learning where the DPP fails with a misspecified Q function and then propose the correc...