Newsvendor Problems With Product Unbundling: An Approach Combining Robust Optimization With Deep Reinforcement Learning
Document type:
Article; Early Access
Authors:
Yan, Xiaoli; Chen, Youhua (Frank); Yu, Hui; Li, Jiawen
Affiliations:
Chongqing University; City University of Hong Kong
Journal:
PRODUCTION AND OPERATIONS MANAGEMENT
ISSN/ISBN:
1059-1478
DOI:
10.1177/10591478251344225
Publication date:
2025
Keywords:
Product Unbundling
Newsvendor Problem
Distributionally Robust Optimization
Deep Reinforcement Learning
Robust Learning
Abstract:
In fashion, food processing, petrochemical production, and agriculture, products (items) are often bundled in a predetermined assortment, with a given ratio for each product. For example, one case of men's shoes may contain 24 pairs of the same design in different sizes: one pair of size 7, four pairs of size 9, and so on. The pairs are packaged individually for retail. Retailers of such products order them in bundles and then resell them unbundled. In this study, we propose and analyze a newsvendor model in which a retailer decides the order quantity of the whole bundle before the uncertain demand for each product/item is realized. We call it the product unbundling newsvendor problem (PUNP): how should the retailer choose the order quantity of a product bundle to meet the unknown demands for individual items so as to maximize its expected profit? We approach this problem with robust optimization, assuming that the means and covariance matrix of the stochastic demands are known but the demand distributions are not. However, a robust approach that considers the worst-case demand scenario is perceived to be conservative. We therefore combine distributionally robust optimization with deep reinforcement learning (DRL) and propose a new paradigm, robust learning, to improve the quality of robust decisions. We treat the robust solution, that is, the order quantity and profit, as human domain knowledge and incorporate it into the decision-making process of DRL through a policy transfer mechanism. Unsurprisingly, the exact robust solution is computationally intractable; thus, we provide an approximate solution. Simulations based on limited data sizes confirm that our approach effectively improves robust performance. Moreover, the hybrid approach significantly outperforms the DRL approach.
Meanwhile, the reduced computing costs and increased interpretability of the decision recommendations may facilitate the deployment of DRL algorithms in operational practice. Furthermore, the successful application of the hybrid approach to several variants of the PUNP indicates that the proposed mechanism may provide a pathway for solving complex operational problems.
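
To make the PUNP objective described in the abstract concrete, the following is a minimal sketch of the bundle-ordering trade-off. It is not the authors' robust or DRL method: it simply averages profit over a handful of demand scenarios and picks the best bundle count by enumeration. The function names, the per-item prices, the single bundle cost, and the assumption that unsold items have no salvage value are all illustrative assumptions that do not come from the record.

```python
def bundle_profit(q, demand, ratio, price, bundle_cost):
    """Profit from ordering q bundles under one demand scenario.

    Assumptions (hypothetical, for illustration only): each bundle holds
    ratio[i] units of item i, unsold units are worthless, and unmet
    demand incurs no penalty.
    """
    # Sales of each item are capped by both its demand and its stock.
    revenue = sum(p * min(d, r * q) for d, r, p in zip(demand, ratio, price))
    return revenue - bundle_cost * q


def best_order_quantity(scenarios, ratio, price, bundle_cost, q_max):
    """Enumerate q = 0..q_max and return (q, average profit) for the best q."""
    best_q, best_avg = 0, float("-inf")
    for q in range(q_max + 1):
        total = sum(bundle_profit(q, d, ratio, price, bundle_cost) for d in scenarios)
        avg = total / len(scenarios)
        if avg > best_avg:
            best_q, best_avg = q, avg
    return best_q, best_avg


# Two items bundled in a 1:4 ratio, two equally likely demand scenarios.
scenarios = [[2, 8], [1, 4]]
print(best_order_quantity(scenarios, ratio=[1, 4], price=[10, 10],
                          bundle_cost=20, q_max=5))  # (2, 35.0)
```

Because every item in the bundle scales with the same q, ordering more to cover demand for one size leaves surplus stock of the others, which is what distinguishes this setting from solving a separate newsvendor problem per item.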