Learning and decision-making with data: optimal formulations and phase transitions
Document type:
Article; Early Access
Authors:
Bennouna, Amine; Van Parys, Bart P. G.
Affiliations:
Northwestern University; Massachusetts Institute of Technology (MIT); Centrum Wiskunde & Informatica (CWI)
Journal:
MATHEMATICAL PROGRAMMING
ISSN/ISBN:
0025-5610
DOI:
10.1007/s10107-025-02259-4
Publication date:
2025
Keywords:
ridge-regression
robust optimization
inequalities
rates
Abstract:
We study the problem of designing optimal learning and decision-making formulations when only historical data is available. Prior work typically commits to a particular class of data-driven formulation and subsequently tries to establish out-of-sample performance guarantees. Following Van Parys et al. (From data to decisions: Distributionally robust optimization is optimal. Management Science, 2020), we take the opposite approach. We first define a sensible yardstick with which to measure the quality of any data-driven formulation and then seek an optimal such formulation. Informally, any data-driven formulation can be seen as balancing the proximity of the estimated cost to the actual cost against a guaranteed level of out-of-sample performance. Given an acceptable level of out-of-sample performance, we explicitly construct a data-driven formulation that is uniformly closer to the true cost than any other formulation enjoying the same out-of-sample performance. We show the existence of three distinct out-of-sample performance regimes: a superexponential regime, an exponential regime, and a subexponential regime. The optimal data-driven formulation can be interpreted as a classically robust formulation in the superexponential regime, an entropic distributionally robust formulation in the exponential regime, and a variance-penalized formulation in the subexponential regime. This final observation unveils a surprising connection, hidden until now, between three data-driven formulations that at first glance seem unrelated.
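To make the three formulation families named in the abstract concrete, the following is a minimal sketch of how each might estimate the cost of one fixed decision from sampled losses. It is not the authors' construction: the abstract gives no formulas, so the function names, the direction of the relative-entropy ball (here the common KL(Q || empirical) <= radius, evaluated through its standard dual), the grid search over the dual multiplier, and the `weight` parameter are all assumptions made for illustration.

```python
import math
import statistics


def robust_cost(support_losses):
    """Classically robust estimate: worst-case loss over a known support.

    Assumption: the caller supplies the loss at every point of the support.
    """
    return max(support_losses)


def entropic_dro_cost(sample_losses, radius, grid_size=2000):
    """Entropic (KL) distributionally robust estimate, as a sketch.

    Evaluates the standard dual
        min_{lam > 0}  lam * radius + lam * log( mean( exp(loss / lam) ) )
    of  sup { E_Q[loss] : KL(Q || empirical) <= radius }
    by a crude grid search over lam (an illustrative choice, not a solver).
    """
    n = len(sample_losses)
    top = max(sample_losses)
    best = math.inf
    for k in range(grid_size):
        # Log-spaced grid for the dual multiplier lam in [1e-3, 1e3].
        lam = 10.0 ** (-3.0 + 6.0 * k / (grid_size - 1))
        # Numerically stable log-mean-exp: factor out the largest loss.
        shifted = sum(math.exp((loss - top) / lam) for loss in sample_losses)
        log_mean_exp = top / lam + math.log(shifted / n)
        best = min(best, lam * radius + lam * log_mean_exp)
    return best


def variance_penalized_cost(sample_losses, weight):
    """Variance-penalized estimate: empirical mean plus weight * std. dev.

    Assumption: `weight` is a hypothetical tuning parameter.
    """
    mean = statistics.fmean(sample_losses)
    return mean + weight * statistics.pstdev(sample_losses)
```

For small radii the entropic estimate is well approximated by the empirical mean plus sqrt(2 * radius) times the standard deviation, which gives some intuition for why a variance-penalized form could appear as a limiting case; the precise statement and regime boundaries are in the paper itself.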