Fair Exploration via Axiomatic Bargaining

Publication type:
Article
Authors:
Baek, Jackie; Farias, Vivek F.
Affiliations:
New York University; Massachusetts Institute of Technology (MIT)
Journal:
MANAGEMENT SCIENCE
ISSN:
0025-1909
DOI:
10.1287/mnsc.2022.01985
Publication date:
2024
Keywords:
bandits; fairness; exploration; Nash bargaining solution
Abstract:
Exploration is often necessary in online learning to maximize long-term rewards, but it comes at the cost of short-term regret. We study how this cost of exploration is shared across multiple groups. For example, in a clinical trial setting, patients who are assigned a suboptimal treatment effectively incur the cost of exploration. When patients are associated with natural groups on the basis of, say, race or age, it is natural to ask whether the cost of exploration borne by any single group is fair. So motivated, we introduce the grouped bandit model. We leverage the theory of axiomatic bargaining, and the Nash bargaining solution in particular, to formalize what might constitute a fair division of the cost of exploration across groups. On one hand, we show that any regret-optimal policy strikingly results in the least fair outcome: such policies will perversely leverage the most disadvantaged groups when they can. More constructively, we derive policies that are optimally fair and simultaneously enjoy a small price of fairness. We illustrate the relative merits of our algorithmic framework with a case study on contextual bandits for warfarin dosing, where we are concerned with the cost of exploration across multiple races and age groups.
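As a rough illustration of the Nash bargaining criterion named above (not the authors' policy), the sketch below picks, from a finite list of candidate outcomes, the one that maximizes the Nash product of per-group gains over a disagreement point. The function name `nash_bargaining_choice`, the candidate outcomes, and the reading of each "utility" as a group's benefit relative to a disagreement point are all hypothetical assumptions made for this example.

```python
import math
from typing import Sequence


def nash_bargaining_choice(options: Sequence[Sequence[float]],
                           disagreement: Sequence[float]) -> int:
    """Hypothetical helper: return the index of the option that maximizes the
    Nash product prod_i (u_i - d_i), computed as a sum of logs.

    Options that do not strictly improve every group over the disagreement
    point are skipped; -1 is returned if no option qualifies.
    """
    best_idx, best_score = -1, -math.inf
    for idx, utilities in enumerate(options):
        # Per-group gain over the (assumed) disagreement point.
        gains = [u - d for u, d in zip(utilities, disagreement)]
        if any(g <= 0 for g in gains):
            continue
        score = sum(math.log(g) for g in gains)
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx


# Toy data (invented): two groups, three candidate outcomes.
candidates = [(9.5, 1.0), (5.0, 5.0), (8.0, 1.5)]
print(nash_bargaining_choice(candidates, (0.0, 0.0)))  # prints 1
```

In this toy input, maximizing the total utility would select the lopsided outcome `(9.5, 1.0)`, whereas the Nash product (9.5 versus 25 versus 12) selects the balanced `(5.0, 5.0)`. This is the kind of tension between aggregate efficiency and fairness across groups that the abstract describes.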