Online Learning with Sample Selection Bias
Publication type:
Article
Authors:
Singhvi, Divya; Singhvi, Somya
Affiliations:
New York University; University of Southern California
Journal:
OPERATIONS RESEARCH
ISSN/ISBN:
0030-364X
DOI:
10.1287/opre.2023.0223
Publication date:
2025
Keywords:
bandits
donations
personalization
sample-selection bias
crowd-funding platforms
operations for social good
Abstract:
We consider the problem of personalized recommendations on online platforms, where user preferences are unknown, and users interact with the platform through a series of sequential decisions (such as clicking to watch on video platforms or clicking to donate on donation platforms). The platform aims to maximize the final outcome (e.g., viewing duration on video platforms or donations on donation platforms). However, the platform only observes the final outcome for users who complete the first stage (clicking on the recommendation). The final outcome for users who do not complete the first stage (not clicking on the recommendation) remains unobserved (also referred to as funneling). This censoring of outcomes creates a selection bias issue, as the observed outcomes at different stages are often correlated. We demonstrate that failing to account for this selection bias results in biased estimates and suboptimal recommendations. In fact, personalized learning algorithms that perform well in standard settings incur linear regret in this setting. Therefore, we propose the sample selection bandit (SSB) algorithm, which combines Heckman's two-step estimator with the optimism under uncertainty principle to address the sample selection bias issue. We show that the SSB algorithm achieves a rate-optimal regret rate (up to logarithmic terms) of O(√T). Furthermore, we conduct extensive numerical experiments on both synthetic data and real donation data collected from GoFundMe (a crowdfunding platform), demonstrating significant improvements over benchmark state-of-the-art learning algorithms in this setting.
Source URL:
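The abstract states that the SSB algorithm combines Heckman's two-step estimator with the optimism-under-uncertainty principle. As background, below is a minimal Python sketch of the Heckman correction in isolation, not the SSB algorithm itself; the synthetic data-generating process, all variable names, and the use of statsmodels/scipy are illustrative assumptions rather than the paper's model or code.

```python
# Minimal sketch of Heckman's two-step correction on synthetic data.
# Assumptions (not from the paper): a probit click (selection) stage and a
# linear outcome stage with correlated errors, which induces selection bias.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(0)
n, d = 5000, 2

X = rng.normal(size=(n, d))  # covariates (e.g., user/recommendation features)
# Correlated stage-1/stage-2 errors are the source of the selection bias.
errors = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
gamma = np.array([1.0, -0.5])   # selection-equation coefficients (assumed)
beta = np.array([0.8, 1.2])     # outcome-equation coefficients (assumed)
click = X @ gamma + errors[:, 0] > 0   # stage 1: click decision
y = X @ beta + errors[:, 1]            # stage 2: outcome, observed only if click

# Step 1: probit model for the click (selection) equation, fit on ALL users.
Xc = sm.add_constant(X)
probit = sm.Probit(click.astype(float), Xc).fit(disp=0)
xb = Xc @ probit.params
imr = norm.pdf(xb) / norm.cdf(xb)  # inverse Mills ratio

# Step 2: OLS for the outcome on the SELECTED (clicked) sample only,
# adding the inverse Mills ratio as a regressor to absorb the bias term.
X2 = np.column_stack([Xc[click], imr[click]])
corrected = sm.OLS(y[click], X2).fit()

# Naive OLS on clicked users that ignores selection, for comparison.
naive = sm.OLS(y[click], Xc[click]).fit()

print("corrected:", corrected.params[1:1 + d])  # close to beta
print("naive:    ", naive.params[1:1 + d])      # biased by funneling
```

On this synthetic data the corrected coefficients recover beta closely, while the naive fit on clicked users alone is biased, mirroring the funneling issue the abstract describes; SSB additionally wraps such an estimator in an optimistic (bandit) exploration scheme, which this sketch does not attempt.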