A Machine Learning Framework for Assessing Experts' Decision Quality

Publication Type:
Article
Authors:
Dong, Wanxue; Saar-Tsechansky, Maytal; Geva, Tomer
Affiliations:
Chinese University of Hong Kong; University of Texas System; University of Texas Austin; Tel Aviv University
Journal:
MANAGEMENT SCIENCE
ISSN/ISBN:
0025-1909
DOI:
10.1287/mnsc.2021.03357
Publication Year:
2025
Pages:
5696-5721
Keywords:
Machine learning; worker evaluation; decision accuracy; information systems
Abstract:
Expert workers make non-trivial decisions with significant implications. Experts' decision accuracy is thus a fundamental aspect of their judgment quality, key to both management and consumers of experts' services. Yet, in many important settings, transparency in experts' decision quality is rarely possible because ground truth data for evaluating the experts' decisions is costly and available only for a limited set of decisions. Furthermore, different experts typically handle exclusive sets of decisions, and thus prior solutions that rely on aggregating multiple experts' decisions on the same instance are inapplicable. We first formulate the problem of estimating experts' decision accuracy in this setting and then develop a machine-learning-based framework to address it. Our method effectively leverages both abundant historical data on workers' past decisions and scarce decision instances with ground truth labels. Using both semi-synthetic data based on publicly available data sets and purposefully compiled data sets on real workers' decisions, we conduct extensive empirical evaluations of our method's performance relative to alternatives. The results show that our approach is superior to existing alternatives across diverse settings, including settings that involve different data domains, experts' qualities, and amounts of ground truth data. To our knowledge, this paper is the first to posit and address the problem of estimating experts' decision accuracies from historical data with scarce ground truth, and it is the first to offer comprehensive results for this problem setting, establishing the performances that can be achieved across settings as well as the state-of-the-art performance on which future work can build.
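To make the problem setting concrete, the sketch below illustrates one naive baseline for it: train a proxy classifier on the scarce ground-truth-labeled instances, then estimate each expert's accuracy by how often their historical decisions agree with the proxy's predictions. This is a minimal illustration under assumed synthetic data, not the authors' framework; all names and parameters here (e.g., the accuracy levels, the use of `LogisticRegression`) are hypothetical.

```python
# Illustrative sketch of the problem setting only -- NOT the paper's method.
# Synthetic data: each expert handles an exclusive set of instances, and
# ground-truth labels exist for only a small audited subset.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

n_experts, n_per_expert, n_features = 5, 200, 10
X_hist = rng.normal(size=(n_experts * n_per_expert, n_features))
true_y = (X_hist[:, 0] + 0.5 * X_hist[:, 1] > 0).astype(int)  # mostly unobserved
expert_id = np.repeat(np.arange(n_experts), n_per_expert)

# Each expert has an unknown accuracy; recorded decisions flip the true
# label with probability 1 - accuracy (hypothetical values).
true_acc = np.array([0.95, 0.90, 0.80, 0.70, 0.60])
flip = rng.random(len(true_y)) > true_acc[expert_id]
expert_decisions = np.where(flip, 1 - true_y, true_y)

# Scarce ground truth: labels available only for a small random subset.
labeled_idx = rng.choice(len(true_y), size=100, replace=False)

# Proxy model trained on the scarce labeled instances.
proxy = LogisticRegression().fit(X_hist[labeled_idx], true_y[labeled_idx])

# Naive estimate: agreement rate between each expert's historical
# decisions and the proxy model's predicted labels.
proxy_pred = proxy.predict(X_hist)
for e in range(n_experts):
    mask = expert_id == e
    est = (expert_decisions[mask] == proxy_pred[mask]).mean()
    print(f"expert {e}: true accuracy {true_acc[e]:.2f}, estimated {est:.2f}")
```

Note that this agreement-based baseline is biased by the proxy model's own errors, which is precisely why the scarce-ground-truth setting described in the abstract is non-trivial.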