Quality Control for Crowd Workers and for Language Models: A Framework for Free-Text Response Evaluation with No Ground Truth

Publication type:
Article; Early Access
Authors:
Yahav, Inbal; Goldstein, Anat; Geva, Tomer; Meir, Shahar; Shehory, Onn
Affiliations:
Tel Aviv University; Ariel University; Bar Ilan University
Journal:
INFORMATION SYSTEMS RESEARCH
ISSN/ISBN:
1047-7047
DOI:
10.1287/isre.2023.0426
Publication date:
2025
Keywords:
Abstract:
In recent years, the field of natural language processing has made remarkable progress with the emergence of large language models (LLMs). In particular, the ability of LLMs to provide fact-based, free-text responses to user queries has the potential to revolutionize domains such as online search and the use of informative chatbots. However, extensive validation is required before the response accuracy of question-answering LLMs can be confidently trusted. This paper introduces a framework to address this challenge: automated quality evaluation based on textual responses (AQER). The AQER framework focuses on two primary tasks: evaluating the quality of individual workers from their free-text responses when no ground-truth data are available, and assessing the quality of LLM responses given a set of worker-generated responses. AQER is intuitive, easy to implement, and flexible enough to accommodate different components. To evaluate AQER's effectiveness, we conducted empirical evaluations using semi-synthetic and real-world question-and-answer data sets, as well as stress testing through numerical simulations. We also provide analytical motivation and show method convergence and boundary conditions using the probably approximately correct (PAC) learning framework. The results demonstrate AQER's robustness in evaluating LLMs and workers, and its superiority over baseline approaches. These findings establish AQER as a benchmark for future research in this field.
Source URL:
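
Illustrative note: the abstract does not spell out AQER's mechanics, so the Python sketch below is purely illustrative and should not be read as the authors' method. It shows one common way to frame the two tasks the abstract names: scoring each worker by agreement with peers on shared questions (requiring no ground truth), then scoring an LLM by its quality-weighted agreement with those workers. All names (text_similarity, score_workers, score_model) and the lexical similarity measure are hypothetical choices for this sketch; a real system would likely substitute semantic embeddings for the crude string matcher used here.

# Hypothetical sketch of ground-truth-free quality scoring, NOT the AQER
# algorithm from the paper. Standard library only, so it runs as-is.
from difflib import SequenceMatcher
from itertools import combinations


def text_similarity(a: str, b: str) -> float:
    # Crude lexical similarity in [0, 1]; a real system would use
    # semantic embeddings instead of character-level matching.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score_workers(responses: dict[str, dict[str, str]]) -> dict[str, float]:
    # Score each worker by average agreement with peers on shared questions.
    # `responses` maps worker id -> {question id -> free-text answer}.
    totals = {w: [0.0, 0] for w in responses}  # worker -> [similarity sum, count]
    for w1, w2 in combinations(responses, 2):
        for q in responses[w1].keys() & responses[w2].keys():
            sim = text_similarity(responses[w1][q], responses[w2][q])
            for w in (w1, w2):
                totals[w][0] += sim
                totals[w][1] += 1
    return {w: (s / n if n else 0.0) for w, (s, n) in totals.items()}


def score_model(model_answers: dict[str, str],
                responses: dict[str, dict[str, str]],
                worker_scores: dict[str, float]) -> float:
    # Score an LLM by its agreement with workers, weighting each worker's
    # vote by that worker's estimated quality.
    num = den = 0.0
    for w, answers in responses.items():
        for q, ans in answers.items():
            if q in model_answers:
                num += worker_scores[w] * text_similarity(model_answers[q], ans)
                den += worker_scores[w]
    return num / den if den else 0.0


if __name__ == "__main__":
    workers = {
        "w1": {"q1": "Paris is the capital of France.", "q2": "Water boils at 100C."},
        "w2": {"q1": "The capital of France is Paris.", "q2": "Water boils at 100 degrees C."},
        "w3": {"q1": "London.", "q2": "It boils at 50C."},  # low-quality worker
    }
    llm = {"q1": "Paris is the capital of France.", "q2": "Water boils at 100C at sea level."}
    scores = score_workers(workers)
    print("worker scores:", scores)
    print("LLM score:", round(score_model(llm, workers, scores), 3))

The design choice worth noting is that peer agreement serves as a proxy for quality when no ground truth exists: workers who systematically agree with others are assumed more reliable, and their answers then anchor the LLM's evaluation, mirroring the two-task structure the abstract describes.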