Human-AI collectives most accurately diagnose clinical vignettes

Document type:

Article

Authors:

Zoeller, Nikolas; Berger, Julian; Lin, Irving; Fu, Nathan; Komarneni, Jayanth; Barabucci, Gioele; Laskowski, Kyle; Shia, Victor; Harack, Benjamin; Chu, Eugene A.; Trianni, Vito; Kurvers, Ralf H. J. M.; Herzog, Stefan M.

Affiliations:

Max Planck Society; University of Cologne; Claremont Colleges; Harvey Mudd College; University of Oxford; Kaiser Permanente; Consiglio Nazionale delle Ricerche (CNR); Istituto di Scienze e Tecnologie della Cognizione (ISTC-CNR); Technical University of Berlin

Journal:

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA

ISSN/ISBN:

0027-8424

DOI:

10.1073/pnas.2426153122

Publication date:

2025-06-13

Keywords:

hospitalized patients; intelligence; performance; models; system

Abstract:

AI systems, particularly large language models (LLMs), are increasingly being employed in high-stakes decisions that impact both individuals and society at large, often without adequate safeguards to ensure safety, quality, and equity. Yet LLMs hallucinate, lack common sense, and are biased; these shortcomings may reflect LLMs' inherent limitations and thus may not be remedied by more sophisticated architectures, more data, or more human feedback. Relying solely on LLMs for complex, high-stakes decisions is therefore problematic. Here, we present a hybrid collective intelligence system that mitigates these risks by leveraging the complementary strengths of human experience and the vast information processed by LLMs. We apply our method to open-ended medical diagnostics, combining 40,762 differential diagnoses made by physicians with the diagnoses of five state-of-the-art LLMs across 2,133 text-based medical case vignettes. We show that hybrid collectives of physicians and LLMs outperform both single physicians and physician collectives, as well as single LLMs and LLM ensembles. This result holds across a range of medical specialties and professional experience and can be attributed to humans' and LLMs' complementary contributions that lead to different kinds of errors. Our approach highlights the potential for collective human and machine intelligence to improve accuracy in complex, open-ended domains like medical diagnostics.
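
The abstract does not say how the physicians' and LLMs' differential diagnoses are combined, so the following is only a minimal sketch of one plausible pooling rule, not the authors' method. It assumes each contributor submits a ranked list of diagnoses and scores every diagnosis by the sum of its reciprocal ranks across contributors. The function name `aggregate_differentials`, the `top_k` parameter, and the example diagnoses are all hypothetical.

```python
from collections import defaultdict


def aggregate_differentials(differentials, top_k=5):
    """Pool ranked diagnosis lists into one collective differential.

    ASSUMPTION: reciprocal-rank scoring is an illustrative choice;
    the abstract does not specify the aggregation rule.
    """
    scores = defaultdict(float)
    for ranked in differentials:
        seen = set()
        for rank, diagnosis in enumerate(ranked, start=1):
            key = diagnosis.strip().lower()  # crude normalization of free text
            if key in seen:
                continue  # count each diagnosis once per contributor
            seen.add(key)
            scores[key] += 1.0 / rank
    # Highest score first; ties broken alphabetically for determinism.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [diagnosis for diagnosis, _ in ordered[:top_k]]


# Hypothetical case: two physicians and one LLM each submit a ranked list.
physicians = [
    ["pneumonia", "bronchitis"],
    ["pulmonary embolism", "pneumonia"],
]
llms = [
    ["Pneumonia", "pulmonary embolism", "heart failure"],
]
collective = aggregate_differentials(physicians + llms, top_k=3)
print(collective)
```

Because humans and LLMs are pooled in the same list, a diagnosis that only one kind of contributor ranks highly can still surface in the collective answer, which is the intuition behind the complementary-errors claim in the abstract.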