Towards accurate differential diagnosis with large language models
Document type:
Article
Authors:
McDuff, Daniel; Schaekermann, Mike; Tu, Tao; Palepu, Anil; Wang, Amy; Garrison, Jake; Singhal, Karan; Sharma, Yash; Azizi, Shekoofeh; Kulkarni, Kavita; Hou, Le; Cheng, Yong; Liu, Yun; Mahdavi, S. Sara; Prakash, Sushant; Pathak, Anupam; Semturs, Christopher; Patel, Shwetak; Webster, Dale R.; Dominowska, Ewa; Gottweis, Juraj; Barral, Joelle; Chou, Katherine; Corrado, Greg S.; Matias, Yossi; Sunshine, Jake; Karthikesalingam, Alan; Natarajan, Vivek
Affiliations:
Alphabet Inc.; Google Incorporated; DeepMind
Journal:
Nature
ISSN/ISBN:
0028-0836
DOI:
10.1038/s41586-025-08869-4
Publication date:
2025-06-12
Keywords:
artificial-intelligence
HEALTH
Abstract:
A comprehensive differential diagnosis is a cornerstone of medical care that is often reached through an iterative process of interpretation that combines clinical history, physical examination, investigations and procedures. Interactive interfaces powered by large language models present new opportunities to assist and automate aspects of this process¹. Here we introduce the Articulate Medical Intelligence Explorer (AMIE), a large language model that is optimized for diagnostic reasoning, and evaluate its ability to generate a differential diagnosis alone or as an aid to clinicians. Twenty clinicians evaluated 302 challenging, real-world medical cases sourced from published case reports. Each case report was read by two clinicians, who were randomized to one of two assistive conditions: assistance from search engines and standard medical resources; or assistance from AMIE in addition to these tools. All clinicians provided a baseline, unassisted differential diagnosis prior to using the respective assistive tools. AMIE exhibited standalone performance that exceeded that of unassisted clinicians (top-10 accuracy 59.1% versus 33.6%, P = 0.04). Comparing the two assisted study arms, the differential diagnosis quality score was higher for clinicians assisted by AMIE (top-10 accuracy 51.7%) compared with clinicians without its assistance (36.1%; McNemar's test: 45.7, P < 0.01) and clinicians with search (44.4%; McNemar's test: 4.75, P = 0.03). Further, clinicians assisted by AMIE arrived at more comprehensive differential lists than those without assistance from AMIE. Our study suggests that AMIE has potential to improve clinicians' diagnostic reasoning and accuracy in challenging cases, meriting further real-world evaluation for its ability to empower physicians and widen patients' access to specialist-level expertise.
Source URL: