A multimodal generative AI copilot for human pathology
Publication type:
Article
Authors:
Lu, Ming Y.; Chen, Bowen; Williamson, Drew F. K.; Chen, Richard J.; Zhao, Melissa; Chow, Aaron K.; Ikemura, Kenji; Kim, Ahrong; Pouli, Dimitra; Patel, Ankush; Soliman, Amr; Chen, Chengkuan; Ding, Tong; Wang, Judy J.; Gerber, Georg; Liang, Ivy; Le, Long Phi; Parwani, Anil V.; Weishaupt, Luca L.; Mahmood, Faisal
Affiliations:
Harvard University; Harvard University Medical Affiliates; Brigham & Women's Hospital; Harvard Medical School; Harvard University; Harvard Medical School; Harvard University Medical Affiliates; Massachusetts General Hospital; Harvard University; Massachusetts Institute of Technology (MIT); Broad Institute; Massachusetts Institute of Technology (MIT); University System of Ohio; Ohio State University; Pusan National University; Mayo Clinic; Harvard University; Massachusetts Institute of Technology (MIT); Harvard University; Harvard University
Journal:
Nature
ISSN/ISBN:
0028-0836
DOI:
10.1038/s41586-024-07618-3
Publication date:
2024-10-10
Keywords:
artificial-intelligence
foundation models
Abstract:
Computational pathology [1,2] has witnessed considerable progress in the development of both task-specific predictive models and task-agnostic self-supervised vision encoders [3,4]. However, despite the explosive growth of generative artificial intelligence (AI), there have been few studies on building general-purpose multimodal AI assistants and copilots [5] tailored to pathology. Here we present PathChat, a vision-language generalist AI assistant for human pathology. We built PathChat by adapting a foundational vision encoder for pathology, combining it with a pretrained large language model and fine-tuning the whole system on over 456,000 diverse visual-language instructions consisting of 999,202 question and answer turns. We compare PathChat with several multimodal vision-language AI assistants and GPT-4V, which powers the commercially available multimodal general-purpose AI assistant ChatGPT-4 (ref. 6). PathChat achieved state-of-the-art performance on multiple-choice diagnostic questions from cases with diverse tissue origins and disease models. Furthermore, using open-ended questions and human expert evaluation, we found that overall PathChat produced more accurate and pathologist-preferable responses to diverse queries related to pathology. As an interactive vision-language AI copilot that can flexibly handle both visual and natural language inputs, PathChat may find impactful applications in pathology education, research and human-in-the-loop clinical decision-making. PathChat, a multimodal generative AI copilot for human pathology, has been trained on a large dataset of visual-language instructions to interactively assist users with diverse pathology tasks.