Towards multimodal foundation models in molecular cell biology

Publication type:
Review
Authors:
Cui, Haotian; Tejada-Lapuerta, Alejandro; Brbic, Maria; Saez-Rodriguez, Julio; Cristea, Simona; Goodarzi, Hani; Lotfollahi, Mohammad; Theis, Fabian J.; Wang, Bo
Affiliations:
University of Toronto; Vector Institute for Artificial Intelligence; University of Toronto; University Health Network Toronto; Peter Munk Cardiac Centre; Helmholtz Association; Helmholtz-Center Munich - German Research Center for Environmental Health; Technical University of Munich; Swiss Federal Institutes of Technology Domain; Ecole Polytechnique Federale de Lausanne; Swiss Federal Institutes of Technology Domain; Ecole Polytechnique Federale de Lausanne; Swiss School of Public Health (SSPH+); Swiss Institute of Bioinformatics; Ruprecht Karls University Heidelberg; Harvard University; Harvard University Medical Affiliates; Dana-Farber Cancer Institute; Harvard University; Harvard T.H. Chan School of Public Health; University of California System; University of California San Francisco; Wellcome Trust Sanger Institute; University of Cambridge; Technical University of Munich; University of Toronto
Journal:
Nature
ISSN/ISBN:
0028-0836
DOI:
10.1038/s41586-025-08710-y
Publication date:
2025-04-17
Pages:
623-633
Keywords:
atlas; RNA; chromatin; principles; database
Abstract:
The rapid advent of high-throughput omics technologies has created an exponential growth in biological data, often outpacing our ability to derive molecular insights. Large-language models have shown a way out of this data deluge in natural language processing by integrating massive datasets into a joint model with manifold downstream use cases. Here we envision developing multimodal foundation models, pretrained on diverse omics datasets, including genomics, transcriptomics, epigenomics, proteomics, metabolomics and spatial profiling. These models are expected to exhibit unprecedented potential for characterizing the molecular states of cells across a broad continuum, thereby facilitating the creation of holistic maps of cells, genes and tissues. Context-specific transfer learning of the foundation models can empower diverse applications from novel cell-type recognition, biomarker discovery and gene regulation inference, to in silico perturbations. This new paradigm could launch an era of artificial intelligence-empowered analyses, one that promises to unravel the intricate complexities of molecular cell biology, to support experimental design and, more broadly, to profoundly extend our understanding of life sciences.