Sequence modeling and design from molecular to genome scale with Evo

Document type:
Article
Authors:
Nguyen, Eric; Poli, Michael; Durrant, Matthew G.; Kang, Brian; Katrekar, Dhruva; Li, David B.; Bartie, Liam J.; Thomas, Armin W.; King, Samuel H.; Brixi, Garyk; Sullivan, Jeremy; Ng, Madelena Y.; Lewis, Ashley; Lou, Aaron; Ermon, Stefano; Baccus, Stephen A.; Hernandez-Boussard, Tina; Ré, Christopher; Hsu, Patrick D.; Hie, Brian L.
Affiliations:
Stanford University; Stanford University; Stanford University; Stanford University; Stanford University; Stanford University; University of California System; University of California Berkeley; University of California System; University of California Berkeley; Stanford University
Journal:
SCIENCE
ISSN/ISBN:
0036-8075
DOI:
10.1126/science.ado9336
Publication date:
2024-11-15
Keywords:
protein bacterial identification determinants EVOLUTION expansion database SYSTEM IMPACT
Abstract:
The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an organism's function. We present Evo, a long-context genomic foundation model with a frontier architecture trained on millions of prokaryotic and phage genomes, and report scaling laws on DNA to complement observations in language and vision. Evo generalizes across DNA, RNA, and proteins, enabling zero-shot function prediction competitive with domain-specific language models and the generation of functional CRISPR-Cas and transposon systems, representing the first examples of protein-RNA and protein-DNA codesign with a language model. Evo also learns how small mutations affect whole-organism fitness and generates megabase-scale sequences with plausible genomic architecture. These prediction and generation capabilities span molecular to genomic scales of complexity, advancing our understanding and control of biology.