Evolutionary-scale prediction of atomic-level protein structure with a language model
Publication type:
Article
Authors:
Lin, Zeming; Akin, Halil; Rao, Roshan; Hie, Brian; Zhu, Zhongkai; Lu, Wenting; Smetanin, Nikita; Verkuil, Robert; Kabeli, Ori; Shmueli, Yaniv; Costa, Allan dos Santos; Fazel-Zarandi, Maryam; Sercu, Tom; Candido, Salvatore; Rives, Alexander
Affiliations:
New York University; Stanford University; Massachusetts Institute of Technology (MIT)
Journal:
SCIENCE
ISSN/ISBN:
0036-8075
DOI:
10.1126/science.ade2574
Publication date:
2023-03-17
Pages:
1123-1130
Keywords:
contacts
Abstract:
Recent advances in machine learning have leveraged evolutionary information in multiple sequence alignments to predict protein structure. We demonstrate direct inference of full atomic-level protein structure from primary sequence using a large language model. As language models of protein sequences are scaled up to 15 billion parameters, an atomic-resolution picture of protein structure emerges in the learned representations. This results in an order-of-magnitude acceleration of high-resolution structure prediction, which enables large-scale structural characterization of metagenomic proteins. We apply this capability to construct the ESM Metagenomic Atlas by predicting structures for >617 million metagenomic protein sequences, including >225 million that are predicted with high confidence, which gives a view into the vast breadth and diversity of natural proteins.