Unsupervised evolution of protein and antibody complexes with a structure-informed language model

Publication type:
Article
Authors:
Shanker, Varun R.; Bruun, Theodora U. J.; Hie, Brian L.; Kim, Peter S.
Affiliations:
Stanford University; Chan Zuckerberg Initiative (CZI)
Journal:
SCIENCE
ISSN:
0036-8075
DOI:
10.1126/science.adk8946
Publication date:
2024-07-05
Pages:
46-53
Keywords:
fitness landscapes; sequence; design; selection; recognition; inhibition; generation; reveals; set
Abstract:
Large language models trained on sequence information alone can learn high-level principles of protein design. However, beyond sequence, the three-dimensional structures of proteins determine their specific function, activity, and evolvability. Here, we show that a general protein language model augmented with protein structure backbone coordinates can guide evolution for diverse proteins without the need to model individual functional tasks. We also demonstrate that ESM-IF1, which was only trained on single-chain structures, can be extended to engineer protein complexes. Using this approach, we screened about 30 variants of two therapeutic clinical antibodies used to treat severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. We achieved up to 25-fold improvement in neutralization and 37-fold improvement in affinity against antibody-escaped viral variants of concern BQ.1.1 and XBB.1.5, respectively. These findings highlight the advantage of integrating structural information to identify efficient protein evolution trajectories without requiring any task-specific training data.