Functional effects of mutations in proteins can be predicted and interpreted by guided selection of sequence covariation information
Document type:
Article
Authors:
Cocco, Simona; Posani, Lorenzo; Monasson, Remi
Affiliations:
Universite PSL; Ecole Normale Superieure (ENS); Sorbonne Universite; Universite Paris Cite; Sorbonne Universite; Columbia University
Journal:
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
ISSN/ISBN:
0027-13036
DOI:
10.1073/pnas.2312335121
Publication date:
2024-06-25
Keywords:
coevolutionary landscape
residue contacts
fitness
inference
binding
domain
model
Abstract:
Predicting the effects of one or more mutations on the in vivo or in vitro properties of a wild-type protein is a major computational challenge because of epistasis, that is, interactions between amino acids in the sequence. We introduce a computationally efficient procedure to build minimal epistatic models that predict mutational effects by combining evolutionary (homologous-sequence) data with a small amount of mutational-scan data. Mutagenesis measurements guide the selection of links in a sparse graphical model, while the parameters on the nodes and edges are inferred from sequence data. We show on 10 mutational scans that our pipeline performs comparably to state-of-the-art deep networks trained on much more data, while requiring far fewer parameters and therefore being more interpretable. In particular, the identified interactions adapt to the wild-type protein and to the fitness or biochemical property measured experimentally, focus mostly on key functional sites, and are not necessarily related to structural contacts. Our method is therefore able to extract, from homologous sequence data reflecting the multitude of structural and functional constraints acting on proteins throughout evolution, the information relevant to a single mutational experiment.
Significance:
Mutagenesis scans reveal how protein properties are affected by mutations, and predicting these effects is a challenge for computational biology. State-of-the-art models base their predictions on the statistical patterns shared by homologous sequences, which descend from a common ancestor. However, it is unclear how these patterns, which reflect the multitude of structural and functional constraints acting on proteins throughout evolution, relate to the outcome of a single mutational experiment. Here we propose an integrated approach that selects the network of interactions between protein residues controlling the functionality of interest. Tested on 10 datasets, our approach shows predictive power comparable to deep models while remaining interpretable.
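To make the idea of "mutagenesis measurements guide the selection of links" concrete, here is a minimal toy sketch, not the authors' pipeline. It assumes a Potts-type sparse graphical model whose fields h (nodes) and candidate couplings J (edges) have already been inferred from homologous sequences (that inference step is not shown). The selection rule is an illustrative assumption: greedy forward selection that adds, one at a time, the candidate edge that most increases the Spearman correlation between predicted and measured mutational effects. All function names and the toy numbers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]
Couplings = Dict[Edge, List[List[float]]]  # J[(i, j)][a][b], assumed pre-inferred


def energy(seq: Sequence[int], h: List[List[float]], J: Couplings,
           edges: List[Edge]) -> float:
    """Potts energy using only the selected edges (lower = more favourable)."""
    e = -sum(h[i][a] for i, a in enumerate(seq))
    for i, j in edges:
        e -= J[(i, j)][seq[i]][seq[j]]
    return e


def predicted_effect(wt, mut, h, J, edges) -> float:
    """Positive value: mutant predicted to be fitter than the wild type."""
    return energy(wt, h, J, edges) - energy(mut, h, J, edges)


def _ranks(xs: List[float]) -> List[float]:
    """0-based ranks; tied values receive their average rank."""
    order = sorted(range(len(xs)), key=lambda k: xs[k])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2.0
        i = j + 1
    return r


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rank correlation; returns 0.0 if either input is constant."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def greedy_select_edges(wt, mutants, measured, h, J, max_edges):
    """Hypothetical greedy rule: repeatedly add the candidate edge that most
    increases the Spearman correlation with the measured effects."""
    def score(edges):
        preds = [predicted_effect(wt, m, h, J, edges) for m in mutants]
        return spearman(preds, measured)

    selected: List[Edge] = []
    best = score(selected)  # independent-site model (no edges)
    candidates = set(J)
    while candidates and len(selected) < max_edges:
        s, edge = max((score(selected + [c]), c) for c in candidates)
        if s <= best:  # stop when no remaining edge improves the fit
            break
        selected.append(edge)
        candidates.remove(edge)
        best = s
    return selected, best


# Toy data (invented): 3 sites, 2 states per site, wild type = all state 0.
h = [[0.0, -1.0] for _ in range(3)]          # every single mutation is penalised
J = {(0, 1): [[0.0, 0.0], [0.0, 2.0]],       # compensating pair at sites 0 and 1
     (1, 2): [[0.0, 0.3], [0.0, 0.0]]}
wt = (0, 0, 0)
mutants = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 1)]
measured = [-1.0, -1.2, 0.5, -0.8]           # the double mutant is the fittest

print(greedy_select_edges(wt, mutants, measured, h, J, max_edges=2))
```

On this toy data the independent-site model ranks the double mutant last, the coupling (0, 1) is selected first because it rescues that epistatic double mutant, and (1, 2) is then added, giving edges [(0, 1), (1, 2)] with a Spearman correlation of about 0.949. The point of the sketch is the division of labour described in the abstract: parameter values come from sequence data, while the experimental scan only decides which links are kept.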