Machine learning in biological physics: From biomolecular prediction to design

Document type:
Article
Authors:
Martin, Jonathan; Mateos, Marcos Lequerica; Onuchic, Jose N.; Coluzza, Ivan; Morcos, Faruck
Affiliations:
University of Texas System; University of Texas Dallas; BCMaterials; Rice University; Basque Foundation for Science
Journal:
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
ISSN/ISBN:
0027-8424
DOI:
10.1073/pnas.2311807121
Publication date:
2024-07-02
Keywords:
protein-structure prediction; direct-coupling analysis; associative memory; sequence-space; models; landscapes; dependence; inference; contacts; systems
Abstract:
Machine learning has been proposed as an alternative to theoretical modeling when dealing with complex problems in biological physics. However, in this Perspective, we argue that a more successful approach is a proper combination of these two methodologies. We discuss how ideas from the physical modeling of neuronal processing led to early formulations of computational neural networks, e.g., Hopfield networks. We then show how modern learning approaches such as Potts models, Boltzmann machines, and the transformer architecture are related to each other, specifically through a shared energy representation. We summarize recent efforts to establish these connections and provide examples of how each of these formulations, integrating physical modeling and machine learning, has been successful in tackling recent problems in biomolecular structure, dynamics, function, evolution, and design. Instances include protein structure prediction; improvements in the computational complexity and accuracy of molecular dynamics simulations; better inference of the effects of mutations in proteins, leading to improved evolutionary modeling; and, finally, how machine learning is revolutionizing protein engineering and design. Going beyond naturally existing protein sequences, we discuss a connection to protein design in which synthetic sequences are able to fold to naturally occurring motifs, driven by a model rooted in physical principles. We show that this model is learnable and propose its future use in generating sequences that fold to a target structure.