Functional and evolutionary significance of unknown genes from uncultivated taxa

Publication type:
Article
Authors:
Rodriguez del Rio, Alvaro; Giner-Lamia, Joaquin; Cantalapiedra, Carlos P.; Botas, Jorge; Deng, Ziqi; Hernandez-Plaza, Ana; Munar-Palmer, Marti; Santamaria-Hernando, Saray; Rodriguez-Herva, Jose J.; Ruscheweyh, Hans-Joachim; Paoli, Lucas; Schmidt, Thomas S. B.; Sunagawa, Shinichi; Bork, Peer; Lopez-Solanilla, Emilia; Coelho, Luis Pedro; Huerta-Cepas, Jaime
Affiliations:
Universidad Politecnica de Madrid; Universidad Politecnica de Madrid; Swiss Federal Institutes of Technology Domain; ETH Zurich; Swiss Federal Institutes of Technology Domain; ETH Zurich; Swiss Institute of Bioinformatics; European Molecular Biology Laboratory (EMBL); Helmholtz Association; Max Delbruck Center for Molecular Medicine; University of Wurzburg; Fudan University; Consejo Superior de Investigaciones Cientificas (CSIC); University of Sevilla; CSIC - Instituto de Bioquimica Vegetal y Fotosintesis (IBVF); University of Queensland; Queensland University of Technology (QUT)
Journal:
Nature
ISSN/ISBN:
0028-0836
DOI:
10.1038/s41586-023-06955-z
Publication date:
2024-02-08
Keywords:
protein-structure bacterial database microbiome alignment biology cohort
Abstract:
Many of the Earth's microbes remain uncultured and understudied, limiting our understanding of the functional and evolutionary aspects of their genetic material, which remain largely overlooked in most metagenomic studies [1]. Here we analysed 149,842 environmental genomes from multiple habitats [2-6] and compiled a curated catalogue of 404,085 functionally and evolutionarily significant novel (FESNov) gene families exclusive to uncultivated prokaryotic taxa. All FESNov families span multiple species, exhibit strong signals of purifying selection and qualify as new orthologous groups, thus nearly tripling the number of bacterial and archaeal gene families described to date. The FESNov catalogue is enriched in clade-specific traits, including 1,034 novel families that can distinguish entire uncultivated phyla, classes and orders, probably representing synapomorphies that facilitated their evolutionary divergence. Using genomic context analysis and structural alignments, we predicted functional associations for 32.4% of FESNov families, including 4,349 high-confidence associations with important biological processes. These predictions provide a valuable hypothesis-driven framework that we used for experimental validation of a new gene family involved in cell motility and a novel set of antimicrobial peptides. We also demonstrate that the relative abundance profiles of novel families can discriminate between environments and clinical conditions, leading to the discovery of potentially new biomarkers associated with colorectal cancer. We expect this work to enhance future metagenomics studies and expand our knowledge of the genetic repertory of uncultivated organisms.