Accurate, scalable, and fully automated inference of species trees from raw genome assemblies using ROADIES

Type:
Article
Authors:
Gupta, Anshu; Mirarab, Siavash; Turakhia, Yatish
Affiliations:
University of California System; University of California San Diego; University of California System; University of California San Diego
Journal:
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
ISSN/ISBN:
0027-8424
DOI:
10.1073/pnas.2500553122
Publication date:
2025-05-13
Keywords:
multiple sequence alignment; phylogenetic analysis; placental mammals; gene trees; phylogenomics; root; reconstruction; nucleotide; radiation; insights
Abstract:
Current genome sequencing initiatives across a wide range of life forms offer significant potential to enhance our understanding of evolutionary relationships and support transformative biological and medical applications. Species trees play a central role in many of these applications; however, despite the widespread availability of genome assemblies, accurate inference of species trees remains challenging due to the limited automation, substantial domain expertise, and computational resources required by conventional methods. To address this limitation, we present ROADIES, a fully automated pipeline to infer species trees starting from raw genome assemblies. In contrast to the prominent approach, ROADIES incorporates a unique strategy of randomly sampling segments of the input genomes to generate gene trees. This eliminates the need for predefining a set of loci, limiting the analyses to a fixed number of genes, and performing the cumbersome gene annotation and/or whole genome alignment steps. ROADIES also eliminates the need to infer orthology by leveraging existing discordance-aware methods that allow multicopy genes. Using the genomic datasets from large-scale sequencing efforts across four diverse life forms (placental mammals, pomace flies, birds, and budding yeasts), we show that ROADIES infers species trees that are comparable in quality to the state-of-the-art studies but in a fraction of the time and effort, including on challenging datasets with rampant gene tree discordance and complex polyploidy. With its speed, accuracy, and automation, ROADIES has the potential to vastly simplify species tree inference, making it accessible to a broader range of scientists and applications.