Organismal complexity strongly correlates with the number of protein families and domains
Publication type:
Article
Authors:
Ponce, David; Krishnamurthy, Subramanian
Affiliations:
Nevada System of Higher Education (NSHE); University of Nevada Reno; Rutgers University System; Rutgers University New Brunswick; Rutgers University Biomedical & Health Sciences; Rutgers Cancer Institute of New Jersey
Journal:
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
ISSN/ISBN:
0027-8424
DOI:
10.1073/pnas.2404332122
Publication date:
2025-01-27
Keywords:
duplicate genes
genome
evolution
expansion
robustness
diversity
sequence
database
regions
size
Abstract:
In the pregenomic era, scientists were puzzled by the observation that haploid genome size (the C-value) did not correlate well with organismal complexity. This phenomenon, called the C-value paradox, is mostly explained by the fact that protein-coding genes occupy only a small fraction of eukaryotic genomes. When the first genome sequences became available, scientists were even more surprised by the fact that the number of genes (G-value) was also a poor predictor of complexity, which gave rise to the G-value paradox. The proposed explanations usually invoke mechanisms that increase the information content of each individual gene (protein-protein interactions, intrinsic disorder, posttranslational modifications, alternative splicing, etc.). Less attention has been paid to mechanisms that increase the amount of genetic material but do not increase (or not to the same extent) the amount of information encoded in the genome, such as gene duplication and domain shuffling. Proteins belonging to the same family and/or sharing the same domains often carry out similar or even redundant functions. We thus hypothesized that an organism's number of different protein families and domains should be suitable predictors of organismal complexity. In agreement with our hypothesis, we observed that the number of protein families, clans, domains, and motifs increases from simple to progressively more complex organisms. In addition, these metrics correlate with the number of cell types better than and independently of the number of protein-coding genes and several previously proposed predictors of organismal complexity. Our observations have the potential to represent a resolution to the G-value paradox.
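
The claim that a metric correlates with the number of cell types "independently of" gene count can be illustrated with a rank correlation and a first-order partial correlation. The abstract does not say which statistical procedure the authors used, so the following is only a minimal sketch under that assumption. The function names (`ranks`, `spearman`, `partial_spearman`) and the numbers in the usage example are hypothetical placeholders, not data or code from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def partial_spearman(x, y, z):
    """Rank correlation of x and y after controlling for z (first order)."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5


if __name__ == "__main__":
    # Made-up placeholder values for six hypothetical organisms;
    # they are NOT measurements from the paper.
    cell_types = [1, 3, 10, 50, 120, 200]
    families = [900, 1500, 2400, 4200, 3100, 5000]
    genes = [4000, 6000, 20000, 14000, 27000, 21000]

    print("families vs cell types:", spearman(families, cell_types))
    print("genes vs cell types:", spearman(genes, cell_types))
    print("families vs cell types, controlling for genes:",
          partial_spearman(families, cell_types, genes))
```

A partial correlation that stays high after controlling for gene count is one way to express "independently of the number of protein-coding genes"; a real analysis across species would likely also need to account for phylogenetic non-independence, which this sketch ignores.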