Learning to estimate sample-specific transcriptional networks for 7,000 tumors
Publication type:
Article
Authors:
Ellington, Caleb N.; Lengerich, Benjamin J.; Watkins, Thomas B. K.; Yang, Jiekun; Adduri, Abhinav K.; Mahbub, Sazan; Xiao, Hanxi; Kellis, Manolis; Xing, Eric P.
Affiliations:
Carnegie Mellon University; Massachusetts Institute of Technology (MIT); Harvard University; Massachusetts Institute of Technology (MIT); Broad Institute; University of London; University College London; Pennsylvania Commonwealth System of Higher Education (PCSHE); University of Pittsburgh; Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Journal:
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
ISSN/ISBN:
0027-8424
DOI:
10.1073/pnas.2411930122
Publication date:
2025-05-23
Keywords:
comprehensive genomic characterization
molecular classification
cancer
selection
subsets
Abstract:
Cancers are shaped by somatic mutations, microenvironment, and patient background, each altering gene expression and regulation in complex ways, resulting in heterogeneous cellular states and dynamics. Inferring gene regulatory networks (GRNs) from expression data can help characterize this regulation-driven heterogeneity, but network inference requires many statistical samples, limiting GRNs to cluster-level analyses that ignore intracluster heterogeneity. We propose to move beyond coarse analyses of predefined subgroups by using contextualized learning, a multitask learning paradigm that uses multiview contexts, including phenotypic, molecular, and environmental information, to infer personalized models. With sample-specific contexts, contextualization enables sample-specific models and even generalizes at test time to predict network models for entirely unseen contexts. We unify three network model classes (Correlation, Markov, and Neighborhood Selection) and estimate context-specific GRNs for 7,997 tumors across 25 tumor types, using copy number and driver mutation profiles, tumor microenvironment, and patient demographics as model context. Our generative modeling approach allows us to predict GRNs for unseen tumor types based on a pan-cancer model of how somatic mutations affect gene regulation. Finally, contextualized networks enable GRN-based precision oncology by providing a structured view of expression dynamics at sample-specific resolution, explaining known biomarkers in terms of network-mediated effects and leading to subtypings that improve survival prognosis. We provide a SKLearn-style Python package at https://contextualized.ml for learning and analyzing contextualized models, as well as interactive plotting tools for pan-cancer data exploration at https://github.com/cnellington/CancerContextualized.
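Illustrative sketch (not part of the original record): the abstract describes models whose parameters are predicted from each sample's context. The NumPy code below is a minimal toy version of that idea for the Neighborhood Selection case, assuming a purely linear context encoder fit by ridge regression. It does not use the API of the package at https://contextualized.ml, and the function names `fit_contextualized_neighborhoods` and `predict_networks`, the `ridge` parameter, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_contextualized_neighborhoods(C, X, ridge=1e-3):
    """Fit context-varying neighborhood regressions (toy, linear encoder).

    For each gene j, model x_j ~ sum_{k != j} beta_jk(c) * x_k, where
    beta_jk(c) = [1, c] @ W[j, k] is a linear function of the context c.
    This is an assumed simplification, not the authors' architecture.
    """
    n, p = X.shape
    C1 = np.hstack([np.ones((n, 1)), C])  # prepend an intercept to the context
    q = C1.shape[1]
    W = np.zeros((p, p, q))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        # Interaction features: every predictor gene times every context feature.
        Z = (X[:, others, None] * C1[:, None, :]).reshape(n, -1)
        A = Z.T @ Z + ridge * np.eye(Z.shape[1])
        coef = np.linalg.solve(A, Z.T @ X[:, j])
        W[j, others, :] = coef.reshape(len(others), q)
    return W


def predict_networks(W, C):
    """Return one p x p coefficient matrix per row of C (zero diagonal)."""
    C1 = np.hstack([np.ones((C.shape[0], 1)), C])
    return np.einsum("jkq,nq->njk", W, C1)


# Synthetic demo: the effect of gene 0 on gene 1 equals the context value.
rng = np.random.default_rng(0)
n = 500
C = rng.uniform(-1.0, 1.0, size=(n, 1))
x0 = rng.normal(size=n)
x1 = C[:, 0] * x0 + 0.1 * rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2])

W = fit_contextualized_neighborhoods(C, X)

# Predict networks for two query contexts, one per row.
nets = predict_networks(W, np.array([[-1.0], [1.0]]))
print(np.round(nets[0], 2))
print(np.round(nets[1], 2))
```

In this toy setting, the entry `nets[i, 1, 0]` is the estimated effect of gene 0 on gene 1 under query context `i`, so the two printed matrices should differ mainly in that entry. Because the networks are produced by a function of context rather than estimated per cluster, a network can be generated for any context vector, which is the property the abstract relies on for sample-specific and unseen-context prediction.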