Unsupervised pattern identification in spatial gene expression atlas reveals mouse brain regions beyond established ontology

Document type:
Article
Authors:
Cahill, Robert; Wang, Yu; Xian, R. Patrick; Lee, Alex J.; Zeng, Hongkui; Yu, Bin; Tasic, Bosiljka; Abbasi, Reza
Affiliations:
University of California System; University of California San Francisco; University of California System; University of California San Francisco; University of California System; University of California Berkeley; Allen Institute for Brain Science; University of California System; University of California Berkeley; University of California System; University of California San Francisco
Journal:
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
ISSN/ISBN:
0027-8424
DOI:
10.1073/pnas.2319804121
Publication date:
2024-09-10
Keywords:
nonnegative matrix factorization transcriptomic cell-types resolved transcriptomics signal CLASSIFICATION neuroscience connectome
Abstract:
The rapid growth of large-scale spatial gene expression data demands efficient and reliable computational tools to extract major trends of gene expression in their native spatial context. Here, we used stability-driven unsupervised learning (i.e., staNMF) to identify principal patterns (PPs) of 3D gene expression profiles and understand spatial gene distribution and anatomical localization at the whole mouse brain level. Our subsequent spatial correlation analysis systematically compared the PPs to known anatomical regions and ontology from the Allen Mouse Brain Atlas using spatial neighborhoods. We demonstrate that our stable and spatially coherent PPs, whose linear combinations accurately approximate the spatial gene data, are highly correlated with combinations of expert-annotated brain regions. These PPs yield a brain ontology based purely on spatial gene expression. Our PP identification approach outperforms principal component analysis and typical clustering algorithms on the same task. Moreover, we show that the stable PPs reveal marked regional imbalance of brainwide genetic architecture, leading to region-specific marker genes and gene coexpression networks. Our findings highlight the advantages of stability-driven machine learning for plausible biological discovery from dense spatial gene expression data, streamlining tasks that are infeasible by conventional manual approaches.
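Illustrative sketch (not from the paper): the general idea of stability-driven NMF described in the abstract can be shown on synthetic data. The example below factorizes a nonnegative genes x voxels matrix several times from different random starts for each candidate number of patterns K, then scores each K by how much the learned patterns disagree between runs. Everything here is an assumption made for illustration: the function names (nmf, dissimilarity, instability), the plain multiplicative-update solver, the Amari-type dissimilarity, the matrix sizes, and the synthetic data. It is not the authors' implementation or their data pipeline.

```python
import numpy as np

def nmf(X, k, seed, n_iter=200, eps=1e-9):
    """Plain multiplicative-update NMF (illustrative solver): X ~ W @ H.

    X is a nonnegative (genes x voxels) matrix. Rows of H play the role of
    candidate spatial patterns; W holds the per-gene coefficients.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def dissimilarity(H1, H2, eps=1e-12):
    """Amari-type dissimilarity between two sets of k patterns.

    Close to 0 when the two sets match up to row order and scaling.
    """
    def unit_rows(H):
        Hc = H - H.mean(axis=1, keepdims=True)
        return Hc / (np.linalg.norm(Hc, axis=1, keepdims=True) + eps)
    C = unit_rows(H1) @ unit_rows(H2).T  # Pearson correlation between rows
    k = C.shape[0]
    return float((2 * k - C.max(axis=0).sum() - C.max(axis=1).sum()) / (2 * k))

def instability(X, k, n_runs=4):
    """Mean pairwise dissimilarity of patterns over random restarts."""
    patterns = [nmf(X, k, seed)[1] for seed in range(n_runs)]
    pairs = [(i, j) for i in range(n_runs) for j in range(i + 1, n_runs)]
    return float(np.mean([dissimilarity(patterns[i], patterns[j]) for i, j in pairs]))

# Synthetic stand-in for an expression matrix: 40 "genes" x 60 "voxels",
# generated from 3 hidden nonnegative patterns plus a little noise.
rng = np.random.default_rng(0)
true_H = rng.random((3, 60))
true_W = rng.random((40, 3))
X = true_W @ true_H + 0.01 * rng.random((40, 60))

# Score each candidate K; a lower score means more reproducible patterns.
scores = {k: instability(X, k) for k in (2, 3, 4, 5)}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

In this sketch the K with the lowest instability score would be kept, and the rows of H from a fit at that K would serve as the patterns to compare against annotated regions. On real atlas data the number of restarts, the solver, and the candidate range of K would all need to be chosen with care.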