BAYESIAN NONPARAMETRIC CLUSTERING WITH FEATURE SELECTION FOR SPATIALLY RESOLVED TRANSCRIPTOMICS DATA

Document type:
Article
Authors:
Zhu, Bencong; Hu, Guanyu; Xu, Lin; Fan, Xiaodan; Li, Qiwei
Affiliations:
Chinese University of Hong Kong; University of Texas System; University of Texas Health Science Center Houston; University of Texas System; University of Texas Southwestern Medical Center; University of Texas System; University of Texas Dallas
Journal:
ANNALS OF APPLIED STATISTICS
ISSN/ISBN:
1932-6157
DOI:
10.1214/25-AOAS2014
Publication date:
2025
Pages:
1028-1047
Keywords:
false discovery rate; variable selection; gene-expression; cell; models; number
Abstract:
The advent of next-generation sequencing-based spatially resolved transcriptomics (SRT) techniques has reshaped genomic studies by enabling high-throughput gene expression profiling while preserving spatial context. Nevertheless, inherent challenges associated with these new high-dimensional spatial data (e.g., zero inflation) pose obstacles to effective clustering, a fundamental problem in SRT data analysis. Current computational approaches often rely on heuristic data preprocessing and arbitrary cluster number specification, leading to information loss and, consequently, suboptimal downstream analysis. In response, we present BNPSpace, a novel Bayesian nonparametric spatial clustering framework that directly models SRT count data and automatically estimates the optimal number of spatial domains. BNPSpace partitions the heterogeneous spatial domain while identifying a parsimonious set of genes that discriminate among spatial domains, enhancing the interpretability of the findings. Additionally, it incorporates spatial information through a Markov random field prior model, encouraging a biologically meaningful partition pattern. We assess the performance of BNPSpace using both simulated and real SRT data, demonstrating that each of the innovations above is essential for the accurate identification of spatial domains. We also verify its scalability with large-scale SRT data. In the application to the human dorsolateral prefrontal cortex (DLPFC) 10x Visium data, comprising 4788 spots, BNPSpace outperforms existing methods by identifying more coherent spatial domain patterns. Furthermore, the discriminating genes identified by BNPSpace show significant enrichment (odds ratio = 1.877, p-value = 0.00162) in DLPFC gene sets validated by real biological experiments, underscoring its effectiveness in revealing biologically relevant insights.
Source URL: