BINNED MULTINOMIAL LOGISTIC REGRESSION FOR INTEGRATIVE CELL-TYPE ANNOTATION
Document type:
Article
Authors:
Motwani, Keshav; Bacher, Rhonda; Molstad, Aaron J.
Affiliations:
University of Washington; University of Washington Seattle; State University System of Florida; University of Florida; State University System of Florida; University of Florida
Journal:
ANNALS OF APPLIED STATISTICS
ISSN/ISBN:
1932-6157
DOI:
10.1214/23-AOAS1769
Publication date:
2023
Pages:
3426-3449
Keywords:
Abstract:
Categorizing individual cells into one of many known cell-type categories, also known as cell-type annotation, is a critical step in the analysis of single-cell genomics data. The current process of annotation is time intensive and subjective, which has led to different studies describing cell types with labels of varying degrees of resolution. While supervised learning approaches have provided automated solutions to annotation, there remains a significant challenge in fitting a unified model for multiple datasets with inconsistent labels. In this article we propose a new multinomial logistic regression estimator which can be used to model cell-type probabilities by integrating multiple datasets with labels of varying resolution. To compute our estimator, we solve a nonconvex optimization problem using a blockwise proximal gradient descent algorithm. We show through simulation studies that our approach estimates cell-type probabilities more accurately than competitors in a wide variety of scenarios. We apply our method to 10 single-cell RNA-seq datasets and demonstrate its utility in predicting fine resolution cell-type labels on unlabeled data as well as refining cell-type labels on data with existing coarse resolution annotations. Finally, we demonstrate that our method can lead to novel scientific insights in the context of a differential expression analysis comparing peripheral blood gene expression before and after treatment with interferon-a. An R package implementing the method is available in the Supplementary Material and at https://github.com/keshav-motwani/IBMR, and the collection of datasets we analyze is available at https://github.com/keshav-motwani/AnnotatedPBMC.
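Illustrative note (not part of the published abstract): the idea of fitting one multinomial model to labels of varying resolution can be sketched by treating each observed label as a "bin", that is, a set of fine-resolution categories, and scoring it by the summed probability of the categories it contains. The Python sketch below assumes that formulation. The function names, the toy categories, and the zero coefficient matrix are all hypothetical, and the sketch omits the penalties and the blockwise proximal gradient descent algorithm that the abstract mentions. It is not the IBMR package's API.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of real-valued scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def binned_neg_log_lik(features, bins, coefs):
    """Average negative log-likelihood for set-valued (binned) labels.

    features: list of feature vectors, each of length p
    bins:     list of sets of fine-category indices, one set per cell
              (a singleton set is an ordinary fine-resolution label)
    coefs:    p x K coefficient matrix given as a list of p rows
    """
    n_categories = len(coefs[0])
    total = 0.0
    for x, label_bin in zip(features, bins):
        # Linear score for each of the K fine categories.
        scores = [
            sum(x[j] * coefs[j][k] for j in range(len(x)))
            for k in range(n_categories)
        ]
        probs = softmax(scores)
        # Probability of a coarse label = sum over the fine categories in its bin.
        total -= math.log(sum(probs[k] for k in label_bin))
    return total / len(features)


# Hypothetical toy setup: fine categories 0 = "CD4 T", 1 = "CD8 T", 2 = "B".
zero_coefs = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # p = 2 features, K = 3
cell = [[1.0, -0.5]]

fine_loss = binned_neg_log_lik(cell, [{0}], zero_coefs)       # labeled "CD4 T"
coarse_loss = binned_neg_log_lik(cell, [{0, 1}], zero_coefs)  # labeled "T cell"
```

Under this assumed formulation, a dataset annotated only as "T cell" still contributes to the fit through the bin {0, 1}, while a finely annotated dataset contributes through singleton bins, so both can share one coefficient matrix.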
Source URL: