您的位置: 首页 > 全球经管学术 > 顶刊追踪 > 顶尖期刊 > 统计学 > The Annals of Applied Statistics > 2013 > 1期

SPARSE INTEGRATIVE CLUSTERING OF MULTIPLE OMICS DATA SETS

成果类型：

Article

署名作者：

Shen, Ronglai; Wang, Sijian; Mo, Qianxing

署名单位：

Memorial Sloan Kettering Cancer Center; University of Wisconsin System; University of Wisconsin Madison; University of Wisconsin System; University of Wisconsin Madison; Baylor College of Medicine

刊物名称：

ANNALS OF APPLIED STATISTICS

ISSN/ISSBN：

1932-6157

DOI：

10.1214/12-AOAS578

发表日期：

2013

页码：

269-294

关键词：

circular binary segmentation gene-expression patterns copy-number alteration variable selection breast-cancer validation likelihood regression genomics reveals

摘要：

High resolution microarrays and second-generation sequencing platforms are powerful tools to investigate genome-wide alterations in DNA copy number, methylation and gene expression associated with a disease. An integrated genomic profiling approach measures multiple omics data types simultaneously in the same set of biological samples. Such approach renders an integrated data resolution that would not be available with any single data type. In this study, we use penalized latent variable regression methods for joint modeling of multiple omics data types to identify common latent variables that can be used to cluster patient samples into biologically and clinically relevant disease subtypes. We consider lasso [J. Roy. Statist. Soc. Ser. B 58 (1996) 267-288], elastic net [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 301-320] and fused lasso [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005) 91-108] methods to induce sparsity in the coefficient vectors, revealing important genomic features that have significant contributions to the latent variables. An iterative ridge regression is used to compute the sparse coefficient vectors. In model selection, a uniform design [Monographs on Statistics and Applied Probability (1994) Chapman & Hall] is used to seek experimental points that scattered uniformly across the search domain for efficient sampling of tuning parameter combinations. We compared our method to sparse singular value decomposition (SVD) and penalized Gaussian mixture model (GMM) using both real and simulated data sets. The proposed method is applied to integrate genomic, epigenomic and transcriptomic data for subtype analysis in breast and lung cancer data sets.

来源URL：

访问原文