Structured Matrix Completion with Applications to Genomic Data Integration
Publication type:
Article
Authors:
Cai, Tianxi; Cai, T. Tony; Zhang, Anru
Affiliation:
University of Pennsylvania
Journal:
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
ISSN/ISBN:
0162-1459
DOI:
10.1080/01621459.2015.1021005
Publication date:
2016
Pages:
621-633
Keywords:
low-rank matrix
missing value estimation
gene-expression data
ovarian-cancer
genotype imputation
Penalization
algorithm
MODEL
Abstract:
Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics, and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive a lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite samples under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extents of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival. Supplementary materials for this article are available online.
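The structured-missingness setting in the abstract can be illustrated with a small sketch. It assumes the observed rows and columns are the leading ones, so the matrix splits into observed blocks `a11`, `a12`, `a21` and an unobserved block `a22`, and it assumes the rank is known. The function name `complete_block`, the dimensions, and the rank-truncated pseudo-inverse are illustrative choices, not taken from the record. The sketch relies on the standard identity A22 = A21 A11^+ A12, which holds for an exactly low-rank matrix when A11 has the same rank as the full matrix. It is not the SMC estimator itself, which the abstract describes only at a high level.

```python
import numpy as np


def complete_block(a11, a12, a21, rank):
    """Estimate the unobserved block from the three observed blocks.

    Uses a rank-truncated pseudo-inverse of a11 (hypothetical helper,
    a simplified stand-in rather than the paper's SMC procedure).
    """
    u, s, vt = np.linalg.svd(a11, full_matrices=False)
    # Keep only the leading `rank` singular triplets of a11.
    u_r, s_r, vt_r = u[:, :rank], s[:rank], vt[:rank, :]
    # Pseudo-inverse of the truncated a11: V_r diag(1/s_r) U_r^T.
    a11_pinv = (vt_r.T / s_r) @ u_r.T
    return a21 @ a11_pinv @ a12


# Simulated exactly rank-r matrix (all sizes are arbitrary assumptions).
rng = np.random.default_rng(0)
p1, p2, r = 60, 50, 3   # full matrix dimensions and rank
m1, m2 = 15, 12         # number of fully observed rows and columns
A = rng.standard_normal((p1, r)) @ rng.standard_normal((r, p2))

# Observed blocks: first m1 rows and first m2 columns.
a11, a12 = A[:m1, :m2], A[:m1, m2:]
a21, a22 = A[m1:, :m2], A[m1:, m2:]   # a22 is used only to check the result

a22_hat = complete_block(a11, a12, a21, rank=r)
rel_err = np.linalg.norm(a22_hat - a22) / np.linalg.norm(a22)
print(f"relative Frobenius error: {rel_err:.2e}")
```

In this noiseless, exactly low-rank simulation the missing block is recovered up to floating-point error. For an approximately low-rank matrix the choice of `rank` matters, and a data-driven choice would be needed in practice.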
Source URL: