Data integration with high dimensionality
Output type:
Article
Authors:
Gao, Xin; Carroll, Raymond J.
Affiliations:
York University - Canada; Texas A&M University System; Texas A&M University College Station
Journal:
BIOMETRIKA
ISSN/ISBN:
0006-3444
DOI:
10.1093/biomet/asx023
Publication date:
2017
Pages:
251-272
Keywords:
nonconcave penalized likelihood
Bayesian information criteria
variable selection
model selection
group lasso
regression-models
inference
consistency
Abstract:
We consider situations where the data consist of a number of responses for each individual, which may include a mix of discrete and continuous variables. The data also include a class of predictors, where the same predictor may have different physical measurements across different experiments depending on how the predictor is measured. The goal is to select which predictors affect any of the responses, where the number of such informative predictors tends to infinity as the sample size increases. There are marginal likelihoods for each experiment; we specify a pseudolikelihood combining the marginal likelihoods, and propose a pseudolikelihood information criterion. Under regularity conditions, we establish selection consistency for this criterion with unbounded true model size. The proposed method includes a Bayesian information criterion with appropriate penalty term as a special case. Simulations indicate that data integration can dramatically improve upon using only one data source.
Source URL: