BAYESIAN DATA SYNTHESIS AND THE UTILITY-RISK TRADE-OFF FOR MIXED EPIDEMIOLOGICAL DATA

Publication type:
Article
Authors:
Feldman, Joseph; Kowal, Daniel R.
Affiliation:
Rice University
Journal:
ANNALS OF APPLIED STATISTICS
ISSN:
1932-6157
DOI:
10.1214/22-AOAS1604
Publication date:
2022
Pages:
2577-2602
Keywords:
Models; Microdata
Abstract:
Much of the microdata used for epidemiological studies contain sensitive measurements on real individuals. As a result, such microdata cannot be published out of privacy concerns, and without public access to these data, any statistical analyses originally published on them are nearly impossible to reproduce. To promote the dissemination of key datasets for analysis without jeopardizing the privacy of individuals, we introduce a cohesive Bayesian framework for the generation of fully synthetic high-dimensional microdatasets of mixed categorical, binary, count, and continuous variables. This process centers around a joint Bayesian model that is simultaneously compatible with all of these data types, enabling the creation of mixed synthetic datasets through posterior predictive sampling. Furthermore, a focal point of epidemiological data analysis is the study of conditional relationships between various exposures and key outcome variables through regression analysis. We design a modified data synthesis strategy to target and preserve these conditional relationships, including both nonlinearities and interactions. The proposed techniques are deployed to create a synthetic version of a confidential dataset containing dozens of health, cognitive, and social measurements on nearly 20,000 North Carolina children.
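The abstract does not spell out the authors' joint model, so the following is only a minimal sketch of what "fully synthetic data through posterior predictive sampling" means for mixed variables. It assumes a much simpler, hypothetical joint model for one categorical and one continuous variable: category probabilities with a Dirichlet prior, and a category-specific normal distribution with a conjugate Normal-Inverse-Gamma prior. The function name `synthesize` and all prior hyperparameters (`alpha`, `m0`, `k0`, `a0`, `b0`) are illustrative assumptions, not the paper's method. Because the continuous variable is modeled conditionally on the category, the sketch keeps a simple conditional relationship that independent per-column synthesis would lose.

```python
import numpy as np


def synthesize(categories, values, n_synthetic, n_levels, rng,
               alpha=1.0, m0=0.0, k0=0.01, a0=1.0, b0=1.0):
    """Draw ONE fully synthetic dataset from the posterior predictive of a
    hypothetical joint model (an illustration, not the authors' model):
        category ~ Categorical(theta),       theta ~ Dirichlet(alpha)
        value | category = k ~ Normal(mu_k, sigma2_k)
        (mu_k, sigma2_k) ~ Normal-Inverse-Gamma(m0, k0, a0, b0)
    """
    categories = np.asarray(categories)
    values = np.asarray(values, dtype=float)

    # Conjugate update: theta | data ~ Dirichlet(alpha + category counts).
    counts = np.bincount(categories, minlength=n_levels)
    theta = rng.dirichlet(alpha + counts)

    mu = np.empty(n_levels)
    sigma = np.empty(n_levels)
    for k in range(n_levels):
        x = values[categories == k]
        n = x.size
        xbar = x.mean() if n > 0 else 0.0
        ss = ((x - xbar) ** 2).sum()
        # Normal-Inverse-Gamma posterior hyperparameters for category k.
        kn = k0 + n
        mn = (k0 * m0 + n * xbar) / kn
        an = a0 + n / 2.0
        bn = b0 + 0.5 * ss + k0 * n * (xbar - m0) ** 2 / (2.0 * kn)
        # sigma2 ~ InvGamma(an, bn), drawn as 1 / Gamma(shape=an, rate=bn).
        sigma2 = 1.0 / rng.gamma(shape=an, scale=1.0 / bn)
        mu[k] = rng.normal(mn, np.sqrt(sigma2 / kn))
        sigma[k] = np.sqrt(sigma2)

    # Posterior predictive step: every released record is newly generated
    # from the drawn parameters; no confidential record is released.
    syn_cat = rng.choice(n_levels, size=n_synthetic, p=theta)
    syn_val = rng.normal(mu[syn_cat], sigma[syn_cat])
    return syn_cat, syn_val


# Usage with simulated stand-in "confidential" data.
rng = np.random.default_rng(0)
cat = rng.integers(0, 2, size=2000)
val = np.where(cat == 0, rng.normal(0.0, 1.0, 2000), rng.normal(5.0, 1.0, 2000))
syn_cat, syn_val = synthesize(cat, val, n_synthetic=5000, n_levels=2, rng=rng)
```

Each call uses a single posterior draw of the parameters, so calling it repeatedly yields multiple synthetic datasets that reflect posterior uncertainty. Extending this toy model to binary and count variables, high dimensions, and the nonlinear and interaction effects described above is what the paper's framework addresses; the sketch does not attempt it.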