Drive More Effective Data-Based Innovations: Enhancing the Utility of Secure Databases
Publication type:
Article
Authors:
Qian, Yi; Xie, Hui
Affiliations:
University of British Columbia; University of Illinois System; University of Illinois Chicago; University of Illinois Chicago Hospital
Journal:
MANAGEMENT SCIENCE
ISSN/ISBN:
0025-1909
DOI:
10.1287/mnsc.2014.2026
Publication date:
2015
Pages:
520-541
Keywords:
database
digital economy
innovation
nonparametric
perturbation
privacy
shuffling
Abstract:
Databases play a central role in evidence-based innovations in business, economics, social, and health sciences. In modern business and society, there are rapidly growing demands for constructing analytically valid databases that also are secure and protect sensitive information to meet customer and public expectations, to minimize financial losses, and to comply with privacy regulations and laws. We propose new data perturbation and shuffling (DPS) procedures, named MORE, for this purpose. As compared with existing DPS methods, MORE can substantially increase the utility of secure databases without increasing disclosure risk. MORE is capable of preserving important nonmonotonic relationships among attributes, such as the inverted-U relationship between competition and innovation. Maintaining such relationships is often the key to determining optimal levels of policy and managerial interventions. MORE does not require data to be of particular types or have particular distributional shapes. Instead, it provides unified, flexible, and robust algorithms to mask general types of confidential variables with arbitrary distributions, thereby making it suitable for general-purpose data masking. Since MORE nests the commonly used generalized linear models as special cases, a much wider range of statistical analyses can be conducted by using the secure databases with results similar to those achieved by using the original databases. Unlike existing DPS approaches that typically require a joint model for all variables, MORE requires no modeling of nonconfidential variables and thus further increases the robustness of secure databases. Evaluation of MORE through Monte Carlo simulation studies and empirical applications demonstrates that it performs better than existing data-masking methods.
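To make the abstract's point about nonmonotonic relationships concrete, the following is a minimal sketch of a generic rank-based perturb-then-shuffle step. It is not the MORE procedure, whose details the abstract does not give. The function name `rank_shuffle`, the linear working model, and the simulated inverted-U data are all assumptions chosen for illustration. The sketch shows that a shuffle driven by a linear model keeps the marginal distribution of the confidential variable intact but can wash out an inverted-U relationship with a nonconfidential variable, which is the kind of utility loss the abstract says MORE is designed to avoid.

```python
import numpy as np


def rank_shuffle(confidential, nonconfidential, rng):
    """Generic rank-based shuffle (illustrative only; not the MORE procedure)."""
    x = np.asarray(nonconfidential, dtype=float)
    y = np.asarray(confidential, dtype=float)
    # Assumed working model: a straight line y ~ intercept + slope * x.
    slope, intercept = np.polyfit(x, y, 1)
    resid_sd = np.std(y - (intercept + slope * x))
    # Perturbation step: draw synthetic values from the fitted linear model.
    synthetic = intercept + slope * x + rng.normal(0.0, resid_sd, size=y.shape)
    # Shuffling step: each record receives the original value whose rank
    # matches the rank of that record's synthetic value.
    ranks = np.argsort(np.argsort(synthetic))
    return np.sort(y)[ranks]


# Simulated data with an inverted-U relationship (hypothetical example data).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 500)
y = 1.0 - x**2 + rng.normal(0.0, 0.05, 500)

masked = rank_shuffle(y, x, rng)

# Quadratic coefficient before and after masking: the curvature is strong in
# the original data and is expected to be much weaker in the masked data.
curvature_original = np.polyfit(x, y, 2)[0]
curvature_masked = np.polyfit(x, masked, 2)[0]
print(curvature_original, curvature_masked)
```

Because the masked values are a permutation of the original ones, any univariate summary (mean, quantiles, histogram) is unchanged; only the link between the masked variable and the other attributes depends on the working model used in the perturbation step.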
Source URL: