Reidentification Risk in Panel Data: Protecting for k-Anonymity
Document type:
Article
Authors:
Li, Shaobo; Schneider, Matthew J.; Yu, Yan; Gupta, Sachin
Affiliations:
University of Kansas; Drexel University; University System of Ohio; University of Cincinnati; Cornell University
Journal:
INFORMATION SYSTEMS RESEARCH
ISSN/ISBN:
1047-7047
DOI:
10.1287/isre.2022.1169
Publication date:
2023
Pages:
1066-1088
Keywords:
privacy protection
decision-model
disclosure
linkage
Abstract:
We consider the risk of reidentification of panelists in marketing research data that are widely used to obtain insights into buyer behavior and to develop marketing strategy. We find that 17%-94% of the panelists in 15 frequently bought consumer goods categories are subject to high risk of reidentification through a potential record linkage attack based on their unique purchasing histories even when their identities are anonymized. We first demonstrate that the risk of reidentification is vastly understated by unicity, the conventional measure. Instead, we propose a new measure of reidentification risk, termed sno-unicity, which accounts for the longitudinal nature of panel data, and show that it is much larger than unicity. To protect the privacy of panelists, we consider the well-known privacy notion of k-anonymity and develop a new approach called graph-based minimum movement k-anonymization (k-MM) that is designed especially for panel data. The proposed k-MM approach can be formulated as an optimization problem in which the objective is to minimally distort variables in the original data based on weights that users prespecify corresponding to their use case. We further show how our approach can be extended to achieve l-diversity. We apply the k-MM approach to two different panel data sets that are widely used in marketing research. To achieve a given privacy level, compared with several benchmark protection methods, the protected data from our method result in the least distortion in inferences about key marketing metrics, such as brand market shares, share of category requirements, brand switching rates, and marketing-mix parameters estimated from a hierarchical Bayesian brand choice model.
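
To make the two baseline notions in the abstract concrete, the following minimal Python sketch computes a simplified form of unicity (the share of panelists whose entire purchase record is shared with no one else) and checks k-anonymity (every distinct record is shared by at least k panelists). It is an illustration under stated assumptions, not the authors' method: it does not implement sno-unicity or k-MM, whose definitions are not given in the abstract, and the toy `panel` data and all function names are hypothetical.

```python
from collections import Counter


def equivalence_class_sizes(records):
    """Count how many panelists share each distinct record (stored as a tuple)."""
    return Counter(tuple(r) for r in records.values())


def unicity(records):
    """Simplified unicity: fraction of panelists whose full record is unique.

    Assumption: the attacker matches on the whole record, treated as an
    ordered tuple. The paper's sno-unicity is a different measure and is
    not reproduced here.
    """
    if not records:
        return 0.0
    counts = equivalence_class_sizes(records)
    unique = sum(1 for r in records.values() if counts[tuple(r)] == 1)
    return unique / len(records)


def is_k_anonymous(records, k):
    """True if every distinct record is shared by at least k panelists."""
    counts = equivalence_class_sizes(records)
    return all(size >= k for size in counts.values())


# Hypothetical toy panel: panelist id -> sequence of brands purchased.
panel = {
    "p1": ("A", "A", "B"),
    "p2": ("A", "A", "B"),
    "p3": ("B", "C", "A"),
    "p4": ("C", "C", "C"),
}

print(unicity(panel))            # share of panelists with a unique history
print(is_k_anonymous(panel, 2))  # whether the raw panel is 2-anonymous
```

In this toy panel, p3 and p4 each have a purchase history that no other panelist shares, so a record linkage attack that knows the full history would single them out. A protection method would have to alter at least one of those records to reach 2-anonymity.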