Privacy protection in data mining: A perturbation approach for categorical data
成果类型:
Article
署名作者:
Li, Xiao-Bai; Sarkar, Sumit
署名单位:
University of Massachusetts System; University of Massachusetts Lowell; University of Texas System; University of Texas Dallas
刊物名称:
INFORMATION SYSTEMS RESEARCH
ISSN/ISSBN:
1047-7047
DOI:
10.1287/isre.1060.0095
发表日期:
2006
页码:
254-270
关键词:
statistical databases
security
摘要:
To respond to growing concerns about privacy of personal information, organizations that use their customers' records in data-mining activities are forced to take actions to protect the privacy of the individuals involved. A common practice for many organizations today is to remove identity-related attributes from the customer records before releasing them to data miners or analysts. We investigate the effect of this practice and demonstrate that many records in a data set could be uniquely identified even after identity-related attributes are removed. We propose a perturbation method for categorical data that can be used by organizations to prevent or limit disclosure of confidential data for identifiable records when the data are provided to analysts for classification, a common data-mining task. The proposed method attempts to preserve the statistical properties of the data based on privacy protection parameters specified by the organization. We show that the problem can be solved in two phases, with a linear programming formulation in Phase I (to preserve the first-order marginal distribution), followed by a simple Bayes-based swapping procedure in Phase 11 (to preserve the joint distribution).
来源URL: