The shortcomings of synthetic census microdata

Publication type:
Article
Author:
Ruggles, Steven
Affiliation:
University of Minnesota System; University of Minnesota Twin Cities
Journal:
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
ISSN/ISBN:
0027-8424
DOI:
10.1073/pnas.2424655122
Publication date:
2025-03-18
Keywords:
Differential Privacy
Abstract:
U.S. Census Bureau officials recently reaffirmed the Bureau's ongoing efforts to replace the American Community Survey (ACS) public use microdata sample with fully synthetic data to protect respondent confidentiality. With the growth of computing power and expansion of private sector data about the population, the Census Bureau has valid concerns about confidentiality threats to public data. The current plan for fully synthetic data, however, threatens a cornerstone of the nation's scientific infrastructure. Census microdata samples are among the most frequently used sources in social science research, and they are an essential tool for policy formation and planning from the local to the national level. Synthetic census microdata are not suitable for most research and policy applications. There have been no recent attempts to quantify disclosure risk in the ACS microdata, and the sole existing study failed to establish a credible threat for positive identification of ACS respondents by external intruders. I argue that we need new empirical research to pinpoint specific vulnerabilities that could allow an intruder to determine a particular individual's confidential census responses. If significant vulnerabilities are uncovered, the Census Bureau in partnership with the research community should develop targeted methods for disclosure risk reduction that minimize damage to data usability.