Screening p-hackers: Dissemination noise as bait

Publication type:
Article
Authors:
Echenique, Federico; He, Kevin
Affiliations:
University of California System; University of California Berkeley; University of Pennsylvania
Journal:
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
ISSN/ISBN:
0027-8424
DOI:
10.1073/pnas.2400787121
Publication date:
2024-05-21
Keywords:
Persuasion
Abstract:
We show that adding noise before publishing data effectively screens p-hacked findings: spurious explanations produced by fitting many statistical models (data mining). Noise creates baits that affect two types of researchers differently. Uninformed p-hackers, who are fully ignorant of the true mechanism and engage in data mining, often fall for baits. Informed researchers, who start with an ex ante hypothesis, are minimally affected. We show that as the number of observations grows large, dissemination noise asymptotically achieves optimal screening. In a tractable special case where the informed researchers' theory can identify the true causal mechanism with very few data, we characterize the optimal level of dissemination noise and highlight the relevant trade-offs. Dissemination noise is a tool that statistical agencies currently use to protect privacy. We argue this existing practice can be repurposed to screen p-hackers and thus improve research credibility.
Significance:
Motivated by recent problems with research integrity in the behavioral sciences, we develop a model of researcher incentives and propose dissemination noise as a way to screen p-hacked findings that arise from data mining. In our model, p-hackers use observational data to uncover spurious explanatory mechanisms, while honest researchers use the same data to test ex ante hypotheses. We find that intentionally adding noise to data before making data public helps distinguish spurious correlations from genuine causal mechanisms. We characterize the optimal noise level in a tractable special case. This approach repurposes a privacy-protection technique currently used by data producers (e.g., the US Census Bureau) to help improve research credibility.
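The screening idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' model; it is a toy version built on assumptions that do not appear in this record: binary data, an outcome that equals the true cause exactly, noise that flips each released outcome value independently, a p-hacker who searches only irrelevant covariates, an informed researcher who chooses between the true cause and one decoy, and a check that passes a finding only if it fits the raw (noise-free) data perfectly. All function and parameter names (`pass_rates`, `flip_prob`, `n_spurious`, and so on) are hypothetical.

```python
import random


def hamming(a, b):
    """Number of observations on which two bit-packed binary columns disagree."""
    return bin(a ^ b).count("1")


def add_noise(column, n_obs, flip_prob, rng):
    """Flip each of the n_obs bits independently with probability flip_prob."""
    mask = 0
    for i in range(n_obs):
        if rng.random() < flip_prob:
            mask |= 1 << i
    return column ^ mask


def best_fit(candidates, released_outcome):
    """Return the candidate that agrees most with the released outcome column.

    Ties go to the earliest candidate in the list.
    """
    return min(candidates, key=lambda x: hamming(x, released_outcome))


def pass_rates(flip_prob, n_obs=10, n_spurious=5000, n_trials=200, seed=0):
    """Return (p-hacker pass rate, informed pass rate) under the toy assumptions."""
    rng = random.Random(seed)
    hacker_passes = 0
    informed_passes = 0
    for _ in range(n_trials):
        # Assumption: in the raw data the outcome equals the true cause exactly.
        true_cause = rng.getrandbits(n_obs)
        outcome = true_cause
        # The data producer releases a noisy copy of the outcome column.
        released = add_noise(outcome, n_obs, flip_prob, rng)

        # Assumption: the p-hacker mines many covariates unrelated to the outcome.
        spurious = [rng.getrandbits(n_obs) for _ in range(n_spurious)]
        hacker_pick = best_fit(spurious, released)

        # Assumption: the informed researcher only compares the true cause
        # with a single irrelevant decoy.
        decoy = rng.getrandbits(n_obs)
        informed_pick = best_fit([true_cause, decoy], released)

        # Assumption: a finding passes only if it fits the raw data perfectly.
        hacker_passes += hamming(hacker_pick, outcome) == 0
        informed_passes += hamming(informed_pick, outcome) == 0
    return hacker_passes / n_trials, informed_passes / n_trials
```

Calling `pass_rates(0.0)` and `pass_rates(0.2)` compares the no-noise and noisy regimes. Under these assumptions the p-hacker's best fit in the released data is usually a bait that matches the noise rather than the raw outcome, so the p-hacker's pass rate should fall sharply once noise is added, while the informed researcher's pass rate should fall only slightly. That small loss is a toy counterpart of the trade-off the abstract mentions.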