Cross-prediction-powered inference
Document type:
Article
Authors:
Zrnic, Tijana; Candes, Emmanuel J.
Affiliations:
Stanford University; Stanford University; Stanford University
Journal:
Proceedings of the National Academy of Sciences of the United States of America
ISSN:
0027-8424
DOI:
10.1073/pnas.2322083121
Publication date:
2024-04-09
Keywords:
multiple imputation
regression
models
consequences
outcomes
Abstract:
While reliable data-driven decision-making hinges on high-quality labeled data, the […] and expensive scientific measurements. Machine learning is becoming an appealing […] from satellite imagery are used to supplement accurate survey data, and so on. Since […] the validity of downstream inferences. We introduce cross-prediction: a method for valid inference powered by machine learning. Given a small labeled dataset and a large unlabeled dataset, cross-prediction imputes the missing labels via machine learning and applies a form of debiasing to correct for prediction inaccuracies. The resulting inferences achieve the desired error probability and are more powerful than those that leverage only the labeled data. Closely related is the recent proposal of prediction-powered inference [… Jordan, T. Zrnic, Science 382, 669–674 (2023)], which assumes that a good pretrained model is already available. We show that cross-prediction is consistently more powerful than an adaptation of prediction-powered inference in which a fraction of the labeled data is split off and used to train the model. Finally, we observe that cross-prediction gives more stable conclusions than its competitors: its confidence intervals (CIs) typically have significantly lower variability.
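The abstract describes the procedure only in outline: impute labels with a model trained on the labeled data, then debias. The following is a minimal sketch of that idea for one simple target, the mean E[Y]. It assumes K-fold cross-fitting on the labeled set: for each fold, a model is trained on the other folds, its predictions are averaged over the unlabeled set, and its average held-out residual (prediction minus label) is subtracted as the debiasing term. A one-variable least-squares fit stands in for an arbitrary machine learning model. The function names (`fit_linear`, `cross_prediction_mean`) and the synthetic data are hypothetical illustrations, not the authors' reference implementation, and the sketch computes only a point estimate, not a confidence interval.

```python
import random
from statistics import fmean


def fit_linear(xs, ys):
    """Least-squares fit y ~ a + b*x; a hypothetical stand-in for any ML model."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def cross_prediction_mean(x_lab, y_lab, x_unlab, k=5):
    """Sketch of a cross-fitted, debiased estimate of E[Y].

    Assumes i.i.d. data, so deterministic round-robin folds are acceptable.
    """
    n = len(x_lab)
    idx = list(range(n))
    folds = [idx[j::k] for j in range(k)]
    imputed, residual_sum = 0.0, 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Model for this fold never sees its own held-out labels.
        model = fit_linear([x_lab[i] for i in train], [y_lab[i] for i in train])
        # Imputation term: mean prediction on unlabeled data, averaged over folds.
        imputed += fmean(model(x) for x in x_unlab) / k
        # Debiasing term: held-out residuals (prediction minus true label).
        residual_sum += sum(model(x_lab[i]) - y_lab[i] for i in fold)
    return imputed - residual_sum / n


# Usage on synthetic data where Y = 2 + 3X + noise, so E[Y] = 2.
random.seed(0)


def draw(m):
    xs = [random.gauss(0, 1) for _ in range(m)]
    ys = [2.0 + 3.0 * x + random.gauss(0, 1) for x in xs]
    return xs, ys


x_lab, y_lab = draw(200)        # small labeled set
x_unlab, _ = draw(20000)        # large unlabeled set (labels discarded)
est = cross_prediction_mean(x_lab, y_lab, x_unlab, k=5)
print(f"estimated E[Y]: {est:.3f}")
```

Because every labeled point is used both for training (in K-1 folds) and for debiasing (in its own fold), no labels are discarded. This is the contrast the abstract draws with splitting off a fraction of the labeled data to train a single model.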