A NEW CENTRAL LIMIT THEOREM FOR THE AUGMENTED IPW ESTIMATOR: VARIANCE INFLATION, CROSS-FIT COVARIANCE AND BEYOND
Document type:
Article
Authors:
Jiang, Kuanhao; Mukherjee, Rajarshi; Sen, Subhabrata; Sur, Pragya
Affiliations:
Harvard University; Harvard University; Harvard T.H. Chan School of Public Health
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/24-AOS2476
Publication date:
2025
Pages:
647-675
Keywords:
regularized calibrated estimation
message-passing algorithms
propensity score
causal inference
regression-models
robust regression
g-computation
bias
universality
asymptotics
Abstract:
Estimation of the average treatment effect (ATE) is a central problem in causal inference. In recent times, inference for the ATE in the presence of high-dimensional covariates has been extensively studied. Among the diverse approaches that have been proposed, augmented inverse propensity weighting (AIPW) with cross-fitting has emerged as a popular choice in practice. In this work, we study this cross-fit AIPW estimator under well-specified outcome regression and propensity score models in a high-dimensional regime where the numbers of features and samples are both large and comparable. Under assumptions on the covariate distribution, we establish a new central limit theorem (CLT) for the suitably scaled cross-fit AIPW that applies without any sparsity assumptions on the underlying high-dimensional parameters. Our CLT uncovers two crucial phenomena, among others: (i) the AIPW exhibits a substantial variance inflation that can be precisely quantified in terms of the signal-to-noise ratio and other problem parameters; and (ii) the asymptotic covariance between the pre-cross-fit estimators is nonnegligible even on the root-n scale. These findings are strikingly different from their classical counterparts. On the technical front, our work utilizes a novel interplay between three distinct tools: approximate message passing theory, the theory of deterministic equivalents, and the leave-one-out approach. We believe our proof techniques should be useful for analyzing other two-stage estimators in this high-dimensional regime. We complement our theoretical results with simulations that demonstrate both the finite-sample efficacy of our CLT and its robustness to our assumptions. Finally, we provide some theoretical evidence for the universality of our CLT with respect to the law of the covariates, and explore the effects of certain forms of model misspecification.
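For readers unfamiliar with the estimator the abstract refers to, the following is a minimal sketch of a two-fold cross-fit AIPW estimate of the ATE in its standard textbook form. It is not taken from the article: the function names (`fit_logistic`, `fit_ols`, `cross_fit_aipw`), the two-fold split, the unregularized linear and logistic fits, the propensity clipping, and the simulated data are all illustrative assumptions. The simulation is deliberately low-dimensional (few features, many samples), so it shows only the classical behavior and not the variance inflation the article studies when the feature and sample counts are comparable.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic(X, a, n_iter=50, ridge=1e-6):
    """Logistic regression by Newton's method (tiny ridge only for numerical stability)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)
        grad = X.T @ (a - p) - ridge * beta
        hess = (X * w[:, None]).T @ X + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def fit_ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def cross_fit_aipw(X, a, y, seed=0):
    """Two-fold cross-fit AIPW: fit nuisances on one half, evaluate on the other, then swap."""
    n = X.shape[0]
    perm = np.random.default_rng(seed).permutation(n)
    folds = (perm[: n // 2], perm[n // 2:])
    fold_estimates = []
    for k in (0, 1):
        train, test = folds[1 - k], folds[k]
        Xtr, atr, ytr = X[train], a[train], y[train]
        # Nuisance fits use only the training half.
        beta = fit_logistic(Xtr, atr)               # propensity score model
        g1 = fit_ols(Xtr[atr == 1], ytr[atr == 1])  # outcome model, treated
        g0 = fit_ols(Xtr[atr == 0], ytr[atr == 0])  # outcome model, control
        Xte, ate, yte = X[test], a[test], y[test]
        # Clipping the propensity is an illustrative safeguard, not part of the definition.
        e = np.clip(sigmoid(Xte @ beta), 1e-3, 1.0 - 1e-3)
        m1, m0 = Xte @ g1, Xte @ g0
        # AIPW score: outcome-model difference plus inverse-propensity-weighted residuals.
        psi = m1 - m0 + ate * (yte - m1) / e - (1.0 - ate) * (yte - m0) / (1.0 - e)
        fold_estimates.append(float(psi.mean()))
    # The cross-fit estimate averages the two single-fold estimates.
    return float(np.mean(fold_estimates)), fold_estimates


# Hypothetical simulated data with a true ATE of tau = 2.
rng = np.random.default_rng(1)
n, p, tau = 4000, 5, 2.0
X = np.column_stack([np.ones(n), rng.standard_normal((n, p))])  # intercept + covariates
beta_true = np.array([0.0, 0.5, -0.5, 0.3, 0.0, 0.0])
gamma_true = np.array([1.0, 1.0, 0.5, -0.5, 0.0, 0.2])
a = rng.binomial(1, sigmoid(X @ beta_true)).astype(float)
y = tau * a + X @ gamma_true + rng.standard_normal(n)

ate_hat, fold_estimates = cross_fit_aipw(X, a, y)
print("cross-fit AIPW estimate:", ate_hat)
print("single-fold estimates:", fold_estimates)
```

The two entries of `fold_estimates` correspond to what the abstract calls the pre-cross-fit estimators; the article's result concerns their joint behavior when the number of features grows proportionally with the number of samples, a regime this small example does not attempt to reproduce.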