High-dimensional semi-supervised learning: in search of optimal inference of the mean

Document type:

Article

Authors:

Zhang, Yuqian; Bradic, Jelena

Affiliations:

Renmin University of China; University of California System; University of California San Diego

Journal:

BIOMETRIKA

ISSN/ISBN:

0006-3444

DOI:

10.1093/biomet/asab042

Publication date:

2022

Pages:

387-403

Keywords:

regularized calibrated estimation; variable selection; robust estimation; missing data; lasso; efficient tests

Abstract:

A fundamental challenge in semi-supervised learning lies in the observed data's disproportional size when compared with the size of the data collected with missing outcomes. An implicit understanding is that the dataset with missing outcomes, being significantly larger, ought to improve estimation and inference. However, it is unclear to what extent this is correct. We illustrate one clear benefit: root-n inference of the outcome's mean is possible while only requiring a consistent estimation of the outcome, possibly at a rate slower than root n. This is achieved by a novel k-fold, cross-fitted, double robust estimator. We discuss both linear and nonlinear outcomes. Such an estimator is particularly suited for models that naturally do not admit root-n consistency, such as high-dimensional, nonparametric or semiparametric models. We apply our methods to estimating heterogeneous treatment effects.
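
The general idea of cross-fitting for the outcome mean can be illustrated with a minimal sketch. It is not the authors' estimator: it assumes outcomes are missing completely at random, swaps in a plain ridge regression as the outcome model, and uses hypothetical names (`ridge_fit`, `cross_fitted_mean`, `X_lab`, `y_lab`, `X_unlab`). For each fold, the outcome model is fitted on the other labeled folds, its predictions are averaged over the unlabeled and held-out covariates, and the average held-out residual is added as a bias correction.

```python
import numpy as np


def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalized intercept (assumed outcome model)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return beta, y_mean - x_mean @ beta


def cross_fitted_mean(X_lab, y_lab, X_unlab, k=5, lam=1.0, seed=0):
    """Illustrative k-fold cross-fitted estimate of E[Y], assuming MCAR labels."""
    rng = np.random.default_rng(seed)
    n = len(y_lab)
    folds = np.array_split(rng.permutation(n), k)
    estimates = []
    for held_out in folds:
        # Fit the outcome model on labeled data outside the current fold.
        train = np.setdiff1d(np.arange(n), held_out)
        beta, intercept = ridge_fit(X_lab[train], y_lab[train], lam)
        # Plug-in term: average prediction over covariates not used for fitting.
        X_eval = np.vstack([X_lab[held_out], X_unlab])
        plug_in = (X_eval @ beta + intercept).mean()
        # Correction term: average residual on the held-out labeled fold.
        residuals = y_lab[held_out] - (X_lab[held_out] @ beta + intercept)
        estimates.append(plug_in + residuals.mean())
    # Average the fold-specific estimates.
    return float(np.mean(estimates))
```

The residual correction is what lets the outcome model converge slowly without dominating the error of the final estimate; any consistent learner could replace the ridge fit in this sketch.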