Semi-Supervised Linear Regression
Document type:
Article
Authors:
Azriel, David; Brown, Lawrence D.; Sklar, Michael; Berk, Richard; Buja, Andreas; Zhao, Linda
Affiliations:
Technion Israel Institute of Technology; University of Pennsylvania; Stanford University
Journal:
Journal of the American Statistical Association
ISSN/ISBN:
0162-1459
DOI:
10.1080/01621459.2021.1915320
Publication date:
2022
Pages:
2238-2251
Keywords:
inference
efficient
Abstract:
We study a regression problem where, for one part of the data, we observe both the label variable (Y) and the predictors (X), while for the other part only the predictors are given. Such a problem arises, for example, when observations of the label variable are costly and may require a skilled human agent. When the conditional expectation E[Y | X] is not exactly linear, one can consider the best linear approximation to the conditional expectation, which can be estimated consistently by the least squares estimates (LSE). The LSE depends only on the labeled data. We suggest improved alternative estimates to the LSE that also use the unlabeled data. Our estimation method can be easily implemented and has simply described asymptotic properties. The new estimates asymptotically dominate the usual standard procedures under a certain nonlinearity condition on E[Y | X]; otherwise, they are asymptotically equivalent. The performance of the new estimator for small sample sizes is investigated in an extensive simulation study. A real data example of inferring the homeless population is used to illustrate the new methodology.
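The notion of a "best linear approximation" in the abstract can be illustrated with a small simulation. The sketch below is not the authors' semi-supervised estimator; it only shows the baseline the abstract refers to, namely that ordinary least squares on labeled data targets the population best linear approximation even when E[Y | X] is nonlinear. The data-generating model (X uniform on [0, 1], E[Y | X] = X^2, Gaussian noise) and all variable names are assumptions chosen for illustration. Under that model the population slope is Cov(X, X^2) / Var(X) = 1 and the intercept is E[Y] - E[X] = -1/6.

```python
import numpy as np

# Hypothetical setup (not from the paper): X ~ Uniform(0, 1), E[Y | X] = X^2.
rng = np.random.default_rng(0)
n_labeled = 20_000

x = rng.uniform(0.0, 1.0, size=n_labeled)
y = x**2 + rng.normal(0.0, 0.1, size=n_labeled)  # nonlinear conditional mean

# Ordinary least squares with an intercept, using only the labeled pairs.
design = np.column_stack([np.ones(n_labeled), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept_hat, slope_hat = coef

# Population best linear approximation under the assumed model:
# slope = Cov(X, X^2) / Var(X) = (1/12) / (1/12) = 1
# intercept = E[X^2] - slope * E[X] = 1/3 - 1/2 = -1/6
slope_pop = 1.0
intercept_pop = -1.0 / 6.0

print(f"estimated intercept {intercept_hat:.3f}, population {intercept_pop:.3f}")
print(f"estimated slope     {slope_hat:.3f}, population {slope_pop:.3f}")
```

In this toy setting the fitted line does not recover the curve X^2, but its coefficients settle near the population values above, which is the sense in which the LSE is consistent for the best linear approximation.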