NONPARAMETRIC CLASSIFICATION WITH MISSING DATA

Type:
Article
Authors:
Sell, Torben; Berrett, Thomas B.; Cannings, Timothy I.
Affiliations:
University of Edinburgh; Heriot Watt University; University of Edinburgh; University of Warwick
Journal:
ANNALS OF STATISTICS
ISSN/ISBN:
0090-5364
DOI:
10.1214/24-AOS2389
Publication date:
2024
Pages:
1178-1200
Keywords:
minimax rate; discrimination
Abstract:
We introduce a new nonparametric framework for classification problems in the presence of missing data. The key aspect of our framework is that the regression function decomposes into an anova-type sum of orthogonal functions, of which some (or even many) may be zero. Working under a general missingness setting, which allows features to be missing not at random, our main goal is to derive the minimax rate for the excess risk in this problem. In addition to the decomposition property, the rate depends on parameters that control the tail behaviour of the marginal feature distributions, the smoothness of the regression function and a margin condition. The ambient data dimension does not appear in the minimax rate, which can therefore be faster than in the classical nonparametric setting. We further propose a new method, called the Hard-thresholding Anova Missing data (HAM) classifier, based on a careful combination of a k-nearest neighbour algorithm and a thresholding step. The HAM classifier attains the minimax rate up to polylogarithmic factors and numerical experiments further illustrate its utility.