APPROXIMATION ERROR FROM DISCRETIZATIONS AND ITS APPLICATIONS
Publication type:
Article
Authors:
Zhao, Junlong; Liu, Xiumin; Du, Bin; Liu, Yufeng
Affiliations:
Beijing Normal University; Beijing Technology & Business University; University of North Carolina; University of North Carolina Chapel Hill; University of North Carolina School of Medicine
Journal:
ANNALS OF STATISTICS
ISSN:
0090-5364
DOI:
10.1214/24-AOS2470
Publication date:
2025
Pages:
589-614
Keywords:
sliced inverse regression
bandwidth selection
random forests
dimension
choice
Abstract:
Converting a continuous variable into a discrete one is a common technique for solving a variety of problems in statistics and machine learning. It is well known that discretization introduces bias, but this issue has not been studied systematically. In this paper, we propose a general framework for understanding and comparing the approximation errors of different slicing strategies. Poincaré-type inequalities are first established for univariate discretizations and then generalized to multivariate and other settings. We show that the bias is controlled by two factors: the distance between two specific distributions, generated with and without discretization respectively, and the smoothness of the functions involved. Several important applications illustrate the usefulness of these results. In particular, our results help explain the approximation error of a matrix used in the dimension-reduction literature. Furthermore, as an illustration of the usefulness of discretization, we propose an algorithm for regression problems that combines random forests with partial discretization of the response. Simulation results confirm the advantages of this algorithm over the classical random forest.
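To make the smoothness factor concrete, here is a minimal numerical sketch. It is not the authors' method. It assumes Y ~ Uniform(0, 1), H equal-width (hence equal-probability) slices, and an illustrative function g(y) = sin(2πy) standing in for a conditional mean E[f(X) | Y = y]. Replacing g(Y) by its within-slice average incurs the mean squared error sum_h P(slice h) Var(g(Y) | slice h). The classical Poincaré (Wirtinger) inequality on an interval of length 1/H bounds this error by (1/(πH))^2 E[g'(Y)^2]. The names g, g_prime, slice_error and poincare_bound are hypothetical. The bound shown is the textbook interval inequality, not the paper's general result.

```python
import math

# Hypothetical illustrative function standing in for E[f(X) | Y = y].
def g(y):
    return math.sin(2 * math.pi * y)

def g_prime(y):
    # Derivative of g, used in the Poincare-type bound.
    return 2 * math.pi * math.cos(2 * math.pi * y)

def slice_error(h, n_grid=2000):
    """E[(g(Y) - gbar(Y))^2] for Y ~ Uniform(0, 1) cut into h equal-width slices,
    where gbar(Y) is the average of g over the slice containing Y.
    Integrals are approximated by the midpoint rule inside each slice."""
    width = 1.0 / h
    total = 0.0
    for k in range(h):
        left = k * width
        vals = [g(left + (j + 0.5) * width / n_grid) for j in range(n_grid)]
        mean = sum(vals) / n_grid
        var = sum((v - mean) ** 2 for v in vals) / n_grid
        total += width * var  # slice probability times within-slice variance
    return total

def poincare_bound(h, n_grid=20000):
    """Classical Poincare bound on intervals of length 1/h, summed over slices:
    sum_k P(k) Var(g | k) <= (1 / (pi * h))^2 * E[g'(Y)^2]."""
    mean_sq_grad = sum(g_prime((j + 0.5) / n_grid) ** 2 for j in range(n_grid)) / n_grid
    return (1.0 / (math.pi * h)) ** 2 * mean_sq_grad

if __name__ == "__main__":
    for h in (2, 4, 8, 16):
        print(f"H={h:2d}  error={slice_error(h):.6f}  bound={poincare_bound(h):.6f}")
```

Running the script prints the slicing error next to the bound for several values of H. From H = 4 onward, the error in this example shrinks by roughly a factor of four each time H doubles, in line with the squared slice width in the bound. For this particular g, H = 2 and H = 4 give the same error (0.5 - 4/π^2), so halving the slice width does not always reduce the bias. A rougher g, with a larger E[g'(Y)^2], increases the bound. That is the smoothness factor described above. The distributional-distance factor is not modeled in this sketch.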