MINIMUM IMPURITY PARTITIONS

成果类型:
Note
署名作者:
BURSHTEIN, D; DELLAPIETRA, V; KANEVSKY, D; NADAS, A
署名单位:
International Business Machines (IBM); IBM USA
刊物名称:
ANNALS OF STATISTICS
ISSN/ISSBN:
0090-5364
DOI:
10.1214/aos/1176348789
发表日期:
1992
页码:
1637-1646
关键词:
摘要:
Let (X, U) be jointly distributed on K X R(n). Let Y = E(U\X) and let U be the convex hull of the range of U. Let C: K --> C = {1, 2, ..., k}, k greater-than-or-equal-to 1, induce a measurable k way partition {K1, ...,K(k)} of K. Define the impurity of K(c) = C-1(c) to be phi(c, E(U\C(X) = c)), where phi: C x U --> R1 is a concave function in its second argument. Define the impurity psi(C) of the partition as the average impurity of its members: psi(C) = Ephi(C(X), E(U\C(X))). We show that for any C: K --> C there exists a mapping C: U --> C, such that psi(C(Y)) less-than-or-equal-to psi(C) and such that C-1(c) is convex, that is, for each i, j is-an-element-of C, i not-equal j, there exists a separating hyperplane between C-1(i) and C-1(j). This generalizes some results in statistics and information theory. Suitable choices of U and phi lead to optimal partitions of simple form useful in the construction of classification trees and multidimensional regression trees.