Genetically engineered decision trees: Population diversity produces smarter trees
Document type:
Article
Authors:
Fu, ZW; Golden, B; Lele, S; Raghavan, S; Wasil, E
Affiliations:
Federal National Mortgage Association (Fannie Mae); University System of Maryland; University of Maryland College Park; American University
Journal:
OPERATIONS RESEARCH
ISSN/ISBN:
0030-364X
DOI:
10.1287/opre.51.6.894.24919
Publication date:
2003
Pages:
894-907
Keywords:
Statistics, data analysis: data mining
Marketing, estimation/statistical techniques: decision trees
Computers/computer science, artificial intelligence: genetic algorithms
Abstract:
When a decision tree is built for classification, accuracy is usually the only performance measure used in the construction process. In this paper, we introduce the idea of combining a decision tree's expected value and variance in a new probabilistic measure for assessing the performance of a tree. We develop a genetic algorithm for constructing a tree using our new measure and conduct computational experiments that show the advantages of our approach. Further, we investigate the effect of introducing diversity into the population used by our genetic algorithm. We allow the genetic algorithm to focus simultaneously on two distinct probabilistic measures: one that is risk averse and one that is risk seeking. Our bivariate genetic algorithm for constructing a decision tree performs very well, scales up to handle data sets with hundreds of thousands of points, and requires only a small percentage of the data to generate a high-quality decision tree. We demonstrate the effectiveness of our algorithm on three large data sets.
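The abstract does not give the formula for the probabilistic measure, so the following is only a minimal, hypothetical sketch of how a tree's expected accuracy and its variance could be folded into a single score, and how a population could be selected under a risk-averse and a risk-seeking score at the same time. Everything here is an assumption for illustration and is not the authors' method. This includes treating test points as independent Bernoulli trials (binomial variance), using a normal approximation and scoring a tree by the alpha-quantile of its accuracy, and the names `accuracy_moments`, `percentile_score`, and `select_bivariate` together with the defaults `alpha_low=0.1` and `alpha_high=0.9`. The half-and-half selection rule is likewise a hypothetical stand-in for the paper's bivariate genetic algorithm.

```python
from statistics import NormalDist


def accuracy_moments(n_correct: int, n_total: int) -> tuple[float, float]:
    """Hypothetical: treat each test point as an independent Bernoulli trial
    and return (expected accuracy, binomial variance of that estimate)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    p = n_correct / n_total
    return p, p * (1.0 - p) / n_total


def percentile_score(mean: float, var: float, alpha: float) -> float:
    """Assumed measure: the alpha-quantile of a normal approximation
    N(mean, var) of accuracy. Small alpha penalizes variance (risk averse);
    large alpha rewards it (risk seeking)."""
    if var == 0.0:
        # NormalDist.inv_cdf is undefined for sigma == 0, so return the mean.
        return mean
    return NormalDist(mu=mean, sigma=var ** 0.5).inv_cdf(alpha)


def select_bivariate(population, k, alpha_low=0.1, alpha_high=0.9):
    """Hypothetical selection step. population holds (name, n_correct, n_total)
    tuples. Keep the k // 2 best trees under the risk-averse score, then fill
    up to k with the best remaining trees under the risk-seeking score."""
    scored = [(name, *accuracy_moments(c, n)) for name, c, n in population]
    averse = sorted(scored, key=lambda t: percentile_score(t[1], t[2], alpha_low),
                    reverse=True)
    seeking = sorted(scored, key=lambda t: percentile_score(t[1], t[2], alpha_high),
                     reverse=True)
    chosen = []
    for name, _, _ in averse:
        if len(chosen) >= k // 2:
            break
        chosen.append(name)
    for name, _, _ in seeking:
        if len(chosen) >= k:
            break
        if name not in chosen:
            chosen.append(name)
    return chosen


if __name__ == "__main__":
    # Trees A and B have the same expected accuracy (0.8), but B was scored
    # on far fewer points, so its accuracy estimate has a larger variance.
    pop = [("A", 80, 100), ("B", 8, 10), ("C", 70, 100)]
    print(select_bivariate(pop, 2))
```

In this toy population the risk-averse score picks the low-variance tree A, and the risk-seeking score adds the high-variance tree B even though both have the same mean. This illustrates, under the stated assumptions, why selecting on two measures can keep a more diverse population than selecting on accuracy alone.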