Tuning Data Mining Methods for Cost-Sensitive Regression: A Study in Loan Charge-Off Forecasting
Document type:
Article
Authors:
Bansal, Gaurav; Sinha, Atish P.; Zhao, Huimin
Affiliations:
University of Wisconsin System; University of Wisconsin System; University of Wisconsin Milwaukee
Journal:
JOURNAL OF MANAGEMENT INFORMATION SYSTEMS
ISSN/ISBN:
0742-1222
DOI:
10.2753/MIS0742-1222250309
Publication date:
2008
Pages:
315-336
Keywords:
models
banks
Abstract:
Real-world predictive data mining (classification or regression) problems are often cost sensitive, meaning that different types of prediction errors are not equally costly. While cost-sensitive learning methods for classification problems have been extensively studied recently, cost-sensitive regression has not yet been adequately addressed in the data mining literature. In this paper, we first advocate the use of average misprediction cost as a measure for assessing the performance of a cost-sensitive regression model. We then propose an efficient algorithm for tuning a regression model to further reduce its average misprediction cost. In contrast with previous statistical methods, which are tailored to particular cost functions, this algorithm can deal with any convex cost function without modifying the underlying regression method. We have evaluated the algorithm in bank loan charge-off forecasting, where underforecasting is considered much more costly than overforecasting. Our results show that the proposed algorithm significantly reduces the average misprediction costs of models learned with various base regression methods, such as linear regression, model tree, and neural network. The amount of cost reduction increases as the difference between the unit costs of the two types of errors (overprediction and underprediction) increases.
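
As an illustration of the two ideas in the abstract, the following is a minimal sketch and not the authors' algorithm, whose details the abstract does not give. It assumes an asymmetric linear cost (the unit costs `under_cost` and `over_cost` are hypothetical values), defines average misprediction cost as the mean cost over all predictions, and tunes an already-fitted model by adding one constant offset to its predictions. The offset is found by ternary search, which relies on the cost being convex and minimized at zero error. All function names here are assumptions made for illustration.

```python
import numpy as np


def asymmetric_linear_cost(error, under_cost=10.0, over_cost=1.0):
    """Cost of a prediction error, where error = prediction - actual.

    Negative errors (underprediction) are charged under_cost per unit,
    positive errors (overprediction) over_cost per unit. Both unit costs
    are hypothetical values chosen for illustration.
    """
    error = np.asarray(error, dtype=float)
    return np.where(error < 0, under_cost * -error, over_cost * error)


def average_misprediction_cost(y_true, y_pred, cost_fn=asymmetric_linear_cost):
    """Mean cost over all predictions."""
    error = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(cost_fn(error)))


def tune_offset(y_true, y_pred, cost_fn=asymmetric_linear_cost, tol=1e-6):
    """Find a constant shift for y_pred that lowers the average cost.

    Assumes cost_fn is convex with its minimum at zero error, so the
    average cost is convex in the shift and a minimizer lies between the
    smallest and largest residual (actual - prediction).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    lo, hi = float(residuals.min()), float(residuals.max())

    def avg_cost(delta):
        return average_misprediction_cost(y_true, y_pred + delta, cost_fn)

    # Ternary search: each step discards the third of the interval that
    # cannot contain the minimizer of a convex function.
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if avg_cost(m1) <= avg_cost(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0
```

In use, the offset would be estimated on training or validation data and then added to the base model's predictions on new cases. Because only the predictions are shifted, the underlying regression method is left unchanged, which matches the property the abstract emphasizes. Any other convex cost function with its minimum at zero error can be passed as `cost_fn`.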