Maximization of AUC and Buffered AUC in binary classification

Document type:
Article
Authors:
Norton, Matthew; Uryasev, Stan
Affiliations:
United States Department of Defense; United States Navy; Naval Postgraduate School; State University System of Florida; University of Florida
Journal:
MATHEMATICAL PROGRAMMING
ISSN/ISBN:
0025-5610
DOI:
10.1007/s10107-018-1312-2
Publication date:
2019
Pages:
575-612
Keywords:
probability AREA RISK optimization curve
Abstract:
In binary classification, performance metrics that are defined as the probability that some error exceeds a threshold are numerically difficult to optimize directly and also hide potentially important information about the magnitude of errors larger than the threshold. Defining similar metrics, instead, using Buffered Probability of Exceedance (bPOE) generates counterpart metrics that resolve both of these issues. We apply this approach to the case of AUC, the Area Under the ROC curve, and define Buffered AUC (bAUC). We show that bAUC can provide insights into classifier performance not revealed by AUC, while being closely related as the tightest concave lower bound and representable as the area under a modified ROC curve. Additionally, while AUC is numerically difficult to optimize directly, we show that bAUC optimization often reduces to convex or linear programming. Extending these results, we show that AUC and bAUC are special cases of Generalized bAUC and that popular Support Vector Machine (SVM) formulations for approximately maximizing AUC are equivalent to direct maximization of Generalized bAUC. We also prove bAUC generalization bounds for these SVMs. As a central component of these results, we provide an important, novel formula for calculating bPOE, the inverse of Conditional Value-at-Risk. Using this formula, we show that particular bPOE minimization problems reduce to convex and linear programming.
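
Illustrative sketch (not part of the original record): the relationship between AUC and bAUC described in the abstract can be checked on a toy sample. The code below assumes the commonly stated minimization form of bPOE at threshold z, namely the minimum over a >= 0 of E[max(a(X - z) + 1, 0)], and assumes bAUC is one minus the bPOE, at threshold zero, of the pairwise ranking error. The function names `empirical_bpoe` and `auc_and_bauc`, the convention that ties count as errors, and the sample scores are all hypothetical choices made for illustration.

```python
import numpy as np


def empirical_bpoe(x, z):
    """Empirical bPOE of sample x at threshold z.

    Assumed form: min over a >= 0 of mean(max(a * (x - z) + 1, 0)).
    The objective is convex and piecewise linear in a, so it is enough
    to evaluate it at a = 0 and at the breakpoints a = 1 / (z - x_i)
    for the sample points x_i lying strictly below z.
    """
    x = np.asarray(x, dtype=float)
    below = x[x < z]
    candidates = np.concatenate(([0.0], 1.0 / (z - below)))
    # One row per candidate a, one column per sample point.
    values = np.maximum(candidates[:, None] * (x[None, :] - z) + 1.0, 0.0)
    return float(values.mean(axis=1).min())


def auc_and_bauc(pos_scores, neg_scores):
    """Return (AUC, bAUC) for the given classifier scores.

    The ranking error of a (positive, negative) pair is
    -(positive score - negative score); a pair is misranked when the
    error is >= 0 (ties counted as errors, an assumption of this sketch).
    """
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    errors = -(pos[:, None] - neg[None, :]).ravel()
    auc = 1.0 - float(np.mean(errors >= 0.0))
    bauc = 1.0 - empirical_bpoe(errors, 0.0)
    return auc, bauc


# Hypothetical scores: two positive and two negative examples.
auc, bauc = auc_and_bauc([3.0, 1.0], [0.0, 2.0])
print(auc, bauc)
```

On this sample one of the four pairs is misranked, so AUC is 0.75, while bAUC is 0.5 because the single misranked pair is averaged with the closest correctly ranked pair. This is consistent with bAUC being a lower bound on AUC that also reflects the size of the ranking errors.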