ROC Analysis for Evaluation of Machine Learning Algorithms

Larry Holder
School of Electrical Engineering and Computer Science
Washington State University
References

• Provost et al., "The Case Against Accuracy Estimation for Comparing Induction Algorithms," International Conference on Machine Learning, 1998.
• Rob Holte's talk on ROC analysis at www.cs.ualberta.ca/~holte/Learning/ROCtalk/
Motivation

• Most comparisons of machine learning algorithms use classification accuracy
• Problems with this approach:
  • The costs of false positive and false negative errors may differ
  • Training data may not reflect the true class distribution
Motivation

• Perhaps maximizing accuracy is still okay
  • Alter the class distribution to reduce FP/FN costs
• Problems:
  • Only works in the 2-class case
  • Assigning true costs is difficult
  • The true class distribution is uncertain
• So we must show that classifier L1 is better than L2 under more general conditions
ROC Analysis

• Receiver Operating Characteristic (ROC)
• Originated in signal detection theory
• Common in medical diagnosis
• Becoming common in ML evaluations
• ROC curves assess predictive behavior independent of error costs or class distributions
Confusion Matrix

                    Classified As
  True Class    Positive    Negative
  Positive        #TP         #FN
  Negative        #FP         #TN

• True Positive rate: TP = #TP / #P
• False Positive rate: FP = #FP / #N
• Rates are independent of the class distribution
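The two rates above can be sketched as a small helper; `roc_point` is a hypothetical function name and the counts in the example are made up, not taken from the slides:

```python
# Sketch: computing the (FP rate, TP rate) pair for one classifier
# from its confusion-matrix counts, as defined on the slide.
def roc_point(tp, fn, fp, tn):
    """Return (FP rate, TP rate) from confusion-matrix counts."""
    tp_rate = tp / (tp + fn)   # TP = #TP / #P
    fp_rate = fp / (fp + tn)   # FP = #FP / #N
    return fp_rate, tp_rate

# Hypothetical counts for illustration only.
print(roc_point(tp=80, fn=20, fp=10, tn=90))  # → (0.1, 0.8)
```

Because each rate is normalized by its own true-class total (#P or #N), the pair does not change if the proportion of positives to negatives in the test set changes.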
ROC Curves

• ROC space:
  • False positive (FP) rate on the X axis
  • True positive (TP) rate on the Y axis
• Each classifier is represented by a point in ROC space corresponding to its (FP, TP) pair
• For continuous-output models, classifiers are defined by varying a threshold on the output
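The threshold-sweeping idea can be sketched as follows; `roc_curve` is a hypothetical helper and the scores and labels are invented for illustration:

```python
# Sketch: turning a continuous-output model into a set of ROC points
# by treating each distinct score as a candidate decision threshold.
def roc_curve(scores, labels):
    """Return (FP rate, TP rate) points, one per threshold."""
    pos = sum(labels)              # number of true positives in the data
    neg = len(labels) - pos        # number of true negatives
    points = []
    for thresh in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thresh and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thresh and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# Made-up model scores and true labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
labels = [1,   1,   0,   1,   0]
print(roc_curve(scores, labels))
```

Lowering the threshold classifies more examples as positive, so both rates rise monotonically from (0, 0) toward (1, 1), tracing out the curve.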
Example ROC Curve

[Figure: ROC curves for learners L1, L2, and L3, plus the random-guessing diagonal; false positive rate on the X axis and true positive rate on the Y axis, both ranging from 0 to 1.]
Domination in ROC Space

• Learner L1 dominates L2 if L2's ROC curve lies entirely beneath L1's curve
• If L1 dominates L2, then L1 is better than L2 for all possible costs and class distributions
• If neither dominates (as with L2 and L3), then there are conditions under which L2 maximizes accuracy but does not minimize cost
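When two curves have been sampled at the same FP rates, the domination test reduces to a pointwise comparison. A minimal sketch, assuming hypothetical sampled TP values (not the actual L1/L2 curves from the figure):

```python
# Sketch: checking domination between two ROC curves sampled at the
# same evenly spaced FP rates (here FP = 0.0, 0.25, 0.5, 0.75, 1.0).
def dominates(tp_rates_a, tp_rates_b):
    """True if curve A is at or above curve B at every sampled FP rate."""
    return all(a >= b for a, b in zip(tp_rates_a, tp_rates_b))

# Hypothetical sampled TP rates for two learners.
l1 = [0.0, 0.6, 0.8, 0.9, 1.0]
l2 = [0.0, 0.5, 0.7, 0.85, 1.0]
print(dominates(l1, l2))  # → True: l1 is never below l2
print(dominates(l2, l1))  # → False
```

If both calls return False, the curves cross, and which learner is preferable depends on the costs and class distribution in effect.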
Expected ROC Curve

• Perform k-fold cross-validation on each learner
• Treat the ROC curve from fold i as a function R_i such that TP = R_i(FP)
• R̂(FP) = mean_i(R_i(FP))
• Generate the ROC curve by evenly sampling R̂ along the FP axis
• Compute confidence intervals over the resulting TP values according to the binomial distribution
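The averaging step above (vertical averaging of the per-fold curves R_i) can be sketched as follows; the helper names and the two piecewise-linear fold curves are invented for illustration:

```python
# Sketch: vertical averaging of per-fold ROC curves R_i into an
# expected curve R-hat, sampled evenly along the FP axis.
def interp(points, fp):
    """Linearly interpolate TP = R_i(FP) from a fold's sorted ROC points."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fp <= x1:
            if x1 == x0:
                return max(y0, y1)
            return y0 + (y1 - y0) * (fp - x0) / (x1 - x0)
    return points[-1][1]

def expected_roc(fold_curves, n_samples=5):
    """Average TP across folds at n_samples evenly spaced FP values."""
    fps = [i / (n_samples - 1) for i in range(n_samples)]
    return [(fp, sum(interp(c, fp) for c in fold_curves) / len(fold_curves))
            for fp in fps]

# Two hypothetical fold curves as (FP, TP) point lists.
folds = [
    [(0.0, 0.0), (0.5, 0.8), (1.0, 1.0)],
    [(0.0, 0.0), (0.5, 0.6), (1.0, 1.0)],
]
print(expected_roc(folds))
```

A confidence interval at each sampled FP value would then be computed from the spread of the k per-fold TP values, per the binomial treatment described above.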
Accuracy vs. ROC Curves

• Hypothesis: standard learning algorithms produce dominating ROC models
• Answer: no
  • Results on 10 datasets from the UCI repository show only one instance of a dominating model
• Thus, learners that maximize accuracy typically do not dominate in ROC space, and so are worse than other learners for some costs and class distributions
• Non-dominating ROC curves can still reveal regions of superiority for different learners
Summary

• Results comparing the accuracy of learning algorithms are questionable
  • Especially in scenarios with non-uniform costs and class distributions
• ROC curves give a better view of where different learners minimize cost
• Recommendation: use proper ROC analysis when comparing learning algorithms