
Introduction to Machine Learning Evaluation: Measures for Binary Classification - PowerPoint PPT Presentation



  1. Introduction to Machine Learning Evaluation: Measures for Binary Classification: ROC visualization compstat-lmu.github.io/lecture_i2ml

  2. LABELS: ROC SPACE
     Plot the True Positive Rate against the False Positive Rate.
     Confusion matrix (true class y in columns, predicted class ŷ in rows):

                        True +    True -
        Pred. +           TP        FP
        Pred. -           FN        TN

     TPR = TP / (TP + FN)
     FPR = FP / (FP + TN)

     [Figure: ROC space (FPR on the x-axis, TPR on the y-axis) with three classifiers C1, C2, C3; one classifier dominates another, while a further pair has no clear winner.]
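
     As a minimal illustration of these two definitions (a sketch added here, not part of the original slides; Python is used, with the example counts from slide 5), the rates follow directly from the confusion-matrix counts:

        def tpr_fpr(tp, fp, fn, tn):
            """True positive rate and false positive rate from confusion-matrix counts."""
            tpr = tp / (tp + fn)   # share of actual positives predicted positive
            fpr = fp / (fp + tn)   # share of actual negatives predicted positive
            return tpr, fpr

        # Counts taken from the example on slide 5: 40 TP, 25 FP, 10 FN, 25 TN
        print(tpr_fpr(40, 25, 10, 25))   # (0.8, 0.5)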

  3. LABELS: ROC SPACE
     The best classifier lies in the top-left corner.
     The diagonal corresponds to random labeling (with different proportions):
     Assigning a positive x the label "pos" with 25% probability gives TPR = 0.25.
     Assigning a negative x the label "pos" with 25% probability gives FPR = 0.25.
     [Figure: random classifiers Pos-0%, Pos-25%, Pos-75%, Pos-100% lying on the diagonal, and the point "Best" in the top-left corner.]
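
     A quick simulation (an added sketch, not from the slides) shows why random labeling lands on the diagonal: predicting "pos" with probability p, independently of x, yields TPR ≈ FPR ≈ p.

        import numpy as np

        rng = np.random.default_rng(0)
        y = rng.integers(0, 2, size=100_000)      # arbitrary true labels
        p = 0.25
        y_hat = rng.random(size=y.size) < p       # predict "pos" with probability p, ignoring x

        tpr = np.mean(y_hat[y == 1])              # ≈ 0.25
        fpr = np.mean(y_hat[y == 0])              # ≈ 0.25
        print(round(tpr, 3), round(fpr, 3))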

  4. LABELS: ROC SPACE
     In practice, we should never obtain a classifier below the diagonal.
     Inverting the predicted labels (0 → 1 and 1 → 0) maps the point (FPR, TPR) to (1 - FPR, 1 - TPR), so a classifier below the diagonal becomes one above it.
     [Figure: classifier C1 below the diagonal and its label-inverted counterpart C2 above the diagonal.]
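
     A small sketch (with made-up counts, added here for illustration) of how inverting the predictions moves a below-diagonal classifier above the diagonal:

        def tpr_fpr(tp, fp, fn, tn):
            return tp / (tp + fn), fp / (fp + tn)

        # Hypothetical classifier below the diagonal (TPR < FPR)
        tp, fp, fn, tn = 20, 60, 80, 40
        print(tpr_fpr(tp, fp, fn, tn))            # (0.2, 0.6): below the diagonal

        # Inverting the predicted labels swaps TP with FN and FP with TN
        print(tpr_fpr(fn, tn, tp, fp))            # (0.8, 0.4) = (1 - TPR, 1 - FPR): above the diagonal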

  5. LABEL DISTRIBUTION IN TPR AND FPR
     TPR and FPR are insensitive to the class distribution: they are not affected by changes in the ratio n+/n- (at prediction time).

     Example 1 (proportion n+/n- = 1):
                          Actual Positive   Actual Negative
        Pred. Positive          40                25
        Pred. Negative          10                25
        MCE = 35/100, TPR = 0.8, FPR = 0.5

     Example 2 (proportion n+/n- = 2):
                          Actual Positive   Actual Negative
        Pred. Positive          80                25
        Pred. Negative          20                25
        MCE = 45/150 = 30/100, TPR = 0.8, FPR = 0.5

     Note: if the class proportions differ during training, the above does not hold; the estimated posterior probabilities can change!
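
     The two example matrices can be checked directly; the following sketch simply re-evaluates the counts given on the slide:

        def rates(tp, fp, fn, tn):
            n = tp + fp + fn + tn
            mce = (fp + fn) / n                   # misclassification error
            return mce, tp / (tp + fn), fp / (fp + tn)

        print(rates(40, 25, 10, 25))              # Example 1: (0.35, 0.8, 0.5)
        print(rates(80, 25, 20, 25))              # Example 2: (0.3, 0.8, 0.5)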

  6. FROM PROBABILITIES TO LABELS: ROC CURVE
     Remember: both probabilistic and scoring classifiers can output class labels by thresholding:
        h(x) = [π(x) ≥ c]   or   h(x) = [f(x) ≥ c]
     To draw a ROC curve, iterate through all possible thresholds c. This gives a visual inspection of all possible thresholds / results.
     [Figure: ROC curve in the (False Positive Rate, True Positive Rate) plane.]
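
     A minimal sketch of turning scores into one ROC point for a given threshold c (the function and variable names here are just illustrative):

        import numpy as np

        def roc_point(y_true, scores, c):
            """(FPR, TPR) obtained by predicting 'pos' whenever the score is >= c."""
            y_true = np.asarray(y_true)
            y_hat = np.asarray(scores) >= c
            tpr = np.mean(y_hat[y_true == 1])     # TP / (TP + FN)
            fpr = np.mean(y_hat[y_true == 0])     # FP / (FP + TN)
            return fpr, tpr

        # Iterating c over all observed score values traces out the ROC curve.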

  7. ROC CURVE
     Example data (12 observations, sorted by score):
        #   Truth   Score
        1   Pos     0.95
        2   Pos     0.86
        3   Pos     0.69
        4   Neg     0.65
        5   Pos     0.59
        6   Neg     0.52
        7   Pos     0.51
        8   Neg     0.39
        9   Neg     0.28
       10   Neg     0.18
       11   Pos     0.15
       12   Neg     0.06
     c = 0.9 → TPR = 0.167, FPR = 0
     [Figure: the corresponding point in ROC space.]
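
     Evaluating the threshold c = 0.9 on these 12 observations reproduces the stated point (an added sketch in Python, with Pos/Neg coded as 1/0):

        import numpy as np

        y_true = np.array([1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0])
        scores = np.array([0.95, 0.86, 0.69, 0.65, 0.59, 0.52, 0.51,
                           0.39, 0.28, 0.18, 0.15, 0.06])

        y_hat = scores >= 0.9
        tpr = np.mean(y_hat[y_true == 1])         # 1 of 6 positives -> 0.167
        fpr = np.mean(y_hat[y_true == 0])         # 0 of 6 negatives -> 0.0
        print(round(tpr, 3), round(fpr, 3))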

  8. ROC CURVE (same example data): c = 0.85 → TPR = 0.333, FPR = 0

  9. ROC CURVE (same example data): c = 0.66 → TPR = 0.5, FPR = 0

  10. ROC CURVE (same example data): c = 0.6 → TPR = 0.5, FPR = 0.167

  11. ROC CURVE (same example data): c = 0.55 → TPR = 0.667, FPR = 0.167

  12. ROC CURVE (same example data): c = 0.3 → TPR = 0.833, FPR = 0.5

  13. ROC CURVE (same example data): sweeping the threshold over all scores traces out the complete curve. [Figure: the full ROC curve for the example data.]
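
     To obtain all of the points above at once, one can sweep the threshold over every observed score, or use scikit-learn's roc_curve (an added sketch assuming scikit-learn is available, not code from the lecture):

        import numpy as np
        from sklearn.metrics import roc_curve

        y_true = np.array([1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0])
        scores = np.array([0.95, 0.86, 0.69, 0.65, 0.59, 0.52, 0.51,
                           0.39, 0.28, 0.18, 0.15, 0.06])

        # Manual sweep: one (FPR, TPR) point per candidate threshold
        for c in sorted(set(scores), reverse=True):
            y_hat = scores >= c
            print(c, np.mean(y_hat[y_true == 0]), np.mean(y_hat[y_true == 1]))

        # The same points via scikit-learn
        fpr, tpr, thresholds = roc_curve(y_true, scores)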

  14. ROC CURVE
     The closer the curve is to the top-left corner, the better.
     If ROC curves cross, a different model can be better in different parts of the ROC space.
     [Figure: ROC curves for models labeled "very good", "ok1", "ok2", and "bad".]

  15. AUC: AREA UNDER ROC CURVE
     The AUC (in [0, 1]) is a single metric to evaluate scoring classifiers.
     AUC = 1: perfect classifier.
     AUC = 0.5: observations are randomly ordered.
     [Figure: shaded area under an example ROC curve.]
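
     A sketch of computing the AUC as the trapezoidal area under the ROC points (scikit-learn's roc_auc_score returns the same value; both are tooling assumptions added here, not part of the slides):

        import numpy as np
        from sklearn.metrics import roc_curve, roc_auc_score

        y_true = np.array([1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0])
        scores = np.array([0.95, 0.86, 0.69, 0.65, 0.59, 0.52, 0.51,
                           0.39, 0.28, 0.18, 0.15, 0.06])

        fpr, tpr, _ = roc_curve(y_true, scores)
        auc_trapezoid = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
        print(auc_trapezoid, roc_auc_score(y_true, scores))   # identical values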

  16. AUC: AREA UNDER ROC CURVE
     Interpretation: the AUC is the probability that the classifier ranks a randomly chosen positive observation higher than a randomly chosen negative one.
     Pick a positive and a negative observation at random:
        Truth   Score
        1       0.90
        1       0.76
        1       0.76
        1       0.70
        0       0.50
        1       0.45
        0       0.30
        0       0.30
        0       0.10
     AUC = 0.9167: the classifier ranks the positive higher than the negative with probability 0.9167.
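
     The ranking interpretation can also be written directly as the share of positive-negative pairs in which the positive observation receives the higher score (an added sketch; ties, if any, count as one half):

        import numpy as np

        def pairwise_auc(y_true, scores):
            """AUC as the probability that a random positive outranks a random negative."""
            y_true, scores = np.asarray(y_true), np.asarray(scores)
            pos, neg = scores[y_true == 1], scores[y_true == 0]
            greater = (pos[:, None] > neg[None, :]).sum()
            ties = (pos[:, None] == neg[None, :]).sum()
            return (greater + 0.5 * ties) / (len(pos) * len(neg))

        # On the 12-observation example above, this agrees with the trapezoidal AUC.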

  17. PARTIAL AUC
     Sometimes it can be useful to look only at a specific region under the ROC curve ⇒ partial AUC (pAUC).
     Examples: focus on a region with low FPR, or on a region with high TPR.
     [Figure: two panels with shaded regions under the ROC curve, with partial AUC values 0.086 and 0.128.]
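
     A sketch of a low-FPR partial AUC: integrate the ROC curve only up to a chosen FPR cutoff (the cutoff 0.2 and the helper name are illustrative assumptions; scikit-learn's roc_auc_score(..., max_fpr=...) offers a related, standardized variant):

        import numpy as np
        from sklearn.metrics import roc_curve

        def partial_auc_low_fpr(y_true, scores, max_fpr=0.2):
            """Raw area under the ROC curve restricted to FPR <= max_fpr."""
            fpr, tpr, _ = roc_curve(y_true, scores)
            keep = fpr <= max_fpr
            # Add the cutoff itself, with the TPR value interpolated at that point
            grid = np.append(fpr[keep], max_fpr)
            vals = np.append(tpr[keep], np.interp(max_fpr, fpr, tpr))
            return float(np.sum(np.diff(grid) * (vals[1:] + vals[:-1]) / 2))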
