
Optimizing Abstaining Classifiers using ROC Analysis - PowerPoint PPT Presentation



  1. IBM Zurich Research Laboratory, GSAL. Optimizing Abstaining Classifiers using ROC Analysis. /'tʌ·dek pɪe·'trʌ·ʃek/ Tadek Pietraszek, pie@zurich.ibm.com. ICML 2005, August 9, 2005.

  2. “To classify, or not to classify: that is the question.”

  3. Motivation
  • Abstaining classifiers are classifiers that can, in certain cases, refrain from classification; they are similar to human experts who can say “I don’t know”.
  • In many domains such experts are preferred to ones that always make a decision and are sometimes wrong (think “doctor”).
  • Machine learning has frequently used abstaining classifiers ([FH04], [GL00], [PMAS94], [Tort00]), also implicitly (e.g., active learning, delegating classifiers, triskels (ICML05)).
  • Q1: How do we optimally select abstaining classifiers?
  • Q2: How do we compare normal and abstaining classifiers?

  4. Outline
  1. ROC Background
  2. Tri-State Classifier
     1. Cost-Based Model
     2. Bounded-Abstention Model
     3. Bounded-Improvement Model
  3. Experiments, Results
  4. Summary

  5. Notation
  • A binary classifier C is a function C: I → {+, −}, where i ∈ I is an instance.
  • A ranker R (a.k.a. scoring classifier) is a function R: I → ℝ attaching a rank to an instance; it can be converted to a binary classifier C_τ using ∀i: C_τ(i) = + ⇔ R(i) ≥ τ.
  • An abstaining binary classifier A is a classifier that in certain cases can refrain from classification. We denote this as attaching a third class “?”.
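
A minimal sketch of this notation in Python (not part of the slides): the function names and the two-threshold construction of the abstaining classifier are illustrative assumptions, chosen only to make the definitions concrete.

```python
from typing import Callable, Literal

Label = Literal["+", "-"]
AbstainingLabel = Literal["+", "-", "?"]

def make_thresholded_classifier(ranker: Callable[[object], float],
                                tau: float) -> Callable[[object], Label]:
    """Convert a ranker R into a binary classifier C_tau: C_tau(i) = '+' iff R(i) >= tau."""
    def classify(instance) -> Label:
        return "+" if ranker(instance) >= tau else "-"
    return classify

def make_abstaining_classifier(ranker: Callable[[object], float],
                               tau_low: float,
                               tau_high: float) -> Callable[[object], AbstainingLabel]:
    """One way to obtain a tri-state classifier: abstain ('?') between two thresholds."""
    def classify(instance) -> AbstainingLabel:
        score = ranker(instance)
        if score >= tau_high:
            return "+"
        if score < tau_low:
            return "-"
        return "?"
    return classify

# Usage with a toy ranker that scores an instance by a single numeric feature.
toy_ranker = lambda x: float(x)
c_tau = make_thresholded_classifier(toy_ranker, tau=0.5)
a = make_abstaining_classifier(toy_ranker, tau_low=0.4, tau_high=0.6)
print(c_tau(0.7), a(0.5), a(0.2))   # -> + ? -
```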

  6. ROC Background
  • Evaluate model performance under all class and cost distributions:
    – a 2D plot (X: false positive rate, Y: true positive rate);
    – a classifier C corresponds to a single point (fp, tp) on the ROC curve.
  • A classifier C_τ (or a machine-learning method L_τ) has a parameter τ; varying it produces multiple points.
  • We therefore consider a ROC curve a function f: τ → (fp_τ, tp_τ).
  • We can find an inverse function f⁻¹: (fp_τ, tp_τ) → τ.
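
The sketch below (an assumption of mine, not code from the talk) makes the mapping f: τ → (fp_τ, tp_τ) concrete by sweeping the threshold over a ranker's scores; it is written for clarity rather than efficiency.

```python
import numpy as np

def roc_points(scores: np.ndarray, labels: np.ndarray):
    """Sweep the threshold tau over all observed scores and return (tau, fp_tau, tp_tau) triples.

    `labels` holds 1 for the positive class and 0 for the negative class; this is an
    O(n^2) illustration of f: tau -> (fp_tau, tp_tau), not an optimized implementation.
    """
    P = labels.sum()
    N = len(labels) - P
    points = []
    for tau in np.unique(scores)[::-1]:          # from the strictest threshold to the loosest
        predicted_pos = scores >= tau
        tp = (predicted_pos & (labels == 1)).sum() / P
        fp = (predicted_pos & (labels == 0)).sum() / N
        points.append((tau, fp, tp))
    return points

# Toy scores from a ranker and the corresponding true labels.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2])
labels = np.array([1,   1,   0,   1,   0,    1,   0,   0])
for tau, fp, tp in roc_points(scores, labels):
    print(f"tau={tau:.2f}  fp={fp:.2f}  tp={tp:.2f}")
```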

  7. ROC Convex Hull
  • The ROC convex hull (ROCCH) [PF98] is a piecewise-linear, convex-down curve f_R with the following properties:
    – f_R(0) = 0 and f_R(1) = 1;
    – the slope of f_R is monotonically non-increasing;
    – we assume that for any slope value m there exists a point x with f'_R(x) = m:
      • vertices have “slopes” assuming values between the slopes of the adjacent edges;
      • we assume sentinel edges: a 0th edge with slope ∞ and an (n+1)th edge with slope 0.
  • We will use the ROCCH instead of the ROC curve.
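
As an illustration (mine, not the authors'), the ROCCH can be computed with a small monotone-chain pass that keeps exactly the vertices whose connecting edges have non-increasing slope; the helper name roc_convex_hull is an assumption.

```python
def roc_convex_hull(points):
    """Upper convex hull of ROC points (fp, tp), returned left to right.

    Keeps only vertices whose connecting edges have monotonically non-increasing slope,
    which is the ROCCH property described above; (0, 0) and (1, 1) are always included.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            x3, y3 = p
            # Drop hull[-1] if it lies on or below the segment hull[-2] -> p.
            if (y2 - y1) * (x3 - x1) <= (y3 - y1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

hull = roc_convex_hull([(0.1, 0.3), (0.2, 0.7), (0.3, 0.6), (0.5, 0.9), (0.8, 0.95)])
print(hull)   # [(0.0, 0.0), (0.2, 0.7), (0.5, 0.9), (1.0, 1.0)] -- slopes 3.5, 0.67, 0.2
```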

  8. Some Definitions
  • Confusion matrix (A = actual, C = classified as):
            C = +   C = −
    A = +   TP      FN        P = TP + FN
    A = −   FP      TN        N = FP + TN
    with rates tp = TP / (TP + FN), fp = FP / (FP + TN), fn = FN / (TP + FN).
  • Cost matrix:
            C = +   C = −
    A = +   0       c12
    A = −   c21     0
    with cost ratio CR = c21 / c12.
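
A small helper (my own sketch, with illustrative names) that computes these counts, rates, and the expected cost per instance under the 2x2 cost matrix.

```python
import numpy as np

def confusion_and_rates(y_true, y_pred):
    """Confusion-matrix counts and the rates defined on the slide (labels '+' / '-')."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    TP = np.sum((y_true == "+") & (y_pred == "+"))
    FN = np.sum((y_true == "+") & (y_pred == "-"))
    FP = np.sum((y_true == "-") & (y_pred == "+"))
    TN = np.sum((y_true == "-") & (y_pred == "-"))
    P, N = TP + FN, FP + TN
    return {"TP": TP, "FN": FN, "FP": FP, "TN": TN,
            "tp": TP / P, "fp": FP / N, "fn": FN / P}

def expected_cost(counts, c12, c21):
    """Misclassification cost per instance under the 2x2 cost matrix (zero diagonal)."""
    total = counts["TP"] + counts["FN"] + counts["FP"] + counts["TN"]
    return (counts["FN"] * c12 + counts["FP"] * c21) / total

y_true = ["+", "+", "-", "-", "+", "-"]
y_pred = ["+", "-", "-", "+", "+", "-"]
counts = confusion_and_rates(y_true, y_pred)
print(counts["tp"], counts["fp"], expected_cost(counts, c12=1.0, c21=5.0))
```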

  9. Cost Minimizing Criteria for One Classifier
  • Known iso-performance lines [PF98]: the optimal operating point satisfies
    f'_ROC(fp) = CR · (N / P).
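
Assuming the ROCCH vertices from the sketch above, the iso-performance criterion can be applied by picking the vertex whose adjacent edge slopes bracket the target slope CR * N / P; this helper is my illustration, not the paper's code.

```python
def optimal_vertex(hull, CR, N, P):
    """Pick the ROCCH vertex whose adjacent edge slopes bracket the target slope CR * N / P.

    `hull` is a list of (fp, tp) vertices ordered left to right; sentinel slopes +inf and 0
    play the role of the 0th and (n+1)th edges mentioned on the ROCCH slide.
    """
    target = CR * N / P
    slopes = [float("inf")]
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        slopes.append((y2 - y1) / (x2 - x1))
    slopes.append(0.0)
    for i, vertex in enumerate(hull):
        # Vertex i sits between incoming edge i and outgoing edge i + 1.
        if slopes[i] >= target >= slopes[i + 1]:
            return vertex
    return hull[-1]

hull = [(0.0, 0.0), (0.2, 0.7), (0.5, 0.9), (1.0, 1.0)]
print(optimal_vertex(hull, CR=2.0, N=50, P=50))   # -> (0.2, 0.7)
```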

  10. Outline
  1. ROC Background
  2. Tri-State Classifier
     1. Cost-Based Model
     2. Bounded-Abstention Model
     3. Bounded-Improvement Model
  3. Experiments, Results
  4. Summary

  11. Metaclassifier A_{α,β}
  • IDEA: construct the abstaining classifier A_{α,β} from two binary classifiers C_α and C_β:
    – A_{α,β}(x) = +  if C_α(x) = + and C_β(x) = +
    – A_{α,β}(x) = ?  if C_α(x) = − and C_β(x) = +
    – A_{α,β}(x) = −  if C_α(x) = − and C_β(x) = −
    – the remaining combination (C_α(x) = +, C_β(x) = −) is impossible,
    where C_α, C_β are such that ∀x: (C_α(x) = + ⇒ C_β(x) = +) ∧ (C_β(x) = − ⇒ C_α(x) = −).
  • Can we optimally select C_α, C_β?
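
A direct sketch of this construction (names and the toy ranker are illustrative assumptions): the metaclassifier abstains exactly where the two underlying classifiers disagree.

```python
def metaclassifier(c_alpha, c_beta):
    """A_{alpha,beta}: classify '+' or '-' where C_alpha and C_beta agree, abstain otherwise.

    Assumes the requirement from the slide: C_alpha(x) = '+' implies C_beta(x) = '+',
    so the combination ('+', '-') cannot occur.
    """
    def classify(x):
        a, b = c_alpha(x), c_beta(x)
        if a == "+" and b == "+":
            return "+"
        if a == "-" and b == "-":
            return "-"
        if a == "-" and b == "+":
            return "?"
        raise ValueError("C_alpha(x) = '+' with C_beta(x) = '-' violates the requirement")
    return classify

# Two thresholded classifiers built from the same ranker (tau_alpha >= tau_beta).
ranker = lambda x: float(x)
c_alpha = lambda x: "+" if ranker(x) >= 0.7 else "-"
c_beta  = lambda x: "+" if ranker(x) >= 0.3 else "-"
A = metaclassifier(c_alpha, c_beta)
print(A(0.9), A(0.5), A(0.1))   # -> + ? -
```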

  12. Requirements on the ROC Curve
  • Requirement: for a ROC curve and any two classifiers C_α and C_β corresponding to points (fp_α, tp_α) and (fp_β, tp_β) with fp_α ≤ fp_β,
    ∀x: (C_α(x) = + ⇒ C_β(x) = +) ∧ (C_β(x) = − ⇒ C_α(x) = −).
  • These conditions are the same as those used by [FlachWu03] and are met in particular if the classifiers C_α and C_β are constructed from a single ranker R.
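
A quick empirical check of the requirement (my sketch, hypothetical helper name): thresholding a single ranker at τ_α ≥ τ_β satisfies it by construction.

```python
def satisfies_requirement(c_alpha, c_beta, instances):
    """Check on a sample of instances that every '+' of C_alpha is also a '+' of C_beta."""
    return all(c_beta(x) == "+" for x in instances if c_alpha(x) == "+")

# Thresholding one ranker at tau_alpha >= tau_beta guarantees the requirement.
ranker = lambda x: float(x)
c_alpha = lambda x: "+" if ranker(x) >= 0.7 else "-"   # stricter threshold, lower fp
c_beta  = lambda x: "+" if ranker(x) >= 0.3 else "-"
print(satisfies_requirement(c_alpha, c_beta, [i / 10 for i in range(11)]))   # True
```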

  13. “Optimal” Metaclassifier A_{α,β}
  • How do we compare binary classifiers and abstaining classifiers? How do we select an optimal classifier?
  • There is no single answer:
    – use a cost-based model (Cost-Based Model), or
    – use boundary conditions:
      • a maximum number of instances classified as “?” (Bounded-Abstention Model), or
      • a maximum misclassification cost (Bounded-Improvement Model).

  14. Cost-Based Model
  • Confusion matrices of the two binary classifiers (A = actual, C = classified as):
    C_α:        C = +   C = −
      A = +     TP_α    FN_α
      A = −     FP_α    TN_α
    C_β:        C = +   C = −
      A = +     TP_β    FN_β
      A = −     FP_β    TN_β
  • 2×3 cost matrix for the abstaining classifier:
            C = +   C = −   C = ?
    A = +   0       c12     c13
    A = −   c21     0       c23
  • Important properties: (fp_α ≤ fp_β) ⇒ (FP_β ≥ FP_α) and (fn_β ≤ fn_α) ⇒ (FN_α ≥ FN_β).

  15. Selecting the Optimal Classifier
  • Similar criteria: minimize the misclassification cost per instance,
    rc = (1 / (N + P)) · [ FN_β·c12 + FP_α·c21 + (FP_β − FP_α)·c23 + (FN_α − FN_β)·c13 ],
    where the first two terms are the misclassification costs and the last two the abstention costs on the instances where C_α and C_β disagree.
  • Setting ∂rc/∂FP_β = 0 and ∂rc/∂FP_α = 0 gives
    f'_ROC(fp_β) = (c23 / (c12 − c13)) · (N / P)
    f'_ROC(fp_α) = ((c21 − c23) / c13) · (N / P).
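
The sketch below (illustrative, not the paper's code) evaluates rc for pairs of operating points on a toy ROC curve and brute-forces the best pair; the analytic slope conditions above should give approximately the same answer.

```python
import numpy as np

def expected_cost_abstaining(fp_a, tp_a, fp_b, tp_b, P, N, c12, c21, c13, c23):
    """rc of the metaclassifier A_{alpha,beta} built from points (fp_a, tp_a) and (fp_b, tp_b)."""
    FP_a, FP_b = fp_a * N, fp_b * N
    FN_a, FN_b = (1 - tp_a) * P, (1 - tp_b) * P
    return (FN_b * c12 + FP_a * c21
            + (FP_b - FP_a) * c23 + (FN_a - FN_b) * c13) / (N + P)

def best_pair_on_curve(fprs, tprs, P, N, c12, c21, c13, c23):
    """Brute-force search over pairs of operating points with fp_a <= fp_b.

    The slide derives the optimum analytically from the slope conditions; this grid
    search is only a check that should land close to the same pair.
    """
    best = (np.inf, None)
    for i in range(len(fprs)):
        for j in range(i, len(fprs)):
            rc = expected_cost_abstaining(fprs[i], tprs[i], fprs[j], tprs[j],
                                          P, N, c12, c21, c13, c23)
            best = min(best, (rc, (fprs[i], fprs[j])))
    return best

# Operating points sampled along a toy concave ROC curve, equal class priors.
fprs = np.linspace(0, 1, 101)
tprs = np.sqrt(fprs)
# With these costs the slope conditions give roughly fp_alpha ~ 0.11 and fp_beta ~ 0.56.
print(best_pair_on_curve(fprs, tprs, P=500, N=500, c12=1.0, c21=1.0, c13=0.4, c23=0.4))
```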

  16. Cost-Based Model – a Simulated Example
  [Figure: a ROC curve with the two optimal classifiers A and B marked (axes: FP, TP), alongside two surface plots of the misclassification cost for different combinations of A and B (axes: FP(a), FP(b), Cost); the marked points satisfy the slope conditions from the previous slide.]

  17. Understanding Cost Matrices
  • The 2×2 cost matrix is well known. 2×3 cost matrices have some interesting properties, e.g., the conditions under which the optimal classifier is an abstaining classifier.
  • Our derivation is valid for
    (c21 ≥ c23) ∧ (c12 > c13) ∧ (c21·c12 ≥ c21·c13 + c23·c12);
    we can prove that if this condition is not met, the classifier is a trivial binary classifier.
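
A one-line check of this validity condition (my helper, hypothetical name), handy when experimenting with 2x3 cost matrices.

```python
def nontrivial_abstaining(c12, c21, c13, c23):
    """True iff the validity condition from the slide holds, i.e. the optimal
    metaclassifier can be a genuine (non-trivial) abstaining classifier."""
    return (c21 >= c23) and (c12 > c13) and (c21 * c12 >= c21 * c13 + c23 * c12)

print(nontrivial_abstaining(c12=1.0, c21=1.0, c13=0.4, c23=0.4))   # True
print(nontrivial_abstaining(c12=1.0, c21=1.0, c13=0.7, c23=0.7))   # False: abstaining costs too much
```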

  18. Cost Matrices – Interesting Cases
  • How do we set c13 and c23 so that the classifier is a non-trivial abstaining classifier?
  • Two interesting cases:
    – symmetric case (c13 = c23):
      c13 = c23 ≤ (c12 · c21) / (c21 + c12)
    – proportional case (c13 / c23 = c12 / c21):
      c13 ≤ c12 / 2 ⇔ c23 ≤ c21 / 2.
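
Two small helpers (illustrative names, my own sketch) that express these bounds directly.

```python
def max_symmetric_abstention_cost(c12, c21):
    """Symmetric case (c13 = c23): largest abstention cost that still permits a
    non-trivial abstaining classifier, i.e. c13 = c23 <= c12 * c21 / (c12 + c21)."""
    return c12 * c21 / (c12 + c21)

def proportional_case_ok(c12, c21, c13, c23):
    """Proportional case (c13 / c23 = c12 / c21): non-trivial iff c13 <= c12 / 2
    (equivalently c23 <= c21 / 2)."""
    assert abs(c13 * c21 - c23 * c12) < 1e-9, "costs are not proportional"
    return c13 <= c12 / 2

print(max_symmetric_abstention_cost(c12=1.0, c21=5.0))           # 0.833...
print(proportional_case_ok(c12=1.0, c21=5.0, c13=0.4, c23=2.0))  # True
```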

  19. Bounded Models
  • Problem: a 2×3 cost matrix is not always given and would have to be estimated; however, the resulting classifier is very sensitive to c13 and c23.
  • Idea: find other optimization criteria for an abstaining classifier using a standard 2×2 cost matrix:
    – calculate the misclassification cost per classified instance.
  • Follow the same reasoning to find the optimal classifier.
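
A rough sketch of the bounded-abstention idea under these criteria (mine, not the paper's analytic solution): minimize the misclassification cost per classified instance while abstaining on at most a fraction k_max of all instances, here by brute force over pairs of operating points.

```python
import numpy as np

def bounded_abstention_search(fprs, tprs, P, N, c12, c21, k_max):
    """Among pairs fp_a <= fp_b on the curve, minimize the misclassification cost per
    *classified* instance subject to abstaining on at most a fraction k_max of all
    instances. Brute force for illustration only."""
    best = (np.inf, None)
    for i in range(len(fprs)):
        for j in range(i, len(fprs)):
            FP_a, FP_b = fprs[i] * N, fprs[j] * N
            FN_a, FN_b = (1 - tprs[i]) * P, (1 - tprs[j]) * P
            abstained = (FP_b - FP_a) + (FN_a - FN_b)      # instances where C_a and C_b disagree
            classified = P + N - abstained
            if abstained > k_max * (P + N) or classified == 0:
                continue
            cost = (FN_b * c12 + FP_a * c21) / classified
            best = min(best, (cost, (fprs[i], fprs[j])))
    return best

fprs = np.linspace(0, 1, 101)
tprs = np.sqrt(fprs)                                       # same toy ROC curve as before
print(bounded_abstention_search(fprs, tprs, P=500, N=500, c12=1.0, c21=5.0, k_max=0.2))
```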
