Cost-Sensitive Measures of Instance Hardness



  1. Cost-Sensitive Measures of Instance Hardness. Carlos Melo and Ricardo Prudêncio, Centro de Informática, UFPE, Recife, Brazil

  2. Introduction
     - Instance hardness: which instances are more difficult in a dataset?
     - Motivation: data cleaning, ensemble methods, ...
     - Aspects considered in our work: misclassification costs and decision threshold choice methods

  3. Figure: two contexts, A and B, one with Cost of FN = Cost of FP and one with Cost of FN > Cost of FP. Instance hardness depends on the observed context (misclassification costs) and on how that context is handled (decision threshold choice method).

  4. Developed Work
     - Framework to define cost curves and hardness measures for instances
     - Questions:
       - Given a context and an algorithm, how hard is an instance?
       - How hard is an instance in general?
       - Which algorithm is the best for each instance?
     - Different curves for different decision threshold choice methods
     - Figure: loss curve for an instance x (vertical axis: Loss)

  5. Notation and Basic Definitions
     - Instances can be either positive ($y = 0$) or negative ($y = 1$)
     - Learned model $m$ is a scoring function; $s = m(x)$ is high for negative instances
     - Decision threshold ($t$): $\hat{y} = 1$ if $s > t$ (i.e., $x$ is negative); $\hat{y} = 0$ otherwise (i.e., $x$ is positive)
     - Example with $t = 0.5$:

       s      ŷ    y
       0.92   1    1
       0.71   1    0
       0.54   1    0
       0.36   0    0
       0.21   0    1
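
The decision rule on this slide can be sketched in a few lines of Python. The function name `predict` and the variable names are illustrative choices, not part of the slides; the scores, labels and threshold are the ones shown in the slide's table.

```python
def predict(score: float, t: float) -> int:
    """Return 1 (negative) if the score exceeds the threshold, else 0 (positive)."""
    return 1 if score > t else 0

# Scores and true labels from the slide's example (0 = positive, 1 = negative).
scores = [0.92, 0.71, 0.54, 0.36, 0.21]
labels = [1, 0, 0, 0, 1]

t = 0.5
preds = [predict(s, t) for s in scores]
errors = [int(p != y) for p, y in zip(preds, labels)]

print(preds)   # [1, 1, 1, 0, 0]
print(errors)  # [0, 1, 1, 0, 1]
```

The second and third instances are positives scored above the threshold (false negatives), and the last is a negative scored below it (a false positive).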

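  6. Instance Cost Curves
     - Cost model: $Q(t, c) = 2\{c\,\pi_0\,FN(t) + (1 - c)\,\pi_1\,FP(t)\}$
     - Positive instances: $QI(x, t, c) = 2c\,f_n(x, t)$, where $f_n(x, t) = 1$ if $x$ is a false negative at threshold $t$ and 0 otherwise
     - Negative instances: $QI(x, t, c) = 2(1 - c)\,f_p(x, t)$, where $f_p(x, t) = 1$ if $x$ is a false positive at threshold $t$ and 0 otherwise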
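
As a minimal sketch of the per-instance cost, the function below evaluates $QI(x, t, c)$ for one instance. It assumes the decision rule from the notation slide (predict negative, 1, when the score exceeds the threshold) and treats the error indicators as 0/1 values; the name `instance_cost` is hypothetical.

```python
def instance_cost(score: float, label: int, t: float, c: float) -> float:
    """Cost QI(x, t, c) of one instance (label 0 = positive, 1 = negative)."""
    predicted = 1 if score > t else 0
    if label == 0:
        # Positive instance: pays 2c only when misclassified as negative.
        f_n = 1 if predicted == 1 else 0
        return 2 * c * f_n
    # Negative instance: pays 2(1 - c) only when misclassified as positive.
    f_p = 1 if predicted == 0 else 0
    return 2 * (1 - c) * f_p

print(instance_cost(0.54, 0, t=0.5, c=0.5))  # false negative
print(instance_cost(0.36, 0, t=0.5, c=0.5))  # correctly classified positive
print(instance_cost(0.21, 1, t=0.5, c=0.5))  # false positive
```

A correctly classified instance always costs zero; the cost proportion `c` only scales the penalty of an error.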

  7. Instance Cost Curves - Score-Driven Threshold
     - Threshold is set equal to the cost proportion: $t = T(c) = c$
     - $QI(x, T(c), c) = 2c\,f_n(x, t)$
     - Example with $c = 0.4$ (higher cost for false positives), so $t = 0.4$:

       s      ŷ    y
       0.92   1    1
       0.71   1    0
       0.54   1    0    (instance x)
       0.36   0    0
       0.21   0    1

     - For the instance $x$ with $s = 0.54$: $f_n(x, t) = 1$ for $c$ below 0.54
     - Figure: cost curve $QI$ of $x$ against $c$ from 0 to 1, with a break at $c = 0.54$

  8. Instance Hardness - Score-Driven Threshold
     - Instance cost curves (positive instances): $QI(x, T(c), c) = 2c\,f_n(x, t)$
     - $IH(x) = \int_0^s 2c\,dc = s^2 = (s - 0)^2 = (s - y)^2$
     - IH is the squared error
     - Figure: $QI$ against $c$, rising linearly to $2s$ at $c = s$ and equal to zero from there to $c = 1$
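
The claim that score-driven hardness equals the squared error can be checked numerically. The sketch below integrates the cost curve of a positive instance over the cost proportion with a simple midpoint rule; the function names, the step count and the example score 0.54 (taken from the earlier score-driven example) are illustrative choices.

```python
def score_driven_cost(s: float, c: float) -> float:
    """Cost 2c while the positive instance is misclassified (s > t = c), else 0."""
    return 2.0 * c if s > c else 0.0

def score_driven_hardness(s: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the area under the cost curve for c in [0, 1]."""
    h = 1.0 / steps
    return h * sum(score_driven_cost(s, (i + 0.5) * h) for i in range(steps))

s = 0.54
numeric = score_driven_hardness(s)
closed_form = (s - 0) ** 2  # squared error for a positive instance (y = 0)

print(numeric, closed_form)
```

The two printed values should agree up to the discretisation error of the integration grid.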

  9. Instance Hardness - Rate-Driven Threshold
     - Threshold is set to achieve a desired rate of positive predictions $R(t)$: $t = T(c) = R^{-1}(c)$
     - Example with $R = 0.80$ (80% of positive predictions):

       s      ŷ    y
       0.92   1    1
       0.71   0    0
       0.54   0    0    R(0.54) = 0.60
       0.36   0    0    R(0.36) = 0.40
       0.21   0    1

     - For the instance $x$ with $s = 0.54$: $f_n(x, t) = 1$ for $c$ below 0.4 and $f_n(x, t) = 0$ for $c$ above 0.6
     - Figure: cost curve $QI$ of $x$ against $c$, with breaks at $c = 0.4$ and $c = 0.6$
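
A minimal sketch of the rate-driven choice, using the five scores from this slide. It assumes that an instance is predicted positive when its score is at most the threshold, and that the inverse $R^{-1}(c)$ is taken as the smallest observed score whose positive rate reaches $c$; that discrete inverse, and the function names, are assumptions made for illustration.

```python
def positive_rate(scores, t):
    """R(t): fraction of instances predicted positive (score <= t)."""
    return sum(1 for s in scores if s <= t) / len(scores)

def rate_driven_threshold(scores, c):
    """Smallest observed score whose positive rate reaches c (assumed discrete inverse of R)."""
    for t in sorted(scores):
        if positive_rate(scores, t) >= c - 1e-12:  # small tolerance for float rounding
            return t
    return max(scores)

scores = [0.92, 0.71, 0.54, 0.36, 0.21]

t = rate_driven_threshold(scores, 0.80)
preds = [1 if s > t else 0 for s in scores]

print(t, preds)  # 0.71 [1, 0, 0, 0, 0]
print(positive_rate(scores, 0.54), positive_rate(scores, 0.36))
```

With a desired positive rate of 0.80, four of the five instances are predicted positive, which reproduces the predictions in the slide's table.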

  10. Instance Hardness - Rate-Driven Threshold
      - Instance cost curves (positive instances): $QI(x, t, c) = 2c\,f_n(x, t)$, with $f_n(x, t) = 0$ for $c \geq R(s)$
      - $IH(x) = R(s)^2 - \frac{R(s)}{n} + \frac{1}{3n^2}$
      - $IH(x) \approx R(s)^2$
      - IH is the squared positive rate
      - Figure: $QI$ against $c$, with ticks at $R(s) - \frac{1}{n}$, $R(s)$ and 1
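
To see why the rate-driven hardness is close to the squared positive rate, the sketch below integrates a cost curve numerically. It assumes, as one reading of the slide's plot, that a positive instance is fully misclassified for $c$ up to $R(s) - 1/n$ and that the error indicator falls linearly to zero at $c = R(s)$; this interpolation, the function names and the example values are assumptions, not something the slide states explicitly.

```python
def rate_driven_cost(r: float, n: int, c: float) -> float:
    """Cost 2c * f_n for a positive instance with positive rate r among n instances."""
    lower = r - 1.0 / n
    if c <= lower:
        f_n = 1.0            # instance is still predicted negative
    elif c >= r:
        f_n = 0.0            # instance is now predicted positive
    else:
        f_n = (r - c) * n    # assumed linear interpolation between the two
    return 2.0 * c * f_n

def rate_driven_hardness(r: float, n: int, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the area under the cost curve for c in [0, 1]."""
    h = 1.0 / steps
    return h * sum(rate_driven_cost(r, n, (i + 0.5) * h) for i in range(steps))

r = 0.6
for n in (5, 50, 1000):
    print(n, rate_driven_hardness(r, n), r ** 2)
```

Under this assumption the gap between the integral and $R(s)^2$ shrinks as the number of instances $n$ grows, which is the sense in which IH is the squared positive rate.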

  11. Illustrative Example

       x      m_1(x)   y
       x1     0.92     1
       x2     0.71     1
       x3     0.34     0
       x4     0.31     1
       x5     0.23     1
       x6     0.20     1
       x7     0.15     0
       x8     0.13     0
       x9     0.11     1
       x10    0.05     0

      - For $x_3$: $IH_{SD} = (0 - 0.34)^2 = (0.34)^2$
      - For $x_3$: $IH_{RD} = (0.7)^2$
      - Well-calibrated score but poor rank

  12. Ensemble Instance Hardness
      - Average cost curves and instance hardness over a pool $L$ of learning models:
        $QI(x, t, c) = \frac{1}{|L|} \sum_{j=1}^{|L|} QI_j(x, t, c)$
        $IH(x) = \frac{1}{|L|} \sum_{j=1}^{|L|} IH_j(x)$
      - Strong assumption: all learning models are equally probable and reliable
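
The ensemble average can be sketched as follows. The per-model hardness used here is the score-driven one, $(s - y)^2$, and the three scores are made-up values for a hypothetical pool of models; neither the pool nor the function names come from the slides.

```python
def score_driven_hardness(score: float, label: int) -> float:
    """Per-model hardness under the score-driven method: squared error (s - y)^2."""
    return (score - label) ** 2

def ensemble_hardness(scores_per_model, label: int) -> float:
    """Unweighted mean of the per-model hardness values over the pool."""
    values = [score_driven_hardness(s, label) for s in scores_per_model]
    return sum(values) / len(values)

# Hypothetical scores given by three models to the same positive instance (y = 0).
pool_scores = [0.34, 0.10, 0.60]
print(ensemble_hardness(pool_scores, label=0))
```

The unweighted mean mirrors the slide's strong assumption that every model in the pool is equally probable and reliable; a weighted mean would be the natural relaxation.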

  13. Illustrative Example - Ensemble Hardness. Figure: ensemble instance cost curves for the positive instances x5, x6, x7, x8, x9 and x10 (panels: Score-Driven, Rate-Driven).

  14. Illustrative Example - Class Hardness. Figure: class hardness for the Positive Class and the Negative Class (panels: Score-Driven, Rate-Driven).

  15. Conclusion
      - Instance hardness measures and cost curves considering different scenarios
      - Other threshold choice methods: probabilistic methods (rate-uniform and score-uniform), rate-fixed and score-fixed
      - Future work:
        - Integrate instance hardness into classification methods (ensemble learning)
        - Empirical and meta-learning studies

  16. Cost-Sensitive Measures of Instance Hardness. Questions?
