Cost-Sensitive Measures of Instance Hardness
Carlos Melo, Ricardo Prudêncio
Centro de Informática, UFPE, Recife, Brazil
Introduction
Instance hardness: which instances in a dataset are more difficult to classify?
Motivation: data cleaning, ensemble methods, ...
Aspects considered in our work: misclassification costs and decision threshold choice methods.
Context A: Cost of FN = Cost of FP
Context B: Cost of FN > Cost of FP
Instance hardness depends on the observed context (the misclassification costs) and on how that context is handled (the decision threshold choice method).
Developed Work
A framework to define cost curves and hardness measures for individual instances.
[Figure: loss curve for an instance x]
Questions:
Given a context and an algorithm, how hard is an instance?
How hard is an instance in general?
Which algorithm is the best for each instance?
Different decision threshold choice methods produce different curves.
Notation and Basic Definitions
Instances can be either positive (y = 0) or negative (y = 1).
The learned model m is a scoring function: the score s = m(x) is high for negative instances.
Decision threshold (t):
$\hat{y} = 1$ if $s > t$ (i.e., x is predicted negative); $\hat{y} = 0$ otherwise (i.e., x is predicted positive).
Example with t = 0.5:
s      ŷ   y
0.92   1   1
0.71   1   0
0.54   1   0
0.36   0   0
0.21   0   1
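The decision rule and the five-instance example can be reproduced with a minimal sketch. The function name `predict` and the list names are illustrative and do not come from the slides.

```python
def predict(score, t):
    """Return 1 (negative) if the score exceeds the threshold, else 0 (positive)."""
    return 1 if score > t else 0

# Scores and true labels from the example table (0 = positive, 1 = negative).
scores = [0.92, 0.71, 0.54, 0.36, 0.21]
labels = [1, 0, 0, 0, 1]
t = 0.5

preds = [predict(s, t) for s in scores]

# A false negative is a positive instance (y = 0) predicted as negative (1);
# a false positive is a negative instance (y = 1) predicted as positive (0).
false_neg = [int(y == 0 and p == 1) for y, p in zip(labels, preds)]
false_pos = [int(y == 1 and p == 0) for y, p in zip(labels, preds)]

print(preds, false_neg, false_pos)
```

With t = 0.5 the instances scored 0.71 and 0.54 are false negatives and the instance scored 0.21 is a false positive.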
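Instance Cost Curves
Cost model: $Q(t, c) = 2\{c\,\pi_0\,FN(t) + (1 - c)\,\pi_1\,FP(t)\}$
Positive instances: $QI(x, t, c) = 2c\,f_n(x, t)$
Negative instances: $QI(x, t, c) = 2(1 - c)\,f_p(x, t)$
Here c is the cost proportion, $\pi_0$ and $\pi_1$ are the proportions of positive and negative instances, and $f_n(x, t)$ (respectively $f_p(x, t)$) is 1 when x is a false negative (false positive) at threshold t and 0 otherwise.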
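A short sketch can show how the instance-level costs relate to the dataset-level cost model. It assumes that FN(t) and FP(t) are per-class error rates weighted by the class proportions, in which case the dataset cost is the mean of the instance costs. The function names are illustrative, and the data are the five example instances from the notation slide.

```python
def instance_cost(s, y, t, c):
    """Cost of one instance at threshold t and cost proportion c."""
    pred = 1 if s > t else 0
    if y == 0:                       # positive instance: pays 2c if false negative
        return 2 * c if pred == 1 else 0.0
    return 2 * (1 - c) if pred == 0 else 0.0   # negative: pays 2(1-c) if false positive

def dataset_cost(scores, labels, t, c):
    """Q(t, c) under the assumption of class-proportion weights and per-class rates."""
    n = len(scores)
    n0 = labels.count(0)
    n1 = n - n0
    fn_rate = sum(1 for s, y in zip(scores, labels) if y == 0 and s > t) / n0
    fp_rate = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= t) / n1
    return 2 * (c * (n0 / n) * fn_rate + (1 - c) * (n1 / n) * fp_rate)

scores = [0.92, 0.71, 0.54, 0.36, 0.21]
labels = [1, 0, 0, 0, 1]
t, c = 0.5, 0.4

mean_instance_cost = sum(instance_cost(s, y, t, c) for s, y in zip(scores, labels)) / len(scores)
q = dataset_cost(scores, labels, t, c)
print(mean_instance_cost, q)
```

Under these assumptions both quantities agree, which is what makes the per-instance curves a decomposition of the usual cost curve.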
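Instance Cost Curves - Score-Driven Threshold
The threshold is set equal to the cost proportion: $t = T(c) = c$
$QI(x, T(c), c) = 2c\,f_n(x, t)$
Example with c = 0.4 (higher cost for false positives), so t = 0.4:
s      ŷ   y
0.92   1   1
0.71   1   0
0.54   1   0    (instance x)
0.36   0   0
0.21   0   1
For the positive instance x with s = 0.54: $f_n(x, t) = 1$ for $c < 0.54$.
[Figure: instance cost curve QI against c for x, with the curve dropping to zero at c = 0.54]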
Instance Hardness - Score-Driven Threshold
Instance cost curves (positive instances): $QI(x, T(c), c) = 2c\,f_n(x, t)$
$IH(x) = \int_0^s 2c\,dc = s^2 = (s - 0)^2 = (s - y)^2$
IH is the squared error.
[Figure: QI against c, rising to 2s at c = s and equal to zero for c > s]
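The squared-error result can be checked numerically by integrating the instance cost curve over c with a midpoint rule. This is a sketch: the function name and step count are illustrative, and the negative-instance branch is an assumption obtained by symmetry from the negative-instance cost 2(1 - c) f_p(x, t), since the slide only treats positive instances.

```python
def score_driven_hardness(s, y, steps=100_000):
    """Approximate the area under the score-driven instance cost curve."""
    total = 0.0
    for i in range(steps):
        c = (i + 0.5) / steps          # midpoint of the i-th sub-interval of [0, 1]
        pred = 1 if s > c else 0       # score-driven: threshold t equals c
        if y == 0:                     # positive instance: false negative costs 2c
            total += 2 * c if pred == 1 else 0.0
        else:                          # negative instance (assumed by symmetry)
            total += 2 * (1 - c) if pred == 0 else 0.0
    return total / steps

ih_positive = score_driven_hardness(0.54, 0)
ih_negative = score_driven_hardness(0.21, 1)
print(ih_positive, 0.54 ** 2)
print(ih_negative, (0.21 - 1) ** 2)
```

In both cases the numerical area matches (s - y)^2 up to the discretisation error.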
Instance Hardness - Rate-Driven Threshold
The threshold is chosen to achieve a desired rate of positive predictions R(t): $t = T(c) = R^{-1}(c)$
Example with R = 0.80 (80% of positive predictions):
s      ŷ   y
0.92   1   1
0.71   0   0
0.54   0   0    (instance x), R(0.54) = 0.60
0.36   0   0    R(0.36) = 0.40
0.21   0   1
For instance x: $f_n(x, t) = 1$ for $c \le 0.4$ and $f_n(x, t) = 0$ for $c \ge 0.6$.
[Figure: instance cost curve QI against c for x, with breakpoints at c = 0.4 and c = 0.6]
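The rate of positive predictions and the rate-driven threshold in this example can be sketched as follows. The sketch assumes R(t) is the fraction of instances with score at most t, and it only considers observed scores as candidate thresholds, which is a simplification. The function names are illustrative.

```python
def positive_rate(scores, t):
    """R(t): fraction of instances predicted positive, i.e. with score <= t."""
    return sum(1 for s in scores if s <= t) / len(scores)

def rate_driven_threshold(scores, c, eps=1e-12):
    """Smallest observed score whose positive rate reaches c (simplified inverse of R)."""
    candidates = [t for t in sorted(scores) if positive_rate(scores, t) >= c - eps]
    return candidates[0]

scores = [0.92, 0.71, 0.54, 0.36, 0.21]

t = rate_driven_threshold(scores, 0.80)
preds = [1 if s > t else 0 for s in scores]

print(positive_rate(scores, 0.54), positive_rate(scores, 0.36))
print(t, preds)
```

For a desired rate of 0.80 the threshold falls on the score 0.71, so only the instance scored 0.92 is predicted negative, as in the table.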
Instance Hardness - Rate-Driven Threshold
Instance cost curves (positive instances): $QI(x, t, c) = 2c\,f_n(x, t)$, with $f_n(x, t) = 0$ for $c \ge R(s)$
$IH(x) = R(s)^2 - \frac{R(s)}{n} + \frac{1}{3n^2}$
$IH(x) \approx R(s)^2$ for large n
IH is the squared positive rate.
[Figure: QI against c, equal to zero for c > R(s)]
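A finite-sample expression for the rate-driven hardness of a positive instance can be derived by integrating the cost curve. The derivation below is a sketch under one assumption not stated on the slide: between $c = R(s) - 1/n$ and $c = R(s)$ the false-negative indicator falls linearly from 1 to 0. That assumption is consistent with the breakpoints 0.4 and 0.6 in the five-instance example, where R(s) = 0.6 and n = 5.

```latex
\begin{align*}
IH(x) &= \int_0^{R(s) - 1/n} 2c \, dc
        + \int_{R(s) - 1/n}^{R(s)} 2c \, n \, \bigl(R(s) - c\bigr) \, dc \\
      &= \left(R(s) - \frac{1}{n}\right)^2
        + \left(\frac{R(s)}{n} - \frac{2}{3n^2}\right) \\
      &= R(s)^2 - \frac{R(s)}{n} + \frac{1}{3n^2}
\end{align*}
```

As n grows the last two terms vanish, leaving the squared positive rate $R(s)^2$.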
Illustrative Example
x      m1(x)   y
x1     0.92    1
x2     0.71    1
x3     0.34    0
x4     0.31    1
x5     0.23    1
x6     0.20    1
x7     0.15    0
x8     0.13    0
x9     0.11    1
x10    0.05    0
For x3 (well-calibrated score but poor rank):
$IH_{SD} = (0 - 0.34)^2 = (0.34)^2$
$IH_{RD} = (0.7)^2$
Ensemble Instance Hardness
Average the cost curves and the instance hardness over a pool L of learning models:
$\overline{QI}(x, t, c) = \frac{1}{|L|} \sum_{j=1}^{|L|} QI_j(x, t, c)$
$\overline{IH}(x) = \frac{1}{|L|} \sum_{j=1}^{|L|} IH_j(x)$
Strong assumption: all learning models are equally probable and reliable.
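The averaging step can be illustrated with a small sketch for the score-driven case, where each model's hardness is the squared error. The pool scores below are hypothetical values chosen for illustration, not results from the slides, and the function name is illustrative.

```python
def ensemble_hardness(scores_per_model, y):
    """Unweighted mean of the score-driven hardness (s_j - y)^2 over a pool of models."""
    per_model = [(s - y) ** 2 for s in scores_per_model]
    return sum(per_model) / len(per_model)

# Hypothetical scores assigned to one positive instance (y = 0) by three models.
pool_scores = [0.34, 0.10, 0.60]
ih = ensemble_hardness(pool_scores, 0)
print(ih)
```

The unweighted mean reflects the stated assumption that all models are equally probable and reliable; a weighted mean would be needed to relax it.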
Illustrative Example - Ensemble Hardness
Ensemble instance cost curves for the positive instances.
[Figure: ensemble instance cost curves, panels x5, x6, x7, x8, x9, x10, for the score-driven and rate-driven methods]
Illustrative Example - Class Hardness
[Figure: class hardness for the positive class and the negative class, for the score-driven and rate-driven methods]
Conclusion
Instance hardness measures and cost curves that consider different scenarios.
Other threshold choice methods: probabilistic methods (rate-uniform and score-uniform), rate-fixed and score-fixed.
Future work:
Integrate instance hardness into classification methods (ensemble learning).
Empirical and meta-learning studies.
Questions?