Cost-Sensitive Measures of Instance Hardness - Carlos Melo

Cost-Sensitive Measures of Instance Hardness
Carlos Melo, Ricardo Prudêncio
Centro de Informática – UFPE, Recife-Brazil

Introduction
- Instance hardness
  - Which instances are more difficult in a dataset?
- Motivation
  - Data cleaning, ensemble methods, ...
- Aspects considered in our work
  - Misclassification costs
  - Decision threshold choice methods

Two contexts are contrasted (Context A and Context B): one where Cost of FN = Cost of FP and one where Cost of FN > Cost of FP.
Instance hardness depends on the observed context (misclassification costs) and on how that context is handled (decision threshold choice method).

Developed Work
- Framework to define cost curves and hardness measures for instances
- Questions:
  - Given a context and an algorithm, how hard is an instance?
  - How hard is an instance in general?
  - Which algorithm is the best for each instance?
- Different curves for different decision threshold choice methods
[Figure: cost curve of an instance x; vertical axis: Loss.]

Notation and basic definitions
- Instances can be either positive (y = 0) or negative (y = 1)
- Learned model m is a scoring function
  - s = m(x) is high for negative instances
- Decision threshold (t):
  - ŷ = 1, if s > t (i.e., x is negative)
  - ŷ = 0, otherwise (i.e., x is positive)
- Example with t = 0.5:

  s     ŷ   y
  0.92  1   1
  0.71  1   0
  0.54  1   0
  0.36  0   0
  0.21  0   1
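The decision rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `predict` is hypothetical, and the scores are the five from the slide's example with t = 0.5.

```python
def predict(scores, t):
    """Return 1 (negative) when s > t, else 0 (positive), as on the slide."""
    return [1 if s > t else 0 for s in scores]


# Scores and true labels from the slide's example table.
scores = [0.92, 0.71, 0.54, 0.36, 0.21]
y_true = [1, 0, 0, 0, 1]

y_pred = predict(scores, t=0.5)
print(y_pred)
```

Comparing `y_pred` with `y_true` row by row reproduces the ŷ and y columns of the table.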

Instance cost curves
- Cost model: $Q(t, c) = 2\{c\,\pi_0\,FN(t) + (1 - c)\,\pi_1\,FP(t)\}$
- Positive instances: $QI(x, t, c) = 2c\,f_n(x, t)$
- Negative instances: $QI(x, t, c) = 2(1 - c)\,f_p(x, t)$
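A minimal sketch of the per-instance cost, assuming that f_n(x, t) and f_p(x, t) are 0/1 indicators of a false negative and a false positive at threshold t. The slide does not spell this out, so that reading and the function name `instance_cost` are assumptions.

```python
def instance_cost(s, y, t, c):
    """Cost QI(x, t, c) of one instance with score s and true label y.

    Convention from the slides: y = 0 is positive, y = 1 is negative,
    and the instance is predicted negative when s > t.
    """
    y_pred = 1 if s > t else 0
    if y == 0:
        # Assumed meaning of f_n: 1 when a positive is predicted negative.
        f_n = 1 if y_pred == 1 else 0
        return 2 * c * f_n
    # Assumed meaning of f_p: 1 when a negative is predicted positive.
    f_p = 1 if y_pred == 0 else 0
    return 2 * (1 - c) * f_p


print(instance_cost(0.54, 0, t=0.5, c=0.5))  # positive instance, predicted negative
print(instance_cost(0.92, 1, t=0.5, c=0.5))  # negative instance, predicted negative
```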

Instance cost curves - Score-driven threshold
- Threshold is set equal to the cost proportion: $t = T(c) = c$
- $QI(x, T(c), c) = 2c\,f_n(x, t)$
- Example: c = 0.4 (higher cost for false positives), so t = 0.4

  s     ŷ   y
  0.92  1   1
  0.71  1   0
  0.54  1   0   (instance x)
  0.36  0   0
  0.21  0   1

- For the instance x with s = 0.54: $f_n(x, t) = 1$ for $c < 0.54$
[Figure: cost curve QI of x as a function of c, with ticks at 0, 0.54 and 1 on the c axis.]
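The score-driven curve for the instance with s = 0.54 can be traced with a short sketch. The function name `score_driven_cost` is hypothetical, and f_n and f_p are again assumed to be 0/1 error indicators.

```python
def score_driven_cost(s, y, c):
    """QI(x, T(c), c) with the score-driven choice t = T(c) = c."""
    t = c
    if y == 0:
        # Positive instance: an error occurs when it is predicted negative (s > t).
        return 2 * c if s > t else 0.0
    # Negative instance: an error occurs when it is predicted positive (s <= t).
    return 2 * (1 - c) if s <= t else 0.0


# Cost of the positive instance x (s = 0.54) at a few cost proportions.
for c in (0.2, 0.4, 0.6):
    print(c, score_driven_cost(0.54, 0, c))
```

The cost grows as 2c while c stays below the score and drops to zero once the threshold passes it.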

Instance hardness - Score-driven threshold
- Instance cost curves (positive instances): $QI(x, T(c), c) = 2c\,f_n(x, t)$
- $IH(x) = \int_0^s 2c\,dc = s^2 = (0 - s)^2 = (y - s)^2$
- IH is the squared error
[Figure: QI versus c; the curve rises to 2s at c = s.]
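As a sanity check, the area under the score-driven cost curve can be approximated numerically and compared with the squared error. This is a sketch under two assumptions: c ranges uniformly over [0, 1], and negative instances use the cost 2(1 - c) f_p. The function names are hypothetical.

```python
def score_driven_cost(s, y, c):
    """QI(x, T(c), c) with t = c; y = 0 is positive, y = 1 is negative."""
    if y == 0:
        return 2 * c if s > c else 0.0
    return 2 * (1 - c) if s <= c else 0.0


def instance_hardness_sd(s, y, steps=100_000):
    """Midpoint-rule approximation of the area under the cost curve on [0, 1]."""
    h = 1.0 / steps
    return sum(score_driven_cost(s, y, (k + 0.5) * h) for k in range(steps)) * h


print(instance_hardness_sd(0.54, 0), (0 - 0.54) ** 2)
print(instance_hardness_sd(0.21, 1), (1 - 0.21) ** 2)
```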

Instance hardness - Rate-driven threshold
- Threshold is set to match a desired rate of positive predictions R(t): $t = T(c) = R^{-1}(c)$
- Example: R = 0.80 (80% of positive predictions)

  s     ŷ   y
  0.92  1   1
  0.71  0   0
  0.54  0   0   R(0.54) = 0.60
  0.36  0   0   R(0.36) = 0.40
  0.21  0   1

- For the instance x with s = 0.54: $f_n(x, t) = 1$ for $c \le 0.4$ and $f_n(x, t) = 0$ for $c \ge 0.6$
[Figure: cost curve QI of x as a function of c, with ticks at 0.4, 0.6 and 1 on the c axis.]
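The rate R(t) used here can be computed directly from the scores. The sketch below assumes R(t) is the fraction of instances with s <= t (those predicted positive), which matches the two values on the slide; the function name `positive_rate` is hypothetical.

```python
def positive_rate(scores, t):
    """R(t): fraction of instances predicted positive, i.e. with s <= t."""
    return sum(1 for s in scores if s <= t) / len(scores)


scores = [0.92, 0.71, 0.54, 0.36, 0.21]
print(positive_rate(scores, 0.54))
print(positive_rate(scores, 0.36))
```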

Instance hardness - Rate-driven threshold
- Instance cost curves (positive instances): $QI(x, t, c) = 2c\,f_n(x, t)$
- $f_n(x, t) = 0$ for $c \ge R(s)$
- $IH(x) = R(s)^2 - \frac{R(s)}{n} + \frac{1}{3n^2}$
- $IH(x) \approx R(s)^2$
- IH is the squared positive rate
[Figure: QI versus c, with R(s) marked on the c axis.]
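One way to see where a closed form of the shape R(s)^2 - R(s)/n + 1/(3n^2) can come from is to integrate the cost curve numerically. The sketch assumes that f_n equals 1 for c <= R(s) - 1/n, equals 0 for c >= R(s), and falls linearly in between (an interpolated threshold). That interpolation is an assumption made for illustration, not something the slide states, and the function names are hypothetical.

```python
def rate_driven_cost(r, n, c):
    """QI for a positive instance whose score has positive rate r = R(s).

    Assumption: f_n is 1 up to r - 1/n, 0 from r onwards, linear in between.
    """
    lo = r - 1.0 / n
    if c <= lo:
        f_n = 1.0
    elif c >= r:
        f_n = 0.0
    else:
        f_n = (r - c) * n
    return 2 * c * f_n


def instance_hardness_rd(r, n, steps=100_000):
    """Midpoint-rule approximation of the area under the cost curve on [0, 1]."""
    h = 1.0 / steps
    return sum(rate_driven_cost(r, n, (k + 0.5) * h) for k in range(steps)) * h


r, n = 0.6, 5  # R(0.54) = 0.60 in the five-instance example
closed_form = r ** 2 - r / n + 1 / (3 * n ** 2)
print(instance_hardness_rd(r, n), closed_form, r ** 2)
```

For larger n the two correction terms shrink and the value approaches R(s)^2.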

Illustrative Example

  x     m1(x)  y
  x1    0.92   1
  x2    0.71   1
  x3    0.34   0
  x4    0.31   1
  x5    0.23   1
  x6    0.20   1
  x7    0.15   0
  x8    0.13   0
  x9    0.11   1
  x10   0.05   0

- For x3: IH_SD = (0 - 0.34)^2 = (0.34)^2 and IH_RD = (0.7)^2
- Well calibrated score but poor rank
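The score-driven hardness of every instance in this table can be listed with a short sketch. It assumes the squared-error form (y - s)^2 also applies to negative instances; the slides derive it for positive ones. The rate-driven value is left as given on the slide.

```python
# Scores m1(x) and labels y for x1..x10, copied from the slide.
scores = [0.92, 0.71, 0.34, 0.31, 0.23, 0.20, 0.15, 0.13, 0.11, 0.05]
labels = [1, 1, 0, 1, 1, 1, 0, 0, 1, 0]

# Score-driven instance hardness as the squared error between label and score.
ih_sd = [(y - s) ** 2 for s, y in zip(scores, labels)]

for i, value in enumerate(ih_sd, start=1):
    print(f"x{i}: {value:.4f}")
```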

Ensemble Instance Hardness
- Average cost curves and instance hardness over a pool L of learning models:
  - $QI(x, t, c) = \frac{1}{|L|} \sum_{j=1}^{|L|} QI_j(x, t, c)$
  - $IH(x) = \frac{1}{|L|} \sum_{j=1}^{|L|} IH_j(x)$
- Strong assumption: all learning models are equally probable and reliable
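The averaging step is a plain unweighted mean over the pool. In the sketch below the three scores are made-up values for a single positive instance, used only to show the computation, and the function name `ensemble_hardness` is hypothetical.

```python
def ensemble_hardness(per_model_hardness):
    """Mean of IH_j(x) over a pool of models, all weighted equally."""
    return sum(per_model_hardness) / len(per_model_hardness)


# Hypothetical scores given to one positive instance (y = 0) by three models.
pool_scores = [0.34, 0.10, 0.60]

# Score-driven hardness of the instance under each model: (y - s)^2.
ih_per_model = [(0 - s) ** 2 for s in pool_scores]

print(ensemble_hardness(ih_per_model))
```

Equal weights mirror the slide's assumption that all models are equally probable and reliable; a weighted mean would be the natural change if that assumption were dropped.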

Illustrative Example - Ensemble Hardness
[Figure: ensemble instance cost curves for the positive instances; panels x5, x6, x7, x8, x9, x10; score-driven and rate-driven curves.]

Illustrative Example - Class Hardness
[Figure: panels for the positive class and the negative class; score-driven and rate-driven curves.]

Conclusion
- Instance hardness measures and cost curves considering different scenarios
- Other threshold choice methods
  - Probabilistic methods (rate-uniform and score-uniform), rate-fixed and score-fixed
- Future work
  - Integrate instance hardness into classification methods (ensemble learning)
  - Empirical and meta-learning studies

Cost-Sensitive Measures of Instance Hardness
Questions?
