Support Vector Algorithms for Optimizing the Partial Area Under the ROC Curve
Harikrishna Narasimhan, Department of Computer Science and Automation, Indian Institute of Science, Bangalore
Joint work with Shivani Agarwal, IISc; Mitra Biotech team
pascal-network.org allspammedup.com fusionsedge.com optimum7.com
Model: Spam or Non-spam?
Receiver Operating Characteristic Curve [Figure: ROC curve; x-axis: False Positive Rate (0 to 1); y-axis: True Positive Rate (0 to 1); shaded region: Area Under the ROC Curve (AUC)]
Partial AUC? Full AUC vs. Partial AUC
Ranking
Biometric Screening
Medical Diagnosis KDD Cup 2008 http://en.wikipedia.org/
Bioinformatics ― Drug Discovery ― Gene Prioritization ― Protein Interaction Prediction ― …… http://en.wikipedia.org/wiki http://commons.wikimedia.org/ http://www.google.com/imghp
Partial AUC Optimization
New support vector method for directly optimizing the partial AUC measure.
Based on an earlier structural SVM based approach for full AUC optimization (Joachims 2005; 2006).
Narasimhan, H. and Agarwal, S. "A structural SVM based approach for optimizing partial AUC", ICML 2013.
Outline: ROC Curve & Partial AUC; Algorithm; Application [Figure: ROC curve, True Positive Rate vs. False Positive Rate]
Setting
Training Set: Positive Instances $x_1^+, x_2^+, x_3^+, \ldots, x_m^+$ and Negative Instances $x_1^-, x_2^-, x_3^-, \ldots, x_n^-$
GOAL? Model
Model: Spam or Non-spam? Score Model & threshold [Figure: ROC curve, TPR (0 to 1) vs. FPR (0 to 1)]
Receiver Operating Characteristic Curve Illustration
Instances sorted by the score assigned by the model: 20, 15, 14, 13, 11, 9, 8, 6, 5, 3, 2, 0
[Figure: ROC curve, True Positives vs. False Positives; the area under the whole curve is the Area Under the ROC Curve (AUC) (Joachims, 2005); the area over a restricted range of false positives is the Partial AUC]
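The illustration above can be made concrete with a small sketch. It reuses the twelve scores shown on the slide, but the split into positives and negatives is a hypothetical assumption (the slides do not give the labels), and `partial_auc` is an illustrative helper name, not something from the talk. It assumes no tied scores and treats the ROC curve as a step function: each negative contributes a horizontal step of width 1/n whose height is the fraction of positives scored above it.

```python
import numpy as np

def partial_auc(pos_scores, neg_scores, alpha=0.0, beta=1.0):
    """Normalised area under the ROC curve for FPR in [alpha, beta].

    Assumes no tied scores. With alpha=0 and beta=1 this is the full AUC.
    """
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.sort(np.asarray(neg_scores, dtype=float))[::-1]  # highest first
    n = len(neg)
    # Height of the ROC step at each negative: fraction of positives above it.
    tpr_at_neg = (pos[:, None] > neg[None, :]).mean(axis=0)
    # The j-th ranked negative covers the FPR interval [j/n, (j+1)/n].
    left = np.arange(n) / n
    right = (np.arange(n) + 1) / n
    # Width of each step that falls inside [alpha, beta].
    width = np.clip(np.minimum(right, beta) - np.maximum(left, alpha), 0.0, None)
    return float((width * tpr_at_neg).sum() / (beta - alpha))

# Scores from the slide; the labels are an assumption made for illustration.
pos = [20, 15, 13, 9, 6, 3]
neg = [14, 11, 8, 5, 2, 0]

full_auc = partial_auc(pos, neg)            # FPR range [0, 1]
pauc_half = partial_auc(pos, neg, 0.0, 0.5) # FPR range [0, 0.5]
```

With this hypothetical labelling the two numbers differ, which is the point of the slide: a model can look reasonable on full AUC while ranking poorly in the low false-positive region.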
Observation 1: Best ROC Curve. Ranking: + + + + + + – – – – – – [Figure: ROC curve, True Positives vs. False Positives]
Observation 2: Worst ROC Curve. Ranking: – – – – – – + + + + + + [Figure: ROC curve, True Positives vs. False Positives]
Observation 3: Top Fraction of Negatives
Scores: 20, 15, 14, 13, 11, 9, 8, 6, 5, 3, 2, 0. Score Model? [Figure: ROC curve, True Positives vs. False Positives]
Outline: ROC Curve & Partial AUC; Algorithm; Application [Figure: ROC curve, True Positive Rate vs. False Positive Rate]
Classification SVM
SVM for Full AUC
Score Model: higher score to A than B [Figure: example pair A, B]
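The "higher score to A than B" idea can be sketched as a pairwise computation over all positive-negative pairs. This is a minimal illustration of the widely used pairwise hinge surrogate for AUC, not necessarily the exact formulation used in the talk; the function name and the example scores are hypothetical.

```python
import numpy as np

def auc_and_pairwise_hinge(pos_scores, neg_scores):
    """Return (empirical AUC, mean pairwise hinge loss) over all pos-neg pairs."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    margins = pos - neg  # entry (i, j): score of positive i minus score of negative j
    auc = float((margins > 0).mean())                      # correctly ordered pairs
    hinge = float(np.maximum(0.0, 1.0 - margins).mean())   # convex surrogate
    return auc, hinge

# Hypothetical scores, chosen only to illustrate the computation.
auc, hinge = auc_and_pairwise_hinge([2.0, 0.5], [1.0, -1.0])
```

In the absence of ties, each mis-ordered pair has a hinge value of at least 1, so the mean hinge loss is never smaller than 1 - AUC; minimising it pushes positives above negatives with a margin.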
SVM for Partial AUC
Score Model. GOAL? Structural SVM [Figure: ROC curve, True Positives vs. False Positives]
SVM for Partial AUC
Ordering of examples in training set, written as a 0/1 matrix with m rows and n columns, compared with the IDEAL ordering:

$$
\begin{pmatrix}
0 & 0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 1
\end{pmatrix}
\quad \text{compared with} \quad
\underbrace{\begin{pmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{pmatrix}}_{\text{IDEAL}}
$$

pAUC Loss; Upper Bound on (1 - pAUC)
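A small sketch of how such an ordering matrix and a partial AUC loss could be computed from scores. Several things here are assumptions rather than statements from the slides: an entry is taken to be 1 when the row's positive is scored below the column's negative (consistent with the all-zeros IDEAL matrix), the false-positive range is taken to be [0, beta] with n*beta an integer, and the function names and example scores are hypothetical.

```python
import numpy as np

def ordering_matrix(pos_scores, neg_scores):
    """m x n matrix: entry (i, j) is 1 if positive i is scored below negative j."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return (pos < neg).astype(int)

def pauc_loss(pi, neg_scores, beta):
    """Fraction of mis-ordered pairs restricted to the top beta fraction of negatives.

    Assumes n * beta is an integer and there are no tied scores.
    """
    n = pi.shape[1]
    k = int(round(n * beta))  # number of top-ranked negatives to keep
    top = np.argsort(-np.asarray(neg_scores, dtype=float), kind="stable")[:k]
    return float(pi[:, top].mean())

# Hypothetical scores for illustration.
pos = [20, 15, 13, 9, 6, 3]
neg = [14, 11, 8, 5, 2, 0]

pi = ordering_matrix(pos, neg)
full_loss = float(pi.mean())          # equals 1 - AUC when there are no ties
half_loss = pauc_loss(pi, neg, 0.5)   # equals 1 - (normalised pAUC on [0, 0.5])
```

Only the columns of the top-ranked negatives enter the partial loss, which is why the set of columns that matter changes whenever the score model changes.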
Cutting-plane Solver
Repeat:
1. Solve OP for a subset of constraints.
2. Add the most violated constraint.
Break down! Finding the most violated constraint decomposes over the ordering matrix (shown for Full AUC and Partial AUC); optimize rows independently.

$$
\begin{pmatrix}
0 & 1 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 1
\end{pmatrix}
$$

Complexity: can be implemented in O((m+n) log (m+n)) time.
H. Narasimhan and S. Agarwal. A Structural SVM Based Approach for Optimizing Partial AUC. ICML, 2013.
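To illustrate what "break down" can mean in the full AUC case, here is a naive sketch of the most-violated-constraint step. It assumes the standard structural SVM setup for AUC, in which the loss is the fraction of mis-ordered pairs and the joint feature map averages differences between positive and negative feature vectors; the slides do not spell this out, so treat it as an assumption. Under that setup the violation is a sum of independent per-entry terms, so each matrix entry can be chosen on its own. The function name and scores are hypothetical, and this O(mn) version does not reproduce the O((m+n) log (m+n)) implementation mentioned above, nor the partial AUC case.

```python
import numpy as np

def most_violated_full_auc(pos_scores, neg_scores):
    """Entrywise most-violated ordering matrix for full AUC (naive O(mn) sketch).

    Violation of a candidate matrix pi is
        (1 / mn) * sum_ij pi_ij * (1 - (s_i_pos - s_j_neg)),
    so it is maximised by setting pi_ij = 1 exactly where the term is positive.
    """
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    gain = 1.0 - (pos - neg)          # per-entry contribution if pi_ij = 1
    pi = (gain > 0).astype(int)       # keep only entries that add to the violation
    violation = float((pi * gain).mean())
    return pi, violation

# Hypothetical scores under the current model.
pi_hat, violation = most_violated_full_auc([3.0, 0.5], [2.5, -1.0])
```

In a cutting-plane loop, the constraint for `pi_hat` would be added to the working set if `violation` exceeds the current slack by more than a tolerance.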
Experimental Results
• Baseline Methods:
– Full AUC Optimization (Joachims, 2005)
– Asymmetric SVM (Wu et al., 2008)
– Boosting Style Method (Komori & Eguchi, 2010)
– Greedy Heuristic Method (Ricamato & Tortorella, 2011)
Experimental Results: Drug Discovery
50 active compounds / 2092 inactive compounds; Interval [0, 0.1]

Method              Partial AUC in [0, 0.1]
SVMpAUC             65.25
SVM-AUC             62.64
ASVM                63.80
pAUCBoost           43.89
Greedy Heuristic     8.33
Experimental Results: Protein-Protein Interaction Prediction
~3x10^3 interacting pairs / ~2x10^5 non-interacting pairs; Interval [0, 0.1]

Method              Partial AUC in [0, 0.1]
SVMpAUC             51.79
SVM-AUC             39.72
ASVM                44.51
pAUCBoost           48.65
Greedy Heuristic    47.33
Experimental Results: KDD Cup 2008 Breast Cancer Detection
~600 malignant ROIs / ~10^5 benign ROIs; Interval [α, β]

Method              Partial AUC in [0.2s, 0.3s]
SVMpAUC             51.44
SVM-AUC             50.50
pAUCBoost           48.06
Greedy Heuristic    46.99
Experimental Results: Run Time Analysis
Cutting-plane Method
Repeat:
1. Solve OP for a subset of constraints.
2. Add the most violated constraint.
Quantities compared: time taken per iteration; total number of iterations.
[Figure: Total number of iterations]
[Figure: Time taken per iteration]
Improved Formulation
• Better Formulation: Tighter Approximation
– Improved Accuracy
– Better Run-time Guarantee
Narasimhan, H. and Agarwal, S. "SVM_pAUC^tight: A new support vector method for optimizing partial AUC based on a tight convex upper bound", KDD 2013.
Outline: ROC Curve & Partial AUC; Algorithm; Application [Figure: ROC curve, True Positive Rate vs. False Positive Rate]