
Visualization for Classification: ROC, AUC, Confusion Matrix



  1. http://poloclub.gatech.edu/cse6242
 CSE6242 / CX4242: Data & Visual Analytics
 Visualization for Classification
 ROC, AUC, Confusion Matrix
 Duen Horng (Polo) Chau
 Assistant Professor
 Associate Director, MS Analytics
 Georgia Tech
 Partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko, Christos Faloutsos, Parishit Ram (GT PhD alum; SkyTree), Alex Gray

  2. Visualizing Classification Performance
 Confusion matrix
 https://en.wikipedia.org/wiki/Confusion_matrix
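A confusion matrix counts, for every pair of actual and predicted class, how many examples fall into that cell. The following is a minimal sketch of that tally, not code from the slides: the function name `confusion_matrix`, the class labels, and the toy data are all hypothetical, and the layout (rows = actual class, columns = predicted class) is an assumption chosen for this example.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Return a nested list: rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Hypothetical toy data, for illustration only.
labels = ["cat", "dog", "rabbit"]
actual    = ["cat", "cat", "cat", "dog", "dog", "rabbit", "rabbit", "rabbit"]
predicted = ["cat", "cat", "dog", "dog", "cat", "rabbit", "rabbit", "dog"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"{label:>6}", row)
```

Correct predictions land on the diagonal, so any large off-diagonal cell points at a pair of classes the classifier confuses.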

  3. Hard to spot trends and patterns
 Easier
 http://research.microsoft.com/en-us/um/redmond/groups/cue/publications/CHI2009-EnsembleMatrix.pdf

  4. Very important: 
 Find out what “positive” means https://en.wikipedia.org/wiki/Confusion_matrix

  5. Very important: 
 Find out what “positive” means https://en.wikipedia.org/wiki/Confusion_matrix
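Why the meaning of “positive” matters: precision and recall are defined relative to the positive class, so the same predictions can score very differently depending on which class is treated as positive. The sketch below illustrates this under stated assumptions: the helper `precision_recall` and the toy labels are hypothetical, not taken from the slides.

```python
def precision_recall(actual, predicted, positive):
    """Compute (precision, recall) treating `positive` as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical toy data: 2 malware files and 6 benign files.
actual    = ["malware", "malware", "benign", "benign",
             "benign", "benign", "benign", "benign"]
predicted = ["malware", "benign", "benign", "benign",
             "benign", "benign", "benign", "malware"]

as_malware = precision_recall(actual, predicted, positive="malware")
as_benign = precision_recall(actual, predicted, positive="benign")
print("positive = malware:", as_malware)
print("positive = benign: ", as_benign)
```

With malware as the positive class the classifier looks mediocre; with benign as the positive class the very same predictions look much better, which is why the definition has to be checked before reading any metric.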

  6. Visualizing Classification Performance 
 using ROC curve 
 (Receiver Operating Characteristic)

  7. Polonium’s ROC Curve
 Positive class: malware
 Negative class: benign
 Ideal
 85% True Positive Rate, 1% False Alarms
 True Positive Rate: % of bad correctly labeled
 False Positive Rate (False Alarms): % of good labeled as bad
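An ROC curve is traced by sweeping the decision threshold over the classifier's scores and recording the false positive rate and true positive rate at each step. The sketch below shows that sweep; it is not Polonium's code, the function name `roc_points` and the scores are hypothetical, and it assumes that higher scores mean "more likely positive" and that both classes are present.

```python
def roc_points(scores, is_positive):
    """Return (FPR, TPR) pairs obtained by lowering the threshold step by step."""
    n_pos = sum(is_positive)
    n_neg = len(is_positive) - n_pos
    ranked = sorted(zip(scores, is_positive), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    tp = fp = 0
    for i, (score, positive) in enumerate(ranked):
        if positive:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item in a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical scores (higher = more likely malware); True marks actual malware.
scores      = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
is_positive = [True, True, False, True, False, True, False, False]

points = roc_points(scores, is_positive)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each point is one possible operating point; the curve always starts at (0, 0) and ends at (1, 1), and the closer it passes to the top-left corner, the better.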

  8. Measuring Classification Performance
 using AUC (Area under the curve)
 Ideal
 85% True Positive Rate, 1% False Alarms
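The area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes AUC directly from that pairwise definition; the function name `auc` and the scores are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc(scores, is_positive):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical scores (higher = more likely malware); True marks actual malware.
scores      = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
is_positive = [True, True, False, True, False, True, False, False]

value = auc(scores, is_positive)
print(f"AUC = {value}")
```

A value of 1.0 means every positive outranks every negative, while 0.5 is what random scoring would produce.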

  9. If a machine learning algorithm achieves 0.9 AUC (out of 1.0).
 That’s a great algorithm, right?

  10. Be Careful with AUC!
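The slide text does not spell out the pitfall, so the following is only one commonly cited reason for caution, offered as an assumption: AUC summarizes the whole curve, so two classifiers can have the same AUC yet behave very differently in the low false alarm region that matters for a task like malware detection. The two curves below are made up for illustration, and the helpers `trapezoid_auc` and `tpr_at_fpr` are hypothetical names.

```python
def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve given as sorted (FPR, TPR) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

def tpr_at_fpr(points, fpr):
    """Linearly interpolate the TPR at a given FPR."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 > x0 and x0 <= fpr <= x1:
            return y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
    raise ValueError("fpr is outside the curve's range")

# Two made-up ROC curves, each described by three (FPR, TPR) points.
curve_a = [(0.0, 0.0), (0.01, 0.85), (1.0, 1.0)]  # rises steeply at once
curve_b = [(0.0, 0.0), (0.16, 1.00), (1.0, 1.0)]  # rises more slowly

for name, curve in [("A", curve_a), ("B", curve_b)]:
    print(name,
          f"AUC={trapezoid_auc(curve):.3f}",
          f"TPR at 1% false alarms={tpr_at_fpr(curve, 0.01):.4f}")
```

Both curves enclose the same area, but at a 1% false alarm budget curve A catches most positives while curve B catches very few, so the AUC figure alone does not say which classifier suits the task.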

  11. Weights in combined models
 Bagging / Random forests
 • Majority voting
 Let people play with the weights?
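Majority voting is the special case of weighted voting in which every classifier has the same weight; changing the weights can change the combined decision. The sketch below illustrates the idea only: the function name `weighted_vote`, the votes, and the weights are hypothetical, and it is not the combination scheme of any particular system.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Return the label with the largest total weight among the classifiers' votes."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # On an exact tie, the label that was voted for first wins.
    return max(totals, key=totals.get)

# Hypothetical votes from three classifiers for a single example.
votes = ["benign", "benign", "malware"]

majority = weighted_vote(votes, [1.0, 1.0, 1.0])  # equal weights: plain majority
adjusted = weighted_vote(votes, [0.2, 0.2, 0.9])  # trust the third classifier more
print("majority vote:", majority)
print("weighted vote:", adjusted)
```

With equal weights the two "benign" votes win; once the third classifier is trusted more than the other two combined, the decision flips, which is the kind of effect a person adjusting the weights would be exploring.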

  12. EnsembleMatrix
 http://research.microsoft.com/en-us/um/redmond/groups/cue/publications/CHI2009-EnsembleMatrix.pdf

  13. Improving performance
 • Adjust the weights of the individual classifiers
 • Data partition to separate problem areas
   o Adjust weights just for these individual parts
 • State-of-the-art performance, on one dataset
 http://research.microsoft.com/en-us/um/redmond/groups/cue/publications/CHI2009-EnsembleMatrix.pdf
