  1. Experimental Setup, Multi-class vs. Multi-label classification, and Evaluation CMSC 678 UMBC

  2. Central Question: How Well Are We Doing?
     The task: what kind of problem are you solving?
     • Classification: Precision, Recall, F1; Accuracy; Log-loss; ROC-AUC; …
     • Regression: (Root) Mean Square Error; Mean Absolute Error; …
     • Clustering: Mutual Information; V-score; …

  3. Central Question: How Well Are We Doing?
     The task: what kind of problem are you solving?
     • Classification: Precision, Recall, F1; Accuracy; Log-loss; ROC-AUC; …
     • Regression: (Root) Mean Square Error; Mean Absolute Error; …
     • Clustering: Mutual Information; V-score; …
     Note: the evaluation metric does not have to be the same thing as the loss function you optimize.
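As a concrete instance of the classification metrics listed above, here is a minimal sketch of precision, recall, and F1 on made-up binary labels (the data and function name are illustrative, not from the course):

```python
# Toy precision/recall/F1 computation for binary labels (1 = positive).

def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# tp=2, fp=1, fn=1 -> precision = recall = f1 = 2/3
p, r, f = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```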

  4. Outline
     • Experimental Design: Rule 1
     • Multi-class vs. Multi-label Classification
     • Evaluation: Regression Metrics; Classification Metrics

  5. Experimenting with Machine Learning Models
     All your data is split into: Training Data | Dev Data | Test Data

  6. Rule #1

  7. Experimenting with Machine Learning Models
     What is “correct?” What is working “well?”
     • Training Data: learn model parameters from the training set
     • Dev Data: set hyperparameters
     • Test Data

  8. Experimenting with Machine Learning Models
     What is “correct?” What is working “well?”
     • Training Data: learn model parameters from the training set
     • Dev Data: set hyperparameters; evaluate the learned model on dev with that hyperparameter setting
     • Test Data

  9. Experimenting with Machine Learning Models
     What is “correct?” What is working “well?”
     • Training Data: learn model parameters from the training set
     • Dev Data: set hyperparameters; evaluate the learned model on dev with that hyperparameter setting
     • Test Data: perform the final evaluation on test, using the hyperparameters that optimized dev performance and retraining the model

  10. Experimenting with Machine Learning Models
      What is “correct?” What is working “well?”
      • Training Data: learn model parameters from the training set
      • Dev Data: set hyperparameters; evaluate the learned model on dev with that hyperparameter setting
      • Test Data: perform the final evaluation on test, using the hyperparameters that optimized dev performance and retraining the model
      Rule 1: DO NOT ITERATE ON THE TEST DATA
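The train/dev/test protocol above can be sketched as follows. The "model" is a toy threshold classifier and the data are made up for illustration; they are not from the course:

```python
# Rule-1 protocol sketch: fit on train, tune on dev, touch test exactly once.

def fit(train, threshold):
    # "Learning" is trivial here; a real model would fit its parameters
    # on the training set. threshold plays the role of a hyperparameter.
    return lambda x: 1 if x > threshold else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

train = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
dev   = [(0.2, 0), (0.7, 1)]
test  = [(0.3, 0), (0.8, 1)]

# Set hyperparameters: pick the threshold that maximizes dev accuracy.
best_t = max([0.35, 0.5, 0.75],
             key=lambda t: accuracy(fit(train, t), dev))

# Final evaluation on test, once, with the dev-chosen hyperparameter.
final = accuracy(fit(train, best_t), test)
```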

  11. On-board Exercise Produce dev and test tables for a linear regression model with learned weights and set/fixed (non-learned) bias

  12. Outline
      • Experimental Design: Rule 1
      • Multi-class vs. Multi-label Classification
      • Evaluation: Regression Metrics; Classification Metrics

  13. Multi-class Classification
      Given input x, predict a discrete label y.

  14. Multi-class Classification
      Given input x, predict a discrete label y.
      • If y ∈ {0, 1} (or y ∈ {True, False}), then it is a binary classification task

  15. Multi-class Classification
      Given input x, predict a discrete label y.
      • If y ∈ {0, 1} (or y ∈ {True, False}), then it is a binary classification task
      • If y ∈ {0, 1, …, K−1} (for finite K), then it is a multi-class classification task
      Q: What are some examples of multi-class classification?

  16. Multi-class Classification
      Given input x, predict a discrete label y.
      • If y ∈ {0, 1} (or y ∈ {True, False}), then it is a binary classification task
      • If y ∈ {0, 1, …, K−1} (for finite K), then it is a multi-class classification task
      Q: What are some examples of multi-class classification?
      A: Many possibilities. See A2, Q{1,2,4-7}

  17. Multi-class Classification
      Given input x, predict a discrete label y.
      • Single output: if y ∈ {0, 1} (or y ∈ {True, False}), then a binary classification task; if y ∈ {0, 1, …, K−1} (for finite K), then a multi-class classification task
      • Multi-output: if multiple y_m are predicted, then a multi-label classification task

  18. Multi-class Classification
      Given input x, predict a discrete label y.
      • Single output: if y ∈ {0, 1} (or y ∈ {True, False}), then a binary classification task; if y ∈ {0, 1, …, K−1} (for finite K), then a multi-class classification task
      • Multi-output: if multiple y_m are predicted, then a multi-label classification task
      Multi-label Classification
      Given input x, predict multiple discrete labels y = (y_1, …, y_M).

  19. Multi-class Classification
      Given input x, predict a discrete label y.
      • Single output: if y ∈ {0, 1} (or y ∈ {True, False}), then a binary classification task; if y ∈ {0, 1, …, K−1} (for finite K), then a multi-class classification task
      • Multi-output: if multiple y_m are predicted, then a multi-label classification task
      Multi-label Classification
      Given input x, predict multiple discrete labels y = (y_1, …, y_M).
      Each y_m could be binary or multi-class.
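To make the single-output vs. multi-output distinction concrete, here are made-up labels for the two settings (not data from the course):

```python
# Multi-class, single output: exactly one label out of K classes per input.
K = 3
y_multiclass = 2              # y is a single value in {0, 1, ..., K-1}

# Multi-label, multi-output: a vector of M labels per input (each y_m is
# binary here, though each could itself be multi-class).
M = 4
y_multilabel = [1, 0, 0, 1]   # y = (y_1, ..., y_M)
```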

  20. Multi-Label Classification…
      • Will not be a primary focus of this course
      • Many of the single-output classification methods apply to multi-label classification
      • Predicting “in the wild” can be trickier
      • Evaluation can be trickier

  21. We’ve only developed binary classifiers so far…
      • Option 1: Develop a multi-class version
      • Option 2: Build a one-vs-all (OvA) classifier
      • Option 3: Build an all-vs-all (AvA) classifier
      (there can be others)

  22. We’ve only developed binary classifiers so far…
      Option 1: Develop a multi-class version.
      The loss function may (or may not) need to be extended, and the model structure may need to change (in big or small ways).

  23. We’ve only developed binary classifiers so far…
      Option 1: Develop a multi-class version.
      The loss function may (or may not) need to be extended, and the model structure may need to change (in big or small ways).
      Common change: instead of a single weight vector w, keep a weight vector w^(c) for each class c, and compute class-specific scores, e.g.,
      ŷ^(c) = (w^(c))^T x + b^(c)
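The per-class scoring above can be sketched as follows; the weights, bias values, and input are made up for illustration:

```python
# One weight vector w_c and bias b_c per class; predict the argmax score.

def class_scores(W, b, x):
    # yhat_c = w_c^T x + b_c, for each class c
    return [sum(wi * xi for wi, xi in zip(w_c, x)) + b_c
            for w_c, b_c in zip(W, b)]

W = [[1.0, 0.0],    # w for class 0
     [0.0, 1.0],    # w for class 1
     [-1.0, -1.0]]  # w for class 2
b = [0.0, 0.0, 0.5]
x = [1.0, 2.0]

s = class_scores(W, b, x)                      # [1.0, 2.0, -2.5]
pred = max(range(len(s)), key=lambda c: s[c])  # argmax -> class 1
```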

  24. Multi-class Option 1: Linear Regression/Perceptron
      [diagram: input x, weights w, output y]
      y = w^T x + b
      output: if y > 0: class 1, else: class 2

  25. Multi-class Option 1: Linear Regression/Perceptron: A Per-Class View
      [diagram: input x feeds per-class weight vectors w_1 and w_2, producing scores y_1 and y_2]
      y_1 = w_1^T x + b_1
      y_2 = w_2^T x + b_2
      output: i = argmax { y_1, y_2 }; predict class i
      The binary version (y = w^T x + b; if y > 0: class 1, else: class 2) is a special case.

  26. Multi-class Option 1: Linear Regression/Perceptron: A Per-Class View (alternative)
      Concatenate the per-class weight vectors into one long vector and zero-pad the input:
      y_1 = [w_1; w_2]^T [x; 0] + b_1
      y_2 = [w_1; w_2]^T [0; x] + b_2
      output: i = argmax { y_1, y_2 }; predict class i
      Q: (For discussion) Why does this work?
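A numeric check of why the concatenation view works: zero-padding the input means the "other" class's weights contribute nothing to each score. The vectors below are made up for illustration:

```python
# Stacking all class weights into one long vector and zero-padding the
# input recovers each per-class score exactly.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

w1, w2 = [1.0, -2.0], [0.5, 3.0]
b1, b2 = 0.1, -0.2
x = [2.0, 1.0]
zero = [0.0, 0.0]

w_all = w1 + w2                 # [w1; w2] (list concatenation)
y1 = dot(w_all, x + zero) + b1  # = w1^T x + b1: w2 only sees zeros
y2 = dot(w_all, zero + x) + b2  # = w2^T x + b2: w1 only sees zeros

# Same answers as the per-class view:
assert y1 == dot(w1, x) + b1 and y2 == dot(w2, x) + b2
```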

  27. We’ve only developed binary classifiers so far…
      Option 2: Build a one-vs-all (OvA) classifier. With C classes:
      • Train C different binary classifiers f_c(x)
      • f_c(x) predicts 1 if x is likely class c, 0 otherwise

  28. We’ve only developed binary classifiers so far…
      Option 2: Build a one-vs-all (OvA) classifier. With C classes:
      • Train C different binary classifiers f_c(x)
      • f_c(x) predicts 1 if x is likely class c, 0 otherwise
      • To test/predict a new instance z: get scores s_c = f_c(z), then output the class with the max score, ŷ = argmax_c s_c
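The OvA train/predict recipe above, sketched with toy per-class scorers (the scorer functions are made up; real f_c would be trained binary classifiers):

```python
# One-vs-all: C binary scorers f_c; predict the argmax over their scores.

def make_ova(classifiers):
    # classifiers: list of C functions f_c(z) -> score for "is class c"
    def predict(z):
        s = [f(z) for f in classifiers]           # s_c = f_c(z)
        return max(range(len(s)), key=lambda c: s[c])
    return predict

fs = [lambda z: 1.0 - z,        # toy scorer for class 0 (likes small z)
      lambda z: -abs(z - 2.0),  # toy scorer for class 1 (likes z near 2)
      lambda z: z - 4.0]        # toy scorer for class 2 (likes large z)

predict = make_ova(fs)
```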

  29. We’ve only developed binary classifiers so far…
      Option 3: Build an all-vs-all (AvA) classifier. With C classes:
      • Train C-choose-2 = C(C−1)/2 different binary classifiers f_{c1,c2}(x)

  30. We’ve only developed binary classifiers so far…
      Option 3: Build an all-vs-all (AvA) classifier. With C classes:
      • Train C-choose-2 = C(C−1)/2 different binary classifiers f_{c1,c2}(x)
      • f_{c1,c2}(x) predicts 1 if x is likely class c1, 0 otherwise (i.e., likely class c2)

  31. We’ve only developed binary classifiers so far…
      Option 3: Build an all-vs-all (AvA) classifier. With C classes:
      • Train C-choose-2 = C(C−1)/2 different binary classifiers f_{c1,c2}(x)
      • f_{c1,c2}(x) predicts 1 if x is likely class c1, 0 otherwise (i.e., likely class c2)
      • To test/predict a new instance z: get scores or predictions s_{c1,c2} = f_{c1,c2}(z)
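An AvA sketch, assuming the standard majority-vote rule for combining the pairwise predictions (the slide stops at collecting the scores, so the voting step is an assumption here); the pairwise decider is a toy function:

```python
# All-vs-all: one binary classifier per unordered class pair (c1, c2);
# each votes for its winner, and the class with the most votes wins.
from itertools import combinations

def make_ava(C, decide):
    # decide(c1, c2, z) -> 1 if z looks like c1, 0 if it looks like c2
    pairs = list(combinations(range(C), 2))  # C-choose-2 classifiers
    def predict(z):
        votes = [0] * C
        for c1, c2 in pairs:
            winner = c1 if decide(c1, c2, z) == 1 else c2
            votes[winner] += 1
        return max(range(C), key=lambda c: votes[c])
    return predict

# Toy decider: class c "wants" z near c; pick whichever center is closer.
predict = make_ava(3, lambda c1, c2, z: 1 if abs(z - c1) < abs(z - c2) else 0)
```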
