Learning as Loss Minimization
Machine Learning
Learning as loss minimization

• The setup
  – Examples x are drawn from a fixed, unknown distribution D
  – A hidden oracle classifier f labels the examples
  – We wish to find a hypothesis h that mimics f

• The ideal situation
  – Define a function L that penalizes bad hypotheses
  – Learning: pick a function h ∈ H to minimize the expected loss
  – But the distribution D is unknown

• Instead, minimize the empirical loss on the training set
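Written out, the contrast on this slide is between two objectives (the notation here is mine, not from the slides): the ideal learner minimizes the expected loss under D, while in practice we minimize the average loss over a training set S = {(x_1, y_1), ..., (x_m, y_m)}:

$\min_{h \in H} \ \mathbb{E}_{x \sim D}\big[\, L(h(x), f(x)) \,\big] \qquad \text{vs.} \qquad \min_{h \in H} \ \frac{1}{m} \sum_{i=1}^{m} L\big(h(x_i), y_i\big)$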
Empirical loss minimization

Learning = minimize the empirical loss on the training set

Is there a problem here? Overfitting!

• We need something that biases the learner towards simpler hypotheses
• Achieved using a regularizer, which penalizes complex hypotheses
Regularized loss minimization

• Learning: minimize the empirical loss plus a regularization term (the generic form is sketched below)
• With linear classifiers: the hypothesis is a weight vector w, and the loss depends on w through w^T x
• What is a loss function?
  – Loss functions should penalize mistakes
  – We are minimizing the average loss over the training data
• What is the ideal loss function for classification?
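One common way to write the regularized objective (a sketch of the generic form; the specific symbols are my notation, not from the slides):

$\min_{w} \ \lambda\, R(w) \;+\; \frac{1}{m} \sum_{i=1}^{m} L\big(y_i,\ w^T x_i\big)$

Here R(w) is the regularizer (for example, ||w||^2), λ is a hyper-parameter trading off the two terms, and for linear classifiers the loss on each example depends on w only through w^T x_i.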
The 0-1 loss

• Penalize classification mistakes between the true label y and the prediction y'
• For linear classifiers, the prediction is y' = sgn(w^T x)
  – Mistake if y w^T x ≤ 0
• Minimizing the 0-1 loss is intractable. We need surrogates.
The 0-1 loss

[Figure: the 0-1 loss plotted against y w^T x. The loss is 1 when y w^T x < 0 (misclassification) and 0 when y w^T x > 0 (no misclassification).]
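As a formula in terms of the margin y w^T x (my notation for the same picture):

$L_{0\text{-}1}\big(y,\, w^T x\big) = \begin{cases} 1 & \text{if } y\, w^T x \le 0 \\ 0 & \text{otherwise} \end{cases}$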
Compare to the hinge loss

[Figure: the hinge loss plotted against y w^T x. The penalty grows as w^T x moves farther from the separator on the wrong side (y w^T x < 0, misclassification), and predictions are penalized even when correct (y w^T x > 0) if they are too close to the margin.]
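The hinge loss sketched above is usually written as follows (my notation; the margin threshold of 1 is the standard convention):

$L_{\text{hinge}}\big(y,\, w^T x\big) = \max\big(0,\ 1 - y\, w^T x\big)$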
Support Vector Machines

• SVM = a linear classifier combined with regularization
• Ideally, we would like to minimize the 0-1 loss
  – But we can't, for computational reasons
• SVM minimizes the hinge loss instead
  – Variants exist
SVM objective function

• Regularization term: maximize the margin
  – Imposes a preference over the hypothesis space and pushes for better generalization
  – Can be replaced with other regularization terms which impose other preferences

• Empirical loss: hinge loss
  – Penalizes weight vectors that make mistakes
  – Can be replaced with other loss functions which impose other preferences

• A hyper-parameter controls the tradeoff between a large margin and a small hinge loss
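Putting the two terms together, the standard soft-margin SVM objective has the form below (the exact placement of the trade-off constant C is a convention, not something taken from these slides):

$\min_{w} \ \frac{1}{2} \|w\|^2 \;+\; C \sum_{i=1}^{m} \max\big(0,\ 1 - y_i\, w^T x_i\big)$

A large C puts more weight on a small hinge loss; a small C puts more weight on a large margin.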
The loss function zoo

Many loss functions exist:
– Perceptron loss
– Hinge loss (SVM)
– Exponential loss (AdaBoost)
– Logistic loss (logistic regression)
[Figure: the losses in the zoo plotted as functions of y w^T x: zero-one, perceptron, hinge (SVM), exponential (AdaBoost), and logistic regression.]
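A minimal sketch of these surrogate losses as functions of the margin m = y w^T x (illustrative code, not from the slides; it assumes NumPy is available):

import numpy as np

def zero_one(margin):
    # 1 for a misclassification (margin <= 0), 0 otherwise
    return (np.asarray(margin) <= 0).astype(float)

def perceptron_loss(margin):
    # penalizes only misclassified points, linearly in the margin
    return np.maximum(0.0, -np.asarray(margin))

def hinge(margin):
    # SVM: penalizes mistakes and correct predictions inside the margin
    return np.maximum(0.0, 1.0 - np.asarray(margin))

def exponential(margin):
    # AdaBoost
    return np.exp(-np.asarray(margin))

def logistic(margin):
    # logistic regression: log loss written in terms of the margin
    return np.log1p(np.exp(-np.asarray(margin)))

margins = np.linspace(-2.0, 2.0, 5)
print(hinge(margins))   # [3. 2. 1. 0. 0.]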
Learning via Loss Minimization: Summary

• Learning via loss minimization
  – Write down a loss function
  – Minimize the empirical loss
• Regularize to avoid overfitting
  – Neural networks use other strategies as well, such as dropout
• Widely applicable, with different loss functions and regularizers
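To make the recipe concrete, here is a minimal sketch (not from the slides) of loss minimization with a linear classifier: sub-gradient descent on the L2-regularized hinge loss, i.e. a bare-bones linear SVM. The function names and constants are illustrative choices.

import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on  lam/2 * ||w||^2 + mean(hinge(y * Xw))."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        active = margins < 1                              # misclassified or inside the margin
        grad = lam * w - (X[active].T @ y[active]) / n    # sub-gradient of the objective
        w -= lr * grad
    return w

def predict(w, X):
    return np.sign(X @ w)

# toy usage: two separable clusters labeled +1 / -1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+1.0, 0.5, (20, 2)), rng.normal(-1.0, 0.5, (20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])
w = train_linear_svm(X, y)
print((predict(w, X) == y).mean())   # training accuracy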