  1. Learning as Loss Minimization (Machine Learning)

  2. Learning as loss minimization

     • The setup
       – Examples x are drawn from a fixed, unknown distribution D
       – A hidden oracle classifier f labels the examples
       – We wish to find a hypothesis h that mimics f
     • The ideal situation
       – Define a loss function L that penalizes bad hypotheses
       – Learning: pick a hypothesis h ∈ H to minimize the expected loss over D
     • But the distribution D is unknown
       – Instead, minimize the empirical loss on the training set
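
Since D is unknown, the expected loss is approximated by an average over a finite training sample. A minimal sketch of this idea in Python; the distribution, the oracle f, and the candidate hypothesis h below are all hypothetical stand-ins, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical oracle: the true labeling function we are trying to mimic.
def f(x):
    return np.sign(x[:, 0] + 2.0 * x[:, 1])

# A candidate hypothesis h (a fixed linear classifier, for illustration).
def h(x):
    return np.sign(x[:, 0] + x[:, 1])

# Empirical 0-1 loss: the fraction of mistakes on a finite sample.
def empirical_loss(hypothesis, x, y):
    return np.mean(hypothesis(x) != y)

# Training set: a small sample from the (normally unknown) distribution D.
x_train = rng.normal(size=(50, 2))
y_train = f(x_train)

# A large fresh sample approximates the expected loss under D.
x_big = rng.normal(size=(100_000, 2))
y_big = f(x_big)

print("empirical loss:        ", empirical_loss(h, x_train, y_train))
print("approx. expected loss: ", empirical_loss(h, x_big, y_big))
```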

  3. Empirical loss minimization

     Learning = minimize empirical loss on the training set

     Is there a problem here? Overfitting!
     • We need something that biases the learner towards simpler hypotheses
     • Achieved using a regularizer, which penalizes complex hypotheses

  4. Regularized loss minimization

     • Learning: pick the hypothesis h ∈ H that minimizes
         (1/m) Σᵢ L(h(xᵢ), yᵢ) + λ R(h)
     • With linear classifiers, writing the loss in terms of the margin:
         min_w (1/m) Σᵢ L(yᵢ wᵀxᵢ) + λ R(w)
     • What is a loss function?
       – Loss functions should penalize mistakes
       – We are minimizing average loss over the training data
     • What is the ideal loss function for classification?
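
As a concrete sketch, the regularized objective for a linear classifier with an L2 regularizer might look like the code below; the choice of hinge loss and the value of lam are illustrative assumptions, not fixed by the slide:

```python
import numpy as np

def hinge_loss(margins):
    # Hinge loss applied to the margins y_i * (w^T x_i).
    return np.maximum(0.0, 1.0 - margins)

def regularized_objective(w, x, y, lam=0.1):
    # (1/m) * sum_i L(y_i * w^T x_i)  +  lam * ||w||^2
    margins = y * (x @ w)
    return np.mean(hinge_loss(margins)) + lam * np.dot(w, w)
```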

  5. The 0-1 loss

     Penalizes classification mistakes between the true label y and the
     prediction y':
         L(y, y') = 1 if y ≠ y', else 0
     • For linear classifiers, the prediction is y' = sgn(wᵀx)
       – Mistake if y wᵀx ≤ 0
     • Minimizing the 0-1 loss is intractable; we need surrogates
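
A direct translation of the 0-1 loss for a linear classifier; the comment sketches one intuition for the intractability (an editorial note, not from the slides):

```python
import numpy as np

def zero_one_loss(w, x, y):
    # Counts mistakes: 1 whenever y_i * (w^T x_i) <= 0, else 0.
    # This function is piecewise constant in w, so its gradient is zero
    # almost everywhere; that is one intuition for why minimizing it
    # directly is intractable and smoother surrogate losses are used.
    return np.mean(y * (x @ w) <= 0)
```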

  6. The 0-1 loss

     [Plot: the 0-1 loss as a function of the margin y wᵀx. The loss is 1
     when y wᵀx < 0 (misclassification) and 0 when y wᵀx > 0 (no
     misclassification).]

  7. Compare to the hinge loss

     [Plot: the hinge loss as a function of the margin y wᵀx, overlaid on
     the 0-1 loss.]
     • When y wᵀx < 0 (misclassification), the penalty grows as wᵀx moves
       farther from the separator on the wrong side
     • Predictions are penalized even when correct (y wᵀx > 0) if they are
       too close to the margin
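
A small sketch of the two losses side by side, evaluated at a few hypothetical margin values chosen only for illustration:

```python
import numpy as np

def zero_one(margin):
    # 1 on a mistake (margin <= 0), 0 otherwise.
    return (margin <= 0).astype(float)

def hinge(margin):
    # max(0, 1 - margin): positive even for correct predictions with
    # margin < 1, i.e. points too close to the separator.
    return np.maximum(0.0, 1.0 - margin)

margins = np.array([-2.0, -0.5, 0.0, 0.5, 1.0, 2.0])  # y * w^T x
print(zero_one(margins))  # [1.  1.  1.  0.  0.  0. ]
print(hinge(margins))     # [3.  1.5 1.  0.5 0.  0. ]
```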

  8. Support Vector Machines

     • SVM = linear classifier combined with regularization
     • Ideally, we would like to minimize the 0-1 loss, but we can't for
       computational reasons
     • SVM minimizes the hinge loss instead
       – Variants exist

  9. SVM objective function

         min_w   ½ ||w||² + C Σᵢ max(0, 1 − yᵢ wᵀxᵢ)

     • Regularization term (½ ||w||²): maximize the margin
       – Imposes a preference over the hypothesis space and pushes for
         better generalization
       – Can be replaced with other regularization terms which impose
         other preferences
     • Empirical loss (hinge loss): penalizes weight vectors that make
       mistakes
       – Can be replaced with other loss functions which impose other
         preferences
     • C is a hyper-parameter that controls the tradeoff between a large
       margin and a small hinge loss
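
This objective can be minimized with subgradient descent. Below is a minimal sketch under assumed settings: the step size, C, epoch count, and synthetic data are illustrative, and a real implementation would use a library (e.g. scikit-learn) or a Pegasos-style step-size schedule:

```python
import numpy as np

def svm_subgradient_descent(x, y, C=1.0, lr=0.01, epochs=100):
    """Minimize 0.5*||w||^2 + C * sum_i max(0, 1 - y_i * w^T x_i)."""
    m, d = x.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (x @ w)
        # Subgradient of the hinge term is -y_i x_i for each example
        # whose margin is violated (margin < 1); zero otherwise.
        violated = margins < 1
        grad = w - C * (y[violated][:, None] * x[violated]).sum(axis=0)
        w -= lr * grad
    return w

# Tiny synthetic example (hypothetical data).
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
y = np.sign(x[:, 0] + x[:, 1])
w = svm_subgradient_descent(x, y)
print("training accuracy:", np.mean(np.sign(x @ w) == y))
```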

  10. The loss function zoo

     Many loss functions exist:
     – Perceptron loss
     – Hinge loss (SVM)
     – Exponential loss (AdaBoost)
     – Logistic loss (logistic regression)

  11. The loss function zoo

     [Plot: the zero-one, perceptron, hinge (SVM), exponential (AdaBoost),
     and logistic losses, each as a function of the margin y wᵀx.]
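
For reference, a sketch of these surrogate losses written as functions of the margin y wᵀx. The exact forms below follow one standard convention (e.g. max(0, −m) for the perceptron loss, log(1 + e⁻ᵐ) for the logistic loss without the 1/ln 2 scaling); the slides do not fix them:

```python
import numpy as np

# Each loss is a function of the margin m = y * w^T x.
def zero_one(m):    return (m <= 0).astype(float)
def perceptron(m):  return np.maximum(0.0, -m)        # perceptron loss
def hinge(m):       return np.maximum(0.0, 1.0 - m)   # SVM
def exponential(m): return np.exp(-m)                 # AdaBoost
def logistic(m):    return np.log(1.0 + np.exp(-m))   # logistic regression

margins = np.linspace(-2, 2, 9)
for name, loss in [("zero-one", zero_one), ("perceptron", perceptron),
                   ("hinge", hinge), ("exponential", exponential),
                   ("logistic", logistic)]:
    print(f"{name:12s}", np.round(loss(margins), 2))
```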

  12. Learning via loss minimization: summary

     • Learning via loss minimization:
       – Write down a loss function
       – Minimize empirical loss
     • Regularize to avoid overfitting
       – Neural networks use other strategies as well, such as dropout
     • Widely applicable, with different loss functions and regularizers
