4. Model evaluation & selection


  1. Foundations of Machine Learning, CentraleSupélec — Fall 2017. 4. Model evaluation & selection. Chloé-Agathe Azencott, Centre for Computational Biology, Mines ParisTech, chloe-agathe.azencott@mines-paristech.fr

  2. Practical matters
     ● You should have received an email from me on Tuesday.
     ● Partial solution to Lab 1 at the end of the slides of Chapter 3.
     ● Pointers/refreshers re: (scientific) Python:
       – http://www.scipy-lectures.org/
       – https://github.com/chagaz/ml-notebooks/ → lsml2017
     ● Yes, I only put the slides online after the lecture.

  3. Generalization: a good and useful approximation
     ● It’s easy to build a model that performs well on the training data.
     ● But how well will it perform on new data?
     ● “Predictions are hard, especially about the future” — Niels Bohr.
       – Learn models that generalize well.
       – Evaluate whether models generalize well.

  4. Noise in the data
     ● Imprecision in recording the features.
     ● Errors in labeling the data points (teacher noise).
     ● Missing features (hidden or latent).
     ● Making no errors on the training set might not be possible.

  5. Models of increasing complexity

  6. Noise and model complexity
     ● Use simple models!
       – Easier to use → lower computational complexity.
       – Easier to train → lower space complexity.
       – Easier to explain → more interpretable.
       – Generalize better.
     ● Occam’s razor: simpler explanations are more plausible.

  7. Overfitting
     ● What are the empirical errors of the black and purple classifiers?
     ● Which model seems more likely to be correct?

  8. Overfitting & Underfitting (Regression)
     [Figure: two regression fits on the same data, one overfitting and one underfitting.]

  9. Generalization error vs. model complexity
     [Figure: prediction error as a function of model complexity, on the training data and on new data; low complexity underfits, high complexity overfits.]

  10. Bias-variance tradeoff
     ● Bias: difference between the expected value of the estimator and the true value being estimated.
       – A simpler model has a higher bias.
       – High bias can cause underfitting.
     ● Variance: deviation of the estimates from their expected value.
       – A more complex model has a higher variance.
       – High variance can cause overfitting.

  11. Bias-variance decomposition
     ● Bias: $\mathrm{Bias}(\hat f(x)) = \mathbb{E}[\hat f(x)] - y$
     ● Variance: $\mathrm{Var}(\hat f(x)) = \mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]$
     ● Mean squared error: $\mathbb{E}\big[(\hat f(x) - y)^2\big] = \mathrm{Bias}(\hat f(x))^2 + \mathrm{Var}(\hat f(x))$
     ● Proof?

  12. Bias-variance decomposition
     ● Bias: $\mathrm{Bias}(\hat f(x)) = \mathbb{E}[\hat f(x)] - y$
     ● Variance: $\mathrm{Var}(\hat f(x)) = \mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]$
     ● Mean squared error: $\mathbb{E}\big[(\hat f(x) - y)^2\big] = \mathrm{Bias}(\hat f(x))^2 + \mathrm{Var}(\hat f(x))$
       (the expectations are over training sets; $f$ and $y$ are deterministic, so there is no noise term).
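
     The proof asked for on the previous slide is a few lines; here is a sketch in LaTeX notation, expanding around $\mathbb{E}[\hat f(x)]$ (the $\hat f$, $y$ notation follows the reconstruction above):

        \begin{align*}
        \mathbb{E}\big[(\hat f(x) - y)^2\big]
          &= \mathbb{E}\big[\big(\hat f(x) - \mathbb{E}[\hat f(x)] + \mathbb{E}[\hat f(x)] - y\big)^2\big] \\
          &= \mathbb{E}\big[\big(\hat f(x) - \mathbb{E}[\hat f(x)]\big)^2\big]
             + 2\,\big(\mathbb{E}[\hat f(x)] - y\big)\,\mathbb{E}\big[\hat f(x) - \mathbb{E}[\hat f(x)]\big]
             + \big(\mathbb{E}[\hat f(x)] - y\big)^2 \\
          &= \mathrm{Var}\big(\hat f(x)\big) + \mathrm{Bias}\big(\hat f(x)\big)^2,
        \end{align*}

     since $\mathbb{E}\big[\hat f(x) - \mathbb{E}[\hat f(x)]\big] = 0$ and $\mathbb{E}[\hat f(x)] - y$ is a constant.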

  13. Generalization error vs. model complexity
     [Figure: same plot as slide 9; the low-complexity (underfitting) regime has high bias and low variance, the high-complexity (overfitting) regime has low bias and high variance.]

  14. Model selection & generalization
     ● Well-posed problems (Hadamard, on the mathematical modelling of physical phenomena):
       – a solution exists;
       – it is unique;
       – the solution changes continuously with the initial conditions.
     ● Learning is an ill-posed problem: data helps carve out the hypothesis space, but data alone is not sufficient to find a unique solution.
     ● Need for an inductive bias: assumptions about the hypothesis space.
       Model selection: how do we choose the “right” inductive bias?

  15. How do we decide a model is good?

  16. Learning objectives
     After this lecture you should be able to design experiments to select and evaluate supervised machine learning models. Concepts:
     ● training and testing sets;
     ● cross-validation;
     ● bootstrap;
     ● measures of performance for classifiers and regressors;
     ● measures of model complexity.

  17–23. Supervised learning setting
     ● Training set: $\mathcal{D} = \{(x^i, y^i)\}_{i=1,\dots,n}$, with $x^i \in \mathcal{X}$ and $y^i \in \mathcal{Y}$.
     ● Classification: $\mathcal{Y} = \{-1, +1\}$.
     ● Regression: $\mathcal{Y} = \mathbb{R}$.
     ● Goal: find $f: \mathcal{X} \to \mathcal{Y}$ such that $f(x^i) \approx y^i$.
     ● Empirical error of $f$ on the training set, given a loss $L$:
       $\hat E(f) = \frac{1}{n} \sum_{i=1}^{n} L\big(f(x^i), y^i\big)$
       – E.g. (classification) 0/1 loss: $L\big(f(x), y\big) = \mathbb{1}_{f(x) \neq y}$
       – E.g. (regression) squared loss: $L\big(f(x), y\big) = \big(f(x) - y\big)^2$
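
     To make the definition concrete, here is a minimal numpy sketch (not part of the original slides) computing the empirical error under a 0/1 loss and a squared loss; the toy data and the constant classifier are illustrative only.

        import numpy as np

        def empirical_error(f, X, y, loss):
            """Average loss of predictor f over a labelled sample (X, y)."""
            y_pred = np.array([f(x) for x in X])
            return np.mean(loss(y_pred, y))

        # 0/1 loss for classification: 1 if the label is wrong, 0 otherwise.
        zero_one_loss = lambda y_pred, y: (y_pred != y).astype(float)

        # Squared loss for regression.
        squared_loss = lambda y_pred, y: (y_pred - y) ** 2

        # Toy usage: a constant classifier on a 4-point training set.
        X_toy = np.array([[0.], [1.], [2.], [3.]])
        y_toy = np.array([-1, -1, +1, +1])
        f_const = lambda x: +1
        print(empirical_error(f_const, X_toy, y_toy, zero_one_loss))  # 0.5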

  24. Generalization error
     ● The empirical error on the training set is a poor estimate of the generalization error (the expected error on new data).
       If the model is overfitting, the generalization error can be arbitrarily large.
     ● We would like to estimate the generalization error on new data, which we do not have.

  25. Validation sets
     ● Choose the model that performs best on a validation set, separate from the training set.
       [Figure: the data split into a training part and a validation part.]
     ● Because we have not used the validation data at any point during training, the validation set can be considered “new data”, and the error on the validation set is an estimate of the generalization error.

  26. Model selection
     ● What if we want to choose among k models?
       – Train each model on the training set.
       – Compute the prediction error of each model on the validation set.
       – Pick the model with the smallest prediction error on the validation set.
     ● What is the generalization error?
       – We don’t know!
       – The validation data was used to select the model.
       – We have “cheated” and looked at the validation data: it is not a good proxy for new, unseen data any more.

  27. Validation sets
     ● Hence we need to set aside part of the data, the test set, which remains untouched during the entire procedure and on which we’ll estimate the generalization error.
     ● Model selection: pick the best model.
     ● Model assessment: estimate its prediction error on new data.
       [Figure: the data split into training, validation and test parts.]
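
     A minimal sketch of this train/validation/test protocol, assuming scikit-learn is available; the toy data, candidate models (k-NN with different k) and split proportions are illustrative choices, not prescribed by the slides.

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.metrics import zero_one_loss

        # Toy labelled data set (illustrative only).
        rng = np.random.RandomState(0)
        X = rng.randn(200, 5)
        y = (X[:, 0] + 0.5 * rng.randn(200) > 0).astype(int)

        # 60% training, 20% validation, 20% test.
        X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
        X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

        # Candidate models: k-NN with different numbers of neighbours.
        candidates = {k: KNeighborsClassifier(n_neighbors=k) for k in (1, 5, 15)}

        # Model selection: pick the candidate with the smallest validation error.
        val_errors = {}
        for k, model in candidates.items():
            model.fit(X_train, y_train)
            val_errors[k] = zero_one_loss(y_val, model.predict(X_val))
        best_k = min(val_errors, key=val_errors.get)

        # Model assessment: estimate the generalization error on the untouched test set.
        test_error = zero_one_loss(y_test, candidates[best_k].predict(X_test))
        print(best_k, test_error)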

  28.
     ● How much data should go in each of the training, validation and test sets?
     ● How do we know we have enough data to evaluate the prediction and generalization errors?
     ● Empirical evaluation with sample re-use:
       – cross-validation
       – bootstrap
     ● Analytical tools:
       – Mallows’ Cp, AIC, BIC
       – MDL.

  29. Sample re-use

  30. Cross-validation
     ● Cut the training set into K separate folds.
     ● For each fold, train on the (K−1) remaining folds and validate on the held-out fold.
       [Figure: K splits of the data; in each split a different fold serves as the validation set.]

  31. Cross-validated performance
     ● Cross-validation estimate of the prediction error:
       $\widehat{\mathrm{Err}}_{CV} = \frac{1}{n} \sum_{i=1}^{n} L\big(y^i, \hat f^{-k(i)}(x^i)\big)$,
       where $\hat f^{-k(i)}$ is computed with the $k(i)$-th part of the data removed, and $k(i)$ is the fold in which $i$ is;
       or, fold by fold:
       $\widehat{\mathrm{Err}}_{CV} = \frac{1}{K} \sum_{l=1}^{K} \frac{1}{|F_l|} \sum_{i \in F_l} L\big(y^i, \hat f^{-l}(x^i)\big)$, with $F_l$ the $l$-th fold.
     ● Estimates the expected prediction error
       $\mathrm{Err} = \mathbb{E}\big[L\big(Y, \hat f(X)\big)\big]$, where $(X, Y)$ is an (independent) test sample.

  32. Issues with cross-validation
     ● Training set size becomes (K−1)n/K.
       Why is this a problem?

  33. Issues with cross-validation
     ● Training set size becomes (K−1)n/K
       ⇒ small training set ⇒ biased estimator of the error.
     ● Leave-one-out cross-validation: K = n
       – approximately unbiased estimator of the expected prediction error;
       – potentially high variance (the training sets are very similar to each other);
       – computation can become burdensome (n repeats).
     ● In practice: set K = 5 or K = 10.
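
     In practice, K-fold cross-validation is a few lines with scikit-learn; a sketch with K = 5 (the logistic regression model and the toy data below are placeholders, not from the slides).

        import numpy as np
        from sklearn.model_selection import KFold, cross_val_score
        from sklearn.linear_model import LogisticRegression

        rng = np.random.RandomState(0)
        X = rng.randn(100, 4)
        y = (X[:, 0] + 0.5 * rng.randn(100) > 0).astype(int)

        model = LogisticRegression()

        # Manual K-fold loop: train on K-1 folds, evaluate on the held-out fold.
        kf = KFold(n_splits=5, shuffle=True, random_state=0)
        fold_errors = []
        for train_idx, val_idx in kf.split(X):
            model.fit(X[train_idx], y[train_idx])
            fold_errors.append(np.mean(model.predict(X[val_idx]) != y[val_idx]))
        cv_error = np.mean(fold_errors)

        # cross_val_score wraps the same kind of loop (it reports accuracy by default;
        # its folds differ from the manual loop above, so the numbers are close, not equal).
        cv_accuracy = cross_val_score(model, X, y, cv=5).mean()
        print(cv_error, 1 - cv_accuracy)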

  34. Bootstrap
     ● Randomly draw datasets with replacement from the training data.
     ● Repeat B times (typically, B = 100) ⇒ B models.
     ● Leave-one-out bootstrap error:
       – For each training point i, predict with the $b_i < B$ models that did not have i in their training set.
       – Average the prediction errors.
     ● Each training set contains how many distinct examples?

  35. Bootstrap
     ● Randomly draw datasets with replacement from the training data.
     ● Repeat B times (typically, B = 100) ⇒ B models.
     ● Leave-one-out bootstrap error:
       – For each training point i, predict with the $b_i < B$ models that did not have i in their training set.
       – Average the prediction errors.
     ● Each training set contains about 0.632·n distinct examples (the probability that a given point appears in a bootstrap sample is $1 - (1 - 1/n)^n \approx 1 - e^{-1} \approx 0.632$)
       ⇒ same issue as with cross-validation.
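
     A sketch of the leave-one-out bootstrap error described above, in plain numpy; the 3-NN classifier and the toy data are placeholders chosen for illustration.

        import numpy as np
        from sklearn.neighbors import KNeighborsClassifier

        def loo_bootstrap_error(X, y, make_model, B=100, random_state=0):
            """Leave-one-out bootstrap estimate of the 0/1 prediction error."""
            rng = np.random.RandomState(random_state)
            n = len(y)
            errors = [[] for _ in range(n)]  # errors[i]: losses of models not trained on i
            for b in range(B):
                idx = rng.randint(0, n, size=n)               # bootstrap sample, with replacement
                model = make_model().fit(X[idx], y[idx])
                out_of_bag = np.setdiff1d(np.arange(n), idx)  # points left out of this sample
                for i in out_of_bag:
                    errors[i].append(float(model.predict(X[i:i+1])[0] != y[i]))
            # Average only over points that were left out at least once.
            return np.mean([np.mean(e) for e in errors if e])

        rng = np.random.RandomState(0)
        X = rng.randn(60, 3)
        y = (X[:, 0] > 0).astype(int)
        print(loo_bootstrap_error(X, y, lambda: KNeighborsClassifier(n_neighbors=3)))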

  36. Evaluating model performance

  37. Classification model evaluation
     ● Confusion matrix:
                                True class
                                −1                 +1
        Predicted class  −1     True Negatives     False Negatives
                         +1     False Positives    True Positives
     ● False positives (false alarms) are also called type I errors.
     ● False negatives (misses) are also called type II errors.

  38.
     ● Sensitivity = Recall = True positive rate (TPR) = TP / # positives = TP / (TP + FN)
     ● Specificity = True negative rate (TNR) = TN / # negatives = TN / (TN + FP)
     ● Precision = Positive predictive value (PPV) = TP / # predicted positives = TP / (TP + FP)
     ● False discovery rate (FDR) = FP / # predicted positives = FP / (TP + FP)

  39.
     ● Accuracy = (TP + TN) / (TP + TN + FP + FN)
     ● F1-score = harmonic mean of precision and sensitivity = 2 · Precision · Recall / (Precision + Recall)
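
     These metrics are easy to recompute directly from the confusion-matrix counts; a sketch on made-up labels in {−1, +1} (scikit-learn’s sklearn.metrics module also provides most of them ready-made).

        import numpy as np

        def classification_metrics(y_true, y_pred):
            """Compute the slides' metrics from binary labels in {-1, +1}."""
            tp = np.sum((y_pred == 1) & (y_true == 1))
            tn = np.sum((y_pred == -1) & (y_true == -1))
            fp = np.sum((y_pred == 1) & (y_true == -1))   # type I errors (false alarms)
            fn = np.sum((y_pred == -1) & (y_true == 1))   # type II errors (misses)
            sensitivity = tp / (tp + fn)                  # recall, TPR
            specificity = tn / (tn + fp)                  # TNR
            precision = tp / (tp + fp)                    # PPV
            fdr = fp / (tp + fp)
            accuracy = (tp + tn) / (tp + tn + fp + fn)
            f1 = 2 * precision * sensitivity / (precision + sensitivity)
            return dict(sensitivity=sensitivity, specificity=specificity,
                        precision=precision, fdr=fdr, accuracy=accuracy, f1=f1)

        y_true = np.array([+1, +1, +1, -1, -1, -1, -1, +1])
        y_pred = np.array([+1, -1, +1, -1, +1, -1, -1, +1])
        print(classification_metrics(y_true, y_pred))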
