  1. Department of Computer Science CSCI 5622: Machine Learning. Chenhao Tan. Lecture 17: Midterm review

  2. • Theory
        • PAC learning
        • Bias-variance tradeoff
        • Model selection
     • Methods
        • K-nearest neighbor
        • Naïve Bayes
        • Linear regression
        • Regularization
        • Logistic regression
        • Neural networks
        • SVM
        • Multi-class classification
        • Feature engineering
        • Boosting

  3. Supervised learning

  4. PAC Learnability

  5. PAC Learnability

  6. PAC Learnability
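
The three PAC slides above carried only equations, which did not survive extraction. For reference, the standard definition they cover (a sketch of the usual statement, not necessarily the slides' exact notation): a concept class C is PAC-learnable if there is an algorithm A such that for every target c \in C, every distribution D, and all \epsilon, \delta \in (0, 1), given m \ge \mathrm{poly}(1/\epsilon, 1/\delta) i.i.d. examples, A returns a hypothesis h with

    \Pr[\, R(h) \le \epsilon \,] \ge 1 - \delta

where R(h) = \Pr_{x \sim D}[\, h(x) \ne c(x) \,] is the generalization error.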

  7. (equation/figure slide; no extractable text)

  8. Generalization error bounds
     • Finite consistent hypothesis class
     • Finite inconsistent hypothesis class

  9. Finite Consistent Hypothesis Class

  10. Finite Inconsistent Hypothesis Class
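
Slides 9-10 are equation slides in the original deck; the standard bounds for the two cases are worth restating. For a finite hypothesis class H and m i.i.d. samples, with probability at least 1 - \delta:

    consistent case (zero training error):
        R(h) \le \frac{1}{m} \left( \ln|H| + \ln\frac{1}{\delta} \right)

    inconsistent case (uniformly over all h \in H):
        R(h) \le \hat{R}(h) + \sqrt{ \frac{ \ln|H| + \ln(2/\delta) }{ 2m } }

where \hat{R}(h) is the empirical (training) error.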

  11. Bias-variance tradeoff
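
The following slides (12-16) are figure/equation slides; the decomposition they typically illustrate is the standard one for squared loss. With y = f(x) + \epsilon, \mathbb{E}[\epsilon] = 0, \mathrm{Var}(\epsilon) = \sigma^2, and a learned predictor \hat{h}:

    \mathbb{E}\big[ (y - \hat{h}(x))^2 \big] = \underbrace{\big( f(x) - \mathbb{E}[\hat{h}(x)] \big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\big[ ( \hat{h}(x) - \mathbb{E}[\hat{h}(x)] )^2 \big]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{noise}}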

  12.–16. (equation/figure slides; no extractable text)

  17. • Theory
        • PAC learning
        • Bias-variance tradeoff
        • Model selection
     • Methods
        • K-nearest neighbor
        • Naïve Bayes
        • Linear regression
        • Regularization
        • Logistic regression
        • Neural networks
        • SVM
        • Multi-class classification
        • Feature engineering
        • Boosting

  18. • Methods
        • Model
        • Algorithm

  19. K-nearest neighbors

  20. K-nearest neighbors
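
As a refresher on slides 19-20, here is a minimal k-NN classifier in plain numpy; the function name, the Euclidean-distance choice, and the majority-vote rule are illustrative assumptions rather than details from the slides:

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x, k=3):
        # Distance from the query point to every training point
        dists = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k nearest training points
        nearest = np.argsort(dists)[:k]
        # Majority vote over the neighbors' labels
        return Counter(y_train[nearest]).most_common(1)[0][0]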

  21.–22. (equation/figure slides; no extractable text)

  23. For text classification with Laplace smoothing:
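
The smoothed estimate this slide refers to (the equation itself did not survive extraction) is the standard multinomial naïve Bayes estimate over a vocabulary V:

    P(w \mid c) = \frac{ \mathrm{count}(w, c) + 1 }{ \sum_{w' \in V} \mathrm{count}(w', c) + |V| }

Prediction then takes \arg\max_c P(c) \prod_i P(w_i \mid c), usually computed with log probabilities to avoid underflow.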

  24. (equation slide; no extractable text)

  25. Linear regression
     • Data are continuous inputs and outputs

  26. Objective function (model)
     • The objective function is called the residual sum of squares:
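
For reference, the residual sum of squares and its closed-form minimizer in standard form (the slide's own equations were lost in extraction):

    \mathrm{RSS}(\beta) = \sum_{i=1}^{m} ( y_i - \beta^\top x_i )^2, \qquad \hat{\beta} = (X^\top X)^{-1} X^\top y

where X stacks the inputs row-wise and y the responses.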

  27. Probabilistic interpretation
     • A discriminative model that assumes the response is Gaussian, with mean given by the linear function of the input
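
Concretely, the assumption is y \mid x \sim \mathcal{N}(\beta^\top x, \sigma^2). Maximizing the log-likelihood

    \ell(\beta) = \sum_i \log \mathcal{N}(y_i;\, \beta^\top x_i, \sigma^2) = \text{const} - \frac{1}{2\sigma^2} \sum_i ( y_i - \beta^\top x_i )^2

is therefore equivalent to minimizing the RSS above, which is why least squares has this probabilistic interpretation.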

  28. Regularization

  29. Prior Distribution

  30. Prior Distribution
     • Lasso's prior is peaked at 0, which means we expect many parameters to be exactly zero
     • Ridge's prior is flatter and fatter around 0, which means we expect many coefficients to be small but nonzero
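
The MAP correspondence behind these two bullets, restated since the slide's equations were lost: a Laplace prior p(\beta_j) \propto \exp(-\lambda |\beta_j|) yields the lasso objective, while a Gaussian prior p(\beta_j) \propto \exp(-\lambda \beta_j^2) yields ridge:

    \text{lasso: } \min_\beta \mathrm{RSS}(\beta) + \lambda \|\beta\|_1 \qquad \text{ridge: } \min_\beta \mathrm{RSS}(\beta) + \lambda \|\beta\|_2^2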

  31.–32. (equation/figure slides; no extractable text)

  33. Logistic regression
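
The model and gradient usually reviewed here (standard forms; the slide is equation-only in this extraction):

    P(y = 1 \mid x) = \sigma(w^\top x) = \frac{1}{1 + e^{-w^\top x}}

with log-likelihood gradient \sum_i ( y_i - \sigma(w^\top x_i) )\, x_i, so gradient ascent updates w by stepping in that direction.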

  34. Neural networks

  35. Gradient descent

  36. Stochastic gradient descent
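
A minimal numpy sketch contrasting the two update schemes, using the linear-regression RSS as the example objective (function names and the fixed step size eta are illustrative assumptions):

    import numpy as np

    def gradient_descent(X, y, eta=0.01, steps=1000):
        w = np.zeros(X.shape[1])
        for _ in range(steps):
            grad = X.T @ (X @ w - y)   # full-batch gradient of RSS/2
            w -= eta * grad
        return w

    def sgd(X, y, eta=0.01, epochs=10):
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            for i in np.random.permutation(len(y)):   # one example at a time, shuffled
                grad = (X[i] @ w - y[i]) * X[i]
                w -= eta * grad
        return w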

  37. Forward algorithm

  38. Backpropagation
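
A compact numpy sketch of the forward pass plus backpropagation for a one-hidden-layer network with sigmoid units and squared loss; the architecture, loss, and activation choices are illustrative assumptions, not the slides' exact derivation:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward_backward(x, y, W1, W2):
        # Forward pass: compute and cache each layer's activations
        h = sigmoid(W1 @ x)       # hidden activations
        yhat = sigmoid(W2 @ h)    # network output
        # Backward pass: apply the chain rule from the output layer inward
        delta2 = (yhat - y) * yhat * (1 - yhat)   # error at the output pre-activation
        dW2 = np.outer(delta2, h)
        delta1 = (W2.T @ delta2) * h * (1 - h)    # error propagated to the hidden layer
        dW1 = np.outer(delta1, x)
        return dW1, dW2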

  39. Neural network techniques
     • Momentum
     • Dropout
     • Batch normalization
     • Weight initialization

  40. Hard-margin SVM

  41. Soft-margin SVM
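
The primal problems these two slides review, in their standard form (the slide equations were lost in extraction):

    \text{hard margin: } \min_{w, b} \ \tfrac{1}{2}\|w\|^2 \quad \text{s.t. } y_i (w^\top x_i + b) \ge 1

    \text{soft margin: } \min_{w, b, \xi} \ \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i \quad \text{s.t. } y_i (w^\top x_i + b) \ge 1 - \xi_i, \ \xi_i \ge 0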

  42. KKT conditions

  43. KKT conditions
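
For the soft-margin dual with multipliers \alpha_i \in [0, C], the KKT conditions yield the usual support-vector picture (standard result, restated since these slides are equation-only here):

    \alpha_i = 0 \ \Rightarrow\ y_i(w^\top x_i + b) \ge 1 \quad \text{(outside the margin)}
    0 < \alpha_i < C \ \Rightarrow\ y_i(w^\top x_i + b) = 1 \quad \text{(on the margin)}
    \alpha_i = C \ \Rightarrow\ y_i(w^\top x_i + b) \le 1 \quad \text{(inside the margin or misclassified)}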

  44. SMO algorithm

  45. Kernels
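
Common kernels worth having memorized (standard definitions):

    \text{linear: } K(x, x') = x^\top x' \qquad \text{polynomial: } K(x, x') = (x^\top x' + c)^d \qquad \text{RBF: } K(x, x') = \exp\!\left( -\frac{\|x - x'\|^2}{2\sigma^2} \right)

The kernel trick replaces every inner product \langle \phi(x), \phi(x') \rangle in the dual with K(x, x'), so the feature map \phi never needs to be computed explicitly.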

  46. (equation/figure slide; no extractable text)

  47. Feature engineering

  48. Multi-class Classification
     • Reduction
        • One-against-all
        • All-pairs

  49. One-against-all
     • Break a k-class problem into k binary problems and solve each separately
     • Combine predictions: evaluate all hypotheses and take the one with the highest confidence

  50. All-pairs
     • Break a k-class problem into k(k-1)/2 binary problems and solve each separately
     • Combine predictions: evaluate all hypotheses and take the class with the highest summed confidence
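
A minimal one-against-all sketch matching slide 49 (illustrative; assumes a scikit-learn-style binary learner exposing fit and decision_function):

    import numpy as np

    def train_one_vs_all(X, y, classes, make_learner):
        # One binary problem per class: this class vs. everything else
        return {c: make_learner().fit(X, (y == c).astype(int)) for c in classes}

    def predict_one_vs_all(models, x):
        # Evaluate every binary hypothesis and take the most confident one
        scores = {c: m.decision_function(x.reshape(1, -1))[0] for c, m in models.items()}
        return max(scores, key=scores.get)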

  51. (equation/figure slide; no extractable text)

  52. Ensemble methods
     • Bagging
        • Train classifiers on subsets of the data
        • Predict based on majority vote
     • Stacking
        • Take multiple classifiers' outputs as inputs and train another classifier to make the final prediction
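
A bagging sketch matching the first bullet group above (illustrative names; assumes small integer class labels and a base learner with fit/predict):

    import numpy as np

    def bagging(X, y, make_learner, n_models=25, seed=0):
        rng = np.random.default_rng(seed)
        models = []
        for _ in range(n_models):
            idx = rng.integers(0, len(y), size=len(y))  # bootstrap sample, with replacement
            models.append(make_learner().fit(X[idx], y[idx]))
        return models

    def predict_majority(models, X):
        votes = np.stack([m.predict(X) for m in models])  # shape: (n_models, n_examples)
        # Majority vote: most frequent label in each column
        return np.array([np.bincount(col).argmax() for col in votes.T])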

  53. AdaBoost

  54. AdaBoost
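
The AdaBoost quantities these two slides derive, restated in standard form since the equations did not survive extraction. For labels y_i \in \{-1, +1\}, weak hypothesis h_t, example weights D_t, and weighted error \epsilon_t = \sum_i D_t(i)\, \mathbf{1}[h_t(x_i) \ne y_i]:

    \alpha_t = \tfrac{1}{2} \ln\frac{1 - \epsilon_t}{\epsilon_t}, \qquad D_{t+1}(i) = \frac{ D_t(i) \exp( -\alpha_t y_i h_t(x_i) ) }{ Z_t }

and the final classifier is H(x) = \mathrm{sign}\!\left( \sum_t \alpha_t h_t(x) \right).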

  55. Good luck!
