

  1. Department of Computer Science, CSCI 5622: Machine Learning. Chenhao Tan. Lecture 12: Regularization, regression, and multi-class classification. Slides adapted from Jordan Boyd-Graber and Chris Ketelsen.

  2. HW 2

  3. Learning objectives • Review homework and multi-class classification • Linear regression • Examine regularization in the regression context • Recognize the effects of regularization on bias/variance

  4. Outline • Multi-class classification • Linear regression • Regularization

  5. Outline • Multi-class classification • Linear regression • Regularization

  6. Multi-class classification • Binary examples • Spam classification • Sentiment classification

  7. Multi-class classification • Binary examples • Spam classification • Sentiment classification • Multi-class examples • Star-ratings classification • Part-of-speech tagging • Image classification

  8. What we learned so far • KNN • Naïve Bayes • Logistic regression • Neural networks • Support vector machines

  9. Binary vs. multi-class classification

  10. Multi-class logistic regression

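A minimal sketch of the softmax form of multi-class logistic regression, assuming one weight vector per class (the variable names and toy shapes below are illustrative, not from the slides):

```python
import numpy as np

def softmax(scores):
    # Shift by the row max for numerical stability before exponentiating.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def predict_proba(X, W):
    # One weight column per class: scores have shape (n_examples, k).
    return softmax(X @ W)

# Toy usage: 3 examples, 2 features, 4 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 2))
W = rng.normal(size=(2, 4))
print(predict_proba(X, W).sum(axis=1))  # each row of probabilities sums to 1
```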

  12. Multi-class Support Vector Machines • Reduction • One-against-all • All-pairs • Modify the objective function (SSBD 17.2)

  13. Reduction

  14. How do we use a binary classifier to output categorical labels?

  15. One-against-all

  16. One-against-all • Break a k-class problem into k binary problems and solve each separately • Combine predictions: evaluate all hypotheses and take the one with the highest confidence

  17. One-against-all
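
A minimal one-against-all sketch using scikit-learn; the iris dataset and logistic-regression base learner are illustrative choices, not ones the slides prescribe:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)
# Fits k binary classifiers, one per class; predict() picks the class
# whose binary classifier is most confident.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(clf.predict(X[:5]))
```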

  18. All-pairs

  19. All-pairs • Break a k-class problem into k(k−1)/2 binary problems (one per pair of classes) and solve each separately • Combine predictions: evaluate all hypotheses and take the class with the highest summed confidence

  20. All-pairs
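
The corresponding all-pairs sketch; with the 3 iris classes this trains k(k−1)/2 = 3 binary classifiers (dataset and base learner are again illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier

X, y = load_iris(return_X_y=True)
# Fits one binary classifier per pair of classes; prediction aggregates
# the pairwise votes across all of them.
clf = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
print(clf.predict(X[:5]))
```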

  21. Outline • Multi-class classification • Linear regression • Regularization

  22. Linear regression • Inputs and outputs are continuous

  23. Linear regression examples • Given a person's age and gender, predict their height • Given the square footage and number of bathrooms in a house, predict its sale price • Given unemployment, inflation, number of wars, and economic growth, predict the president's approval rating • Given a user's browsing history, predict how long they will stay on a product page • Given the advertising budget expenditures in various markets, predict the number of products sold

  24. Linear regression example

  25. Linear regression example

  26. Derived features

  27. Derived features
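
The derived-features slides are figures in the deck; as a small sketch of the idea, here are polynomial features of a single input (degree 3 is an assumption for illustration):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Derive [1, x, x^2, x^3] from a scalar input x, so a *linear* model in
# the derived features can fit a cubic curve in the original input.
x = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
X_derived = PolynomialFeatures(degree=3).fit_transform(x)
print(X_derived.shape)  # (5, 4)
```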

  28. Objective function The objective function is called the residual sum of squares: RSS(w) = Σᵢ (yᵢ − wᵀxᵢ)²
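
A small sketch of this objective and its least-squares minimizer (the toy data below are made up for illustration):

```python
import numpy as np

def rss(w, X, y):
    # Residual sum of squares: sum_i (y_i - w . x_i)^2
    residuals = y - X @ w
    return residuals @ residuals

# Toy data: a bias column plus one feature.
X = np.column_stack([np.ones(4), np.arange(4.0)])
y = np.array([1.0, 2.1, 2.9, 4.2])
# lstsq minimizes the RSS directly and is more robust than
# explicitly forming (X^T X)^{-1} X^T y.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_hat, rss(w_hat, X, y))
```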

  29. Probabilistic interpretation A discriminative model that assumes the response is Gaussian with mean wᵀx

  30. Probabilistic interpretation A discriminative model that assumes the response is Gaussian with mean wᵀx

  31. Probabilistic interpretation Assuming i.i.d. samples, we can write the likelihood of the data as L(w) = ∏ᵢ p(yᵢ | xᵢ, w)

  32. Probabilistic interpretation Negative log likelihood

  33. Probabilistic interpretation Negative log likelihood
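
Under this model (a reconstruction consistent with the slide titles, assuming y_i = wᵀx_i + ε_i with i.i.d. Gaussian noise ε_i ~ N(0, σ²)), the negative log likelihood works out to:

```latex
% Likelihood of one sample under the Gaussian noise model
p(y_i \mid \mathbf{x}_i, \mathbf{w})
  = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{(y_i - \mathbf{w}^\top \mathbf{x}_i)^2}{2\sigma^2}\right)

% Negative log likelihood of n i.i.d. samples
-\log L(\mathbf{w})
  = \frac{n}{2}\log(2\pi\sigma^2)
  + \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - \mathbf{w}^\top \mathbf{x}_i)^2
```

The first term does not depend on w, so maximizing the likelihood is exactly minimizing the RSS.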

  34. Revisiting the Bias-variance Tradeoff • Consider the case of fitting linear regression with derived polynomial features to a set of training data • In general, we want a model that explains the training data and can still generalize to unseen test data

  35. Revisiting the Bias-variance Tradeoff

  36. Outline • Multi-class classification • Linear regression • Regularization

  37. High variance • The model wiggles wildly to get close to the data • To get big swings, the model coefficients become very large • Weights can grow to the order of 10^6

  38. Regularization • Keep all the features, but force the coefficients to be smaller • This is called regularization

  39. Regularization • Add a penalty term to the RSS objective function • Balance between small RSS and small coefficients

  40. Regularization • Add a penalty term to the RSS objective function • Balance between small RSS and small coefficients • HW 2 extra credit question
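
In symbols, the penalized objective trades off fit against coefficient size; the ridge (squared ℓ2) penalty is shown as one example, with λ ≥ 0 controlling the balance:

```latex
\min_{\mathbf{w}} \;
  \underbrace{\sum_{i=1}^{n} (y_i - \mathbf{w}^\top \mathbf{x}_i)^2}_{\text{RSS}}
  \;+\;
  \underbrace{\lambda \|\mathbf{w}\|_2^2}_{\text{penalty}}
```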


  42. Regularization

  43. Ridge regularization

  44. Ridge regularization
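
A hedged sketch of ridge regression via its closed form; in practice the bias term is usually left unpenalized, which this toy version ignores:

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Minimizes RSS(w) + lam * ||w||^2, whose closed-form solution is
    # w = (X^T X + lam * I)^{-1} X^T y.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = rng.normal(size=20)
print(ridge_fit(X, y, lam=1.0))  # larger lam shrinks the weights
```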

  45. Bias-variance tradeoff As the penalty λ decreases: A. the bias increases, the variance increases B. the bias increases, the variance decreases C. the bias decreases, the variance increases D. the bias decreases, the variance decreases

  46. Ridge regularization vs. lasso regularization • How do the coefficients behave as λ increases?

  47. Ridge regularization • Coefficients shrink toward zero smoothly and uniformly

  48. Lasso regularization • Some coefficients shrink to exactly zero very fast
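
A sketch of both behaviors on a standard dataset (load_diabetes and the λ grid are illustrative choices; scikit-learn calls the penalty weight alpha):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, Ridge

X, y = load_diabetes(return_X_y=True)
for lam in np.logspace(-2, 2, 5):
    ridge_w = Ridge(alpha=lam).fit(X, y).coef_
    lasso_w = Lasso(alpha=lam, max_iter=100000).fit(X, y).coef_
    # Ridge shrinks all coefficients smoothly; lasso drives some to exactly 0.
    print(f"lambda={lam:7.2f}  ridge nonzero={np.count_nonzero(ridge_w)}  "
          f"lasso nonzero={np.count_nonzero(lasso_w)}")
```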

  49. Ridge regularization vs. lasso regularization • Why does the choice between the two types of regularization lead to very different behavior? • Several ways to look at it • Constrained minimization • A simplified case of the data • Prior probabilities on parameters

  50. Intuition 1: Constrained Minimization

  51. Intuition 1: Constrained Minimization

  52. Intuition 1: Constrained Minimization The minimum is more likely to lie at a corner of the diamond with lasso, causing some feature weights to be set exactly to zero.
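
The constrained forms behind this picture; each penalized problem is equivalent to a constrained one for some budget t:

```latex
\text{ridge:}\quad \min_{\mathbf{w}} \mathrm{RSS}(\mathbf{w})
  \quad \text{s.t.}\quad \|\mathbf{w}\|_2^2 \le t
\qquad
\text{lasso:}\quad \min_{\mathbf{w}} \mathrm{RSS}(\mathbf{w})
  \quad \text{s.t.}\quad \|\mathbf{w}\|_1 \le t
```

The ℓ1 ball is a diamond with corners on the coordinate axes, so the RSS contours tend to first touch it at a corner, where some coordinates are exactly zero; the round ℓ2 ball has no such corners.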

  53. Intuition 2: A Simplified Case

  54. Intuition 2: A Simplified Case

  55. Intuition 2: A Simplified Case

  56. Intuition 2: A Simplified Case
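
A standard version of the simplified case, assuming an orthonormal design (XᵀX = I) so the objective decouples per coordinate; the deck's exact setup may differ, and the lasso threshold depends on how the penalty is scaled:

```latex
\hat{w}_j^{\text{ridge}} = \frac{\hat{w}_j^{\text{OLS}}}{1 + \lambda}
\qquad
\hat{w}_j^{\text{lasso}} = \operatorname{sign}\!\left(\hat{w}_j^{\text{OLS}}\right)
  \left( \left|\hat{w}_j^{\text{OLS}}\right| - \tfrac{\lambda}{2} \right)_{+}
```

Ridge rescales every coefficient by the same factor, while lasso subtracts a constant and clips at zero, so coefficients smaller than the threshold vanish entirely.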

  57. Intuition 3: Prior Distribution

  58. Intuition 3: Prior Distribution

  59. Intuition 3: Prior Distribution

  60. Intuition 3: Prior Distribution

  61. Intuition 3: Prior Distribution • Lasso's prior is peaked at 0, meaning we expect many parameters to be exactly zero • Ridge's prior is flatter and fatter around 0, meaning we expect many coefficients to be smallish
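
The MAP view that matches these bullets: under the Gaussian noise model, a zero-mean Gaussian prior on the weights yields the ridge penalty and a Laplace prior yields the lasso penalty, with λ set by the noise and prior scales:

```latex
% Gaussian prior -> ridge (lambda = sigma^2 / tau^2)
w_j \sim \mathcal{N}(0, \tau^2):\quad
  -2\sigma^2 \log p(\mathbf{w} \mid \mathcal{D})
  = \mathrm{RSS}(\mathbf{w}) + \frac{\sigma^2}{\tau^2}\|\mathbf{w}\|_2^2 + \text{const}

% Laplace prior -> lasso (lambda = 2 sigma^2 / b)
w_j \sim \mathrm{Laplace}(0, b):\quad
  -2\sigma^2 \log p(\mathbf{w} \mid \mathcal{D})
  = \mathrm{RSS}(\mathbf{w}) + \frac{2\sigma^2}{b}\|\mathbf{w}\|_1 + \text{const}
```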

  62. Wrap up • Regularization and the idea behind it are crucial for machine learning • Always use regularization in some form • Next: ensemble methods
