
Intro to ML
March 10, 2020
Data Science CSCI 1951A, Brown University
Instructor: Ellie Pavlick
HTAs: Josh Levin, Diane Mutako, Sol Zitter

Announcements: This class is going viral! (Funny? No? Too soon?) Not officially, but starting to […]


1. Features
• Recency: Float
• Words in title: String
• Presence of photo: Boolean
• Reading level: Integer

2. Features

| Clicks | Recency | Reading Level | Photo | Title |
| --- | --- | --- | --- | --- |
| 10 | 1.3 | 11 | 1 | "New Tax Guidelines" |
| 1,000 | 1.7 | 3 | 1 | "This 600lb baby…" |
| 1,000,000 | 2.4 | 2 | 1 | "18 reasons you should never look at this cat unless you…" |
| 1 | 5.9 | 19 | 0 | "The Brothers Karamazov: a neo-post-globalist perspective" |

3. Features — the Clicks column is y, the value we want to predict (table as above).

4. Features — the remaining columns (Recency, Reading Level, Photo, Title) are x, the inputs (table as above).

5. Features — numeric features (Recency: Float, Reading Level: Integer) are defined for (nearly) every row (table as above).

6. Features — boolean features are 0 or 1 ("dummy" variables): here, Photo (table as above).

7. Features — strings = boolean features, 0 or 1 ("dummy" variables): the Title column gets expanded on the next slide (table as above).

8. Features — strings = boolean features, 0 or 1 ("dummy" variables):

| Clicks | Recency | Reading Level | Photo | Title: "new" | Title: "tax" | Title: "this" | Title: "…" | … |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10 | 1.3 | 11 | 1 | 1 | 0 | 0 | 0 | … |
| 1,000 | 1.7 | 3 | 1 | 0 | 0 | 1 | 1 | … |
| 1,000,000 | 2.4 | 2 | 1 | 0 | 0 | 1 | 1 | … |
| 1 | 5.9 | 19 | 0 | 0 | 0 | 0 | 0 | … |

9. Features — "sparse features": 0 for most rows (the Title-word columns above are mostly zeros; table as on the previous slide).
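To make the string-to-dummy-variable expansion concrete, here is a minimal sketch using pandas and scikit-learn's CountVectorizer; the DataFrame layout and column names are illustrative assumptions, not code from the lecture.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Toy rows mirroring the table on the slides.
df = pd.DataFrame({
    "clicks":        [10, 1000, 1000000, 1],
    "recency":       [1.3, 1.7, 2.4, 5.9],
    "reading_level": [11, 3, 2, 19],
    "photo":         [1, 1, 1, 0],
    "title": [
        "New Tax Guidelines",
        "This 600lb baby...",
        "18 reasons you should never look at this cat unless you...",
        "The Brothers Karamazov: a neo-post-globalist perspective",
    ],
})

# binary=True yields 0/1 dummy columns: does this word appear in the title?
vec = CountVectorizer(binary=True)
title_dummies = pd.DataFrame(
    vec.fit_transform(df["title"]).toarray(),
    columns=[f"title_{w}" for w in vec.get_feature_names_out()],
)

# X: numeric + boolean + sparse title-word features; y: clicks.
X = pd.concat([df[["recency", "reading_level", "photo"]], title_dummies], axis=1)
y = df["clicks"]
print(X.shape)  # one column per distinct title word -- mostly zeros, i.e. sparse
```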

10. Clicker Question!

11. Clicker Question!
For the problem set up below, how many features will there be? I.e., how many columns in our X matrix (not including Y)?
• Y: happiness
• X1: day of week ("monday", "tuesday", …, "sunday")
• X2: bank account balance (real value)
• X3: breakfast (yes/no)
• X4: whether you have found your inner peace (yes/no)
• X5: words from last week's worth of tweets (assuming tweets are at most 15 words long and there are 100K words in the English vocabulary)
(a) 100,012   (b) 5   (c) 27   (d) 100,010

12. Clicker Question!
For the problem set up below, how many features will there be? I.e., how many columns in our X matrix (not including Y)?
• Y: happiness
• X1: day of week ("monday", "tuesday", …, "sunday") → 7 features
• X2: bank account balance (real value) → 1 feature
• X3: breakfast (yes/no) → 1 feature
• X4: whether you have found your inner peace (yes/no) → 1 feature
• X5: words from last week's worth of tweets (at most 15 words per tweet, 100K words in the English vocabulary) → 100,000 features
(a) 100,012   (b) 5   (c) 27   (d) 100,010 ✓ (7 + 1 + 1 + 1 + 100,000 = 100,010)
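To see why the categorical day-of-week variable alone contributes 7 columns, a quick pandas sketch (illustrative only; not from the lecture):

```python
import pandas as pd

days = pd.Series(["monday", "tuesday", "wednesday", "thursday",
                  "friday", "saturday", "sunday"])
# One 0/1 dummy column per distinct value -> 7 features for X1.
print(pd.get_dummies(days).shape[1])  # 7
# Total: 7 + 1 + 1 + 1 + 100_000 = 100_010 features, i.e. answer (d).
```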

13. Defining an ML problem
• Task = Increase Consumption
• Data = Reading Habits
• Features = ???
• Model
• Objective/Loss Function = squared difference between predicted total number of clicks and actual total number of clicks

14. Defining an ML problem
• Task = Increase Consumption
• Data = Reading Habits
• Features = {Recency: Float, ReadingLevel: Int, Photo: Bool, Title_New: Bool, Title_Tax: Bool, …}
• Model
• Objective/Loss Function = squared difference between predicted total number of clicks and actual total number of clicks

16–20. Model
ML = Function Approximation

21–22. Model
ML = Function Approximation
You define inputs and outputs. (The really hard part.)

23–24. Model
ML = Function Approximation
The machine will (ideally) learn the function (with a lot of help from you). (The part that gets the most attention.)

25–31. Model
#1: Make assumptions about the problem domain.
• How is the data generated?
• How is the decision-making procedure structured?
• What types of dependencies exist?
• Trending buzzword: "inductive biases"
#2: How to train the model?

32. Model
[Scatter plot: clicks (y-axis) vs. reading level (x-axis); the same plot underlies slides 33–40.]

33. Model
Regression: continuous (infinite) output
f(reading level) = # of clicks

34. Model
Classification: discrete (finite) output
f(reading level) = {clicked, not clicked}
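In code, the regression/classification distinction is just the type of the target. A small sketch (the data and the cutoff for "clicked" are illustrative assumptions, not from the slides):

```python
import numpy as np

# Hypothetical per-article features and click counts.
reading_level = np.array([11, 3, 2, 19])
clicks = np.array([10, 1000, 1000000, 1])

# Regression target: a continuous (unbounded) number of clicks.
y_regression = clicks

# Classification target: a discrete label. The 100-click cutoff is arbitrary,
# chosen only so this toy example produces both classes.
y_classification = np.where(clicks > 100, "clicked", "not clicked")
```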

35. Model
clicks = m(reading_level) + b, with m = -2.4
Linear Regression → the specific "model" we are using here.

36. Model
clicks = m(reading_level) + b, with m = -2.4
clicks → the output / labels / target

37. Model
clicks = m(reading_level) + b, with m = -2.4
reading level → the "feature," which is observed/derived from the data

38. Model
clicks = m(reading_level) + b, with m = -2.4
m and b → the "parameters," which need to be set (by looking at data)

39. Model
clicks = m(reading_level) + b, with m = cov(rl, c) / var(rl)
Estimating the parameters from data goes by many names: "setting parameters," "learning," "training," "estimation."

40. Model
clicks = m(reading_level) + b, with m = -2.4
The parameter values are also called "weights" or "coefficients."
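Slide 39's m = cov(rl, c) / var(rl) is the ordinary least-squares slope. A minimal NumPy sketch with made-up data; the intercept formula b = mean(clicks) − m · mean(reading_level) is the standard OLS companion, which the slides leave implicit:

```python
import numpy as np

# Hypothetical training data: clicks drop as reading level rises (m < 0).
reading_level = np.array([2.0, 3.0, 11.0, 19.0])
clicks = np.array([900.0, 700.0, 80.0, 5.0])

# "Training" = estimating the parameters from data.
# Slide 39: m = cov(rl, c) / var(rl); bias=True matches np.var's 1/N normalization.
m = np.cov(reading_level, clicks, bias=True)[0, 1] / np.var(reading_level)
b = clicks.mean() - m * reading_level.mean()  # standard OLS intercept (assumption)

predicted_clicks = m * reading_level + b  # the fitted line
```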

41. Defining an ML problem
• Task = Increase Consumption
• Data = Reading Habits
• Features = {Recency: Float, ReadingLevel: Int, Photo: Bool, Title_New: Bool, Title_Tax: Bool, …}
• Model = Linear Regression
• Objective/Loss Function = squared difference between predicted total number of clicks and actual total number of clicks

44. Defining an ML problem
• Model = Linear Regression
• Features = {Recency: Float, ReadingLevel: Int, Photo: Bool, Title_New: Bool, Title_Tax: Bool, …}
• Objective/Loss Function = squared difference between predicted total number of clicks and actual total number of clicks
Soooo… how do I know if my model is good?

45. Train/Test Splits

46–47. Train/Test Splits
MSE = 10 (the model fit and evaluated on all of the data)

48–49. Train/Test Splits
Split the data and fit on the Train portion only: Train MSE = 6

50. Train/Test Splits
Now evaluate the same fitted model on the held-out Test portion.

51. Clicker Question!

52–54. Clicker Question!
What should we expect MSE to do on the Test split?
(a) Go up: expected if your model isn't "right" yet (i.e., in practice, most of the time)
(b) Go down
(c) Stay the same (modulo random variation): expected if your model is "right," or is not yet powerful enough (i.e., can't memorize the training data)

55–56. Train/Test Splits
Test MSE = 12 (worse than the Train MSE of 6)

57. Train/Test Splits
The problem gets worse as models get more powerful/flexible: here the more flexible model reaches Train MSE = 4, widening the train/test gap.
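The whole train/test story in one runnable sketch, assuming scikit-learn; the synthetic data, split ratio, and polynomial degree are illustrative choices, not the lecture's:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: clicks fall off linearly with reading level, plus noise.
rng = np.random.default_rng(0)
reading_level = rng.uniform(1, 20, size=(30, 1))
clicks = 1000 - 2.4 * reading_level[:, 0] + rng.normal(0, 30, size=30)

X_train, X_test, y_train, y_test = train_test_split(
    reading_level, clicks, test_size=0.5, random_state=0
)

# Linear model: train and test MSE should be comparable.
linear = LinearRegression().fit(X_train, y_train)
print("linear train MSE:", mean_squared_error(y_train, linear.predict(X_train)))
print("linear test  MSE:", mean_squared_error(y_test, linear.predict(X_test)))

# A far more flexible model can memorize the training set (train MSE drops)
# while the test MSE gets worse -- slide 57's point about powerful models.
flexible = make_pipeline(PolynomialFeatures(degree=9), LinearRegression())
flexible.fit(X_train, y_train)
print("poly-9 train MSE:", mean_squared_error(y_train, flexible.predict(X_train)))
print("poly-9 test  MSE:", mean_squared_error(y_test, flexible.predict(X_test)))
```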
