Features
• Recency: Float
• Words in title: String
• Presence of photo: Boolean
• Reading level: Integer
Features

| Clicks | Recency | Reading Level | Photo | Title |
|---|---|---|---|---|
| 10 | 1.3 | 11 | 1 | “New Tax Guidelines” |
| 1000 | 1.7 | 3 | 1 | “This 600lb baby…” |
| 1000000 | 2.4 | 2 | 1 | “18 reasons you should never look at this cat unless you…” |
| 1 | 5.9 | 19 | 0 | “The Brothers Karamazov: a neo-post-globalist perspective” |

• y (the output): Clicks
• x (the features): Recency, Reading Level, Photo, Title
• Numeric features (Recency, Reading Level) — defined for (nearly) every row
• Boolean features (Photo) — 0 or 1 (“dummy” variables)
• Strings = boolean features — one 0/1 (“dummy”) variable per word in the title

Expanding the Title strings into dummy variables:

| Clicks | Recency | Reading Level | Photo | Title: “new” | Title: “tax” | Title: “this” | Title: “…” | … |
|---|---|---|---|---|---|---|---|---|
| 10 | 1.3 | 11 | 1 | 1 | 1 | 0 | 0 | … |
| 1000 | 1.7 | 3 | 1 | 0 | 0 | 1 | 1 | … |
| 1000000 | 2.4 | 2 | 1 | 0 | 0 | 1 | 1 | … |
| 1 | 5.9 | 19 | 0 | 0 | 0 | 0 | 0 | … |

These are “sparse features” — 0 for most rows.
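As a rough sketch (in Python, with made-up variable and column names), the string-to-dummy-variable expansion above might look like:

```python
# Illustrative sketch (rows and names are made up to mirror the table above):
# expand each title string into 0/1 "dummy" word features.
rows = [
    {"clicks": 10, "recency": 1.3, "reading_level": 11, "photo": 1,
     "title": "New Tax Guidelines"},
    {"clicks": 1000, "recency": 1.7, "reading_level": 3, "photo": 1,
     "title": "This 600lb baby..."},
]

# Vocabulary: every word that appears in any title, lowercased.
vocab = sorted({w for r in rows for w in r["title"].lower().split()})

def featurize(row):
    """One feature dict per row: numerics pass through, words become 0/1."""
    x = {"recency": row["recency"],
         "reading_level": row["reading_level"],
         "photo": row["photo"]}
    words = set(row["title"].lower().split())
    for w in vocab:
        x["title_" + w] = 1 if w in words else 0  # sparse: mostly 0
    return x

features = [featurize(r) for r in rows]
print(features[0]["title_tax"])  # -> 1 ("tax" is in the first title)
print(features[1]["title_tax"])  # -> 0
```

In a real system the vocabulary would be huge and almost every dummy would be 0, which is exactly why these are stored as sparse features.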
Clicker Question!
For the problem set up, how many features will there be? I.e., how many columns in our X matrix (not including Y)?
Y: happiness
X1: day of week (“monday”, “tuesday”, … “sunday”)
X2: bank account balance (real value)
X3: breakfast (yes/no)
X4: whether you have found your inner peace (yes/no)
X5: words from last week’s worth of tweets (assuming tweets are at most 15 words long and there are 100K words in the English vocabulary)
(a) 100,012  (b) 5  (c) 27  (d) 100,010
Clicker Question!
For the problem set up, how many features will there be? I.e., how many columns in our X matrix (not including Y)?
Y: happiness
X1: day of week (“monday”, “tuesday”, … “sunday”) → 7
X2: bank account balance (real value) → 1
X3: breakfast (yes/no) → 1
X4: whether you have found your inner peace (yes/no) → 1
X5: words from last week’s worth of tweets (100K-word vocabulary) → 100,000
(a) 100,012  (b) 5  (c) 27  (d) 100,010
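The arithmetic behind the answer, spelled out (the breakdown mirrors the per-feature counts on the slide):

```python
# Each categorical variable expands into one dummy column per category;
# numeric and yes/no variables take one column each.
n_features = (
    7          # X1: day of week -> 7 dummy columns
    + 1        # X2: bank account balance (one real-valued column)
    + 1        # X3: breakfast (yes/no -> one 0/1 column)
    + 1        # X4: inner peace (yes/no -> one 0/1 column)
    + 100_000  # X5: one dummy column per word in a 100K vocabulary
)
print(n_features)  # -> 100010, i.e. answer (d)
```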
Defining an ML problem
• Task = Increase Consumption
• Data = Reading Habits
• Features = ???
• Model
• Objective/Loss Function = squared difference between predicted total number of clicks and actual total number of clicks
Defining an ML problem
• Task = Increase Consumption
• Data = Reading Habits
• Features = {Recency: float, ReadingLevel: int, Photo: bool, Title_New: bool, Title_Tax: bool, …}
• Model
• Objective/Loss Function = squared difference between predicted total number of clicks and actual total number of clicks
Model
ML = Function Approximation
• You define inputs and outputs. (The really hard part.)
• The machine will (ideally) learn the function, with a lot of help from you. (The part that gets the most attention.)
Model
#1: Make assumptions about the problem domain.
• How is the data generated?
• How is the decision-making procedure structured?
• What types of dependencies exist?
• Trending buzzword: “inductive biases”
#2: How to train the model?
Model
[Scatter plot: clicks (y-axis) vs. reading level (x-axis)]
• Regression: continuous (infinite) output, e.g. f(reading level) = # of clicks
• Classification: discrete (finite) output, e.g. f(reading level) = {clicked, not clicked}
Model
clicks = m(reading_level) + b, with e.g. m = -2.4
[Plot: fitted line over clicks vs. reading level]
• This specific “model” is Linear Regression.
• clicks: the output/labels/target
• reading level: the “feature,” which is observed/derived from the data
• m and b: the “parameters,” which need to be set (by looking at data)
• Setting m = cov(reading_level, clicks) / var(reading_level): “setting parameters,” “learning,” “training,” “estimation”
• The resulting parameter values: “weights,” “coefficients”
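A minimal sketch of “setting the parameters” by the covariance formula above, on made-up data (the points and names are illustrative, not from the slides):

```python
# Fit clicks = m * reading_level + b via the closed form:
# m = cov(reading_level, clicks) / var(reading_level). Data is made up.
reading_level = [11, 3, 2, 19]
clicks = [10, 14, 16, 2]

n = len(reading_level)
mean_x = sum(reading_level) / n
mean_y = sum(clicks) / n

cov_xy = sum((x - mean_x) * (y - mean_y)
             for x, y in zip(reading_level, clicks)) / n
var_x = sum((x - mean_x) ** 2 for x in reading_level) / n

m = cov_xy / var_x       # the slope: a "weight"/"coefficient"
b = mean_y - m * mean_x  # the intercept: the line passes through the means

def predict(x):
    return m * x + b
```

With these four points the learned slope is negative (fewer clicks at higher reading levels), matching the trend in the running example.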
Defining an ML problem
• Task = Increase Consumption
• Data = Reading Habits
• Features = {Recency: float, ReadingLevel: int, Photo: bool, Title_New: bool, Title_Tax: bool, …}
• Model = Linear Regression
• Objective/Loss Function = squared difference between predicted total number of clicks and actual total number of clicks
Defining an ML problem
Soooo… how do I know if my model is good?
Train/Test Splits
[Plot: line fitted to the data] MSE = 10
[Plot: line fitted using only the Train split] Train MSE = 6
[Plot: the held-out Test split]
Clicker Question!
What should we expect MSE to do on the Test split?
(a) Go up
(b) Go down
(c) Stay the same (modulo random variation)
Clicker Question!
What should we expect MSE to do on the Test split?
(a) Go up: if your model isn’t “right” yet (i.e., in practice, most of the time)
(b) Go down
(c) Stay the same (modulo random variation): if your model is “right,” or is not yet powerful enough (i.e., can’t memorize the training data)
Train/Test Splits
[Plot: evaluated on the held-out Test split] Test MSE = 12
Train/Test Splits
Train MSE = 4
The problem (train MSE looking better than test MSE) gets worse as models get more powerful/flexible.
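A toy illustration of the train/test gap, reusing the same closed-form fit on synthetic data (all numbers here are made up, not the slide's):

```python
import random

random.seed(0)
# Synthetic (x, y) pairs: a line with slope 2, intercept 1, plus noise.
data = [(x, 2.0 * x + 1.0 + random.gauss(0, 1.0)) for x in range(40)]
random.shuffle(data)
train, test = data[:30], data[30:]  # 75% train, 25% held-out test

def fit(points):
    """Closed-form simple linear regression: m = cov(x, y) / var(x)."""
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    m = (sum((x - mx) * (y - my) for x, y in points)
         / sum((x - mx) ** 2 for x in xs))
    return m, my - m * mx

def mse(points, m, b):
    """Mean squared error of the line (m, b) on a set of points."""
    return sum((y - (m * x + b)) ** 2 for x, y in points) / len(points)

m, b = fit(train)  # parameters are set using ONLY the training split
print(f"train MSE = {mse(train, m, b):.2f}")
print(f"test  MSE = {mse(test, m, b):.2f}")
```

Because the parameters are chosen to fit the training split, train MSE tends to understate the error on held-out data; the gap typically widens as the model gets more flexible.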