1. Tufts COMP 135: Introduction to Machine Learning
https://www.cs.tufts.edu/comp/135/2019s/
Regression
Many slides attributable to: Prof. Mike Hughes, Erik Sudderth (UCI), Finale Doshi-Velez (Harvard), James, Witten, Hastie, Tibshirani (ISL/ESL books)

2. Logistics
• HW0 due TONIGHT (Wed 1/23 at 11:59pm)
• HW1 out later tonight, due a week from today
• What you submit: PDF and zip
• Next recitation is Mon 1/28
  • Multivariate calculus review
  • The gory math behind linear regression

3. Regression Unit Objectives
• 3 steps of a regression task
  • Training
  • Prediction
  • Evaluation
• Metrics
• Splitting data into train/valid/test
• A "taste" of 3 methods
  • Linear Regression
  • K-Nearest Neighbors
  • Decision Tree Regression

4. What will we learn?
[Diagram: the supervised learning pipeline. Training takes data/label pairs $\{x_n, y_n\}_{n=1}^N$ and produces a predictor; prediction maps data $x$ to a label $y$; evaluation scores predictions with a performance measure. Unsupervised learning and reinforcement learning are shown alongside for contrast.]

5. Task: Regression
In the supervised learning task of regression, y is a numeric variable (e.g. sales in $$).
[Figure: scatter plot of y against x.]

6. Regression Example: Uber

7. Regression Example: Uber

8. Regression Example: Uber

9. Try it! What should happen here? What info did you use to make that guess?

10. Regression: Prediction Step
Goal: Predict response y well given features x
• Input: "features" (also called "covariates", "predictors", "attributes")
  $x_i \triangleq [x_{i1}, x_{i2}, \ldots, x_{if}, \ldots, x_{iF}]$
  Entries can be real-valued, or other numeric types (e.g. integer, binary)
• Output: $\hat{y}(x_i) \in \mathbb{R}$ (also called "responses", "labels")
  Scalar value like 3.1 or -133.7

11. Regression: Prediction Step
>>> # Given: pretrained regression object model
>>> # Given: 2D array of features x
>>> x_NF.shape
(N, F)
>>> yhat_N = model.predict(x_NF)
>>> yhat_N.shape
(N,)
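For concreteness, a minimal runnable version of this step, assuming scikit-learn and a made-up synthetic dataset (the specific model and data below are illustrative, not from the slides):

import numpy as np
from sklearn.linear_model import LinearRegression

N, F = 100, 3
rng = np.random.RandomState(0)
x_NF = rng.randn(N, F)                     # (N, F) feature array
y_N = x_NF @ np.array([1.0, -2.0, 0.5])    # (N,) synthetic responses

model = LinearRegression().fit(x_NF, y_N)  # stands in for a "pretrained" model
yhat_N = model.predict(x_NF)               # the prediction step
assert yhat_N.shape == (N,)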

12. Regression: Training Step
Goal: Given a labeled dataset, learn a function that can perform prediction well
• Input: Pairs of features and labels/responses $\{x_n, y_n\}_{n=1}^N$
• Output: $\hat{y}(\cdot) : \mathbb{R}^F \to \mathbb{R}$

13. Regression: Training Step
>>> # Given: 2D array of features x
>>> # Given: 1D array of responses/labels y
>>> y_N.shape
(N,)
>>> x_NF.shape
(N, F)
>>> model = RegressionModel()
>>> model.fit(x_NF, y_N)
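A runnable sketch of the same interface; here LinearRegression stands in for the generic RegressionModel, and the toy data is invented for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(42)
x_NF = rng.uniform(size=(50, 2))                                  # (N, F)
y_N = 3.0 * x_NF[:, 0] - 1.0 * x_NF[:, 1] + 0.1 * rng.randn(50)   # (N,)

model = LinearRegression()             # plays the role of RegressionModel()
model.fit(x_NF, y_N)                   # learns yhat(.) : R^F -> R
print(model.coef_, model.intercept_)   # learned weights and bias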

14. Regression: Evaluation Step
Goal: Assess quality of predictions
• Input: Pairs of predicted and "true" responses $\{\hat{y}(x_n), y_n\}_{n=1}^N$
• Output: Scalar measure of error/quality
  • Measuring Error: lower is better
  • Measuring Quality: higher is better

15. Visualizing errors

16. Regression: Evaluation Metrics
• mean squared error: $\frac{1}{N} \sum_{n=1}^N (y_n - \hat{y}_n)^2$
• mean absolute error: $\frac{1}{N} \sum_{n=1}^N |y_n - \hat{y}_n|$
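Both metrics are one-liners in NumPy; a small sketch with made-up numbers (scikit-learn also provides mean_squared_error and mean_absolute_error in sklearn.metrics):

import numpy as np

y_N = np.array([3.0, -0.5, 2.0, 7.0])     # true responses
yhat_N = np.array([2.5, 0.0, 2.0, 8.0])   # predicted responses

mse = np.mean((y_N - yhat_N) ** 2)        # mean squared error -> 0.375
mae = np.mean(np.abs(y_N - yhat_N))       # mean absolute error -> 0.5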

17. Discuss
• Which error metric is more sensitive to outliers?
• Which error metric is the easiest to take derivatives of?

18. Regression: Evaluation Metrics
https://scikit-learn.org/stable/modules/model_evaluation.html

19. How to model y given x?

20. Is the model constant?

21. Is the model linear?

22. Is the model polynomial?

23. Generalize: sample to population

25. Labeled dataset
[Table: one column of responses y and columns of features x, one row per example.]
Each row represents one example. Assume rows are arranged "uniformly at random" (order doesn't matter).

26. Split into train and test
[Figure: the labeled dataset's rows (y, x) partitioned into a train block and a test block.]

27. Model Complexity vs. Error
[Figure: error as a function of model complexity, with the underfitting and overfitting regimes labeled.]

28. How to fit best model?
Option 1: Fit on train, select on test
1) Fit each model to training data
2) Evaluate each model on test data
3) Select model with lowest test error

29. How to fit best model? Avoid!
Option 1: Fit on train, select on test
1) Fit each model to training data
2) Evaluate each model on test data
3) Select model with lowest test error
Problems:
• Fitting procedure used test data
• Not a fair assessment of how the model will do on unseen data

30. How to fit best model?
Option 2: Fit on train, select on validation
1) Fit each model to training data
2) Evaluate each model on validation data
3) Select model with lowest validation error
4) Report error on test set
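One concrete way to realize the three-way split, sketched with scikit-learn's train_test_split (the 60/20/20 fractions and random seeds are illustrative choices, not from the slides):

import numpy as np
from sklearn.model_selection import train_test_split

x_NF = np.random.randn(100, 2)
y_N = np.random.randn(100)

# Carve off 20% for test, then split the rest into train/validation.
x_tmp, x_test, y_tmp, y_test = train_test_split(
    x_NF, y_N, test_size=0.2, random_state=0)
x_train, x_valid, y_train, y_valid = train_test_split(
    x_tmp, y_tmp, test_size=0.25, random_state=0)  # 0.25 of 80% = 20% overall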

31. How to fit best model?
Option 2: Fit on train, select on validation
1) Fit each model to training data
2) Evaluate each model on validation data
3) Select model with lowest validation error
4) Report error on test set
Concerns:
• Will train be too small?
• Can we make better use of the data?

32. Linear Regression
Parameters:
  weight vector $w = [w_1, w_2, \ldots, w_f, \ldots, w_F]$
  bias scalar $b$
Prediction:
  $\hat{y}(x_i) \triangleq \sum_{f=1}^F w_f x_{if} + b$
Training: find weights and bias that minimize error
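The prediction rule is just a dot product plus a bias; a tiny NumPy sketch with made-up weights:

import numpy as np

w_F = np.array([0.5, -1.2, 2.0])   # weight vector, F = 3
b = 4.0                            # bias scalar
x_F = np.array([1.0, 0.0, 3.0])    # one feature vector

yhat = np.dot(w_F, x_F) + b        # sum_f w_f * x_if + b
print(yhat)                        # 0.5 + 0.0 + 6.0 + 4.0 = 10.5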

33. Sales vs. Ad Budgets

34. Linear Regression: Training
Optimization problem: "Least Squares"
$\min_{w,b} \sum_{n=1}^N \big( y_n - \hat{y}(x_n, w, b) \big)^2$

35. Linear Regression: Training
Optimization problem: "Least Squares"
$\min_{w,b} \sum_{n=1}^N \big( y_n - \hat{y}(x_n, w, b) \big)^2$
An exact formula for the optimal values of w, b exists!
With only one feature (F=1), let $\bar{x} = \text{mean}(x_1, \ldots, x_N)$ and $\bar{y} = \text{mean}(y_1, \ldots, y_N)$. Then
$w = \frac{\sum_{n=1}^N (x_n - \bar{x})(y_n - \bar{y})}{\sum_{n=1}^N (x_n - \bar{x})^2}, \qquad b = \bar{y} - w\bar{x}$
We will derive these in the next class.
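A direct NumPy transcription of the F=1 formulas, as a sketch on invented, near-linear data:

import numpy as np

x_N = np.array([1.0, 2.0, 3.0, 4.0])
y_N = np.array([2.1, 3.9, 6.2, 7.8])

xbar, ybar = x_N.mean(), y_N.mean()
w = np.sum((x_N - xbar) * (y_N - ybar)) / np.sum((x_N - xbar) ** 2)
b = ybar - w * xbar
print(w, b)   # roughly slope 2 and intercept near 0 for this data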

36. Linear Regression: Training
Optimization problem: "Least Squares"
$\min_{w,b} \sum_{n=1}^N \big( y_n - \hat{y}(x_n, w, b) \big)^2$
An exact formula for the optimal values of w, b exists!
With many features (F >= 1), append a column of ones to the feature matrix:
$\tilde{X} = \begin{bmatrix} x_{11} & \ldots & x_{1F} & 1 \\ x_{21} & \ldots & x_{2F} & 1 \\ \vdots & & \vdots & \vdots \\ x_{N1} & \ldots & x_{NF} & 1 \end{bmatrix}, \qquad [w_1 \ldots w_F \; b]^T = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T y$
We will derive these in the next class.
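The same formula in NumPy, sketched on synthetic data; solving the linear system is preferable to forming the inverse explicitly:

import numpy as np

N, F = 100, 3
rng = np.random.RandomState(0)
X = rng.randn(N, F)
y = X @ np.array([1.0, -2.0, 0.5]) + 4.0 + 0.01 * rng.randn(N)

X_tilde = np.hstack([X, np.ones((N, 1))])   # append the all-ones column
theta = np.linalg.solve(X_tilde.T @ X_tilde, X_tilde.T @ y)
print(theta)   # approximately [1.0, -2.0, 0.5, 4.0] = [w_1, w_2, w_3, b]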

37. Nearest Neighbor Regression
Parameters: none
Prediction:
• find "nearest" training vector to given input x
• predict y value of this neighbor
Training: none needed (use training data as lookup table)

38. Distance metrics
• Euclidean: $\text{dist}(x, x') = \sqrt{\sum_{f=1}^F (x_f - x'_f)^2}$
• Manhattan: $\text{dist}(x, x') = \sum_{f=1}^F |x_f - x'_f|$
• Many others are possible
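Both distances in NumPy, as a quick sketch:

import numpy as np

x = np.array([0.0, 3.0])
x_prime = np.array([4.0, 0.0])

euclidean = np.sqrt(np.sum((x - x_prime) ** 2))   # 5.0
manhattan = np.sum(np.abs(x - x_prime))           # 7.0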

39. Nearest Neighbor
"Prediction functions" are piecewise constant

40. K nearest neighbor regression
Parameters:
  K: number of neighbors
Prediction:
• find K "nearest" training vectors to input x
• predict average y of this neighborhood
Training: none needed (use training data as lookup table)
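A from-scratch sketch of this prediction rule using Euclidean distance (the helper name knn_predict and the toy data are invented; scikit-learn's KNeighborsRegressor implements the same idea):

import numpy as np

def knn_predict(x_query_F, x_train_NF, y_train_N, K=3):
    # Distance from the query to every training vector.
    dists_N = np.sqrt(np.sum((x_train_NF - x_query_F) ** 2, axis=1))
    nearest_ids = np.argsort(dists_N)[:K]    # indices of the K closest
    return np.mean(y_train_N[nearest_ids])   # average their y values

x_train_NF = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train_N = np.array([0.0, 1.0, 2.0, 10.0])
print(knn_predict(np.array([1.2]), x_train_NF, y_train_N, K=3))  # (1+2+0)/3 = 1.0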

41. Error vs Model Complexity
Credit: Fig 2.4, ESL textbook

42. Salary prediction for Hitters data


44. Decision Tree Regression

45. Decision tree regression
Parameters:
• at each internal node: x variable id and threshold
• at each leaf: scalar y value to predict
Prediction assumption:
• x space is divided into rectangular regions
• y is similar within a "region"
Training assumption:
• minimize error on training set
• often, use greedy heuristics
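A sketch with scikit-learn's DecisionTreeRegressor, which fits exactly this kind of model with greedy, axis-aligned splits (the data and the max_depth choice are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
x_NF = rng.uniform(0, 10, size=(200, 1))
y_N = np.sin(x_NF[:, 0]) + 0.1 * rng.randn(200)

# max_depth caps the number of rectangular regions (leaves).
tree = DecisionTreeRegressor(max_depth=3).fit(x_NF, y_N)
yhat_N = tree.predict(x_NF)   # piecewise-constant predictions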

46. Ideal Training for Decision Tree
$\min_{R_1, \ldots, R_J} \sum_{j=1}^J \sum_{n : x_n \in R_j} (y_n - \hat{y}_{R_j})^2$
Search space is too big! Hard to solve exactly…
