Tufts COMP 135: Introduction to Machine Learning
https://www.cs.tufts.edu/comp/135/2019s/

Regression

Many slides attributable to: Prof. Mike Hughes; Erik Sudderth (UCI); Finale Doshi-Velez (Harvard); James, Witten, Hastie, Tibshirani (ISL/ESL books)
Logistics
• HW0 due TONIGHT (Wed 1/23 at 11:59pm)
• HW1 out later tonight, due a week from today
• What you submit: PDF and zip
• Next recitation is Mon 1/28
  • Multivariate calculus review
  • The gory math behind linear regression
Regression Unit Objectives
• 3 steps of a regression task
  • Training
  • Prediction
  • Evaluation
• Metrics
• Splitting data into train/valid/test
• A "taste" of 3 methods
  • Linear Regression
  • K-Nearest Neighbors
  • Decision Tree Regression
What will we learn?
[Diagram: the supervised learning pipeline. Labeled data pairs $\{x_n, y_n\}_{n=1}^N$ feed a Training step, which produces a predictor; a Prediction step maps features $x$ to labels $y$; an Evaluation step scores predictions against a performance measure. Unsupervised learning and reinforcement learning are shown alongside for context.]
Task: Regression
y is a numeric variable, e.g. sales in $$
[Figure: scatter plot of a numeric response y versus a feature x.]
Regression Example: Uber
Try it! What should happen here?
What info did you use to make that guess?
Regression: Prediction Step
Goal: Predict response y well given features x

• Input: "features" $x_i \triangleq [x_{i1}, x_{i2}, \ldots, x_{if}, \ldots, x_{iF}]$
  Entries can be real-valued, or other numeric types (e.g. integer, binary).
  Also called "covariates", "predictors", or "attributes".
• Output: $\hat{y}(x_i) \in \mathbb{R}$, a scalar value like 3.1 or -133.7.
  Also called "responses" or "labels".
Regression: Prediction Step
>>> # Given: pretrained regression object model
>>> # Given: 2D array of features x
>>> x_NF.shape
(N, F)
>>> yhat_N = model.predict(x_NF)
>>> yhat_N.shape
(N,)
Regression: Training Step
Goal: Given a labeled dataset, learn a function that can perform prediction well

• Input: Pairs of features and labels/responses $\{x_n, y_n\}_{n=1}^N$
• Output: $\hat{y}(\cdot) : \mathbb{R}^F \rightarrow \mathbb{R}$
Regression: Training Step
>>> # Given: 2D array of features x
>>> # Given: 1D array of responses/labels y
>>> y_N.shape
(N,)
>>> x_NF.shape
(N, F)
>>> model = RegressionModel()
>>> model.fit(x_NF, y_N)
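To make the fit/predict pattern concrete, here is a minimal runnable sketch using scikit-learn's LinearRegression in the role of the generic RegressionModel above (the toy data values are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: N=4 examples, F=2 features (values made up for illustration)
x_NF = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 3.0]])
y_N = np.array([3.0, 2.5, 4.5, 7.0])

model = LinearRegression()    # stands in for RegressionModel()
model.fit(x_NF, y_N)          # training step
yhat_N = model.predict(x_NF)  # prediction step
print(yhat_N.shape)           # (4,) -- one prediction per example
```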
Regression: Evaluation Step
Goal: Assess quality of predictions

• Input: Pairs of predicted and "true" responses $\{\hat{y}(x_n), y_n\}_{n=1}^N$
• Output: Scalar measure of error/quality
  • Measuring error: lower is better
  • Measuring quality: higher is better
Visualizing errors
Regression: Evaluation Metrics
• mean squared error: $\frac{1}{N} \sum_{n=1}^N (y_n - \hat{y}_n)^2$
• mean absolute error: $\frac{1}{N} \sum_{n=1}^N |y_n - \hat{y}_n|$
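Both metrics are one-liners in numpy; a small sketch with made-up values (scikit-learn also provides mean_squared_error and mean_absolute_error in sklearn.metrics):

```python
import numpy as np

y_N = np.array([1.0, 2.0, 3.0])      # "true" responses
yhat_N = np.array([1.5, 1.5, 4.0])   # predicted responses

mse = np.mean(np.square(y_N - yhat_N))   # mean squared error: 0.5
mae = np.mean(np.abs(y_N - yhat_N))      # mean absolute error: ~0.667
```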
Discuss
• Which error metric is more sensitive to outliers?
• Which error metric is the easiest to take derivatives of?
Regression: Evaluation Metrics
https://scikit-learn.org/stable/modules/model_evaluation.html
How to model y given x?

Is the model constant?
Is the model linear?
Is the model polynomial?

Generalize: sample to population
Labeled dataset
[Figure: table of features x and responses y.]
Each row represents one example.
Assume rows are arranged "uniformly at random" (order doesn't matter).
Split into train and test
[Figure: the same table of x and y, with rows partitioned into a train set and a test set.]
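A hedged sketch of this split with scikit-learn's train_test_split (the 20% test fraction and synthetic data are illustrative choices):

```python
import numpy as np
from sklearn.model_selection import train_test_split

x_NF = np.random.rand(100, 3)   # synthetic: 100 examples, 3 features
y_N = np.random.rand(100)

# Hold out 20% of rows as the test set; shuffle=True (the default)
# matches the "rows arranged uniformly at random" assumption above.
x_tr_NF, x_te_NF, y_tr_N, y_te_N = train_test_split(
    x_NF, y_N, test_size=0.2, random_state=0)
```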
Model Complexity vs Error
[Figure: error versus model complexity — underfitting at low complexity, overfitting at high complexity.]
How to fit best model?
Option 1: Fit on train, select on test
1) Fit each model to training data
2) Evaluate each model on test data
3) Select model with lowest test error
How to fit best model? Avoid!
Option 1: Fit on train, select on test
1) Fit each model to training data
2) Evaluate each model on test data
3) Select model with lowest test error

Problems
• The fitting procedure used the test data
• Not a fair assessment of how the model will do on unseen data
How to fit best model?
Option: Fit on train, select on validation
1) Fit each model to training data
2) Evaluate each model on validation data
3) Select model with lowest validation error
4) Report error on test set
How to fit best model?
Option: Fit on train, select on validation
1) Fit each model to training data
2) Evaluate each model on validation data
3) Select model with lowest validation error
4) Report error on test set

Concerns
• Will train be too small?
• Can we make better use of the data?
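One way to realize this recipe is two chained splits. The sketch below is illustrative, not the course's prescribed code: the 60/20/20 proportions are an arbitrary choice, and the candidate models are K-nearest-neighbor regressors, which appear later in this deck.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

x_NF = np.random.rand(200, 3)    # synthetic data
y_N = np.random.rand(200)

# 60% train / 20% validation / 20% test (proportions are illustrative)
x_tr, x_rest, y_tr, y_rest = train_test_split(
    x_NF, y_N, test_size=0.4, random_state=0)
x_va, x_te, y_va, y_te = train_test_split(
    x_rest, y_rest, test_size=0.5, random_state=0)

best_model, best_err = None, np.inf
for K in [1, 3, 5, 7]:   # candidate models: K-NN regressors
    model = KNeighborsRegressor(n_neighbors=K).fit(x_tr, y_tr)
    err = np.mean((y_va - model.predict(x_va)) ** 2)  # validation MSE
    if err < best_err:
        best_model, best_err = model, err

# Step 4: report error of the selected model on the test set, once
test_err = np.mean((y_te - best_model.predict(x_te)) ** 2)
```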
Linear Regression
Parameters:
$w = [w_1, w_2, \ldots, w_f, \ldots, w_F]$  weight vector
$b$  bias scalar

Prediction:
$\hat{y}(x_i) \triangleq \sum_{f=1}^F w_f x_{if} + b$

Training: find weights and bias that minimize error
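In code, this prediction rule is a single dot product; a sketch with made-up parameter values:

```python
import numpy as np

w_F = np.array([0.5, -1.0, 2.0])   # weight vector (illustrative values)
b = 3.0                            # bias scalar
x_F = np.array([1.0, 2.0, 0.5])    # one feature vector

yhat = np.dot(w_F, x_F) + b        # sum_f w_f * x_if + b = 2.5
```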
Sales vs. Ad Budgets
Linear Regression: Training
Optimization problem: "Least Squares"
$\min_{w,b} \sum_{n=1}^N \left( y_n - \hat{y}(x_n, w, b) \right)^2$
Linear Regression: Training
Optimization problem: "Least Squares"
$\min_{w,b} \sum_{n=1}^N \left( y_n - \hat{y}(x_n, w, b) \right)^2$

Exact formulas for the optimal values of w, b exist!
With only one feature (F=1):
$\bar{x} = \text{mean}(x_1, \ldots, x_N), \quad \bar{y} = \text{mean}(y_1, \ldots, y_N)$
$w = \frac{\sum_{n=1}^N (x_n - \bar{x})(y_n - \bar{y})}{\sum_{n=1}^N (x_n - \bar{x})^2}, \quad b = \bar{y} - w\bar{x}$
We will derive these in next class.
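The F=1 formulas translate directly into numpy; a sketch on synthetic data whose true relationship is roughly y = 2x + 1:

```python
import numpy as np

x_N = np.array([1.0, 2.0, 3.0, 4.0])
y_N = np.array([3.1, 4.9, 7.2, 8.8])   # roughly y = 2x + 1

xbar, ybar = x_N.mean(), y_N.mean()
w = np.sum((x_N - xbar) * (y_N - ybar)) / np.sum((x_N - xbar) ** 2)
b = ybar - w * xbar
print(w, b)   # ~1.94 and ~1.15, close to the true 2 and 1
```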
Linear Regression: Training
Optimization problem: "Least Squares"
$\min_{w,b} \sum_{n=1}^N \left( y_n - \hat{y}(x_n, w, b) \right)^2$

Exact formulas for the optimal values of w, b exist!
With many features (F >= 1), stack each feature vector plus a trailing 1 into a design matrix:
$\tilde{X} = \begin{bmatrix} x_{11} & \ldots & x_{1F} & 1 \\ x_{21} & \ldots & x_{2F} & 1 \\ \vdots & & \vdots & \vdots \\ x_{N1} & \ldots & x_{NF} & 1 \end{bmatrix}$
$[w_1 \ldots w_F \; b]^T = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T y$
We will derive these in next class.
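A sketch of the general formula in numpy: append a column of ones to form $\tilde{X}$, then solve the normal equations. (In practice np.linalg.lstsq is more numerically stable than forming the inverse; the code below mirrors the slide's formula on noise-free synthetic data.)

```python
import numpy as np

rng = np.random.default_rng(0)
N, F = 50, 3
x_NF = rng.normal(size=(N, F))
y_N = x_NF @ np.array([1.0, -2.0, 0.5]) + 4.0   # true w and b

Xtil = np.hstack([x_NF, np.ones((N, 1))])       # design matrix with bias column
wb = np.linalg.solve(Xtil.T @ Xtil, Xtil.T @ y_N)  # (X^T X)^{-1} X^T y
w_F, b = wb[:-1], wb[-1]
print(w_F, b)   # recovers [1, -2, 0.5] and 4.0
```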
Nearest Neighbor Regression
Parameters: none
Prediction:
- find the "nearest" training vector to the given input x
- predict the y value of this neighbor
Training: none needed (use training data as lookup table)
Distance metrics
• Euclidean: $\text{dist}(x, x') = \sqrt{\sum_{f=1}^F (x_f - x'_f)^2}$
• Manhattan: $\text{dist}(x, x') = \sum_{f=1}^F |x_f - x'_f|$
• Many others are possible
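Both distances in numpy, as a quick sketch with made-up vectors:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])
xprime = np.array([1.0, 1.0, 0.0])

euclidean = np.sqrt(np.sum((x - xprime) ** 2))  # sqrt(1 + 0 + 4) ~= 2.236
manhattan = np.sum(np.abs(x - xprime))          # 1 + 0 + 2 = 3
```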
Nearest Neighbor
"Prediction functions" are piecewise constant
K nearest neighbor regression
Parameters:
K : number of neighbors
Prediction:
- find the K "nearest" training vectors to input x
- predict the average y of this neighborhood
Training: none needed (use training data as lookup table)
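A minimal numpy sketch of this prediction rule using brute-force Euclidean distances (the helper name is hypothetical; sklearn.neighbors.KNeighborsRegressor provides the same behavior with smarter data structures):

```python
import numpy as np

def knn_predict(x_F, train_x_NF, train_y_N, K=3):
    """Average the y values of the K training points nearest to x_F."""
    dists = np.sqrt(np.sum((train_x_NF - x_F) ** 2, axis=1))  # Euclidean
    nearest_ids = np.argsort(dists)[:K]    # indices of the K closest rows
    return np.mean(train_y_N[nearest_ids])
```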
Error vs Model Complexity
Credit: Fig 2.4, ESL textbook
Salary prediction for Hitters data
Decision Tree Regression
Decision tree regression
Parameters:
- at each internal node: an x variable id and a threshold
- at each leaf: a scalar y value to predict
Prediction assumption:
- x space is divided into rectangular regions
- y is similar within each "region"
Training assumption:
- minimize error on training set
- often, use greedy heuristics
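A hedged sketch with scikit-learn's DecisionTreeRegressor, which is trained with the greedy heuristics just mentioned (the max_depth value and synthetic data are illustrative choices):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

x_NF = np.random.rand(100, 2)
y_N = np.sin(3 * x_NF[:, 0]) + x_NF[:, 1]   # synthetic target

tree = DecisionTreeRegressor(max_depth=3)   # each leaf predicts a constant y
tree.fit(x_NF, y_N)
yhat_N = tree.predict(x_NF)                 # piecewise-constant predictions
```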
Ideal Training for Decision Tree
$\min_{R_1, \ldots, R_J} \sum_{j=1}^J \sum_{n : x_n \in R_j} (y_n - \hat{y}_{R_j})^2$
Search space is too big! Hard to solve exactly…
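The greedy heuristic from the previous slide makes this tractable: at each node, scan candidate splits and keep the one that minimizes the summed squared error of the two resulting leaves. A sketch of that search for a single feature (a hypothetical helper, not the course's reference code):

```python
import numpy as np

def best_split(x_N, y_N):
    """Greedy one-node search: over all thresholds of one feature,
    pick the split minimizing total squared error of the two leaves."""
    best_t, best_err = None, np.inf
    for t in np.unique(x_N)[:-1]:   # candidate thresholds (max excluded
                                    # so the right leaf is never empty)
        left, right = y_N[x_N <= t], y_N[x_N > t]
        # each leaf predicts its mean; error is summed squared deviation
        err = (np.sum((left - left.mean()) ** 2)
               + np.sum((right - right.mean()) ** 2))
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err
```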