CSE217 INTRODUCTION TO DATA SCIENCE LECTURE 4: REGRESSION Spring 2019 Marion Neumann
RECAP: DATA SCIENCE
…solving problems with data…
[Diagram: scientific or business problem → understand problem → collect & format data → clean data → use data to create solution]
…which step is most exciting? Machine Learning
RECAP: ML
…creating and using models that learn from data…
• data: anything you can measure or record
• model: specification of a (mathematical) relationship between different variables
• evaluation: how well does the model work?
RECAP: ML WORKFLOW
• Training phase, test phase, and evaluation phase
[Diagram labels: data, program, output, ground truth, performance measure]
→ Turn to your neighbor
• By taking turns, explain what happens in the
  • training phase
  • test phase
  • evaluation phase
• Carefully define what kinds of data are used in each phase
PROPERTY SALES DATA
Goal: predict how much my house is worth. How can this data help?
• features (input variables)
  • size (in sq. ft): o numeric o categorical o binary
  • neighborhood: o numeric o categorical o binary
  • # bedrooms: o numeric o categorical o binary
  • # bathrooms: o numeric o categorical o binary
  • pool: o numeric o categorical o binary
  • age (in years): o numeric o categorical o binary
  • renovated: o numeric o categorical o binary
• house price = target variable: o numeric o categorical o binary
PREDICTING HOUSE PRICES
• target (house price) is a real number
How much is my house worth? Look at Zillow!
LINEAR REGRESSION MODEL
TRAINING: MINIMIZE ERROR
• math & statistics
PDSH p391 (Linear Regression)
PREDICTION: USE MODEL
PDSH p391
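The two steps above (training by minimizing the error, then using the model for prediction) can be sketched as follows. This is a minimal illustration, not the code from PDSH: it assumes a single feature (house size), made-up prices, and the hypothetical helper names `fit_line` and `predict`, with NumPy's least-squares solver doing the minimization.

```python
import numpy as np

def fit_line(x, y):
    """Training: return (intercept, slope) minimizing the sum of squared errors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones (intercept) next to the feature column.
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[0], coef[1]

def predict(x, intercept, slope):
    """Prediction: apply the learned line to new inputs."""
    return intercept + slope * np.asarray(x, dtype=float)

# Made-up training data (assumption): size in sq. ft -> price in $1000s.
sizes = [1000, 1500, 2000, 2500]
prices = [200, 290, 410, 500]

intercept, slope = fit_line(sizes, prices)
estimate = predict([1800], intercept, slope)
print(f"price = {intercept:.1f} + {slope:.3f} * size")
print(f"estimated price for 1800 sq. ft: {estimate[0]:.1f}")
```

With more features (bedrooms, age, and so on), the same idea applies with one design-matrix column per feature.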
HOW ABOUT MORE COMPLEX MODELS?
• Error on training set: linear model >> quadratic >> 6th-order polynomial ← error is zero!
• Is the model with zero (training) error the best?
PDSH p393
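The comparison on this slide can be reproduced with a small experiment. The sketch below assumes synthetic data (a noisy straight line, 7 training points) and uses `np.polyfit` as a stand-in for the models in the slide. With 7 points, a 6th-order polynomial can pass through every training point, so its training error is essentially zero, while its error on held-out points is typically larger.

```python
import numpy as np

rng = np.random.default_rng(0)

def rmse(y_pred, y_true):
    """Root mean squared error between predictions and targets."""
    diff = np.asarray(y_pred) - np.asarray(y_true)
    return float(np.sqrt(np.mean(diff ** 2)))

def true_fn(x):
    # Assumed ground truth for this illustration: a straight line.
    return 1.0 + 2.0 * x

# 7 noisy training points and 50 noisy held-out test points.
x_train = np.linspace(-1.0, 1.0, 7)
y_train = true_fn(x_train) + rng.normal(scale=0.3, size=x_train.size)
x_test = np.linspace(-0.9, 0.9, 50)
y_test = true_fn(x_test) + rng.normal(scale=0.3, size=x_test.size)

train_error = {}
test_error = {}
for degree in (1, 2, 6):
    # Least-squares polynomial fit of the given degree.
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_error[degree] = rmse(np.polyval(coeffs, x_train), y_train)
    test_error[degree] = rmse(np.polyval(coeffs, x_test), y_test)
    print(f"degree {degree}: train RMSE = {train_error[degree]:.4f}, "
          f"test RMSE = {test_error[degree]:.4f}")
```

Comparing the two printed columns shows why zero training error alone does not make a model the best choice.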
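EVALUATION FOR REGRESSION
• Training error vs. test error
• Predictions for test data: $\hat{y} = f(X_{\text{test}})$
• Error measures (with $y_i$ the $i$-th entry of $y_{\text{test}}$ and $n$ the number of test points):
  • RMSE (root mean squared error): $\mathrm{RMSE}(\hat{y}, y_{\text{test}}) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$
  • MAE (mean absolute error): $\mathrm{MAE}(\hat{y}, y_{\text{test}}) = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|$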
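Both error measures are short to compute by hand. The sketch below assumes the predictions and true targets are equal-length numeric sequences; the function names `rmse` and `mae` and the example numbers are chosen here for illustration.

```python
import numpy as np

def rmse(y_pred, y_true):
    """Root mean squared error: sqrt of the mean squared difference."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_pred, y_true):
    """Mean absolute error: mean of the absolute differences."""
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return float(np.mean(np.abs(diff)))

# Made-up predictions and targets: the differences are 1, 0, and -2.
y_pred = [2.0, 4.0, 6.0]
y_true = [1.0, 4.0, 8.0]

print(rmse(y_pred, y_true))  # sqrt((1 + 0 + 4) / 3), about 1.291
print(mae(y_pred, y_true))   # (1 + 0 + 2) / 3 = 1.0
```

Because RMSE squares each difference before averaging, the single error of size 2 pulls it above the MAE.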
MACHINE LEARNING WORKFLOW
• Training Phase, Test Phase, Evaluation Phase
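The three phases can be laid out end to end in a few lines. This is a rough sketch under stated assumptions: the data is synthetic and noise-free, the helper `split_train_test` is hypothetical, and a least-squares linear model stands in for the "program" being trained.

```python
import numpy as np

def split_train_test(X, y, test_fraction=0.25, seed=0):
    """Randomly hold out a fraction of the rows as a test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    n_test = max(1, int(round(test_fraction * len(y))))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# Made-up data (assumption): one feature, targets on an exact line.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3.0 * X[:, 0] + 5.0

X_train, y_train, X_test, y_test = split_train_test(X, y)

# Training phase: learn weights from training inputs AND their targets.
A_train = np.column_stack([np.ones(len(X_train)), X_train])
weights, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)

# Test phase: produce outputs from test inputs only (no targets used).
A_test = np.column_stack([np.ones(len(X_test)), X_test])
y_pred = A_test @ weights

# Evaluation phase: compare the outputs with the held-out ground truth.
test_rmse = float(np.sqrt(np.mean((y_pred - y_test) ** 2)))
print(f"test RMSE: {test_rmse:.6f}")
```

The comments mark which data each phase touches: the test targets appear only in the evaluation step.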
SUMMARY & READING
• Learning from data requires a lot of math!
• Regression models are used to predict real-valued targets.
• We need a test set to evaluate how well our model generalizes.
Reading:
• DSFS (understand the model)
  • Ch11: ML (p142-144)
  • Ch14: Simple Linear Regression (p173-176)
• PDSH Ch5: ML – Linear Regression (p390-394) (use the model in practice, SciKit Learn)
• LINEAR REGRESSION BY HAND https://www.wired.com/2011/01/linear-regression-by-hand/