Welcome and Introduction
SUPERVISED LEARNING IN R: REGRESSION
Nina Zumel and John Mount, Data Scientists, Win-Vector LLC
What is Regression?
Regression: predict a numerical outcome (the "dependent variable") from a set of inputs (the "independent variables").
Statistical sense: predicting the expected value of the outcome.
Casual sense: predicting a numerical outcome, rather than a discrete one.
What is Regression?
How many units will we sell? (Regression)
Will this customer buy our product (yes/no)? (Classification)
What price will the customer pay for our product? (Regression)
Example: Predict Temperature from Chirp Rate
[plots: temperature vs. cricket chirp rate]
Regression from a Machine Learning Perspective
Scientific mindset: modeling to understand the data generation process
Engineering mindset: modeling to predict accurately
Machine learning: the engineering mindset
Let's practice!
Linear regression: the fundamental method
Linear Regression
y = β₀ + β₁x₁ + β₂x₂ + ...
y is linearly related to each xᵢ
Each xᵢ contributes additively to y
Linear Regression in R: lm()

cmodel <- lm(temperature ~ chirps_per_sec, data = cricket)

formula: temperature ~ chirps_per_sec
data frame: cricket
Formulas

fmla_1 <- temperature ~ chirps_per_sec
fmla_2 <- blood_pressure ~ age + weight

LHS: outcome
RHS: inputs; use + for multiple inputs

A formula can also be built from a string:

fmla_1 <- as.formula("temperature ~ chirps_per_sec")
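Formulas can also be assembled programmatically from variable names held as strings, which is handy when the outcome and inputs are chosen at run time. A minimal sketch, reusing the blood_pressure example above (paste() and as.formula() are base R):

# Build the blood_pressure formula from character strings
outcome <- "blood_pressure"
inputs  <- c("age", "weight")
fmla_2  <- as.formula(paste(outcome, "~", paste(inputs, collapse = " + ")))
fmla_2
# blood_pressure ~ age + weight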
Looking at the Model
y = β₀ + β₁x₁ + β₂x₂ + ...

cmodel

Call:
lm(formula = temperature ~ chirps_per_sec, data = cricket)

Coefficients:
   (Intercept)  chirps_per_sec
        25.232           3.291
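The printed coefficients are the fitted equation: temperature = 25.232 + 3.291 * chirps_per_sec. As a minimal sketch, applying that equation by hand to a chirp rate of 16.5 (the new-data example used later in this chapter) reproduces what predict() returns:

# Hand-apply the fitted coefficients to one chirp rate
b <- coef(cmodel)   # named vector: (Intercept), chirps_per_sec
b[["(Intercept)"]] + b[["chirps_per_sec"]] * 16.5
# about 79.5 degrees, matching predict(cmodel, newdata = ...) below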
More Information about the Model

summary(cmodel)

Call:
lm(formula = fmla, data = cricket)

Residuals:
   Min     1Q Median     3Q    Max
-6.515 -1.971  0.490  2.807  5.001

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)     25.2323    10.0601   2.508 0.026183 *
chirps_per_sec   3.2911     0.6012   5.475 0.000107 ***

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.829 on 13 degrees of freedom
Multiple R-squared:  0.6975,  Adjusted R-squared:  0.6742
F-statistic: 29.97 on 1 and 13 DF,  p-value: 0.0001067
More Information about the Model

broom::glance(cmodel)    # fit statistics as a one-row data frame
sigr::wrapFTest(cmodel)  # concise summary of the model's F-test
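A minimal sketch of pulling individual fit statistics out of broom::glance(); for an lm model, glance() returns a one-row data frame whose columns include r.squared, adj.r.squared, and p.value:

perf <- broom::glance(cmodel)
perf$r.squared      # same R-squared that summary(cmodel) reports
perf$adj.r.squared
perf$p.value        # p-value of the overall F-test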
Let's practice!
Predicting once you fit a model
Predicting From the Training Data

cricket$prediction <- predict(cmodel)

By default, predict() returns predictions on the training data.
Looking at the Predictions

ggplot(cricket, aes(x = prediction, y = temperature)) +
  geom_point() +
  geom_abline(color = "darkblue") +
  ggtitle("temperature vs. linear model prediction")
Predicting on New Data

newchirps <- data.frame(chirps_per_sec = 16.5)
newchirps$prediction <- predict(cmodel, newdata = newchirps)
newchirps

  chirps_per_sec prediction
1           16.5   79.53537
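The same call handles several new observations at once; a minimal sketch (the chirp rates below are illustrative values, not from the cricket data):

more_chirps <- data.frame(chirps_per_sec = c(14, 15, 17))  # made-up chirp rates
more_chirps$prediction <- predict(cmodel, newdata = more_chirps)
more_chirps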
Let's practice!
Wrapping up linear regression
Pros and Cons of Linear Regression

Pros:
Easy to fit and to apply
Concise
Less prone to overfitting
Interpretable

Call:
lm(formula = blood_pressure ~ age + weight, data = bloodpressure)

Coefficients:
(Intercept)          age       weight
    30.9941       0.8614       0.3349

Cons:
Can only express linear and additive relationships
Collinearity

Collinearity: when input variables are partially correlated.
Coefficients might change sign.

High collinearity:
Coefficients (or standard errors) look too large
The model may be unstable

Call:
lm(formula = blood_pressure ~ age + weight, data = bloodpressure)

Coefficients:
(Intercept)          age       weight
    30.9941       0.8614       0.3349
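To see why high collinearity destabilizes coefficients, here is a minimal sketch on simulated data (x1, x2, y, and the seed are invented for illustration; this is not the course's bloodpressure data):

# Two nearly identical inputs; y truly depends only on x1 (coefficient 3)
set.seed(12345)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)   # x2 is almost a copy of x1
y  <- 3 * x1 + rnorm(n)

coef(lm(y ~ x1 + x2))
# The x1 and x2 coefficients can come out large and of opposite sign;
# only their sum is well determined (near 3), so the fit is unstable.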
Coming Next
Evaluating a regression model
Properly training a model
Let's practice!