Political Science 209 - Fall 2018
Linear Regression
Florian Hollenbach
12th October 2018
Recall: Correlation & Scatterplot
[Figure: "Income and Child Mortality", scatterplot of child mortality (y-axis, 0 to 200) against logged GDP in PPP (x-axis, 7 to 11); Correlation = −0.77]
What is the correlation?
Recall the definition of correlation

$$\text{Correlation}(x, y) = \frac{1}{N} \sum_{i=1}^{N} \text{z-score of } x_i \times \text{z-score of } y_i$$

$$\text{Correlation}(x, y) = \frac{1}{N} \sum_{i=1}^{N} \frac{x_i - \bar{x}}{\text{sd}_x} \times \frac{y_i - \bar{y}}{\text{sd}_y}$$
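The z-score form of the definition can be checked directly in code. The Python sketch below is illustrative only: the function names and the toy data are hypothetical. Because the formula divides by $N$, it uses the population standard deviation (`statistics.pstdev`, which also divides by $N$); with the sample standard deviation (dividing by $N - 1$), the leading factor would be $\frac{1}{N-1}$ instead.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize each value: subtract the mean, divide by the population SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def correlation(x, y):
    """Correlation as (1/N) * sum of (z-score of x_i) * (z-score of y_i)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical toy data, for illustration only
x = [1, 2, 3, 4, 5]
print(correlation(x, [2, 4, 6, 8, 10]))  # approximately 1: perfect upward line
print(correlation(x, [10, 8, 6, 4, 2]))  # approximately -1: perfect downward line
print(correlation(x, [2, 1, 4, 3, 5]))   # approximately 0.8: upward, but scattered
```

Swapping `x` and `y` gives the same result, since the formula treats the two variables symmetrically.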
Correlations & Scatterplots/Data points
1. positive correlation → upward slope
2. negative correlation → downward slope
3. high (absolute) correlation → tighter cloud of points, closer to a line
4. correlation cannot capture nonlinear relationships
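Point 4 is easy to see with a made-up example. In the Python sketch below (hypothetical data), y is completely determined by x through $y = x^2$, yet the correlation is zero: the U shape has no overall upward or downward linear trend for correlation to pick up.

```python
from statistics import mean, pstdev

def correlation(x, y):
    """Sum of products of deviations over N * sd_x * sd_y,
    which equals the mean product of z-scores (population SDs)."""
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)

# Hypothetical U-shaped data: a perfect, but nonlinear, relationship
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]   # [4, 1, 0, 1, 4]
print(correlation(x, y))  # 0.0
```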
Correlations & Scatterplots/Data points
[Figure: four scatterplots, both axes from −3 to 3: (a) correlation = 0.22, (b) correlation = 0.88, (c) correlation = −0.7, (d) correlation = 0.02]
Moving from Correlation to Linear Regression
Preview:
• linear regression allows us to create predictions
• linear regression specifies the direction of the relationship (which variable is explained by which)
• linear regression allows us to examine more than two variables at the same time (statistical control)
Linear Regression
• regression has one dependent (y) and, for now, one independent (x) variable
• regression is a statistical method to estimate the linear relationship between variables
Linear Regression
• the goal of regression is to approximate the (linear) relationship between X and Y as well as possible
• regression is the mathematical model for drawing the best-fitting line through a cloud of points
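This section does not yet define what makes a line "best fitting". The standard criterion, assumed in this sketch, is ordinary least squares (OLS): choose the intercept and slope that minimize the sum of squared vertical distances between the points and the line. With one independent variable, the OLS slope equals the correlation times $\text{sd}_y / \text{sd}_x$, and the line passes through $(\bar{x}, \bar{y})$. A minimal Python sketch, with hypothetical function names and toy data:

```python
from statistics import mean, pstdev

def correlation(x, y):
    """Pearson correlation: (1/N) * sum of products of z-scores."""
    mx, my, sx, sy = mean(x), mean(y), pstdev(x), pstdev(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) * sx * sy)

def fit_line(x, y):
    """Least-squares line for one independent variable; returns (intercept, slope)."""
    slope = correlation(x, y) * pstdev(y) / pstdev(x)
    intercept = mean(y) - slope * mean(x)  # the fitted line passes through the means
    return intercept, slope

# Hypothetical toy data (not the child mortality data)
intercept, slope = fit_line([1, 2, 3, 4], [2, 1, 4, 3])
print(intercept, slope)  # approximately 1.0 and 0.6
```

Because standard deviations are positive, the slope always has the same sign as the correlation: a negative correlation produces a downward-sloping line.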
Linear Regression
[Figure: "Income and Child Mortality", scatterplot of child mortality against logged GDP in PPP]
Linear regression is the mathematical model for drawing the best-fitting line through a cloud of points.
Linear Regression
[Figure: "Income and Child Mortality", scatterplot of child mortality against logged GDP in PPP]
• the regression line is an estimate of the (for now bivariate) relationship between x and y
• for each x we have a prediction of y: what would we expect y to be, given the value of x?
What is the equation of a line?
$y = mx + b$ → $b$? $m$?
What is the equation of a line?
$y = mx + b$
$b$ → y-intercept
$m$ → slope
Regression equation: $Y = \alpha + \beta X + \epsilon$ → $\alpha$? $\beta$? $\epsilon$?
What is the equation of a line?
$y = mx + b$
$b$ → y-intercept
$m$ → slope
Regression equation: $Y = \alpha + \beta X + \epsilon$
$\alpha$ → y-intercept
$\beta$ → slope
$\epsilon$ → error
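In the regression equation, the line $\alpha + \beta X$ is the systematic part, and $\epsilon$ is whatever is left over for each observation: $\epsilon_i = Y_i - (\alpha + \beta X_i)$. A minimal Python sketch with a hypothetical line and toy data (when $\alpha$ and $\beta$ are estimated from data, these leftovers are usually called residuals):

```python
def errors(x, y, alpha, beta):
    """Leftover for each observation: observed y minus the line's value alpha + beta * x."""
    return [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]

# Hypothetical line y = 1 + 2x and toy data scattered around it
x = [0, 1, 2, 3]
y = [1.5, 2.5, 5.0, 7.0]
eps = errors(x, y, alpha=1.0, beta=2.0)
print(eps)  # [0.5, -0.5, 0.0, 0.0]

# Each observation is exactly line + error, as in Y = alpha + beta * X + epsilon
assert all(yi == 1.0 + 2.0 * xi + ei for xi, yi, ei in zip(x, y, eps))
```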
Regression equation
[Figure: "Income and Child Mortality", scatterplot of child mortality against logged GDP in PPP, annotated with y-intercept = 282.46 and slope = −26.61]
Regression equation
[Figure: "Income and Child Mortality", scatterplot of child mortality against logged GDP in PPP, annotated with y-intercept = 282.46 and slope = −26.61]

$$Y = 282.46 - 26.61X + \epsilon$$
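The fitted equation works as a prediction rule: plug in a value of logged GDP in PPP and read off the line's predicted child mortality. A minimal Python sketch using the slide's rounded coefficients (the function name is hypothetical; the trailing comments show the printed output):

```python
# Coefficients as reported on the slide (rounded to two decimals)
ALPHA = 282.46   # y-intercept
BETA = -26.61    # slope

def predicted_child_mortality(log_gdp):
    """Value of the fitted line 282.46 - 26.61 * X at a given logged GDP in PPP."""
    return ALPHA + BETA * log_gdp

for log_gdp in (8, 9, 10):
    print(log_gdp, round(predicted_child_mortality(log_gdp), 2))
# 8 69.58
# 9 42.97
# 10 16.36
```

Each one-unit increase in logged GDP lowers the prediction by 26.61, which is exactly what the slope says. The intercept, 282.46, is the prediction at a logged GDP of 0, far outside the plotted range of roughly 7 to 11, so it anchors the line rather than describing any observed case.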