Getting to Regression: The Workhorse of Quantitative Political Analysis
Department of Government, London School of Economics and Political Science

Outline
1. Correlation
2. Regression

Correlation as a Measure of Bivariate Relationship

Covariance:
$$\mathrm{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

Correlation:
$$\mathrm{Corr}(X, Y) = r_{x,y} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{(n - 1)\, s_x s_y}$$

where $s_x = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$ is the sample standard deviation of $x$, and $s_y$ is defined analogously.

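The two formulas can be checked by hand or in a few lines of code. The following is a minimal Python sketch, not part of the original slides; the function names are illustrative, and the data are the six (x, y) pairs used in the worked Calculations example later in the deck.

```python
import math

def covariance(x, y):
    """Sample covariance: sum of cross-products of deviations, divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

def std_dev(x):
    """Sample standard deviation s_x, using the same n - 1 denominator."""
    return math.sqrt(covariance(x, x))

def correlation(x, y):
    """Correlation: the covariance rescaled by both standard deviations."""
    return covariance(x, y) / (std_dev(x) * std_dev(y))

# Illustrative data: the six pairs from the worked example in these slides.
x = [1, 2, 3, 4, 5, 6]
y = [1, 5, 3, 6, 2, 7]

print(round(covariance(x, y), 4))   # covariance is in the units of x times y
print(round(correlation(x, y), 4))  # correlation is unit-free, between -1 and 1
```

Because the correlation divides out both standard deviations, rescaling x or y (say, from pounds to pence) changes the covariance but leaves the correlation unchanged.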
Correlation is linear!
It measures only the linear association between two variables.
[Figure not reproduced. Source: Wikimedia]

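One way to see what "linear" means here is to compare a perfectly linear relationship with a perfectly deterministic but nonlinear one. The sketch below is an illustration with made-up data, not taken from the slides; `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(x, y):
    """Correlation computed from sums of deviations (the n - 1 terms cancel)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up data, symmetric around zero.
x = [-3, -2, -1, 0, 1, 2, 3]
y_linear = [2 * xi + 1 for xi in x]   # exact straight line
y_quadratic = [xi ** 2 for xi in x]   # exact parabola

r_linear = pearson_r(x, y_linear)
r_quadratic = pearson_r(x, y_quadratic)

print(r_linear)     # a perfect linear relationship
print(r_quadratic)  # y is fully determined by x, yet there is no linear association
```

In the quadratic case y depends entirely on x, but the positive and negative cross-products cancel, so a correlation near zero does not by itself mean "no relationship".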
Guess the Correlation!
1. Go to: http://guessthecorrelation.com/
2. Play a few rounds

Regression
Definition: a statistical method for measuring the relationship between one variable and one or more other variables
Uses of regression:
1. Description
2. Prediction
3. Causal inference
Ordinary least squares (OLS) regression

Interpretations of OLS
1. Line (or surface) of best fit
2. Ratio of $\mathrm{Cov}(X, Y)$ to $\mathrm{Var}(X)$
3. Minimizing the residual sum of squares (SSR)
4. Estimating the unit-level causal effect

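Interpretations 2 and 3 describe the same estimate. A sketch of the standard derivation for the bivariate case follows; the symbols $b_0$ and $b_1$ (candidate values for the intercept and slope) are notation introduced here, not taken from the slides.

```latex
% Residual sum of squares for a candidate line b_0 + b_1 x
\mathrm{SSR}(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2

% First-order condition for b_0
\frac{\partial\, \mathrm{SSR}}{\partial b_0}
  = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
  \quad\Rightarrow\quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

% First-order condition for b_1, after substituting the intercept
\frac{\partial\, \mathrm{SSR}}{\partial b_1}
  = -2 \sum_{i=1}^{n} x_i \left[ (y_i - \bar{y}) - b_1 (x_i - \bar{x}) \right] = 0

% Solve for the slope; because \sum_i (y_i - \bar{y}) = 0 and \sum_i (x_i - \bar{x}) = 0,
% x_i can be replaced by (x_i - \bar{x}) in both sums
\hat{\beta}_1
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
  = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}
```

The last equality holds because the $n - 1$ denominators of the covariance and the variance cancel in the ratio.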
Bivariate Regression I
$Y$ is continuous
$X$ is a randomized treatment indicator/dummy $(0, 1)$
How do we know if $X$ had an effect on $Y$?
Look at the difference in mean outcomes: $E[Y \mid X = 1] - E[Y \mid X = 0]$

Bivariate Regression I
The mean difference $E[Y \mid X = 1] - E[Y \mid X = 0]$ is the slope of the regression line
The slope ($\beta$) is defined as $\frac{\Delta Y}{\Delta X}$
$\Delta Y = E[Y \mid X = 1] - E[Y \mid X = 0]$
$\Delta X = 1 - 0 = 1$

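The equality between the mean difference and the slope can be verified numerically. This is a minimal Python sketch under stated assumptions: the six observations are hypothetical, chosen so that the control mean is 2 and the treated mean is 5, and the helper names are illustrative.

```python
def mean(values):
    return sum(values) / len(values)

def ols_slope(x, y):
    """Bivariate OLS slope: sum of cross-products over sum of squared x deviations."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data: x is a 0/1 treatment indicator, y is a continuous outcome.
x = [0, 0, 0, 1, 1, 1]
y = [1, 2, 3, 4, 5, 6]

treated = [yi for xi, yi in zip(x, y) if xi == 1]
control = [yi for xi, yi in zip(x, y) if xi == 0]

mean_difference = mean(treated) - mean(control)  # Delta Y, since Delta X = 1
slope = ols_slope(x, y)
intercept = mean(y) - slope * mean(x)            # equals the control-group mean

print(mean_difference, slope, intercept)
```

With a binary regressor the intercept is the mean outcome when x = 0, and intercept plus slope is the mean outcome when x = 1.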
Three Equations
1. Population: $Y = \beta_0 + \beta_1 X \; (+\, \epsilon)$
2. Sample estimate: $y = \hat{\beta}_0 + \hat{\beta}_1 x + e$
3. Unit: $y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i = \bar{y}_{0i} + (y_{1i} - y_{0i}) x_i + (y_{0i} - \bar{y}_{0i})$

[Figure, built up over several slides: $y$ plotted against the binary $x$ (0, 1). The group means $\bar{y}_0$ and $\bar{y}_1$ are marked; $\Delta x$ and $\Delta y = \hat{\beta}_1$ give the slope, and $\hat{\beta}_0$ is the intercept. The fitted line is $\hat{y} = 2 + 3x$; for each unit, $y_i = 2 + 3x_i + e_i$, where $e_i$ is the residual.]

Questions?

Continuous X
If $x$ is continuous, the calculation is more complicated
Rather than $\beta_1$ being the mean difference in outcomes, it is the slope across all values of $x$:
$$\hat{\beta}_1 = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)}$$

Calculations (fill in the blanks)

$x_i$      $y_i$      $x_i - \bar{x}$    $y_i - \bar{y}$    $(x_i - \bar{x})(y_i - \bar{y})$    $(x_i - \bar{x})^2$
1          1          ?                  ?                  ?                                   ?
2          5          ?                  ?                  ?                                   ?
3          3          ?                  ?                  ?                                   ?
4          6          ?                  ?                  ?                                   ?
5          2          ?                  ?                  ?                                   ?
6          7          ?                  ?                  ?                                   ?
$\bar{x}$  $\bar{y}$                                        $\mathrm{Cov}(x, y)$                $\mathrm{Var}(x)$

[Figure, built up over several slides: $y$ plotted against $x$, with the means $\bar{x}$ and $\bar{y}$ marked.]

Calculations
If $x$ is continuous, the calculation is more complicated:

$x_i$            $y_i$          $x_i - \bar{x}$    $y_i - \bar{y}$    $(x_i - \bar{x})(y_i - \bar{y})$    $(x_i - \bar{x})^2$
1                1              -2.5               -3                 7.5                                 6.25
2                5              -1.5               +1                 -1.5                                2.25
3                3              -0.5               -1                 0.5                                 0.25
4                6              +0.5               +2                 1.0                                 0.25
5                2              +1.5               -2                 -3.0                                2.25
6                7              +2.5               +3                 7.5                                 6.25
$\bar{x} = 3.5$  $\bar{y} = 4$                                        Sum: 12                             Sum: 17.5

Dividing each column sum by $n - 1 = 5$ gives $\mathrm{Cov}(x, y) = 2.4$ and $\mathrm{Var}(x) = 3.5$; the $n - 1$ cancels in the ratio, so
$$\hat{\beta}_1 = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)} = \frac{12}{17.5} \approx 0.6857$$

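The column-by-column calculation can be reproduced in code. The following Python sketch uses the six (x, y) pairs from the table; the variable names are illustrative. For these data the cross-products sum to 12 and the squared deviations to 17.5, giving a slope of about 0.6857.

```python
# The six observations from the Calculations table.
x = [1, 2, 3, 4, 5, 6]
y = [1, 5, 3, 6, 2, 7]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# One row per observation: deviations, their product, and the squared x deviation.
rows = []
for xi, yi in zip(x, y):
    dx = xi - x_bar
    dy = yi - y_bar
    rows.append((xi, yi, dx, dy, dx * dy, dx ** 2))

sum_cross = sum(row[4] for row in rows)    # sum of (x_i - x_bar)(y_i - y_bar)
sum_squares = sum(row[5] for row in rows)  # sum of (x_i - x_bar)^2

cov_xy = sum_cross / (n - 1)
var_x = sum_squares / (n - 1)
beta1_hat = cov_xy / var_x                 # same as sum_cross / sum_squares

for row in rows:
    print(row)
print(x_bar, y_bar, sum_cross, sum_squares, round(beta1_hat, 4))
```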
Intercept $\hat{\beta}_0$
Simple formula: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
Intuition: the OLS fit always runs through the point $(\bar{x}, \bar{y})$
Ex.: with $\bar{x} = 3.5$, $\bar{y} = 4$ and $\hat{\beta}_1 = 12 / 17.5 \approx 0.6857$, $\hat{\beta}_0 = 4 - 0.6857 \times 3.5 = 1.6$
$\hat{y} = 1.6 + 0.6857x$

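A short check of the intercept formula and of the claim that the fitted line passes through the point of means. This is a sketch using the six (x, y) pairs from the Calculations slide; `fit_line` is an illustrative helper name, not something defined in the slides.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the bivariate OLS fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the "simple formula"
    return intercept, slope

x = [1, 2, 3, 4, 5, 6]
y = [1, 5, 3, 6, 2, 7]

beta0_hat, beta1_hat = fit_line(x, y)

# Evaluate the fitted line at x_bar: the result should equal y_bar.
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)
fitted_at_mean = beta0_hat + beta1_hat * x_bar

print(round(beta0_hat, 4), round(beta1_hat, 4), fitted_at_mean, y_bar)
```

The check works for any data set, because the intercept is defined precisely so that the line passes through the means.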
[Figure: $y$ plotted against $x$, with the means $\bar{x}$ and $\bar{y}$ marked.]

Systematic versus unsystematic components
Systematic: the regression line (slope)
- Linear regression estimates the conditional mean of the outcome in the population, i.e., $E[Y \mid X]$
Unsystematic: the error term, which is the deviation of observations from the line
- The difference between each observed value $y_i$ and its fitted value $\hat{y}_i$ is the residual: $e_i = y_i - \hat{y}_i$
- OLS produces the estimate of $\beta$ that minimizes the residual sum of squares

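The split into fitted values and residuals, and the least-squares property, can be illustrated as follows. This is a sketch using the six (x, y) pairs from the Calculations slide; the helper names and the size of the perturbation (0.1) are arbitrary choices made for illustration.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the bivariate OLS fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def ssr(x, y, b0, b1):
    """Residual sum of squares for the candidate line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5, 6]
y = [1, 5, 3, 6, 2, 7]

beta0_hat, beta1_hat = fit_line(x, y)

fitted = [beta0_hat + beta1_hat * xi for xi in x]      # systematic component
residuals = [yi - fi for yi, fi in zip(y, fitted)]     # unsystematic component

ssr_ols = ssr(x, y, beta0_hat, beta1_hat)
ssr_steeper = ssr(x, y, beta0_hat, beta1_hat + 0.1)    # tilt the line slightly
ssr_shifted = ssr(x, y, beta0_hat + 0.1, beta1_hat)    # shift the line slightly

print([round(e, 3) for e in residuals])
print(round(ssr_ols, 4), round(ssr_steeper, 4), round(ssr_shifted, 4))
```

Two lines are only a spot check, not a proof, but both perturbed lines have a larger SSR than the OLS line, and the OLS residuals sum to zero (up to floating-point error).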
Why are there residuals?
- Fundamental randomness
- Measurement error
- Omitted variables

Minimum Mathematical Requirements
1. Do we need variation in $X$? Yes; otherwise $\mathrm{Var}(X) = 0$ and the slope formula divides by zero
2. Do we need variation in $Y$? No; if $Y$ is constant, $\mathrm{Cov}(X, Y) = 0$ and so $\hat{\beta}_1 = 0$ (the correlation itself is undefined, because $s_y = 0$)
3. How many observations do we need? $n \geq k$, where $k$ is the number of parameters to be estimated

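The first two requirements can be demonstrated directly. In this Python sketch the data are made up, and raising `ValueError` for constant x is a design choice of this example rather than anything prescribed by the slides.

```python
def ols_slope(x, y):
    """Bivariate OLS slope; refuses to divide by zero when x does not vary."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    return sxy / sxx

# Requirement 2: y may be constant; the slope is then simply zero.
slope_constant_y = ols_slope([1, 2, 3, 4], [5, 5, 5, 5])

# Requirement 1: x must vary; with constant x the denominator is zero.
try:
    ols_slope([2, 2, 2, 2], [1, 3, 2, 4])
    constant_x_error = None
except ValueError as err:
    constant_x_error = str(err)

print(slope_constant_y)
print(constant_x_error)
```

The third requirement shows up in the same way: with a single observation there is no variation in x at all, so the two parameters of a line cannot both be estimated.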