

  1. Lecture 1: Introduction to Regression

  2. An Example: Explaining State Homicide Rates  What kinds of variables might we use to explain/predict state homicide rates?  Let’s consider just one predictor for now: poverty (ignoring omitted variables and measurement error).  How might this be related to homicide rates?

  3. Poverty and Homicide  These data are located here: http://www.public.asu.edu/~gasweete/crj604/data/hom_pov.dta   Download these data and create a scatterplot in Stata.  Does there appear to be a relationship between poverty and homicide? What is the correlation?
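
  A minimal Stata sketch of these steps (assuming the file uses the variable names homrate and poverty, as in the twoway command on slide 9):

      * load the state-level data directly from the course site
      use "http://www.public.asu.edu/~gasweete/crj604/data/hom_pov.dta", clear

      * scatterplot of homicide rate against poverty rate
      scatter homrate poverty

      * correlation between the two variables
      correlate homrate poverty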

  4. Scatterplots and correlations  [Figure: scatterplots with correlations of a) +1.00; b) –0.50; c) +0.85; and d) +0.15.]

  5. Poverty and Homicide  There appears to be some relationship between poverty and homicide rates, but it’s not perfect: there is a lot of “noise,” which we will attribute to unobserved factors and random error.

  6. Poverty and Homicide, cont.  There is some nonzero value of expected homicides in the absence of poverty (\beta_0 \neq 0).  We expect homicide rates to increase as poverty rates increase (\beta_1 > 0).  Thus, Y = \beta_0 + \beta_1 X. This is the Population Regression Function.

  7. Poverty and Homicide, Sample Regression Function  y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + u_i  y_i is the dependent variable, homicide rate, which we are trying to explain.  \hat{\beta}_0 represents our estimate of what the homicide rate would be in the absence of poverty*.  \hat{\beta}_1 is our estimate of the “effect” of a higher poverty rate on homicide.  u_i is a “noise” term reflecting other things that influence homicide rates.  *This is extrapolation outside the range of data. Not recommended.

  8. Poverty and Homicide, cont.  y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + u_i  Only y_i and x_i are directly observable in the equation above. The task of a regression analysis is to provide estimates of the slope and intercept terms.  The relationship is assumed to be linear: an increase in x is associated with an increase in y. Same expected change in homicide going from 6% to 7% poverty as from 15% to 16%.

  9. . twoway (scatter homrate poverty) (lfit homrate poverty)  Fitted line: \hat{\beta}_0 = -0.973, \hat{\beta}_1 = 0.475
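
  To reproduce these estimates directly, one can ask Stata for the fitted line itself; a minimal sketch (the coefficient values are the ones reported on the next slide, and the exact output depends on the data file):

      * fit the least-squares line; the output reports the intercept (_cons)
      * of about -0.973 and a poverty coefficient of about 0.475
      regress homrate poverty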

  10. Ordinary Least Squares  y_i = -0.973 + 0.475 x_i + u_i  Substantively, what do these estimates mean?  -0.973 is the expected homicide rate if poverty rates were zero. This is never the case, except perhaps in the case of a zombie apocalypse, so it’s not a meaningful estimate.  0.475 is the effect of a 1-unit increase in the poverty rate on the homicide rate. You need to know how you are measuring poverty; in this case, a 1-unit increase is an increase of 1 percentage point.  So a 1 percentage point increase (not “percent increase”) in the poverty rate is associated with an increase of 0.475 homicides per 100,000 people in the state. In AZ, this would be ~31 homicides.
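
  A rough check of the “~31 homicides” figure, assuming Arizona’s population is about 6.5 million (i.e., 65 groups of 100,000): 0.475 × 65 ≈ 31 additional homicides for a one-percentage-point rise in the poverty rate.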

  11. Ordinary Least Squares  y_i = -0.973 + 0.475 x_i + u_i  How did we arrive at this estimate? Why did we draw the line exactly where we did?  Minimize the sum of the “squared error,” a.k.a. Ordinary Least Squares (OLS) estimation:  min \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2  Why squared error?  Why vertical error? (Not perpendicular.)
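
  As a concrete illustration of the quantity being minimized, the sum of squared vertical errors for the fitted line can be computed in Stata; a sketch (uhat and uhat2 are just illustrative variable names):

      * fit the line, save the vertical errors (residuals), and sum their squares
      quietly regress homrate poverty
      predict uhat, residuals
      generate uhat2 = uhat^2
      quietly summarize uhat2
      display r(sum)    // the minimized sum of squared errors

  No other choice of intercept and slope gives a smaller value of this sum.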

  12. Ordinary Least Squares Estimates  min \sum_{i=1}^{n} (y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i))^2  Solving for the minimum requires calculus (set the derivative with respect to each \beta to 0 and solve).  The book shows how we can go from some basic assumptions to estimates for \beta_0 and \beta_1 without using calculus.  I will go through two different ways to obtain these estimates: Wooldridge’s and Khan’s (khanacademy.org).

  13. Ordinary Least Squares: Estimating the intercept (Wooldridge’s method)  Assuming that the average value of the error term is zero, E(u) = 0, it is a trivial matter to calculate \beta_0 once we know \beta_1:
      u = y - \beta_0 - \beta_1 x
      E(y - \beta_0 - \beta_1 x) = 0
      \bar{y} - \hat{\beta}_0 - \hat{\beta}_1 \bar{x} = 0
      \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
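
  A quick numerical check of this intercept formula in Stata; a sketch using the same homrate and poverty variables (xbar is just an illustrative scalar name):

      * beta0-hat should equal ybar minus beta1-hat times xbar
      quietly regress homrate poverty
      quietly summarize poverty
      scalar xbar = r(mean)
      quietly summarize homrate
      display r(mean) - _b[poverty]*xbar    // should match _b[_cons]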

  14. Ordinary Least Squares: Estimating the intercept (Wooldridge)  Incidentally, these last equations also imply that the regression line passes through the point corresponding to the mean of x and the mean of y, (\bar{x}, \bar{y}):
      \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
      \bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}

  15. Ordinary Least Squares: Estimating the slope (Wooldridge)  First, we use the fact that the expected value of the error term is zero, E(u) = 0, to generate a new equation equal to zero. We saw this before, but here I use the exact formula used in the book:
      y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + \hat{u}_i
      \hat{u}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i
      n^{-1} \sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0

  16. Ordinary Least Squares: Estimating the slope (Wooldridge)  Since the covariance between x and u is assumed to be zero, Cov(x, u) = E(xu) = 0, we can multiply the last equation by x_i (the terms in the parentheses are equal to \hat{u}_i):
      n^{-1} \sum_{i=1}^{n} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0
  Next, we plug in our formula for the intercept and simplify:
      n^{-1} \sum_{i=1}^{n} x_i (y_i - (\bar{y} - \hat{\beta}_1 \bar{x}) - \hat{\beta}_1 x_i) = 0
      \sum_{i=1}^{n} x_i (y_i - \bar{y} + \hat{\beta}_1 \bar{x} - \hat{\beta}_1 x_i) = 0

  17. Ordinary Least Squares: Estimating the slope (Wooldridge)  Re-arranging . . .
      \sum_{i=1}^{n} x_i ((y_i - \bar{y}) - \hat{\beta}_1 (x_i - \bar{x})) = 0
      \sum_{i=1}^{n} x_i (y_i - \bar{y}) - \hat{\beta}_1 \sum_{i=1}^{n} x_i (x_i - \bar{x}) = 0
      \sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat{\beta}_1 \sum_{i=1}^{n} x_i (x_i - \bar{x})

  18. Ordinary Least Squares: Estimating the slope (Wooldridge)  Re-arranging . . .
      \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \hat{\beta}_1 \sum_{i=1}^{n} (x_i - \bar{x})^2
      \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{cov(x, y)}{var(x)}
  Interestingly, the final result leads us to the relationship between the covariance of x and y and the variance of x.
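
  This cov/var relationship can also be checked numerically in Stata; a sketch (C is just a temporary matrix name):

      * slope = cov(x, y) / var(x), with x = poverty and y = homrate
      quietly correlate homrate poverty, covariance
      matrix C = r(C)
      display el(C,1,2) / el(C,2,2)    // matches the slope from regress homrate poverty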

  19. Ordinary Least Squares: Estimates (Khan’s method)  Khan starts with the actual points and elaborates how these points are related to the squared error, the square of the distance between each point (x_n, y_n) and the line y = mx + b = \beta_1 x + \beta_0.

  20. Ordinary Least Squares: Estimates (Khan’s method)  The vertical distance between any point (x_n, y_n) and the regression line y = \beta_1 x + \beta_0 is simply y_n - (\beta_1 x_n + \beta_0).
      Total Error = (y_1 - (\beta_1 x_1 + \beta_0)) + (y_2 - (\beta_1 x_2 + \beta_0)) + \cdots + (y_n - (\beta_1 x_n + \beta_0))
  It would be trivial to minimize the total error: we could set \beta_1 (the slope) equal to zero and \beta_0 equal to the mean of y, and then the total error would be zero.  Another approach is to minimize the absolute difference, but this actually creates thornier math problems than squaring the differences, and results in situations where there is not a unique solution.  In short, what we want is the sum of the squared error (SE), which means we have to square every term in that equation.

  21. Ordinary Least Squares: Estimates (Khan’s method)
      SE = (y_1 - (\beta_1 x_1 + \beta_0))^2 + (y_2 - (\beta_1 x_2 + \beta_0))^2 + \cdots + (y_n - (\beta_1 x_n + \beta_0))^2
  We need to find the \beta_1 and \beta_0 that minimize the SE. Let’s expand this out.  To be clear, the subscripts for the \beta estimates just refer to our two regression line estimates, whereas the subscripts for our x’s and y’s refer to the first observation, second observation, and so on.
      SE = (y_1^2 - 2 y_1 (\beta_1 x_1 + \beta_0) + (\beta_1 x_1 + \beta_0)^2) + \cdots + (y_n^2 - 2 y_n (\beta_1 x_n + \beta_0) + (\beta_1 x_n + \beta_0)^2)
         = y_1^2 - 2 y_1 \beta_1 x_1 - 2 y_1 \beta_0 + \beta_1^2 x_1^2 + 2 \beta_1 x_1 \beta_0 + \beta_0^2 + \cdots + y_n^2 - 2 y_n \beta_1 x_n - 2 y_n \beta_0 + \beta_1^2 x_n^2 + 2 \beta_1 x_n \beta_0 + \beta_0^2
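
  A sketch of the step these slides are building toward (not shown in the deck itself): setting the partial derivatives of SE with respect to \beta_0 and \beta_1 to zero gives the two conditions
      \partial SE / \partial \beta_0 = -2 \sum_{i=1}^{n} (y_i - \beta_1 x_i - \beta_0) = 0
      \partial SE / \partial \beta_1 = -2 \sum_{i=1}^{n} x_i (y_i - \beta_1 x_i - \beta_0) = 0
  which are the same two conditions used in Wooldridge’s derivation above and lead to the same \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} and \hat{\beta}_1 = cov(x, y) / var(x).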
