
Bivariate Relationships (17.871, 2012)



  1. Bivariate Relationships, 17.871, 2012

  2. Testing associations (not causation!)
     - Continuous data
       - Scatter plot (always use first!)
       - (Pearson) correlation coefficient (rare, should be rarer!)
       - (Spearman) rank-order correlation coefficient (rare)
       - Regression coefficient (common)
     - Discrete data
       - Cross tabulations
       - χ²
       - Gamma, Beta, etc.
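To make the difference between the two correlation coefficients concrete, here is a minimal Python sketch (the slides themselves use Stata). The function names `pearson`, `ranks`, and `spearman` and the toy data are illustrative assumptions, not part of the course material. Spearman is computed here as the Pearson correlation of the ranks.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation: cross-products scaled by both sums of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of the 1-based positions in the block
        for k in range(start, end + 1):
            result[order[k]] = avg
        start = end + 1
    return result

def spearman(xs, ys):
    """Spearman rank-order correlation: Pearson applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: y rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson(x, y), 4))
print(round(spearman(x, y), 4))
```

Because y always increases with x, the rank-order coefficient is exactly 1, while the Pearson coefficient falls slightly below 1 since the relationship is curved. This is one reason to look at the scatter plot first.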

  3. Continuous DV, continuous EV
     - Dependent variable: DV
     - Explanatory (or independent) variable: EV
     - Example: What is the relationship between the Black percentage in state legislatures and the Black percentage in state populations?

  4. Regression interpretation
     Three key things to learn (today):
     1. Where regression comes from
     2. How to interpret the regression coefficient
     3. How to interpret the confidence interval
     - We will learn how to calculate confidence intervals in a couple of weeks

  5. Linear Relationship between African American Population & Black Legislators
     [Figure: scatter plot of beo (Black % in state legislatures, vertical axis, 0 to 10) against bpop (Black % in state population, horizontal axis, 0 to 30), with a fitted-values line.]

  6. The linear relationship between two variables
     $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
     Regression quantifies how one variable can be described in terms of another.

  7. Linear Relationship between African American Population & Black Legislators
     [Figure: the same scatter plot of beo against bpop with the fitted-values line, annotated with $\hat{\beta}_0 = -1.31$ and $\hat{\beta}_1 = 0.359$.]
     $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

  8. How did we get that line?
     1. Pick a value of $Y_i$
     [Figure: scatter plot of beo (Black % in state legis.) against bpop (Black % in state population), with one point marked $Y_i$.]
     $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

  9. How did we get that line?
     2. Decompose $Y_i$ into two parts
     [Figure: same scatter plot of beo against bpop.]
     $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

  10. How did we get that line?
      3. Label the points
      [Figure: same scatter plot, with the observed point labeled $Y_i$.]
      $Y_i = (\beta_0 + \beta_1 X_i) + \varepsilon_i$

  11. How did we get that line?
      3. Label the points
      [Figure: same scatter plot, with the observed point labeled $Y_i$ and the point on the line labeled $\hat{Y}_i$.]
      $Y_i = (\beta_0 + \beta_1 X_i) + \varepsilon_i$

  12. How did we get that line?
      3. Label the points
      [Figure: same scatter plot, with the observed point labeled $Y_i$ and the point on the line labeled $\hat{Y}_i$.]
      $Y_i = (\beta_0 + \beta_1 X_i) + \varepsilon_i$

  13. How did we get that line?
      3. Label the points
      [Figure: same scatter plot, with $Y_i$, $\hat{Y}_i$, and the vertical gap $Y_i - \hat{Y}_i$ labeled.]
      $Y_i = (\beta_0 + \beta_1 X_i) + \varepsilon_i$

  14. How did we get that line?
      3. Label the points
      [Figure: same scatter plot, with $Y_i$, $\hat{Y}_i$, and the vertical gap $Y_i - \hat{Y}_i$ labeled as $\varepsilon_i$, the "residual".]
      $Y_i = (\beta_0 + \beta_1 X_i) + \varepsilon_i$

  15. What is $\varepsilon_i$? (sometimes $u_i$)
      - Wrong functional form
      - Measurement error
      - Stochastic component in Y
      - Unmeasured influences on Y
      $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

  16. The Method of Least Squares
      Pick $\beta_0$ and $\beta_1$ to minimize $\sum_{i=1}^{n} \varepsilon_i^2$, or equivalently $\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$, or $\sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$.
      [Figure: scatter plot of beo against bpop with the fitted-values line; one point shows $Y_i$, $\hat{Y}_i$, and the residual $\varepsilon_i = Y_i - \hat{Y}_i$.]
      $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
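The minimization can be carried out with calculus. What follows is a sketch of the standard derivation. The symbol $S$ for the sum of squared residuals and the bars for sample means are notation introduced here, not taken from the slide.

$$
\begin{aligned}
S(\beta_0, \beta_1) &= \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 \\
\frac{\partial S}{\partial \beta_0} &= -2 \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i) = 0 \\
\frac{\partial S}{\partial \beta_1} &= -2 \sum_{i=1}^{n} X_i \, (Y_i - \beta_0 - \beta_1 X_i) = 0
\end{aligned}
$$

The first condition gives $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$, so the fitted line passes through the point of means. Substituting this into the second condition leaves a single equation in $\hat{\beta}_1$, whose solution is the ratio of the cross-products of X and Y around their means to the squared deviations of X around its mean.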

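  17. Solve for $\hat{\beta}_1$ from $\dfrac{\partial \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2}{\partial \beta_1} = 0$:
      $\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$ or $\dfrac{\mathrm{cov}(X, Y)}{\mathrm{var}(X)}$
      Remember this for the problem set!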
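The covariance-over-variance form is easy to check by hand or in a few lines of code. The Python sketch below is an illustration only: the helper names `ols_fit` and `sum_sq_resid` and the five data points are made up for this example and do not come from the course data. It computes the slope as cov(X, Y) / var(X), recovers the intercept from the means, and checks that nudging either coefficient does not lower the sum of squared residuals.

```python
def ols_fit(xs, ys):
    """Return (b0, b1) with slope = cov(X, Y) / var(X)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # The (n - 1) divisors cancel in the ratio; they are kept to mirror cov/var.
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    b1 = cov_xy / var_x
    b0 = y_bar - b1 * x_bar  # the line passes through (x_bar, y_bar)
    return b0, b1

def sum_sq_resid(xs, ys, b0, b1):
    """Sum of squared residuals for the line b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data, not the state legislature data from the slides.
xs = [0.0, 5.0, 10.0, 20.0, 30.0]
ys = [0.0, 1.0, 2.0, 6.0, 9.0]

b0, b1 = ols_fit(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
best = sum_sq_resid(xs, ys, b0, b1)

print(b0, b1)
print(sum(residuals))  # zero up to floating-point rounding
print(best <= sum_sq_resid(xs, ys, b0 + 0.1, b1))
print(best <= sum_sq_resid(xs, ys, b0, b1 + 0.01))
```

The residuals sum to zero (up to rounding) because the intercept is chosen so that the line passes through the means of X and Y.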

  18. Regression commands in STATA
      - reg depvar expvars
        - E.g., reg y x
        - E.g., reg beo bpop
      - Making predictions from regression lines
        - predict newvar (newvar will equal the fitted values $\hat{Y}_i$)
        - predict newvar, resid
          - newvar will now equal $\varepsilon_i$

  19. Black elected officials example

      . reg beo bpop

            Source |       SS       df       MS              Number of obs =      41
      -------------+------------------------------           F(  1,    39) =  202.56
             Model |   351.26542     1   351.26542           Prob > F      =  0.0000
          Residual |  67.6326195    39  1.73416973           R-squared     =  0.8385
      -------------+------------------------------           Adj R-squared =  0.8344
             Total |  418.898039    40   10.472451           Root MSE      =  1.3169

      ------------------------------------------------------------------------------
               beo |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
              bpop |   .3586751   .0251876    14.23   0.000     .3075284    .4094219
             _cons |  -1.314892   .3277508    -4.01   0.000    -1.977831   -.6519535
      ------------------------------------------------------------------------------

      Always include interpretation in your presentations and papers.
      Interpretation: a one percentage point increase in black population leads to a .36 percentage point increase in black composition in the legislature.
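Several entries in the Stata output can be recomputed from the sums of squares, which helps when reading the table. The Python sketch below copies the numbers from the output above. The variable names and the 20 percent example value are assumptions made for illustration.

```python
from math import sqrt

# Values copied from the Stata output for "reg beo bpop".
ss_model, df_model = 351.26542, 1
ss_resid, df_resid = 67.6326195, 39
b0, b1 = -1.314892, 0.3586751  # _cons and bpop coefficients

ss_total = ss_model + ss_resid             # Total SS
r_squared = ss_model / ss_total            # share of variation explained
ms_resid = ss_resid / df_resid             # Residual MS
f_stat = (ss_model / df_model) / ms_resid  # Model MS / Residual MS
root_mse = sqrt(ms_resid)                  # Root MSE

def predict_beo(bpop):
    """Fitted Black % in the legislature for a given Black % of population."""
    return b0 + b1 * bpop

print(round(r_squared, 4), round(f_stat, 2), round(root_mse, 4))
# Hypothetical state with a 20 percent Black population.
print(round(predict_beo(20), 2))
```

The recomputed R-squared, F statistic, and Root MSE agree with the values Stata reports, and the fitted line puts a state with a 20 percent Black population at a bit under 6 percent Black legislators.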

  20. The Linear Relationship between African American Population & Black Legislators
      [Figure: scatter plot of beo (Black % in state legislatures, Y) against bpop (Black % in state population, X) with the fitted-values line, annotated with $\hat{\beta}_0 = -1.31$ and $\hat{\beta}_1 = 0.359$.]
      $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

  21. More regression examples

  22. Temperature and Latitude
      [Figure: scatter plot of JanTemp (vertical axis, 0 to 80) against latitude (horizontal axis, 25 to 45), with points labeled by city.]
      scatter JanTemp latitude, mlabel(city)

  23. . reg jantemp latitude

            Source |       SS       df       MS              Number of obs =      20
      -------------+------------------------------           F(  1,    18) =   49.34
             Model |  3250.72219     1  3250.72219           Prob > F      =  0.0000
          Residual |  1185.82781    18  65.8793228           R-squared     =  0.7327
      -------------+------------------------------           Adj R-squared =  0.7179
             Total |     4436.55    19  233.502632           Root MSE      =  8.1166

      ------------------------------------------------------------------------------
           jantemp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
          latitude |  -2.341428   .3333232    -7.02   0.000    -3.041714   -1.641142
             _cons |   125.5072   12.77915     9.82   0.000     98.65921    152.3552
      ------------------------------------------------------------------------------

      Interpretation: a one-degree increase in latitude is associated with a 2.3 degree decrease in average January temperature (in Fahrenheit).
      $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
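To see what the coefficients imply, the fitted line can be evaluated at two latitudes. The values 30 and 40 degrees are chosen only for illustration and are not tied to any city in the data.

$$
\begin{aligned}
\widehat{\text{JanTemp}}(30) &= 125.5072 - 2.341428 \times 30 \approx 55.26 \\
\widehat{\text{JanTemp}}(40) &= 125.5072 - 2.341428 \times 40 \approx 31.85
\end{aligned}
$$

Ten degrees of latitude separate the two predictions by about 23.4 degrees Fahrenheit, which is ten times the slope.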

  24. How to add a regression line: Stata command: lfit
      [Figure: scatter plot of JanTemp against latitude (25 to 45), points labeled by city, with the fitted-values line added.]
      scatter JanTemp latitude, mlabel(city) || lfit JanTemp latitude
      or, often better:
      scatter JanTemp latitude, mlabel(city) m(i) || lfit JanTemp latitude

  25. Presenting regression results (brief aside)
      - First, show scatter plot
        - Label data points (if possible)
        - Include best-fit line
      - Second, show regression table
        - Assess statistical significance with confidence interval or p-value
        - Assess robustness to control variables (internal validity: nonrandom selection)

  26. Bush vote and Southern Baptists
      [Figure: scatter plot of Bush Pct 2004 (vertical axis, .4 to .7) against Southern Baptist % (horizontal axis, 0 to .6), points labeled by state abbreviation, with a fitted-values line.]
