  1. Comparing two samples Edwin Leuven

  2. Introduction
We will now study the relationship between 2 variables X and Y.
We consider the case when X is binary
◮ men/women, high school/college, city/countryside, treated/untreated
and call X the explanatory variable.
Y can be binary, continuous or have some other cardinality
◮ income, education, health
and call Y the outcome variable.

  3. Introduction – Between group differences
We will focus today on differences in averages, e.g. the difference in average income between men and women
◮ in the population: E[income | men] − E[income | women]
◮ in the sample:
  (1/n_men) Σ_{i: men} income_i − (1/n_women) Σ_{i: women} income_i = (average income)_men − (average income)_women
Such differences are always descriptive differences. We are also often interested in causal differences.

  4. Introduction – Sample Average Causal Effect
We refer to causal differences as "effects", e.g.
◮ "What is the effect of college on earnings?"
With the following potential outcomes
◮ y_1i = earnings with college
◮ y_0i = earnings without college
we can define the sample average treatment effect (SATE):
  SATE = (1/n) Σ_{i=1}^{n} (y_1i − y_0i)
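As a minimal numeric sketch (in Python, with made-up earnings for four hypothetical people), the SATE is just the mean of the individual differences y_1i − y_0i:

```python
# Hypothetical potential outcomes for n = 4 people (made-up numbers):
# y1[i] = earnings with college, y0[i] = earnings without college.
y1 = [400, 350, 500, 450]
y0 = [300, 340, 420, 390]

n = len(y1)
sate = sum(a - b for a, b in zip(y1, y0)) / n  # (1/n) * sum_i (y1_i - y0_i)
print(sate)  # 62.5
```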

  5. Introduction – Fundamental problem of causal inference
The observed outcome (earnings) equals
  y_i = x_i y_1i + (1 − x_i) y_0i
depending on whether i went to college (x_i = 1) or not (x_i = 0).
For a given person i the causal effect of x_i on y_i
◮ equals: y_1i − y_0i
◮ is not identified: only 1 potential outcome is observed (depending on x_i)
Q: How do we compute the SATE given that we only see y_i?

  6. Introduction – Confounders
How do we estimate SATE = ȳ_1 − ȳ_0 with observed earnings y_i?
We can compare average observed earnings of people with college and people without college:
  SATE-hat = ȳ_college − ȳ_no college = ȳ_1,college − ȳ_0,no college
Does this give the SATE? Only if
  E[y_0 | college] = E[y_0 | no college]
  E[y_1 | college] = E[y_1 | no college]
which holds if college is randomly assigned!
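A quick simulation (an illustrative sketch, not from the slides) shows why the naive comparison can fail: here ability raises earnings under both potential outcomes and also drives college attendance, so the naive group difference overstates the true SATE of 2:

```python
import random

random.seed(0)
n = 100_000
y0, y1, x = [], [], []
for _ in range(n):
    ability = random.gauss(0, 1)
    y0.append(10 + 2 * ability)        # earnings without college rise with ability
    y1.append(12 + 2 * ability)        # individual causal effect is 2 for everyone
    x.append(1 if ability > 0 else 0)  # high-ability people select into college

sate = sum(a - b for a, b in zip(y1, y0)) / n  # = 2 by construction
college = [y1[i] for i in range(n) if x[i] == 1]
no_college = [y0[i] for i in range(n) if x[i] == 0]
naive = sum(college) / len(college) - sum(no_college) / len(no_college)
print(sate, naive)  # naive is roughly 5, not 2: selection bias
```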

  7. Introduction – RCT's
In practice we think that college graduates are different from people without college in ways that matter for their earnings, e.g.
  E[ability | x = 1] ≠ E[ability | x = 0]
If ability also affects y, then we can no longer attribute differences in y to differences in x alone!
This is why randomized control trials (RCT's) are considered the "gold standard" when estimating causal effects:
◮ randomization equalizes the treatment and control group on pre-existing characteristics such as ability

  8. Introduction – RCT's
This balancing property of RCT's is good for internal validity
◮ internal validity = the SATE estimate can be given a causal interpretation in the sample
◮ but watch out for:
  ◮ Hawthorne effects (somebody's watching me!)
  ◮ John Henry effects (am gonna show you!)
  ◮ Attrition bias (let's get outta here!)
The external validity of RCT's is sometimes less clear
◮ external validity = is the SATE estimate valid for other samples?

  9. Two-sample estimator
But for now assume that x is randomized by a coin flip. We want to test
  H_0: E[y | x = 1] = E[y | x = 0]  vs  H_1: E[y | x = 1] ≠ E[y | x = 0]
or
  H_0: θ = 0  vs  H_1: θ ≠ 0
where θ = E[y | x = 1] − E[y | x = 0].
We estimate θ with the corresponding difference in sample averages:
  θ̂ = ȳ_{x=1} − ȳ_{x=0}
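With x randomized, θ̂ is just the difference in group means; a tiny sketch with made-up outcomes:

```python
# Outcomes split by the randomized binary x (made-up numbers).
y_x1 = [5.0, 6.0, 7.0, 6.0]  # y_i for x_i = 1
y_x0 = [4.0, 5.0, 4.0, 5.0]  # y_i for x_i = 0

theta_hat = sum(y_x1) / len(y_x1) - sum(y_x0) / len(y_x0)
print(theta_hat)  # 6.0 - 4.5 = 1.5
```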

  10. Example – Social Pressure Experiment
August 2006 Primary Statewide Election in Michigan.
Send postcards with different (randomly assigned) messages:
1. no message (control group)
2. civic duty message
3. "you are being studied" message (Hawthorne effect)
4. neighborhood social pressure message

  11. Example – Social Pressure Experiment
[figure]

  12. Example – Social Pressure Experiment, Balance
social = read.csv("_data/social.csv")
with(social, tapply(primary2004, messages, mean))
## Civic Duty    Control  Hawthorne  Neighbors
##      0.399      0.400      0.403      0.407
with(social, tapply(hhsize, messages, mean))
## Civic Duty    Control  Hawthorne  Neighbors
##       2.19       2.18       2.18       2.19

  13. Example – Social Pressure Experiment, Effects
m06 = with(social, tapply(primary2006, messages, mean))
m06
## Civic Duty    Control  Hawthorne  Neighbors
##      0.315      0.297      0.322      0.378
m06 - m06["Control"]
## Civic Duty    Control  Hawthorne  Neighbors
##     0.0179     0.0000     0.0257     0.0813

  14. Example – Social Pressure Experiment
Turnout rates: ȳ_T = 0.38, ȳ_C = 0.30; sample sizes: n_T = 360, n_C = 1890.
Estimated average treatment effect: ATE-hat = ȳ_T − ȳ_C = 0.08
How do we compute the 95% CI of θ̂ and perform a test of H_0?

  15. One-Sample CI and t-Test
First remember how we computed the CI and performed the t-test for a single average x̄.
By the CLT we know that in large samples, approximately,
  X̄ ∼ N(E[X], Var(X)/n)
and
  CI_95% = Ê[X] ± 1.96 × sqrt(V̂ar(X)/n) = X̄ ± 1.96 × SE
where
  SE = sqrt( Σ_i (x_i − x̄)² / ((n − 1) n) )
and
  t = (X̄ − E[X]) / SE ∼ t(n − 1)
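These one-sample formulas translate directly into code; a sketch with a made-up sample (with n = 8 one would normally use the t(n−1) critical value, but 1.96 mirrors the large-sample formula above):

```python
import math

x = [2.1, 1.9, 2.4, 2.0, 2.6, 1.8, 2.2, 2.0]  # made-up sample
n = len(x)
xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)  # sample variance
se = math.sqrt(s2 / n)                            # SE of the mean
ci = (xbar - 1.96 * se, xbar + 1.96 * se)         # CLT-based 95% CI
t = (xbar - 2.0) / se                             # t-stat for H0: E[X] = 2
print(xbar, round(se, 3))
```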

  16. Two-Sample CI and t-Test
Our estimator is now the difference between 2 sample averages:
  θ̂ = ȳ_{x=1} − ȳ_{x=0}
How is θ̂ distributed? Since, approximately,
  ȳ_{x=k} ∼ N(μ_k, σ²_k / n_k)
where μ_k = E[Y | X = k] and σ²_k = Var(Y | X = k), we have that, approximately,
  θ̂ ∼ N(μ_1 − μ_0, σ²_1/n_1 + σ²_0/n_0)
which follows from the following result.

  17. Sums of normal random variables
If X_k ∼ N(μ_k, σ²_k), k = 1, 2, where μ_k = E[X_k] and σ²_k = Var(X_k), then
  X_1 + X_2 ∼ N(E[X_1 + X_2], Var(X_1 + X_2)) = N(μ_1 + μ_2, σ²_1 + σ²_2 + 2σ_12)
where σ_12 = Cov(X_1, X_2), and σ_12 = 0 if X_1 and X_2 are independent.
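A quick Monte Carlo check of this result for two independent normals (a sketch; the parameters are arbitrary):

```python
import random
import statistics

random.seed(1)
# X1 ~ N(1, 2^2), X2 ~ N(3, 4^2), independent, so sigma_12 = 0:
# X1 + X2 should be approximately N(1 + 3, 4 + 16) = N(4, 20).
draws = [random.gauss(1, 2) + random.gauss(3, 4) for _ in range(200_000)]
print(statistics.fmean(draws), statistics.variance(draws))  # ~4 and ~20
```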

  18. Example – Social Pressure Experiment
Turnout rates: ȳ_T = 0.38, ȳ_C = 0.30; sample sizes: n_T = 360, n_C = 1890.
Estimated average treatment effect: ATE-hat = ȳ_T − ȳ_C = 0.08
Standard error:
  SE = sqrt( ȳ_T(1 − ȳ_T)/n_T + ȳ_C(1 − ȳ_C)/n_C ) = 0.028
95% confidence interval based on the CLT:
  (ATE-hat − SE × z_0.975, ATE-hat + SE × z_0.975) = (0.026, 0.134)
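These numbers can be reproduced directly (a Python sketch of the slide's computation):

```python
import math

ybar_t, ybar_c, n_t, n_c = 0.38, 0.30, 360, 1890

ate = ybar_t - ybar_c  # 0.08
se = math.sqrt(ybar_t * (1 - ybar_t) / n_t + ybar_c * (1 - ybar_c) / n_c)
ci = (ate - 1.96 * se, ate + 1.96 * se)
print(round(se, 3), tuple(round(b, 3) for b in ci))  # 0.028 (0.026, 0.134)
```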

  19. t-Test – Large samples
Under H_0: θ = 0 our test-statistic now becomes
  t = (θ̂ − 0) / SE(θ̂) = (ȳ_{x=1} − ȳ_{x=0} − 0) / sqrt( σ̂²_{x=1}/n_1 + σ̂²_{x=0}/n_0 )
where σ̂²_{x=k} is the sample variance of y_i for the x = k group:
  σ̂²_{x=k} = (1/(n_k − 1)) Σ_{i: x_i = k} (y_i − ȳ_{x=k})²
Under H_0: θ = 0 and in large samples, approximately, t ∼ N(0, 1).
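A sketch of this unpooled two-sample t-statistic (the data are made up):

```python
import math

def two_sample_t(y1, y0):
    """t-statistic for H0: E[y|x=1] = E[y|x=0] with unpooled variances."""
    n1, n0 = len(y1), len(y0)
    m1, m0 = sum(y1) / n1, sum(y0) / n0
    v1 = sum((y - m1) ** 2 for y in y1) / (n1 - 1)  # sigma-hat^2 for x = 1
    v0 = sum((y - m0) ** 2 for y in y0) / (n0 - 1)  # sigma-hat^2 for x = 0
    return (m1 - m0) / math.sqrt(v1 / n1 + v0 / n0)

t = two_sample_t([5.0, 6.0, 7.0, 6.0], [4.0, 5.0, 4.0, 5.0])
print(round(t, 6))  # 3.0
```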

  20. t-Test – Small samples
In small samples, approximately, t ∼ t(k) where
  k ≈ (σ²_{x=1}/n_1 + σ²_{x=0}/n_0)² / ( σ⁴_{x=1}/(n²_1(n_1 − 1)) + σ⁴_{x=0}/(n²_0(n_0 − 1)) )
When σ²_{x=1} = σ²_{x=0} and n_1 = n_0 we get that k = n − 2.
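This approximation for k (the Welch–Satterthwaite degrees of freedom) as a small sketch; with equal variances and equal group sizes it reduces to n − 2:

```python
def welch_df(v1, n1, v0, n0):
    """Welch-Satterthwaite approximate degrees of freedom."""
    num = (v1 / n1 + v0 / n0) ** 2
    den = v1**2 / (n1**2 * (n1 - 1)) + v0**2 / (n0**2 * (n0 - 1))
    return num / den

# Equal variances (1.0) and equal group sizes (50 each, n = 100):
print(welch_df(1.0, 50, 1.0, 50))  # ~98.0 = n - 2
```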

  21. Example – Social Pressure Experiment
Turnout rates: ȳ_T = 0.38, ȳ_C = 0.30; sample sizes: n_T = 360, n_C = 1890.
Estimated average treatment effect: ATE-hat = ȳ_T − ȳ_C = 0.08
Standard error:
  SE = sqrt( ȳ_T(1 − ȳ_T)/n_T + ȳ_C(1 − ȳ_C)/n_C ) = 0.028
t-statistic: t = 0.08 / 0.028 ≈ 2.9
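The two-sided p-value implied by this t-statistic under the large-sample N(0, 1) reference (a sketch using the complementary error function):

```python
import math

t = 0.08 / 0.028  # the slide's t-statistic, ~2.9
# Two-sided p-value under N(0, 1): p = 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))
p = math.erfc(abs(t) / math.sqrt(2))
print(round(t, 2), round(p, 4))
```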

  22. Example – Social Pressure Experiment, Two-sample test
Under H_0: p_T = p_C we have the following reference distribution:
  N(0, p(1 − p)/n_T + p(1 − p)/n_C)
p = (.38 * 360 + .3 * 1890) / (360 + 1890)
se0 = sqrt(p * (1 - p) / 360 + p * (1 - p) / 1890)
p; se0
## [1] 0.313
## [1] 0.0267
t = 0.08 / 0.0267 ≈ 3.0 gives a p-value of:
2 * pnorm(.08 / se0, lower.tail = F)  # p-value
## [1] 0.00269

  23. Power
We reject the null if the test statistic is "too large" to be consistent with our null hypothesis:
  decision = reject H_0 if |t| > c; do not reject H_0 if |t| ≤ c

                     H_0 is true                H_0 is false
  Not reject H_0     Correct (prob. 1 − α)      Type II error (prob. β)
  Reject H_0         Type I error (prob. α)     Correct (prob. 1 − β)

Hypothesis tests control the probability of a Type I error, which equals the level of the test, α.
They do not control the probability of a Type II error.

  24. Power
Null hypotheses are often uninteresting. But hypothesis testing may indicate the strength of evidence for or against your theory.
Our ability to discriminate between H_0 and H_1 is measured by power:
  power = 1 − Pr(Type II error) = 1 − β
A large p-value can occur either because H_0 is true or because H_0 is false but the test is not powerful.
There is a tradeoff between the two types of error, but typically we want the most powerful test at a given level.

  25. Power Analysis
Power analysis:
1. Choose H_0, H_1, α
  ◮ e.g. H_0: μ = μ_0, H_1: μ = μ_1, α = 0.05
2. Choose the population parameter under the hypothetical data generating process
  ◮ μ = μ_1, which implies X̄ ∼ N(μ_1, Var(X)/n)
3. Fix either
  3.1 the sample size, and compute power
    ◮ we reject H_0 if |X̄ − μ_0| > z_{1−α/2} × SE
  3.2 the desired power, and compute the required sample size
    ◮ fix the rejection probability in 3.1 and solve for n
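Step 3.1 can be sketched as follows (a normal-approximation power calculation; the inputs μ_0, μ_1, sd and n are arbitrary illustrative values, and the far-tail rejection probability is ignored):

```python
import math
from statistics import NormalDist

def power(mu0, mu1, sd, n, alpha=0.05):
    """Approximate power of the two-sided z-test of H0: mu = mu0 when mu = mu1."""
    nd = NormalDist()
    se = sd / math.sqrt(n)
    z = nd.inv_cdf(1 - alpha / 2)  # critical value, 1.96 for alpha = 0.05
    return nd.cdf(abs(mu1 - mu0) / se - z)

# e.g. detecting a half-sd shift with n = 32 observations:
print(round(power(0.0, 0.5, 1.0, 32), 2))  # ~0.81
```

Solving the same equation for n instead of power gives step 3.2.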

  26. Power – The Big Picture
[Figure: sampling distributions under H_0 and H_1 with the critical level marked; shaded area = Pr(Type II error)]

  27. Low Power
[Figure: same setup with the distributions under H_0 and H_1 close together; Pr(Type II error) is large]

  28. High Power
[Figure: same setup with the distributions under H_0 and H_1 far apart; Pr(Type II error) is small]
