
Gov 2000: 6. Hypothesis Testing. Matthew Blackwell, October 11, 2016



  1. Gov 2000: 6. Hypothesis Testing. Matthew Blackwell, October 11, 2016

  2. Outline
     1. Hypothesis Testing Examples
     2. Hypothesis Test Nomenclature
     3. Conducting Hypothesis Tests
     4. p-values
     5. Power Analyses
     6. Exact Inference*
     7. Wrap up

  3. Where are we? Where are we going?
     • Last few weeks = how to produce a best estimate of some population parameter, drawing on our knowledge of probability.
     • Also learned how to derive an estimated range of plausible values of the parameter in the confidence interval.
     • Now: how to use our estimates to test a particular hypothesis about the data.
     • We’ll draw heavily on our probability knowledge from earlier in the term!

  4. 1/ Hypothesis Testing Examples

  5. The lady tasting tea
     • Remember the setup: Your advisor asks you to grab a tea with milk for him before your meeting and he says that he prefers tea poured before the milk. You stop by Darwin’s and ask for a tea with milk. When you bring it to your advisor, he complains that it was prepared milk-first.
     • You are skeptical that he can really tell the difference, so you devise a test:
       ▶ Prepare 8 cups of tea, 4 milk-first, 4 tea-first.
       ▶ Present cups to advisor in a random order.
       ▶ Ask advisor to pick which 4 of the 8 were milk-first.

  6. Assuming we know the truth
     • Advisor picks out all 4 milk-first cups correctly!
     • Statistical thought experiment: how often would the advisor get all 4 correct if guessing randomly?
       ▶ Only one way to choose all 4 correct cups.
       ▶ But 70 ways of choosing 4 cups among 8.
       ▶ Choosing at random ≈ picking each of these 70 with equal probability.
     • Chance of guessing all 4 correct is $1/70 \approx 0.014$, or 1.4%.
     • ⇝ the guessing-at-random hypothesis might be implausible.
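The counting argument on this slide can be checked directly in R. This is a minimal sketch; the variable names `n.ways` and `p.all.correct` are illustrative and do not come from the original slides.

```r
# Number of distinct ways to choose 4 cups out of 8
n.ways <- choose(8, 4)

# Exactly one of those selections picks all 4 milk-first cups,
# so under pure guessing each selection is equally likely
p.all.correct <- 1 / n.ways

n.ways
p.all.correct
```

`choose(8, 4)` returns 70, so the probability of a perfect guess is 1/70, matching the 1.4% on the slide.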

  7. Social pressure effect

  8. Social pressure effect

         load("../data/gerber_green_larimer.RData")
         social$voted <- 1 * (social$voted == "Yes")
         neigh.mean <- mean(social$voted[social$treatment == "Neighbors"])
         contr.mean <- mean(social$voted[social$treatment == "Civic Duty"])
         neigh.mean - contr.mean
         ## [1] 0.0634

     • Treatment effect of 6.341 percentage points.
     • But we know that the estimator varies from sample to sample due to random chance.
     • Could this happen by random chance if there was no treatment effect at all?

  9. Review of the difference in means
     • Treated group $Y_1, Y_2, \ldots, Y_{n_y}$ i.i.d. with population mean $\mu_y$ and population variance $\sigma^2_y$
     • Control group $X_1, X_2, \ldots, X_{n_x}$ i.i.d. with population mean $\mu_x$ and population variance $\sigma^2_x$
     • Quantity of interest: population difference in average turnout: $\mathbb{E}[Y_i] - \mathbb{E}[X_i] = \mu_y - \mu_x$
     • Estimator: sample difference in means: $\hat{D}_n = \bar{Y}_{n_y} - \bar{X}_{n_x}$
     • We estimated the standard error of $\hat{D}_n$ with: $\widehat{\mathrm{se}}[\hat{D}_n] = \sqrt{\dfrac{S^2_y}{n_y} + \dfrac{S^2_x}{n_x}}$
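As a rough illustration of the estimator and its standard error, the sketch below applies both formulas to two tiny made-up vectors. The data and the names `y`, `x`, `d.hat`, and `se.hat` are hypothetical and are not the Gerber, Green, and Larimer data used in the slides.

```r
# Hypothetical 0/1 turnout indicators (made up for illustration only)
y <- c(1, 1, 0, 1)  # treated group
x <- c(0, 1, 0, 0)  # control group

# Sample difference in means: D-hat = Y-bar - X-bar
d.hat <- mean(y) - mean(x)

# Estimated standard error: sqrt(S_y^2 / n_y + S_x^2 / n_x),
# where var() is the sample variance with an n - 1 denominator
se.hat <- sqrt(var(y) / length(y) + var(x) / length(x))

d.hat
se.hat
```

With these toy vectors the difference in means is 0.5 and both sample variances are 0.25, so the standard error is the square root of 0.125.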

  10. 2/ Hypothesis Test Nomenclature

  11. What is a hypothesis test?
      • A hypothesis test is an evaluation of a particular hypothesis about the population distribution.
      • Statistical thought experiments:
        ▶ Assume we know (part of) the true DGP.
        ▶ Use tools of probability to see what types of data we should see under this assumption.
        ▶ Compare our observed data to this thought experiment.
      • Statistical proof by contradiction:
        ▶ We will “reject” the assumed DGP if the data is too unusual under it.

  12. What is a hypothesis?
      • Definition: A hypothesis is just a statement about population parameters.
      • We might have hypotheses about causal inferences:
        ▶ Does social pressure induce higher voter turnout? (mean turnout higher in social pressure group compared to Civic Duty group?)
        ▶ Do daughters cause politicians to be more liberal on women’s issues? (voting behavior different among members of Congress with daughters?)
        ▶ Do treaties constrain countries? (behavior different among treaty signers?)
      • We might also have hypotheses about other parameters:
        ▶ Is the share of Hillary Clinton supporters more than 50%?
        ▶ Are traits of treatment and control groups different?

  13. Null and alternative hypotheses
      • Definition: The null hypothesis is a proposed, conservative value for a population parameter.
        ▶ This is usually “no effect/difference/relationship.”
        ▶ We denote this hypothesis as $H_0: \theta = \theta_0$.
        ▶ $H_0$: Social pressure doesn’t affect turnout ($H_0: \mu_y - \mu_x = 0$)
      • Definition: The alternative hypothesis for a given null hypothesis is the research claim we are interested in supporting.
        ▶ Usually, “there is a relationship/difference/effect.”
        ▶ We denote this as $H_a: \theta \neq \theta_0$.
        ▶ $H_a$: Social pressure affects turnout ($H_a: \mu_y - \mu_x \neq 0$)
      • Always mutually exclusive

  14. General framework
      • A hypothesis test chooses whether or not to reject the null hypothesis based on the data we observe.
      • Rejection based on a test statistic, $T_n = T(Y_1, \ldots, Y_n)$.
        ▶ Will help us adjudicate between the null and the alternative.
        ▶ Typically: larger values of $T_n$ ⇝ null less plausible.
        ▶ A test statistic is a r.v.
      • Definition: The null/reference distribution is the distribution of $T$ under the null.
        ▶ We’ll write its probabilities as $\mathbb{P}_0(T_n \leq t)$.

  15. Test statistic example
      • By the CLT, we know that the standardized difference in means has a standard normal distribution in large samples:
        $T_n = \dfrac{\hat{D}_n - (\mu_y - \mu_x)}{\widehat{\mathrm{se}}[\hat{D}_n]} \xrightarrow{d} N(0, 1)$
      • Under the null hypothesis of $H_0: \mu_y - \mu_x = 0$, we then have
        $T_n = \dfrac{\hat{D}_n}{\widehat{\mathrm{se}}[\hat{D}_n]} \xrightarrow{d} N(0, 1)$
      • If $T_n$ is very far from 0 ⇝ large sample diff-in-means ⇝ no population diff-in-means is not plausible.
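One way to see the large-sample claim is to simulate many datasets in which the null is true and look at the resulting test statistics. The sketch below is only an illustration under assumed values: the seed, the group sizes, the common turnout rate `p`, and the number of simulations are all arbitrary choices, not taken from the slides.

```r
set.seed(2016)   # arbitrary seed, for reproducibility only
n.sims <- 2000   # number of simulated datasets (assumed)
n.y <- 500       # treated group size (assumed)
n.x <- 500       # control group size (assumed)
p <- 0.3         # common turnout rate, so the null of no difference holds

# For each simulated dataset, compute T_n = D-hat / se-hat
t.stats <- replicate(n.sims, {
  y <- rbinom(n.y, size = 1, prob = p)
  x <- rbinom(n.x, size = 1, prob = p)
  (mean(y) - mean(x)) / sqrt(var(y) / n.y + var(x) / n.x)
})

# If the standard normal approximation is good, these should be near 0 and 1
mean(t.stats)
sd(t.stats)
```

A histogram of `t.stats` (for example via `hist(t.stats)`) should look roughly like a standard normal density.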

  16. Rejection regions
      • Definition: The rejection region, $R$, contains the values of $T_n$ for which we reject the null.
        ▶ These are the areas that indicate that there is evidence against the null.
      • Two-sided alternative (our focus):
        ▶ $H_0: \mu_y - \mu_x = 0$ and $H_a: \mu_y - \mu_x \neq 0$
        ▶ Implies that $T_n \gg 0$ or $T_n \ll 0$ will be evidence against the null
        ▶ Rejection regions: $|T_n| > c$ for some value $c$
      • How to determine these regions?

  17. Type I and Type II errors
      Type I errors: A Type I error is when we reject the null hypothesis when it is in fact true.
      • We say that the Lady is discerning when she is just guessing.
      • A false discovery (very bad, thus type I).
      Type II errors: A Type II error is when we fail to reject the null hypothesis when it is false.
      • We say that the Lady is just guessing when she is truly discerning.
      • An undetected finding (not as bad, thus type II).

  18. Test level/size

                       $H_0$ True       $H_0$ False
      Retain $H_0$     Awesome!         Type II error
      Reject $H_0$     Type I error     Good stuff!

      • Definition: The level/size of the test, or $\alpha$, is the probability of a Type I error.
        ▶ With two-sided alternative, we reject when $|T_n| > c$
        ▶ Size of test then is: $\mathbb{P}_0(|T_n| > c) = \alpha$
      • Choose a level $\alpha$ based on aversion to false discovery:
        ▶ Convention in social sciences is $\alpha = 0.05$, but nothing magical there
        ▶ Particle physicists at CERN use $\alpha \approx \dfrac{1}{1{,}750{,}000}$
        ▶ Lower values of $\alpha$ guard against “flukes” but increase barriers to discovery
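To make the size formula concrete, the sketch below evaluates $\mathbb{P}_0(|T_n| > c)$ for two cutoffs, assuming the null distribution is standard normal. The helper `test.size` and the cutoff of 5 (the "five sigma" convention, which I am assuming is what the CERN figure refers to) are illustrative and not from the slides.

```r
# Size of a two-sided test that rejects when |T| > cutoff,
# assuming T is standard normal under the null (hypothetical helper)
test.size <- function(cutoff) {
  2 * pnorm(-cutoff)
}

# Cutoff behind the social science convention
test.size(1.96)

# A "five sigma" cutoff, expressed as "1 in how many"
1 / test.size(5)
```

Under these assumptions a cutoff of 1.96 gives a size very close to 0.05, and a cutoff of 5 gives roughly 1 in 1.74 million, in line with the figure on the slide.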

  19. 3/ Conducting Hypothesis Tests

  20. Hypothesis testing procedure
      1. Choose null and alternative hypotheses
      2. Choose a test statistic, $T_n$
      3. Choose a level, $\alpha$
      4. Determine rejection region
      5. Reject if $T_n$ in rejection region, fail to reject otherwise

  21. Rejection region
      [Figure: density $\mathbb{P}_0(T)$ of $T$ under the null hypothesis, with "Reject" regions below $-c$ and above $c$ and a "Retain" region in between.]
      • What’s the rejection region $|T_n| > c$ if $\alpha = 0.05$?
      • Under the null hypothesis of no effect, we want $T_n$ to be in the rejection region only 5% of the time.
        ▶ ⇝ false rejection of the null only 5% of the time.
        ▶ Can find $c$ based on the null distribution being ≈ standard normal!

  22. Determining the rejection region
      [Figure: density $\mathbb{P}_0(T)$ of $T$ under the null hypothesis, with area $\alpha/2$ in each "Reject" tail, below $-c = -z_{\alpha/2}$ and above $c = z_{\alpha/2}$.]
      • Find $z_{\alpha/2}$ such that $\mathbb{P}_0(T_n < -z_{\alpha/2}) = \mathbb{P}_0(T_n > z_{\alpha/2}) = \alpha/2$

  23. Determining the rejection region
      [Figure: density $\mathbb{P}_0(T)$ of $T$ under the null hypothesis, with area $1 - \alpha/2$ to the left of $c = z_{\alpha/2}$ and area $\alpha/2$ in the upper "Reject" tail.]
      • Find $z_{\alpha/2}$ such that $\mathbb{P}_0(T_n < -z_{\alpha/2}) = \mathbb{P}_0(T_n > z_{\alpha/2}) = \alpha/2$
      • ⇝ find quantile $\mathbb{P}_0(T_n < z_{\alpha/2}) = 1 - \alpha/2$
        ▶ if $\alpha = 0.05$ ⇝ $z_{\alpha/2}$ = qnorm(1-0.05/2) = 1.96
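The quantile lookup on this slide can be run and checked in R. The sketch below uses the slide's `qnorm()` call and then confirms that each tail beyond the cutoff has probability $\alpha/2$; the names `alpha` and `z.crit` are illustrative.

```r
alpha <- 0.05

# Critical value: the 1 - alpha/2 quantile of the standard normal
z.crit <- qnorm(1 - alpha / 2)
z.crit

# Check the two tail probabilities; each should equal alpha / 2
pnorm(-z.crit)      # lower tail, P(T < -z)
1 - pnorm(z.crit)   # upper tail, P(T > z)
```

Changing `alpha` to another level gives the corresponding cutoff for a two-sided test with a standard normal null distribution.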

  24. Final hypothesis test
      1. Hypotheses: $H_0: \mu_y - \mu_x = 0$ vs. $H_a: \mu_y - \mu_x \neq 0$
      2. Test statistic: $T_n = \hat{D}_n / \widehat{\mathrm{se}}[\hat{D}_n]$
      3. Use $\alpha = 0.05$
      4. Rejection region is $|T_n| > 1.96$.
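The four steps above can be collected into a small function. This is a sketch under stated assumptions, not the course's own code: the function name `diff.means.test` and the example vectors are hypothetical, and the example data are made up rather than drawn from the social pressure experiment.

```r
# Two-sided large-sample test of H0: mu_y - mu_x = 0 (hypothetical helper)
diff.means.test <- function(y, x, alpha = 0.05) {
  d.hat <- mean(y) - mean(x)                                 # estimator
  se.hat <- sqrt(var(y) / length(y) + var(x) / length(x))    # standard error
  t.stat <- d.hat / se.hat                                   # test statistic
  z.crit <- qnorm(1 - alpha / 2)                             # cutoff c
  list(t.stat = t.stat, z.crit = z.crit, reject = abs(t.stat) > z.crit)
}

# Made-up data: 60 of 100 treated units vote, 40 of 100 control units vote
y <- rep(c(1, 0), times = c(60, 40))
x <- rep(c(1, 0), times = c(40, 60))

result <- diff.means.test(y, x)
result$t.stat
result$reject
```

For this made-up example the test statistic is about 2.87, which falls in the rejection region $|T_n| > 1.96$, so the null of no difference is rejected at the 0.05 level.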
