Lecture 4: Hypothesis Testing Ani Manichaikul amanicha@jhsph.edu 20 April 2007 1 / 69
Steps of Hypothesis Testing
Define the null hypothesis, $H_0$
Define the alternative hypothesis, $H_a$, where $H_a$ is usually of the form “not $H_0$”
Choose the type I error rate (significance level), $\alpha$, usually 0.05
Calculate the test statistic
Calculate the p-value
If the p-value is less than $\alpha$, reject $H_0$
Otherwise, fail to reject $H_0$
Hypothesis Testing
We will first discuss hypothesis testing as it applies to means of distributions for continuous variables
We will then discuss discrete data (specifically dichotomous variables)
Hypothesis test for a single mean I
Assume a population of normally distributed birth weights with a known standard deviation, $\sigma = 1000$ grams
Birth weights are obtained on a sample of 10 infants; the sample mean is calculated as 2500 grams
Question: Is the mean birth weight in this population different from 3000 grams?
Set up a two-sided test of $H_0: \mu = 3000$ vs. $H_a: \mu \neq 3000$
Let $\alpha = 0.05$ denote a 5% significance level
Hypothesis test for a single mean II
Calculate the test statistic:
$$z_{obs} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{2500 - 3000}{1000/\sqrt{10}} = -1.58$$
What does this mean? Our observed mean is 1.58 standard errors below the hypothesized mean
The test statistic is the standardized value of our data assuming the null hypothesis is true!
Question: If the true mean is 3000 grams, is our observed sample mean of 2500 “common” or is this value unlikely to occur?
Hypothesis test for a single mean III
Calculate the p-value:
$$\text{p-value} = P(Z < -|z_{obs}|) + P(Z > |z_{obs}|) = 2 \times 0.057 = 0.114$$
If the true mean is 3000 grams, our data or data more extreme than ours would occur in about 11 out of 100 studies (of the same size, n = 10)
In other words, in about 11 out of 100 such studies we would observe, just by chance, a sample mean at least as far from 3000 grams as our observed 2500 grams
What does this say about our hypothesis?
General guideline: if p-value < $\alpha$, then reject $H_0$
Here 0.114 > 0.05, so we fail to reject $H_0$
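The calculation on the last two slides can be reproduced with a short Python sketch. It uses only the standard library (`statistics.NormalDist`); the variable names are illustrative choices, not part of the lecture.

```python
from math import sqrt
from statistics import NormalDist

# Values from the birth weight example
x_bar = 2500.0   # observed sample mean (grams)
mu_0 = 3000.0    # hypothesized mean under H_0 (grams)
sigma = 1000.0   # known population standard deviation (grams)
n = 10           # sample size
alpha = 0.05     # significance level

# Standardize the sample mean under H_0
standard_error = sigma / sqrt(n)
z_obs = (x_bar - mu_0) / standard_error

# Two-sided p-value: P(Z < -|z_obs|) + P(Z > |z_obs|)
p_value = 2 * NormalDist().cdf(-abs(z_obs))

print(f"z_obs   = {z_obs:.2f}")
print(f"p-value = {p_value:.3f}")
print("reject H_0" if p_value < alpha else "fail to reject H_0")
```

Because the standard normal distribution is symmetric, the two tail probabilities are equal, which is why the sketch doubles a single tail.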
Hypothesis test for a single mean IV
Could also use the “critical region” or “rejection region” approach
Based on our significance level ($\alpha = 0.05$) and assuming $H_0$ is true, how “far” does our sample mean have to be from $H_0: \mu = 3000$ in order to reject?
Critical value $= z_c$ where $2 \times P(Z > |z_c|) = 0.05$
In our example, $z_c = 1.96$
The rejection region is any value of our test statistic that is less than -1.96 or greater than 1.96
The decision should be the same whether using the p-value or the critical / rejection region
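A minimal sketch of the rejection-region approach for the same birth weight example, again using only the Python standard library. The translation of the critical value back to the gram scale (`lower_cutoff`, `upper_cutoff`) is an added illustration of how far the sample mean would have to be from 3000 in order to reject.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
mu_0, sigma, n = 3000.0, 1000.0, 10
x_bar = 2500.0

# Critical value z_c solves 2 * P(Z > z_c) = alpha
z_c = NormalDist().inv_cdf(1 - alpha / 2)

# Test statistic and rejection-region decision
standard_error = sigma / sqrt(n)
z_obs = (x_bar - mu_0) / standard_error
in_rejection_region = abs(z_obs) > z_c

# The same cutoffs expressed on the scale of the sample mean (grams)
lower_cutoff = mu_0 - z_c * standard_error
upper_cutoff = mu_0 + z_c * standard_error

print(f"z_c = {z_c:.2f}, z_obs = {z_obs:.2f}, reject: {in_rejection_region}")
print(f"reject if sample mean < {lower_cutoff:.0f} or > {upper_cutoff:.0f}")
```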
Hypothesis test for a single mean V
An alternative approach for the two-sided hypothesis test is to calculate a $100(1-\alpha)\%$ confidence interval for the mean
$$\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \;\rightarrow\; 2500 \pm 1.96 \times \frac{1000}{\sqrt{10}}$$
We are 95% confident that the interval (1880, 3120) contains the true population mean $\mu$
The hypothesized mean of 3000 is a plausible value of the true mean given our data
We cannot say that the true mean is different from 3000
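The interval above can be checked with a few lines of Python. This is a sketch using the standard library only; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 2500.0, 1000.0, 10
mu_0 = 3000.0
alpha = 0.05

# z_{alpha/2} for a 100(1 - alpha)% interval
z = NormalDist().inv_cdf(1 - alpha / 2)
half_width = z * sigma / sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)

# The two-sided test fails to reject H_0 exactly when mu_0 lies inside the interval
mu_0_is_plausible = ci[0] <= mu_0 <= ci[1]

print(f"95% CI: ({ci[0]:.0f}, {ci[1]:.0f}); contains 3000: {mu_0_is_plausible}")
```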
P-values
Definition: The p-value for a hypothesis test is the probability, computed assuming $H_0$ is true, of obtaining a value of the test statistic as extreme as or more extreme than the observed test statistic
The rejection region is determined by $\alpha$, the desired level of significance, or probability of committing a type I error
Reporting the p-value associated with a test gives an indication of how common or rare the computed value of the test statistic is, given that $H_0$ is true
We often use $z_{obs}$ to denote the computed value of the test statistic
Determining the correct test statistic
Depends on your assumptions about $\sigma$
When $\sigma$ is known, we have a standard normal test statistic
When $\sigma$ is unknown and our sample size is relatively small, the test statistic has a t-distribution
The only change in the procedure is that the calculation of the p-value or rejection region uses a t-distribution instead of the normal distribution
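Hypothesis tests for one mean
$H_0: \mu = \mu_0$, $H_a: \mu \neq \mu_0$

| Population distribution | Sample size | Population variance | Test statistic |
|---|---|---|---|
| Normal | Any | $\sigma^2$ known | $z_{obs} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ |
| Normal | Any | $\sigma^2$ unknown; uses $s^2$, df $= n - 1$ | $t_{obs} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$ |
| Not Normal/Unknown | Large | $\sigma^2$ known | $z_{obs} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ |
| Not Normal/Unknown | Large | $\sigma^2$ unknown; uses $s^2$ | $z_{obs} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$ |
| Not Normal/Unknown | Small | Any | Non-parametric methods |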
Hypothesis tests for one proportion
$H_0: p = p_0$, $H_a: p \neq p_0$

| Population distribution | Sample size | Test statistic |
|---|---|---|
| Binomial | Large | $z_{obs} = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$ |
| Binomial | Small | Exact methods |
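A minimal sketch of the large-sample test for one proportion. The counts (60 successes in 100 trials) and the null value $p_0 = 0.5$ are hypothetical, chosen only to illustrate the formula; the function name is also an illustrative choice.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p_0):
    """Large-sample two-sided z-test of H_0: p = p_0."""
    p_hat = successes / n
    # The standard error uses p_0, the value of p assumed under H_0
    standard_error = sqrt(p_0 * (1 - p_0) / n)
    z_obs = (p_hat - p_0) / standard_error
    p_value = 2 * NormalDist().cdf(-abs(z_obs))
    return z_obs, p_value

# Hypothetical data: 60 successes out of 100 trials, testing p_0 = 0.5
z_obs, p_value = one_proportion_z_test(successes=60, n=100, p_0=0.5)
print(f"z_obs = {z_obs:.2f}, p-value = {p_value:.4f}")
```

For small samples the normal approximation is not appropriate, and an exact binomial method would be used instead, as the table indicates.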
Hypothesis tests for a difference of two means
$H_0: \mu_1 - \mu_2 = \mu_0$, $H_a: \mu_1 - \mu_2 \neq \mu_0$

| Population distribution | Sample size | Population variances | Test statistic |
|---|---|---|---|
| Normal | Any | Known | $z_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$ |
| Normal | Any | Unknown; assume $\sigma_1^2 = \sigma_2^2$, df $= n_1 + n_2 - 2$ | $t_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{s_p^2/n_1 + s_p^2/n_2}}$ |
| Normal | Any | Unknown; assume $\sigma_1^2 \neq \sigma_2^2$, df $= \nu$ | $t_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ |
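The table uses a pooled variance $s_p^2$ and degrees of freedom $\nu$ without defining them on this slide. The usual textbook definitions, assumed here rather than taken from the lecture, are the pooled sample variance and the Satterthwaite approximation:

```latex
% Pooled variance (equal-variance case), weighting each sample variance by its df
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}

% Satterthwaite approximate degrees of freedom (unequal-variance case)
\nu = \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}
           {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```

In practice $\nu$ is generally not an integer; software typically uses it as is.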
Example: Hypothesis test for two means (two independent samples) I
The EPREDA Trial: randomized, placebo-controlled trial to determine whether dipyridamole improves the efficacy of aspirin in preventing fetal growth retardation
Pregnant women were randomized to placebo (n = 73) or to treatment with aspirin or aspirin plus dipyridamole (n = 156)
Mean birth weight was statistically significantly higher in the treated group than in the placebo group: 2751 (SD 670) grams vs. 2526 (SD 848) grams
Example: Hypothesis test for two means (two independent samples) II
Test the hypothesis $H_0: \mu_{placebo} = \mu_{treated}$ vs. $H_a: \mu_{placebo} \neq \mu_{treated}$ at the 5% significance level
The data are:

| Treatment | n | Mean (grams) | SD (grams) |
|---|---|---|---|
| Placebo | 73 | 2526 | 848 |
| Treated | 156 | 2751 | 670 |
Example: Hypothesis test for two means (two independent samples) III
Calculate the test statistic (group 1 = placebo, group 2 = treated, $\mu_0 = 0$):
$$t_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{2526 - 2751}{\sqrt{848^2/73 + 670^2/156}} = -1.99$$
The observed difference in mean birth weight comparing the placebo to treated groups is approximately 2 standard errors below the hypothesized difference of 0
Our sample sizes are fairly large, so the test statistic will behave approximately like a standard normal variable
Example: Hypothesis test for two means (two independent samples) IV
What is the p-value in this example? p-value = 0.047
What is your decision in this case? Not straightforward: the p-value is only just below $\alpha = 0.05$
There may be a difference in birth weight comparing the two groups
Need to consider the practical implications
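The two-sample calculation can be sketched in Python from the summary statistics in the data table. The sketch follows the slides in treating the statistic as approximately standard normal because both samples are large; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the EPREDA data table (grams)
n_placebo, mean_placebo, sd_placebo = 73, 2526.0, 848.0
n_treated, mean_treated, sd_treated = 156, 2751.0, 670.0
mu_0 = 0.0     # hypothesized difference under H_0
alpha = 0.05

# Standard error of the difference, not assuming equal variances
se_diff = sqrt(sd_placebo**2 / n_placebo + sd_treated**2 / n_treated)
z_obs = ((mean_placebo - mean_treated) - mu_0) / se_diff

# Large-sample two-sided p-value from the standard normal distribution
p_value = 2 * NormalDist().cdf(-abs(z_obs))

print(f"SE = {se_diff:.1f}, statistic = {z_obs:.2f}, p-value = {p_value:.3f}")
```

Without intermediate rounding the p-value comes out near 0.046; the 0.047 on the slide corresponds to using the statistic rounded to -1.99. Either way it sits just below 0.05.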
Example: Hypothesis test for two means (two independent samples) V
Can also give a 95% confidence interval for the difference in the two means: (-446.13, -3.87)
Again, this is a plausible range of values for the true difference in birth weights comparing the placebo to treated groups
What is your null hypothesis? No difference!
Given this confidence interval, is “no difference” a plausible value? Almost?
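A sketch of how the interval for the difference in means can be obtained from the summary statistics (placebo: n = 73, mean 2526, SD 848; treated: n = 156, mean 2751, SD 670), using a normal critical value as in the large-sample treatment above. Names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Observed difference in means (placebo minus treated) and its standard error
diff = 2526.0 - 2751.0
se_diff = sqrt(848.0**2 / 73 + 670.0**2 / 156)

# 95% interval: difference +/- z_{0.025} * SE
z = NormalDist().inv_cdf(0.975)
ci = (diff - z * se_diff, diff + z * se_diff)

# "No difference" corresponds to 0; check whether it lies inside the interval
zero_is_plausible = ci[0] <= 0.0 <= ci[1]

print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}); contains 0: {zero_is_plausible}")
```

The upper limit falls only a few grams below 0, which is the interval-based view of a p-value just under 0.05.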
Hypothesis tests for a difference of two means
$H_0: \mu_1 - \mu_2 = \mu_0$, $H_a: \mu_1 - \mu_2 \neq \mu_0$

| Population distribution | Sample size | Population variances | Test statistic |
|---|---|---|---|
| Not Normal/Unknown | Large | Known | $z_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$ |
| Not Normal/Unknown | Large | Unknown; assume $\sigma_1^2 = \sigma_2^2$ | $z_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{s_p^2/n_1 + s_p^2/n_2}}$ |
| Not Normal/Unknown | Large | Unknown; assume $\sigma_1^2 \neq \sigma_2^2$ | $z_{obs} = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_0}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ |
| Not Normal/Unknown | Small | Any | Nonparametric methods |
Additional Considerations: We’re not always right

| Conclusion based on data (sample) | “Truth”: $H_0$ true | “Truth”: $H_0$ false |
|---|---|---|
| Reject $H_0$ | Type I error | Correct |
| Fail to reject $H_0$ | Correct | Type II error |
Errors in hypothesis testing: $\alpha$
$\alpha = P(\text{Type I error})$ = probability of rejecting a true null hypothesis = “level of significance”
Aim: to keep Type I error small by specifying a small rejection region
$\alpha$ is usually set before performing a test, typically at level $\alpha = 0.05$
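One way to see what $\alpha$ means is to simulate many studies in which $H_0$ is actually true and count how often the test rejects. The sketch below reuses the birth weight setting ($\mu_0 = 3000$, $\sigma = 1000$, $n = 10$); the number of simulated studies and the random seed are arbitrary choices made for this illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def rejects_null(sample, mu_0, sigma, alpha):
    """Two-sided z-test with known sigma; True if H_0 is rejected."""
    z_obs = (fmean(sample) - mu_0) / (sigma / sqrt(len(sample)))
    p_value = 2 * NormalDist().cdf(-abs(z_obs))
    return p_value < alpha

random.seed(0)  # arbitrary seed, only for reproducibility
mu_0, sigma, n, alpha = 3000.0, 1000.0, 10, 0.05
n_studies = 20_000

# Every simulated sample is drawn with true mean mu_0, so H_0 is true
# and each rejection is a Type I error
rejections = sum(
    rejects_null([random.gauss(mu_0, sigma) for _ in range(n)], mu_0, sigma, alpha)
    for _ in range(n_studies)
)
type_1_rate = rejections / n_studies

print(f"Estimated Type I error rate: {type_1_rate:.3f}")
```

The estimated rate should land close to 0.05, up to simulation noise.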
Errors in hypothesis testing: $\beta$ I
$\beta = P(\text{Type II error}) = P(\text{fail to reject } H_0 \text{ given } H_0 \text{ is false})$
Power $= 1 - \beta$ = probability of rejecting $H_0$ when $H_0$ is false
Aim: to keep Type II error small and achieve large power
Errors in hypothesis testing: $\beta$ II
$\beta$ depends on sample size, $\alpha$, and the specified alternative value
The value of $\beta$ is usually unknown since the true mean (or other parameter) is generally unknown
Before data collection, scientists should decide:
  the test they will perform
  the desired Type I error rate, $\alpha$
  the desired $\beta$, for a specified alternative value
After specifying this information, an appropriate sample size can be determined
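As a sketch of how power and sample size follow from these choices, consider the two-sided one-sample z-test with known $\sigma$ from the birth weight example. The alternative mean of 2500 grams and the 80% power target are hypothetical inputs for illustration, and the sample-size function uses the common approximation that ignores the negligible far-tail rejection probability.

```python
from math import ceil, sqrt
from statistics import NormalDist

std_normal = NormalDist()

def power_two_sided_z(mu_0, mu_alt, sigma, n, alpha=0.05):
    """Power of the two-sided z-test when the true mean is mu_alt."""
    z_c = std_normal.inv_cdf(1 - alpha / 2)
    # Distance between alternative and null, in standard-error units
    shift = abs(mu_alt - mu_0) / (sigma / sqrt(n))
    # Probability that the statistic falls in either rejection tail
    return std_normal.cdf(-z_c + shift) + std_normal.cdf(-z_c - shift)

def sample_size_two_sided_z(mu_0, mu_alt, sigma, alpha=0.05, power=0.80):
    """Approximate n needed to reach the requested power."""
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / abs(mu_alt - mu_0)) ** 2)

# Hypothetical alternative: the true mean is 2500 grams
power_n10 = power_two_sided_z(mu_0=3000, mu_alt=2500, sigma=1000, n=10)
beta_n10 = 1 - power_n10
n_required = sample_size_two_sided_z(mu_0=3000, mu_alt=2500, sigma=1000)

print(f"n = 10: power = {power_n10:.2f}, beta = {beta_n10:.2f}")
print(f"n needed for 80% power: {n_required}")
```

Under these assumptions a study of 10 infants has low power against a 500 gram difference, which helps explain why the earlier test could not reject $H_0$ despite an observed mean of 2500 grams.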
Critical Regions I [figure]
Critical Regions II [figure]
Critical Regions III [figure]
Type II error [figure]
Dichotomous variables
Proportions
2 × 2 tables
Study Design
Hypothesis tests
Proportions and 2 × 2 tables

| Population | Success | Failure | Total |
|---|---|---|---|
| Population 1 | $x_1$ | $n_1 - x_1$ | $n_1$ |
| Population 2 | $x_2$ | $n_2 - x_2$ | $n_2$ |
| Total | $x_1 + x_2$ | $n - (x_1 + x_2)$ | $n$ |

Row 1 shows the results of a binomial experiment with $n_1$ trials
Row 2 shows the results of a binomial experiment with $n_2$ trials
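A small sketch of how the table is assembled from two binomial samples, with $n = n_1 + n_2$. The counts (30 of 100 and 45 of 100) are hypothetical, and the function and key names are illustrative.

```python
def two_by_two(x1, n1, x2, n2):
    """Build the 2 x 2 table (with margins) from two binomial samples."""
    n = n1 + n2
    return {
        "Population 1": {"success": x1, "failure": n1 - x1, "total": n1},
        "Population 2": {"success": x2, "failure": n2 - x2, "total": n2},
        "Total": {"success": x1 + x2, "failure": n - (x1 + x2), "total": n},
    }

# Hypothetical counts: 30 successes in 100 trials vs. 45 successes in 100 trials
table = two_by_two(x1=30, n1=100, x2=45, n2=100)

# Sample proportion of successes in each population (each row is one binomial experiment)
p_hat_1 = table["Population 1"]["success"] / table["Population 1"]["total"]
p_hat_2 = table["Population 2"]["success"] / table["Population 2"]["total"]

for row_name, row in table.items():
    print(f"{row_name:<13} {row['success']:>4} {row['failure']:>4} {row['total']:>4}")
print(f"p_hat_1 = {p_hat_1:.2f}, p_hat_2 = {p_hat_2:.2f}")
```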