Statistical Methods: Lecture 10
Dennis Dobler, Vrije Universiteit Amsterdam
December 6, 2017
Lecture Overview

Goodness-of-fit
Test of independence
Test of homogeneity
Fisher's Exact Test

The test of homogeneity and Fisher's Exact Test are mentioned in Section 10.3 of the book, but no procedure is given there. These slides give the procedures, which are required for the assignments and the exam.
10.2 Goodness-of-Fit

Recall frequency distribution: counts of data in different categories, usually presented in table form.

Idea of goodness-of-fit: we would like to know whether an observed frequency distribution fits some claimed distribution.

Exercise 19 (10.2): M&M Candies

Mars, Inc. claims that the colours of M&M's are distributed according to the following percentages:

Colour    Percentage
Red       13%
Orange    20%
Yellow    14%
Brown     13%
Blue      24%
Green     16%

We collected a random sample of n = 100 M&M's. The observed frequency distribution is as follows:

Colour    Frequency
Red       13
Orange    25
Yellow    8
Brown     8
Blue      27
Green     19

We would like to test whether the colour distribution is as claimed, with significance level α = 5%. How do we decide whether the observed frequencies do not match the claimed distribution?
Exercise 19 (10.2): M&M Candies

Recall that the claimed distribution is as follows:

Colour        Red   Orange   Yellow   Brown   Blue   Green
Percentage    13%   20%      14%      13%     24%    16%

Since n = 100, we expect 100 · 0.13 = 13 red M&M's in the sample if the colour distribution is as claimed. Similarly for the other colours, so we obtain the following expected frequencies:

Colour                Red   Orange   Yellow   Brown   Blue   Green
Expected frequency    13    20       14       13      24     16

NB: in general n ≠ 100, so this calculation is then slightly more complicated.
Expected frequencies

Suppose there are k different categories and a random sample of size n is taken. Let $p_1$ be the claimed probability that a subject falls in category 1, and similarly for the probabilities $p_2, \ldots, p_k$.

$H_0$: frequency counts agree with the claimed distribution;
$H_a$: frequency counts do not agree with the claimed distribution.

The expected frequency $E_i$ is the expected number of occurrences of category i in the sample under the assumption that $H_0$ is true. It is computed by

$$E_i = n \cdot p_i.$$

Note: the $E_i$ do not have to be integers.
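As a minimal sketch of the rule E_i = n · p_i, the following Python snippet computes expected frequencies for the claimed M&M percentages of Exercise 19. The function name `expected_frequencies` and the second sample size n = 250 are illustrative assumptions, not part of the lecture; percentages are kept as integers to avoid floating-point rounding.

```python
def expected_frequencies(n, percentages):
    """Return the expected count E_i = n * p_i for each category.

    `percentages` maps category name -> claimed percentage (0 to 100).
    """
    total = sum(percentages.values())
    if total != 100:
        raise ValueError(f"percentages must sum to 100, got {total}")
    return {cat: n * pct / 100 for cat, pct in percentages.items()}


# Claimed colour distribution from Exercise 19 (10.2).
claimed = {"Red": 13, "Orange": 20, "Yellow": 14,
           "Brown": 13, "Blue": 24, "Green": 16}

e_100 = expected_frequencies(100, claimed)  # sample size used in the exercise
e_250 = expected_frequencies(250, claimed)  # hypothetical n, gives non-integer E_i
```

With the hypothetical n = 250 some expected frequencies, such as the one for red, are not whole numbers, which is allowed.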
Exercise 19 (10.2): M&M Candies

If the distribution is as claimed by Mars, Inc., we would expect the following frequencies:

Colour                Red   Orange   Yellow   Brown   Blue   Green
Expected frequency    13    20       14       13      24     16

Recall that we observed the following frequencies:

Colour                Red   Orange   Yellow   Brown   Blue   Green
Observed frequency    13    25       8        8       27     19

Idea: take as test statistic the sum of squared and scaled differences

$$\sum \frac{(\text{Observed frequency} - \text{Expected frequency})^2}{\text{Expected frequency}}.$$

If certain requirements are met, this test statistic has, under $H_0$, approximately a known theoretical distribution.
Goodness-of-fit test

Suppose there are k different categories and a random sample of size n is taken.

$H_0$: frequency counts agree with the claimed distribution, $p_1 = \langle\text{value}\rangle, \ldots$;
$H_a$: frequency counts do not agree with the claimed distribution.

Let $O_i$ be the observed frequency count of category i. The expected frequency $E_i$ is computed by $E_i = n \cdot p_i$.

Requirements: all $E_i \geq 5$.

If the requirements are met, then the test statistic

$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

has approximately a chi-square distribution with $k - 1$ degrees of freedom under $H_0$.

$H_0$ is rejected for large values of the observed value $\chi^2$:
- Critical value method: reject $H_0$ if $\chi^2 > \chi^2_{k-1,\alpha}$, where $\chi^2_{k-1,\alpha}$ can be found in Table 4 of the Appendix.
- P-value method: reject $H_0$ if $P(X^2 \geq \chi^2) < \alpha$. Use R for this.
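The procedure above can be sketched in a few lines of Python (the slides themselves point to R for the P-value). The function name `chi_square_gof` is a hypothetical helper, and the critical value 11.07 for 5 degrees of freedom at α = 0.05 is taken from a chi-square table rather than computed. The data are the M&M counts and claimed proportions from Exercise 19.

```python
def chi_square_gof(observed, probabilities):
    """Return (statistic, degrees_of_freedom) for a goodness-of-fit test.

    observed:      observed counts O_i, one per category
    probabilities: claimed probabilities p_i under H0, same order
    """
    if len(observed) != len(probabilities):
        raise ValueError("observed and probabilities must have equal length")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("claimed probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in probabilities]  # E_i = n * p_i
    if min(expected) < 5:
        raise ValueError("requirement violated: all E_i must be at least 5")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


# Red, Orange, Yellow, Brown, Blue, Green
observed = [13, 25, 8, 8, 27, 19]
claimed = [0.13, 0.20, 0.14, 0.13, 0.24, 0.16]

statistic, df = chi_square_gof(observed, claimed)
critical_value = 11.07  # table value for df = 5, alpha = 0.05
reject_h0 = statistic > critical_value  # right-tailed critical value method
```

Raising an error when some E_i is below 5 is one possible design choice; a warning would serve equally well if the caller wants to proceed anyway.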
Intermezzo: Chi-square distribution

A random variable with a chi-square distribution with k degrees of freedom is a continuous random variable whose distribution is asymmetric: it only takes positive values and is right-skewed. Furthermore, different degrees of freedom yield different distributions.

[Figure: chi-square density curves for df = 3, 5, 10 and 20; horizontal axis from 0 to 40, vertical axis "density" from 0.00 to 0.25.]
Exercise 19 (10.2): M&M Candies

Recall that we expected and observed the following frequencies:

Colour                Red   Orange   Yellow   Brown   Blue   Green
Observed frequency    13    25       8        8       27     19
Expected frequency    13    20       14       13      24     16

Since all $E_i$ are larger than 5, the requirements are met, so the test statistic

$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

has, under $H_0$, approximately a chi-square distribution with $k - 1 = 5$ degrees of freedom.

The observed value of the test statistic is

$$\chi^2 = \sum_{i=1}^{6} \frac{(o_i - E_i)^2}{E_i} = \frac{(13-13)^2}{13} + \frac{(25-20)^2}{20} + \frac{(8-14)^2}{14} + \frac{(8-13)^2}{13} + \frac{(27-24)^2}{24} + \frac{(19-16)^2}{16} \approx 6.68.$$

The critical value is $\chi^2_{k-1,\alpha} = \chi^2_{5,0.05} = 11.07$. Since $\chi^2 = 6.68 < 11.07 = \chi^2_{5,0.05}$, we fail to reject $H_0$. There is not sufficient evidence to reject the claim by Mars, Inc. about the colour distribution.
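For the P-value method the lecture refers to R. As a hedged alternative sketch in plain Python, the right-tail probability of a chi-square distribution with integer degrees of freedom can be written in closed form using only the standard library. The helper name `chi_square_sf` is an assumption made for illustration; in practice a statistics package would normally be used instead.

```python
import math


def chi_square_sf(x, df):
    """P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2))
    #         + sqrt(2x/pi) * exp(-x/2) * sum_j x^j / (1*3*...*(2j+1))
    term, total = 1.0, 0.0
    for j in range((df - 1) // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    tail = math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


# M&M data from Exercise 19 (10.2).
observed = [13, 25, 8, 8, 27, 19]
expected = [13, 20, 14, 13, 24, 16]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

alpha = 0.05
p_value = chi_square_sf(statistic, df=len(observed) - 1)
reject_h0 = p_value < alpha
```

For these data the P-value comes out at roughly 0.25, well above α = 0.05, so the P-value method leads to the same conclusion as the critical value method: H0 is not rejected.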
Degrees of freedom and critical region

Note that the number of degrees of freedom of the goodness-of-fit test is determined by the number of categories, not by the sample size!

Furthermore, note that the alternative hypothesis of a goodness-of-fit test is undirected:

$H_a$: frequency counts do not agree with the claimed distribution.

But $H_0$ is only rejected for large values of the observed value $\chi^2$ of the test statistic. So the alternative hypothesis is undirected, but the test is right-tailed!
10.3 Contingency Tables

We are interested in inference about two categorical variables, in particular whether there is a relationship between the two variables. For instance, are gender and left-handedness independent? Suppose these two variables are measured in a sample. How do we present the results?

Contingency table

A contingency table (or two-way table) is a table consisting of frequency counts of categorical data corresponding to two different variables.

Row variable: r categories. Column variable: c categories. So the table consists of r × c cells/entries.

$O_{ij}$: cell (i, j) of the given contingency table, which is the number of subjects in the i-th category of the row variable and the j-th category of the column variable.

Exercise 12 (10.3): Lefties

          Left-handed   Right-handed   Total
Male      23            217            240
Female    65            455            520
Total     88            672            760
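A contingency table maps naturally onto a list of rows. The sketch below stores the observed counts O_ij of Exercise 12 and recomputes the row totals, column totals and grand total; the variable names are illustrative assumptions rather than notation from the lecture.

```python
# Observed counts O_ij from Exercise 12 (10.3).
# Rows: gender; columns: handedness.
row_labels = ["Male", "Female"]
col_labels = ["Left-handed", "Right-handed"]
observed = [
    [23, 217],  # Male
    [65, 455],  # Female
]

r, c = len(observed), len(observed[0])             # r x c cells
row_totals = [sum(row) for row in observed]        # totals per gender
col_totals = [sum(col) for col in zip(*observed)]  # totals per handedness
grand_total = sum(row_totals)                      # sample size n
```

The recomputed margins can be checked against the "Total" row and column of the table in the exercise.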
Exercise 12 (10.3): Lefties

We would like to test the claim that gender and left-handedness are independent:

$H_0$: gender and left-handedness are independent;
$H_a$: gender and left-handedness are dependent.

We take α = 5%. A sample of size 760 was taken:

          Left-handed   Right-handed   Total
Male      23            217            240
Female    65            455            520
Total     88            672            760

If $H_0$ is true, which contingency table would we expect?

          Left-handed   Right-handed   Total
Male      ?             ?              240
Female    ?             ?              520
Total     88            672            760