Chapter 11 Categorical Data Analysis
Categorical Data and the Multinomial Distribution
Properties of the Multinomial Experiment
1. Experiment has n identical trials
2. There are k possible outcomes to each trial, called classes, categories, or cells
3. Probabilities of the k outcomes remain constant from trial to trial
4. Trials are independent
5. Variables of interest are the cell counts, $n_1, n_2, \ldots, n_k$, the number of observations that fall into each of the k classes
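The five properties can be illustrated with a small simulation. This is a minimal sketch using only the Python standard library; the function name `simulate_multinomial`, the class probabilities, and the seed are illustrative assumptions, not part of the slides.

```python
import random
from collections import Counter

def simulate_multinomial(n, probs, seed=None):
    """Run n independent, identical trials with k = len(probs) outcomes.

    Returns the cell counts [n_1, ..., n_k]. The probabilities stay
    constant from trial to trial, as the multinomial experiment requires.
    """
    rng = random.Random(seed)
    classes = range(len(probs))
    draws = rng.choices(classes, weights=probs, k=n)
    counts = Counter(draws)
    return [counts.get(i, 0) for i in classes]

# Hypothetical experiment: 100 trials, 3 classes
cell_counts = simulate_multinomial(100, [0.2, 0.3, 0.5], seed=1)
print(cell_counts, sum(cell_counts))  # the counts always sum to n
```

The cell counts are the only random quantities here; n and the class probabilities are fixed in advance.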
Testing Category Probabilities: One-Way Table
In a multinomial experiment with categorical data from a single qualitative variable, we summarize the data in a one-way table.
Schema for a one-way table for an experiment with k outcomes:

Class    1       2       …    k
Count    $n_1$   $n_2$   …    $n_k$
Testing Category Probabilities: One-Way Table
Hypothesis Testing for a One-Way Table
• Based on the $\chi^2$ statistic, which allows comparison between the observed distribution of counts and an expected distribution of counts across the k classes
• Expected distribution: $E(n_i) = n p_i$, where n is the total number of trials and $p_i$ is the hypothesized probability of being in class i according to $H_0$
• The test statistic is calculated as
  $$\chi^2 = \sum_{i=1}^{k} \frac{[n_i - E(n_i)]^2}{E(n_i)}$$
  and the rejection region is determined by the $\chi^2$ distribution using k − 1 df and the desired significance level $\alpha$
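The one-way test statistic can be computed directly from the observed counts and the hypothesized probabilities. The following is a minimal sketch; the function name `chi_square_one_way` and the observed counts are hypothetical and chosen only for illustration.

```python
def chi_square_one_way(observed, probs):
    """Chi-square statistic for a one-way table.

    observed: cell counts n_1..n_k
    probs:    hypothesized probabilities p_1..p_k under H0
    Returns (statistic, degrees of freedom, expected counts).
    """
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    n = sum(observed)
    expected = [n * p for p in probs]  # E(n_i) = n * p_i
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1, expected

# Hypothetical counts tested against equal probabilities 1/k
observed = [30, 50, 20]
statistic, df, expected = chi_square_one_way(observed, [1 / 3, 1 / 3, 1 / 3])
print(statistic, df)
```

The statistic is then compared with the upper-tail critical value of the chi-square distribution with k − 1 degrees of freedom.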
Testing Category Probabilities: One-Way Table
Hypothesis Testing for a One-Way Table
• The null hypothesis is often formulated as no difference among the classes, $H_0: p_1 = p_2 = p_3 = \ldots = p_k = 1/k$, but it can also specify unequal probabilities
• The alternative hypothesis is $H_a$: at least one of the multinomial probabilities does not equal its hypothesized value
Testing Category Probabilities: One-Way Table
One-Way Tables: an example
$H_0$: $p_{\text{None}} = .10$, $p_{\text{Standard}} = .65$, $p_{\text{Merit}} = .25$
$H_a$: At least 2 proportions differ from the proposed plan
Expected counts: $E(n_i) = \text{Total} \times p_i$
Rejection region with $\alpha = .01$, df = k − 1 = 2: $\chi^2 > 9.21034$
Since the test statistic falls in the rejection region, we reject $H_0$
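For df = 2 the upper-tail probability of the chi-square distribution has the closed form exp(−x/2), so the critical value quoted in this example can be checked with the standard library alone. This is a sketch under that assumption (it does not generalize to other df), and the total of 200 is a hypothetical sample size, not the one from the example's data.

```python
import math

def chi2_critical_df2(alpha):
    """Upper-tail critical value of the chi-square distribution with 2 df.

    With 2 df, P(X > x) = exp(-x / 2), so the critical value is -2 * ln(alpha).
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    return -2.0 * math.log(alpha)

# Hypothesized probabilities from H0
hypothesized = {"None": 0.10, "Standard": 0.65, "Merit": 0.25}
total = 200  # hypothetical total, for illustration only

# Expected count in each class = Total x p
expected = {name: total * p for name, p in hypothesized.items()}

critical = chi2_critical_df2(0.01)
print(round(critical, 5))  # 9.21034
print(expected)
```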
Testing Category Probabilities: One-Way Table
Conditions Required for a Valid $\chi^2$ Test
• A multinomial experiment has been conducted
• The sample size is large, with $E(n_i)$ at least 5 for every cell
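The sample-size condition can be checked before running the test. A minimal sketch, assuming a hypothetical helper `cells_below_minimum` and a hypothetical sample size of 40:

```python
def cells_below_minimum(expected, minimum=5):
    """Return the indices of cells whose expected count is below `minimum`."""
    return [i for i, e in enumerate(expected) if e < minimum]

n = 40  # hypothetical number of trials
probs = [0.10, 0.65, 0.25]  # hypothesized probabilities under H0
expected = [n * p for p in probs]  # E(n_i) = n * p_i

small = cells_below_minimum(expected)
if small:
    print("Expected count below 5 in cells:", small)
else:
    print("All expected counts are at least 5")
```

Here the first cell has an expected count of 4, so the large-sample condition fails for this hypothetical n.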
Testing Category Probabilities: Two-Way (Contingency) Table
Used when classifying with two qualitative variables
General r x c Contingency Table

                        Column
                 1         2         …    c         Row Totals
Row      1       $n_{11}$  $n_{12}$  …    $n_{1c}$  $R_1$
         2       $n_{21}$  $n_{22}$  …    $n_{2c}$  $R_2$
         …       …         …         …    …         …
         r       $n_{r1}$  $n_{r2}$  …    $n_{rc}$  $R_r$
Column Totals    $C_1$     $C_2$     …    $C_c$     n

$H_0$: The two classifications are independent
$H_a$: The two classifications are dependent
Test statistic:
$$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}}, \quad \text{where } \hat{E}_{ij} = \frac{R_i C_j}{n}$$
Rejection region: $\chi^2 > \chi^2_\alpha$, where $\chi^2_\alpha$ has (r − 1)(c − 1) df
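The expected counts and the test statistic for an r x c table follow mechanically from the row and column totals. A minimal sketch; the function name `chi_square_two_way` and the 2 x 2 counts are hypothetical.

```python
def chi_square_two_way(table):
    """Chi-square test of independence for an r x c table of counts.

    table: list of rows, each a list of observed counts n_ij.
    Returns (statistic, degrees of freedom, expected counts).
    """
    row_totals = [sum(row) for row in table]        # R_i
    col_totals = [sum(col) for col in zip(*table)]  # C_j
    n = sum(row_totals)
    # Estimated expected count under independence: R_i * C_j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected

# Hypothetical 2 x 2 table
table = [[20, 30],
         [30, 20]]
statistic, df, expected = chi_square_two_way(table)
print(statistic, df, expected)
```

The statistic is compared with the upper-tail critical value of the chi-square distribution with (r − 1)(c − 1) degrees of freedom.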
Testing Category Probabilities: Two-Way (Contingency) Table
Conditions Required for a Valid $\chi^2$ Test
• The n observed counts are a random sample from the population of interest
• The sample size is large, with the expected count $E(n_{ij})$ at least 5 for every cell
Testing Category Probabilities: Two-Way (Contingency) Table Sample Statistical package output
A Word of Caution about Chi-Square Tests
• When an expected cell count is less than 5, the $\chi^2$ probability distribution should not be used
• If $H_0$ is not rejected, do not conclude that the classifications are independent; accepting $H_0$ risks a Type II error
• Do not infer causality when $H_0$ is rejected. Contingency table analysis determines statistical dependence only.