Statistics 101 (Nicole Dalzell), U4 - L4: ANOVA, June 3, 2015

Announcements

- If I still have your midterm, pick it up at the end of class.
- Lab 5 today
- Office hours tomorrow
- Project changes

Classy vocabulary

The GSS gives the following 10-question vocabulary test:

A SPACE (school, noon, captain, room, board, don’t know)
B BROADEN (efface, make level, elapse, embroider, widen, don’t know)
C EMANATE (populate, free, prominent, rival, come, don’t know)
D EDIBLE (auspicious, eligible, fit to eat, sagacious, able to speak, don’t know)
E ANIMOSITY (hatred, animation, disobedience, diversity, friendship, don’t know)
F PACT (puissance, remonstrance, agreement, skillet, pressure, don’t know)
G CLOISTERED (miniature, bunched, arched, malady, secluded, don’t know)
H CAPRICE (value, a star, grimace, whim, inducement, don’t know)
I ACCUSTOM (disappoint, customary, encounter, get used to, business, don’t know)
J ALLUSION (reference, dream, eulogy, illusion, aria, don’t know)

[Figure: histogram of vocabulary scores, 0 to 10]

The GSS also asks the following question: “If you were asked to use one of four names for your social class, which would you say you belong in: the lower class, the working class, the middle class, or the upper class?”

[Figure: bar plot of (self reported) class: LOWER CLASS, WORKING CLASS, MIDDLE CLASS, UPPER CLASS]
Data

       wordsum   class
1      6         MIDDLE CLASS
2      9         WORKING CLASS
3      6         WORKING CLASS
4      5         WORKING CLASS
5      6         WORKING CLASS
6      6         WORKING CLASS
7      8         MIDDLE CLASS
8      10        WORKING CLASS
9      8         WORKING CLASS
10     9         UPPER CLASS
...
795    9         MIDDLE CLASS

Exploratory analysis

[Figure: side-by-side boxplots of wordsum (0 to 10) for LOWER CLASS, WORKING CLASS, MIDDLE CLASS, UPPER CLASS]

                 n     mean   sd
lower class      41    5.07   2.24
working class    407   5.75   1.87
middle class     331   6.76   1.89
upper class      16    6.19   2.34
overall          795   6.14   1.98

ANOVA and the F test

Step 2: Hypotheses

Is there a difference between the average vocabulary scores of Americans from different (self reported) classes?

$H_0$: The mean outcome is the same across all categories,
$\mu_{LC} = \mu_{WC} = \mu_{MC} = \mu_{UC}$,
where $\mu_i$ represents the mean of the outcome for observations in category $i$.

$H_A$: At least one pair of means is different from each other.

Participation question

Which of the following plots shows groups with means that are most and least likely to be significantly different from each other?

[Figure: plots I, II, and III]

(a) most: I, least: II
(b) most: II, least: III
(c) most: I, least: III
(d) most: III, least: II
(e) most: II, least: I
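The n, mean and sd columns of the Exploratory analysis table are per-class summaries of wordsum. A minimal Python sketch of that grouping step follows (Python is our choice here; the course itself uses R). It is applied only to the ten rows printed on the Data slide, not to the full 795-row data set, so its numbers will not match the table; the function name `summarize` is hypothetical.

```python
import statistics
from collections import defaultdict

# First ten (wordsum, class) rows shown on the Data slide; the full data set has 795 rows.
rows = [
    (6, "MIDDLE CLASS"), (9, "WORKING CLASS"), (6, "WORKING CLASS"),
    (5, "WORKING CLASS"), (6, "WORKING CLASS"), (6, "WORKING CLASS"),
    (8, "MIDDLE CLASS"), (10, "WORKING CLASS"), (8, "WORKING CLASS"),
    (9, "UPPER CLASS"),
]


def summarize(rows):
    """Return {class: (n, mean, sd)} for (score, class) pairs.

    The sample SD needs at least two observations, so sd is None for smaller groups.
    """
    by_class = defaultdict(list)
    for score, cls in rows:
        by_class[cls].append(score)
    summary = {}
    for cls, scores in by_class.items():
        sd = statistics.stdev(scores) if len(scores) > 1 else None
        summary[cls] = (len(scores), statistics.mean(scores), sd)
    return summary


for cls, (n, mean, sd) in summarize(rows).items():
    print(cls, n, round(mean, 2), None if sd is None else round(sd, 2))
```

With only one UPPER CLASS row among these ten, that group has no sample SD, which is one reason the slides summarize the full data set.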
More generally...

$H_0$: The mean outcome is the same across all categories,
$\mu_1 = \mu_2 = \cdots = \mu_k$,
where $\mu_i$ represents the mean of the outcome for observations in category $i$.

$H_A$: At least one pair of means is different from each other.

z/t test vs. ANOVA: Purpose

z/t test: Compare means from two groups to see whether they are so far apart that the observed difference cannot reasonably be attributed to sampling variability.
$H_0: \mu_1 = \mu_2$

ANOVA: Compare the means from two or more groups to see whether they are so far apart that the observed differences cannot all reasonably be attributed to sampling variability.
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$

How do we compare multiple groups?

$SST = SSG + SSE$

Sum of squares total, SST
Measures the total variability in the data:
$SST = \sum_{i=1}^{n} (x_i - \bar{x})^2$
where $x_i$ represents the value of the response variable for each observation in the dataset.

Sum of squares between groups, SSG
Measures the variability between groups, i.e., how the group means compare to the grand mean:
$SSG = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2$
where $n_j$ is each group's size, $\bar{x}_j$ is each group's average, and $\bar{x}$ is the overall (grand) mean.
[Explained variability: deviation of group mean from overall mean, weighted by sample size.]

The remainder, the sum of squares error $SSE = SST - SSG$, is the unexplained variability within groups.
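To see the decomposition SST = SSG + SSE numerically, here is a short Python sketch that computes all three sums of squares directly from their definitions. The function name `sums_of_squares` and the three toy groups are hypothetical illustrations, not course data; SSE is computed as the sum of squared deviations of each observation from its own group mean.

```python
def sums_of_squares(groups):
    """Return (SST, SSG, SSE) for a list of groups, each a list of response values."""
    values = [x for group in groups for x in group]
    grand_mean = sum(values) / len(values)
    group_means = [sum(group) / len(group) for group in groups]

    # SST: every observation's squared deviation from the grand mean
    sst = sum((x - grand_mean) ** 2 for x in values)
    # SSG: each group mean's squared deviation from the grand mean, weighted by group size
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # SSE: every observation's squared deviation from its own group mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return sst, ssg, sse


# Hypothetical toy groups, chosen so the arithmetic is easy to check by hand
sst, ssg, sse = sums_of_squares([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(sst, ssg, sse)  # 60.0 54.0 6.0
```

Here SSG (54) accounts for most of SST (60) because the toy group means are far apart relative to the spread inside each group.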
Building the Test Statistic

$SST = SSG + SSE$

If a group mean is very different from another, is SSG large or small?

Since SST, the total sum of squares, is constant, what happens to SSE when SSG is large?

So, when SSG is large, what happens to the ratio $SSG / SSE$?

Introducing the Mean Square

$\frac{SSG}{SSE}$

This ratio is large when putting the data into their groups seems to help us explain some of the variability in our data, i.e., when at least one group mean is different enough from the rest that we need to take notice.

So, can we use this as our test statistic? Not quite.

Why not? Splitting the data into groups means that the amount of data we are using to estimate each group mean goes down. We have to account for this somehow in the ratio in order to use it as a test statistic.

Mean Square

MSG is the mean square between groups:
$MSG = \frac{SSG}{df_G} = \frac{SSG}{k - 1}$
where $k$ is the number of groups.

MSE is the mean square error, the variability in the residuals:
$MSE = \frac{SSE}{df_E} = \frac{SSE}{n - k}$
where $n$ is the number of observations.

Test statistic

$F = \frac{\text{variability between groups}}{\text{variability within groups}} = \frac{MSG}{MSE}$
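Because $SSE = \sum_{j=1}^{k} (n_j - 1) s_j^2$ (this follows from the definition of each group's sample variance $s_j^2$), MSG, MSE and F can be approximated from the Exploratory analysis summary table alone. The sketch below does this in Python; `anova_from_summaries` is a hypothetical helper name. Because the table rounds means and SDs to two decimals, the result should land near, but not exactly on, the $F \approx 21.735$ in the slides, which was presumably computed from the unrounded data.

```python
# Group summaries from the Exploratory analysis table: (n, mean, sd)
groups = {
    "lower class": (41, 5.07, 2.24),
    "working class": (407, 5.75, 1.87),
    "middle class": (331, 6.76, 1.89),
    "upper class": (16, 6.19, 2.34),
}


def anova_from_summaries(summaries):
    """Return (df_G, df_E, MSG, MSE, F) from a list of per-group (n, mean, sd) tuples."""
    k = len(summaries)
    n = sum(n_j for n_j, _, _ in summaries)
    grand_mean = sum(n_j * m_j for n_j, m_j, _ in summaries) / n
    # SSG: each group mean's squared deviation from the grand mean, weighted by group size
    ssg = sum(n_j * (m_j - grand_mean) ** 2 for n_j, m_j, _ in summaries)
    # SSE: within-group variability recovered from each group's sample SD
    sse = sum((n_j - 1) * s_j ** 2 for n_j, _, s_j in summaries)
    df_g, df_e = k - 1, n - k
    msg, mse = ssg / df_g, sse / df_e
    return df_g, df_e, msg, mse, msg / mse


df_g, df_e, msg, mse, f_stat = anova_from_summaries(list(groups.values()))
print(f"df_G = {df_g}, df_E = {df_e}, MSG = {msg:.3f}, MSE = {mse:.3f}, F = {f_stat:.3f}")
```

The degrees of freedom come out as $df_G = 4 - 1 = 3$ and $df_E = 795 - 4 = 791$, the values needed for the p-value in Step 5.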
Step 4: Picture of our Null Universe

$F = \frac{\text{variability between groups}}{\text{variability within groups}} = \frac{MSG}{MSE} = \frac{78.855}{3.628} \approx 21.735$

In order to reject $H_0$, we need a small p-value, which requires a large F statistic. In order to obtain a large F statistic, the variability between the sample means needs to be greater than the variability within the groups.

Test statistic

Can we see this in the boxplot?

$F = \frac{\text{variability between groups}}{\text{variability within groups}}$

[Figure: side-by-side boxplots of vocabulary scores (0 to 10) for LOWER CLASS, WORKING CLASS, MIDDLE CLASS, UPPER CLASS]

Does there appear to be a lot of variability within self-reported classes? How about between, or across, the classes?

Step 5: Compute our P-value

$P(F \geq 21.735) < 0.0001$

Note that you will need access to R to calculate the p-value. You can use the pf function, whose arguments are the observed F statistic, the degrees of freedom between groups ($df_G = k - 1$) and the degrees of freedom for error ($df_E = n - k$). For the vocabulary data, $k = 4$ and $n = 795$:

> pf(21.735, df1 = 3, df2 = 791, lower.tail = FALSE)
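For readers without R, here is a hedged pure-Python sketch of the same upper-tail probability that pf reports with lower.tail = FALSE. It relies on the standard identity $P(F \geq f) = I_{d_2/(d_2 + d_1 f)}(d_2/2, d_1/2)$, where $I$ is the regularized incomplete beta function, evaluated here by a continued fraction (modified Lentz's method, the approach popularized by Numerical Recipes). This is a swapped-in technique, not the course's method; the helper names are hypothetical, and for real work a tested routine such as R's pf is the safer choice.

```python
import math


def _beta_cf(a: float, b: float, x: float, max_iter: int = 1000, eps: float = 1e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz's method)."""
    tiny = 1e-300  # guards against division by zero inside Lentz's method
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even-numbered coefficient d_{2m}
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd-numbered coefficient d_{2m+1}
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("continued fraction did not converge")


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # log of x^a (1 - x)^b / B(a, b), computed on the log scale to avoid overflow
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # The fraction converges fast for x < (a + 1) / (a + b + 2);
    # otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(f: float, df1: float, df2: float) -> float:
    """P(F >= f) for F ~ F(df1, df2): the tail area R's pf(f, df1, df2, lower.tail = FALSE) gives."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


# Slide values: F = 21.735 with df_G = k - 1 = 3 and df_E = n - k = 791
p_value = f_upper_tail(21.735, 3, 791)
print(f"P(F >= 21.735) = {p_value:.3g}")
```

With these inputs the result is far below 0.0001, consistent with the slide's $P(F \geq 21.735) < 0.0001$. A quick sanity check: with $df_1 = 2$ the tail has the closed form $(1 + 2f/df_2)^{-df_2/2}$, which the function should reproduce.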