Topic 9 - ANOVA
• Background
• ANOVA
Comparing several means (some situations)
• Does the average number of words per sentence in advertisements differ across magazine types?
• Does the expected survival time vary for different types of cancer among patients treated with a specific drug?
• Does the mean response time differ among three different types of circuits?
• Is there a difference in average carry distance for baseballs stored at a variety of humidity levels?
• Is there a statistical difference in home run hitting ability among, say, Babe Ruth, Roger Maris, and the modern-day Mark McGwire or Sammy Sosa?
Comparing several means
• Suppose that instead of comparing two means we want to test for the equality of several means:
  $H_0: \mu_1 = \mu_2 = \dots = \mu_I$
  $H_A$: at least two of the $\mu_i$ are different
• Each of the groups we are comparing is called a treatment (or factor level).
• We make our decision based on samples from each of the $I$ treatment groups.
• Let $X_{i,j}$ represent the $j$-th observation from the $i$-th treatment group, with $j = 1, \dots, n_i$.
• We assume each sample comes from a Normal population with common variance $\sigma^2$.
ANOVA – Analysis of Variance
• We partition the total variability of the data into treatment (in our control, good) and error (out of our control, bad) components. Here $\bar X_{i\cdot}$ is the mean of treatment group $i$ and $\bar X_{\cdot\cdot}$ is the grand mean of all observations.
$$SS_{tot} = \sum_{i=1}^{I}\sum_{j=1}^{n_i} (X_{i,j} - \bar X_{\cdot\cdot})^2, \qquad DF_{tot} = \sum_{i=1}^{I} n_i - 1$$
$$SS_{trt} = \sum_{i=1}^{I} n_i (\bar X_{i\cdot} - \bar X_{\cdot\cdot})^2, \qquad DF_{trt} = I - 1$$
$$SS_{err} = \sum_{i=1}^{I}\sum_{j=1}^{n_i} (X_{i,j} - \bar X_{i\cdot})^2, \qquad DF_{err} = \sum_{i=1}^{I} n_i - I$$
$$SS_{tot} = SS_{trt} + SS_{err}, \qquad DF_{tot} = DF_{trt} + DF_{err}$$
• Ideally, $SS_{trt}$ would equal $SS_{tot}$. That would mean there is no random error ($SS_{err} = 0$) and 100% of the variation in the data is explained by the treatments. This perfect result is rarely, if ever, the case.
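The identity $SS_{tot} = SS_{trt} + SS_{err}$ can be checked by adding and subtracting the treatment mean inside the total sum of squares. The sketch below assumes the notation $\bar X_{i\cdot}$ for the mean of treatment group $i$ and $\bar X_{\cdot\cdot}$ for the grand mean; these symbols are a labeling choice for this derivation.

```latex
\begin{aligned}
SS_{tot}
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i}
   \big[(X_{i,j}-\bar X_{i\cdot}) + (\bar X_{i\cdot}-\bar X_{\cdot\cdot})\big]^2 \\
&= \sum_{i=1}^{I}\sum_{j=1}^{n_i} (X_{i,j}-\bar X_{i\cdot})^2
 + \sum_{i=1}^{I} n_i (\bar X_{i\cdot}-\bar X_{\cdot\cdot})^2
 + 2\sum_{i=1}^{I} (\bar X_{i\cdot}-\bar X_{\cdot\cdot})
     \sum_{j=1}^{n_i} (X_{i,j}-\bar X_{i\cdot}) \\
&= SS_{err} + SS_{trt} + 0
\end{aligned}
```

The cross term vanishes because, within each treatment group, the deviations from the group's own mean sum to zero: $\sum_{j=1}^{n_i} (X_{i,j}-\bar X_{i\cdot}) = 0$.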
ANOVA - Mean squares
• $MS_{trt} = SS_{trt}/DF_{trt}$, $MS_{err} = SS_{err}/DF_{err}$, $F = MS_{trt}/MS_{err}$
• If $H_0$ is true (all the means are the same, or really close to being the same), then F should be close to 1 or smaller, because $MS_{trt}$ and $MS_{err}$ then both estimate the same common variance.
 – The distribution means should be visually close and the distributions should overlap heavily. From a visual standpoint, it would be quite difficult to tell whether a specific value of X came from distribution 1, 2, 3, or 4.
• If $H_0$ is false (at least two of the means are different), then F should be much larger than 1.
 – The distribution means should be separated and the distributions should overlap very little. It should be relatively easy to tell whether a specific value of X came from distribution 1, 2, 3, or 4.
• The less the distributions overlap, the higher the F value and the more persuasive your result.
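The link between overlap and the size of F can be illustrated with a small sketch. The two data sets below are made up for illustration only (they are not from the course data), and `f_statistic` is a hypothetical helper name.

```python
from statistics import mean


def f_statistic(groups):
    """Return F = MS_trt / MS_err for a list of samples (one list per treatment)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-treatment variability, weighted by group size.
    ss_trt = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-treatment variability around each group's own mean.
    ss_err = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_trt = len(groups) - 1
    df_err = len(all_values) - len(groups)
    return (ss_trt / df_trt) / (ss_err / df_err)


# Made-up illustration data: heavy overlap versus clear separation.
overlapping = [[4, 5, 6], [5, 6, 7], [4, 6, 8]]
separated = [[4, 5, 6], [14, 15, 16], [24, 25, 26]]

f_overlap = f_statistic(overlapping)
f_separated = f_statistic(separated)
print(f"overlapping groups: F = {f_overlap:.2f}")
print(f"separated groups:   F = {f_separated:.2f}")
```

With these numbers the overlapping groups give F = 0.5 and the separated groups give F = 300, which matches the qualitative picture on the slide.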
ANOVA – Decision rule
• Reject $H_0$ if $F > F_{\alpha,\,DF_{trt},\,DF_{err}}$, the upper-$\alpha$ critical value of the F distribution with $DF_{trt}$ and $DF_{err}$ degrees of freedom.
• Demonstration of F calculator.
• Note: Since the F test statistic is the ratio of $MS_{trt}$ to $MS_{err}$, larger values give stronger evidence against $H_0$. In that respect a larger F statistic is like a larger (absolute) Z or T test statistic: it corresponds to a smaller p-value.
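In place of an F calculator, the decision rule can be sketched in Python. This assumes SciPy is available (it is not part of the course materials); `anova_decision` is a hypothetical helper name. The inputs are the F statistic and degrees of freedom from the worked example's ANOVA table in these notes (F = 9.911841 with 4 and 19 degrees of freedom).

```python
from scipy.stats import f


def anova_decision(f_stat, df_trt, df_err, alpha=0.05):
    """Return (critical value, p-value, reject H0?) for a one-way ANOVA F test."""
    # Upper-alpha critical value of the F distribution.
    critical = float(f.ppf(1 - alpha, df_trt, df_err))
    # Upper-tail probability of observing an F at least this large under H0.
    p_value = float(f.sf(f_stat, df_trt, df_err))
    return critical, p_value, bool(f_stat > critical)


critical, p_value, reject = anova_decision(9.911841, 4, 19)
print(f"critical value = {critical:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Comparing F with the critical value and comparing the p-value with alpha are two views of the same rule, so they always lead to the same decision.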
Example Calc of SStrt+SSerr=SStot (1)
TRT         OBS1  OBS2  OBS3  OBS4  OBS5  AVG
1           10    11    11    12          11.00
2           10    13    13    14    14    12.80
3           11    11    11    12    12    11.40
4           14    15    15    15    11    14.00
5           10    10    9     9     10    9.60
Grand mean                                11.79
Grand mean is the average of all values in the dataset = 11.79
SStrt is the summation of the squared differences between the treatment means and the grand mean, weighted by the number of observations for each treatment.
SStrt = (4(11 - 11.79)^2)+(5(12.8 - 11.79)^2)+(5(11.4 - 11.79)^2)+(5(14 - 11.79)^2)+(5(9.6 - 11.79)^2) = 56.7584
SSerr is the summation of the squared differences between the individual observations and their respective treatment means.
SSerr = (10 - 11)^2+(11 - 11)^2+(11 - 11)^2+(12 - 11)^2+(10 - 12.8)^2+…+(10 - 9.6)^2 = 27.2
SStot is the summation of the squared differences between the individual observations and the grand mean.
SStot = (10 - 11.79)^2+(11 - 11.79)^2+…+(9 - 11.79)^2+(10 - 11.79)^2 = 83.9584
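The hand calculation can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the observations are the ones in the table on this slide, and the variable names are illustrative. It uses the unrounded grand mean, so the results differ from the hand-calculated values in the fourth decimal place.

```python
from statistics import mean

# Observations from the slide, keyed by treatment number.
data = {
    1: [10, 11, 11, 12],
    2: [10, 13, 13, 14, 14],
    3: [11, 11, 11, 12, 12],
    4: [14, 15, 15, 15, 11],
    5: [10, 10, 9, 9, 10],
}

all_obs = [x for obs in data.values() for x in obs]
grand_mean = mean(all_obs)

# Treatment, error, and total sums of squares.
ss_trt = sum(len(obs) * (mean(obs) - grand_mean) ** 2 for obs in data.values())
ss_err = sum((x - mean(obs)) ** 2 for obs in data.values() for x in obs)
ss_tot = sum((x - grand_mean) ** 2 for x in all_obs)

df_trt = len(data) - 1
df_err = len(all_obs) - len(data)
f_stat = (ss_trt / df_trt) / (ss_err / df_err)

print(f"grand mean = {grand_mean:.4f}")
print(f"SStrt = {ss_trt:.5f}, SSerr = {ss_err:.5f}, SStot = {ss_tot:.5f}")
print(f"DFtrt = {df_trt}, DFerr = {df_err}, F = {f_stat:.6f}")
```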
Example Calc of SStrt+SSerr=SStot (2)
Hand calculation: SStrt = 56.7584, SSerr = 27.2, SStot = 83.9584. These differ from the software output below in the fourth decimal place because the hand calculation uses the grand mean rounded to 11.79.
Analysis of Variance results: Data stored in separate columns.
Column means
Column  n  Mean  Std. Error
Trt1    4  11    0.408248
Trt2    5  12.8  0.734847
Trt3    5  11.4  0.244949
Trt4    5  14    0.774597
Trt5    5  9.6   0.244949
ANOVA table
Source      df  SS        MS        F-Stat    P-value
Treatments  4   56.75834  14.18958  9.911841  0.0002
Error       19  27.2      1.431579
Total       23  83.95834
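The Std. Error column in the column-means output appears to be each treatment's sample standard deviation divided by the square root of its sample size. The sketch below checks that reading against the five treatments from the worked example; it assumes the usual n - 1 sample standard deviation, and the dictionary names are illustrative.

```python
from math import sqrt
from statistics import stdev

# Observations from the worked example, keyed by column name.
treatments = {
    "Trt1": [10, 11, 11, 12],
    "Trt2": [10, 13, 13, 14, 14],
    "Trt3": [11, 11, 11, 12, 12],
    "Trt4": [14, 15, 15, 15, 11],
    "Trt5": [10, 10, 9, 9, 10],
}

# Standard error of each column mean: s / sqrt(n), with s the sample std. dev.
std_errors = {name: stdev(obs) / sqrt(len(obs)) for name, obs in treatments.items()}

for name, se in std_errors.items():
    print(f"{name}: n = {len(treatments[name])}, std. error = {se:.6f}")
```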
ANOVA table
Source      df  SS          MS           F-Stat    P-value
Treatments  2   5.756057    2.8780284    64.97913  <0.0001
Error       6   0.26574945  0.044291575
Total       8   6.0218062
Magazine ads example
• 30 magazines were grouped by educational level:
 – Group 1 – High educational level
 – Group 2 – Medium educational level
 – Group 3 – Low educational level
• 3 magazines randomly selected from each group:
 – Group 1: 1. Scientific American, 2. Fortune, 3. The New Yorker
 – Group 2: 4. Sports Illustrated, 5. Newsweek, 6. People
 – Group 3: 7. National Enquirer, 8. Grit, 9. True Confessions
• 6 ads randomly selected from each of the 9 magazines and the variables below recorded:
 – WDS - number of words in advertisement copy
 – SEN - number of sentences in advertising copy
 – 3SYL - number of 3+ syllable words in advertising copy
 – MAG - magazine (1 through 9 as above)
 – GROUP - educational level
Magazine Ads in StatCrunch
• Is the average number of words per sentence the same across magazine groups?
 – WDS/SEN
 – Compare boxplots & QQ plots
• What are the null and alternative hypotheses?
  $H_0: \mu_1 = \mu_2 = \mu_3$
  $H_A$: at least two groups have a different average words per sentence
• Note: Remember to hold down the Ctrl key in StatCrunch when you want to add several ANOVA treatments.
Circuit example
• Response times in milliseconds were recorded for three different types of circuits used in a shutoff mechanism. Test at level 0.05 whether all three circuits have the same mean response time.
  Ho: The mean response times are all the same.
  Ha: At least two of the mean response times are different.
Golf Ball Data
• I play a lot of golf and I’m always looking for equipment to help me shoot lower scores. The problem is that I’m cheap.
• One of the main factors in golf is driving the ball as far as possible (assuming that you don’t create additional dispersion in the process), so if you can find a “longer ball”, it could be beneficial.
• The link above shows sample driving distances for three types of balls under consideration (Trispeed, E6 and B330). Test to see if there’s a difference in driving distance (discuss method here).
  Ho: The mean driving distance of all balls is the same.
  Ha: At least two of the balls have different mean driving distances.
Multiple comparisons
• If we reject $H_0$ in favor of the alternative $H_A$, then we are only concluding that at least two of the means are different.
• If we want to drill down to see which means are actually different, we might be tempted to do two-sample t tests for all pairs of means.
• The problem is that the overall level of significance is much higher than the level of significance for each pairwise test.
• With 3 groups there are 3 pairwise comparisons. If each is run at alpha = 5%, the resulting overall alpha is approximately $1 - 0.95^3 = 0.142625$, which is far more than we wanted. The figure is only approximate because the 3 pairwise comparisons are not actually independent.
• To do these multiple comparisons while maintaining an overall level of significance, we use Tukey’s method.
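The inflation of the overall alpha can be sketched for any number of groups. This is a rough illustration that treats the pairwise tests as independent, which they are not, so the output is an approximation; `familywise_alpha` is a hypothetical helper name.

```python
from math import comb


def familywise_alpha(num_groups, alpha=0.05):
    """Return (number of pairwise tests, approximate overall alpha).

    Assumes the pairwise tests are independent, which is only an approximation.
    """
    num_pairs = comb(num_groups, 2)
    return num_pairs, 1 - (1 - alpha) ** num_pairs


pairs, overall = familywise_alpha(3)
print(f"3 groups: {pairs} comparisons, overall alpha ~ {overall:.6f}")

pairs5, overall5 = familywise_alpha(5)
print(f"5 groups: {pairs5} comparisons, overall alpha ~ {overall5:.4f}")
```

With 5 groups there are already 10 pairwise comparisons and the approximate overall alpha rises to about 0.40, which is why a simultaneous method is needed.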
Tukey’s interpretation of Golf Ball Data
Shows simultaneous confidence intervals for the pairwise differences in means at overall alpha = .05. If “0” is inside a confidence interval, the two listed populations are not significantly different. If it’s not, the two populations are statistically different. Here, both Trispeed and E6 are different from B330, but not from each other.
The additional file for Topic 9 contains information and examples on aspects of ANOVA.