
Analysis of variance (ANOVA) Lecture 4



  1. Analysis of variance (ANOVA) Lecture 4

  2. Objectives
      By actively following the lecture and practical and carrying out the independent study the successful student will be able to:
      ● Select One-way or Two-way ANOVA, or their non-parametric equivalent tests, appropriately and apply them in R
      ● Explain the rationale and principles of One-way and Two-way ANOVA
      ● Interpret and report the results of One-way and Two-way ANOVA and their non-parametric equivalents

  3. Continuous ~ categorical tests: how many samples?
      ● 1 sample: One sample t-test. H0: µ1 = µ0 vs H1: µ1 ≠ µ0. Compares mean to a hypothesized mean (µ0).
      ● 2 samples, data in pairs (e.g. multiple measurements on same organism): Paired sample t-test. H0: µ1 - µ2 = 0 vs H1: µ1 - µ2 ≠ 0. Compares mean difference to zero.
      ● 2 samples, data not in pairs: Two sample t-test. H0: µ1 = µ2 vs H1: µ1 ≠ µ2. Compares two means.
      ● >2 samples: ANOVA.

  4. Continuous ~ Categorical: more than two groups
      Analysis of Variance tests allow us to consider the differences between more than two groups. They have non-parametric alternatives for when the assumptions are not met.
      Example: does genotype affect blood pressure? Replicates of 3 different genotypes and measures of systolic blood pressure.

          genotype   Val/Val   Met/Val   Met/Met
                     …         …         …

  5. Why ANOVA and not several t-tests?
      Doing lots of comparisons inflates the type I error rate (rejecting the null hypothesis when it is true).
      ● For a statistical test with α = 0.05, if the null hypothesis is true then the probability of not obtaining a significant result is 0.95.
      ● Comparing 4 groups (A, B, C, D) takes 6 tests (α = 0.05 for each). The probability of not obtaining any significant result is (0.95)^6 = 0.74.
      Your chance of incorrectly rejecting the null hypothesis (a type I error) is about 1 in 4 instead of 1 in 20! ANOVA compares all means simultaneously and maintains the type I error probability at the designated level rather than inflating it.
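The arithmetic on this slide can be checked in R. This is a minimal sketch that, like the slide, treats the six tests as independent; pairwise tests that share groups are not strictly independent, so the figure is an approximation. The variable names are illustrative only.

```r
# Number of pairwise comparisons among 4 groups (A, B, C, D)
n_groups <- 4
n_tests <- choose(n_groups, 2)

alpha <- 0.05

# Probability that none of the tests is significant when every null
# hypothesis is true (assumes the tests are independent)
p_no_false_positive <- (1 - alpha)^n_tests

# Probability of at least one type I error across the whole set of tests
familywise_error <- 1 - p_no_false_positive

round(p_no_false_positive, 2)
round(familywise_error, 2)
```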

  6. Same principles: t-tests & ANOVA
      These work in fundamentally the same way, using measures of variation.
      ● t-tests: is the difference big relative to the variation?
      ● ANOVA: is the variation between groups big relative to the variation within groups?
      ANOVA also has assumptions based on the normal distribution: normality and equal variance.

  7. ANOVA terminology
      ● The categorical explanatory variable: Factor, Treatment (e.g. genotype)
      ● The different groups: Levels of the factor (Val/Val, Met/Val, Met/Met)
      ● Variance: MS - Mean square, the "mean of the squared deviations from the mean"
      ● Total variation: Total MS
      ● Variation between groups: Treatment MS, Factor MS
      ● Variation within the groups: Residual MS or Error MS

  8. One-way ANOVA: example
      Which of three media is best for growing bacterial cultures?
      ● One factor: media
      ● Three levels: Control; Control + sugar; Control + sugar + amino acids
      ● Continuous response: colony diameters (mm)

  9. One-way ANOVA: example
      Data in long format: Response ~ explanatory
      Test: H0: F = 1 vs H1: F > 1
      Interpretation: H0: mean1 = mean2 = mean3 vs H1: at least two means differ
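As an illustration of what "long format" means here, the sketch below reshapes a small wide table (one column per medium) into one row per observation with a response column and an explanatory column. The numbers are simulated and the object names `wide_sim` and `long_sim` are hypothetical; they are not the lecture's `culture` data.

```r
set.seed(1)

# Simulated wide-format data: one column per medium (NOT the lecture data)
wide_sim <- data.frame(
  "control"                  = rnorm(10, mean = 10.0, sd = 1),
  "with sugar"               = rnorm(10, mean = 10.2, sd = 1),
  "with sugar + amino acids" = rnorm(10, mean = 11.3, sd = 1),
  check.names = FALSE
)

# stack() puts all values in one column and the source column name in another
long_sim <- stack(wide_sim)
names(long_sim) <- c("diameter", "medium")

# One row per colony: a continuous response and a categorical explanatory variable
head(long_sim)
str(long_sim)
```

Data in this shape can be passed to a model formula of the form `diameter ~ medium`.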

  10. One-way ANOVA: example
      Checking assumptions before running the ANOVA: Normality

          tapply(culture$diameter, culture$medium, shapiro.test)

          $control
              Shapiro-Wilk normality test
          data: X[[1L]]
          W = 0.9347, p-value = 0.4955

          $`with sugar`
              Shapiro-Wilk normality test
          data: X[[2L]]
          W = 0.9429, p-value = 0.5857

          $`with sugar + amino acids`
              Shapiro-Wilk normality test
          data: X[[3L]]
          W = 0.9284, p-value = 0.4322

      No evidence that assumptions are not met

  11. One-way ANOVA: example
      Checking assumptions before running the ANOVA: Equal variance

          bartlett.test(culture$diameter, culture$medium)

              Bartlett test of homogeneity of variances
          data: diameter and medium
          Bartlett's K-squared = 2.3986, df = 2, p-value = 0.3014

      No evidence that assumptions are not met

  12. One-way ANOVA: example
      Running the test: aov() - the anova function. The 'model formula' is Response ~ Explanatory. Output saved to mod.

          mod <- aov(diameter ~ medium, data = culture)
          summary(mod)
                      Df Sum Sq Mean Sq F value  Pr(>F)
          medium       2 10.495  5.2473  6.1129 0.00646 **
          Residuals   27 23.177  0.8584
          ---
          Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  13. One-way ANOVA: example

                      Df Sum Sq Mean Sq F value  Pr(>F)
          medium       2 10.495  5.2473  6.1129 0.00646 **
          Residuals   27 23.177  0.8584
          ---
          Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      ● medium row: between groups
      ● Residuals row: within groups

  14. One-way ANOVA: example

                      Df Sum Sq Mean Sq F value  Pr(>F)
          medium       2 10.495  5.2473  6.1129 0.00646 **
          Residuals   27 23.177  0.8584
          ---
          Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      ● medium Df: no. of levels - 1, so 3 - 1 = 2
      ● medium Sum Sq: sum of squared deviations between each group mean and the overall mean, multiplied by the number in each group
      ● Residuals Df: (no. in each level - 1) x no. of levels, so (10 - 1) x 3 = 27
      ● Residuals Sum Sq: sum of squared deviations between each value and its group mean
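The degrees of freedom and sums of squares described above can be computed by hand and compared with what aov() reports. The sketch below uses simulated data in a hypothetical data frame `culture_sim` with the same layout as the lecture's example (3 media, 10 colonies each); the numbers will not match the slide.

```r
set.seed(42)

# Simulated stand-in for the lecture data (NOT the real measurements)
culture_sim <- data.frame(
  medium = factor(rep(c("control", "with sugar", "with sugar + amino acids"),
                      each = 10)),
  diameter = c(rnorm(10, 10.0, 1), rnorm(10, 10.2, 1), rnorm(10, 11.3, 1))
)

grand_mean  <- mean(culture_sim$diameter)
group_means <- tapply(culture_sim$diameter, culture_sim$medium, mean)
group_n     <- tapply(culture_sim$diameter, culture_sim$medium, length)

# Between groups: squared deviation of each group mean from the overall mean,
# multiplied by the number in each group
ss_between <- sum(group_n * (group_means - grand_mean)^2)
df_between <- nlevels(culture_sim$medium) - 1

# Within groups: squared deviation of each value from its own group mean
# (ave() returns the group mean for every observation)
ss_within <- sum((culture_sim$diameter -
                  ave(culture_sim$diameter, culture_sim$medium))^2)
df_within <- sum(group_n - 1)

# Compare with the ANOVA table produced by aov()
tab <- summary(aov(diameter ~ medium, data = culture_sim))[[1]]
tab
c(df_between, df_within)
c(ss_between, ss_within)
```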

  15. One-way ANOVA: example

                      Df Sum Sq Mean Sq F value  Pr(>F)
          medium       2 10.495  5.2473  6.1129 0.00646 **
          Residuals   27 23.177  0.8584
          ---
          Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      ● Mean Square (aka variance) = SS / d.f.
      ● F = Medium MS / Residual MS
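As a check on the relationships above, the mean squares, F and p-value can be recomputed from the Sum Sq and Df columns printed in the table. Because the printed values are rounded, the results agree with the table only to about three significant figures. The variable names are illustrative.

```r
# Values copied from the ANOVA table on the slide (rounded as printed)
ss_medium <- 10.495
df_medium <- 2
ss_resid  <- 23.177
df_resid  <- 27

# Mean square = sum of squares / degrees of freedom
ms_medium <- ss_medium / df_medium
ms_resid  <- ss_resid / df_resid

# F = between-group mean square / within-group (residual) mean square
f_value <- ms_medium / ms_resid

# Upper-tail probability of the F distribution with (2, 27) degrees of freedom
p_value <- pf(f_value, df_medium, df_resid, lower.tail = FALSE)

c(ms_medium = ms_medium, ms_resid = ms_resid, F = f_value, p = p_value)
```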

  16. One-way ANOVA: example
      Reporting the result

                      Df Sum Sq Mean Sq F value  Pr(>F)
          medium       2 10.495  5.2473  6.1129 0.00646 **
          Residuals   27 23.177  0.8584

      There was a significant effect of media on the diameter of bacterial colonies (ANOVA: F = 6.11; d.f. = 2, 27; p = 0.006).
      But not quite finished reporting… A result should give: Significance, Direction, Statistics.

  17. One-way ANOVA: example
      Checking assumptions after running the ANOVA. Use the residuals - the 'real' assumption.

          plot(mod)

      ● Residuals plot: spread should be similar in each group (equal variance)
      ● Normal Q-Q plot: should be approx 1:1 for normality
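A minimal sketch of the residual checks, using simulated data in a hypothetical data frame `culture_sim` rather than the lecture's `culture` data. plot() on an aov object accepts a `which` argument to pick individual diagnostic plots; applying shapiro.test() to the residuals is one possible numerical complement to the Q-Q plot, not something shown on the slide.

```r
set.seed(42)

# Simulated stand-in for the lecture data (NOT the real measurements)
culture_sim <- data.frame(
  medium = factor(rep(c("control", "with sugar", "with sugar + amino acids"),
                      each = 10)),
  diameter = c(rnorm(10, 10.0, 1), rnorm(10, 10.2, 1), rnorm(10, 11.3, 1))
)

mod_sim <- aov(diameter ~ medium, data = culture_sim)
res <- residuals(mod_sim)

# Residuals vs fitted: look for similar spread in each group (equal variance)
plot(mod_sim, which = 1)

# Normal Q-Q plot: points close to the line suggest normal residuals
plot(mod_sim, which = 2)

# Optional numerical check of normality on the residuals
shapiro.test(res)
```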

  18. One-way ANOVA: example
      Reporting the result: Significance, Direction, Statistics.
      But not quite finished reporting yet… Which means differ? This requires a "post-hoc" test, e.g. Tukey.

  19. One-way ANOVA: example
      Reporting the result: which means differ

          TukeyHSD(aov(diameter ~ medium, data = culture))

            Tukey multiple comparisons of means
              95% family-wise confidence level
          Fit: aov(formula = diameter ~ medium)

          $medium
                                                diff       lwr      upr     p adj
          with sugar-control                   0.170 -0.857331 1.197331 0.9116894
          with sugar + amino acids-control     1.331  0.303669 2.358331 0.0092052
          with sugar + amino acids-with sugar  1.161  0.133669 2.188331 0.0243794

          plot(TukeyHSD(aov(diameter ~ medium, data = culture)))

      The plot shows the 95% CI for each comparison against a line marking a difference of zero.
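One way to read the Tukey output programmatically is to flag the comparisons whose 95% family-wise confidence interval excludes zero. The sketch below does this on simulated data (a hypothetical `culture_sim`, not the lecture's `culture`), so which pairs are flagged will not necessarily match the slide.

```r
set.seed(42)

# Simulated stand-in for the lecture data (NOT the real measurements)
culture_sim <- data.frame(
  medium = factor(rep(c("control", "with sugar", "with sugar + amino acids"),
                      each = 10)),
  diameter = c(rnorm(10, 10.0, 1), rnorm(10, 10.2, 1), rnorm(10, 11.3, 1))
)

mod_sim <- aov(diameter ~ medium, data = culture_sim)
tk <- TukeyHSD(mod_sim)

# tk$medium is a matrix with columns diff, lwr, upr and "p adj"
comparisons <- tk$medium

# A pair of means differs at the 5% level when its interval excludes zero
excludes_zero <- comparisons[, "lwr"] > 0 | comparisons[, "upr"] < 0

data.frame(comparisons, differs = excludes_zero, check.names = FALSE)
```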

  20. One-way ANOVA: example
      Illustrating (order factors going from lowest to highest mean)
      Figure 1. Mean colony diameter for bacteria grown on different media. Error bars are +/- S.E. Means that do not differ significantly under post-hoc comparison are labelled with the same letter code.

  21. Or: anything is possible with ggplot
      ● + geom_jitter(): show all data points
      ● + annotate(): add lines and text to the plot
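A minimal sketch of the two layers named on this slide, assuming the ggplot2 package is installed. The data are simulated in a hypothetical `culture_sim`, and the positions and label passed to annotate() are placeholders to be replaced with values suited to real data.

```r
library(ggplot2)

set.seed(42)

# Simulated stand-in for the lecture data (NOT the real measurements)
culture_sim <- data.frame(
  medium = factor(rep(c("control", "with sugar", "with sugar + amino acids"),
                      each = 10)),
  diameter = c(rnorm(10, 10.0, 1), rnorm(10, 10.2, 1), rnorm(10, 11.3, 1))
)

p <- ggplot(culture_sim, aes(x = medium, y = diameter)) +
  # Show every data point, jittered sideways only so the values stay exact
  geom_jitter(width = 0.1, height = 0) +
  # Mark each group mean with a cross
  stat_summary(fun = mean, geom = "point", shape = 4, size = 4) +
  # Add a line spanning the first two groups and a text label above it
  annotate("segment", x = 1, xend = 2, y = 14, yend = 14) +
  annotate("text", x = 1.5, y = 14.3, label = "example label") +
  labs(x = "Medium", y = "Colony diameter (mm)")

p
```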

  22. One-way ANOVA: example
      Reporting the result: finished (Significance, Direction, Statistics)
      There was a significant effect of media on the diameter of bacterial colonies (ANOVA: F = 6.11; d.f. = 2, 27; p = 0.006) with colonies shown, by post-hoc comparison, to grow significantly better when both sugar and amino acids were added to the medium (see Figure 1). The addition of sugar alone did not significantly increase growth.

  23. One-way ANOVA: non-parametric equivalent
      When: residuals are heteroscedastic (unequal variance) and/or not normal, especially when unequal sample sizes are combined with heteroscedasticity.
      Kruskal-Wallis: uses ranks.
      H0: mean rank g1 = mean rank g2 = mean rank g3 etc. vs H1: at least 2 mean ranks differ

  24. Kruskal-Wallis - example
      Running the test. Here it is used on the same data, for comparison of power.

          kruskal.test(data = culture, diameter ~ medium)

              Kruskal-Wallis rank sum test
          data: diameter by medium
          Kruskal-Wallis chi-squared = 8.1005, df = 2, p-value = 0.01742

      The ANOVA on the same data, for comparison:

                      Df Sum Sq Mean Sq F value  Pr(>F)
          medium       2 10.495  5.2473  6.1129 0.00646 **
          Residuals   27 23.177  0.8584
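The same side-by-side comparison can be sketched on simulated data (a hypothetical `culture_sim`, not the lecture's `culture`); the p-values will differ from those on the slide. The mean rank per group is included because the Kruskal-Wallis test compares these rather than the raw means.

```r
set.seed(42)

# Simulated stand-in for the lecture data (NOT the real measurements)
culture_sim <- data.frame(
  medium = factor(rep(c("control", "with sugar", "with sugar + amino acids"),
                      each = 10)),
  diameter = c(rnorm(10, 10.0, 1), rnorm(10, 10.2, 1), rnorm(10, 11.3, 1))
)

# Non-parametric test on ranks
kw <- kruskal.test(diameter ~ medium, data = culture_sim)

# Parametric one-way ANOVA on the same data
anova_tab <- summary(aov(diameter ~ medium, data = culture_sim))[[1]]

# Mean rank in each group: the quantity the Kruskal-Wallis test compares
mean_ranks <- tapply(rank(culture_sim$diameter), culture_sim$medium, mean)

c(kruskal_p = kw$p.value, anova_p = anova_tab[["Pr(>F)"]][1])
mean_ranks
```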

  25. Kruskal-Wallis - example
      Reporting the result: Significance, Direction, Statistics

          kruskal.test(data = culture, diameter ~ medium)

              Kruskal-Wallis rank sum test
          data: diameter by medium
          Kruskal-Wallis chi-squared = 8.1005, df = 2, p-value = 0.01742

      There was a significant effect of media on the diameter of bacterial colonies (Kruskal-Wallis: χ² = 8.1; d.f. = 2; p = 0.017).
      Post-hoc test?

  26. Kruskal-Wallis - example
      Reporting the result: which groups differ

          library(pgirmess)
          kruskalmc(diameter, medium, probs = 0.05)

          Multiple comparison test after Kruskal-Wallis
          p.value: 0.05
          Comparisons
                                              obs.dif critical.dif difference
          control-with sugar                     0.85     9.425108      FALSE
          control-with sugar + amino acids      10.10     9.425108       TRUE
          with sugar-with sugar + amino acids    9.25     9.425108      FALSE

      obs.dif is the difference between the mean of the ranks:

          ranked <- rank(culture$diameter)
          tapply(ranked, culture$medium, mean)
                           control               with sugar with sugar + amino acids
                             11.85                    12.70                    21.95

  27. Kruskal-Wallis - example Figure 1. Median (heavy lines) colony diameter for bacteria grown on different media.

  28. Two-way ANOVA
      What if we want to see the effects of more than one categorical variable on a continuous variable?
      ● Species: F.flappa, F.concocti, I.lepidoptera
      ● Sex: Male, Female
