
Biostatistics: ANOVA - Analysis of Variance (Burkhardt Seifert & Alois Tschopp)



  1. Biostatistics: ANOVA - Analysis of Variance. Burkhardt Seifert & Alois Tschopp, Biostatistics Unit, University of Zurich. Master of Science in Medical Biology.

  2. Analysis of variance

     ANOVA = Analysis of variance

     - Simple example: the two-sample t-test compares the means of two groups (not their variances!).
     - ANOVA analyses and interprets observations from several groups, treatments, conditions, etc.
     - It decomposes the total variance in the data into contributions from the individual sources of variation: systematic contributions (differences between means) and a random remainder (variability around the group means).
     - Complicated example (Stoll, Brühlmann, Stucki, Seifert & Michel (1994), J. Rheumatology): muscle strength of 7 patients was measured twice by 3 physicians (42 measurements; analysis of variance for repeated measures with 2 within-factors). Is the new measurement reliable?

  3. Simple example

     Example (Amess et al. 1978): 22 bypass patients are randomly assigned to 3 treatment groups (different ventilation regimes). Do the red cell folate values differ after 24 h?

     | Group | Red cell folate                             |
     |-------|---------------------------------------------|
     | 1     | 243, 251, 275, 291, 347, 354, 380, 392      |
     | 2     | 206, 210, 226, 249, 255, 273, 285, 295, 309 |
     | 3     | 241, 258, 270, 293, 328                     |

  4. Simple example

     Scientific hypothesis $H_1$: the red cell folate values differ after 24 h, i.e. the 3 population means $\mu_1, \mu_2, \mu_3$ are not all the same.

     Null hypothesis $H_0$: $\mu_1 = \mu_2 = \mu_3$

     The central result of the analysis of variance is the ANOVA table:

     |             | Df | Sum Sq     | Mean Sq    | F value | Pr(>F) |
     |-------------|----|------------|------------|---------|--------|
     | (Intercept) | 1  | 1764789.14 | 1764789.14 | 844.27  | 0.0000 |
     | group       | 2  | 15515.77   | 7757.88    | 3.71    | 0.0436 |
     | Residuals   | 19 | 39716.10   | 2090.32    |         |        |

     $R^2 = 0.281$, $R^2_{adj} = 0.205$

  5. Simple example

     |             | Df | Sum Sq     | Mean Sq    | F value | Pr(>F) |
     |-------------|----|------------|------------|---------|--------|
     | (Intercept) | 1  | 1764789.14 | 1764789.14 | 844.27  | 0.0000 |
     | group       | 2  | 15515.77   | 7757.88    | 3.71    | 0.0436 |
     | Residuals   | 19 | 39716.10   | 2090.32    |         |        |

     - Important: p-value (Pr(>F)) = 0.044
     - Sum of squares (Sum Sq, SS)
     - Mean square (Mean Sq, MS) = SS / degrees of freedom (Df)
     - Hypothesis $H_0$: "Groups have the same true mean". Under $H_0$, $MS_{group}$ (later $MS_T$) and $MS_{Residuals}$ (later $MS_{res}$) have the same expected value.
     - Test statistic: $F = MS_T / MS_{res} = 3.71$, i.e. $MS_T$ is 3.71 times larger than expected under $H_0$.
     - Assumption: the data are normally distributed.
     - p-value $p = 0.044$ from $F \sim F_{2,19}$ (see Df).
     - $MS_{res}$ is estimated from all groups, as in the t-test.
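The group and Residuals rows of the ANOVA table can be recomputed from the raw folate values. The following is a minimal sketch in plain Python, not the software that produced the slides; the variable names are illustrative, and the p-value uses a closed-form tail of the F distribution that holds only when the numerator has 2 degrees of freedom (as it does here).

```python
# Red cell folate values as listed in the example (Amess et al. 1978).
groups = {
    1: [243, 251, 275, 291, 347, 354, 380, 392],
    2: [206, 210, 226, 249, 255, 273, 285, 295, 309],
    3: [241, 258, 270, 293, 328],
}

n = sum(len(y) for y in groups.values())   # total number of observations
m = len(groups)                            # number of groups
grand_mean = sum(sum(y) for y in groups.values()) / n

# Between-group (treatment) and within-group (residual) sums of squares.
ss_t = sum(len(y) * (sum(y) / len(y) - grand_mean) ** 2 for y in groups.values())
ss_res = sum((v - sum(y) / len(y)) ** 2 for y in groups.values() for v in y)

df_t, df_res = m - 1, n - m
ms_t, ms_res = ss_t / df_t, ss_res / df_res
f_value = ms_t / ms_res

# Assumption: closed-form upper tail, valid ONLY for a numerator df of 2:
# P(F > f) = (1 + 2 f / df_res) ** (-df_res / 2)
assert df_t == 2
p_value = (1 + 2 * f_value / df_res) ** (-df_res / 2)

print(f"group     {df_t:3d} {ss_t:10.2f} {ms_t:10.2f} {f_value:6.2f} {p_value:.4f}")
print(f"Residuals {df_res:3d} {ss_res:10.2f} {ms_res:10.2f}")
```

For other numbers of groups the tail probability would need a general F distribution routine from a statistics library.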

  6. Simple example: graphical presentation

     [Figure: two panels showing red cell folate (y-axis, roughly 200 to 350) by group (1, 2, 3). Dots show the group mean; error bars show mean ± 1.0 sd.]

  7. Simple example

     Question: can the group differences be demonstrated without an analysis of variance? That requires 3 pairwise group comparisons:

     | Comparison | Mean diff. | df | t-value | p-value |
     |------------|------------|----|---------|---------|
     | 1 vs. 2    | 60.181     | 15 | 2.558   | 0.0218  |
     | 1 vs. 3    | 38.625     | 11 | 1.327   | 0.2115  |
     | 2 vs. 3    | -21.556    | 12 | -1.072  | 0.3046  |

     - Taken on its own, only the comparison of group 1 versus group 2 is significant.
     - However, 3 hypotheses are tested. With the Bonferroni correction a comparison is significant only if p < 0.05/3 = 0.017, so no comparison remains significant.
     - ANOVA provides a p-value for the question "Is there a difference at all?"
     - All observations are pooled to estimate the variance, which gives better discriminatory power.
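The three pairwise comparisons and the Bonferroni threshold can be reproduced as follows. This is a sketch under stated assumptions: `pooled_t` and `t_two_sided_p` are illustrative helpers, not library functions, and the p-value is obtained by numerically integrating the t density with Simpson's rule rather than by calling a statistics package.

```python
import math
from itertools import combinations

groups = {
    1: [243, 251, 275, 291, 347, 354, 380, 392],
    2: [206, 210, 226, 249, 255, 273, 285, 295, 309],
    3: [241, 258, 270, 293, 328],
}

def pooled_t(a, b):
    """Unpaired t-test with pooled variance: returns (mean difference, df, t)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    df = na + nb - 2
    se = math.sqrt(ss / df * (1 / na + 1 / nb))
    return ma - mb, df, (ma - mb) / se

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson integration of the t density on [0, |t|]."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = abs(t) / steps
    area = pdf(0) + pdf(abs(t))
    for k in range(1, steps):
        area += (4 if k % 2 else 2) * pdf(k * h)
    return 1 - 2 * area * h / 3

alpha = 0.05
threshold = alpha / 3   # Bonferroni: 3 comparisons

results = {}
for (i, a), (j, b) in combinations(groups.items(), 2):
    diff, df, t = pooled_t(a, b)
    p = t_two_sided_p(t, df)
    results[(i, j)] = (diff, df, t, p)
    print(f"{i} vs. {j}: diff={diff:8.3f} df={df} t={t:6.3f} p={p:.4f} "
          f"significant after Bonferroni: {p < threshold}")
```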

  8. Two-sample problem is an ANOVA

     Unpaired t-test:

     | t     | df | p-value | Mean diff. | lower  | upper   |
     |-------|----|---------|------------|--------|---------|
     | 2.558 | 15 | 0.022   | 60.18      | 10.039 | 110.322 |

     ANOVA:

     |             | Df | Sum Sq     | Mean Sq    | F value | Pr(>F) |
     |-------------|----|------------|------------|---------|--------|
     | (Intercept) | 1  | 1378545.94 | 1378545.94 | 588.15  | 0.0000 |
     | group       | 1  | 15338.96   | 15338.96   | 6.54    | 0.0218 |
     | Residuals   | 15 | 35158.10   | 2343.87    |         |        |

     $R^2 = 0.304$, $R^2_{adj} = 0.257$

     Note: $F = t^2$, and the p-values are identical.
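The note that F equals the squared t statistic can be checked numerically on groups 1 and 2 of the folate example (the mean difference of 60.18 identifies these as the two groups compared here). A minimal sketch with illustrative variable names:

```python
import math

g1 = [243, 251, 275, 291, 347, 354, 380, 392]
g2 = [206, 210, 226, 249, 255, 273, 285, 295, 309]
n1, n2 = len(g1), len(g2)
m1, m2 = sum(g1) / n1, sum(g2) / n2

# Residual (within-group) sum of squares and pooled variance.
ss_res = sum((v - m1) ** 2 for v in g1) + sum((v - m2) ** 2 for v in g2)
ms_res = ss_res / (n1 + n2 - 2)

# Unpaired t statistic with pooled variance.
t = (m1 - m2) / math.sqrt(ms_res * (1 / n1 + 1 / n2))

# Treatment (between-group) sum of squares; it has 1 degree of freedom.
grand_mean = (sum(g1) + sum(g2)) / (n1 + n2)
ss_t = n1 * (m1 - grand_mean) ** 2 + n2 * (m2 - grand_mean) ** 2
f = (ss_t / 1) / ms_res

print(f"t = {t:.3f}, t^2 = {t * t:.3f}, F = {f:.3f}")
```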

  9. Unpaired t-test as ANOVA

     Given 2 samples

     $$y_{11}, y_{12}, \dots, y_{1 n_1} \qquad y_{21}, y_{22}, \dots, y_{2 n_2}$$

     with:

     - means $\mu_1$ and $\mu_2$
     - the same variance $\sigma^2$
     - $n = n_1 + n_2$ observations

     Model:

     $$y_{ij} = \mu_i + \varepsilon_{ij} = \mu + \alpha_i + \varepsilon_{ij} \qquad (i = 1, 2;\ j = 1, \dots, n_i)$$

     $\alpha_i = \mu_i - \mu$ is called the (treatment) effect.

  10. Unpaired t-test as ANOVA

      Decompose the total sum of squares $SS_{total}$:

      $$\begin{aligned}
      SS_{total} &= \sum_{j=1}^{n_1} (y_{1j} - \bar y)^2 + \sum_{j=1}^{n_2} (y_{2j} - \bar y)^2 \\
      &= \sum_{j=1}^{n_1} (y_{1j} - \bar y_1 + \bar y_1 - \bar y)^2 + \sum_{j=1}^{n_2} (y_{2j} - \bar y_2 + \bar y_2 - \bar y)^2 \\
      &= \underbrace{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}_{SS_{res}} + \underbrace{n_1 (\bar y_1 - \bar y)^2 + n_2 (\bar y_2 - \bar y)^2}_{SS_T} \qquad \text{(mixed products disappear)} \\
      &= SS_{res} + SS_T \qquad (= \text{residual SS} + \text{treatment SS}) \\
      &= SS_{\text{within groups}} + SS_{\text{between groups}}
      \end{aligned}$$
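The step "mixed products disappear" can be spelled out for a single group $i$. Expanding the square gives a cross term that contains the sum of the deviations from the group mean, which is zero by definition of $\bar y_i$:

```latex
\sum_{j=1}^{n_i} (y_{ij} - \bar y_i + \bar y_i - \bar y)^2
  = \sum_{j=1}^{n_i} (y_{ij} - \bar y_i)^2
  + 2\,(\bar y_i - \bar y) \sum_{j=1}^{n_i} (y_{ij} - \bar y_i)
  + n_i\,(\bar y_i - \bar y)^2,
\qquad
\sum_{j=1}^{n_i} (y_{ij} - \bar y_i) = 0 .
```

The first remaining term equals $(n_i - 1) s_i^2$ and the last is the contribution of group $i$ to $SS_T$.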

  11. Unpaired t-test as ANOVA

      $SS_T$ corresponds to the squared numerator $(\bar y_1 - \bar y_2)^2$ of the t-statistic:

      $$\begin{aligned}
      SS_T &= n_1 (\bar y_1 - \bar y)^2 + n_2 (\bar y_2 - \bar y)^2 \\
      &= n_1 \left( \bar y_1 - \frac{n_1 \bar y_1 + n_2 \bar y_2}{n_1 + n_2} \right)^2 + n_2 \left( \bar y_2 - \frac{n_1 \bar y_1 + n_2 \bar y_2}{n_1 + n_2} \right)^2 \\
      &= \frac{n_1 n_2}{n_1 + n_2} (\bar y_1 - \bar y_2)^2
      \end{aligned}$$

      $SS_{res}$ corresponds to the denominator of the t-statistic:

      $$s = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$$

  12. Unpaired t-test as ANOVA

      Definition of degrees of freedom (df): (df of SS) = (# squared elements) − (# linear restrictions)

      - $df(SS_{res}) = n_1 - 1 + n_2 - 1 = n - 2$; 2 restrictions: $\sum_{j=1}^{n_i} (Y_{ij} - \bar Y_i) = 0$ for $i = 1, 2$
      - $df(SS_T) = 2 - 1 = 1$; 1 restriction: $n_1 (\bar y_1 - \bar y) + n_2 (\bar y_2 - \bar y) = 0$
      - The degrees of freedom sum up to $n - 1$.

      Definition of mean squares (MS): MS = SS / df

      Pooled variance (mean variability around $\mu_1$ and $\mu_2$):

      $$\hat\sigma^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{(n_1 - 1) + (n_2 - 1)} = MS_{res}$$

  13. Unpaired t-test as ANOVA

      Null hypothesis $H_0$: $\mu_1 = \mu_2$, or equivalently $\alpha_1 = \alpha_2 = 0$

      F-test:

      $$(\bar Y_1 - \bar Y_2) \sim N\left( \mu_1 - \mu_2,\ \sigma^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right) \right)$$

      $$\rightarrow\ E\left[ (\bar Y_1 - \bar Y_2)^2 \right] = \sigma^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right) + (\mu_1 - \mu_2)^2$$

      $$\rightarrow\ E[MS_T] = E\left[ \frac{n_1 n_2}{n_1 + n_2} (\bar Y_1 - \bar Y_2)^2 \right] = \sigma^2 + \underbrace{\frac{n_1 n_2}{n_1 + n_2} (\mu_1 - \mu_2)^2}_{\ge 0}$$

      $$E[MS_{res}] = \sigma^2$$

      $$F = MS_T / MS_{res}$$

      Here: $F = t^2$
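The identity $F = t^2$ follows directly from the two-sample formulas: $SS_T = \frac{n_1 n_2}{n_1 + n_2} (\bar y_1 - \bar y_2)^2$ with 1 degree of freedom, and $MS_{res} = s^2$, the pooled variance. As a worked step:

```latex
F = \frac{MS_T}{MS_{res}}
  = \frac{\dfrac{n_1 n_2}{n_1 + n_2}\,(\bar y_1 - \bar y_2)^2}{s^2}
  = \left( \frac{\bar y_1 - \bar y_2}{s \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \right)^{2}
  = t^2,
\qquad \text{since } \frac{1}{n_1} + \frac{1}{n_2} = \frac{n_1 + n_2}{n_1 n_2} .
```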

  14. One-way ANOVA

      Generalisation of the two-sample t-test from 2 to m groups.

      Model ("completely randomized design"):

      $$y_{ij} = \mu_i + \varepsilon_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad i = 1, \dots, m,\ j = 1, \dots, n_i, \qquad \varepsilon_{ij} \sim N(0, \sigma^2)$$

      Decomposition of the observations:

      $$y_{ij} = \hat\mu + (\bar y_i - \hat\mu) + (y_{ij} - \bar y_i) = \hat\mu + \hat\alpha_i + e_{ij}$$

      i.e. "overall mean" + effect + residual (everything estimated).

      - The decomposition is well-defined only through restrictions: what does "overall mean" stand for?
      - Meaningful and usual: $\hat\mu = \frac{1}{m} \sum_{i=1}^{m} \bar y_i \ \rightarrow\ \sum_{i=1}^{m} \hat\alpha_i = 0$

      Scientific hypothesis $H_1$: at least one $\alpha_i \ne 0$

      Null hypothesis $H_0$: all $\alpha_i = 0$; "all group means are equal"
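The decomposition into "overall mean" + effect + residual can be illustrated on the folate data. This sketch assumes the restriction named on the slide, i.e. the overall mean is the unweighted mean of the group means; with unequal group sizes this differs slightly from the mean of all 22 observations. Variable names are illustrative.

```python
groups = {
    1: [243, 251, 275, 291, 347, 354, 380, 392],
    2: [206, 210, 226, 249, 255, 273, 285, 295, 309],
    3: [241, 258, 270, 293, 328],
}

group_means = {g: sum(y) / len(y) for g, y in groups.items()}

# Overall mean as the unweighted mean of the group means (the restriction used here).
mu_hat = sum(group_means.values()) / len(group_means)

# Estimated effects and residuals.
alpha_hat = {g: group_means[g] - mu_hat for g in groups}
residuals = {g: [v - group_means[g] for v in y] for g, y in groups.items()}

# Each observation is recovered as overall mean + effect + residual.
rebuilt = {g: [mu_hat + alpha_hat[g] + e for e in residuals[g]] for g in groups}

print(f"mu_hat = {mu_hat:.3f}")
for g in groups:
    print(f"group {g}: mean = {group_means[g]:.3f}, alpha_hat = {alpha_hat[g]:+.3f}")
print(f"sum of effects = {sum(alpha_hat.values()):.6f}")
```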

  15. One-way ANOVA

      Central: the ANOVA table

      |             | Df | Sum Sq     | Mean Sq    | F value | Pr(>F) |
      |-------------|----|------------|------------|---------|--------|
      | (Intercept) | 1  | 1764789.14 | 1764789.14 | 844.27  | 0.0000 |
      | group       | 2  | 15515.77   | 7757.88    | 3.71    | 0.0436 |
      | Residuals   | 19 | 39716.10   | 2090.32    |         |        |

      ANOVA decomposes the variance of the observations ("total") into the contributions of the individual sources of variation:

      - group = between groups: variability of the group means (treatments → $SS_T$), systematic contribution
      - Residuals = within groups: variability of the observations within one group (residuals → $SS_{res}$), random contribution

  16. One-way ANOVA

      Degrees of freedom (df) = (number of squared elements) − (number of restrictions). The total df ($n - 1$, as for the variance $s^2$) are also decomposed:

      - between groups: m group means − 1 restriction = $m - 1 = 2$
      - within groups: n observations − m groups = $n - m = 19$

      Mean squares: MS = SS / df

      The sums of squares $SS_T$ and $SS_{res}$ are independent. Under $H_0$, $MS_T$ and $MS_{res}$ have the same expected value $\sigma^2$. Under $H_1$, $MS_T$ is large, while $MS_{res}$ is not influenced.

      $$\rightarrow\ F = MS_T / MS_{res} \sim F_{m-1,\, n-m}$$

  17. One-way ANOVA

      In the example ($m = 3$, $n = 22$): $F = 3.7$ → p-value $p = 0.044$

      [Figure: density of F(2,19) for F between 0 and 7. The upper 5% tail is marked "reject $H_0$", the remaining area "do not reject $H_0$".]

      The test is always two-sided.
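The rejection region in the figure can be computed directly for this example. The sketch below relies on an assumption that happens to hold here: with 2 numerator degrees of freedom, the upper tail of the F distribution has the closed form $P(F > f) = (1 + 2f/df_2)^{-df_2/2}$, which can be inverted for the critical value. The function name `f_upper_tail` is illustrative.

```python
m, n = 3, 22
df1, df2 = m - 1, n - m      # 2 and 19
alpha = 0.05
f_observed = 3.71            # F value reported in the ANOVA table

def f_upper_tail(f, df2):
    """P(F > f) for an F distribution with 2 and df2 degrees of freedom."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

# Assumption: this inversion of the tail formula is valid only for df1 == 2.
assert df1 == 2
f_critical = df2 / 2 * (alpha ** (-2 / df2) - 1)

p_value = f_upper_tail(f_observed, df2)
reject_h0 = f_observed > f_critical

print(f"critical value F(2,19; 0.95) = {f_critical:.2f}")
print(f"p-value for F = {f_observed}: {p_value:.4f}")
print(f"reject H0 at the 5% level: {reject_h0}")
```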

  18. ♣ Confidence intervals

      In the case of two groups ("t-test") we obtained:

      $$\bar y_1 - \bar y_2 - t_{n-2,\,1-\alpha/2}\; s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \ \le\ \mu_1 - \mu_2 \ \le\ \bar y_1 - \bar y_2 + t_{n-2,\,1-\alpha/2}\; s \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

      Generalisation:

      - $\hat\sigma^2 = s^2 = MS_{res} = 2090$ is the pooled residual variance estimate for all groups
      - → $SE(\bar y_i) = s_{res} / \sqrt{n_i}$
      - → Confidence interval for $\mu_i$:

      $$\bar y_i - s_{res}\, t_{n-m,\,1-\alpha/2} / \sqrt{n_i} \ \le\ \mu_i \ \le\ \bar y_i + s_{res}\, t_{n-m,\,1-\alpha/2} / \sqrt{n_i}$$
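Applied to the folate example, the interval for each group mean can be sketched as follows. The t quantile $t_{19,\,0.975} = 2.093$ is hardcoded from a standard t table as an assumption, because the Python standard library has no t distribution; the remaining quantities are computed from the data.

```python
import math

groups = {
    1: [243, 251, 275, 291, 347, 354, 380, 392],
    2: [206, 210, 226, 249, 255, 273, 285, 295, 309],
    3: [241, 258, 270, 293, 328],
}

n = sum(len(y) for y in groups.values())
m = len(groups)
group_means = {g: sum(y) / len(y) for g, y in groups.items()}

# Pooled residual variance MS_res, estimated from all groups.
ss_res = sum((v - group_means[g]) ** 2 for g, y in groups.items() for v in y)
ms_res = ss_res / (n - m)
s_res = math.sqrt(ms_res)

# Assumption: tabulated 97.5% quantile of the t distribution with n - m = 19 df.
t_quantile = 2.093

ci = {}
for g, y in groups.items():
    half_width = t_quantile * s_res / math.sqrt(len(y))
    ci[g] = (group_means[g] - half_width, group_means[g] + half_width)
    print(f"group {g}: mean = {group_means[g]:.1f}, "
          f"95% CI = ({ci[g][0]:.1f}, {ci[g][1]:.1f})")
```

Because $s_{res}$ is shared by all groups, the interval widths differ only through $\sqrt{n_i}$.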
