2/8/2007

219323 Probability and Statistics for Software and Knowledge Engineers
Lecture 11: The Analysis of Variance
Monchai Sopitkamon, Ph.D.

Outline
• One-Factor Analysis of Variance (11.1)
• Randomized Block Designs (11.2)

One-Factor Analysis of Variance
• One-Factor Layouts (11.1.1)
• Partitioning the Total Sum of Squares (11.1.2)
• The Analysis of Variance Table (11.1.3)
• Pairwise Comparisons of the Factor Level Means (11.1.4)
• Sample Size Determination (11.1.5)
• Model Assumptions (11.1.6)

One-Factor Layouts I (11.1.1)
• Comparing three or more population means
• Objectives:
  – To determine whether the population means are unequal
  – To determine which population means are different and by how much
• Completely randomized design – a set of independent samples from several populations
• Uses the analysis of variance (ANOVA) technique to analyze such a design

One-Factor Layouts II (11.1.1): Randomization
[Figure: computer programs, a randomizer, and groups A, B, C, D]

One-Factor Layouts III (11.1.1): Randomization
[Figure: computer programs, a randomizer, and groups A, B, C, D]

One-Factor Layouts IV (11.1.1)
• k populations with unknown population means $\mu_1, \mu_2, \ldots, \mu_k$
• If k = 1: one-sample inference problems (Chapter 8)
• If k = 2: two-sample comparison problems (Chapter 9)
• If k ≥ 3: one-factor ANOVA problems (this chapter)

One-Factor Layouts V (11.1.1)
• $x_{ij}$: the j-th observation from the i-th population
• If $n_1 = n_2 = \cdots = n_k$, the data set is balanced; otherwise it is unbalanced
[Figure: One factor layout]

One-Factor Layouts VI (11.1.1)
[Figure: Estimating the population (factor level) means]

One-Factor Layouts VII (11.1.1)
• Hypothesis testing
  – Null hypothesis: $H_0: \mu_1 = \cdots = \mu_k$ (the population means are all equal)
  – Alternative hypothesis: $H_A: \mu_i \neq \mu_j$ for some i and j (at least two of the population means are not equal)
• Acceptance of $H_0$ means that there is no evidence that any of the population means are unequal.
• Rejection of $H_0$ means that there is evidence that some of the population means are unequal, so it is not plausible to assume that the population means are all equal.

Partitioning the Total Sum of Squares I (11.1.2)
[Figure: Partition of total sum of squares for completely randomized one factor layout]

Partitioning the Total Sum of Squares II (11.1.2): SST
• Total Sum of Squares (SST) – a measure of the total variability in the data set

$$SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( x_{ij} - \bar{x}_{\cdot\cdot} \right)^2$$

where
  $\bar{x}_{\cdot\cdot} = \frac{1}{n_T} \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}$: overall or grand mean
  $x_{ij}$: j-th observation in group or level i
  $n_i$: number of observations in group or level i
  $n_T$: total number of observations, $n_T = \sum_{i=1}^{k} n_i$

Partitioning the Total Sum of Squares III (11.1.2): SST Example
• A, B, C, and D denote the number of processors.
• Factor: number of processors.
• Levels: A, B, C, and D.
• Number in each column: running times (in seconds) of programs under each CPU configuration.

Page Replacement Algorithm
Number of Processors
   A    B    C    D
  11   12   18   11
  13   14   16   12
  17   17   18   16
  17   19   20   15
  15   21   22   14
  16   18   15   17
  14   19   17   13
  10   18   21   16
  12   16   16   17
  14   18   20   18

Partitioning the Total Sum of Squares IV (11.1.2): SST Example
Grand Mean = 16.075

$$SST = (11 - 16.075)^2 + (13 - 16.075)^2 + \cdots + (18 - 16.075)^2 = 336.775$$
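The SST example can be checked with a few lines of Python. This is only a sketch: the variable names (`data`, `grand_mean`, `sst`) are illustrative and not part of the lecture, and the observations are typed in column by column from the example table.

```python
# Running times (seconds) from the example table, one list per factor level.
data = {
    "A": [11, 13, 17, 17, 15, 16, 14, 10, 12, 14],
    "B": [12, 14, 17, 19, 21, 18, 19, 18, 16, 18],
    "C": [18, 16, 18, 20, 22, 15, 17, 21, 16, 20],
    "D": [11, 12, 16, 15, 14, 17, 13, 16, 17, 18],
}

# Pool every observation to obtain n_T and the grand mean.
all_obs = [x for xs in data.values() for x in xs]
n_total = len(all_obs)
grand_mean = sum(all_obs) / n_total

# SST: squared deviation of each observation from the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_obs)

print(f"n_T = {n_total}, grand mean = {grand_mean:.3f}, SST = {sst:.3f}")
```

With these 40 observations the grand mean comes out to 16.075 and SST to 336.775, which agrees with the total SSTr + SSE = 123.275 + 213.5 given later in the lecture.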

Partitioning the Total Sum of Squares V (11.1.2)
[Figure: Partition of total sum of squares for completely randomized one factor layout]

Partitioning the Total Sum of Squares VI (11.1.2): SSTr
• Treatment Sum of Squares (SSTr) – a measure of the variability between the factor levels.

$$SSTr = \sum_{i=1}^{k} n_i \left( \bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot} \right)^2$$

where
  $\bar{x}_{\cdot\cdot} = \frac{1}{n_T} \sum_{i=1}^{k} \sum_{j=1}^{n_i} x_{ij}$: overall or grand mean
  $\bar{x}_{i\cdot}$: sample mean corresponding to group or level i
  $n_i$: number of observations in group or level i

Partitioning the Total Sum of Squares VII (11.1.2): SSTr Example
Number of Processors     A      B      C      D
Mean                  13.9   17.2   18.3   14.9
Grand Mean = 16.075

$$SSTr = 10(13.9 - 16.075)^2 + 10(17.2 - 16.075)^2 + 10(18.3 - 16.075)^2 + 10(14.9 - 16.075)^2 = 123.275$$

Partitioning the Total Sum of Squares VIII (11.1.2)
[Figure: Partition of total sum of squares for completely randomized one factor layout]
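The SSTr example can be reproduced the same way. The sketch below assumes the example data is held in a dictionary named `data` (an illustrative name, not from the lecture) and keeps the per-level contributions so they can be compared with the individual terms of the sum.

```python
# Running times (seconds) from the example table, one list per factor level.
data = {
    "A": [11, 13, 17, 17, 15, 16, 14, 10, 12, 14],
    "B": [12, 14, 17, 19, 21, 18, 19, 18, 16, 18],
    "C": [18, 16, 18, 20, 22, 15, 17, 21, 16, 20],
    "D": [11, 12, 16, 15, 14, 17, 13, 16, 17, 18],
}

# Sample mean of each level and the grand mean over all observations.
group_means = {g: sum(xs) / len(xs) for g, xs in data.items()}
n_total = sum(len(xs) for xs in data.values())
grand_mean = sum(sum(xs) for xs in data.values()) / n_total

# Each level contributes n_i * (level mean - grand mean)^2 to SSTr.
contributions = {
    g: len(xs) * (group_means[g] - grand_mean) ** 2 for g, xs in data.items()
}
sstr = sum(contributions.values())

for g in data:
    print(f"{g}: mean = {group_means[g]:.1f}, contribution = {contributions[g]:.5f}")
print(f"SSTr = {sstr:.3f}")
```

Writing the sum with a per-level `n_i` rather than a fixed factor of 10 means the same code also covers unbalanced data sets.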

Partitioning the Total Sum of Squares IX (11.1.2): SSE
• Error Sum of Squares (SSE) – a measure of the variability within the factor levels.

$$SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( x_{ij} - \bar{x}_{i\cdot} \right)^2$$

where
  $x_{ij}$: j-th observation in group or level i
  $\bar{x}_{i\cdot}$: sample mean corresponding to group or level i

Partitioning the Total Sum of Squares X (11.1.2): SSE Example
Number of Processors     A      B      C      D
Mean                  13.9   17.2   18.3   14.9

$$SSE = (11 - 13.9)^2 + \cdots + (14 - 13.9)^2 + (12 - 17.2)^2 + \cdots + (18 - 17.2)^2 + (18 - 18.3)^2 + \cdots + (20 - 18.3)^2 + (11 - 14.9)^2 + \cdots + (18 - 14.9)^2 = 213.5$$

Partitioning the Total Sum of Squares XI (11.1.2)
SST = SSTr + SSE = 123.275 + 213.5 = 336.775

Partitioning the Total Sum of Squares XII (11.1.2): Conclusion
[Figure: Interpretation of the sum of squares for treatments and the sum of squares for error]
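The identity SST = SSTr + SSE can be verified numerically on the example data. The following is a minimal sketch; `data` and the other names are illustrative, and the three sums are computed independently from their definitions rather than by subtraction.

```python
# Running times (seconds) from the example table, one list per factor level.
data = {
    "A": [11, 13, 17, 17, 15, 16, 14, 10, 12, 14],
    "B": [12, 14, 17, 19, 21, 18, 19, 18, 16, 18],
    "C": [18, 16, 18, 20, 22, 15, 17, 21, 16, 20],
    "D": [11, 12, 16, 15, 14, 17, 13, 16, 17, 18],
}

all_obs = [x for xs in data.values() for x in xs]
grand_mean = sum(all_obs) / len(all_obs)
group_means = {g: sum(xs) / len(xs) for g, xs in data.items()}

# Total variability: every observation against the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_obs)

# Between-level variability: level means against the grand mean.
sstr = sum(len(xs) * (group_means[g] - grand_mean) ** 2 for g, xs in data.items())

# Within-level variability: every observation against its own level mean.
sse = sum((x - group_means[g]) ** 2 for g, xs in data.items() for x in xs)

print(f"SSTr = {sstr:.3f}, SSE = {sse:.3f}, SSTr + SSE = {sstr + sse:.3f}, SST = {sst:.3f}")
```

The comparison should be made with a small tolerance, since the three sums are accumulated in floating point.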

Partitioning the Total Sum of Squares XIII (11.1.2): Conclusion
[Figure: Dependence of p-value on the sum of squares for treatments and the sum of squares for error]

The Analysis of Variance Table I (11.1.3)
• Mean Squares for Treatments (MSTr)

$$MSTr = \frac{SSTr}{\text{degrees of freedom}} = \frac{SSTr}{k - 1}$$

• Mean Square Error (MSE)

$$MSE = \frac{SSE}{\text{degrees of freedom}} = \frac{SSE}{n_T - k}$$

• A p-value for the null hypothesis that the factor level means $\mu_i$ are all equal is

$$p\text{-value} = P(X \ge F)$$

where the F-statistic is $F = \frac{MSTr}{MSE}$ and the random variable X has an $F_{k-1,\, n_T-k}$ distribution.

The Analysis of Variance Table II (11.1.3)
[Figure: P-value calculation for one factor analysis of variance table]

The Analysis of Variance Table III (11.1.3): ANOVA Example
Number of Processors          A          B          C          D
Mean                       13.9       17.2       18.3       14.9
Contribution to SSTr   47.30625   12.65625   49.50625   13.80625
Grand Mean = 16.075

Source                   Sum of squares   df               Mean square         F
Treatments (SSA, SSTr)   123.275          k - 1 = 3        MSTr = 41.091667    F = MSTr/MSE = 6.93
Error (SSW, SSE)         213.5            n_T - k = 36     MSE = 5.9305556
Total (SST)              336.775

F-crit (F_U, from table) = 2.87 at α = 0.05, with 3 numerator and 36 denominator degrees of freedom.

Since F > F-crit, reject H0: the numbers of processors A, B, C, and D have a significant difference at the 0.05 level of significance.
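The ANOVA example can be retraced in Python using only the standard library. This is a sketch under stated assumptions: the helper `f_upper_tail` is hypothetical (not from the lecture) and approximates the p-value by applying Simpson's rule to the beta density, using the standard identity P(X ≥ f) = I_z(d2/2, d1/2) with z = d2/(d2 + d1·f) for X ~ F(d1, d2). It assumes d2 > 2 and f > 0. The critical value 2.87 is the table value quoted in the example.

```python
import math

# Sums of squares and sample sizes from the worked example.
sstr, sse = 123.275, 213.5
k, n_total = 4, 40

df_treatment = k - 1          # numerator degrees of freedom
df_error = n_total - k        # denominator degrees of freedom
mstr = sstr / df_treatment
mse = sse / df_error
f_stat = mstr / mse


def f_upper_tail(f_value, d1, d2, steps=2000):
    """Approximate P(X >= f_value) for X ~ F(d1, d2).

    Hypothetical helper: integrates the Beta(d2/2, d1/2) density from 0 to
    d2 / (d2 + d1 * f_value) with Simpson's rule. Assumes d2 > 2, f_value > 0,
    and an even number of steps.
    """
    a, b = d2 / 2, d1 / 2
    upper = d2 / (d2 + d1 * f_value)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(t):
        if t <= 0.0 or t >= 1.0:
            return 0.0  # density vanishes at 0 because a > 1
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log(1 - t) - log_beta)

    h = upper / steps
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3


p_value = f_upper_tail(f_stat, df_treatment, df_error)
f_crit = 2.87   # table value quoted in the example (alpha = 0.05, df = 3 and 36)
alpha = 0.05

print(f"MSTr = {mstr:.6f}, MSE = {mse:.7f}, F = {f_stat:.2f}")
print(f"reject H0 by F-crit: {f_stat > f_crit}, by p-value: {p_value < alpha}")
```

Both decision rules should lead to the same conclusion here. Where a statistics library is available, its F-distribution survival function would be a more robust way to obtain the p-value than this hand-rolled integration.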

The Analysis of Variance Table IV (11.1.3): ANOVA With Excel
Since the p-value is less than α = 0.05, reject H0.

The Analysis of Variance Table V (11.1.3)
[Figure: Analysis of variance table for one factor layout]
• Reject H0 if F-statistic > F-critical (from Table IV or Excel's FINV function)
• or reject H0 if p-value < α (as in previous chapter)
Excel sheet

Pairwise Comparisons of the Factor Level Means I (11.1.4)
• If H0 is rejected (not all population means are equal), we'd like to be able to tell which samples are different and by how much.
• Need to do Tukey multiple comparisons to compare all groups simultaneously
  – By computing the differences $\mu_i - \mu_j$ for $1 \le i < j \le k$ among all $k(k-1)/2$ pairs of factor level means
• Compute confidence intervals

$$\mu_i - \mu_j \in \left( \bar{x}_{i\cdot} - \bar{x}_{j\cdot} \pm q_{\alpha,\,k,\,n_T-k} \sqrt{\frac{MSE}{2} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)} \right)$$

Pairwise Comparisons of the Factor Level Means II (11.1.4)
• If the CI for the difference $\mu_i - \mu_j$ contains 0, then factor levels i and j are not significantly different.
• If the CI for the difference $\mu_i - \mu_j$ does not contain 0, then factor levels i and j are significantly different.
• The CI indicates by how much the factor level means are shown to be different.
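The confidence interval formula can be applied to the processor example as a sketch. The lecture does not give the critical point $q_{\alpha,k,n_T-k}$ for this data, so the value `q_crit = 3.81` below is an assumption: an approximate figure for α = 0.05, k = 4, and 36 degrees of freedom, interpolated from a studentized range table. The names `tukey_interval`, `intervals`, and `significant` are illustrative.

```python
import math
from itertools import combinations

# Level means, sample sizes, and MSE from the worked example.
group_means = {"A": 13.9, "B": 17.2, "C": 18.3, "D": 14.9}
group_sizes = {"A": 10, "B": 10, "C": 10, "D": 10}
mse = 213.5 / 36

# ASSUMPTION: approximate q_{0.05, 4, 36}, interpolated from a studentized
# range table; this value is not stated in the lecture.
q_crit = 3.81


def tukey_interval(i, j):
    """Confidence interval for mu_i - mu_j following the slide's formula."""
    diff = group_means[i] - group_means[j]
    half_width = q_crit * math.sqrt(
        mse / 2 * (1 / group_sizes[i] + 1 / group_sizes[j])
    )
    return diff - half_width, diff + half_width


# All k(k-1)/2 pairs of factor levels.
intervals = {(i, j): tukey_interval(i, j) for i, j in combinations(group_means, 2)}

# A pair is significantly different when its interval excludes 0.
significant = {pair for pair, (lo, hi) in intervals.items() if lo > 0 or hi < 0}

for (i, j), (lo, hi) in intervals.items():
    verdict = "different" if (i, j) in significant else "not different"
    print(f"mu_{i} - mu_{j}: ({lo:.2f}, {hi:.2f}) -> {verdict}")
```

Under the assumed critical point, the intervals for A–B, A–C, and C–D exclude 0 while the other three pairs contain it. The conclusion depends on `q_crit`, so it should be replaced with the exact table or software value before drawing conclusions.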