Multivariate Analysis of Variance


  1. Multivariate Analysis of Variance
     Max Turgeon, STAT 7200 (Multivariate Statistics)

  2. Objectives
     • Present the four classical test statistics
     • Discuss approximations for their null distribution
     • Introduce MANOVA as a generalization of Hotelling's $T^2$

  3. Quick Overview
     What do we mean by Analysis of Variance?
     • ANOVA is a collection of statistical models that aim to analyze and understand the differences in means between different subgroups of the data.
     • Note that there can be multiple, overlapping ways of defining the subgroups (e.g. multiway ANOVA).
     • It also provides a framework for hypothesis testing, which can be recovered from a suitable regression model (see the sketch after this slide).
     • Most importantly, ANOVA provides a framework for understanding and comparing the various sources of variation in the data.
     • As such, it can be seen as a generalisation of the t-test (or of Hotelling's $T^2$).
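     As a minimal illustration of that regression connection (a sketch with made-up data, not from the slides): the same one-way test can be run directly with aov() or recovered from the ANOVA table of the corresponding linear model.

     # Hypothetical data: 3 groups of 10 observations each
     set.seed(1)
     group <- gl(3, 10)
     y <- rnorm(30, mean = rep(c(0, 0.5, 1), each = 10))

     summary(aov(y ~ group))   # classical one-way ANOVA
     anova(lm(y ~ group))      # same test, recovered from a regression model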

  4. Review of univariate ANOVA i
     • Assume the data come from $g$ populations:
       $$X_{11}, \ldots, X_{1 n_1}$$
       $$\vdots$$
       $$X_{g1}, \ldots, X_{g n_g}$$
     • Assume that $X_{\ell 1}, \ldots, X_{\ell n_\ell}$ is a random sample from $N(\mu_\ell, \sigma^2)$, for $\ell = 1, \ldots, g$.
     • Homoscedasticity: the variance $\sigma^2$ is the same for all populations.
     • We are interested in testing the hypothesis that $\mu_1 = \cdots = \mu_g$.

  5. Review of univariate ANOVA ii
     • We can write our observations as $X_{\ell i} = \mu + \tau_\ell + \varepsilon_{\ell i}$, where $\varepsilon_{\ell i} \sim N(0, \sigma^2)$.
     • Reparametrisation: we write the mean $\mu_\ell = \mu + \tau_\ell$ as a sum of an overall component $\mu$ (i.e. shared by all populations) and a population-specific component $\tau_\ell$.
     • Our hypothesis can now be rewritten as $\tau_\ell = 0$ for all $\ell$.
     • Identifiability: we need to assume $\sum_{\ell=1}^g \tau_\ell = 0$; otherwise there are infinitely many models that lead to the same data-generating mechanism.
     • Sample statistics: set $n = \sum_{\ell=1}^g n_\ell$.

  6. Review of univariate ANOVA iii
     • Overall sample mean: $\bar{X} = \frac{1}{n} \sum_{\ell=1}^g \sum_{i=1}^{n_\ell} X_{\ell i}$.
     • Population-specific sample mean: $\bar{X}_\ell = \frac{1}{n_\ell} \sum_{i=1}^{n_\ell} X_{\ell i}$.
     • We get the following decomposition:
       $$(X_{\ell i} - \bar{X}) = (\bar{X}_\ell - \bar{X}) + (X_{\ell i} - \bar{X}_\ell).$$
     • Squaring the left-hand side and summing over both $\ell$ and $i$, we get
       $$\sum_{\ell=1}^g \sum_{i=1}^{n_\ell} (X_{\ell i} - \bar{X})^2 = \sum_{\ell=1}^g n_\ell (\bar{X}_\ell - \bar{X})^2 + \sum_{\ell=1}^g \sum_{i=1}^{n_\ell} (X_{\ell i} - \bar{X}_\ell)^2.$$
     • This is typically summarised as $SS_T = SS_M + SS_R$, where the total sum of squares is $SS_T = \sum_{\ell=1}^g \sum_{i=1}^{n_\ell} (X_{\ell i} - \bar{X})^2$ (see the numerical check after this slide).
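     A quick numerical check of the decomposition, reusing the hypothetical y and group from the sketch above:

     ybar <- mean(y)                      # overall sample mean
     ybar_l <- tapply(y, group, mean)     # population-specific sample means
     n_l <- as.vector(table(group))

     SST <- sum((y - ybar)^2)
     SSM <- sum(n_l * (ybar_l - ybar)^2)
     SSR <- sum((y - ybar_l[group])^2)
     all.equal(SST, SSM + SSR)            # TRUE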

  7. Review of univariate ANOVA iv
     • The model (or treatment) sum of squares: $SS_M = \sum_{\ell=1}^g n_\ell (\bar{X}_\ell - \bar{X})^2$.
     • The residual sum of squares: $SS_R = \sum_{\ell=1}^g \sum_{i=1}^{n_\ell} (X_{\ell i} - \bar{X}_\ell)^2$.
     • Yet another representation is the ANOVA table:

     Source of Variation | Sum of Squares | Degrees of freedom
     Model               | $SS_M$         | $g - 1$
     Residual            | $SS_R$         | $n - g$
     Total               | $SS_T$         | $n - 1$

  8. Review of univariate ANOVA v
     • The usual test statistic used for testing $\tau_\ell = 0$ for all $\ell$ is
       $$F = \frac{SS_M / (g - 1)}{SS_R / (n - g)} \sim F(g - 1, n - g).$$
     • We could also instead reject the null hypothesis for small values of
       $$\frac{SS_R}{SS_R + SS_M} = \frac{SS_R}{SS_T}.$$
     • This is the test statistic that we will generalize to the multivariate setting.
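     Continuing the sketch above, the F statistic can be computed by hand and matches the one reported by summary(aov(y ~ group)):

     g <- 3; n <- length(y)
     Fstat <- (SSM / (g - 1)) / (SSR / (n - g))
     pf(Fstat, g - 1, n - g, lower.tail = FALSE)   # p-value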

  9. Multivariate ANOVA i
     • The setting is similar: assume the data come from $g$ populations:
       $$Y_{11}, \ldots, Y_{1 n_1}$$
       $$\vdots$$
       $$Y_{g1}, \ldots, Y_{g n_g}$$
     • Assume that $Y_{\ell 1}, \ldots, Y_{\ell n_\ell}$ is a random sample from $N_p(\mu_\ell, \Sigma)$, for $\ell = 1, \ldots, g$.
     • Homoscedasticity is key here again: the covariance matrix $\Sigma$ is the same for all populations.
     • We are again interested in testing the hypothesis that $\mu_1 = \cdots = \mu_g$.
     • Reparametrisation: we will write the mean as $\mu_\ell = \mu + \tau_\ell$.

  10. Multivariate ANOVA ii
     • $Y_{\ell i} = \mu + \tau_\ell + E_{\ell i}$, where $E_{\ell i} \sim N_p(0, \Sigma)$.
     • Identifiability: we need to assume $\sum_{\ell=1}^g \tau_\ell = 0$.
     • Instead of a decomposition of the sum of squares, we get a decomposition of the outer products $(Y_{\ell i} - \bar{Y})(Y_{\ell i} - \bar{Y})^T$.
     • The decomposition is given as
       $$\sum_{\ell=1}^g \sum_{i=1}^{n_\ell} (Y_{\ell i} - \bar{Y})(Y_{\ell i} - \bar{Y})^T = \sum_{\ell=1}^g n_\ell (\bar{Y}_\ell - \bar{Y})(\bar{Y}_\ell - \bar{Y})^T + \sum_{\ell=1}^g \sum_{i=1}^{n_\ell} (Y_{\ell i} - \bar{Y}_\ell)(Y_{\ell i} - \bar{Y}_\ell)^T.$$
     (A numerical check follows below.)
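     The multivariate decomposition can be checked the same way (a sketch; Y_sim and grp are hypothetical data made up for illustration):

     set.seed(1)
     Y_sim <- matrix(rnorm(60), ncol = 2)   # 30 observations, p = 2
     grp <- gl(3, 10)
     n_l <- tabulate(grp)

     Ybar <- colMeans(Y_sim)
     Ybar_l <- apply(Y_sim, 2, function(col) tapply(col, grp, mean))

     Tmat <- crossprod(sweep(Y_sim, 2, Ybar))               # total SSCP
     Bmat <- crossprod(sweep(Ybar_l, 2, Ybar) * sqrt(n_l))  # between SSCP
     Wmat <- crossprod(Y_sim - Ybar_l[grp, ])               # within SSCP
     all.equal(Tmat, Bmat + Wmat)                           # TRUE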

  11. Multivariate ANOVA iii
     • Between sum of squares and cross products matrix: $B = \sum_{\ell=1}^g n_\ell (\bar{Y}_\ell - \bar{Y})(\bar{Y}_\ell - \bar{Y})^T$.
     • Within sum of squares and cross products matrix: $W = \sum_{\ell=1}^g \sum_{i=1}^{n_\ell} (Y_{\ell i} - \bar{Y}_\ell)(Y_{\ell i} - \bar{Y}_\ell)^T$.
     • Note that $W = \sum_{\ell=1}^g (n_\ell - 1) S_\ell$, and therefore $W \sim W_p(n - g, \Sigma)$.
     • Moreover, using Cochran's theorem, we can show that $W$ and $B$ are independent, and that under the null hypothesis that $\tau_\ell = 0$ for all $\ell = 1, \ldots, g$, we also have $B \sim W_p(g - 1, \Sigma)$.

  12. Multivariate ANOVA iv
     • Similarly as above, we have a MANOVA table:

     Source of Variation | Sum of Squares | Degrees of freedom
     Model               | $B$            | $g - 1$
     Residual            | $W$            | $n - g$
     Total               | $B + W$        | $n - 1$

  13. Likelihood Ratio Test i
     • To test the null hypothesis $H_0: \tau_\ell = 0$ for all $\ell = 1, \ldots, g$, we will use Wilks' lambda as our test statistic:
       $$\Lambda^{2/n} = \frac{|W|}{|B + W|}.$$
     • As the notation suggests, this is the likelihood ratio test statistic.
     • Under the unrestricted model (i.e. no constraint on the means), each mean parameter is maximised independently, and the maximum likelihood estimator for the covariance matrix is the pooled covariance:
       $$\hat{\mu}_\ell = \bar{Y}_\ell, \qquad \hat{\Sigma} = \frac{1}{n} W.$$

  14. Likelihood Ratio Test ii
     • Under the null model (i.e. all means are equal), all observations $Y_{\ell i}$ come from a single distribution $N_p(\mu, \Sigma)$, and so the maximum likelihood estimators are
       $$\hat{\mu} = \bar{Y}, \qquad \hat{\Sigma} = \frac{1}{n}(B + W).$$
     • Putting this together, we get
       $$\Lambda = \frac{(2\pi)^{-np/2} \exp(-np/2) \left|\tfrac{1}{n}(B + W)\right|^{-n/2}}{(2\pi)^{-np/2} \exp(-np/2) \left|\tfrac{1}{n} W\right|^{-n/2}} = \frac{\left|\tfrac{1}{n}(B + W)\right|^{-n/2}}{\left|\tfrac{1}{n} W\right|^{-n/2}} = \left(\frac{|W|}{|B + W|}\right)^{n/2}.$$

  15. Likelihood Ratio Test iii
     • From the general asymptotic theory, we know that
       $$-2 \log \Lambda \approx \chi^2((g - 1)p).$$
     • Using Bartlett's approximation, we can get an unbiased test:
       $$-\left(n - 1 - \frac{p + g}{2}\right) \log \Lambda \approx \chi^2((g - 1)p).$$
     • In particular, if we let $c = \chi^2_\alpha((g - 1)p)$ be the critical value, we reject the null hypothesis if
       $$\Lambda \leq \exp\left(\frac{-c}{n - 1 - 0.5(p + g)}\right).$$
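     To see how well Bartlett's approximation holds, here is a small simulation under the null (a sketch; the dimensions and sample sizes are arbitrary):

     set.seed(7200)
     p <- 3; g <- 4; n <- 100
     grp <- gl(g, n / g)

     lambda <- replicate(2000, {
       Y0 <- matrix(rnorm(n * p), ncol = p)       # all group means equal
       W0 <- crossprod(residuals(manova(Y0 ~ grp)))
       T0 <- crossprod(scale(Y0, scale = FALSE))  # B + W
       det(W0) / det(T0)
     })
     stat <- -(n - 1 - 0.5 * (p + g)) * log(lambda)
     mean(stat > qchisq(0.95, p * (g - 1)))   # empirical size, close to 0.05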

  16. Example i

     ## Example on producing plastic film
     ## from Krzanowski (1998, p. 381)
     tear <- c(6.5, 6.2, 5.8, 6.5, 6.5, 6.9, 7.2, 6.9, 6.1, 6.3,
               6.7, 6.6, 7.2, 7.1, 6.8, 7.1, 7.0, 7.2, 7.5, 7.6)
     gloss <- c(9.5, 9.9, 9.6, 9.6, 9.2, 9.1, 10.0, 9.9, 9.5, 9.4,
                9.1, 9.3, 8.3, 8.4, 8.5, 9.2, 8.8, 9.7, 10.1, 9.2)
     opacity <- c(4.4, 6.4, 3.0, 4.1, 0.8, 5.7, 2.0, 3.9, 1.9, 5.7,
                  2.8, 4.1, 3.8, 1.6, 3.4, 8.4, 5.2, 6.9, 2.7, 1.9)

  17. Example ii

     Y <- cbind(tear, gloss, opacity)
     n <- nrow(Y); p <- ncol(Y); g <- 2

     Y_low <- Y[1:10, ]
     Y_high <- Y[11:20, ]

     W <- (nrow(Y_low) - 1) * cov(Y_low) +
       (nrow(Y_high) - 1) * cov(Y_high)
     B <- (n - 1) * cov(Y) - W

     (Lambda <- det(W) / det(W + B))
     ## [1] 0.4136192

  18. Example iii

     transf_lambda <- -(n - 1 - 0.5 * (p + g)) * log(Lambda)
     transf_lambda > qchisq(0.95, p * (g - 1))
     ## [1] TRUE

     # Or if you want a p-value
     pchisq(transf_lambda, p * (g - 1), lower.tail = FALSE)
     ## [1] 0.002227356

  19. Example iv

     # R has a function for MANOVA
     # But first, create a factor variable
     rate <- gl(g, 10, labels = c("Low", "High"))
     fit <- manova(Y ~ rate)
     summary_tbl <- broom::tidy(fit, test = "Wilks")
     # Or you can use the summary function (see below)
     knitr::kable(summary_tbl, digits = 3)
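     As the comment above suggests, base R produces the same table without broom, via the test argument of summary.manova:

     summary(fit, test = "Wilks")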

  20. Example v

     term      | df | wilks | statistic | num.df | den.df | p.value
     rate      |  1 | 0.414 |     7.561 |      3 |     16 |   0.002
     Residuals | 18 |   -   |     -     |    -   |    -   |    -

  21. Example vi

     # Check residuals for evidence of normality
     library(tidyverse)

     resids <- residuals(fit)
     data_plot <- gather(as.data.frame(resids),
                         variable, residual)

     ggplot(data_plot, aes(sample = residual)) +
       stat_qq() + stat_qq_line() +
       facet_grid(. ~ variable) +
       theme_minimal()

  22. Example vii

     [Figure: normal QQ plots of the residuals, one panel per variable (gloss, opacity, tear), with theoretical quantiles on the x-axis.]

  23. Example viii

     # Next: Chi-squared plot
     Sn <- cov(resids)
     dists <- mahalanobis(resids, colMeans(resids), Sn)
     df <- mean(dists)  # moment-matching choice of degrees of freedom

     qqplot(qchisq(ppoints(dists), df = df),
            dists, xlab = "Theoretical Quantiles")
     qqline(dists, distribution = function(p) {
       qchisq(p, df = df)
     })

  24. Example ix

     [Figure: chi-squared QQ plot of the squared Mahalanobis distances (dists) against theoretical quantiles.]

  25. Comments i
     • The output from R shows a different approximation to the distribution of Wilks' lambda, due to Rao.
     • There are actually 4 tests available in R (see the snippet after this slide):
       • Wilks' lambda;
       • Pillai-Bartlett;
       • Hotelling-Lawley;
       • Roy's largest root.
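     All four statistics can be requested from the same fit through the test argument of summary.manova:

     for (test in c("Pillai", "Wilks", "Hotelling-Lawley", "Roy")) {
       print(summary(fit, test = test))
     }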

  26. Comments ii
     • Since we only had two groups in the above example, we were only comparing two means.
     • Wilks' lambda was therefore equivalent to Hotelling's $T^2$ (a numerical check follows below).
     • But of course MANOVA is much more general.
     • We can assess the normality assumption by looking at the residuals $\hat{E}_{\ell i} = Y_{\ell i} - \bar{Y}_\ell$.
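     The two-group equivalence can be verified numerically: for $g = 2$, $\Lambda = 1 / (1 + T^2 / (n - 2))$. A sketch reusing the objects from the example (Y_low, Y_high, W, n, Lambda):

     d <- colMeans(Y_high) - colMeans(Y_low)
     S_pooled <- W / (n - 2)                    # pooled covariance estimate
     T2 <- (10 * 10 / n) * drop(t(d) %*% solve(S_pooled) %*% d)
     all.equal(Lambda, 1 / (1 + T2 / (n - 2)))  # TRUE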
