  1. Tests for Multivariate Means. Max Turgeon, STAT 7200 Multivariate Statistics

  2. Objectives
• Construct tests for a single multivariate mean
• Discuss and compare confidence regions and confidence intervals
• Describe the connection with the Likelihood Ratio Test
• Construct tests for two multivariate means
• Present robust alternatives to these tests

  3. Test for a multivariate mean: Σ known
• Let $Y_1, \ldots, Y_n \sim N_p(\mu, \Sigma)$ be independent. We saw in a previous lecture that $\bar{Y} \sim N_p\left(\mu, \frac{1}{n}\Sigma\right)$.
• This means that $n(\bar{Y} - \mu)^T \Sigma^{-1} (\bar{Y} - \mu) \sim \chi^2(p)$.
• In particular, if we want to test $H_0: \mu = \mu_0$ at level $\alpha$, then we reject the null hypothesis if $n(\bar{Y} - \mu_0)^T \Sigma^{-1} (\bar{Y} - \mu_0) > \chi^2_\alpha(p)$.

  4. Example i
library(dslabs)
library(tidyverse)
dataset <- filter(gapminder, year == 2012, !is.na(infant_mortality))
dataset <- dataset[, c("infant_mortality", "life_expectancy", "fertility")]
dataset <- as.matrix(dataset)

  5. Example ii
dim(dataset)
## [1] 178   3
# Assume we know Sigma
Sigma <- matrix(c(555, -170, 30,
                  -170, 65, -10,
                  30, -10, 2), ncol = 3)
mu_hat <- colMeans(dataset)
mu_hat

  6. Example iii
## infant_mortality  life_expectancy        fertility
##        25.824157        71.308427         2.868933
# Test mu = mu_0
mu_0 <- c(25, 50, 3)
test_statistic <- nrow(dataset) * t(mu_hat - mu_0) %*%
  solve(Sigma) %*% (mu_hat - mu_0)
c(drop(test_statistic), qchisq(0.95, df = 3))
## [1] 7153.275387    7.814728

  7. Example iv
drop(test_statistic) > qchisq(0.95, df = 3)
## [1] TRUE

  8. Test for a multivariate mean: Σ unknown i
• Of course, we rarely (if ever) know Σ, and so we use its MLE $\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^n (Y_i - \bar{Y})(Y_i - \bar{Y})^T$ or the sample covariance $S_n$.
• Therefore, to test $H_0: \mu = \mu_0$ at level $\alpha$, we reject the null hypothesis if $T^2 = n(\bar{Y} - \mu_0)^T S_n^{-1} (\bar{Y} - \mu_0) > c$, for a suitably chosen constant $c$ that depends on $\alpha$.
• Note: The test statistic $T^2$ is known as Hotelling's $T^2$.

  9. Test for a multivariate mean: Σ unknown ii
• We will show that (under $H_0$) $T^2$ has a simple distribution: $T^2 \sim \frac{(n-1)p}{n-p} F(p, n-p)$.
• In other words, we reject the null hypothesis at level $\alpha$ if $T^2 > \frac{(n-1)p}{n-p} F_\alpha(p, n-p)$.

  10. Example (revisited) i
n <- nrow(dataset); p <- ncol(dataset)
# Test mu = mu_0
mu_0 <- c(25, 50, 3)
test_statistic <- n * t(mu_hat - mu_0) %*%
  solve(cov(dataset)) %*% (mu_hat - mu_0)
critical_val <- (n - 1) * p * qf(0.95, df1 = p,
                                 df2 = n - p) / (n - p)

  11. Example (revisited) ii
c(drop(test_statistic), critical_val)
## [1] 5121.461370    8.059773
drop(test_statistic) > critical_val
## [1] TRUE

  12. Distribution of T²
We will prove a more general result that will also be useful for more than one multivariate mean.
Theorem: Let $Y \sim N_p(0, \Sigma)$, let $W \sim W_p(m, \Sigma)$, and assume $Y$, $W$ are independent. Define
$T^2 = m Y^T W^{-1} Y$.
Then
$\frac{m - p + 1}{mp}\, T^2 \sim F(p, m - p + 1)$,
where $F(\alpha, \beta)$ denotes the F-distribution with $\alpha$ and $\beta$ degrees of freedom.
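Before the proof, the theorem can be checked by simulation. Below is a minimal sketch (mine, not from the slides), reusing the Sigma matrix from the example above; it draws Y and W as in the theorem and compares the empirical 95% quantile of the rescaled statistic with the F quantile:
set.seed(7200)
p <- 3; m <- 20
T2 <- replicate(1e4, {
  Y <- drop(t(chol(Sigma)) %*% rnorm(p))          # Y ~ N_p(0, Sigma)
  W <- rWishart(1, df = m, Sigma = Sigma)[, , 1]  # W ~ W_p(m, Sigma)
  m * drop(t(Y) %*% solve(W) %*% Y)
})
stat <- (m - p + 1) / (m * p) * T2
c(quantile(stat, 0.95), qf(0.95, p, m - p + 1))  # should be close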

  13. Proof i
• First, if we write $\Sigma = LL^T$, we can replace $Y$ by $L^{-1}Y$ and $W$ by $L^{-1}W(L^{-1})^T$ without changing $T^2$. In other words, without loss of generality, we can assume $\Sigma = I_p$.
• Now, note that since $Y$ and $W$ are independent, the conditional distribution of $W$ given $Y$ is also $W_p(m, I_p)$.
• Consider $Y$ a fixed quantity, and let $H$ be an orthogonal matrix whose first column is $Y(Y^TY)^{-1/2}$. The other columns can be chosen by finding a basis for the orthogonal complement of $Y$ and applying Gram-Schmidt to obtain an orthonormal basis (see the sketch below).
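In practice this construction is exactly what a complete QR decomposition does. A small standalone illustration (my own sketch, not part of the slides):
y <- rnorm(3)
# The complete Q factor of y (as a 3 x 1 matrix) is an orthogonal
# matrix whose first column is y / sqrt(sum(y^2)), up to a sign flip
H <- qr.Q(qr(matrix(y, ncol = 1)), complete = TRUE)
round(crossprod(H), 10)            # identity: H is orthogonal
cbind(H[, 1], y / sqrt(sum(y^2)))  # first column vs normalized y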

  14. Proof ii
• Define $V = H^T W H$. Conditional on $Y$, this is still distributed as $W_p(m, I_p)$.
• This distribution does not depend on $Y$, and therefore $V$ and $Y$ are independent.
• Decompose $V$ as
$V = \begin{pmatrix} v_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix}$,
where $v_{11}$ is a (random) scalar.

  15. Proof iii
• By result A.2.4g of MKB (see supplementary materials), the $(1,1)$ element of $V^{-1}$ is given by $v_{11|2}^{-1} = (v_{11} - V_{12}V_{22}^{-1}V_{21})^{-1}$.
• Moreover, note that $v_{11|2} \sim \chi^2(m - p + 1)$.
• We now have
$\frac{1}{m}T^2 = Y^T W^{-1} Y = (H^TY)^T (H^TWH)^{-1} (H^TY) = (H^TY)^T V^{-1} (H^TY) = (Y^TY)^{1/2}\, v_{11|2}^{-1}\, (Y^TY)^{1/2} = (Y^TY)/v_{11|2}$.
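The MKB identity is easy to verify numerically; a quick sketch (mine, not from the slides) comparing the (1,1) element of the inverse with the inverse Schur complement:
V <- rWishart(1, df = 20, Sigma = diag(3))[, , 1]
v11_2 <- drop(V[1, 1] - V[1, -1] %*% solve(V[-1, -1]) %*% V[-1, 1])
c(solve(V)[1, 1], 1 / v11_2)  # identical up to rounding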

  16. Proof iv
• Therefore, we have
$\frac{m-p+1}{mp}\, T^2 = \frac{(Y^TY)/p}{v_{11|2}/(m-p+1)} \sim F(p, m-p+1)$.
• In other words, we have expressed $\frac{m-p+1}{mp}T^2$ as a ratio of independent chi-squares, each divided by its degrees of freedom.

  17. Confidence region for µ i
• Analogously to the univariate setting, it may be more informative to look at a confidence region: the set of values $\mu_0 \in \mathbb{R}^p$ that are supported by the data, i.e. whose corresponding null hypothesis $H_0: \mu = \mu_0$ would not be rejected at level $\alpha$.
• Let $c^2 = \frac{(n-1)p}{n-p}F_\alpha(p, n-p)$. A $100(1-\alpha)$% confidence region for $\mu$ is given by the ellipsoid around $\bar{Y}$ consisting of all $\mu \in \mathbb{R}^p$ such that
$n(\bar{Y} - \mu)^T S_n^{-1} (\bar{Y} - \mu) < c^2$.
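For instance, a short helper (my naming, not from the slides) checking whether a candidate mean lies inside this region for the gapminder data:
in_region <- function(mu0, data, alpha = 0.05) {
  n <- nrow(data); p <- ncol(data)
  c2 <- (n - 1) * p * qf(1 - alpha, p, n - p) / (n - p)
  stat <- n * t(colMeans(data) - mu0) %*%
    solve(cov(data)) %*% (colMeans(data) - mu0)
  drop(stat) < c2
}
in_region(c(25, 50, 3), dataset)  # FALSE: this mu_0 was rejected above
in_region(mu_hat, dataset)        # TRUE: the centre is always inside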

  18. Confidence region for µ ii
• We can describe the confidence region in terms of the eigendecomposition of $S_n$: let $\lambda_1 \geq \cdots \geq \lambda_p$ be its eigenvalues, and let $v_1, \ldots, v_p$ be corresponding eigenvectors of unit length.
• The confidence region is the ellipsoid centered around $\bar{Y}$ with axes $\pm c\sqrt{\lambda_i/n}\; v_i$ (computed in the sketch below).
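A minimal sketch (mine) of these axes for the gapminder example, reusing critical_val, which equals $c^2$:
eig_S <- eigen(cov(dataset))
# Half-lengths of the p principal axes: c * sqrt(lambda_i / n)
sqrt(critical_val) * sqrt(eig_S$values / n)
# Directions: the corresponding unit eigenvectors v_i
eig_S$vectors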

  19. Visualizing confidence regions when p > 2 i
• When $p > 2$, we cannot easily plot the confidence regions.
• Therefore, we first need to project onto an axis or onto a plane.
• Theorem: Let $c > 0$ be a constant and $A$ a $p \times p$ positive definite matrix. For a given vector $u \neq 0$, the projection of the ellipsoid $\{y^T A^{-1} y \leq c^2\}$ onto $u$ is given by
$\pm c\, \frac{\sqrt{u^T A u}}{u^T u}\, u$.
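As an illustration (my own sketch, not from the slides), applying this to the confidence region for µ, which is the ellipsoid with $A = S_n/n$ centered at $\bar{Y}$, along a hypothetical direction u:
A <- cov(dataset) / n
u <- c(1, 1, 0)  # hypothetical direction, not from the slides
# Shadow half-length: c * sqrt(u' A u) / ||u||
half_len <- sqrt(critical_val) * sqrt(drop(t(u) %*% A %*% u) / sum(u^2))
# Interval for the projection of mu onto the unit vector u / ||u||
center <- sum(u * mu_hat) / sqrt(sum(u^2))
c(center - half_len, center + half_len)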

  20. Visualizing confidence regions when p > 2 ii
• If we take $u$ to be the standard unit vectors, we get confidence intervals for each component of $\mu$:
$LB = \bar{Y}_j - \sqrt{\frac{(n-1)p}{n-p} F_\alpha(p, n-p)\,(s^2_{jj}/n)}$
$UB = \bar{Y}_j + \sqrt{\frac{(n-1)p}{n-p} F_\alpha(p, n-p)\,(s^2_{jj}/n)}$

  21. Example i
n <- nrow(dataset); p <- ncol(dataset)
critical_val <- (n - 1) * p * qf(0.95, df1 = p,
                                 df2 = n - p) / (n - p)
sample_cov <- diag(cov(dataset))
cbind(mu_hat - sqrt(critical_val * sample_cov / n),
      mu_hat + sqrt(critical_val * sample_cov / n))

  22. Example ii
##                       [,1]      [,2]
## infant_mortality 20.801776 30.846538
## life_expectancy  69.561973 73.054881
## fertility         2.565608  3.172257

  23. Visualizing confidence regions when p > 2 (cont'd) i
• Theorem: Let $c > 0$ be a constant and $A$ a $p \times p$ positive definite matrix. For a given pair of perpendicular unit vectors $u_1, u_2$, the projection of the ellipsoid $\{y^T A^{-1} y \leq c^2\}$ onto the plane defined by $u_1, u_2$ is given by
$\{(U^Ty)^T (U^TAU)^{-1} (U^Ty) \leq c^2\}$, where $U = (u_1, u_2)$.

  24. Example (cont'd) i
U <- matrix(c(1, 0, 0,
              0, 1, 0), ncol = 2)
R <- n * solve(t(U) %*% cov(dataset) %*% U)
transf <- chol(R)

  25. Example (cont'd) ii
# First create a circle of radius c
theta_vect <- seq(0, 2 * pi, length.out = 100)
circle <- sqrt(critical_val) * cbind(cos(theta_vect),
                                     sin(theta_vect))
# Then turn into an ellipse
ellipse <- circle %*% t(solve(transf)) +
  matrix(mu_hat[1:2], ncol = 2,
         nrow = nrow(circle), byrow = TRUE)

  26. Example (cont'd) iii
# Eigendecomposition, to visualize the principal axes
decomp <- eigen(t(U) %*% cov(dataset) %*% U)
first <- sqrt(decomp$values[1]) *
  decomp$vectors[, 1] * sqrt(critical_val)
second <- sqrt(decomp$values[2]) *
  decomp$vectors[, 2] * sqrt(critical_val)

  27. Example (cont'd) iv
[Figure: the projected 95% confidence ellipse, with infant_mortality on the horizontal axis and life_expectancy on the vertical axis.]

  28. Simultaneous Confidence Statements i
• Let $w \in \mathbb{R}^p$. We are interested in constructing confidence intervals for $w^T\mu$ that are simultaneously valid (i.e. with the right coverage probability) for all $w$.
• Note that $w^T\bar{Y}$ and $w^T S_n w$ are both scalars.
• If we were only interested in a particular $w$, we could use the following confidence interval:
$w^T\bar{Y} \pm t_{\alpha/2,\, n-1}\sqrt{w^T S_n w / n}$.
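A quick illustration of this single-w interval on the gapminder data (my code, with a hypothetical choice of w):
w <- c(0, 1, 0)  # hypothetical: picks out the life_expectancy component
est <- sum(w * mu_hat)
se <- sqrt(drop(t(w) %*% cov(dataset) %*% w) / n)
est + c(-1, 1) * qt(0.975, df = n - 1) * se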

  29. Simultaneous Confidence Statements ii
• Or equivalently, the confidence interval contains the set of values $w^T\mu$ for which
$t^2(w) = \frac{n(w^T\bar{Y} - w^T\mu)^2}{w^T S_n w} = \frac{n(w^T(\bar{Y} - \mu))^2}{w^T S_n w} \leq F_\alpha(1, n-1)$.
• Strategy: maximise $t^2(w)$ over all $w$:
$\max_w t^2(w) = \max_w \frac{n(w^T(\bar{Y} - \mu))^2}{w^T S_n w}$.

  30. Simultaneous Confidence Statements iii
• Using the Cauchy-Schwarz inequality:
$(w^T(\bar{Y} - \mu))^2 = (w^T S_n^{1/2} S_n^{-1/2}(\bar{Y} - \mu))^2 = ((S_n^{1/2}w)^T (S_n^{-1/2}(\bar{Y} - \mu)))^2 \leq (w^T S_n w)\,((\bar{Y} - \mu)^T S_n^{-1}(\bar{Y} - \mu))$.
• Dividing both sides by $w^T S_n w / n$, we get
$t^2(w) \leq n(\bar{Y} - \mu)^T S_n^{-1}(\bar{Y} - \mu)$.
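Since the right-hand side of this bound is exactly the $T^2$ statistic, simultaneous intervals can be obtained by replacing the t critical value with $c = \sqrt{\frac{(n-1)p}{n-p}F_\alpha(p, n-p)}$, giving $w^T\bar{Y} \pm c\sqrt{w^T S_n w / n}$ for all $w$ at once. A sketch (my code, reusing critical_val, which equals $c^2$) comparing the two intervals for the same hypothetical w:
w <- c(0, 1, 0)
est <- sum(w * mu_hat)
se <- sqrt(drop(t(w) %*% cov(dataset) %*% w) / n)
# Simultaneous (T^2-based) interval: wider than the t-based one
est + c(-1, 1) * sqrt(critical_val) * se
# Single-w t interval, for comparison
est + c(-1, 1) * qt(0.975, df = n - 1) * se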
