Wishart Distribution Max Turgeon STAT 7200–Multivariate Statistics
Objectives
• Understand the distribution of covariance matrices
• Understand the distribution of the MLEs for the multivariate normal distribution
• Understand the distribution of functionals of covariance matrices
• Visualize covariance matrices and their distribution
Before we begin… i
• In this section, we will discuss random matrices.
• Therefore, we will talk about distributions, derivatives and integrals over sets of matrices.
• It can be useful to identify the space $M_{n,p}(\mathbb{R})$ of $n \times p$ matrices with $\mathbb{R}^{np}$.
• We can define the function $\mathrm{vec} : M_{n,p}(\mathbb{R}) \to \mathbb{R}^{np}$ that takes a matrix $M$ and maps it to the $np$-dimensional vector given by concatenating the columns of $M$ into a single vector:
$$\mathrm{vec}\begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix} = (1, 2, 3, 4).$$
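As a small illustration of the vec map, here is a sketch in R. It relies on the fact that R stores matrices column by column, so `as.vector()` stacks the columns. The names `M`, `vec_M` and `M_back` are chosen here for illustration only.

```r
# The 2 x 2 matrix from the example; matrix() fills column by column.
M <- matrix(c(1, 2, 3, 4), nrow = 2)

# vec(M): concatenate the columns of M into a single vector.
vec_M <- as.vector(M)

# The map is invertible once the dimensions n and p are known.
M_back <- matrix(vec_M, nrow = nrow(M), ncol = ncol(M))
```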
Before we begin… ii
• Another important observation: structural constraints (e.g. symmetry, positive definiteness) reduce the number of "free" entries in a matrix and therefore the dimension of the subspace.
• E.g. if $A$ is a symmetric $p \times p$ matrix, there are only $\frac{1}{2}p(p+1)$ independent entries: the entries on the diagonal, and the off-diagonal entries above the diagonal (or below).
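The count of free entries can be checked numerically. The sketch below builds an arbitrary symmetric matrix, extracts its diagonal and above-diagonal entries, and rebuilds the full matrix from them. The dimension `p <- 5` and all variable names are assumptions made for this illustration.

```r
set.seed(4)
p <- 5

# An arbitrary symmetric p x p matrix.
A <- crossprod(matrix(rnorm(p * p), ncol = p))

# The "free" entries: the diagonal and everything above it.
free_entries <- A[upper.tri(A, diag = TRUE)]
n_free <- length(free_entries)

# Rebuild A from the free entries alone, using symmetry for the lower triangle.
A_rebuilt <- matrix(0, p, p)
A_rebuilt[upper.tri(A_rebuilt, diag = TRUE)] <- free_entries
A_rebuilt[lower.tri(A_rebuilt)] <- t(A_rebuilt)[lower.tri(A_rebuilt)]
```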
Wishart distribution i
• Let $S$ be a random, positive semidefinite matrix of dimension $p \times p$.
• We say $S$ follows a standard Wishart distribution $W_p(m)$ if we can write
$$S = \sum_{i=1}^m Z_i Z_i^T, \qquad Z_i \sim N_p(0, I_p) \text{ indep.}$$
• We say $S$ follows a Wishart distribution $W_p(m, \Sigma)$ with scale matrix $\Sigma$ if we can write
$$S = \sum_{i=1}^m Y_i Y_i^T, \qquad Y_i \sim N_p(0, \Sigma) \text{ indep.}$$
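The definition translates directly into a sampling recipe. The sketch below draws one $W_p(m, \Sigma)$ variate by transforming standard normal vectors with a Cholesky factor of $\Sigma$. The particular `Sigma`, `p` and `m` are arbitrary values picked for illustration.

```r
set.seed(7200)
p <- 3; m <- 10

# An arbitrary positive definite scale matrix (assumption for this example).
Sigma <- matrix(c(2.0, 0.5, 0.0,
                  0.5, 1.0, 0.3,
                  0.0, 0.3, 1.5), nrow = p)

# chol() returns the upper triangular factor R with t(R) %*% R = Sigma,
# so L = t(R) satisfies L %*% t(L) = Sigma.
L <- t(chol(Sigma))

Z <- matrix(rnorm(m * p), ncol = p)  # rows Z_i ~ N_p(0, I_p)
Y <- Z %*% t(L)                      # rows Y_i = L Z_i ~ N_p(0, Sigma)
S <- crossprod(Y)                    # Y^T Y = sum_i Y_i Y_i^T

# The same matrix, written as the explicit sum of outer products.
S_sum <- Reduce(`+`, lapply(seq_len(m), function(i) tcrossprod(Y[i, ])))
```

Base R also provides `rWishart()` in the stats package for drawing several variates at once.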
Wishart distribution ii
• We say $S$ follows a non-central Wishart distribution $W_p(m, \Sigma; \Delta)$ with scale matrix $\Sigma$ and non-centrality parameter $\Delta$ if we can write
$$S = \sum_{i=1}^m Y_i Y_i^T, \qquad Y_i \sim N_p(\mu_i, \Sigma) \text{ indep.}, \qquad \Delta = \sum_{i=1}^m \mu_i \mu_i^T.$$
Example i
• Let $S \sim W_p(m)$ be Wishart distributed, with scale matrix $\Sigma = I_p$.
• We can therefore write $S = \sum_{i=1}^m Z_i Z_i^T$, with $Z_i \sim N_p(0, I_p)$.
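Example ii
• Using the properties of the trace, we have
$$\begin{aligned}
\mathrm{tr}(S) &= \mathrm{tr}\left(\sum_{i=1}^m Z_i Z_i^T\right) \\
&= \sum_{i=1}^m \mathrm{tr}\left(Z_i Z_i^T\right) \\
&= \sum_{i=1}^m \mathrm{tr}\left(Z_i^T Z_i\right) \\
&= \sum_{i=1}^m Z_i^T Z_i.
\end{aligned}$$
• Recall that $Z_i^T Z_i \sim \chi^2(p)$.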
Example iii
• Therefore $\mathrm{tr}(S)$ is the sum of $m$ independent copies of a $\chi^2(p)$, and so we have $\mathrm{tr}(S) \sim \chi^2(mp)$.

    B <- 1000          # number of simulated Wishart matrices
    n <- 10; p <- 4    # n plays the role of m
    traces <- replicate(B, {
      Z <- matrix(rnorm(n * p), ncol = p)  # rows are N_p(0, I_p)
      W <- crossprod(Z)                    # W = Z^T Z ~ W_p(n)
      sum(diag(W))                         # tr(W)
    })
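Example iv

    # Histogram of the simulated traces, overlaid with a kernel density
    # estimate computed from B draws of a chi-square with n * p df
    hist(traces, 50, freq = FALSE)
    lines(density(rchisq(B, df = n * p)))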
Example v
[Figure: "Histogram of traces". x-axis: traces; y-axis: Density.]
Non-singular Wishart distribution i
• As defined above, there is no guarantee that a Wishart variate is invertible.
• To show: if $S \sim W_p(m, \Sigma)$ with $\Sigma$ positive definite, $S$ is invertible almost surely whenever $m \geq p$.
Lemma: Let $Z$ be an $n \times n$ random matrix where the entries $Z_{ij}$ are iid $N(0, 1)$. Then $P(\det Z = 0) = 0$.
Proof: We will prove this by induction on $n$. If $n = 1$, then the result holds since $N(0, 1)$ is absolutely continuous. Now let $n > 1$ and assume the result holds for $n - 1$. Write
Non-singular Wishart distribution ii
$$Z = \begin{pmatrix} Z_{11} & Z_{12} \\ Z_{21} & Z_{22} \end{pmatrix},$$
where $Z_{22}$ is $(n-1) \times (n-1)$. Note that by assumption, we have $\det Z_{22} \neq 0$ almost surely. Now, by the Schur determinant formula, we have
$$\begin{aligned}
\det Z &= \det Z_{22} \det\left(Z_{11} - Z_{12} Z_{22}^{-1} Z_{21}\right) \\
&= (\det Z_{22}) \left(Z_{11} - Z_{12} Z_{22}^{-1} Z_{21}\right).
\end{aligned}$$
Non-singular Wishart distribution iii
We now have
$$\begin{aligned}
P(|Z| = 0) &= P(|Z| = 0, |Z_{22}| \neq 0) + P(|Z| = 0, |Z_{22}| = 0) \\
&= P(|Z| = 0, |Z_{22}| \neq 0) \\
&= P(Z_{11} = Z_{12} Z_{22}^{-1} Z_{21}, |Z_{22}| \neq 0) \\
&= E\left(P(Z_{11} = Z_{12} Z_{22}^{-1} Z_{21}, |Z_{22}| \neq 0 \mid Z_{12}, Z_{22}, Z_{21})\right) \\
&= E(0) = 0,
\end{aligned}$$
Non-singular Wishart distribution iv
where we used the laws of total probability (Line 1) and total expectation (Line 4). Therefore, the result follows from induction.
We are now ready to prove the main result: let $S \sim W_p(m, \Sigma)$ with $\det \Sigma \neq 0$, and write $S = \sum_{i=1}^m Y_i Y_i^T$, with $Y_i \sim N_p(0, \Sigma)$. Let $\mathbb{Y}$ be the $m \times p$ matrix whose $i$-th row is $Y_i$. Then
$$S = \sum_{i=1}^m Y_i Y_i^T = \mathbb{Y}^T \mathbb{Y}.$$
Non-singular Wishart distribution v
Now note that
$$\mathrm{rank}(S) = \mathrm{rank}(\mathbb{Y}^T \mathbb{Y}) = \mathrm{rank}(\mathbb{Y}).$$
Furthermore, if we write $\Sigma = LL^T$ using the Cholesky decomposition, then we can write
$$\mathbb{Z} = \mathbb{Y}(L^{-1})^T,$$
where the rows $Z_i$ of $\mathbb{Z}$ are $N_p(0, I_p)$ and $\mathrm{rank}(\mathbb{Z}) = \mathrm{rank}(\mathbb{Y})$. Finally, we have
Non-singular Wishart distribution vi
$$\mathrm{rank}(S) = \mathrm{rank}(\mathbb{Z}) \geq \mathrm{rank}(Z_1, \ldots, Z_p) = p \quad \text{(a.s.)},$$
where the last equality follows from our Lemma. Since $S$ is $p \times p$, we also have $\mathrm{rank}(S) \leq p$, so $\mathrm{rank}(S) = p$ almost surely, and therefore $S$ is invertible almost surely.
Definition: If $S \sim W_p(m, \Sigma)$ with $\Sigma$ positive definite and $m \geq p$, we say that $S$ follows a nonsingular Wishart distribution. Otherwise, we say it follows a singular Wishart distribution.
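The rank argument can be illustrated numerically. The sketch below compares the numerical rank of a standard Wishart draw when $m \geq p$ and when $m < p$. The helper `wishart_rank()` and the values of `m` and `p` are hypothetical choices for this example, and the rank is computed by `qr()` up to its default numerical tolerance.

```r
set.seed(17)
p <- 4

# Hypothetical helper: numerical rank of one draw S = Z^T Z ~ W_p(m).
wishart_rank <- function(m, p) {
  Z <- matrix(rnorm(m * p), ncol = p)  # m x p matrix with N_p(0, I_p) rows
  qr(crossprod(Z))$rank
}

rank_nonsingular <- wishart_rank(m = 6, p = p)  # m >= p: full rank expected
rank_singular    <- wishart_rank(m = 2, p = p)  # m <  p: rank at most m
```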
Additional properties i
Let $S \sim W_p(m, \Sigma)$.
• We have $E(S) = m\Sigma$.
• If $B$ is a $q \times p$ matrix, we have $BSB^T \sim W_q(m, B \Sigma B^T)$.
• If $T \sim W_p(n, \Sigma)$ is independent of $S$, then $S + T \sim W_p(m + n, \Sigma)$.
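A quick Monte Carlo sanity check of the first property, using `rWishart()` from base R's stats package. This is only a sketch: `Sigma`, `B_mat`, the number of replicates `B` and the seed are arbitrary values chosen for illustration, and the comparison with $m\Sigma$ holds only up to simulation error.

```r
set.seed(18)
p <- 3; m <- 10; B <- 5000

# Arbitrary positive definite scale matrix (assumption for this example).
Sigma <- matrix(c(2.0, 0.5, 0.0,
                  0.5, 1.0, 0.3,
                  0.0, 0.3, 1.5), nrow = p)

W <- rWishart(B, df = m, Sigma = Sigma)  # p x p x B array of draws

# Entrywise average of the draws, to be compared with m * Sigma.
S_bar <- apply(W, c(1, 2), mean)
max_abs_err <- max(abs(S_bar - m * Sigma))

# For a q x p matrix B_mat, the transformed matrix B S B^T is q x q.
B_mat <- matrix(c(1, 0, 1, -1, 0, 2), nrow = 2)  # hypothetical 2 x 3 matrix
BSBt <- B_mat %*% W[, , 1] %*% t(B_mat)
```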
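Additional properties ii
Now assume we can partition $S$ and $\Sigma$ as such:
$$S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & S_{22} \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix},$$
with $S_{ii}$ and $\Sigma_{ii}$ of dimension $p_i \times p_i$. We then have
• $S_{ii} \sim W_{p_i}(m, \Sigma_{ii})$
• If $\Sigma_{12} = 0$, then $S_{11}$ and $S_{22}$ are independent.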
Characteristic function i
• The definition of characteristic function can be extended to random matrices:
• Let $S$ be a $p \times p$ random matrix. The characteristic function of $S$ evaluated at a $p \times p$ symmetric matrix $T$ is defined as
$$\varphi_S(T) = E\left(\exp(i\,\mathrm{tr}(TS))\right).$$
• We will show that if $S \sim W_p(m, \Sigma)$, then
$$\varphi_S(T) = |I_p - 2i\Sigma T|^{-m/2}.$$
• First, we will use the Cholesky decomposition $\Sigma = LL^T$.
Characteristic function ii
• Next, we can write
$$S = L\left(\sum_{j=1}^m Z_j Z_j^T\right)L^T,$$
where $Z_j \sim N_p(0, I_p)$.
• Now, fix a symmetric matrix $T$. The matrix $L^T T L$ is also symmetric, and therefore we can compute its spectral decomposition:
$$L^T T L = U \Lambda U^T,$$
where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ is diagonal and $UU^T = I_p$.
Characteristic function iii
• We can now write
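Characteristic function iv
$$\begin{aligned}
\mathrm{tr}(TS) &= \mathrm{tr}\left(TL\left(\sum_{j=1}^m Z_j Z_j^T\right)L^T\right) \\
&= \mathrm{tr}\left(U \Lambda U^T \sum_{j=1}^m Z_j Z_j^T\right) \\
&= \mathrm{tr}\left(\Lambda U^T \left(\sum_{j=1}^m Z_j Z_j^T\right) U\right) \\
&= \mathrm{tr}\left(\Lambda \sum_{j=1}^m (U^T Z_j)(U^T Z_j)^T\right).
\end{aligned}$$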
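Characteristic function v
• Two key observations:
  • $U^T Z_j \sim N_p(0, I_p)$;
  • $\mathrm{tr}\left(\Lambda Z_j Z_j^T\right) = \sum_{k=1}^p \lambda_k Z_{jk}^2$.
• Putting all this together, we get
$$\begin{aligned}
E\left(\exp(i\,\mathrm{tr}(TS))\right) &= E\left(\exp\left(i \sum_{j=1}^m \sum_{k=1}^p \lambda_k Z_{jk}^2\right)\right) \\
&= \prod_{j=1}^m \prod_{k=1}^p E\left(\exp\left(i \lambda_k Z_{jk}^2\right)\right).
\end{aligned}$$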
Characteristic function vi
• But $Z_{jk}^2 \sim \chi^2(1)$, and so we have
$$\varphi_S(T) = \prod_{j=1}^m \prod_{k=1}^p \varphi_{\chi^2(1)}(\lambda_k).$$
• Recall that $\varphi_{\chi^2(1)}(t) = (1 - 2it)^{-1/2}$, and therefore we have
$$\varphi_S(T) = \prod_{j=1}^m \prod_{k=1}^p (1 - 2i\lambda_k)^{-1/2}.$$
Characteristic function vii
• Since $\prod_{k=1}^p (1 - 2i\lambda_k)^{-1/2} = |I_p - 2i\Lambda|^{-1/2}$, we then have
$$\begin{aligned}
\varphi_S(T) &= \prod_{j=1}^m |I_p - 2i\Lambda|^{-1/2} \\
&= |I_p - 2i\Lambda|^{-m/2} \\
&= |I_p - 2iU\Lambda U^T|^{-m/2} \\
&= |I_p - 2iL^T T L|^{-m/2} \\
&= |I_p - 2i\Sigma T|^{-m/2}.
\end{aligned}$$
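The closed form can be compared with a Monte Carlo estimate of $E(\exp(i\,\mathrm{tr}(TS)))$. The sketch below evaluates the formula through the eigenvalues $\lambda_k$ of $L^T T L$, as in the derivation, because base R's `det()` does not handle complex matrices. The matrices `Sigma` and `T_mat`, the sample size `B` and the seed are arbitrary assumptions, and agreement is only expected up to simulation error.

```r
set.seed(42)
p <- 2; m <- 5; B <- 20000

Sigma <- matrix(c(1.0, 0.3, 0.3, 2.0), nrow = p)      # positive definite
T_mat <- matrix(c(0.10, 0.05, 0.05, -0.20), nrow = p) # symmetric argument T

# Monte Carlo estimate of E(exp(i tr(TS))).
W <- rWishart(B, df = m, Sigma = Sigma)
# For symmetric T and S, tr(TS) equals the sum of entrywise products.
tr_TS <- apply(W, 3, function(S) sum(T_mat * S))
phi_mc <- mean(exp(1i * tr_TS))

# Closed form: product over the eigenvalues lambda_k of L^T T L.
L <- t(chol(Sigma))  # Sigma = L L^T
lambda <- eigen(t(L) %*% T_mat %*% L, symmetric = TRUE,
                only.values = TRUE)$values
phi_formula <- prod((1 - 2i * lambda)^(-m / 2))
```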
Density of Wishart distribution
• Let $S \sim W_p(m, \Sigma)$ with $\Sigma$ positive definite and $m \geq p$. The density of $S$ is given by
$$f(S) = \frac{1}{2^{pm/2}\,\Gamma_p\left(\frac{m}{2}\right)|\Sigma|^{m/2}} \exp\left(-\frac{1}{2}\mathrm{tr}(\Sigma^{-1}S)\right)|S|^{(m-p-1)/2},$$
where
$$\Gamma_p(u) = \pi^{p(p-1)/4} \prod_{i=0}^{p-1} \Gamma\left(u - \frac{i}{2}\right), \qquad u > \frac{1}{2}(p-1).$$
• Proof: Compute the characteristic function using the expression for the density and check that we obtain the same result as before (Exercise).
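A direct transcription of this density into R, working on the log scale for numerical stability. The function names `log_mv_gamma()` and `log_dwishart()` are hypothetical helpers written for this sketch, not functions from an existing package. For $p = 1$ the formula reduces to the density of $\sigma^2 \chi^2(m)$, which gives a simple check against `dchisq()`.

```r
# log of the multivariate gamma function Gamma_p(u), valid for u > (p - 1) / 2.
log_mv_gamma <- function(u, p) {
  p * (p - 1) / 4 * log(pi) + sum(lgamma(u - (0:(p - 1)) / 2))
}

# log density of W_p(m, Sigma) at a positive definite matrix S (needs m >= p).
log_dwishart <- function(S, m, Sigma) {
  p <- nrow(S)
  logdet_S     <- as.numeric(determinant(S, logarithm = TRUE)$modulus)
  logdet_Sigma <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  -(p * m / 2) * log(2) - log_mv_gamma(m / 2, p) -
    (m / 2) * logdet_Sigma -
    0.5 * sum(diag(solve(Sigma, S))) +  # tr(Sigma^{-1} S)
    (m - p - 1) / 2 * logdet_S
}

# Check for p = 1: S = sigma^2 * X with X ~ chi-square(m).
dens_wishart <- exp(log_dwishart(matrix(3), m = 5, Sigma = matrix(2)))
dens_chisq   <- dchisq(3 / 2, df = 5) / 2
```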
Sampling distribution of sample covariance
• We are now ready to prove the results we stated a few lectures ago.
• Recall again the univariate case:
  • $\dfrac{(n-1)s^2}{\sigma^2} \sim \chi^2(n-1)$;
  • $\bar{X}$ and $s^2$ are independent.
• In the multivariate case, we want to prove:
  • $(n-1)S_n \sim W_p(n-1, \Sigma)$;
  • $\bar{\mathbf{Y}}$ and $S_n$ are independent.
• We will show that using the multivariate Cochran theorem.
Cochran theorem
Let $Y_1, \ldots, Y_n$ be a random sample with $Y_i \sim N_p(0, \Sigma)$, and write $\mathbb{Y}$ for the $n \times p$ matrix whose $i$-th row is $Y_i$. Let $A, B$ be $n \times n$ symmetric matrices, and let $C$ be a $q \times n$ matrix of rank $q$. Then
1. $\mathbb{Y}^T A \mathbb{Y} \sim W_p(m, \Sigma)$ if and only if $A^2 = A$ and $\mathrm{tr}\,A = m$.
2. $\mathbb{Y}^T A \mathbb{Y}$ and $\mathbb{Y}^T B \mathbb{Y}$ are independent if and only if $AB = 0$.
3. $\mathbb{Y}^T A \mathbb{Y}$ and $C\mathbb{Y}$ are independent if and only if $CA = 0$.
Application i
• Let $C = \frac{1}{n}\mathbf{1}^T$, where $\mathbf{1}$ is the $n$-dimensional vector of ones.
• Let $A = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}^T$.
• Then we have
$$\mathbb{Y}^T A \mathbb{Y} = (n-1)S_n, \qquad C\mathbb{Y} = \bar{\mathbf{Y}}^T.$$
• We need to check the conditions of Cochran's theorem:
  • $A^2 = A$;
  • $CA = 0$;
  • $\mathrm{tr}\,A = n - 1$.
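These conditions, and the two identities for $A$ and $C$, are easy to verify numerically. The sketch below does so for one simulated data matrix. The sizes `n` and `p`, the seed and the variable names are assumptions made for illustration, and `cov()` is used as the sample covariance $S_n$ with divisor $n - 1$.

```r
set.seed(30)
n <- 8; p <- 3
Y <- matrix(rnorm(n * p), ncol = p)  # n x p data matrix, rows are observations

ones <- rep(1, n)
A <- diag(n) - tcrossprod(ones) / n  # centering matrix I_n - (1/n) 1 1^T
C <- matrix(ones / n, nrow = 1)      # 1 x n matrix (1/n) 1^T

# Conditions of Cochran's theorem.
A_idempotent <- isTRUE(all.equal(A %*% A, A))
CA_max       <- max(abs(C %*% A))    # should be numerically zero
trace_A      <- sum(diag(A))         # should equal n - 1

# The two identities: Y^T A Y = (n - 1) S_n and C Y = Ybar^T.
quad_form <- t(Y) %*% A %*% Y
row_mean  <- C %*% Y
```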