  1. Stat 5101 Lecture Slides: Deck 8, Dirichlet Distribution. Charles J. Geyer, School of Statistics, University of Minnesota. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-sa/4.0/).

  2. The Dirichlet Distribution. The Dirichlet distribution is to the beta distribution as the multinomial distribution is to the binomial distribution. We get it by the same process by which we got the beta distribution (slides 128–137, deck 3), only multivariate. Recall the basic theorem about gamma and beta (same slides referenced above).

  3. The Dirichlet Distribution (cont.). Theorem 1. Suppose $X$ and $Y$ are independent gamma random variables,
$$X \sim \mathrm{Gam}(\alpha_1, \lambda), \qquad Y \sim \mathrm{Gam}(\alpha_2, \lambda).$$
Then
$$U = X + Y, \qquad V = X / (X + Y)$$
are independent random variables and
$$U \sim \mathrm{Gam}(\alpha_1 + \alpha_2, \lambda), \qquad V \sim \mathrm{Beta}(\alpha_1, \alpha_2).$$
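A minimal Monte Carlo sketch of Theorem 1 (not from the slides: Python with numpy and scipy, and arbitrary choices of $\alpha_1$, $\alpha_2$, $\lambda$, seed, and sample size):

    # U = X + Y should be Gam(alpha1 + alpha2, lam), V = X / (X + Y)
    # should be Beta(alpha1, alpha2), and U, V should be independent.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    alpha1, alpha2, lam = 2.0, 3.0, 1.5
    n = 100_000

    x = rng.gamma(shape=alpha1, scale=1 / lam, size=n)
    y = rng.gamma(shape=alpha2, scale=1 / lam, size=n)
    u, v = x + y, x / (x + y)

    print(u.mean(), (alpha1 + alpha2) / lam)       # sample vs. exact mean of U
    print(u.var(), (alpha1 + alpha2) / lam ** 2)   # sample vs. exact variance of U
    print(stats.kstest(v, stats.beta(alpha1, alpha2).cdf).pvalue)  # large: V is beta
    print(np.corrcoef(u, v)[0, 1])                 # near zero, consistent with independence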

  4. The Dirichlet Distribution (cont.). Corollary 1. Suppose $X_1, X_2, \ldots$ are independent gamma random variables with the same rate parameter,
$$X_i \sim \mathrm{Gam}(\alpha_i, \lambda).$$
Then the random variables
$$\frac{X_1}{X_1 + X_2} \sim \mathrm{Beta}(\alpha_1, \alpha_2)$$
$$\frac{X_1 + X_2}{X_1 + X_2 + X_3} \sim \mathrm{Beta}(\alpha_1 + \alpha_2, \alpha_3)$$
$$\vdots$$
$$\frac{X_1 + \cdots + X_{d-1}}{X_1 + \cdots + X_d} \sim \mathrm{Beta}(\alpha_1 + \cdots + \alpha_{d-1}, \alpha_d)$$
are independent and have the asserted distributions.
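A sketch of Corollary 1 for $d = 4$ (again Python with numpy and scipy; the shape vector is an arbitrary choice): each successive ratio should have the asserted beta distribution, and the ratios should be uncorrelated, consistent with independence.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    alpha = np.array([1.0, 2.0, 0.5, 3.0])
    x = rng.gamma(shape=alpha, scale=1.0, size=(100_000, 4))
    s = np.cumsum(x, axis=1)                       # partial sums S_k = X_1 + ... + X_k

    r = [s[:, k] / s[:, k + 1] for k in range(3)]  # ratios S_1/S_2, S_2/S_3, S_3/S_4
    for k, rk in enumerate(r):
        b = stats.beta(alpha[: k + 1].sum(), alpha[k + 1])
        print(stats.kstest(rk, b.cdf).pvalue)      # large: ratio has the beta law
    print(np.corrcoef(np.array(r)))                # off-diagonals near zero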

  5. The Dirichlet Distribution (cont.). From the first assertion of the theorem we know
$$X_1 + \cdots + X_{k-1} \sim \mathrm{Gam}(\alpha_1 + \cdots + \alpha_{k-1}, \lambda)$$
and is independent of $X_k$. Thus the second assertion of the theorem says
$$\frac{X_1 + \cdots + X_{k-1}}{X_1 + \cdots + X_k} \sim \mathrm{Beta}(\alpha_1 + \cdots + \alpha_{k-1}, \alpha_k) \qquad (*)$$
and $(*)$ is independent of $X_1 + \cdots + X_k$. That proves the corollary.

  6. The Dirichlet Distribution (cont.). Theorem 2. Suppose $X_1, X_2, \ldots$ are as in the corollary. Then the random variables
$$Y_i = \frac{X_i}{X_1 + \cdots + X_d}$$
satisfy
$$\sum_{i=1}^{d} Y_i = 1, \quad \text{almost surely},$$
and the joint density of $Y_2, \ldots, Y_d$ is
$$f(y_2, \ldots, y_d) = \frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} (1 - y_2 - \cdots - y_d)^{\alpha_1 - 1} \prod_{i=2}^{d} y_i^{\alpha_i - 1}.$$
The Dirichlet distribution with parameter vector $(\alpha_1, \ldots, \alpha_d)$ is the distribution of the random vector $(Y_1, \ldots, Y_d)$.
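A sketch checking the density formula in Theorem 2 at one arbitrary point for $d = 3$, against scipy's implementation (scipy.stats.dirichlet parameterizes by the full vector $(y_1, \ldots, y_d)$):

    import numpy as np
    from scipy.special import gamma
    from scipy.stats import dirichlet

    alpha = np.array([2.0, 3.0, 1.5])

    def dirichlet_pdf(y_rest, alpha):
        # Density of (Y_2, ..., Y_d) from Theorem 2; y_rest = (y_2, ..., y_d).
        y1 = 1.0 - np.sum(y_rest)
        const = gamma(alpha.sum()) / np.prod(gamma(alpha))
        return const * y1 ** (alpha[0] - 1) * np.prod(y_rest ** (alpha[1:] - 1))

    y_rest = np.array([0.3, 0.2])                  # so y_1 = 0.5
    y_full = np.concatenate([[1 - y_rest.sum()], y_rest])
    print(dirichlet_pdf(y_rest, alpha))            # formula from the theorem
    print(dirichlet.pdf(y_full, alpha))            # scipy agrees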

  7. The Dirichlet Distribution (cont.). Let the random variables in Corollary 1 be denoted $W_2, \ldots, W_d$, so these are independent and
$$W_i = \frac{X_1 + \cdots + X_{i-1}}{X_1 + \cdots + X_i} \sim \mathrm{Beta}(\alpha_1 + \cdots + \alpha_{i-1}, \alpha_i).$$
Then
$$Y_i = (1 - W_i) \prod_{j=i+1}^{d} W_j, \qquad i = 2, \ldots, d,$$
where in the case $i = d$ we use the convention that the product is empty and equal to one.
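A sketch of this representation for $d = 4$ (arbitrary parameter vector): build the $Y_i$ from independent betas and check the means against $\alpha_i / (\alpha_1 + \cdots + \alpha_d)$, the Dirichlet means derived later in the deck.

    import numpy as np

    rng = np.random.default_rng(2)
    alpha = np.array([1.0, 2.0, 0.5, 3.0])
    d, n = len(alpha), 100_000

    w = np.ones((n, d + 1))                        # columns 2..d hold W_2, ..., W_d
    for i in range(2, d + 1):
        w[:, i] = rng.beta(alpha[: i - 1].sum(), alpha[i - 1], size=n)

    y = np.empty((n, d))
    for i in range(2, d + 1):                      # Y_i = (1 - W_i) * prod_{j > i} W_j
        y[:, i - 1] = (1 - w[:, i]) * np.prod(w[:, i + 1 : d + 1], axis=1)
    y[:, 0] = 1 - y[:, 1:].sum(axis=1)             # the coordinates sum to one

    print(y.mean(axis=0))                          # close to alpha / alpha.sum()
    print(alpha / alpha.sum())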

  8. The Dirichlet Distribution (cont.). Recall
$$Y_i = \frac{X_i}{X_1 + \cdots + X_d}, \qquad W_i = \frac{X_1 + \cdots + X_{i-1}}{X_1 + \cdots + X_i}.$$
The inverse transformation is
$$W_i = \frac{Y_1 + \cdots + Y_{i-1}}{Y_1 + \cdots + Y_i} = \frac{1 - Y_i - \cdots - Y_d}{1 - Y_{i+1} - \cdots - Y_d}, \qquad i = 2, \ldots, d,$$
where in the case $i = d$ we use the convention that the sum in the denominator of the fraction on the right is empty and equal to zero, so the denominator itself is equal to one.
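A quick numeric round trip at one arbitrary point ($d = 4$), checking that the two forms of the inverse map agree:

    import numpy as np

    y = np.array([0.0, 0.2, 0.3, 0.1])             # y[1:] = (y_2, y_3, y_4)
    y[0] = 1 - y[1:].sum()                         # y_1 = 0.4
    d = len(y)

    for i in range(2, d + 1):
        num = 1 - y[i - 1:].sum()                  # 1 - y_i - ... - y_d
        den = 1 - y[i:].sum()                      # empty sum = 0 when i = d
        print(i, num / den, y[: i - 1].sum() / y[: i].sum())  # the two forms agree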

  9. The Dirichlet Distribution (cont.). Write
$$w_i = \frac{1 - y_i - \cdots - y_d}{1 - y_{i+1} - \cdots - y_d}.$$
This transformation has components of the Jacobian matrix
$$\frac{\partial w_i}{\partial y_i} = -\frac{1}{1 - y_{i+1} - \cdots - y_d}$$
$$\frac{\partial w_i}{\partial y_j} = 0, \qquad j < i$$
$$\frac{\partial w_i}{\partial y_j} = -\frac{1}{1 - y_{i+1} - \cdots - y_d} + \frac{1 - y_i - \cdots - y_d}{(1 - y_{i+1} - \cdots - y_d)^2}, \qquad j > i$$

  10. The Dirichlet Distribution (cont.). Since this Jacobian matrix is triangular, the determinant is the product of the diagonal elements
$$\lvert \det \nabla h(y_2, \ldots, y_d) \rvert = \prod_{i=2}^{d-1} \frac{1}{1 - y_{i+1} - \cdots - y_d}$$
(the $i = d$ diagonal element has an empty sum in its denominator, so its absolute value is one and the product can stop at $i = d - 1$).
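A sketch checking the determinant formula at an arbitrary point for $d = 4$: build the Jacobian from the entries on the preceding slide and compare its determinant with the product of diagonal terms.

    import numpy as np

    y = {2: 0.2, 3: 0.3, 4: 0.1}                   # y_2, y_3, y_4 (so y_1 = 0.4)
    d = 4
    tail = lambda i: 1 - sum(y[j] for j in range(i, d + 1))  # 1 - y_i - ... - y_d

    J = np.zeros((d - 1, d - 1))                   # rows w_2..w_d, columns y_2..y_d
    for i in range(2, d + 1):
        for j in range(2, d + 1):
            if j == i:
                J[i - 2, j - 2] = -1 / tail(i + 1)
            elif j > i:
                J[i - 2, j - 2] = -1 / tail(i + 1) + tail(i) / tail(i + 1) ** 2

    print(abs(np.linalg.det(J)))                   # |det| of the triangular matrix
    print(np.prod([1 / tail(i + 1) for i in range(2, d)]))   # the product formula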

  11. The Dirichlet Distribution (cont.). The joint density of $W_2, \ldots, W_d$ is
$$\prod_{i=2}^{d} \frac{\Gamma(\alpha_1 + \cdots + \alpha_i)}{\Gamma(\alpha_1 + \cdots + \alpha_{i-1}) \Gamma(\alpha_i)} w_i^{\alpha_1 + \cdots + \alpha_{i-1} - 1} (1 - w_i)^{\alpha_i - 1}$$
$$= \frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \prod_{i=2}^{d} w_i^{\alpha_1 + \cdots + \alpha_{i-1} - 1} (1 - w_i)^{\alpha_i - 1}$$
because the ratios of gamma functions telescope.
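A small numeric check (arbitrary shape vector) that the product of beta normalizing constants telescopes to the single Dirichlet constant:

    import numpy as np
    from scipy.special import gamma

    alpha = np.array([1.0, 2.0, 0.5, 3.0])
    s = np.cumsum(alpha)                           # s[i-1] = alpha_1 + ... + alpha_i
    lhs = np.prod([gamma(s[i]) / (gamma(s[i - 1]) * gamma(alpha[i]))
                   for i in range(1, len(alpha))])
    rhs = gamma(alpha.sum()) / np.prod(gamma(alpha))
    print(lhs, rhs)                                # equal: the gamma ratios telescope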

  12. The Dirichlet Distribution (cont.). Combining the PDF of the $W$'s,
$$\frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \prod_{i=2}^{d} w_i^{\alpha_1 + \cdots + \alpha_{i-1} - 1} (1 - w_i)^{\alpha_i - 1},$$
the Jacobian,
$$\prod_{i=2}^{d-1} \frac{1}{1 - y_{i+1} - \cdots - y_d},$$
and the transformation
$$w_i = \frac{1 - y_i - \cdots - y_d}{1 - y_{i+1} - \cdots - y_d},$$
we find that the PDF of $Y_2, \ldots, Y_d$ is
$$\frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} \prod_{i=2}^{d} \frac{(1 - y_i - \cdots - y_d)^{\alpha_1 + \cdots + \alpha_{i-1} - 1}\, y_i^{\alpha_i - 1}}{(1 - y_{i+1} - \cdots - y_d)^{\alpha_1 + \cdots + \alpha_i - 1}}$$
$$= \frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)} (1 - y_2 - \cdots - y_d)^{\alpha_1 - 1} \prod_{i=2}^{d} y_i^{\alpha_i - 1},$$
since the factors $1 - y_{i+1} - \cdots - y_d$ telescope.

  13. Univariate Marginals. Write $I = \{1, \ldots, d\}$. By definition
$$Y_i = \frac{X_i}{X_1 + \cdots + X_d}$$
has the beta distribution with parameters $\alpha_i$ and $\sum_{j \in I,\, j \neq i} \alpha_j$, by Theorem 1, because
$$X_i \sim \mathrm{Gam}(\alpha_i, \lambda), \qquad \sum_{\substack{j \in I \\ j \neq i}} X_j \sim \mathrm{Gam}\Biggl(\sum_{\substack{j \in I \\ j \neq i}} \alpha_j,\; \lambda\Biggr).$$
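A sketch checking a univariate marginal by simulation ($d = 4$, component $i = 2$, arbitrary parameters): $Y_2$ should be $\mathrm{Beta}(\alpha_2, \alpha_1 + \alpha_3 + \alpha_4)$.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    alpha = np.array([1.0, 2.0, 0.5, 3.0])
    y = rng.dirichlet(alpha, size=100_000)

    b = stats.beta(alpha[1], alpha.sum() - alpha[1])
    print(stats.kstest(y[:, 1], b.cdf).pvalue)     # large p-value: beta marginal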

  14. Multivariate Marginals. Multivariate marginals are "almost" Dirichlet. As was the case with the multinomial, if we collapse categories, we get a Dirichlet. Let $\mathcal{A}$ be a partition of $I$, and define
$$Z_A = \sum_{i \in A} Y_i, \qquad A \in \mathcal{A},$$
$$\beta_A = \sum_{i \in A} \alpha_i, \qquad A \in \mathcal{A}.$$
Then the random vector having components $Z_A$ has the Dirichlet distribution with parameters $\beta_A$.
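A sketch of collapsing categories, using the arbitrary partition $\{1\}, \{2, 3\}, \{4\}$: each component of the summed vector should have the beta marginal implied by $\mathrm{Dirichlet}(\alpha_1, \alpha_2 + \alpha_3, \alpha_4)$ (a partial check via the univariate marginals).

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    alpha = np.array([1.0, 2.0, 0.5, 3.0])
    y = rng.dirichlet(alpha, size=100_000)

    z = np.column_stack([y[:, 0], y[:, 1] + y[:, 2], y[:, 3]])  # collapsed vector
    beta = np.array([alpha[0], alpha[1] + alpha[2], alpha[3]])  # collapsed parameters

    for k in range(3):
        b = stats.beta(beta[k], beta.sum() - beta[k])
        print(stats.kstest(z[:, k], b.cdf).pvalue)              # each large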

  15. Conditionals. Starting from
$$Y_i = \frac{X_i}{X_1 + \cdots + X_d},$$
we can write
$$Y_i = \frac{X_i}{X_1 + \cdots + X_k} \cdot \frac{X_1 + \cdots + X_k}{X_1 + \cdots + X_d}
     = \frac{X_i}{X_1 + \cdots + X_k} \cdot (Y_1 + \cdots + Y_k)
     = \frac{X_i}{X_1 + \cdots + X_k} \cdot (1 - Y_{k+1} - \cdots - Y_d).$$

  16. Conditionals (cont.).
$$Y_i = \frac{X_i}{X_1 + \cdots + X_k} \cdot (1 - Y_{k+1} - \cdots - Y_d)$$
When we condition on $Y_{k+1}, \ldots, Y_d$, the second factor above is a constant and the first factor is a component of another Dirichlet random vector having components
$$Z_i = \frac{X_i}{X_1 + \cdots + X_k}, \qquad i = 1, \ldots, k.$$
So conditionals of the Dirichlet are a constant times a Dirichlet.
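A sketch of this conclusion for $d = 3$, $k = 2$ (arbitrary parameters): $Y_1 / (1 - Y_3)$ should be $\mathrm{Beta}(\alpha_1, \alpha_2)$ and uncorrelated with $Y_3$.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    alpha = np.array([1.0, 2.0, 0.5])
    y = rng.dirichlet(alpha, size=100_000)

    z1 = y[:, 0] / (1 - y[:, 2])                   # Z_1 = Y_1 / (Y_1 + Y_2)
    print(stats.kstest(z1, stats.beta(alpha[0], alpha[1]).cdf).pvalue)
    print(np.corrcoef(z1, y[:, 2])[0, 1])          # near zero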

  17. Moments. From the marginals being beta, we have
$$E(Y_i) = \frac{\alpha_i}{\alpha_1 + \cdots + \alpha_d}$$
$$\operatorname{var}(Y_i) = \frac{\alpha_i \sum_{j \in I,\, j \neq i} \alpha_j}{(\alpha_1 + \cdots + \alpha_d)^2 (\alpha_1 + \cdots + \alpha_d + 1)}$$
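A sketch checking the mean and variance formulas by simulation (arbitrary parameters; note $\sum_{j \neq i} \alpha_j = a_0 - \alpha_i$ where $a_0$ is the parameter total):

    import numpy as np

    rng = np.random.default_rng(6)
    alpha = np.array([1.0, 2.0, 0.5, 3.0])
    a0 = alpha.sum()
    y = rng.dirichlet(alpha, size=200_000)

    print(y.mean(axis=0))                          # sample means
    print(alpha / a0)                              # E(Y_i)
    print(y.var(axis=0))                           # sample variances
    print(alpha * (a0 - alpha) / (a0 ** 2 * (a0 + 1)))  # var(Y_i)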

  18. Moments (cont.). From the PDF we get the "theorem associated with the Dirichlet distribution"
$$\int \cdots \int \Biggl(\prod_{i=2}^{d} y_i^{\alpha_i - 1}\Biggr) (1 - y_2 - \cdots - y_d)^{\alpha_1 - 1} \, dy_2 \cdots dy_d = \frac{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)}{\Gamma(\alpha_1 + \cdots + \alpha_d)},$$
so
$$E(Y_1 Y_2) = \frac{\Gamma(\alpha_1 + 1) \Gamma(\alpha_2 + 1) \Gamma(\alpha_3) \cdots \Gamma(\alpha_d)}{\Gamma(\alpha_1 + \cdots + \alpha_d + 2)} \cdot \frac{\Gamma(\alpha_1 + \cdots + \alpha_d)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_d)}
= \frac{\alpha_1 \alpha_2}{(\alpha_1 + \cdots + \alpha_d + 1)(\alpha_1 + \cdots + \alpha_d)}.$$
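A short simulation check of the closed form for $E(Y_1 Y_2)$ (arbitrary parameters):

    import numpy as np

    rng = np.random.default_rng(7)
    alpha = np.array([1.0, 2.0, 0.5, 3.0])
    a0 = alpha.sum()
    y = rng.dirichlet(alpha, size=200_000)

    print((y[:, 0] * y[:, 1]).mean())              # sample E(Y_1 Y_2)
    print(alpha[0] * alpha[1] / ((a0 + 1) * a0))   # closed form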

  19. Moments (cont.). The result on the preceding slide holds when 1 and 2 are replaced by $i$ and $j$ for $i \neq j$, and
$$\operatorname{cov}(Y_i, Y_j) = E(Y_i Y_j) - E(Y_i) E(Y_j)$$
$$= \frac{\alpha_i \alpha_j}{(\alpha_1 + \cdots + \alpha_d + 1)(\alpha_1 + \cdots + \alpha_d)} - \frac{\alpha_i \alpha_j}{(\alpha_1 + \cdots + \alpha_d)^2}$$
$$= \frac{\alpha_i \alpha_j}{\alpha_1 + \cdots + \alpha_d} \left( \frac{1}{\alpha_1 + \cdots + \alpha_d + 1} - \frac{1}{\alpha_1 + \cdots + \alpha_d} \right)$$
$$= -\frac{\alpha_i \alpha_j}{(\alpha_1 + \cdots + \alpha_d)^2 (\alpha_1 + \cdots + \alpha_d + 1)}$$
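A sketch checking the covariance formula by simulation ($i = 1$, $j = 2$, arbitrary parameters):

    import numpy as np

    rng = np.random.default_rng(8)
    alpha = np.array([1.0, 2.0, 0.5, 3.0])
    a0 = alpha.sum()
    y = rng.dirichlet(alpha, size=200_000)

    print(np.cov(y[:, 0], y[:, 1])[0, 1])          # sample covariance
    print(-alpha[0] * alpha[1] / (a0 ** 2 * (a0 + 1)))  # closed form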
