Multiparameter models (cont.) Dr. Jarad Niemi STAT 544 - Iowa State University February 21, 2019 Jarad Niemi (STAT544@ISU) Multiparameter models (cont.) February 21, 2019 1 / 20
Outline

- Multinomial
- Multivariate normal
  - Unknown mean
  - Unknown mean and covariance

In the process, we'll introduce the following distributions:

- Multinomial
- Dirichlet
- Multivariate normal
- Inverse Wishart (and Wishart)
- Normal-inverse Wishart
Multinomial: Motivating examples

Multivariate count data:

- Item-response (Likert scale)
- Voting
Multinomial: Multinomial distribution

Suppose there are $K$ categories and each individual independently chooses category $k$ with probability $\pi_k$ such that $\sum_{k=1}^K \pi_k = 1$. Let $Y_k \in \{0, 1, \ldots, n\}$ be the number of individuals who choose category $k$, with $n = \sum_{k=1}^K Y_k$ being the total number of individuals. Then $Y = (Y_1, \ldots, Y_K)$ has a multinomial distribution, i.e. $Y \sim Mult(n, \pi)$, with probability mass function (pmf)

\[ p(y) = n! \prod_{k=1}^K \frac{\pi_k^{y_k}}{y_k!}. \]
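As a quick numerical check of the pmf, here is a minimal sketch in plain Python. The function name `multinomial_pmf` and the probabilities used are illustrative assumptions, not part of the slides.

```python
from math import factorial, prod

def multinomial_pmf(y, pi):
    """pmf of Mult(n, pi) evaluated at the count vector y, with n = sum(y)."""
    n = sum(y)
    coef = factorial(n)
    for y_k in y:
        coef //= factorial(y_k)  # exact integer multinomial coefficient n! / prod(y_k!)
    return coef * prod(p ** y_k for p, y_k in zip(pi, y))

pi = [0.5, 0.3, 0.2]  # hypothetical category probabilities
print(multinomial_pmf([2, 1, 1], pi))

# Summing the pmf over every count vector with n = 4 should give 1.
total = sum(
    multinomial_pmf([i, j, 4 - i - j], pi)
    for i in range(5)
    for j in range(5 - i)
)
print(total)
```

Enumerating all outcomes is only feasible for small $n$ and $K$; it is used here purely to confirm that the pmf is normalized.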
Multinomial: Properties of the multinomial distribution

The multinomial distribution with pmf

\[ p(y) = n! \prod_{k=1}^K \frac{\pi_k^{y_k}}{y_k!} \]

has the following properties:

- $E[Y_k] = n\pi_k$
- $Var[Y_k] = n\pi_k(1 - \pi_k)$
- $Cov[Y_k, Y_{k'}] = -n\pi_k\pi_{k'}$ for $k \ne k'$

Marginally, each component of a multinomial distribution is a binomial distribution with $Y_k \sim Bin(n, \pi_k)$.
Multinomial: Dirichlet distribution

Let $\pi = (\pi_1, \ldots, \pi_K)$ have a Dirichlet distribution, i.e. $\pi \sim Dir(a)$, with concentration parameter $a = (a_1, \ldots, a_K)$ where $a_k > 0$ for all $k$. The probability density function (pdf) for $\pi$ is

\[ p(\pi) = \frac{1}{Beta(a)} \prod_{k=1}^K \pi_k^{a_k - 1} \]

with $\sum_{k=1}^K \pi_k = 1$, where $Beta(a)$ is the multivariate beta function, i.e.

\[ Beta(a) = \frac{\prod_{k=1}^K \Gamma(a_k)}{\Gamma\left(\sum_{k=1}^K a_k\right)}. \]
Multinomial: Properties of the Dirichlet distribution

The Dirichlet distribution with pdf

\[ p(\pi) \propto \prod_{k=1}^K \pi_k^{a_k - 1} \]

has the following properties (where $a_0 = \sum_{k=1}^K a_k$):

- $E[\pi_k] = \frac{a_k}{a_0}$
- $Var[\pi_k] = \frac{a_k(a_0 - a_k)}{a_0^2(a_0 + 1)}$
- $Cov[\pi_k, \pi_{k'}] = \frac{-a_k a_{k'}}{a_0^2(a_0 + 1)}$ for $k \ne k'$

Marginally, each component of a Dirichlet distribution is a beta distribution with $\pi_k \sim Be(a_k, a_0 - a_k)$.
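The Dirichlet moment formulas can be evaluated directly. The sketch below is a plain-Python illustration; the helper name `dirichlet_moments` and the concentration vector are assumptions made for the example.

```python
def dirichlet_moments(a):
    """Return the mean vector and covariance matrix of Dir(a)."""
    a0 = sum(a)
    K = len(a)
    mean = [a_k / a0 for a_k in a]
    denom = a0 ** 2 * (a0 + 1)
    cov = [
        [
            (a[j] * (a0 - a[j]) if j == k else -a[j] * a[k]) / denom
            for k in range(K)
        ]
        for j in range(K)
    ]
    return mean, cov

a = [1.0, 2.0, 3.0]  # hypothetical concentration parameters
mean, cov = dirichlet_moments(a)
print(mean)
print(cov[0][0])  # Var[pi_1] = a_1 (a_0 - a_1) / (a_0^2 (a_0 + 1))

# Because the components sum to 1, each row of the covariance matrix sums to 0.
row_sums = [sum(row) for row in cov]
print(row_sums)
```

The zero row sums reflect the sum-to-one constraint on $\pi$, which is also why all off-diagonal covariances are negative.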
Multinomial: Bayesian inference

The conjugate prior for a multinomial distribution, i.e. $Y \sim Mult(n, \pi)$, with unknown probability vector $\pi$ is a Dirichlet distribution. The Jeffreys prior is a Dirichlet distribution with $a_k = 0.5$ for all $k$. Some argue that, for large $K$, this prior puts too much mass on rare categories, and instead suggest the Dirichlet prior with $a_k = 1/K$ for all $k$.

The posterior under a Dirichlet prior is

\[
\begin{aligned}
p(\pi \mid y) &\propto p(y \mid \pi)\, p(\pi) \\
&\propto \left[ \prod_{k=1}^K \pi_k^{y_k} \right] \left[ \prod_{k=1}^K \pi_k^{a_k - 1} \right] \\
&= \prod_{k=1}^K \pi_k^{a_k + y_k - 1}.
\end{aligned}
\]

Thus $\pi \mid y \sim Dir(a + y)$.
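The conjugate update amounts to adding the observed counts to the prior concentration parameters. A minimal sketch follows, using only the Python standard library; the counts, the seed, and the helper names `dirichlet_posterior` and `sample_dirichlet` are hypothetical. Posterior draws are generated by normalizing independent gamma variates, a standard way to simulate from a Dirichlet.

```python
import random

def dirichlet_posterior(a, y):
    """Posterior concentration a + y for prior Dir(a) and multinomial counts y."""
    return [a_k + y_k for a_k, y_k in zip(a, y)]

def sample_dirichlet(a, rng):
    """One draw from Dir(a) by normalizing independent Gamma(a_k, 1) draws."""
    g = [rng.gammavariate(a_k, 1.0) for a_k in a]
    total = sum(g)
    return [g_k / total for g_k in g]

y = [12, 5, 3]        # hypothetical counts for K = 3 categories
a = [0.5, 0.5, 0.5]   # Jeffreys prior
a_post = dirichlet_posterior(a, y)
post_mean = [a_k / sum(a_post) for a_k in a_post]
print(a_post, post_mean)

rng = random.Random(544)  # arbitrary seed for reproducibility
draws = [sample_dirichlet(a_post, rng) for _ in range(1000)]
```

The draws can then be summarized, for example with quantiles of each component, to obtain posterior credible intervals for the $\pi_k$.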
Multivariate normal: Multivariate normal distribution

Let $Y = (Y_1, \ldots, Y_K)$ have a multivariate normal distribution, i.e. $Y \sim N_K(\mu, \Sigma)$ with mean $\mu$ and variance-covariance matrix $\Sigma$. The probability density function (pdf) for $Y$ is

\[ p(y) = (2\pi)^{-K/2} |\Sigma|^{-1/2} \exp\left( -\frac{1}{2} (y - \mu)^\top \Sigma^{-1} (y - \mu) \right). \]
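For computation, the density is usually evaluated on the log scale. The following is a minimal NumPy sketch, not a reference implementation; the function name `mvn_logpdf` is an assumption. It uses `np.linalg.slogdet` and `np.linalg.solve` to avoid forming $\Sigma^{-1}$ explicitly.

```python
import numpy as np

def mvn_logpdf(y, mu, Sigma):
    """Log density of N_K(mu, Sigma) at y."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    K = y.shape[0]
    diff = y - mu
    _, logdet = np.linalg.slogdet(Sigma)         # log |Sigma|
    quad = diff @ np.linalg.solve(Sigma, diff)   # (y - mu)' Sigma^{-1} (y - mu)
    return -0.5 * (K * np.log(2 * np.pi) + logdet + quad)

# Bivariate normal with unit variances and correlation 0.8
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
print(mvn_logpdf([0.5, -0.5], [0.0, 0.0], Sigma))
```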
Multivariate normal: Bivariate normal contours

[Figure: contours of a bivariate normal with correlation of 0.8; both axes range from -3 to 3.]
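Multivariate normal: Properties of the multivariate normal distribution

The multivariate normal distribution has the following properties:

- For any subvector $Y_{\mathbf{k}}$ of $Y$ where $\mathbf{k} \subset \{1, 2, \ldots, K\}$ with $|\mathbf{k}| = d$, we have $Y_{\mathbf{k}} \sim N_d(\mu_{\mathbf{k}}, \Sigma_{\mathbf{k},\mathbf{k}})$ where $\mu_{\mathbf{k}}$ contains the corresponding elements from $\mu$ and $\Sigma_{\mathbf{k},\mathbf{k}}$ is the submatrix of $\Sigma$ constructed by extracting rows $\mathbf{k}$ and columns $\mathbf{k}$.
- $Cov[Y_{\mathbf{k}}, Y_{\mathbf{k}'}] = \Sigma_{\mathbf{k},\mathbf{k}'}$ is the submatrix of $\Sigma$ constructed by extracting rows $\mathbf{k}$ and columns $\mathbf{k}'$.
- Conditional distributions are also normal, i.e. for $\mathbf{k} \cap \mathbf{k}' = \emptyset$, if

\[
\begin{pmatrix} Y_{\mathbf{k}} \\ Y_{\mathbf{k}'} \end{pmatrix}
\sim N\left(
\begin{pmatrix} \mu_{\mathbf{k}} \\ \mu_{\mathbf{k}'} \end{pmatrix},
\begin{pmatrix} \Sigma_{\mathbf{k},\mathbf{k}} & \Sigma_{\mathbf{k},\mathbf{k}'} \\ \Sigma_{\mathbf{k}',\mathbf{k}} & \Sigma_{\mathbf{k}',\mathbf{k}'} \end{pmatrix}
\right)
\]

  then

\[
Y_{\mathbf{k}} \mid Y_{\mathbf{k}'} = y_{\mathbf{k}'} \sim N\left(
\mu_{\mathbf{k}} + \Sigma_{\mathbf{k},\mathbf{k}'} \Sigma_{\mathbf{k}',\mathbf{k}'}^{-1} (y_{\mathbf{k}'} - \mu_{\mathbf{k}'}),\;
\Sigma_{\mathbf{k},\mathbf{k}} - \Sigma_{\mathbf{k},\mathbf{k}'} \Sigma_{\mathbf{k}',\mathbf{k}'}^{-1} \Sigma_{\mathbf{k}',\mathbf{k}}
\right).
\]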
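The conditional mean and covariance can be computed with a few lines of NumPy. This is a sketch under the assumption that the index sets are given as Python lists of zero-based positions; the function name `conditional_mvn` is hypothetical.

```python
import numpy as np

def conditional_mvn(mu, Sigma, idx_k, idx_kp, y_kp):
    """Mean and covariance of Y_k | Y_k' = y_k' when Y ~ N(mu, Sigma).

    idx_k and idx_kp are disjoint lists of zero-based indices.
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    y_kp = np.asarray(y_kp, dtype=float)
    S_kk = Sigma[np.ix_(idx_k, idx_k)]
    S_kkp = Sigma[np.ix_(idx_k, idx_kp)]
    S_kpkp = Sigma[np.ix_(idx_kp, idx_kp)]
    # A = Sigma_{k,k'} Sigma_{k',k'}^{-1}, computed with a linear solve
    A = np.linalg.solve(S_kpkp, S_kkp.T).T
    cond_mean = mu[idx_k] + A @ (y_kp - mu[idx_kp])
    cond_cov = S_kk - A @ S_kkp.T
    return cond_mean, cond_cov

# Standard bivariate normal with correlation 0.8, conditioning on Y_2 = 1
mu = [0.0, 0.0]
Sigma = [[1.0, 0.8], [0.8, 1.0]]
cond_mean, cond_cov = conditional_mvn(mu, Sigma, [0], [1], [1.0])
print(cond_mean, cond_cov)
```

In this example the conditional mean is pulled toward the observed value by the correlation, and the conditional variance is smaller than the marginal variance.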
Multivariate normal: Representing independence in a multivariate normal

Let $Y \sim N(\mu, \Sigma)$ with precision matrix $\Omega = \Sigma^{-1}$.

- If $\Sigma_{k,k'} = 0$, then $Y_k$ and $Y_{k'}$ are independent of each other.
- If $\Omega_{k,k'} = 0$, then $Y_k$ and $Y_{k'}$ are conditionally independent of each other given $Y_j$ for $j \ne k, k'$.
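A small numerical illustration of the distinction between zeros in $\Sigma$ and zeros in $\Omega$, using NumPy. The tridiagonal precision matrix below is a made-up example: $Y_1$ and $Y_3$ are marginally dependent but conditionally independent given $Y_2$.

```python
import numpy as np

# Hypothetical precision matrix with Omega[0, 2] = 0
Omega = np.array([
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
])
Sigma = np.linalg.inv(Omega)
print(Sigma[0, 2])  # nonzero: Y_1 and Y_3 are marginally dependent

# Covariance of (Y_1, Y_3) given Y_2, from the conditional normal formula
k, kp = [0, 2], [1]
S_kk = Sigma[np.ix_(k, k)]
S_kkp = Sigma[np.ix_(k, kp)]
S_kpkp = Sigma[np.ix_(kp, kp)]
cond_cov = S_kk - S_kkp @ np.linalg.solve(S_kpkp, S_kkp.T)
print(cond_cov)     # off-diagonal is zero: conditionally independent given Y_2
```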
Multivariate normal: Unknown mean: Default inference with an unknown mean

Let $Y_i \stackrel{ind}{\sim} N_K(\mu, S)$ with default prior $p(\mu) \propto 1$ where $Y_i = (Y_{i1}, \ldots, Y_{iK})$. Then

\[
\begin{aligned}
p(\mu \mid y) &\propto p(y \mid \mu)\, p(\mu) \\
&\propto \exp\left( -\frac{1}{2} \sum_{i=1}^n (y_i - \mu)^\top S^{-1} (y_i - \mu) \right) \\
&= \exp\left( -\frac{1}{2} \mathrm{tr}(S^{-1} S_0) \right)
\end{aligned}
\]

where

\[ S_0 = \sum_{i=1}^n (y_i - \mu)(y_i - \mu)^\top. \]

This posterior is proper if $n \ge 1$ (text has a typo) and, in that case, is

\[ \mu \mid y \sim N_K\left( \bar{y}, \frac{1}{n} S \right) \]

where $\bar{y} = (\bar{y}_1, \ldots, \bar{y}_K)$ has elements

\[ \bar{y}_k = \frac{1}{n} \sum_{i=1}^n y_{ik}. \]
Multivariate normal: Unknown mean: Conjugate inference with an unknown mean

Let $Y_i \stackrel{ind}{\sim} N(\mu, S)$ with conjugate prior $\mu \sim N_K(m, C)$. Then

\[
\begin{aligned}
p(\mu \mid y) &\propto p(y \mid \mu)\, p(\mu) \\
&\propto \exp\left( -\frac{1}{2} \sum_{i=1}^n (y_i - \mu)^\top S^{-1} (y_i - \mu) \right)
\times \exp\left( -\frac{1}{2} (\mu - m)^\top C^{-1} (\mu - m) \right) \\
&\propto \exp\left( -\frac{1}{2} (\mu - m')^\top C'^{-1} (\mu - m') \right)
\end{aligned}
\]

and thus $\mu \mid y \sim N(m', C')$ where

\[
\begin{aligned}
C' &= \left[ C^{-1} + n S^{-1} \right]^{-1} \\
m' &= C' \left[ C^{-1} m + n S^{-1} \bar{y} \right].
\end{aligned}
\]
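The conjugate update for the mean translates directly into NumPy. The sketch below is illustrative: the data, the function name `mvn_mean_posterior`, and the prior values are assumptions. With a very diffuse prior (large $C$), the result should approach the default-prior posterior $N_K(\bar{y}, S/n)$.

```python
import numpy as np

def mvn_mean_posterior(y, S, m, C):
    """Posterior N(m_post, C_post) for mu when Y_i ~ N(mu, S) and mu ~ N(m, C)."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    ybar = y.mean(axis=0)
    S_inv = np.linalg.inv(S)
    C_inv = np.linalg.inv(C)
    C_post = np.linalg.inv(C_inv + n * S_inv)          # C' = [C^{-1} + n S^{-1}]^{-1}
    m_post = C_post @ (C_inv @ m + n * S_inv @ ybar)   # m' = C'[C^{-1} m + n S^{-1} ybar]
    return m_post, C_post

# Hypothetical data: n = 4 observations in K = 2 dimensions
y = np.array([[1.0, 2.0], [2.0, 1.5], [3.0, 3.5], [2.0, 3.0]])
S = np.array([[1.0, 0.5], [0.5, 2.0]])   # known data covariance (assumed)
m = np.zeros(2)

m_post, C_post = mvn_mean_posterior(y, S, m, np.eye(2))
m_diffuse, C_diffuse = mvn_mean_posterior(y, S, m, 1e8 * np.eye(2))
print(m_post, m_diffuse)   # the diffuse-prior mean is close to ybar
```

The posterior precision is the sum of the prior precision and the data precision, so the posterior mean is a precision-weighted combination of $m$ and $\bar{y}$.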
Multivariate normal: Unknown mean: Inverse Wishart distribution

Let the $K \times K$ matrix $\Sigma$ have an inverse Wishart distribution, i.e. $\Sigma \sim IW(v, W^{-1})$, with degrees of freedom $v > K - 1$ and positive definite scale matrix $W$. The pdf for $\Sigma$ is

\[ p(\Sigma) \propto |\Sigma|^{-(v + K + 1)/2} \exp\left( -\frac{1}{2} \mathrm{tr}\left( W \Sigma^{-1} \right) \right). \]
Multivariate normal: Unknown mean: Properties of the inverse Wishart distribution

The inverse Wishart distribution with pdf

\[ p(\Sigma) \propto |\Sigma|^{-(v + K + 1)/2} \exp\left( -\frac{1}{2} \mathrm{tr}\left( W \Sigma^{-1} \right) \right) \]

has the following properties:

- $E[\Sigma] = (v - K - 1)^{-1} W$ for $v > K + 1$.
- Marginally, $\sigma_k^2 = \Sigma_{kk} \sim \text{Inv-}\chi^2(v, W_{kk})$.
- If a $K \times K$ matrix $\Sigma^{-1}$ has a Wishart distribution, i.e. $\Sigma^{-1} \sim Wishart(v, W)$, then $\Sigma \sim IW(v, W^{-1})$.
Multivariate normal: Unknown mean: Normal-inverse Wishart distribution

A multivariate generalization of the normal-scaled-inverse-$\chi^2$ distribution is the normal-inverse Wishart distribution. For a vector $\mu \in \mathbb{R}^K$ and $K \times K$ matrix $\Sigma$, the normal-inverse Wishart distribution is

\[
\begin{aligned}
\mu \mid \Sigma &\sim N(m, \Sigma / c) \\
\Sigma &\sim IW(v, W^{-1}).
\end{aligned}
\]

The marginal distribution for $\mu$, i.e.

\[ p(\mu) = \int p(\mu \mid \Sigma)\, p(\Sigma)\, d\Sigma, \]

is a multivariate t-distribution, i.e.

\[ \mu \sim t_{v - K + 1}\left( m, \frac{W}{c(v - K + 1)} \right). \]
Multivariate normal: Unknown mean and covariance: Conjugate inference with unknown mean and covariance

Let $Y_i \stackrel{ind}{\sim} N(\mu, \Sigma)$ with conjugate prior

\[
\begin{aligned}
\Sigma &\sim IW(v, W^{-1}) \\
\mu \mid \Sigma &\sim N(m, \Sigma / c)
\end{aligned}
\]

which has pdf

\[ p(\mu, \Sigma) \propto |\Sigma|^{-((v + K)/2 + 1)} \exp\left( -\frac{1}{2} \mathrm{tr}(W \Sigma^{-1}) - \frac{c}{2} (\mu - m)^\top \Sigma^{-1} (\mu - m) \right). \]

The posterior is a normal-inverse Wishart with parameters

\[
\begin{aligned}
c' &= c + n \\
v' &= v + n \\
m' &= \frac{c}{c'} m + \frac{n}{c'} \bar{y} \\
W' &= W + S + \frac{cn}{c'} (\bar{y} - m)(\bar{y} - m)^\top
\end{aligned}
\]

where

\[ S = \sum_{i=1}^n (y_i - \bar{y})(y_i - \bar{y})^\top. \]
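The four normal-inverse Wishart update equations can be sketched in NumPy as follows. The function name `niw_posterior`, the data, and the prior hyperparameters are all hypothetical choices for illustration.

```python
import numpy as np

def niw_posterior(y, m, c, v, W):
    """Posterior hyperparameters (m', c', v', W') of the normal-inverse Wishart."""
    y = np.asarray(y, dtype=float)
    m = np.asarray(m, dtype=float)
    W = np.asarray(W, dtype=float)
    n = y.shape[0]
    ybar = y.mean(axis=0)
    resid = y - ybar
    S = resid.T @ resid                     # sum of squares about the sample mean
    c_post = c + n
    v_post = v + n
    m_post = (c * m + n * ybar) / c_post    # weighted average of prior mean and ybar
    d = (ybar - m).reshape(-1, 1)
    W_post = W + S + (c * n / c_post) * (d @ d.T)
    return m_post, c_post, v_post, W_post

# Hypothetical data: n = 4 observations in K = 2 dimensions
y = np.array([[1.0, 2.0], [2.0, 1.5], [3.0, 3.5], [2.0, 3.0]])
m_post, c_post, v_post, W_post = niw_posterior(
    y, m=np.zeros(2), c=1.0, v=4.0, W=np.eye(2)
)
print(m_post, c_post, v_post)
print(W_post)
```

Here $c$ acts like a prior sample size for the mean and $v$ like a prior sample size for the covariance, so both simply increase by $n$.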