Introduction to Bayesian Statistics
Lecture 7: Multiparameter Models (III)
Rung-Ching Tsai
Department of Mathematics, National Taiwan Normal University
April 15, 2015
Multiparameter model: the multinomial model
• $y = (y_1, \dots, y_J) \sim \text{Multinomial}(n; \theta_1, \dots, \theta_J)$ with $\sum_{j=1}^J y_j = n$; use the Bayesian approach to estimate $\theta = (\theta_1, \dots, \theta_J)$. i.e.,
  ◦ Likelihood:
      $p(y \mid \theta) \propto \prod_{j=1}^J \theta_j^{y_j}$
  ◦ Prior of $\theta$: choose the conjugate prior, a Dirichlet distribution $\text{Dirichlet}(\alpha_1, \dots, \alpha_J)$, for $\theta$:
      $p(\theta \mid \alpha) \propto \prod_{j=1}^J \theta_j^{\alpha_j - 1}$ with $\sum_{j=1}^J \theta_j = 1$,
    where the Dirichlet is a multivariate generalization of the beta distribution.
  ◦ Posterior of $\theta$:
      $p(\theta \mid y) \propto p(\theta)\, p(y \mid \theta) \propto \prod_{j=1}^J \theta_j^{\alpha_j + y_j - 1}$, i.e., $\theta \mid y \sim \text{Dirichlet}(\alpha_1 + y_1, \dots, \alpha_J + y_J)$
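The Dirichlet-multinomial update above reduces to adding the observed counts to the prior parameters. A minimal numpy sketch; the counts `y` and prior `alpha` are made-up illustration values, not from the lecture:

```python
import numpy as np

# Illustrative multinomial counts y and Dirichlet prior parameters alpha.
y = np.array([20, 30, 50])          # sum(y) = n = 100
alpha = np.array([1.0, 1.0, 1.0])   # uniform Dirichlet prior

# Conjugacy: theta | y ~ Dirichlet(alpha + y)
alpha_post = alpha + y

# Posterior mean: E(theta_j | y) = (alpha_j + y_j) / sum_k(alpha_k + y_k)
post_mean = alpha_post / alpha_post.sum()

# Monte Carlo draws from the posterior agree with the analytic mean
rng = np.random.default_rng(0)
draws = rng.dirichlet(alpha_post, size=10_000)
print(post_mean)
print(draws.mean(axis=0))
```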
Multiparameter model: the multivariate normal model
• $y_1, \dots, y_n \overset{iid}{\sim} \text{MVN}(\mu, \Sigma)$, $\Sigma$ known; use the Bayesian approach to estimate $\mu$.
  ◦ choose a conjugate prior for $\mu$: $\mu \sim \text{MVN}(\mu_0, \Lambda_0)$,
      $p(\mu) \propto |\Lambda_0|^{-1/2} \exp\!\left( -\tfrac{1}{2} (\mu - \mu_0)^T \Lambda_0^{-1} (\mu - \mu_0) \right)$
  ◦ likelihood of $\mu$:
      $p(y_1, \dots, y_n \mid \mu, \Sigma) \propto |\Sigma|^{-n/2} \exp\!\left( -\tfrac{1}{2} \sum_{i=1}^n (y_i - \mu)^T \Sigma^{-1} (y_i - \mu) \right) = |\Sigma|^{-n/2} \exp\!\left( -\tfrac{1}{2} \text{tr}(\Sigma^{-1} S_0) \right)$
    where $S_0 = \sum_{i=1}^n (y_i - \mu)(y_i - \mu)^T$
Multiparameter model: multivariate normal, $\Sigma$ known
• $y_1, \dots, y_n \overset{iid}{\sim} \text{MVN}(\mu, \Sigma)$, $\Sigma$ known; use the Bayesian approach to estimate $\mu$.
  ◦ find the posterior distribution of $\mu$:
      $p(\mu \mid y_1, \dots, y_n, \Sigma) \propto p(\mu)\, p(y_1, \dots, y_n \mid \mu)$
      $\propto \exp\!\left( -\tfrac{1}{2} \left[ (\mu - \mu_0)^T \Lambda_0^{-1} (\mu - \mu_0) + \sum_{i=1}^n (y_i - \mu)^T \Sigma^{-1} (y_i - \mu) \right] \right)$
      $\propto \exp\!\left( -\tfrac{1}{2} (\mu - \mu_n)^T \Lambda_n^{-1} (\mu - \mu_n) \right)$
    that is, $\mu \mid y_1, \dots, y_n, \Sigma \sim \text{MVN}(\mu_n, \Lambda_n)$, where
      $\mu_n = (\Lambda_0^{-1} + n\Sigma^{-1})^{-1} (\Lambda_0^{-1} \mu_0 + n\Sigma^{-1} \bar{y})$ and $\Lambda_n^{-1} = \Lambda_0^{-1} + n\Sigma^{-1}$
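The posterior mean and covariance above are a precision-weighted combination of prior and data. A numpy sketch under assumed (illustrative) values for $\Sigma$, $\mu_0$, $\Lambda_0$:

```python
import numpy as np

# Synthetic data and illustrative prior hyperparameters.
rng = np.random.default_rng(1)
d, n = 2, 50
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])   # known data covariance
mu_true = np.array([1.0, -1.0])
y = rng.multivariate_normal(mu_true, Sigma, size=n)

mu0 = np.zeros(d)                # prior mean
Lambda0 = 10.0 * np.eye(d)       # weak prior covariance

# Posterior precision and mean:
#   Lambda_n^{-1} = Lambda_0^{-1} + n Sigma^{-1}
#   mu_n = Lambda_n (Lambda_0^{-1} mu_0 + n Sigma^{-1} ybar)
ybar = y.mean(axis=0)
Sigma_inv = np.linalg.inv(Sigma)
Lambda0_inv = np.linalg.inv(Lambda0)
Lambda_n = np.linalg.inv(Lambda0_inv + n * Sigma_inv)
mu_n = Lambda_n @ (Lambda0_inv @ mu0 + n * Sigma_inv @ ybar)
print(mu_n)   # close to ybar, shrunk slightly toward mu0
```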
Multiparameter model: multivariate normal, $\Sigma$ known
• $\mu \mid y_1, \dots, y_n, \Sigma \sim \text{MVN}(\mu_n, \Lambda_n)$, where
    $\mu_n = (\Lambda_0^{-1} + n\Sigma^{-1})^{-1} (\Lambda_0^{-1} \mu_0 + n\Sigma^{-1} \bar{y})$ and $\Lambda_n^{-1} = \Lambda_0^{-1} + n\Sigma^{-1}$
• Partition $\mu = \begin{pmatrix} \mu^{(1)} \\ \mu^{(2)} \end{pmatrix}$, $\mu_n = \begin{pmatrix} \mu_n^{(1)} \\ \mu_n^{(2)} \end{pmatrix}$, and $\Lambda_n = \begin{pmatrix} \Lambda_n^{(11)} & \Lambda_n^{(12)} \\ \Lambda_n^{(21)} & \Lambda_n^{(22)} \end{pmatrix}$.
  ◦ posterior marginal distribution of a subvector of $\mu$:
      $\mu^{(1)} \mid y_1, \dots, y_n, \Sigma \sim \text{MVN}(\mu_n^{(1)}, \Lambda_n^{(11)})$
  ◦ posterior conditional distribution of a subvector of $\mu$:
      $\mu^{(1)} \mid \mu^{(2)}, y_1, \dots, y_n, \Sigma \sim \text{MVN}\!\left( \mu_n^{(1)} + \beta^{1|2} (\mu^{(2)} - \mu_n^{(2)}),\ \Lambda^{1|2} \right)$
    where $\beta^{1|2} = \Lambda_n^{(12)} (\Lambda_n^{(22)})^{-1}$ and $\Lambda^{1|2} = \Lambda_n^{(11)} - \Lambda_n^{(12)} (\Lambda_n^{(22)})^{-1} \Lambda_n^{(21)}$.
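The conditioning formulas above are the standard partitioned-Gaussian results: a regression coefficient matrix and a Schur complement. A small numpy sketch with made-up values of $\mu_n$ and $\Lambda_n$:

```python
import numpy as np

# Illustrative posterior mean and covariance for a 3-dimensional mu,
# partitioned into mu^(1) = first 2 coordinates, mu^(2) = last coordinate.
mu_n = np.array([0.5, -0.2, 1.0])
Lambda_n = np.array([[2.0, 0.5, 0.3],
                     [0.5, 1.5, 0.4],
                     [0.3, 0.4, 1.0]])
k = 2                                   # size of the first block
L11 = Lambda_n[:k, :k]
L12 = Lambda_n[:k, k:]
L21 = Lambda_n[k:, :k]
L22 = Lambda_n[k:, k:]

# beta^{1|2} = Lambda^(12) (Lambda^(22))^{-1}
beta = L12 @ np.linalg.inv(L22)
# Lambda^{1|2} = Lambda^(11) - Lambda^(12) (Lambda^(22))^{-1} Lambda^(21)
Lambda_cond = L11 - L12 @ np.linalg.inv(L22) @ L21

# Conditional mean of mu^(1) given an observed value of mu^(2):
mu2_obs = np.array([0.0])
cond_mean = mu_n[:k] + beta @ (mu2_obs - mu_n[k:])
print(cond_mean, Lambda_cond)
```

Note that conditioning can only shrink the diagonal of the covariance: each diagonal entry of `Lambda_cond` is at most the corresponding entry of `L11`.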
Multiparameter model: multivariate normal, $\Sigma$ known
• $\mu \mid y_1, \dots, y_n, \Sigma \sim \text{MVN}(\mu_n, \Lambda_n)$, where
    $\mu_n = (\Lambda_0^{-1} + n\Sigma^{-1})^{-1} (\Lambda_0^{-1} \mu_0 + n\Sigma^{-1} \bar{y})$ and $\Lambda_n^{-1} = \Lambda_0^{-1} + n\Sigma^{-1}$
• Let $\tilde{y} \sim \text{MVN}(\mu, \Sigma)$ be a new observation.
  ◦ posterior predictive distribution of $\tilde{y}$, $\Sigma$ known:
      $p(\tilde{y}, \mu \mid y_1, \dots, y_n) = \text{N}(\tilde{y} \mid \mu, \Sigma)\, \text{N}(\mu \mid \mu_n, \Lambda_n)$
    is the exponential of a quadratic form in $(\tilde{y}, \mu)$, hence $\tilde{y} \mid y_1, \dots, y_n \sim \text{N}(\mu_n, \Sigma + \Lambda_n)$, where
      $\text{E}(\tilde{y} \mid y) = \text{E}(\text{E}(\tilde{y} \mid \mu, y) \mid y) = \text{E}(\mu \mid y) = \mu_n$
      $\text{var}(\tilde{y} \mid y) = \text{E}(\text{var}(\tilde{y} \mid \mu, y) \mid y) + \text{var}(\text{E}(\tilde{y} \mid \mu, y) \mid y) = \text{E}(\Sigma \mid y) + \text{var}(\mu \mid y) = \Sigma + \Lambda_n$
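The variance decomposition above can be checked by simulation: draw $\mu$ from its posterior, then $\tilde{y}$ given $\mu$, and compare the sample covariance of $\tilde{y}$ with $\Sigma + \Lambda_n$. All numerical values below are illustrative:

```python
import numpy as np

# Monte Carlo check of ytilde | y ~ N(mu_n, Sigma + Lambda_n).
rng = np.random.default_rng(2)
mu_n = np.array([1.0, -1.0])
Lambda_n = np.array([[0.2, 0.05], [0.05, 0.1]])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])

m = 200_000
# Step 1: mu ~ N(mu_n, Lambda_n); Step 2: ytilde | mu ~ N(mu, Sigma)
mu_draws = rng.multivariate_normal(mu_n, Lambda_n, size=m)
ytilde = mu_draws + rng.multivariate_normal(np.zeros(2), Sigma, size=m)

print(ytilde.mean(axis=0))   # close to mu_n
print(np.cov(ytilde.T))      # close to Sigma + Lambda_n
```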
Multiparameter model: multivariate normal, $\Sigma$ known
• $y_1, \dots, y_n \overset{iid}{\sim} \text{MVN}(\mu, \Sigma)$, $\Sigma$ known; use the Bayesian approach to estimate $\mu$.
  ◦ prior for $\mu$: choose a non-informative prior, $p(\mu) \propto 1$
  ◦ likelihood of $\mu$:
      $p(y_1, \dots, y_n \mid \mu, \Sigma) \propto |\Sigma|^{-n/2} \exp\!\left( -\tfrac{1}{2} \sum_{i=1}^n (y_i - \mu)^T \Sigma^{-1} (y_i - \mu) \right) = |\Sigma|^{-n/2} \exp\!\left( -\tfrac{1}{2} \text{tr}(\Sigma^{-1} S_0) \right)$
    where $S_0 = \sum_{i=1}^n (y_i - \mu)(y_i - \mu)^T$
  ◦ posterior for $\mu$:
      $p(\mu \mid y_1, \dots, y_n, \Sigma) \propto p(\mu)\, p(y_1, \dots, y_n \mid \mu, \Sigma) \propto p(y_1, \dots, y_n \mid \mu, \Sigma)$,
    i.e., $\mu \mid \Sigma, y_1, \dots, y_n \sim \text{MVN}(\bar{y}, \Sigma/n)$.
Multivariate normal model, $\Sigma$ unknown
• $y_1, \dots, y_n \overset{iid}{\sim} \text{MVN}(\mu, \Sigma)$, both $\mu$ and $\Sigma$ unknown; use the Bayesian approach to estimate $(\mu, \Sigma)$.
  ◦ take a conjugate prior for $(\mu, \Sigma)$: $p(\mu, \Sigma) = p(\Sigma)\, p(\mu \mid \Sigma)$ with
      $\Sigma \sim \text{Inv-Wishart}_{\nu_0}(\Lambda_0^{-1})$
      $\mu \mid \Sigma \sim \text{MVN}(\mu_0, \Sigma/\kappa_0)$
    i.e., the joint prior density is
      $p(\mu, \Sigma) \propto |\Sigma|^{-((\nu_0 + d)/2 + 1)} \exp\!\left( -\tfrac{1}{2} \text{tr}(\Lambda_0 \Sigma^{-1}) - \tfrac{\kappa_0}{2} (\mu - \mu_0)^T \Sigma^{-1} (\mu - \mu_0) \right)$.
    We label this the $\text{N-Inv-Wishart}(\mu_0, \Lambda_0/\kappa_0; \nu_0, \Lambda_0)$ distribution.
  ◦ likelihood:
      $p(y_1, \dots, y_n \mid \mu, \Sigma) \propto |\Sigma|^{-n/2} \exp\!\left( -\tfrac{1}{2} \text{tr}(\Sigma^{-1} S_0) \right)$
    where $S_0 = \sum_{i=1}^n (y_i - \mu)(y_i - \mu)^T$
Joint posterior distribution, $p(\mu, \Sigma \mid y_1, \dots, y_n)$
• $y_1, \dots, y_n \overset{iid}{\sim} \text{MVN}(\mu, \Sigma)$
  ◦ prior of $(\mu, \Sigma)$: $(\mu, \Sigma) \sim \text{N-Inv-Wishart}(\mu_0, \Lambda_0/\kappa_0; \nu_0, \Lambda_0)$
  ◦ the joint posterior distribution of $(\mu, \Sigma)$:
      $p(\mu, \Sigma \mid y_1, \dots, y_n) \propto p(\mu, \Sigma)\, p(y_1, \dots, y_n \mid \mu, \Sigma)$
      $\propto |\Sigma|^{-((\nu_0 + d)/2 + 1)} \exp\!\left( -\tfrac{1}{2} \text{tr}(\Lambda_0 \Sigma^{-1}) - \tfrac{\kappa_0}{2} (\mu - \mu_0)^T \Sigma^{-1} (\mu - \mu_0) \right) \times |\Sigma|^{-n/2} \exp\!\left( -\tfrac{1}{2} \text{tr}(\Sigma^{-1} S_0) \right)$
      $= \text{N-Inv-Wishart}(\mu_n, \Lambda_n/\kappa_n; \nu_n, \Lambda_n)$,   (1)
    where
    • $\mu_n = \frac{\kappa_0}{\kappa_0 + n} \mu_0 + \frac{n}{\kappa_0 + n} \bar{y}$
    • $\kappa_n = \kappa_0 + n$
    • $\nu_n = \nu_0 + n$
    • $\Lambda_n = \Lambda_0 + S + \frac{\kappa_0 n}{\kappa_0 + n} (\bar{y} - \mu_0)(\bar{y} - \mu_0)^T$ with $S = \sum_{i=1}^n (y_i - \bar{y})(y_i - \bar{y})^T$
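The four hyperparameter updates above are simple arithmetic on the data summaries $\bar{y}$ and $S$. A numpy sketch with illustrative prior settings:

```python
import numpy as np

# Conjugate N-Inv-Wishart update; prior hyperparameters are illustrative.
rng = np.random.default_rng(3)
d, n = 2, 40
y = rng.multivariate_normal([0.5, -0.5], np.eye(d), size=n)

mu0 = np.zeros(d)
kappa0 = 1.0
nu0 = d + 2
Lambda0 = np.eye(d)

# Data summaries: sample mean and sum of squares about the mean.
ybar = y.mean(axis=0)
S = (y - ybar).T @ (y - ybar)

# Posterior hyperparameters, exactly as on the slide.
kappa_n = kappa0 + n
nu_n = nu0 + n
mu_n = (kappa0 * mu0 + n * ybar) / kappa_n
diff = (ybar - mu0).reshape(-1, 1)
Lambda_n = Lambda0 + S + (kappa0 * n / kappa_n) * (diff @ diff.T)
print(mu_n, kappa_n, nu_n)
```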
Conditional posterior distribution, $p(\mu \mid \Sigma, y_1, \dots, y_n)$
• $p(\mu, \Sigma \mid y_1, \dots, y_n) = p(\mu \mid \Sigma, y_1, \dots, y_n)\, p(\Sigma \mid y_1, \dots, y_n)$
• the conditional posterior density of $\mu$ given $\Sigma$ is proportional to the joint posterior density (1) with $\Sigma$ held constant:
    $\mu \mid \Sigma, y_1, \dots, y_n \sim \text{MVN}(\mu_n, \Sigma/\kappa_n)$
Marginal posterior distribution, $p(\Sigma \mid y_1, \dots, y_n)$
• $p(\mu, \Sigma \mid y_1, \dots, y_n) = p(\mu \mid \Sigma, y_1, \dots, y_n)\, p(\Sigma \mid y_1, \dots, y_n)$
• $p(\Sigma \mid y_1, \dots, y_n)$ requires averaging the joint posterior $p(\mu, \Sigma \mid y_1, \dots, y_n)$ over $\mu$; as a result, we have
    $\Sigma \mid y_1, \dots, y_n \sim \text{Inv-Wishart}_{\nu_n}(\Lambda_n^{-1})$
  where $\Lambda_n = \Lambda_0 + S + \frac{\kappa_0 n}{\kappa_0 + n} (\bar{y} - \mu_0)(\bar{y} - \mu_0)^T$ with $S = \sum_{i=1}^n (y_i - \bar{y})(y_i - \bar{y})^T$
Marginal posterior distribution of $\mu$, $p(\mu \mid y_1, \dots, y_n)$
• Estimand of interest: $\mu$
• To obtain the marginal posterior distribution of $\mu$:
  ◦ our result from the univariate normal generalizes to the multivariate case:
      $\mu \mid y_1, \dots, y_n \sim t_{\nu_n - d + 1}\!\left( \mu_n, \Lambda_n / (\kappa_n (\nu_n - d + 1)) \right)$
    where
    • $\mu_n = \frac{\kappa_0}{\kappa_0 + n} \mu_0 + \frac{n}{\kappa_0 + n} \bar{y}$
    • $\kappa_n = \kappa_0 + n$, $\nu_n = \nu_0 + n$
    • $\Lambda_n = \Lambda_0 + S + \frac{\kappa_0 n}{\kappa_0 + n} (\bar{y} - \mu_0)(\bar{y} - \mu_0)^T$ with $S = \sum_{i=1}^n (y_i - \bar{y})(y_i - \bar{y})^T$
  ◦ By simulation:
    • first draw $\Sigma$ from $p(\Sigma \mid y_1, \dots, y_n)$ with $\Sigma \mid y_1, \dots, y_n \sim \text{Inv-Wishart}_{\nu_n}(\Lambda_n^{-1})$,
    • then draw $\mu$ from $p(\mu \mid \Sigma, y_1, \dots, y_n)$ with $\mu \mid \Sigma, y_1, \dots, y_n \sim \text{MVN}(\mu_n, \Sigma/\kappa_n)$.
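The two-step simulation above can be sketched with numpy alone. An Inv-Wishart draw is obtained by inverting a Wishart draw, and the Wishart draw here is built as a sum of $\nu_n$ outer products of $\text{N}(0, \Lambda_n^{-1})$ vectors, which is valid because $\nu_n$ is an integer $\geq d$ in this example. The hyperparameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 2
mu_n = np.array([0.5, -0.5])
kappa_n = 41
nu_n = 44
Lambda_n = np.array([[1.2, 0.2], [0.2, 0.9]])

def draw_mu_sigma(rng):
    # Step 1: Sigma | y ~ Inv-Wishart_{nu_n}(Lambda_n^{-1}):
    # invert a Wishart_{nu_n}(Lambda_n^{-1}) draw.
    x = rng.multivariate_normal(np.zeros(d), np.linalg.inv(Lambda_n), size=nu_n)
    Sigma = np.linalg.inv(x.T @ x)
    # Step 2: mu | Sigma, y ~ MVN(mu_n, Sigma / kappa_n)
    mu = rng.multivariate_normal(mu_n, Sigma / kappa_n)
    return mu, Sigma

# Marginal draws of mu are centered at mu_n.
mus = np.array([draw_mu_sigma(rng)[0] for _ in range(5_000)])
print(mus.mean(axis=0))
```

For non-integer degrees of freedom, a Bartlett-decomposition sampler (or `scipy.stats.invwishart`) would be needed instead; the outer-product construction is only the simplest special case.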
The multivariate normal model: non-informative prior
• $y_1, \dots, y_n \overset{iid}{\sim} \text{MVN}(\mu, \Sigma)$, both $\mu$ and $\Sigma$ unknown; use the Bayesian approach to estimate $(\mu, \Sigma)$.
  ◦ a common non-informative prior is the Jeffreys prior density:
      $p(\mu, \Sigma) \propto |\Sigma|^{-(d+1)/2}$,
    which is the limit of the conjugate prior density as $\kappa_0 \to 0$, $\nu_0 \to -1$, $|\Lambda_0| \to 0$.
  ◦ the marginal and conditional posterior densities can be written as
      $\Sigma \mid y_1, \dots, y_n \sim \text{Inv-Wishart}_{n-1}(S)$,
      $\mu \mid \Sigma, y_1, \dots, y_n \sim \text{MVN}(\bar{y}, \Sigma/n)$.
  ◦ marginal posterior of $\mu$:
      $\mu \mid y_1, \dots, y_n \sim t_{n-d}\!\left( \bar{y}, S/(n(n - d)) \right)$.