Hierarchical Models
Applied Bayesian Statistics

Dr. Earvin Balderama
Department of Mathematics & Statistics, Loyola University Chicago
November 9, 2017

Last edited November 9, 2017 by <ebalderama@luc.edu>
Layers of hierarchy

The hierarchical modeling framework is popular in the Bayesian literature because MCMC is conducive to hierarchical models. Bayesian models can be written with the following basic layers of hierarchy:

1. Data layer: $[Y \mid \theta, \alpha]$ is the likelihood for the observed data $Y$.
2. Process layer: $[\theta \mid \alpha]$ is the model for the parameters $\theta$ that define the latent data-generating process.
3. Prior layer: $[\alpha]$ gives the priors for the hyperparameters.
Hierarchical models and MCMC

Example: One-way random effects model

$$Y_{ij} \sim \text{Normal}(\theta_i, \sigma^2) \quad \text{and} \quad \theta_i \sim \text{Normal}(\mu, \tau^2),$$

where $Y_{ij}$ is the $j$th replicate for unit $i$ and $\alpha = (\mu, \sigma^2, \tau^2)$ has an uninformative prior.

This hierarchy can be written using a directed acyclic graph (DAG), also called a Bayesian network or belief network.
Hierarchical models and MCMC

MCMC is efficient even if the number of parameters or levels of hierarchy is large. You only need to consider connected nodes when updating each parameter, e.g.,

1. $[\theta_i \mid \cdot]$
2. $[\mu \mid \cdot]$
3. $[\sigma^2 \mid \cdot]$
4. $[\tau^2 \mid \cdot]$

Each of these updates is a draw from a standard one-dimensional normal or inverse gamma distribution.
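The course examples use R, but the Gibbs updates above are easy to sketch directly. The following Python/numpy sketch implements the four conjugate updates for the one-way random effects model, assuming (as an illustration, not from the slides) InverseGamma(0.1, 0.1) priors on $\sigma^2$ and $\tau^2$ and a flat prior on $\mu$:

```python
import numpy as np

def gibbs_one_way(Y, n_iter=4000, a=0.1, b=0.1, seed=0):
    """Gibbs sampler for Y_ij ~ N(theta_i, sigma2), theta_i ~ N(mu, tau2),
    with a flat prior on mu and InverseGamma(a, b) priors on sigma2, tau2."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape                       # m units, n replicates per unit
    theta = Y.mean(axis=1).copy()
    mu, sigma2, tau2 = theta.mean(), Y.var(), theta.var() + 0.1
    draws = {"mu": [], "sigma2": [], "tau2": []}
    for _ in range(n_iter):
        # [theta_i | .]: normal, combining n replicates with the N(mu, tau2) prior
        prec = n / sigma2 + 1 / tau2
        mean = (n * Y.mean(axis=1) / sigma2 + mu / tau2) / prec
        theta = rng.normal(mean, np.sqrt(1 / prec))
        # [mu | .]: normal (flat prior on mu)
        mu = rng.normal(theta.mean(), np.sqrt(tau2 / m))
        # [sigma2 | .]: inverse gamma (draw a Gamma, then invert)
        sse = ((Y - theta[:, None]) ** 2).sum()
        sigma2 = 1 / rng.gamma(a + m * n / 2, 1 / (b + sse / 2))
        # [tau2 | .]: inverse gamma
        sst = ((theta - mu) ** 2).sum()
        tau2 = 1 / rng.gamma(a + m / 2, 1 / (b + sst / 2))
        draws["mu"].append(mu)
        draws["sigma2"].append(sigma2)
        draws["tau2"].append(tau2)
    return {k: np.array(v) for k, v in draws.items()}
```

On data simulated from the model, the posterior mean of $\mu$ (after burn-in) lands near the true value, since every update is an exact draw from its full conditional.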
Two-way random effects model

Example: Ozone measurements

    ozone <- read.csv("http://math.luc.edu/~ebalderama/bayes_resources/data/ozone.csv")

For the ozone measurement data, we fit the model

$$Y_{ij} \sim \text{Normal}(\mu + \alpha_i + \gamma_j, \sigma^2),$$

where $\mu$ is the overall mean, $\alpha_i$ is the random effect for spatial location $i$, and $\gamma_j$ is the random effect for day $j$.
Two-way random effects model

The likelihood model is

$$Y_{ij} \sim \text{Normal}(\mu + \alpha_i + \gamma_j, \sigma^2).$$

Priors for the fixed effects model:

$$\alpha_i \sim \text{Normal}(0, 10^2) \quad \text{and} \quad \gamma_j \sim \text{Normal}(0, 10^2).$$

Priors for the random effects model:

$$\alpha_i \sim \text{Normal}(0, \sigma_a^2) \quad \text{and} \quad \gamma_j \sim \text{Normal}(0, \sigma_g^2).$$
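The practical difference between the two prior choices is shrinkage. As a simplified illustration (one effect at a time, with $\mu$, the $\gamma_j$, and $\sigma^2$ held fixed, and hypothetical numbers), the conditional posterior mean of a single $\alpha_i$ under a $\text{Normal}(0, v)$ prior is the data average pulled toward zero by the prior precision:

```python
import numpy as np

def alpha_post_mean(resid, sigma2, prior_var):
    """Conditional posterior mean of one effect alpha_i, where the
    residuals r_ij = Y_ij - mu - gamma_j are ~ N(alpha_i, sigma2)
    and the prior is alpha_i ~ N(0, prior_var)."""
    n = len(resid)
    prec = n / sigma2 + 1 / prior_var          # posterior precision
    return (n * np.mean(resid) / sigma2) / prec

resid = np.array([2.1, 1.7, 2.4])   # hypothetical residuals for one site

# Nearly flat fixed-effects prior N(0, 10^2): essentially no shrinkage.
fixed = alpha_post_mean(resid, sigma2=1.0, prior_var=100.0)

# Random-effects prior N(0, sigma_a^2) with a small variance: strong shrinkage.
random_eff = alpha_post_mean(resid, sigma2=1.0, prior_var=0.5)
```

Under the $\text{Normal}(0, 10^2)$ prior the estimate is essentially the sample mean of the residuals; under the random-effects prior it is pulled noticeably toward zero, which is what "borrowing strength" across sites looks like in the posterior.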
Random slopes model

Example: Jaw bone height data

    load(url("http://math.luc.edu/~ebalderama/bayes_resources/data/jaw.RData"))

Let $Y_{ij}$ be the bone density for child $i$ at age $X_j$. We may specify a different regression for each child to capture variability over the population of children:

$$Y_{ij} \sim \text{Normal}(\gamma_{0i} + X_j \gamma_{1i}, \sigma^2),$$

where $\gamma_i = (\gamma_{0i}, \gamma_{1i})^T$ controls the growth curve for child $i$. These separate regressions are tied together in the prior,

$$\gamma_i \sim \text{Normal}(\theta, \Sigma),$$

which borrows strength across children. This is called a linear mixed model: the $\gamma_i$ are random effects specific to one child and $\theta$ are the fixed effects common to all children.
Bone height data

[Figure: scatterplot of bone height (approximately 46–54) versus age (approximately 8.0–9.5), with repeated measurements per child.]
Prior for covariance matrix

The random-effects covariance matrix is

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix},$$

where $\sigma_1^2$ is the variance of the intercepts across children, $\sigma_2^2$ is the variance of the slopes across children, and $\sigma_{12}$ is the covariance between the intercepts and slopes.

Several ways to specify the prior:

1. $\sigma_1^2, \sigma_2^2 \sim \text{InverseGamma}$ and $\rho = \dfrac{\sigma_{12}}{\sigma_1 \sigma_2} \sim \text{Uniform}(-1, 1)$,
2. Inverse Wishart, which works better in higher dimensions.
Inverse Wishart distribution

The Inverse Wishart distribution is the conjugate prior for a $p \times p$ covariance matrix of a multivariate normal distribution. It reduces to the (univariate) inverse gamma distribution if $p = 1$.

$\Sigma \sim \text{InverseWishart}(\nu, S)$ implies $\Sigma^{-1} \sim \text{Wishart}(\nu, S^{-1})$, where the hyperparameter $\nu > p - 1$ is the degrees of freedom and $S$ is a $p \times p$ positive-definite scale matrix.

The Inverse Wishart PDF is

$$f(\Sigma) \propto |\Sigma|^{-(\nu + p + 1)/2} \exp\left(-\tfrac{1}{2}\,\text{trace}(S \Sigma^{-1})\right),$$

and the mean is $E(\Sigma) = \dfrac{1}{\nu - p - 1}\, S$.
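The relationship $\Sigma^{-1} \sim \text{Wishart}(\nu, S^{-1})$ gives a direct way to sample: for integer $\nu$, a Wishart draw is a sum of $\nu$ outer products of multivariate normal vectors. This numpy sketch (an illustration; the parameter values are made up) checks the mean formula $E(\Sigma) = S/(\nu - p - 1)$ by Monte Carlo:

```python
import numpy as np

def rinvwishart(nu, S, rng):
    """Draw Sigma ~ InverseWishart(nu, S) by drawing
    Sigma^{-1} ~ Wishart(nu, S^{-1}) as a sum of nu outer products
    (requires integer nu > p - 1)."""
    p = S.shape[0]
    V = np.linalg.inv(S)                         # Wishart scale for Sigma^{-1}
    L = np.linalg.cholesky(V)
    Z = rng.standard_normal((int(nu), p)) @ L.T  # each row ~ N(0, V)
    W = Z.T @ Z                                  # W ~ Wishart(nu, V)
    return np.linalg.inv(W)

rng = np.random.default_rng(0)
nu, S = 10, np.array([[1.0, 0.3], [0.3, 2.0]])
draws = np.array([rinvwishart(nu, S, rng) for _ in range(5000)])
emp_mean = draws.mean(axis=0)   # should approach S / (nu - p - 1) = S / 7
```

With $p = 2$ and $\nu = 10$, the empirical mean of the draws converges to $S/7$, matching the formula on the slide.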
Full conditional distributions

The hierarchical model is then:

$$Y_{ij} \sim \text{Normal}(\gamma_{0i} + X_j \gamma_{1i}, \sigma^2)$$
$$\gamma_i \sim \text{Normal}(\theta, \Sigma)$$
$$f(\theta) \propto 1$$
$$\sigma^2 \sim \text{InverseGamma}(a, b)$$
$$\Sigma \sim \text{InverseWishart}(\nu, S)$$

The full conditionals are all conjugate!
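The slide does not write the full conditionals out; the following are the standard conjugate results for this model, with $m$ children and $N$ total observations:

$$\theta \mid \cdot \;\sim\; \text{Normal}\!\left(\bar{\gamma},\; \tfrac{1}{m}\Sigma\right), \qquad \bar{\gamma} = \frac{1}{m}\sum_{i=1}^{m} \gamma_i,$$

$$\Sigma \mid \cdot \;\sim\; \text{InverseWishart}\!\left(\nu + m,\; S + \sum_{i=1}^{m} (\gamma_i - \theta)(\gamma_i - \theta)^T\right),$$

$$\sigma^2 \mid \cdot \;\sim\; \text{InverseGamma}\!\left(a + \tfrac{N}{2},\; b + \tfrac{1}{2}\sum_{i,j} \left(Y_{ij} - \gamma_{0i} - X_j \gamma_{1i}\right)^2\right),$$

and each $\gamma_i$ has a bivariate normal full conditional combining its own regression data with the $\text{Normal}(\theta, \Sigma)$ prior. Gibbs sampling therefore cycles through exact draws with no Metropolis steps needed.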