Hierarchical Modeling
Hierarchical modeling has taken over the landscape in contemporary stochastic modeling.
- The intent here is to show a range of examples.
- In the development, we also show the connections to Gibbs sampling, and why Gibbs sampling and MCMC are ideally suited to fitting these models.
- We envision a three-stage specification:
  - First stage: [data | model, parameters]
  - Second stage: [model | parameters]
  - Third stage: [(hyper)parameters]
Standard hierarchical linear model
- First stage: Y | X, β ~ N(Xβ, Σ_Y)
- Second stage: β | Z, α ~ N(Zα, Σ_β)
- Third stage: α ~ N(α_0, Σ_α)
- This assumes all Σ's are known; if not, use inverse gamma or Wishart priors.
- A standard Gibbs loop does the updating; conjugacy holds for all full conditionals (see the sketch below).
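To make the two-block loop concrete, here is a minimal sketch of the Gibbs sampler for this three-stage normal model, assuming all covariance matrices are known as on the slide. The dimensions, simulated data, and hyperparameter values are illustrative choices, not from the slides.

```python
# Gibbs sampler for Y|beta ~ N(X beta, S_Y), beta|alpha ~ N(Z alpha, S_b),
# alpha ~ N(alpha0, S_a), with all covariance matrices known.
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 3, 2
X = rng.normal(size=(n, p))
Z = rng.normal(size=(p, q))
Sigma_Y = np.eye(n)            # first-stage covariance (known)
Sigma_b = 0.5 * np.eye(p)      # second-stage covariance (known)
Sigma_a = 10.0 * np.eye(q)     # third-stage covariance (known)
alpha0 = np.zeros(q)

beta_true = Z @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=p)
y = X @ beta_true + rng.multivariate_normal(np.zeros(n), Sigma_Y)

SYi, Sbi, Sai = map(np.linalg.inv, (Sigma_Y, Sigma_b, Sigma_a))
alpha, draws = np.zeros(q), []
for it in range(2000):
    # [beta | y, alpha]: conjugate normal full conditional
    Vb = np.linalg.inv(X.T @ SYi @ X + Sbi)
    mb = Vb @ (X.T @ SYi @ y + Sbi @ (Z @ alpha))
    beta = rng.multivariate_normal(mb, Vb)
    # [alpha | beta]: conjugate normal full conditional
    Va = np.linalg.inv(Z.T @ Sbi @ Z + Sai)
    ma = Va @ (Z.T @ Sbi @ beta + Sai @ alpha0)
    alpha = rng.multivariate_normal(ma, Va)
    draws.append(np.concatenate([beta, alpha]))

print(np.mean(draws[500:], axis=0))    # posterior means after burn-in
```

Both full conditionals are Gaussian, so no Metropolis steps are needed; this is the conjugacy the slide refers to.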
CIHM
- Conditionally independent hierarchical model: Π_i [Y_i | θ_i] Π_i [θ_i | η] [η]
- Exchangeable θ_i.
- Shrinkage: borrowing strength across units if η is unknown (see the sketch below).
- Includes hierarchical GLMs, i.e., a non-Gaussian first stage.
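To make the shrinkage concrete, here is a minimal sketch of a normal-normal CIHM Gibbs loop; the known variances, the flat prior on η, and all numeric settings are illustrative assumptions, not from the slides.

```python
# Normal-normal CIHM: Y_i|theta_i ~ N(theta_i, s2), theta_i|eta ~ N(eta, t2).
import numpy as np

rng = np.random.default_rng(7)
m, s2, t2 = 20, 1.0, 0.25
theta_true = rng.normal(5.0, np.sqrt(t2), m)
Y = rng.normal(theta_true, np.sqrt(s2))

eta, draws = Y.mean(), []
for it in range(2000):
    # [theta_i | Y_i, eta]: precision-weighted compromise -- the shrinkage
    v = 1.0 / (1.0 / s2 + 1.0 / t2)
    theta = rng.normal(v * (Y / s2 + eta / t2), np.sqrt(v))
    # [eta | theta]: flat prior, so a normal centered at the theta average
    eta = rng.normal(theta.mean(), np.sqrt(t2 / m))
    draws.append(theta)

post = np.mean(draws[500:], axis=0)
print(post - Y)    # each estimate is pulled from Y_i toward the common eta
```

Because η is unknown, each θ_i estimate borrows strength from the other units through the shared second stage.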
Random effects
- Random effects are usually assumed to be normally distributed, with an associated variance component.
- Typical linear version: Y_ij = X_ij^T β + φ_i + ε_ij
- β has a Gaussian prior; φ_i ~ iid N(0, σ²_φ); ε_ij ~ iid N(0, σ²_ε)
- Priors on the variance components σ²_φ and σ²_ε (chosen with care).
- Again, can have a non-Gaussian first stage. A sketch of the Gibbs updates follows.
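Here is a minimal sketch of the Gibbs updates for this random-intercept model with inverse-gamma priors on the two variance components. The balanced design, the simulated data, the prior values, and the flat prior on β (the slide gives β a Gaussian prior, which would only add a prior precision term) are illustrative simplifications.

```python
# Gibbs for Y_ij = x_ij' beta + phi_i + eps_ij with inverse-gamma priors
# on sigma2_phi and sigma2_eps.
import numpy as np
from scipy.stats import invgamma

rng = np.random.default_rng(1)
m, r, p = 30, 5, 2                       # groups, reps per group, predictors
X = rng.normal(size=(m, r, p))
phi_true = rng.normal(0, 1.0, size=m)
beta_true = np.array([2.0, -1.0])
y = X @ beta_true + phi_true[:, None] + rng.normal(0, 0.5, size=(m, r))

a_phi, b_phi, a_eps, b_eps = 2.0, 1.0, 2.0, 1.0    # inverse-gamma priors
beta, phi, s2_phi, s2_eps = np.zeros(p), np.zeros(m), 1.0, 1.0
Xf = X.reshape(-1, p)
for it in range(2000):
    # [beta | rest]: normal update on residuals with the phi_i removed
    resid = (y - phi[:, None]).reshape(-1)
    V = np.linalg.inv(Xf.T @ Xf / s2_eps)
    beta = rng.multivariate_normal(V @ (Xf.T @ resid / s2_eps), V)
    # [phi_i | rest]: conjugate normal, shrinks group effects toward 0
    e = y - X @ beta
    v = 1.0 / (r / s2_eps + 1.0 / s2_phi)
    phi = rng.normal(v * e.sum(axis=1) / s2_eps, np.sqrt(v))
    # [s2_phi | phi] and [s2_eps | rest]: conjugate inverse gammas
    s2_phi = invgamma.rvs(a_phi + m / 2, scale=b_phi + 0.5 * phi @ phi)
    sse = ((e - phi[:, None]) ** 2).sum()
    s2_eps = invgamma.rvs(a_eps + m * r / 2, scale=b_eps + 0.5 * sse)
```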
Missing data
- We often have missing data. The Gibbs sampler (MCMC) extends the EM algorithm to provide full posterior inference rather than an MLE with an asymptotic variance.
- Simple example: multivariate normal, Y_i ~ N(μ, Σ), with some components of some of the Y_i missing. The usual Gibbs loop: update the parameters given the missing data, then update the missing data given the parameters (see the sketch below).
- Simple example: missing categorical counts under a multinomial model. Some categories are aggregated/collapsed, so counts for the disaggregated categories are missing. Again, the usual Gibbs loop: update the parameters given all counts, then update the missing counts given the parameters.
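Here is a minimal sketch of the "impute, then update" loop for the multivariate normal example. For brevity, Σ is taken as known and μ has a flat prior; both are illustrative simplifications of the slide's setup.

```python
# Gibbs loop for a multivariate normal with entries missing at random:
# alternate drawing missing values and drawing mu.
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 3
Sigma = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])
mu_true = np.array([1.0, -1.0, 0.5])
Y = rng.multivariate_normal(mu_true, Sigma, size=n)
miss = rng.random((n, d)) < 0.2           # mask of missing entries
Y[miss] = 0.0                             # initialize missing values

mu = np.zeros(d)
for it in range(500):
    # Step 1: [missing | observed, mu, Sigma] -- conditional normal, row by row
    for i in range(n):
        mi, oi = miss[i], ~miss[i]
        if not mi.any():
            continue
        So_inv = np.linalg.inv(Sigma[np.ix_(oi, oi)])
        cm = mu[mi] + Sigma[np.ix_(mi, oi)] @ So_inv @ (Y[i, oi] - mu[oi])
        cv = Sigma[np.ix_(mi, mi)] - Sigma[np.ix_(mi, oi)] @ So_inv @ Sigma[np.ix_(oi, mi)]
        Y[i, mi] = rng.multivariate_normal(cm, cv)
    # Step 2: [mu | completed data] -- flat prior, so N(ybar, Sigma / n)
    mu = rng.multivariate_normal(Y.mean(axis=0), Sigma / n)
```

With an unknown Σ, step 2 would add a conjugate inverse Wishart draw given the completed data.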
Binary data models
- The usual binary response model is logit or probit; we illustrate with the probit.
- Y_i ~ Bernoulli(p(X_i)) (can be Bi(n_i, p(X_i))), with Φ⁻¹(p(X_i)) = X_i β and a prior on β.
- It is awkward to sample β in this form, so introduce Z_i ~ N(X_i β, 1). Then P(Y_i = 1) = Φ(X_i β) = 1 − Φ(−X_i β) = P(Z_i ≥ 0).
- So the Gibbs loop is: update the Z's given β and y (truncated normals), then update β given the Z's and y (the usual normal updating); see the sketch below.
- This extends to ordinal categorical data, with multiple unknown cut points.
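This is the classic Albert-Chib data augmentation. A minimal sketch, with a flat prior on β for brevity (the slide allows any prior on β; a normal prior only changes the β update):

```python
# Data-augmentation Gibbs sampler for probit regression.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(3)
n, p = 500, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

XtX_inv = np.linalg.inv(X.T @ X)
beta = np.zeros(p)
for it in range(2000):
    # [Z_i | beta, y_i]: truncated normal -- right of 0 if y=1, left if y=0
    mean = X @ beta
    lo = np.where(y == 1, -mean, -np.inf)    # bounds standardized by (0 - mean)/1
    hi = np.where(y == 1, np.inf, -mean)
    Z = truncnorm.rvs(lo, hi, loc=mean, scale=1.0, random_state=rng)
    # [beta | Z]: ordinary normal linear-model update with variance 1
    beta = rng.multivariate_normal(XtX_inv @ X.T @ Z, XtX_inv)
```

The trick is that, given the latent Z's, the probit model is just a normal linear model, so both full conditionals are standard draws.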
Growth Curves
- Typically, individual-level curves are centered around a population-level curve.
- We need the population-level curve to see the average behavior of the process, and the individual-level curves to prescribe individual-level treatment.
- Model: if Y_ij is the j-th measurement for the i-th individual,
  Y_ij = g(X_ij, Z_i, β_i) + ε_ij, ε_ij ~ N(0, σ²_i), β_i = β + η_i (or replace β with a regression in the Z_i).
- A small simulation sketch follows.
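A minimal sketch simulating this hierarchy with a linear g: the individual coefficient vectors β_i = β + η_i scatter around the population β, so individual curves vary around the population curve. The linear form of g and all numeric settings are illustrative choices.

```python
# Simulate growth curves: individual lines centered on a population line.
import numpy as np

rng = np.random.default_rng(9)
m, J = 15, 10                          # individuals, measurements each
beta = np.array([1.0, 0.5])            # population-level intercept, slope
x = np.linspace(0, 5, J)
beta_i = beta + rng.normal(0, 0.15, size=(m, 2))      # eta_i perturbations
s2_i = 0.1 * np.ones(m)                               # per-individual variances
Y = (beta_i[:, [0]] + beta_i[:, [1]] * x
     + rng.normal(0, np.sqrt(s2_i)[:, None], size=(m, J)))
# Fitting proceeds as in the random-effects slide: normal updates for the
# beta_i and beta, inverse-gamma updates for the variance components.
```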
Mixture models
- Y ~ Σ_{l=1}^L p_l f_l(Y | θ_l), e.g., a normal mixture. Also called a classification problem or discriminant analysis.
- L fixed or unknown?
- Observe Y_i, i = 1, 2, ..., n; the label for Y_i is not observed (latent). If L_i = l, then Y_i ~ f_l(Y | θ_l).
- So the model is Π_i [Y_i | L_i, θ] Π_i [L_i | α] [α, θ].
- Gibbs loop: update θ, α given the L's and the data; update the L's given θ, α and the data (a discrete distribution); see the sketch below.
- Covariates? In the θ_l's? In the p_l's?
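A minimal sketch of the label-augmented Gibbs sampler for a two-component normal mixture with known component variance; the Dirichlet prior on the weights, the N(0, 10) prior on the means, and the simulated data are illustrative assumptions.

```python
# Gibbs for a 2-component normal mixture via latent labels L_i.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
L, n, s2 = 2, 300, 1.0
y = np.concatenate([rng.normal(-2, 1, 180), rng.normal(2, 1, 120)])

mu = np.array([-1.0, 1.0])
p = np.array([0.5, 0.5])
for it in range(2000):
    # [L_i | mu, p, y]: discrete full conditional over the labels
    like = p * norm.pdf(y[:, None], loc=mu, scale=np.sqrt(s2))
    probs = like / like.sum(axis=1, keepdims=True)
    labels = (rng.random(n)[:, None] > np.cumsum(probs, axis=1)).sum(axis=1)
    # [mu_l | labels, y]: conjugate normal update, prior N(0, 10)
    for l in range(L):
        yl = y[labels == l]
        v = 1.0 / (len(yl) / s2 + 1.0 / 10.0)
        mu[l] = rng.normal(v * yl.sum() / s2, np.sqrt(v))
    # [p | labels]: Dirichlet(1 + counts)
    counts = np.bincount(labels, minlength=L)
    p = rng.dirichlet(1.0 + counts)
```

With L unknown, one needs reversible-jump or related moves; the loop above is the fixed-L case.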
Errors in variables models
- We seek the relationship between, say, Y and X, but observe W, a surrogate for X, and perhaps Z, a surrogate for Y.
- Modeling W | X gives a measurement error model; modeling X | W gives a Berkson model.
- Model in the first case: Π_i [Z_i | Y_i, γ] [Y_i | X_i, β] [W_i | X_i, γ] [X_i | α]
- Model in the second case: Π_i [Z_i | Y_i, γ] [Y_i | X_i, β] [X_i | W_i, γ]
- Validation data: perhaps some (X, Y) pairs; perhaps some (X, W) pairs.
- A sketch for the measurement error case follows.
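A minimal sketch of a Gibbs loop for the measurement error case with Y observed directly (no Z stage), all stages Gaussian with known variances, and flat priors on β and α; the latent X_i are updated like any other unknowns. All numeric settings are illustrative.

```python
# Gibbs for [Y_i | X_i, beta][W_i | X_i][X_i | alpha], all Gaussian.
import numpy as np

rng = np.random.default_rng(8)
n, s2, t2, w2 = 200, 0.25, 0.5, 1.0
X_true = rng.normal(2.0, np.sqrt(w2), n)
Y = 1.0 + 0.8 * X_true + rng.normal(0, np.sqrt(s2), n)
W = X_true + rng.normal(0, np.sqrt(t2), n)    # observed surrogate for X

X, beta, alpha = W.copy(), np.zeros(2), 0.0
for it in range(2000):
    # [X_i | rest]: product of three normal kernels in X_i
    prec = beta[1] ** 2 / s2 + 1.0 / t2 + 1.0 / w2
    num = beta[1] * (Y - beta[0]) / s2 + W / t2 + alpha / w2
    X = rng.normal(num / prec, np.sqrt(1.0 / prec))
    # [beta | X, Y]: ordinary linear-regression update, flat prior
    D = np.column_stack([np.ones(n), X])
    DtD_inv = np.linalg.inv(D.T @ D)
    beta = rng.multivariate_normal(DtD_inv @ D.T @ Y, s2 * DtD_inv)
    # [alpha | X]: flat prior, normal centered at the mean of X
    alpha = rng.normal(X.mean(), np.sqrt(w2 / n))
```

Validation data, where available, would enter as rows whose X_i is observed rather than updated.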
Change point models
- Frequently, interest lies in a change in regime; we need an idea of a "least" significant change.
- Two sampling scenarios: (i) a full set of data, where we try to find, retrospectively, whether changes occurred and when; (ii) sequential data, where we try to identify changes as we collect.
- Simple example for the first scenario: f_1(y | θ_1) is the density before the change point; f_2(y | θ_2) is the density after.
- With data Y_i, i = 1, 2, ..., n, let K be the change point indicator: K = k means the change occurs at observation k + 1; k = n means "no change."
- Then the model is L(θ_1, θ_2, k; y) = Π_{i=1}^k f_1(y_i | θ_1) Π_{i=k+1}^n f_2(y_i | θ_2). With a prior on θ_1, θ_2, k, we have a full model.
- Again, the loop: update the θ's given k and y; update k given the θ's and y (a discrete distribution); see the sketch below.
- The θ's can be dependent, and there can be order restrictions on the θ's.
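A minimal sketch of this loop for a Poisson change point model (f_1, f_2 Poisson with rates λ_1, λ_2 under conjugate Gamma(a, b) priors and a uniform prior on k); the Poisson/Gamma choice and all numeric settings are illustrative.

```python
# Gibbs for a single Poisson change point: alternate rates and k.
import numpy as np

rng = np.random.default_rng(5)
n, k_true = 100, 40
y = np.concatenate([rng.poisson(2.0, k_true), rng.poisson(6.0, n - k_true)])

a, b = 1.0, 1.0                           # Gamma(a, b) priors on the rates
k = n // 2
for it in range(2000):
    # [theta | k, y]: conjugate gamma updates on each segment
    lam1 = rng.gamma(a + y[:k].sum(), 1.0 / (b + k))
    lam2 = rng.gamma(a + y[k:].sum(), 1.0 / (b + n - k))
    # [k | theta, y]: discrete distribution over 1..n via log-likelihoods
    cums = np.concatenate([[0.0], np.cumsum(y)])
    ks = np.arange(1, n + 1)
    loglik = (cums[ks] * np.log(lam1) - ks * lam1
              + (cums[n] - cums[ks]) * np.log(lam2) - (n - ks) * lam2)
    w = np.exp(loglik - loglik.max())
    k = rng.choice(ks, p=w / w.sum())
```

Note that k = n leaves the second segment empty, so λ_2 is then drawn from its prior; this is the "no change" case on the slide.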
Concurrent time series
- Dependent ARMA time series.
- Model: Y_it = X_i^T β_i + Σ_j φ_ij Y_{i,t−j} + Σ_k θ_ik ε_{i,t−k} + ε_it
- Exchangeable β_i, φ_i, θ_i.
- Usual prior on β; constrained priors on the φ's and θ's.
- ε_t ~ N(0, Σ). A simulation sketch follows.
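A minimal sketch simulating this setup for ARMA(1, 1) series: each series gets its own (β_i, φ_i, θ_i) drawn around common values (the exchangeability), and the innovations ε_t are correlated across series through Σ. All numeric settings are illustrative.

```python
# Simulate concurrent ARMA(1,1) series with cross-correlated innovations.
import numpy as np

rng = np.random.default_rng(10)
m, T = 3, 200                            # number of series, series length
Sigma = 0.3 + 0.7 * np.eye(m)            # cross-series innovation covariance
beta = rng.normal(1.0, 0.2, m)           # exchangeable intercepts (X_i = 1)
phi = rng.normal(0.5, 0.05, m)           # exchangeable AR coefficients
theta = rng.normal(0.3, 0.05, m)         # exchangeable MA coefficients

Y = np.zeros((T, m))
eps = rng.multivariate_normal(np.zeros(m), Sigma, size=T)
for t in range(1, T):
    Y[t] = beta + phi * Y[t - 1] + theta * eps[t - 1] + eps[t]
```

The constrained priors on the slide keep each (φ_i, θ_i) in the stationary/invertible region; here the small spread around 0.5 and 0.3 plays that role.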
Dynamic models
- Two-stage form: an observational (or data) stage and an (unobserved) transition stage.
- Simple example: Y_ti = g(X_ti β_t) + ε_ti with iid ε's; first-stage conditional independence; β_t = φ β_{t−1} + η_t
- Can have dynamics in the X_t's.
- The model is then called a "hidden Markov model." A sketch of a state update follows.
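A minimal sketch of a single-move Gibbs pass over the hidden states of a scalar linear-Gaussian version, y_t = β_t + ε_t with β_t = φ β_{t−1} + η_t. Treating the variances, φ, and β_0 = 0 as known is an illustrative simplification; in practice these get their own updates, and a forward-filtering backward-sampling step is often used in place of the single-move update shown here.

```python
# Single-move Gibbs updates for the states of a scalar dynamic model.
import numpy as np

rng = np.random.default_rng(6)
T, phi, s2, t2 = 200, 0.9, 0.5, 0.2      # obs. variance s2, transition var. t2
beta_true = np.zeros(T)
for t in range(1, T):
    beta_true[t] = phi * beta_true[t - 1] + rng.normal(0, np.sqrt(t2))
y = beta_true + rng.normal(0, np.sqrt(s2), T)

beta = np.zeros(T)
for sweep in range(500):
    for t in range(T):
        # [beta_t | beta_{t-1}, beta_{t+1}, y_t]: product of normal kernels
        prev = beta[t - 1] if t > 0 else 0.0          # beta_0 = 0 assumed
        prec = 1.0 / s2 + 1.0 / t2
        num = y[t] / s2 + phi * prev / t2
        if t < T - 1:                                 # interior states see beta_{t+1}
            prec += phi ** 2 / t2
            num += phi * beta[t + 1] / t2
        v = 1.0 / prec
        beta[t] = rng.normal(v * num, np.sqrt(v))
```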
Summary
So, the overall story is the following:
- A rich range of modeling possibilities.
- We introduce latent variables to facilitate writing the likelihood and prior and fitting the model. These latent variables can be labels, missing data, or other augmentations.
- MCMC model fitting is natural; we build Gibbs loops to do the required updating.