Hierarchical models (cont.) Dr. Jarad Niemi STAT 544 - Iowa State University February 21, 2019 Jarad Niemi (STAT544@ISU) Hierarchical models (cont.) February 21, 2019 1 / 21

Outline

  Theoretical justification for hierarchical models
    Exchangeability
    de Finetti's theorem
    Application to hierarchical models
  Normal hierarchical model
    Posterior
    Simulation study
    Shrinkage

Theoretical justification for hierarchical models

Exchangeability

Definition: The set $Y_1, Y_2, \ldots, Y_n$ is exchangeable if the joint probability $p(y_1, \ldots, y_n)$ is invariant to permutation of the indices. That is, for any permutation $\pi$,
\[ p(y_1, \ldots, y_n) = p(y_{\pi_1}, \ldots, y_{\pi_n}). \]

An exchangeable but not iid example: Consider an urn with one red ball and one blue ball, each drawn with probability 1/2. Draw without replacement from the urn. Let $Y_i = 1$ if the $i$th ball is red and $Y_i = 0$ otherwise. Since
\[ P(Y_1 = 1, Y_2 = 0) = P(Y_1 = 0, Y_2 = 1) = 1/2, \]
$Y_1$ and $Y_2$ are exchangeable. But
\[ 0 = P(Y_2 = 1 \mid Y_1 = 1) \ne P(Y_2 = 1) = 1/2, \]
and thus $Y_1$ and $Y_2$ are not independent.
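
The urn example can be checked by brute-force enumeration. The following is a minimal R sketch; the helper `joint()` and the data frame `outcomes` are names introduced here for illustration only.

```r
# The two equally likely orderings when both balls are drawn
# without replacement (1 = red, 0 = blue).
outcomes <- data.frame(y1 = c(1, 0), y2 = c(0, 1), prob = c(0.5, 0.5))

# Joint probability P(Y1 = a, Y2 = b); impossible pairs sum to 0.
joint <- function(a, b) {
  sum(outcomes$prob[outcomes$y1 == a & outcomes$y2 == b])
}

# Exchangeability: the joint pmf is unchanged when the indices are swapped.
exchangeable <- all(sapply(0:1, function(a) {
  sapply(0:1, function(b) joint(a, b) == joint(b, a))
}))

# Independence would require P(Y2 = 1 | Y1 = 1) = P(Y2 = 1).
p_y1          <- joint(1, 0) + joint(1, 1)   # P(Y1 = 1)
p_y2          <- joint(0, 1) + joint(1, 1)   # P(Y2 = 1)
p_y2_given_y1 <- joint(1, 1) / p_y1          # P(Y2 = 1 | Y1 = 1)

exchangeable    # TRUE
p_y2            # 0.5
p_y2_given_y1   # 0
```

The conditional and marginal probabilities differ, so the draws are exchangeable but dependent.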

Exchangeability

Theorem: All independent and identically distributed random variables are exchangeable.

Proof. Let $y_i \stackrel{iid}{\sim} p(y)$, then
\[ p(y_1, \ldots, y_n) = \prod_{i=1}^n p(y_i) = \prod_{i=1}^n p(y_{\pi_i}) = p(y_{\pi_1}, \ldots, y_{\pi_n}). \]

Definition: The sequence $Y_1, Y_2, \ldots$ is infinitely exchangeable if, for any $n$, $Y_1, Y_2, \ldots, Y_n$ are exchangeable.

de Finetti's theorem

Theorem: A sequence of random variables $(y_1, y_2, \ldots)$ is infinitely exchangeable iff, for all $n$,
\[ p(y_1, y_2, \ldots, y_n) = \int \prod_{i=1}^n p(y_i \mid \theta) \, P(d\theta), \]
for some measure $P$ on $\theta$.

If the distribution on $\theta$ has a density, we can replace $P(d\theta)$ with $p(\theta)\,d\theta$.

This means that there must exist
- a parameter $\theta$,
- a likelihood $p(y \mid \theta)$ such that $y_i \stackrel{ind}{\sim} p(y \mid \theta)$, and
- a distribution $P$ on $\theta$.
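
As a numerical illustration of the mixture representation, the sketch below assumes a Bernoulli likelihood and a Beta(a, b) distribution on $\theta$. Both choices, the values of `a` and `b`, and the function names are assumptions made here for illustration, not part of the slides. Under these assumptions the integral has the closed form $B(a + \sum y_i,\, b + n - \sum y_i) / B(a, b)$, which depends on the data only through the sum and is therefore invariant to permutation.

```r
a <- 2; b <- 3   # hypothetical Beta(a, b) distribution on theta

# Closed-form marginal probability of a binary sequence y
marginal <- function(y) {
  s <- sum(y); n <- length(y)
  exp(lbeta(a + s, b + n - s) - lbeta(a, b))
}

# The same integral by numerical quadrature, as a check on the closed form
marginal_numeric <- function(y) {
  integrand <- function(theta) {
    lik <- sapply(theta, function(t) prod(t^y * (1 - t)^(1 - y)))
    lik * dbeta(theta, a, b)
  }
  integrate(integrand, 0, 1)$value
}

y      <- c(1, 0, 0, 1, 0)
y_perm <- y[c(3, 5, 1, 2, 4)]   # one permutation of the indices

marginal(y)           # equal to marginal(y_perm)
marginal(y_perm)
marginal_numeric(y)   # agrees with the closed form up to quadrature error

# Exchangeable but not independent: P(Y1 = 1, Y2 = 1) != P(Y1 = 1)^2
marginal(c(1, 1))     # 0.2
marginal(1)^2         # 0.16
```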

Application to hierarchical models

Assume $(y_1, y_2, \ldots)$ are infinitely exchangeable. Then, by de Finetti's theorem for the $(y_1, \ldots, y_n)$ that you actually observed, there exists
- a parameter $\theta$,
- a distribution $p(y \mid \theta)$ such that $y_i \stackrel{ind}{\sim} p(y \mid \theta)$, and
- a distribution $P$ on $\theta$.

Assume $\theta = (\theta_1, \theta_2, \ldots)$ with $\theta_i$ infinitely exchangeable. By de Finetti's theorem for $(\theta_1, \ldots, \theta_n)$, there exists
- a parameter $\phi$,
- a distribution $p(\theta \mid \phi)$ such that $\theta_i \stackrel{ind}{\sim} p(\theta \mid \phi)$, and
- a distribution $P$ on $\phi$.

Assume $\phi \sim p(\phi)$.

Exchangeability with covariates

Suppose we observe a response $y_i$ and covariates $x_i$ for each unit $i$. Now we assume $(y_1, y_2, \ldots)$ are infinitely exchangeable given $x_i$. Then, by de Finetti's theorem for the $(y_1, \ldots, y_n)$, there exists
- a parameter $\theta$,
- a distribution $p(y \mid \theta, x)$ such that $y_i \stackrel{ind}{\sim} p(y \mid \theta, x_i)$, and
- a distribution $P$ on $\theta$ given $x$.

Assume $\theta = (\theta_1, \theta_2, \ldots)$ with $\theta_i$ infinitely exchangeable given $x$. By de Finetti's theorem for $(\theta_1, \ldots, \theta_n)$, there exists
- a parameter $\phi$,
- a distribution $p(\theta \mid \phi, x)$ such that $\theta_i \stackrel{ind}{\sim} p(\theta \mid \phi, x_i)$, and
- a distribution $P$ on $\phi$ given $x$.

Assume $\phi \sim p(\phi \mid x)$.

Summary

Hierarchical model:
\[ y_i \stackrel{ind}{\sim} p(y \mid \theta_i), \quad \theta_i \stackrel{ind}{\sim} p(\theta \mid \phi), \quad \phi \sim p(\phi) \]

Hierarchical linear model:
\[ y_i \stackrel{ind}{\sim} p(y \mid \theta_i, x_i), \quad \theta_i \stackrel{ind}{\sim} p(\theta \mid \phi, x_i), \quad \phi \sim p(\phi \mid x) \]

Although hierarchical models are typically written using the conditional independence notation above, the assumptions underlying the model are exchangeability and functional forms for the priors.

Normal hierarchical models

Suppose we have the following model
\[ y_{ij} \stackrel{ind}{\sim} N(\theta_i, \sigma^2), \qquad \theta_i \stackrel{iid}{\sim} N(\mu, \tau^2) \]
with $j = 1, \ldots, n_i$, $i = 1, \ldots, I$, and $n = \sum_{i=1}^I n_i$. This is a normal hierarchical model.

Make the following assumptions for computational reasons:
- Let $\sigma^2 = s^2$ be known.
- Assume $p(\mu, \tau) \propto p(\mu \mid \tau)\, p(\tau) \propto p(\tau)$, i.e. assume an improper uniform prior on $\mu$.
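
To make the two levels of the model concrete, the sketch below generates data top-down from it. The hyperparameter values `mu = 0` and `tau = 2`, the seed, and the group sizes are hypothetical choices for illustration; they are not taken from the slides.

```r
set.seed(544)
I   <- 10           # number of groups
n_i <- rep(9, I)    # observations per group
mu  <- 0            # hypothetical population mean
tau <- 2            # hypothetical between-group standard deviation
s   <- 1            # known data standard deviation (sigma = s)

# Level 2: group means theta_i ~ N(mu, tau^2)
theta <- rnorm(I, mean = mu, sd = tau)

# Level 1: observations y_ij ~ N(theta_i, s^2)
group <- rep(seq_len(I), times = n_i)
y     <- rnorm(sum(n_i), mean = theta[group], sd = s)

# Group sample means, which scatter around theta_i with variance s^2 / n_i
ybar <- tapply(y, group, mean)
```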

Posterior distribution

The posterior is
\[ p(\theta, \mu, \tau \mid y) \propto p(y \mid \theta)\, p(\theta \mid \mu, \tau)\, p(\mu \mid \tau)\, p(\tau), \]
but the decomposition
\[ p(\theta, \mu, \tau \mid y) = p(\theta \mid \mu, \tau, y)\, p(\mu \mid \tau, y)\, p(\tau \mid y), \]
where
\[ p(\theta \mid \mu, \tau, y) \propto p(y \mid \theta)\, p(\theta \mid \mu, \tau) \]
\[ p(\mu \mid \tau, y) \propto \int p(y \mid \theta)\, p(\theta \mid \mu, \tau)\, d\theta \; p(\mu \mid \tau) \]
\[ p(\tau \mid y) \propto \int\!\!\int p(y \mid \theta)\, p(\theta \mid \mu, \tau)\, p(\mu \mid \tau)\, d\theta\, d\mu \; p(\tau), \]
aids computation, because a joint posterior draw can be obtained in three steps:
1. $\tau^{(k)} \sim p(\tau \mid y)$
2. $\mu^{(k)} \sim p\left(\mu \mid \tau^{(k)}, y\right)$
3. $\theta_i^{(k)} \sim p\left(\theta_i \mid \mu^{(k)}, \tau^{(k)}, y\right)$ for $i = 1, \ldots, I$.

Posterior distributions

The necessary conditional and marginal posteriors are presented in section 5.4 of BDA. Let
\[ \bar{y}_{i\cdot} = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij} \quad \text{and} \quad s_i^2 = s^2 / n_i. \]
Then
\[ p(\tau \mid y) \propto p(\tau)\, V_\mu^{1/2} \prod_{i=1}^I (s_i^2 + \tau^2)^{-1/2} \exp\left( -\frac{(\bar{y}_{i\cdot} - \hat{\mu})^2}{2 (s_i^2 + \tau^2)} \right) \]
\[ \mu \mid \tau, y \sim N(\hat{\mu}, V_\mu) \]
\[ \theta_i \mid \mu, \tau, y \sim N(\hat{\theta}_i, V_i) \]
with
\[ V_\mu^{-1} = \sum_{i=1}^I \frac{1}{s_i^2 + \tau^2}, \qquad \hat{\mu} = V_\mu \sum_{i=1}^I \frac{\bar{y}_{i\cdot}}{s_i^2 + \tau^2} \]
\[ V_i^{-1} = \frac{1}{s_i^2} + \frac{1}{\tau^2}, \qquad \hat{\theta}_i = V_i \left( \frac{\bar{y}_{i\cdot}}{s_i^2} + \frac{\mu}{\tau^2} \right) \]
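
The formulas above translate fairly directly into R. The following is a minimal sketch rather than the course's own implementation: the function names `log_p_tau` and `draw_mu_theta` are introduced here, the prior is passed in as a log-density function, and `tau` must be strictly positive. The example inputs are the group_specific_mean group means from the summary statistics slide, with $s_i^2 = 1/9$.

```r
# Unnormalized log marginal posterior of tau, log p(tau | y)
log_p_tau <- function(tau, ybar, s2_i, log_prior) {
  v      <- s2_i + tau^2             # s_i^2 + tau^2
  V_mu   <- 1 / sum(1 / v)
  mu_hat <- V_mu * sum(ybar / v)
  log_prior(tau) + 0.5 * log(V_mu) - 0.5 * sum(log(v)) -
    sum((ybar - mu_hat)^2 / (2 * v))
}

# One draw of mu | tau, y followed by theta_i | mu, tau, y
draw_mu_theta <- function(tau, ybar, s2_i) {
  v      <- s2_i + tau^2
  V_mu   <- 1 / sum(1 / v)
  mu_hat <- V_mu * sum(ybar / v)
  mu     <- rnorm(1, mu_hat, sqrt(V_mu))

  V_i       <- 1 / (1 / s2_i + 1 / tau^2)
  theta_hat <- V_i * (ybar / s2_i + mu / tau^2)
  theta     <- rnorm(length(ybar), theta_hat, sqrt(V_i))
  list(mu = mu, theta = theta)
}

# Group means from the group_specific_mean rows of the summary table
ybar <- c(-4.32, -3.40, -2.41, -1.38, -0.76, -0.16, 1.21, 2.23, 3.97, 5.08)
s2_i <- rep(1 / 9, length(ybar))

# Half-Cauchy(0, 1) prior on tau > 0, up to an additive constant
log_half_cauchy <- function(tau) dcauchy(tau, 0, 1, log = TRUE)

set.seed(1)
lp_small <- log_p_tau(0.1, ybar, s2_i, log_half_cauchy)
lp_large <- log_p_tau(3,   ybar, s2_i, log_half_cauchy)
draw     <- draw_mu_theta(3, ybar, s2_i)
```

With group means this spread out, `lp_large` exceeds `lp_small`, so the marginal posterior favors a large between-group standard deviation and the draws of `theta` stay close to `ybar`.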

Simulation study

Common to both simulation scenarios:
- $I = 10$
- $n_i = 9$ for all $i$
- $s = 1$, thus $s_i = 1/3$ for all $i$

Scenarios:
1. Common mean: $\theta_i = 0$ for all $i$
2. Group-specific means: $\theta_i = i - (I/2 + 0.5)$

Use $\tau \sim Ca^+(0, 1)$.

Simulation study

    J = 10                  # number of groups (I on the previous slide)
    n_per_group = 9
    n = rep(n_per_group, J)
    sigma = 1
    N = sum(n)
    group = rep(1:J, each = n_per_group)
    set.seed(1)
    df = rbind(
      # All means are the same
      data.frame(group = factor(group), simulation = "common_mean",
                 y = rnorm(N)),
      # Each group has its own mean
      data.frame(group = factor(group), simulation = "group_specific_mean",
                 y = rnorm(N, group - (J/2 + .5))))

[Figure: simulated observations y (vertical axis) by group 1 to 10 (horizontal axis), in two panels: common_mean and group_specific_mean.]

Summary statistics

        simulation           group  n   mean    sd
     1  common_mean              1  9   0.18  0.81
     2  common_mean              2  9   0.09  1.11
     3  common_mean              3  9   0.18  0.91
     4  common_mean              4  9  -0.19  0.89
     5  common_mean              5  9   0.17  0.62
     6  common_mean              6  9   0.02  0.70
     7  common_mean              7  9   0.61  1.14
     8  common_mean              8  9   0.14  1.19
     9  common_mean              9  9  -0.31  0.60
    10  common_mean             10  9   0.20  0.81
    11  group_specific_mean      1  9  -4.32  1.10
    12  group_specific_mean      2  9  -3.40  0.88
    13  group_specific_mean      3  9  -2.41  0.89
    14  group_specific_mean      4  9  -1.38  0.60
    15  group_specific_mean      5  9  -0.76  0.61
    16  group_specific_mean      6  9  -0.16  0.95
    17  group_specific_mean      7  9   1.21  1.12
    18  group_specific_mean      8  9   2.23  1.15
    19  group_specific_mean      9  9   3.97  1.26
    20  group_specific_mean     10  9   5.08  0.77

Sampling on a grid

Consider sampling from an arbitrary unnormalized density $f(\tau) \propto p(\tau \mid y)$ using the following approach:

1. Construct a step-function approximation to this density:
   a. Determine an interval $[L, U]$ such that outside this interval $f(\tau)$ is small.
   b. Set an interval half-width $h$ to generate a grid of $M$ points $(x_1, \ldots, x_M)$ in this interval, i.e. $x_1 = L + h$ and $x_m = x_{m-1} + 2h$ for all $1 < m \le M$.
   c. Evaluate the density on this grid, i.e. $f(x_m)$.
   d. Normalize the interval weights, i.e. $w_m = f(x_m) \big/ \sum_{i=1}^M f(x_i)$ (to construct a normalized density, divide each $w_m$ by $2h$).
2. Sample from this approximation:
   a. Sample an interval $m$ with probability $w_m$.
   b. Sample uniformly within this interval, i.e. $\tau \sim \mathrm{Unif}(x_m - h, x_m + h)$.
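
The two stages above can be sketched in R as follows. This is an illustrative implementation, not the course's own: the function name `sample_grid` is introduced here, `f` is assumed to be vectorized, and the half-width is derived from the number of grid points as $h = (U - L) / (2M)$ so that the $M$ intervals exactly tile $[L, U]$. The example target, an unnormalized Gamma(3, 2) density, is a stand-in chosen because its mean (1.5) is known.

```r
sample_grid <- function(n, f, L, U, M) {
  h <- (U - L) / (2 * M)               # interval half-width
  x <- L + h + 2 * h * (0:(M - 1))     # interval midpoints x_1, ..., x_M
  w <- f(x)                            # unnormalized density on the grid
  w <- w / sum(w)                      # normalized interval weights
  m <- sample.int(M, size = n, replace = TRUE, prob = w)  # pick intervals
  runif(n, x[m] - h, x[m] + h)         # uniform draw within each interval
}

# Stand-in target: unnormalized Gamma(shape = 3, rate = 2) density
f <- function(tau) tau^2 * exp(-2 * tau)

set.seed(1)
draws <- sample_grid(1e4, f, L = 0, U = 10, M = 200)
mean(draws)   # should be close to the Gamma(3, 2) mean of 1.5
```

Widening $[L, U]$ or increasing $M$ reduces the approximation error at the cost of more density evaluations.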