Computation of the likelihood function for GLMMs - Monte Carlo methods

Rasmus Waagepetersen
Department of Mathematics
Aalborg University
Denmark

December 30, 2019


Outline for today

◮ Monte Carlo methods
◮ Computation of the likelihood function using importance sampling
◮ Newton-Raphson and the EM-algorithm


Generalized linear mixed effects models

Consider stochastic variables Y = (Y_1, ..., Y_n) and random effects U.

Two step formulation of GLMM:

◮ U ∼ N(0, Σ).
◮ Given a realization u of U, the Y_i are independent and each follows a density f_i(y_i | u) with mean µ_i = g⁻¹(η_i) and linear predictor η = Xβ + Zu.

I.e. conditional on U, Y_i follows a generalized linear model.

NB: the GLMM is specified in terms of the marginal density of U and the conditional density of Y given U. But the likelihood is the marginal density f(y), which can be hard to evaluate!


Likelihood for generalized linear mixed model

For normal linear mixed models we could compute the marginal distribution of Y directly as a multivariate normal. That is, f(y) is the density of a multivariate normal distribution.

For a generalized linear mixed model it is difficult to evaluate the integral

  f(y) = ∫_{R^m} f(y, u) du = ∫_{R^m} f(y | u) f(u) du

since f(y | u) f(u) is a very complex function.
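To make the two-step formulation concrete, here is a minimal R sketch that simulates from a toy GLMM: one cluster with a scalar random intercept and Poisson responses with log link. The model, sample size and parameter values are illustrative assumptions, not part of the slides.

## Two-step simulation of a toy GLMM: scalar random intercept U ~ N(0, tau2),
## Poisson responses with log link and linear predictor eta_j = beta + u.
set.seed(1)

n    <- 10    # observations in the cluster
beta <- 0.5   # fixed intercept
tau2 <- 1     # random effect variance

## Step 1: draw the random effect
u <- rnorm(1, mean = 0, sd = sqrt(tau2))

## Step 2: given u, the Y_j are independent Poisson with mean g^{-1}(eta) = exp(beta + u)
y <- rpois(n, lambda = exp(beta + u))

## The conditional density f(y | u; beta) is available in closed form ...
f_y_given_u <- function(u, y, beta) prod(dpois(y, lambda = exp(beta + u)))
f_y_given_u(u, y, beta)

## ... but the marginal likelihood f(y; theta) = int f(y | u) f(u) du is not.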
Monte Carlo computation of likelihood for GLMM

Likelihood function is an expectation:

  L(θ) = f(y; θ) = ∫_{R^m} f(y | u; β) f(u; τ²) du = E_{τ²}[ f(y | U; β) ]

Use Monte Carlo simulations to approximate expectation.

NB: also applicable in high dimensions.

However, naive methods may fail!


Simple Monte Carlo: g(u) = f(u; τ²)

  L(θ) = ∫ f(y | u; β) f(u; τ²) du = E_{τ²}[ f(y | U; β) ] ≈ L_SMC(θ) = (1/M) Σ_{l=1}^M f(y | U_l; β)

where U_l ∼ N(0, τ²) independent.

Monte Carlo variance:

  Var(L_SMC(θ)) = (1/M) Var f(y | U_1; β)

NB: variance is of order 1/M regardless of dimension!

NB: large M is needed to get large reduction of standard error, which is of order 1/√M.


Estimate Var f(y | U_1; β) using empirical variance estimate based on f(y | U_l; β), l = 1, ..., M:

  (1/(M−1)) Σ_{l=1}^M ( f(y | U_l; β) − L_SMC(θ) )²

Often Var f(y | U_1; β) is large so large M is needed.

Increasing dimension leads to worse performance (useless in high dimensions).


Importance sampling

Consider Z ∼ f and suppose we wish to evaluate E h(Z) where h(Z) has large variance.

Suppose we can find density g so that

  h(z) f(z) / g(z) ≈ const   and   h(z) f(z) > 0 ⇒ g(z) > 0

Then

  E h(Z) = ∫ [ h(z) f(z) / g(z) ] g(z) dz = E[ h(Y) f(Y) / g(Y) ]   where Y ∼ g.

Note variance of h(Y) f(Y) / g(Y) small so estimate

  E h(Z) ≈ (1/n) Σ_{i=1}^n h(Y_i) f(Y_i) / g(Y_i)

has small Monte Carlo error.
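Returning to the simple Monte Carlo estimator L_SMC(θ) from the slides above, here is a minimal R sketch for the toy Poisson GLMM simulated earlier; it also reports the empirical Monte Carlo standard error of order 1/√M. Function names and parameter values are illustrative assumptions.

## Simple Monte Carlo: draw U_l ~ N(0, tau2) from the marginal distribution of the
## random effect and average the conditional densities f(y | U_l; beta).
simple_mc_lik <- function(y, beta, tau2, M = 1e4) {
  U <- rnorm(M, mean = 0, sd = sqrt(tau2))
  w <- sapply(U, function(u) prod(dpois(y, lambda = exp(beta + u))))
  c(lik = mean(w),              # L_SMC(theta)
    se  = sd(w) / sqrt(M))      # empirical Monte Carlo standard error
}

simple_mc_lik(y, beta = 0.5, tau2 = 1)   # y from the simulation sketch above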
Importance sampling for GLMM

  L(θ) = f(y; θ) = ∫_{R^m} f(y | u; β) f(u; τ²) du
       = ∫_{R^m} [ f(y | u; β) f(u; τ²) / g(u) ] g(u) du
       = E[ f(y | V; β) f(V; τ²) / g(V) ]   where V ∼ g(·).

Note: g(·) is a probability density on R^m.

  L(θ) ≈ L_{IS,h}(θ) = (1/M) Σ_{l=1}^M f(y | V_l; β) f(V_l; τ²) / g(V_l)   where V_l ∼ g(·), l = 1, ..., M.

Find g so that Var[ f(y | V; β) f(V; τ²) / g(V) ] is small.

Var L_{IS,h}(θ) < ∞ if f(y | v; β) f(v; τ²) / g(v) is bounded (i.e. use g(·) with heavy tails).


Possibility:

  f(y | u; β) f(u; τ²) = f(y; θ) f(u | y; θ) = L(θ) f(u | y; θ) = const · f(u | y; θ)

Laplace: U | Y = y ≈ N(µ_LP, σ²_LP).

Use for g(·) the N(µ_LP, σ²_LP) density or a t_ν(µ_LP, σ²_LP) density:

  f(y | u; β) f(u; τ²) / g(u) ≈ const

so small variance. Simulation is straightforward.

Note: 'Monte Carlo version' of adaptive Gaussian quadrature.


Possibility: Consider fixed θ_0:

  g(u) = f(u | y; θ_0) = f(y | u; β_0) f(u; τ_0²) / L(θ_0)

Then

  L(θ) = ∫_{R^m} [ f(y | u; β) f(u; τ²) / g(u) ] g(u) du
       = L(θ_0) ∫_{R^m} [ f(y | u; β) f(u; τ²) / ( f(y | u; β_0) f(u; τ_0²) ) ] f(u | y; θ_0) du
       = L(θ_0) E_{θ_0}[ f(y | U; β) f(U; τ²) / ( f(y | U; β_0) f(U; τ_0²) ) | Y = y ]

  ⇔  L(θ)/L(θ_0) = E_{θ_0}[ f(y | U; β) f(U; τ²) / ( f(y | U; β_0) f(U; τ_0²) ) | Y = y ]

So we can estimate the ratio L(θ)/L(θ_0) where L(θ_0) is an unknown constant.

This suffices for finding the MLE:

  argmax_θ L(θ) = argmax_θ L(θ)/L(θ_0) ≈ argmax_θ (1/M) Σ_{l=1}^M f(y | U_l; β) f(U_l; τ²) / ( f(y | U_l; β_0) f(U_l; τ_0²) )

where U_l ∼ f(u | y; θ_0).


Simulation of random variables

Direct methods exist for many standard distributions (normal, binomial, t, etc.: rnorm(), rbinom(), rt() etc.)

Suppose f is a non-standard density but

  f(z) ≤ K g(z)

for some constant K and standard density g.

Then we may apply rejection sampling:

1. Generate Y ∼ g and W ∼ Unif[0, 1].
2. If W ≤ f(Y) / (K g(Y)) return Y (accept); otherwise go to 1 (reject).

Note the probability of accept is 1/K.

If f is a high-dimensional density it may be hard to find g with small K, so rejection sampling is mainly useful in small dimensions.

MCMC is then a useful alternative (we'll briefly consider this in the MRF part of the course).
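To illustrate the Laplace-based proposal, here is a hedged R sketch of the importance sampling estimator for the toy Poisson GLMM. The location µ_LP and scale σ_LP are obtained by numerically maximizing log f(y | u; β) f(u; τ²) and taking a finite-difference curvature, and a t_ν proposal provides the heavy tails; all names and tuning constants are illustrative assumptions.

## Importance sampling estimate of L(theta) with a heavy-tailed t_nu proposal
## centred at the mode of f(y | u; beta) f(u; tau2) (Laplace location and scale).
is_lik <- function(y, beta, tau2, M = 1e4, nu = 4) {
  log_joint <- function(u) sum(dpois(y, exp(beta + u), log = TRUE)) +
    dnorm(u, 0, sqrt(tau2), log = TRUE)
  opt   <- optimize(log_joint, interval = c(-10, 10), maximum = TRUE)
  mu_lp <- opt$maximum
  h     <- 1e-4   # finite-difference step for the curvature at the mode
  sd_lp <- sqrt(-h^2 / (log_joint(mu_lp + h) - 2 * log_joint(mu_lp) + log_joint(mu_lp - h)))
  ## Draw V_l from the t_nu proposal g and form the log importance weights
  V  <- mu_lp + sd_lp * rt(M, df = nu)
  lw <- sapply(V, log_joint) - (dt((V - mu_lp) / sd_lp, df = nu, log = TRUE) - log(sd_lp))
  w  <- exp(lw)                 # f(y | V_l; beta) f(V_l; tau2) / g(V_l)
  c(lik = mean(w), se = sd(w) / sqrt(M))
}

is_lik(y, beta = 0.5, tau2 = 1)

And a minimal sketch of the generic rejection sampler described on the last slide above; the target is a made-up unnormalized density on [0, 1] with a uniform proposal, purely for illustration.

## Rejection sampling with proposal g = Unif(0, 1).
## Target: f(z) proportional to z^2 (1 - z)^3 on [0, 1], i.e. a Beta(3, 4) density up to
## a normalizing constant; K bounds the (unnormalized) f against g = 1.
f_target <- function(z) z^2 * (1 - z)^3
K <- optimize(f_target, c(0, 1), maximum = TRUE)$objective

rejection_sample <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    repeat {
      Y <- runif(1)                  # 1. proposal draw from g
      W <- runif(1)                  #    and uniform for the accept/reject step
      if (W <= f_target(Y) / K) {    # 2. accept with probability f(Y) / (K g(Y))
        out[i] <- Y
        break
      }
    }
  }
  out
}

## Accepted draws follow the normalized target, here Beta(3, 4). Because f_target is
## unnormalized, the acceptance probability is (integral of f_target) / K rather than 1/K.
z <- rejection_sample(1e4)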
Prediction of U using conditional simulation

Compute a Monte Carlo estimate of E(U | Y = y) using importance sampling or conditional simulations of U | Y = y:

  E(U | Y = y) ≈ (1/M) Σ_{m=1}^M U_m,   U_m ∼ f(u | y)

We can also evaluate e.g. P(U_i > c | y) or P(U_i > U_l, l ≠ i | Y) etc.


Proof of rejection sampling:

Show

  P(Y ≤ y | accept) = P( Y ≤ y | W ≤ f(Y)/(K g(Y)) ) = ∫_{−∞}^{y} f(v) dv

Hint: write out P( Y ≤ y | W ≤ f(Y)/(K g(Y)) ) as an integral in terms of the densities of Y and W.


Conditional simulation of U | Y = y using rejection sampling

Note

  f(u | y; θ) = f(y | u; β) f(u; τ²) / f(y; θ) ≤ K t_ν(u; µ_LP, σ²_LP)

for some constant K.

Rejection sampling:

1. Generate V ∼ t_ν(µ_LP, σ²_LP) and W ∼ Unif(]0, 1[).
2. Return V if W ≤ f(V | y; θ) / ( K t_ν(V; µ_LP, σ²_LP) ); otherwise go to 1.


Maximization of likelihood using Newton-Raphson

Let

  V_θ(y, u) = d/dθ log f(y, u | θ)

Then

  u(θ) = d/dθ log L(θ) = E_θ[ V_θ(y, U) | Y = y ]

and

  j(θ) = − d²/(dθ dθ^T) log L(θ) = − E_θ[ d V_θ(y, U)/dθ^T | Y = y ] − Var_θ[ V_θ(y, U) | Y = y ]

Newton-Raphson:

  θ_{l+1} = θ_l + j(θ_l)⁻¹ u(θ_l)

All unknown expectations and variances can be estimated using the previous numerical integration or Monte Carlo methods!
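A hedged R sketch combining the two slides above: conditional simulation of U | Y = y for the toy Poisson GLMM by rejection sampling from a t_ν proposal, followed by Monte Carlo estimates of E(U | Y = y) and a tail probability. Since the normalizing constant f(y; θ) is unknown, the sketch rejects against the unnormalized density f(y | u; β) f(u; τ²) with a crude grid-based bound K; all names and tuning choices are illustrative assumptions.

## Conditional simulation of U | Y = y by rejection sampling from t_nu(mu_LP, sd_LP).
sim_u_given_y <- function(y, beta, tau2, M = 1000, nu = 4) {
  log_joint <- function(u) sum(dpois(y, exp(beta + u), log = TRUE)) +
    dnorm(u, 0, sqrt(tau2), log = TRUE)
  opt   <- optimize(log_joint, c(-10, 10), maximum = TRUE)
  mu_lp <- opt$maximum
  h     <- 1e-4
  sd_lp <- sqrt(-h^2 / (log_joint(mu_lp + h) - 2 * log_joint(mu_lp) + log_joint(mu_lp - h)))
  log_g <- function(u) dt((u - mu_lp) / sd_lp, df = nu, log = TRUE) - log(sd_lp)
  ## Crude bound on log[ f(y | u) f(u) / g(u) ] via a grid search, inflated slightly
  grid  <- seq(mu_lp - 10 * sd_lp, mu_lp + 10 * sd_lp, length.out = 2000)
  logK  <- max(sapply(grid, log_joint) - log_g(grid)) + 0.1
  out <- numeric(M)
  for (m in seq_len(M)) {
    repeat {
      V <- mu_lp + sd_lp * rt(1, df = nu)
      if (log(runif(1)) <= log_joint(V) - log_g(V) - logK) { out[m] <- V; break }
    }
  }
  out
}

U <- sim_u_given_y(y, beta = 0.5, tau2 = 1)
mean(U)          # Monte Carlo estimate of E(U | Y = y)
mean(U > 0.5)    # Monte Carlo estimate of P(U > 0.5 | y)

The same conditional draws can be plugged into the Newton-Raphson formulas above. A minimal sketch for the toy model, where the complete-data score V_θ(y, u) and its derivative with respect to θ = (β, τ²) are available in closed form; it reuses the draws U and the data y from the sketch just above.

## Monte Carlo estimates of the score u(theta) and observed information j(theta)
## using conditional draws U_m ~ f(u | y; theta).
score_info_mc <- function(y, beta, tau2, U) {
  ## Complete-data score V_theta(y, u) = d/dtheta log f(y, u | theta) for each draw
  V <- t(sapply(U, function(u) c(
    sum(y - exp(beta + u)),                       # d/dbeta
    -1 / (2 * tau2) + u^2 / (2 * tau2^2))))       # d/dtau2
  ## d V_theta / dtheta^T (the cross-derivative is zero for this model)
  H <- sapply(U, function(u) c(-sum(exp(beta + u)),
                               1 / (2 * tau2^2) - u^2 / tau2^3))
  u_score <- colMeans(V)                  # u(theta) = E_theta[V | Y = y]
  j_info  <- -diag(rowMeans(H)) - var(V)  # j(theta) = -E[dV/dtheta^T | y] - Var[V | y]
  list(score = u_score, info = j_info)
}

si <- score_info_mc(y, beta = 0.5, tau2 = 1, U = U)
## One Newton-Raphson update (in practice one would reparametrize, e.g. log tau2,
## to keep the variance positive).
theta_new <- c(0.5, 1) + solve(si$info, si$score)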
EM-algorithm

Given current estimate θ_l:

1. (E) compute Q(θ_l, θ) = E_{θ_l}[ log f(y, U | θ) | Y = y ]
2. (M) θ_{l+1} = argmax_θ Q(θ_l, θ).

For LNMMs the E-step can be computed explicitly, but this seems pointless as the likelihood is available in closed form.

For GLMMs the (E) step needs numerical integration or Monte Carlo (a small Monte Carlo sketch is given after the exercises below).

Convergence of the EM-algorithm can be quite slow. Maximization of the likelihood using Newton-Raphson seems a better alternative.


Exercises

1. Why a heavy-tailed importance sampling density? (look at the variance of the Monte Carlo estimate)
2. R exercises on exercise sheet exercises_imp.pdf.
3. Show that the rejection sampler works.
4. Simulate a binomial distribution (n = 10, p = 0.2) using simulations of a Poisson distribution (mean 2) and rejection sampling. Can you simulate a Poisson using simulations of a binomial?
5. Any remaining exercises from previous lectures.
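Tying the EM slide above to the Monte Carlo machinery, here is a minimal sketch of one Monte Carlo E- and M-step for the toy Poisson GLMM. It reuses the conditional sampler sim_u_given_y sketched earlier; the helper names and the choice to optimize over (β, log τ²) are illustrative assumptions.

## One Monte Carlo EM iteration: approximate Q(theta_l, theta) by an average over
## conditional draws U_m ~ f(u | y; theta_l), then maximize over theta.
mc_em_step <- function(y, theta_l, M = 1000) {
  U <- sim_u_given_y(y, beta = theta_l[1], tau2 = theta_l[2], M = M)   # (E) draws
  Q <- function(par) {
    beta <- par[1]; tau2 <- exp(par[2])   # optimize over log(tau2) to keep it positive
    mean(sapply(U, function(u)
      sum(dpois(y, exp(beta + u), log = TRUE)) + dnorm(u, 0, sqrt(tau2), log = TRUE)))
  }
  opt <- optim(c(theta_l[1], log(theta_l[2])), Q, control = list(fnscale = -1))  # (M) step
  c(beta = opt$par[1], tau2 = exp(opt$par[2]))
}

theta_1 <- mc_em_step(y, theta_l = c(0.5, 1))   # one EM update from (beta, tau2) = (0.5, 1)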