Computation of the likelihood function for GLMMs - Monte Carlo methods

Rasmus Waagepetersen
Department of Mathematics
Aalborg University, Denmark

December 30, 2019

Outline for today

◮ Monte Carlo methods
◮ Computation of the likelihood function using importance sampling
◮ Newton-Raphson and the EM-algorithm

Generalized linear mixed effects models

Consider a stochastic variable Y = (Y_1, ..., Y_n) and random effects U.

Two-step formulation of a GLMM:

◮ U ~ N(0, Σ).
◮ Given a realization u of U, the Y_i are independent and each follows the density f_i(y_i | u) with mean µ_i = g^{-1}(η_i) and linear predictor η = Xβ + Zu.

I.e. conditional on U, each Y_i follows a generalized linear model.

NB: the GLMM is specified in terms of the marginal density of U and the conditional density of Y given U. But the likelihood is the marginal density f(y), which can be hard to evaluate!

Likelihood for a generalized linear mixed model

For normal linear mixed models we can compute the marginal distribution of Y directly as a multivariate normal; that is, f(y) is the density of a multivariate normal distribution.

For a generalized linear mixed model it is difficult to evaluate the integral

f(y) = ∫_{R^m} f(y, u) du = ∫_{R^m} f(y | u) f(u) du

since f(y | u) f(u) is a very complex function.
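To make the integral concrete, here is a minimal sketch in R, assuming a hypothetical one-dimensional random-intercept Poisson GLMM (a single cluster with y_i | u ~ Poisson(exp(β_0 + u)) and u ~ N(0, τ²)); the model, the data and the parameter values are invented for illustration and are not from the slides. The function evaluates the integrand f(y | u) f(u; τ²) that the Monte Carlo methods below approximate; in one dimension we can still check it against numerical quadrature.

```r
# Likelihood integrand f(y | u) f(u; tau2) for a hypothetical
# one-cluster random-intercept Poisson GLMM (illustration only).
integrand <- function(u, y, beta0, tau2) {
  # f(y | u): product of Poisson densities, log link eta = beta0 + u
  cond <- vapply(u, function(ui) prod(dpois(y, lambda = exp(beta0 + ui))),
                 numeric(1))
  cond * dnorm(u, mean = 0, sd = sqrt(tau2))  # times f(u; tau2)
}

y <- c(3, 1, 4, 2)  # invented observations sharing one random effect
# One-dimensional quadrature as a reference value for the likelihood:
integrate(integrand, -Inf, Inf, y = y, beta0 = 1, tau2 = 0.5)
```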

Monte Carlo computation of the likelihood for a GLMM

The likelihood function is an expectation:

L(θ) = f(y; θ) = ∫_{R^m} f(y | u; β) f(u; τ²) du = E_{τ²} f(y | U; β)

Use Monte Carlo simulation to approximate the expectation.

NB: also applicable in high dimensions. However, naive methods may fail!

Simple Monte Carlo: g(u) = f(u; τ²)

L(θ) = ∫_{R^m} f(y | u; β) f(u; τ²) du = E_{τ²} f(y | U; β) ≈ L_SMC(θ) = (1/M) ∑_{l=1}^M f(y | U_l; β)

where the U_l ~ N(0, τ²) are independent.

Monte Carlo variance:

Var(L_SMC(θ)) = (1/M) Var f(y | U_1; β)

NB: the variance is of order 1/M regardless of dimension!
NB: a large M is needed to get a large reduction of the standard error, which is of order 1/√M.

Estimate Var f(y | U_1; β) by the empirical variance based on f(y | U_l; β), l = 1, ..., M:

(1/(M − 1)) ∑_{l=1}^M ( f(y | U_l; β) − L_SMC(θ) )²

Often Var f(y | U_1; β) is large, so a large M is needed. Increasing dimension leads to worse performance (useless in high dimensions). A sketch of this estimator is given after this group of slides.

Importance sampling

Consider Z ~ f and suppose we wish to evaluate E h(Z) where h(Z) has large variance. Suppose we can find a density g so that

h(z) f(z)/g(z) ≈ const and h(z) f(z) > 0 ⇒ g(z) > 0

Then

E h(Z) = ∫ [h(z) f(z)/g(z)] g(z) dz = E[h(Y) f(Y)/g(Y)]

where Y ~ g. Note that the variance of h(Y) f(Y)/g(Y) is small, so the estimate

E h(Z) ≈ (1/n) ∑_{i=1}^n h(Y_i) f(Y_i)/g(Y_i)

has small Monte Carlo error.
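Here is a minimal sketch of the simple Monte Carlo estimator L_SMC(θ) together with its estimated standard error, for the same hypothetical Poisson example as above (all names and values invented for illustration):

```r
# Simple Monte Carlo: average f(y | U_l; beta) over draws U_l ~ N(0, tau2).
L_smc <- function(y, beta0, tau2, M = 1e5) {
  u <- rnorm(M, mean = 0, sd = sqrt(tau2))          # U_l ~ f(u; tau2)
  f_cond <- vapply(u, function(ui) prod(dpois(y, exp(beta0 + ui))),
                   numeric(1))                      # f(y | U_l; beta)
  c(estimate  = mean(f_cond),                       # L_SMC(theta)
    std.error = sd(f_cond) / sqrt(M))               # of order 1/sqrt(M)
}
L_smc(y = c(3, 1, 4, 2), beta0 = 1, tau2 = 0.5)
```

Comparing the estimate with the quadrature reference value above illustrates both the consistency of the estimator and how slowly the 1/√M standard error shrinks.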

Importance sampling for GLMMs

Note that g(·) is a probability density on R^m. Then

L(θ) = ∫_{R^m} f(y | u; β) f(u; τ²) du = ∫_{R^m} [f(y | u; β) f(u; τ²)/g(u)] g(u) du = E[ f(y | V; β) f(V; τ²)/g(V) ]

where V ~ g(·). Hence

L(θ) ≈ L_{IS,h}(θ) = (1/M) ∑_{l=1}^M f(y | V_l; β) f(V_l; τ²)/g(V_l), where V_l ~ g(·), l = 1, ..., M.

Find g so that Var[ f(y | V; β) f(V; τ²)/g(V) ] is small.

Var L_{IS,h}(θ) < ∞ if f(y | v; β) f(v; τ²)/g(v) is bounded (i.e. use a g(·) with heavy tails).

Possibility:

f(y | u; β) f(u; τ²) = f(y; θ) f(u | y; θ) = const · f(u | y; θ)

Laplace approximation: U | Y = y ≈ N(µ_LP, σ²_LP).

Use for g(·) the density of N(µ_LP, σ²_LP) or of a t_ν-distribution with location µ_LP and scale σ_LP:

f(y | u; β) f(u; τ²)/g(u) ≈ const

so the variance is small, and simulation from g is straightforward.

Note: this is a 'Monte Carlo version' of adaptive Gaussian quadrature. (A sketch of this Laplace-based importance sampler is given after this group of slides.)

Possibility: consider a fixed θ_0 and take

g(u) = f(u | y; θ_0) = f(y | u; β_0) f(u; τ_0²)/L(θ_0)

Then

L(θ) = ∫_{R^m} [f(y | u; β) f(u; τ²)/g(u)] g(u) du = L(θ_0) ∫_{R^m} [ f(y | u; β) f(u; τ²) / ( f(y | u; β_0) f(u; τ_0²) ) ] f(u | y; θ_0) du = L(θ_0) E_{θ_0}[ f(y | U; β) f(U; τ²) / ( f(y | U; β_0) f(U; τ_0²) ) | Y = y ]

⇔ L(θ)/L(θ_0) = E_{θ_0}[ f(y | U; β) f(U; τ²) / ( f(y | U; β_0) f(U; τ_0²) ) | Y = y ]

So we can estimate the ratio L(θ)/L(θ_0), where L(θ_0) is an unknown constant. This suffices for finding the MLE:

argmax_θ L(θ) = argmax_θ L(θ)/L(θ_0) ≈ argmax_θ (1/M) ∑_{l=1}^M f(y | U_l; β) f(U_l; τ²) / ( f(y | U_l; β_0) f(U_l; τ_0²) )

where U_l ~ f(u | y; θ_0).

Simulation of random variables

Direct methods exist for many standard distributions (normal, binomial, t, etc.: rnorm(), rbinom(), rt() etc.).

Suppose f is a non-standard density but

f(z) ≤ K g(z)

for some constant K and a standard density g. Then we may apply rejection sampling:

1. Generate Y ~ g and W ~ unif[0, 1].
2. If W ≤ f(Y)/(K g(Y)), return Y (accept); otherwise go to 1 (reject).

Note: the probability of accepting is 1/K.

If f is a high-dimensional density it may be hard to find a g with small K, so rejection sampling is mainly useful in small dimensions. MCMC is then a useful alternative (we'll briefly consider this in the MRF part of the course).
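A minimal sketch of the Laplace-based importance sampler for the hypothetical Poisson example, reusing integrand() from the first sketch; the Laplace location µ_LP and scale σ_LP are found numerically, and a heavy-tailed t proposal is used as recommended above. All names and values are invented for illustration.

```r
# Importance sampling with a t_nu proposal centered at the Laplace
# approximation of f(u | y) (one-dimensional sketch; assumes integrand()
# from the earlier sketch is defined).
L_is <- function(y, beta0, tau2, M = 1e4, nu = 4) {
  h <- function(u) integrand(u, y, beta0, tau2)    # f(y | u; beta) f(u; tau2)
  fit <- optim(0, function(u) -log(h(u)), method = "BFGS", hessian = TRUE)
  mu_lp    <- fit$par                              # Laplace location
  sigma_lp <- sqrt(1 / fit$hessian[1, 1])          # Laplace scale
  v <- mu_lp + sigma_lp * rt(M, df = nu)           # V_l ~ g
  g <- dt((v - mu_lp) / sigma_lp, df = nu) / sigma_lp  # proposal density
  w <- h(v) / g                                    # importance weights
  c(estimate = mean(w), std.error = sd(w) / sqrt(M))
}
L_is(y = c(3, 1, 4, 2), beta0 = 1, tau2 = 0.5)
```

Because h/g is approximately constant near the mode and the t tails dominate the integrand's tails, the weights are well behaved; comparing the standard error with the simple Monte Carlo sketch for the same M illustrates the variance reduction.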

Prediction of U using conditional simulation

Compute a Monte Carlo estimate of E(U | Y = y) using importance sampling or conditional simulations of U | Y = y:

E(U | Y = y) ≈ (1/M) ∑_{m=1}^M U_m, U_m ~ f(u | y)

We can also evaluate e.g. P(U_i > c | y) or P(U_i > U_l, l ≠ i | Y) etc.

Proof of rejection sampling

Show that

P(Y ≤ y | accept) = P( Y ≤ y | W ≤ f(Y)/(K g(Y)) ) = ∫_{−∞}^y f(v) dv

Hint: write out P( Y ≤ y | W ≤ f(Y)/(K g(Y)) ) as an integral in terms of the densities of Y and W.

Conditional simulation of U | Y = y using rejection sampling

Note that

f(u | y; θ) = f(y | u; β) f(u; τ²)/f(y; θ) ≤ K t_ν(u; µ_LP, σ²_LP)

for some constant K.

Rejection sampling:

1. Generate V ~ t_ν(µ_LP, σ²_LP) and W ~ Unif(]0, 1[).
2. Return V if W ≤ f(V | y; θ)/(K t_ν(V; µ_LP, σ²_LP)); otherwise go to 1.

(A sketch of this sampler is given after this group of slides.)

Maximization of the likelihood using Newton-Raphson

Let V_θ(y, u) = (d/dθ) log f(y, u | θ). Then the score is

u(θ) = (d/dθ) log L(θ) = E_θ[ V_θ(y, U) | Y = y ]

and the observed information is

j(θ) = −(d²/(dθᵀ dθ)) log L(θ) = −E_θ[ dV_θ(y, U)/dθᵀ | Y = y ] − Var_θ[ V_θ(y, U) | Y = y ]

Newton-Raphson:

θ_{l+1} = θ_l + j(θ_l)^{−1} u(θ_l)

All unknown expectations and variances can be estimated using the previous numerical integration or Monte Carlo methods!
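A minimal sketch of the conditional rejection sampler for the hypothetical one-dimensional Poisson example, again reusing integrand() from the first sketch. The envelope constant K below is a crude heuristic (the ratio at the mode times a safety factor); a rigorous implementation must verify f(y | u) f(u) ≤ K g(u) for all u. draw_conditional() is an invented helper name.

```r
# Rejection sampling from f(u | y) with a t_nu proposal centered at the
# Laplace approximation (sketch; assumes integrand() from above).
draw_conditional <- function(y, beta0, tau2, M, nu = 4) {
  h <- function(u) integrand(u, y, beta0, tau2)   # unnormalized f(u | y)
  fit <- optim(0, function(u) -log(h(u)), method = "BFGS", hessian = TRUE)
  mu_lp    <- fit$par
  sigma_lp <- sqrt(1 / fit$hessian[1, 1])
  g <- function(u) dt((u - mu_lp) / sigma_lp, df = nu) / sigma_lp
  # Crude envelope constant: ratio at the mode times a safety factor.
  # A careful implementation must check h(u) <= K g(u) for all u.
  K <- 1.5 * h(mu_lp) / g(mu_lp)
  out <- numeric(0)
  while (length(out) < M) {
    v <- mu_lp + sigma_lp * rt(M, df = nu)        # proposals V ~ g
    w <- runif(M)                                 # W ~ Unif(0, 1)
    out <- c(out, v[w <= h(v) / (K * g(v))])      # accept/reject step
  }
  out[1:M]
}
# E(U | Y = y) estimated from conditional draws:
mean(draw_conditional(y = c(3, 1, 4, 2), beta0 = 1, tau2 = 0.5, M = 5000))
```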

EM-algorithm

Given the current estimate θ_l:

1. (E) compute Q(θ_l, θ) = E_{θ_l}[ log f(y, U | θ) | Y = y ]
2. (M) set θ_{l+1} = argmax_θ Q(θ_l, θ).

For a LNMM (linear normal mixed model) the E-step can be computed explicitly, but this seems pointless as the likelihood is available in closed form. For GLMMs the (E) step needs numerical integration or Monte Carlo (a Monte Carlo version is sketched after the exercises).

Convergence of the EM-algorithm can be quite slow; maximization of the likelihood using Newton-Raphson seems a better alternative.

Exercises

1. Why a heavy-tailed importance sampling density? (Look at the variance of the Monte Carlo estimate.)
2. R exercises on the exercise sheet exercises_imp.pdf.
3. Show that the rejection sampler works.
4. Simulate a binomial distribution (n = 10, p = 0.2) using simulations of a Poisson distribution (mean 2) and rejection sampling. Can you simulate a Poisson using simulations of a binomial?
5. Any remaining exercises from previous lectures.
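A minimal Monte Carlo EM sketch for the hypothetical Poisson example, where the E-step is replaced by conditional simulation via the invented draw_conditional() helper sketched earlier; the log(τ²) parametrization is an implementation choice to keep τ² positive and is not part of the slides.

```r
# Monte Carlo EM (sketch): theta = c(beta0, log_tau2).
Q_hat <- function(theta, y, u) {          # (1/M) sum_m log f(y, u_m | theta)
  beta0 <- theta[1]; tau2 <- exp(theta[2])
  mean(vapply(u, function(ui) sum(dpois(y, exp(beta0 + ui), log = TRUE)),
              numeric(1))) +
    mean(dnorm(u, 0, sqrt(tau2), log = TRUE))
}
mcem_step <- function(theta_l, y, M = 2000) {
  u <- draw_conditional(y, theta_l[1], exp(theta_l[2]), M)  # (E) simulate U | y
  optim(theta_l, function(th) -Q_hat(th, y, u))$par         # (M) maximize Q
}

theta <- c(0, log(1))                     # invented starting value
for (it in 1:20) theta <- mcem_step(theta, y = c(3, 1, 4, 2))
```

As the slides note, each iteration requires fresh conditional simulations, which is one reason Monte Carlo EM can be slow compared with Newton-Raphson on the Monte Carlo approximated likelihood.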
