Computation of the likelihood function for GLMMs - Monte Carlo methods

Rasmus Waagepetersen
Department of Mathematics
Aalborg University
Denmark

December 30, 2019


Outline for today

◮ Monte Carlo methods
◮ Computation of the likelihood function using importance sampling
◮ Newton-Raphson and the EM-algorithm


Generalized linear mixed effects models

Consider stochastic variables Y = (Y_1, ..., Y_n) and random effects U.

Two step formulation of GLMM:

◮ U ∼ N(0, Σ).
◮ Given a realization u of U, the Y_i are independent and each follows a density f_i(y_i | u) with mean µ_i = g⁻¹(η_i) and linear predictor η = Xβ + Zu.

I.e. conditional on U, Y_i follows a generalized linear model.

NB: the GLMM is specified in terms of the marginal density of U and the conditional density of Y given U. But the likelihood is the marginal density f(y), which can be hard to evaluate!


Likelihood for generalized linear mixed model

For normal linear mixed models we could compute the marginal distribution of Y directly as a multivariate normal. That is, f(y) is the density of a multivariate normal distribution.

For a generalized linear mixed model it is difficult to evaluate the integral

  f(y) = ∫_{R^m} f(y, u) du = ∫_{R^m} f(y | u) f(u) du

since f(y | u) f(u) is a very complex function.
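To make the two-step formulation concrete, here is a minimal R sketch that simulates from a toy GLMM: one cluster with a scalar random intercept and Poisson responses with log link. The model, sample size and parameter values are illustrative assumptions, not part of the slides.

## Two-step simulation of a toy GLMM: scalar random intercept U ~ N(0, tau2),
## Poisson responses with log link and linear predictor eta_j = beta + u.
set.seed(1)

n    <- 10    # observations in the cluster
beta <- 0.5   # fixed intercept
tau2 <- 1     # random effect variance

## Step 1: draw the random effect
u <- rnorm(1, mean = 0, sd = sqrt(tau2))

## Step 2: given u, the Y_j are independent Poisson with mean g^{-1}(eta) = exp(beta + u)
y <- rpois(n, lambda = exp(beta + u))

## The conditional density f(y | u; beta) is available in closed form ...
f_y_given_u <- function(u, y, beta) prod(dpois(y, lambda = exp(beta + u)))
f_y_given_u(u, y, beta)

## ... but the marginal likelihood f(y; theta) = int f(y | u) f(u) du is not.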
Monte Carlo computation of likelihood for GLMM

Likelihood function is an expectation:

  L(θ) = f(y; θ) = ∫_{R^m} f(y | u; β) f(u; τ²) du = E_{τ²}[ f(y | U; β) ]

Use Monte Carlo simulations to approximate expectation.

NB: also applicable in high dimensions.

However, naive methods may fail!


Simple Monte Carlo: g(u) = f(u; τ²)

  L(θ) = ∫ f(y | u; β) f(u; τ²) du = E_{τ²}[ f(y | U; β) ] ≈ L_SMC(θ) = (1/M) Σ_{l=1}^M f(y | U_l; β)

where U_l ∼ N(0, τ²) independent.

Monte Carlo variance:

  Var(L_SMC(θ)) = (1/M) Var f(y | U_1; β)

NB: variance is of order 1/M regardless of dimension!

NB: large M is needed to get large reduction of standard error, which is of order 1/√M.


Estimate Var f(y | U_1; β) using empirical variance estimate based on f(y | U_l; β), l = 1, ..., M:

  (1/(M−1)) Σ_{l=1}^M ( f(y | U_l; β) − L_SMC(θ) )²

Often Var f(y | U_1; β) is large so large M is needed.

Increasing dimension leads to worse performance (useless in high dimensions).


Importance sampling

Consider Z ∼ f and suppose we wish to evaluate E h(Z) where h(Z) has large variance.

Suppose we can find density g so that

  h(z) f(z) / g(z) ≈ const   and   h(z) f(z) > 0 ⇒ g(z) > 0

Then

  E h(Z) = ∫ [ h(z) f(z) / g(z) ] g(z) dz = E[ h(Y) f(Y) / g(Y) ]   where Y ∼ g.

Note variance of h(Y) f(Y) / g(Y) small so estimate

  E h(Z) ≈ (1/n) Σ_{i=1}^n h(Y_i) f(Y_i) / g(Y_i)

has small Monte Carlo error.
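Returning to the simple Monte Carlo estimator L_SMC(θ) from the slides above, here is a minimal R sketch for the toy Poisson GLMM simulated earlier; it also reports the empirical Monte Carlo standard error of order 1/√M. Function names and parameter values are illustrative assumptions.

## Simple Monte Carlo: draw U_l ~ N(0, tau2) from the marginal distribution of the
## random effect and average the conditional densities f(y | U_l; beta).
simple_mc_lik <- function(y, beta, tau2, M = 1e4) {
  U <- rnorm(M, mean = 0, sd = sqrt(tau2))
  w <- sapply(U, function(u) prod(dpois(y, lambda = exp(beta + u))))
  c(lik = mean(w),              # L_SMC(theta)
    se  = sd(w) / sqrt(M))      # empirical Monte Carlo standard error
}

simple_mc_lik(y, beta = 0.5, tau2 = 1)   # y from the simulation sketch above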
Importance sampling for GLMM

  L(θ) = f(y; θ) = ∫_{R^m} f(y | u; β) f(u; τ²) du
       = ∫_{R^m} [ f(y | u; β) f(u; τ²) / g(u) ] g(u) du
       = E[ f(y | V; β) f(V; τ²) / g(V) ]   where V ∼ g(·).

Note: g(·) is a probability density on R^m.

  L(θ) ≈ L_{IS,h}(θ) = (1/M) Σ_{l=1}^M f(y | V_l; β) f(V_l; τ²) / g(V_l)   where V_l ∼ g(·), l = 1, ..., M.

Find g so that Var[ f(y | V; β) f(V; τ²) / g(V) ] is small.

Var L_{IS,h}(θ) < ∞ if f(y | v; β) f(v; τ²) / g(v) is bounded (i.e. use g(·) with heavy tails).


Possibility:

  f(y | u; β) f(u; τ²) = f(y; θ) f(u | y; θ) = L(θ) f(u | y; θ) = const · f(u | y; θ)

Laplace: U | Y = y ≈ N(µ_LP, σ²_LP).

Use for g(·) the N(µ_LP, σ²_LP) density or a t_ν(µ_LP, σ²_LP) density:

  f(y | u; β) f(u; τ²) / g(u) ≈ const

so small variance. Simulation is straightforward.

Note: 'Monte Carlo version' of adaptive Gaussian quadrature.


Possibility: Consider fixed θ_0:

  g(u) = f(u | y; θ_0) = f(y | u; β_0) f(u; τ_0²) / L(θ_0)

Then

  L(θ) = ∫_{R^m} [ f(y | u; β) f(u; τ²) / g(u) ] g(u) du
       = L(θ_0) ∫_{R^m} [ f(y | u; β) f(u; τ²) / ( f(y | u; β_0) f(u; τ_0²) ) ] f(u | y; θ_0) du
       = L(θ_0) E_{θ_0}[ f(y | U; β) f(U; τ²) / ( f(y | U; β_0) f(U; τ_0²) ) | Y = y ]

  ⇔  L(θ)/L(θ_0) = E_{θ_0}[ f(y | U; β) f(U; τ²) / ( f(y | U; β_0) f(U; τ_0²) ) | Y = y ]

So we can estimate the ratio L(θ)/L(θ_0) where L(θ_0) is an unknown constant.

This suffices for finding the MLE:

  argmax_θ L(θ) = argmax_θ L(θ)/L(θ_0) ≈ argmax_θ (1/M) Σ_{l=1}^M f(y | U_l; β) f(U_l; τ²) / ( f(y | U_l; β_0) f(U_l; τ_0²) )

where U_l ∼ f(u | y; θ_0).


Simulation of random variables

Direct methods exist for many standard distributions (normal, binomial, t, etc.: rnorm(), rbinom(), rt() etc.)

Suppose f is a non-standard density but

  f(z) ≤ K g(z)

for some constant K and standard density g.

Then we may apply rejection sampling:

1. Generate Y ∼ g and W ∼ Unif[0, 1].
2. If W ≤ f(Y) / (K g(Y)) return Y (accept); otherwise go to 1 (reject).

Note the probability of accept is 1/K.

If f is a high-dimensional density it may be hard to find g with small K, so rejection sampling is mainly useful in small dimensions.

MCMC is then a useful alternative (we'll briefly consider this in the MRF part of the course).
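To illustrate the Laplace-based proposal, here is a hedged R sketch of the importance sampling estimator for the toy Poisson GLMM. The location µ_LP and scale σ_LP are obtained by numerically maximizing log f(y | u; β) f(u; τ²) and taking a finite-difference curvature, and a t_ν proposal provides the heavy tails; all names and tuning constants are illustrative assumptions.

## Importance sampling estimate of L(theta) with a heavy-tailed t_nu proposal
## centred at the mode of f(y | u; beta) f(u; tau2) (Laplace location and scale).
is_lik <- function(y, beta, tau2, M = 1e4, nu = 4) {
  log_joint <- function(u) sum(dpois(y, exp(beta + u), log = TRUE)) +
    dnorm(u, 0, sqrt(tau2), log = TRUE)
  opt   <- optimize(log_joint, interval = c(-10, 10), maximum = TRUE)
  mu_lp <- opt$maximum
  h     <- 1e-4   # finite-difference step for the curvature at the mode
  sd_lp <- sqrt(-h^2 / (log_joint(mu_lp + h) - 2 * log_joint(mu_lp) + log_joint(mu_lp - h)))
  ## Draw V_l from the t_nu proposal g and form the log importance weights
  V  <- mu_lp + sd_lp * rt(M, df = nu)
  lw <- sapply(V, log_joint) - (dt((V - mu_lp) / sd_lp, df = nu, log = TRUE) - log(sd_lp))
  w  <- exp(lw)                 # f(y | V_l; beta) f(V_l; tau2) / g(V_l)
  c(lik = mean(w), se = sd(w) / sqrt(M))
}

is_lik(y, beta = 0.5, tau2 = 1)

And a minimal sketch of the generic rejection sampler described on the last slide above; the target is a made-up unnormalized density on [0, 1] with a uniform proposal, purely for illustration.

## Rejection sampling with proposal g = Unif(0, 1).
## Target: f(z) proportional to z^2 (1 - z)^3 on [0, 1], i.e. a Beta(3, 4) density up to
## a normalizing constant; K bounds the (unnormalized) f against g = 1.
f_target <- function(z) z^2 * (1 - z)^3
K <- optimize(f_target, c(0, 1), maximum = TRUE)$objective

rejection_sample <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    repeat {
      Y <- runif(1)                  # 1. proposal draw from g
      W <- runif(1)                  #    and uniform for the accept/reject step
      if (W <= f_target(Y) / K) {    # 2. accept with probability f(Y) / (K g(Y))
        out[i] <- Y
        break
      }
    }
  }
  out
}

## Accepted draws follow the normalized target, here Beta(3, 4). Because f_target is
## unnormalized, the acceptance probability is (integral of f_target) / K rather than 1/K.
z <- rejection_sample(1e4)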
Prediction of U using conditional simulation

Compute a Monte Carlo estimate of E(U | Y = y) using importance sampling or conditional simulations of U | Y = y:

  E(U | Y = y) ≈ (1/M) Σ_{m=1}^M U_m,   U_m ∼ f(u | y)

We can also evaluate e.g. P(U_i > c | y) or P(U_i > U_l, l ≠ i | Y) etc.


Proof of rejection sampling:

Show

  P(Y ≤ y | accept) = P( Y ≤ y | W ≤ f(Y)/(K g(Y)) ) = ∫_{−∞}^{y} f(v) dv

Hint: write out P( Y ≤ y | W ≤ f(Y)/(K g(Y)) ) as an integral in terms of the densities of Y and W.


Conditional simulation of U | Y = y using rejection sampling

Note

  f(u | y; θ) = f(y | u; β) f(u; τ²) / f(y; θ) ≤ K t_ν(u; µ_LP, σ²_LP)

for some constant K.

Rejection sampling:

1. Generate V ∼ t_ν(µ_LP, σ²_LP) and W ∼ Unif(]0, 1[).
2. Return V if W ≤ f(V | y; θ) / ( K t_ν(V; µ_LP, σ²_LP) ); otherwise go to 1.


Maximization of likelihood using Newton-Raphson

Let

  V_θ(y, u) = d/dθ log f(y, u | θ)

Then

  u(θ) = d/dθ log L(θ) = E_θ[ V_θ(y, U) | Y = y ]

and

  j(θ) = − d²/(dθ dθ^T) log L(θ) = − E_θ[ d V_θ(y, U)/dθ^T | Y = y ] − Var_θ[ V_θ(y, U) | Y = y ]

Newton-Raphson:

  θ_{l+1} = θ_l + j(θ_l)⁻¹ u(θ_l)

All unknown expectations and variances can be estimated using the previous numerical integration or Monte Carlo methods!
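A hedged R sketch combining the two slides above: conditional simulation of U | Y = y for the toy Poisson GLMM by rejection sampling from a t_ν proposal, followed by Monte Carlo estimates of E(U | Y = y) and a tail probability. Since the normalizing constant f(y; θ) is unknown, the sketch rejects against the unnormalized density f(y | u; β) f(u; τ²) with a crude grid-based bound K; all names and tuning choices are illustrative assumptions.

## Conditional simulation of U | Y = y by rejection sampling from t_nu(mu_LP, sd_LP).
sim_u_given_y <- function(y, beta, tau2, M = 1000, nu = 4) {
  log_joint <- function(u) sum(dpois(y, exp(beta + u), log = TRUE)) +
    dnorm(u, 0, sqrt(tau2), log = TRUE)
  opt   <- optimize(log_joint, c(-10, 10), maximum = TRUE)
  mu_lp <- opt$maximum
  h     <- 1e-4
  sd_lp <- sqrt(-h^2 / (log_joint(mu_lp + h) - 2 * log_joint(mu_lp) + log_joint(mu_lp - h)))
  log_g <- function(u) dt((u - mu_lp) / sd_lp, df = nu, log = TRUE) - log(sd_lp)
  ## Crude bound on log[ f(y | u) f(u) / g(u) ] via a grid search, inflated slightly
  grid  <- seq(mu_lp - 10 * sd_lp, mu_lp + 10 * sd_lp, length.out = 2000)
  logK  <- max(sapply(grid, log_joint) - log_g(grid)) + 0.1
  out <- numeric(M)
  for (m in seq_len(M)) {
    repeat {
      V <- mu_lp + sd_lp * rt(1, df = nu)
      if (log(runif(1)) <= log_joint(V) - log_g(V) - logK) { out[m] <- V; break }
    }
  }
  out
}

U <- sim_u_given_y(y, beta = 0.5, tau2 = 1)
mean(U)          # Monte Carlo estimate of E(U | Y = y)
mean(U > 0.5)    # Monte Carlo estimate of P(U > 0.5 | y)

The same conditional draws can be plugged into the Newton-Raphson formulas above. A minimal sketch for the toy model, where the complete-data score V_θ(y, u) and its derivative with respect to θ = (β, τ²) are available in closed form; it reuses the draws U and the data y from the sketch just above.

## Monte Carlo estimates of the score u(theta) and observed information j(theta)
## using conditional draws U_m ~ f(u | y; theta).
score_info_mc <- function(y, beta, tau2, U) {
  ## Complete-data score V_theta(y, u) = d/dtheta log f(y, u | theta) for each draw
  V <- t(sapply(U, function(u) c(
    sum(y - exp(beta + u)),                       # d/dbeta
    -1 / (2 * tau2) + u^2 / (2 * tau2^2))))       # d/dtau2
  ## d V_theta / dtheta^T (the cross-derivative is zero for this model)
  H <- sapply(U, function(u) c(-sum(exp(beta + u)),
                               1 / (2 * tau2^2) - u^2 / tau2^3))
  u_score <- colMeans(V)                  # u(theta) = E_theta[V | Y = y]
  j_info  <- -diag(rowMeans(H)) - var(V)  # j(theta) = -E[dV/dtheta^T | y] - Var[V | y]
  list(score = u_score, info = j_info)
}

si <- score_info_mc(y, beta = 0.5, tau2 = 1, U = U)
## One Newton-Raphson update (in practice one would reparametrize, e.g. log tau2,
## to keep the variance positive).
theta_new <- c(0.5, 1) + solve(si$info, si$score)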
EM-algorithm

Given current estimate θ_l:

1. (E) compute Q(θ_l, θ) = E_{θ_l}[ log f(y, U | θ) | Y = y ]
2. (M) θ_{l+1} = argmax_θ Q(θ_l, θ).

For LNMMs the E-step can be computed explicitly, but this seems pointless as the likelihood is available in closed form.

For GLMMs the (E) step needs numerical integration or Monte Carlo (a small Monte Carlo sketch is given after the exercises below).

Convergence of the EM-algorithm can be quite slow. Maximization of the likelihood using Newton-Raphson seems a better alternative.


Exercises

1. Why a heavy-tailed importance sampling density? (look at the variance of the Monte Carlo estimate)
2. R exercises on exercise sheet exercises_imp.pdf.
3. Show that the rejection sampler works.
4. Simulate a binomial distribution (n = 10, p = 0.2) using simulations of a Poisson distribution (mean 2) and rejection sampling. Can you simulate a Poisson using simulations of a binomial?
5. Any remaining exercises from previous lectures.
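Tying the EM slide above to the Monte Carlo machinery, here is a minimal sketch of one Monte Carlo E- and M-step for the toy Poisson GLMM. It reuses the conditional sampler sim_u_given_y sketched earlier; the helper names and the choice to optimize over (β, log τ²) are illustrative assumptions.

## One Monte Carlo EM iteration: approximate Q(theta_l, theta) by an average over
## conditional draws U_m ~ f(u | y; theta_l), then maximize over theta.
mc_em_step <- function(y, theta_l, M = 1000) {
  U <- sim_u_given_y(y, beta = theta_l[1], tau2 = theta_l[2], M = M)   # (E) draws
  Q <- function(par) {
    beta <- par[1]; tau2 <- exp(par[2])   # optimize over log(tau2) to keep it positive
    mean(sapply(U, function(u)
      sum(dpois(y, exp(beta + u), log = TRUE)) + dnorm(u, 0, sqrt(tau2), log = TRUE)))
  }
  opt <- optim(c(theta_l[1], log(theta_l[2])), Q, control = list(fnscale = -1))  # (M) step
  c(beta = opt$par[1], tau2 = exp(opt$par[2]))
}

theta_1 <- mc_em_step(y, theta_l = c(0.5, 1))   # one EM update from (beta, tau2) = (0.5, 1)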