Lecture 12. Quasi-likelihood
Nan Ye
School of Mathematics and Physics, University of Queensland
1 / 28
Looking Back: Course Overview
Generalized linear models (GLMs)
• Building blocks: systematic and random components, exponential families
• Prediction and parameter estimation
• Specific models for different types of data: continuous response, binary response, count response, ...
• Modelling process and model diagnostics
Extensions of GLMs
• Quasi-likelihood models
• Nonparametric models
• Mixed models and marginal models
Time series
2 / 28
Extending GLMs
[Diagram: GLMs extended in three directions: (a) quasi-likelihood models, (b) nonparametric models, (c) mixed/marginal models.]
(a) Relax assumption on the random component.
(b) Relax assumption on the systematic component.
(c) Relax assumption on the data (independence).
3 / 28
Recall
Gamma regression
• When Y is a non-negative continuous random variable, we can choose the systematic and random components as follows:
  (systematic) E(Y | x) = exp(β⊤x),
  (random) Y | x is Gamma distributed.
• We further assume the variance of the Gamma distribution is μ²/ν (with ν treated as known), thus
  Y | x ∼ Γ(μ = exp(β⊤x), var = μ²/ν),
  where Γ(μ = a, var = b) denotes a Gamma distribution with mean a and variance b.
We have seen how to estimate β for Gamma regression. How do we estimate the dispersion parameter φ = 1/ν?
4 / 28
Poisson regression
• Poisson regression requires the variance to equal the mean, but this is seldom the case in real data.
• Overdispersion: the variance in the data is larger than expected under the model.
• Underdispersion: the variance in the data is smaller than expected under the model.
• For count data, we used quasi-Poisson regression to allow both overdispersion and underdispersion.
How is the quasi-Poisson model defined? How are the parameters estimated?
5 / 28
This Lecture
• Estimation of dispersion parameter
• Quasi-likelihood: derivation and parameter estimation
6 / 28
Estimation of Dispersion Parameter
Recall: Fisher scoring for Gamma regression
• Consider the Gamma regression model Y | x ∼ Γ(μ = exp(β⊤x), var = μ²/ν).
• Let μ_i = exp(x_i⊤β); then the gradient and the Fisher information are
  \[ \nabla \ell(\beta) = \sum_i \frac{\nu (y_i - \mu_i)}{\mu_i} x_i , \qquad I(\beta) = \sum_i \nu \, x_i x_i^\top . \]
• Fisher scoring updates β to β′ = β + I(β)⁻¹ ∇ℓ(β).
The update of β does not depend on the dispersion parameter φ = 1/ν!
7 / 28
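As an illustration (not from the original slides), here is a minimal R sketch of this update on simulated data; the data, starting values and iteration count are all made up. The point is that the shape parameter ν cancels between the score and the information, so the step can be computed without it.

# Sketch: Fisher scoring for Gamma regression with a log link on simulated
# data, illustrating that the shape parameter nu drops out of the update.
set.seed(1)
n <- 200
X <- cbind(1, runif(n))                               # design matrix with an intercept
b <- c(0.5, 1.2)                                      # "true" coefficients (made up)
y <- rgamma(n, shape = 5, rate = 5 / exp(X %*% b))    # Gamma with mean exp(X b)

beta <- c(0, 0)                                       # starting values
for (it in 1:25) {
  mu    <- as.vector(exp(X %*% beta))
  score <- t(X) %*% ((y - mu) / mu)                   # score divided by nu
  info  <- t(X) %*% X                                 # information divided by nu
  beta  <- beta + solve(info, score)                  # nu cancels in the Fisher scoring step
}
drop(beta)
coef(glm(y ~ X[, 2], family = Gamma(link = "log")))   # should agree closely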
Moment estimator for the dispersion parameter
• We first estimate β with Fisher scoring.
• Recall: if a GLM with var(Y) = φV(μ) is correct, then
  \[ \frac{X^2}{\phi} = \sum_i \frac{(y_i - \hat{\mu}_i)^2}{\phi V(\hat{\mu}_i)} \sim \chi^2_{n-p} , \]
  where X² is the generalized Pearson statistic, n is the number of examples, and p is the number of parameters in β.
• That is, we have E(X²/φ) = n − p.
• This gives us the moment estimator
  \[ \hat{\phi} = \frac{X^2}{n-p} = \frac{1}{n-p} \sum_i \frac{(y_i - \hat{\mu}_i)^2}{V(\hat{\mu}_i)} . \]
The formula can be used for any GLM with unknown φ!
8 / 28
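This estimator is easy to compute for any fitted glm object in R. The helper below is a sketch (the name phi.moment is mine, not from the slides); it uses the variance function stored in the fitted family.

# Sketch: generic moment estimator of the dispersion for a fitted glm.
# Works for any family, since the variance function V is stored in the fit.
phi.moment <- function(fit) {
  y  <- fit$y                                  # observed responses
  mu <- fitted(fit)                            # fitted means
  V  <- fit$family$variance                    # variance function V(mu)
  sum((y - mu)^2 / V(mu)) / df.residual(fit)   # X^2 / (n - p)
}

The examples on the next slides compute the same quantity by hand for specific variance functions.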
Example
For Gamma regression, var(Y) = φμ², so V(μ) = μ².
> fit.gam.inv = glm(time ~ lot * log(conc), data=clot, family=Gamma)
(Dispersion parameter for Gamma family taken to be 0.002129707)
> mu = predict(fit.gam.inv, type='response')
> sum((fit.gam.inv$y - mu)**2 / mu**2) / (length(mu) - length(coef(fit.gam.inv)))
[1] 0.002129692
Our estimate is consistent with the summary function.
9 / 28
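For reference, summary.glm stores this Pearson-based estimate, so it can also be read off directly rather than computed by hand:

> summary(fit.gam.inv)$dispersion   # matches the dispersion reported above (up to rounding)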
Quasi-Likelihood
Recall: Fisher scoring for GLM
• Let μ_i = E(Y_i | x_i, β) = g⁻¹(x_i⊤β), where g is the link function, and let V_i = var(Y_i | x_i, β).
• The gradient, or score function, is
  \[ \nabla \ell(\beta) = \sum_i \frac{y_i - \mu_i}{g'(\mu_i) V_i} x_i . \]
• The Fisher information is
  \[ I(\beta) = \sum_i \frac{1}{g'(\mu_i)^2 V_i} x_i x_i^\top . \]
• Fisher scoring updates β to β′ = β + I(β)⁻¹ ∇ℓ(β).
10 / 28
• Fisher scoring for GLM can thus be written as
  \[ \beta' = \beta + \left( \sum_i \frac{1}{g'(\mu_i)^2 V_i} x_i x_i^\top \right)^{-1} \left( \sum_i \frac{y_i - \mu_i}{g'(\mu_i) V_i} x_i \right) . \]
• We just need to know the link function g and the variances V_i.
• In particular, if we know V_i = φV(μ_i), then the update does not depend on φ, since φ cancels between the two factors.
• Thus we can determine β even if φ is unknown.
11 / 28
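This update can be packaged as a short generic routine that takes only a link and a variance function. The sketch below uses base R's make.link(); the function name fisher.scoring and its default settings are illustrative, not from the slides.

# Sketch: generic Fisher scoring, needing only a link g and a variance function V.
fisher.scoring <- function(X, y, V, link = "log", iters = 25) {
  lk   <- make.link(link)                   # provides linkinv and mu.eta = d mu / d eta = 1 / g'(mu)
  beta <- rep(0, ncol(X))
  for (it in seq_len(iters)) {
    eta   <- as.vector(X %*% beta)
    mu    <- lk$linkinv(eta)
    w     <- lk$mu.eta(eta)^2 / V(mu)       # 1 / (g'(mu)^2 V(mu))
    score <- t(X) %*% ((y - mu) * lk$mu.eta(eta) / V(mu))
    info  <- t(X) %*% (w * X)
    beta  <- beta + solve(info, score)
  }
  drop(beta)
}

# e.g. a Poisson-type variance, with the dispersion phi not needed at all:
# fisher.scoring(X, y, V = function(mu) mu, link = "log")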
Quasi-model via Fisher scoring
• A GLM has the following structure:
  (systematic) μ = E(Y | x) = h(β⊤x),
  (random) Y | x follows an exponential family distribution.
• A quasi-model relaxes the assumption on the random component:
  (systematic) μ = E(Y | x) = h(β⊤x),
  (random) var(Y | x) = φV(μ),
  where φ is a dispersion parameter, V(μ) is a variance function, and β is determined using Fisher scoring!
12 / 28
Hi, I’m Quasimodo. 13 / 28
Quasi-model via quasi-likelihood
• A quasi-model relaxes the assumption on the random component:
  (systematic) μ = E(Y | x) = h(β⊤x),
  (random) var(Y | x) = φV(μ),
  where φ is a dispersion parameter, V(μ) is a variance function, and β is determined by maximizing the quasi-likelihood!
• The quasi-likelihood is a surrogate for the log-likelihood of the mean parameter μ given an observation y, when we only know var(Y | x) = φV(μ).
14 / 28
Construction of quasi-likelihood
• Recall: the score function u(μ) = ∂ℓ(μ; y)/∂μ satisfies
  \[ E(u) = 0 , \qquad \mathrm{var}(u) = -E(u') . \]
• Define S(μ) = (Y − μ)/(φV(μ)); then S(μ) behaves like a score function:
  \[ E(S) = 0 , \qquad \mathrm{var}(S) = -E(S') = \frac{1}{\phi V(\mu)} . \]
• S(μ) is thus called a quasi-score function.
15 / 28
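A quick check of these properties, using only E(Y) = μ and var(Y) = φV(μ):

\begin{align*}
E(S) &= \frac{E(Y) - \mu}{\phi V(\mu)} = 0, \\
\operatorname{var}(S) &= \frac{\operatorname{var}(Y)}{\phi^2 V(\mu)^2}
                       = \frac{\phi V(\mu)}{\phi^2 V(\mu)^2}
                       = \frac{1}{\phi V(\mu)}, \\
-E(S') &= -E\left[ \frac{-\phi V(\mu) - (Y - \mu)\,\phi V'(\mu)}{\phi^2 V(\mu)^2} \right]
        = \frac{\phi V(\mu)}{\phi^2 V(\mu)^2}
        = \frac{1}{\phi V(\mu)} .
\end{align*}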
• The usual log-likelihood is an integral of the score function.
• By analogy, the quasi-likelihood (quasi log-likelihood) is
  \[ Q(\mu; y) = \int_y^{\mu} \frac{y - t}{\phi V(t)} \, dt . \]
16 / 28
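For example, with V(μ) = μ (the Poisson variance function) and φ = 1,

\[
Q(\mu; y) = \int_y^{\mu} \frac{y - t}{t} \, dt
          = \Big[\, y \ln t - t \,\Big]_{t=y}^{t=\mu}
          = y \ln \mu - \mu - (y \ln y - y) ,
\]

which, apart from the term y ln y − y not involving μ, is the Poisson log-likelihood y ln μ − μ − ln y!.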
Quasi-likelihood for some variance functions
V(µ)         | Q(µ; y)                                        | distribution      | constraint
1            | −(y − µ)²/2                                    | normal            | -
µ            | y ln µ − µ                                     | Poisson           | µ > 0, y ≥ 0
µ²           | −y/µ − ln µ                                    | Gamma             | µ > 0, y ≥ 0
µ³           | −y/(2µ²) + 1/µ                                 | inverse Gaussian  | µ > 0, y ≥ 0
µ^m          | µ^(1−m) ( y/(1 − m) − µ/(2 − m) )              | -                 | µ > 0, m ≠ 0, 1, 2
µ(1 − µ)     | y ln(µ/(1 − µ)) + ln(1 − µ)                    | binomial          | µ ∈ (0, 1), 0 ≤ y ≤ 1
µ²(1 − µ)²   | (2y − 1) ln(µ/(1 − µ)) − y/µ − (1 − y)/(1 − µ) | -                 | µ ∈ (0, 1), 0 ≤ y ≤ 1
µ + µ²/k     | y ln(µ/(k + µ)) + k ln(k/(k + µ))              | negative binomial | µ > 0, y ≥ 0
17 / 28
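As a check of one entry, the Gamma-type row V(µ) = µ² follows directly from the integral definition:

\[
Q(\mu; y) = \int_y^{\mu} \frac{y - t}{t^2} \, dt
          = \Big[\, -\frac{y}{t} - \ln t \,\Big]_{t=y}^{t=\mu}
          = -\frac{y}{\mu} - \ln \mu + 1 + \ln y ,
\]

and dropping the terms 1 + ln y, which do not involve µ, gives the table entry −y/µ − ln µ.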
Parameter estimation for quasi-model
• In a quasi-model, μ is a function of β, so the quasi-likelihood is also a function of β:
  \[ Q(\beta) = \sum_i Q(\mu_i(\beta); y_i) . \]
• The Fisher scoring update for Q is given by
  \[ \beta' = \beta + \left( -E \nabla^2 Q(\beta) \right)^{-1} \nabla Q(\beta)
            = \beta + \left( \sum_i \frac{1}{g'(\mu_i)^2 \phi V(\mu_i)} x_i x_i^\top \right)^{-1} \left( \sum_i \frac{y_i - \mu_i}{g'(\mu_i)\, \phi V(\mu_i)} x_i \right) . \]
  The update is independent of φ.
• φ is estimated as φ̂ = X²/(n − p) after β is estimated.
18 / 28
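In R, such a quasi-model can be fitted directly with glm and the quasi family, which is specified by a link and a variance function only. The sketch below uses a log link with V(µ) = µ²; the data frame mydata and the variables y and x are placeholders.

# Sketch: a quasi-model with log link and V(mu) = mu^2 (the Gamma variance
# function), without assuming a Gamma distribution.  'mydata', 'y' and 'x'
# are placeholder names.
fit.q <- glm(y ~ x, data = mydata,
             family = quasi(link = "log", variance = "mu^2"))
summary(fit.q)$dispersion   # the moment estimate X^2 / (n - p) of phi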
Recall: quasi-Poisson regression
• The quasi-Poisson regression model introduces an additional dispersion parameter φ.
• It replaces the original model variance V_i at x_i by φV_i.
• φ > 1 is used to accommodate overdispersion relative to the original model.
• φ < 1 is used to accommodate underdispersion relative to the original model.
• φ is usually estimated separately, after estimating β.
19 / 28
Estimating φ in quasi-Poisson regression
> fit.qpo <- glm(Days ~ Sex + Age + Eth + Lrn, data=quine, family=quasipoisson)
(Dispersion parameter for quasipoisson family taken to be 13.16691)
> mu = predict(fit.qpo, type='response')
> sum((fit.qpo$y - mu)**2 / mu) / (length(mu) - length(coef(fit.qpo)))
[1] 13.16684
20 / 28
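The coefficient estimates are the same as in an ordinary Poisson fit; only the standard errors change, inflated by a factor of √φ̂. A quick check (a sketch, assuming the quine data from the MASS package and fit.qpo from the code above):

# Sketch: quasi-Poisson keeps the Poisson coefficients but scales the
# standard errors by sqrt(phi-hat).
library(MASS)
fit.po <- glm(Days ~ Sex + Age + Eth + Lrn, data = quine, family = poisson)
all.equal(coef(fit.po), coef(fit.qpo))                  # identical coefficients
se.po  <- summary(fit.po)$coefficients[, "Std. Error"]
se.qpo <- summary(fit.qpo)$coefficients[, "Std. Error"]
se.qpo / se.po                                          # all approximately sqrt(13.167)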
Example
Data
         Variety
Site       1      2      3      4      5      6      7      8      9     10   Mean
 1      0.05   0.00   0.00   0.10   0.25   0.05   0.50   1.30   1.50   1.50   0.52
 2      0.00   0.05   0.05   0.30   0.75   0.30   3.00   7.50   1.00  12.70   2.56
 3      1.25   1.25   2.50  16.60   2.50   2.50   0.00  20.00  37.50  26.25  11.03
 4      2.50   0.50   0.01   3.00   2.50   0.01  25.00  55.00   5.00  40.00  13.35
 5      5.50   1.00   6.00   1.10   2.50   8.00  16.50  29.50  20.00  43.50  13.36
 6      1.00   5.00   5.00   5.00   5.00   5.00  10.00   5.00  50.00  75.00  16.60
 7      5.00   0.10   5.00   5.00  50.00  10.00  50.00  25.00  50.00  75.00  27.51
 8      5.00  10.00   5.00   5.00  25.00  75.00  50.00  75.00  75.00  75.00  40.00
 9     17.50  25.00  42.50  50.00  37.50  95.00  62.50  95.00  95.00  95.00  61.50
Mean    4.20   4.77   7.34   9.57  14.00  21.76  24.17  34.81  37.22  49.33  20.72
• Incidence of leaf blotch on 10 varieties of barley grown at 9 sites.
• The response is the percentage leaf area affected.
21 / 28
Heatmap for the data
[Heatmap of the leaf-blotch proportion (colour scale 0 to 75) by site (1–9, vertical axis) and variety (1–10, horizontal axis).]
22 / 28
> fit.qbin = glm(proportions ~ as.factor(site) + as.factor(variety), family = quasibinomial)
• A binomial model satisfies var(Y) = μ(1 − μ).
• A quasi-binomial model assumes that var(Y) = φμ(1 − μ), where φ is the dispersion parameter.
• The probability of having leaf blotch for variety j at site i has the form
  \[ p_{ij} = \frac{\exp(b + \alpha_i + \beta_j)}{1 + \exp(b + \alpha_i + \beta_j)} . \]
23 / 28
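The dispersion for this fit can be read from the summary or recomputed from the moment formula with V(µ) = µ(1 − µ). A sketch, assuming fit.qbin from the call above, with the proportions on the [0, 1] scale:

# Sketch: dispersion estimate for the quasi-binomial fit above.
phi.hat <- summary(fit.qbin)$dispersion     # Pearson-based moment estimate
# Equivalently, by hand with V(mu) = mu * (1 - mu):
mu <- fitted(fit.qbin)
sum((fit.qbin$y - mu)^2 / (mu * (1 - mu))) / df.residual(fit.qbin)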