  1. Lecture 12. Quasi-likelihood. Nan Ye, School of Mathematics and Physics, University of Queensland

  2. Looking Back: Course Overview
     Generalized linear models (GLMs)
     • Building blocks: systematic and random components, exponential families
     • Prediction and parameter estimation
     • Specific models for different types of data: continuous response, binary response, count response...
     • Modelling process and model diagnostics
     Extensions of GLMs
     • Quasi-likelihood models
     • Nonparametric models
     • Mixed models and marginal models
     Time series

  3. Extending GLMs
     (a) Quasi-likelihood models: relax the assumption on the random component.
     (b) Nonparametric models: relax the assumption on the systematic component.
     (c) Mixed/marginal models: relax the assumption on the data (independence).

  4. Recall Gamma regression
     • When Y is a non-negative continuous random variable, we can choose the systematic and random components as follows.
       (systematic) E(Y | x) = exp(β⊤x)
       (random)     Y | x is Gamma distributed.
     • We further assume the variance of the Gamma distribution is µ²/ν (ν treated as known), thus
       Y | x ∼ Γ(µ = exp(β⊤x), var = µ²/ν),
       where Γ(µ = a, var = b) denotes a Gamma distribution with mean a and variance b.
     We have seen how to estimate β for Gamma regression. How do we estimate the dispersion parameter φ = 1/ν?

  5. Poisson regression
     • Poisson regression requires the data variance to equal the mean, but this is seldom the case in real data.
     • Overdispersion: variance in the data is larger than expected based on the model.
     • Underdispersion: variance in the data is smaller than expected based on the model.
     • For count data, we used quasi-Poisson regression to allow both overdispersion and underdispersion.
     How is the quasi-Poisson model defined? How are the parameters estimated?

  6. This Lecture
     • Estimation of the dispersion parameter
     • Quasi-likelihood: derivation and parameter estimation

  7. Estimation of Dispersion Parameter
     Recall: Fisher scoring for Gamma regression
     • Consider the Gamma regression model Y | x ∼ Γ(µ = exp(β⊤x), var = µ²/ν).
     • Let µᵢ = exp(xᵢ⊤β); then the gradient and Fisher information are
       ∇ℓ(β) = ∑ᵢ ν (yᵢ − µᵢ)/µᵢ · xᵢ,   I(β) = ∑ᵢ ν xᵢ xᵢ⊤.
     • Fisher scoring updates β to β′ = β + I(β)⁻¹ ∇ℓ(β).
     The update of β does not depend on the dispersion parameter φ = 1/ν!

  8. Moment estimator for the dispersion parameter
     • We first estimate β with Fisher scoring.
     • Recall: if a GLM with var(Y) = φ V(µ) is correct, then
       X²/φ = ∑ᵢ (yᵢ − µ̂ᵢ)² / (φ V(µ̂ᵢ)) ∼ χ²(n − p),
       where X² is the generalized Pearson statistic, n is the number of examples, and p is the number of parameters in β.
     • That is, we have E(X²/φ) = n − p.
     • This gives us the moment estimator
       φ̂ = X²/(n − p) = (1/(n − p)) ∑ᵢ (yᵢ − µ̂ᵢ)² / V(µ̂ᵢ).
     The formula can be used for any GLM with unknown φ!
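The moment estimator is a one-liner in code. A minimal Python sketch (the data, the function name, and the choice of V are illustrative, not from the lecture):

```python
import numpy as np

# Hypothetical observations y and fitted means mu_hat from some GLM fit.
y = np.array([2.0, 0.0, 5.0, 3.0, 1.0, 4.0])
mu_hat = np.array([1.8, 0.5, 4.2, 2.9, 1.2, 3.4])
p = 2  # assumed number of fitted coefficients

def dispersion_moment(y, mu_hat, V, p):
    """Moment estimator phi_hat = X^2 / (n - p), where X^2 is the
    generalized Pearson statistic sum_i (y_i - mu_i)^2 / V(mu_i)."""
    X2 = np.sum((y - mu_hat) ** 2 / V(mu_hat))
    return X2 / (len(y) - p)

# Quasi-Poisson variance function V(mu) = mu.
phi_hat = dispersion_moment(y, mu_hat, lambda m: m, p)
```

Only the variance function V changes between models: V(µ) = µ for quasi-Poisson, V(µ) = µ² for Gamma, and so on.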

  9. Example
     For Gamma regression, var(Y) = φµ², so V(µ) = µ².
     > fit.gam.inv = glm(time ~ lot * log(conc), data=clot, family=Gamma)
     (Dispersion parameter for Gamma family taken to be 0.002129707)
     > mu = predict(fit.gam.inv, type='response')
     > sum((fit.gam.inv$y - mu)**2 / mu**2) / (length(mu) - length(coef(fit.gam.inv)))
     [1] 0.002129692
     Our estimate is consistent with the summary function.

  10. Quasi-Likelihood
      Recall: Fisher scoring for GLM
      • Let µᵢ = E(Yᵢ | xᵢ, β) = g⁻¹(xᵢ⊤β), where g is the link function, and Vᵢ = var(Yᵢ | xᵢ, β).
      • The gradient, or score function, is
        ∇ℓ(β) = ∑ᵢ (yᵢ − µᵢ) / (g′(µᵢ) Vᵢ) · xᵢ.
      • The Fisher information is
        I(β) = ∑ᵢ 1 / (g′(µᵢ)² Vᵢ) · xᵢ xᵢ⊤.
      • Fisher scoring updates β to β′ = β + I(β)⁻¹ ∇ℓ(β).

  11. • Fisher scoring for GLM can thus be written as
        β′ = β + (∑ᵢ 1/(g′(µᵢ)² Vᵢ) · xᵢ xᵢ⊤)⁻¹ (∑ᵢ (yᵢ − µᵢ)/(g′(µᵢ) Vᵢ) · xᵢ).
      • We just need to know the link function g and the variances Vᵢ.
      • In particular, if we know Vᵢ = φ V(µᵢ), then the update does not depend on φ.
      • Thus we can determine β even if φ is unknown.
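The update above can be sketched in a few lines of Python for the quasi-Poisson case (log link, V(µ) = µ); the data and function name are illustrative. Note that φ cancels between the two sums, so it is simply omitted:

```python
import numpy as np

def quasi_fisher_scoring(X, y, n_iter=25):
    """Fisher scoring for a quasi-Poisson model (log link, V(mu) = mu).
    Only the link and the variance function enter the update; the
    dispersion phi cancels, so beta is estimable without it."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)          # mu_i = g^{-1}(x_i' beta), g = log
        # With g'(mu) = 1/mu and V(mu) = mu:
        # weights 1/(g'(mu)^2 V(mu)) = mu, score terms (y - mu)/(g'(mu) V(mu)) = y - mu
        score = X.T @ (y - mu)
        info = X.T @ (mu[:, None] * X)
        beta = beta + np.linalg.solve(info, score)
    return beta

# Illustrative data (made up, not from the lecture).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = rng.poisson(np.exp(0.5 + 0.3 * X[:, 1])).astype(float)
beta_hat = quasi_fisher_scoring(X, y)
```

At convergence the quasi-score ∑ᵢ (yᵢ − µᵢ) xᵢ is (numerically) zero, which is the estimating equation the update solves.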

  12. Quasi-model via Fisher scoring
      • A GLM has the following structure
        (systematic) µ = E(Y | x) = h(β⊤x),
        (random)     Y | x follows an exponential family distribution.
      • A quasi-model relaxes the assumption on the random component
        (systematic) µ = E(Y | x) = h(β⊤x),
        (random)     var(Y | x) = φ V(µ),
        where φ is a dispersion parameter, V(µ) is a variance function, and β is determined using Fisher scoring!

  13. Hi, I’m Quasimodo.

  14. Quasi-model via quasi-likelihood
      • A quasi-model relaxes the assumption on the random component
        (systematic) µ = E(Y | x) = h(β⊤x),
        (random)     var(Y | x) = φ V(µ),
        where φ is a dispersion parameter, V(µ) is a variance function, and β is determined by maximizing quasi-likelihood!
      • Quasi-likelihood is a surrogate for the log-likelihood of the mean parameter µ given an observation y, when we only know var(Y | x) = φ V(µ).

  15. Construction of quasi-likelihood
      • Recall: a score function ℓ(µ) satisfies
        E(ℓ) = 0,   var(ℓ) = −E(ℓ′).
      • Define S(µ) = (Y − µ) / (φ V(µ)); then S(µ) is similar to a score function:
        E(S) = 0,   var(S) = −E(S′) = 1/(φ V(µ)).
      • S(µ) is thus called a quasi-score function.
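These two properties follow directly from E(Y | x) = µ and var(Y | x) = φ V(µ); a short verification:

```latex
E(S) = \frac{E(Y) - \mu}{\phi V(\mu)} = 0,
\qquad
\operatorname{var}(S) = \frac{\operatorname{var}(Y)}{\phi^2 V(\mu)^2}
  = \frac{\phi V(\mu)}{\phi^2 V(\mu)^2} = \frac{1}{\phi V(\mu)},
\qquad
-E(S') = -E\!\left[\frac{-1}{\phi V(\mu)} - \frac{(Y-\mu)\,V'(\mu)}{\phi V(\mu)^2}\right]
  = \frac{1}{\phi V(\mu)},
```

where the second term in E(S′) vanishes because E(Y − µ) = 0.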

  16. • The usual log-likelihood is an integral of the score function.
      • By analogy, the quasi-likelihood (quasi log-likelihood) is
        Q(µ; y) = ∫_y^µ (y − t) / (φ V(t)) dt.
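As a check of this definition, taking the Poisson-type variance function V(t) = t, the integral evaluates to

```latex
Q(\mu; y) = \int_y^{\mu} \frac{y - t}{\phi\, t}\, dt
          = \frac{1}{\phi}\Big[\, y \ln t - t \,\Big]_{t=y}^{t=\mu}
          = \frac{1}{\phi}\big( y \ln \mu - \mu \big) - \frac{1}{\phi}\big( y \ln y - y \big).
```

The trailing term does not involve µ, so it is irrelevant when maximizing over β; what remains is the familiar Poisson form y ln µ − µ.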

  17. Quasi-likelihood for some variance functions (entries shown up to the factor 1/φ)
      V(µ)          Q(µ; y)                                          distribution        constraint
      1             −(y − µ)²/2                                      normal              -
      µ             y ln µ − µ                                       Poisson             µ > 0, y ≥ 0
      µ²            −y/µ − ln µ                                      Gamma               µ > 0, y ≥ 0
      µ³            −y/(2µ²) + 1/µ                                   inverse Gaussian    µ > 0, y ≥ 0
      µ^m           µ^(1−m) (y/(1 − m) − µ/(2 − m))                  -                   µ > 0, m ≠ 0, 1, 2
      µ(1 − µ)      y ln(µ/(1 − µ)) + ln(1 − µ)                      binomial            µ ∈ (0, 1), 0 ≤ y ≤ 1
      µ²(1 − µ)²    (2y − 1) ln(µ/(1 − µ)) − y/µ − (1 − y)/(1 − µ)   -                   µ ∈ (0, 1), 0 ≤ y ≤ 1
      µ + µ²/k      y ln(µ/(k + µ)) + k ln(k/(k + µ))                negative binomial   µ > 0, y ≥ 0
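Each table row can be checked by differentiating Q and comparing with the quasi-score (y − µ)/V(µ). A small Python check for the Poisson row, using a numerical derivative (the values of y and µ are arbitrary; φ = 1 since it only rescales Q):

```python
import numpy as np

# For V(mu) = mu, the table gives Q(mu; y) = y*ln(mu) - mu.
# Its derivative in mu should equal the quasi-score (y - mu)/V(mu).
def Q(mu, y):
    return y * np.log(mu) - mu

y, mu, h = 3.0, 2.0, 1e-6
dQ = (Q(mu + h, y) - Q(mu - h, y)) / (2 * h)  # central difference
quasi_score = (y - mu) / mu                    # (y - mu)/V(mu)
```

The same check works for every row of the table, swapping in the corresponding Q and V.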

  18. Parameter estimation for quasi-model
      • In a quasi-model, µ is a function of β, and the quasi-likelihood is also a function of β:
        Q(β) = ∑ᵢ Q(µᵢ(β); yᵢ).
      • The Fisher scoring update for Q is given by
        β′ = β + (−E ∇²Q(β))⁻¹ ∇Q(β)
           = β + (∑ᵢ 1/(g′(µᵢ)² φ V(µᵢ)) · xᵢ xᵢ⊤)⁻¹ (∑ᵢ (yᵢ − µᵢ)/(g′(µᵢ) φ V(µᵢ)) · xᵢ).
        The update is independent of φ.
      • φ is estimated as φ̂ = X²/(n − p) after β is estimated.

  19. Recall: quasi-Poisson regression
      • The quasi-Poisson regression model introduces an additional dispersion parameter φ.
      • It replaces the original model variance Vᵢ on xᵢ by φVᵢ.
      • φ > 1 is used to accommodate overdispersion relative to the original model.
      • φ < 1 is used to accommodate underdispersion relative to the original model.
      • φ is usually estimated separately after estimating β.

  20. Estimating φ in quasi-Poisson regression
      > fit.qpo <- glm(Days ~ Sex + Age + Eth + Lrn, data=quine, family=quasipoisson)
      (Dispersion parameter for quasipoisson family taken to be 13.16691)
      > mu = predict(fit.qpo, type='response')
      > sum((fit.qpo$y - mu)**2 / mu) / (length(mu) - length(coef(fit.qpo)))
      [1] 13.16684

  21. Example Data
      • Incidence of leaf blotch on 10 varieties of barley grown at 9 sites.
      • The response is the percentage leaf area affected.
                                          Variety
      Site     1      2      3      4      5      6      7      8      9     10   Mean
      1     0.05   0.00   0.00   0.10   0.25   0.05   0.50   1.30   1.50   1.50   0.52
      2     0.00   0.05   0.05   0.30   0.75   0.30   3.00   7.50   1.00  12.70   2.56
      3     1.25   1.25   2.50  16.60   2.50   2.50   0.00  20.00  37.50  26.25  11.03
      4     2.50   0.50   0.01   3.00   2.50   0.01  25.00  55.00   5.00  40.00  13.35
      5     5.50   1.00   6.00   1.10   2.50   8.00  16.50  29.50  20.00  43.50  13.36
      6     1.00   5.00   5.00   5.00   5.00   5.00  10.00   5.00  50.00  75.00  16.60
      7     5.00   0.10   5.00   5.00  50.00  10.00  50.00  25.00  50.00  75.00  27.51
      8     5.00  10.00   5.00   5.00  25.00  75.00  50.00  75.00  75.00  75.00  40.00
      9    17.50  25.00  42.50  50.00  37.50  95.00  62.50  95.00  95.00  95.00  61.50
      Mean  4.20   4.77   7.34   9.57  14.00  21.76  24.17  34.81  37.22  49.33  20.72

  22. Heatmap for the data
      [Heatmap of leaf blotch proportion (0 to 75%) by site (1-9, vertical) and variety (1-10, horizontal).]

  23. > fit.qbin = glm(proportions ~ as.factor(site) + as.factor(variety), family = quasibinomial)
      • A binomial model satisfies var(Y) = µ(1 − µ).
      • A quasi-binomial model assumes that var(Y) = φµ(1 − µ), where φ is the dispersion parameter.
      • The probability of having leaf blotch for variety j at site i has the form
        p_ij = exp(b + β_i + γ_j) / (1 + exp(b + β_i + γ_j)).
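The additive site-plus-variety structure of p_ij can be sketched in Python; the effect values below are made up for illustration, not the fitted coefficients from the barley data:

```python
import numpy as np

def logistic(eta):
    """Inverse logit: maps a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

# Hypothetical intercept, site effects, and variety effects.
b = -3.0
beta_site = np.array([0.0, 0.5, 1.0])
gamma_variety = np.array([0.0, 0.8, 1.6])

# Broadcasting the outer sum gives one linear predictor per (site, variety),
# so p[i, j] = logistic(b + beta_site[i] + gamma_variety[j]).
eta = b + beta_site[:, None] + gamma_variety[None, :]
p = logistic(eta)
```

The quasi-binomial fit keeps exactly this mean structure; only the assumed variance changes, from µ(1 − µ) to φµ(1 − µ).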
