Lecture 7: GLMs: Score Equations, Residuals
Author: Nick Reich / Transcribed by Bing Miu and Yukun Li
Course: Categorical Data Analysis (BIOSTATS 743)
Made available under the Creative Commons Attribution-ShareAlike 4.0 International License.
Likelihood Equations for GLMs

◮ The GLM log-likelihood function is given as follows:
$$L(\vec{\beta}) = \sum_i \log f(y_i \mid \theta_i, \phi) = \sum_i \left[ \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + C(y_i, \phi) \right] = \sum_i \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + \sum_i C(y_i, \phi)$$
◮ $\phi$ is a dispersion parameter; it is not indexed by $i$ and is assumed to be fixed.
◮ $\theta_i$ contains $\vec{\beta}$, through $\eta_i$.
◮ $C(y_i, \phi)$ comes from the random component.
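To make the exponential-family form concrete, here is a minimal sketch (not from the lecture; the data values are made up) for a Poisson outcome, where $\theta_i = \log \mu_i$, $b(\theta) = e^{\theta}$, $a(\phi) = 1$, and $C(y_i, \phi) = -\log(y_i!)$. The check against R's built-in dpois() confirms the two expressions agree.

    # Exponential-family log-likelihood for a Poisson outcome (illustrative values):
    # theta = log(mu), b(theta) = exp(theta), a(phi) = 1, C(y, phi) = -log(y!)
    y  <- c(2, 0, 5, 3)
    mu <- c(1.5, 0.8, 4.2, 2.9)
    theta   <- log(mu)
    loglik1 <- sum((y * theta - exp(theta)) / 1 - lgamma(y + 1))  # exponential-family form
    loglik2 <- sum(dpois(y, lambda = mu, log = TRUE))             # built-in Poisson density
    all.equal(loglik1, loglik2)                                   # TRUE: the two forms agree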
Score Equations

◮ Take the derivative of the log-likelihood function and set it equal to 0:
$$\frac{\partial L(\vec{\beta})}{\partial \beta_j} = \sum_i \frac{\partial L_i}{\partial \beta_j} = 0, \quad \forall j$$
◮ Since $\frac{\partial L_i}{\partial \theta_i} = \frac{y_i - \mu_i}{a(\phi)}$, $\mu_i = b'(\theta_i)$, $\mathrm{Var}(Y_i) = b''(\theta_i)\,a(\phi)$, and $\eta_i = \sum_j \beta_j x_{ij}$:
$$0 = \sum_i \frac{\partial L_i}{\partial \beta_j} = \sum_i \frac{y_i - \mu_i}{a(\phi)} \cdot \frac{a(\phi)}{\mathrm{Var}(Y_i)} \cdot \frac{\partial \mu_i}{\partial \eta_i}\, x_{ij} = \sum_i \frac{(y_i - \mu_i)\, x_{ij}}{\mathrm{Var}(Y_i)} \frac{\partial \mu_i}{\partial \eta_i}$$
◮ $V(\theta) = b''(\theta)$ is the variance function of the GLM.
◮ $\mu_i = E[Y_i \mid x_i] = g^{-1}(X_i \vec{\beta})$. These equations are typically non-linear in the $\beta$'s and thus require iterative computational solutions.
Example: Score Equation from Binomial GLM (Ch 5.5.1)

$Y_i \sim \mathrm{Binomial}(n_i, \pi_i)$
◮ The joint probability mass function:
$$\prod_{i=1}^N \pi(x_i)^{y_i}\, [1 - \pi(x_i)]^{n_i - y_i}$$
◮ The log-likelihood:
$$L(\vec{\beta}) = \sum_i y_i \left( \sum_j x_{ij} \beta_j \right) - \sum_i n_i \log\left[ 1 + \exp\left( \sum_j \beta_j x_{ij} \right) \right]$$
◮ The score equation:
$$\frac{\partial L(\vec{\beta})}{\partial \beta_j} = \sum_i (y_i - n_i \hat{\pi}_i)\, x_{ij}, \quad \text{note that } \hat{\pi}_i = \frac{e^{X_i \vec{\beta}}}{1 + e^{X_i \vec{\beta}}}$$
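Because these score equations have no closed-form solution, they are solved iteratively. Below is a minimal Fisher-scoring (IRLS) sketch for the binomial example, written with simulated data and hypothetical variable names (X, y, n), and checked against R's glm().

    # Illustrative Fisher-scoring loop for a logistic GLM with n_i trials (not the
    # lecture's code; data are simulated and names are made up).
    set.seed(743)
    n_obs <- 200
    X <- cbind(1, rnorm(n_obs))                     # design matrix with intercept
    n <- rep(10, n_obs)                             # binomial denominators n_i
    beta_true <- c(-0.5, 1.0)
    y <- rbinom(n_obs, size = n, prob = plogis(X %*% beta_true))

    beta <- rep(0, ncol(X))                         # starting value
    for (iter in 1:25) {
      pi_hat <- as.vector(plogis(X %*% beta))              # pi_i = e^(X_i beta) / (1 + e^(X_i beta))
      score  <- t(X) %*% (y - n * pi_hat)                  # sum_i (y_i - n_i pi_i) x_ij
      info   <- t(X) %*% (n * pi_hat * (1 - pi_hat) * X)   # X^T W X with w_i = n_i pi_i (1 - pi_i)
      beta_new <- beta + solve(info, score)                # Fisher-scoring update
      if (max(abs(beta_new - beta)) < 1e-10) { beta <- beta_new; break }
      beta <- beta_new
    }

    fit <- glm(cbind(y, n - y) ~ X[, 2], family = binomial)  # built-in fit for comparison
    cbind(manual = as.vector(beta), glm = coef(fit))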
Asymptotic Covariance of $\hat{\beta}$:

◮ The likelihood function determines the asymptotic covariance of the ML estimate $\hat{\beta}$.
◮ The information matrix $\mathcal{I}$ has $(h, j)$ elements:
$$\mathcal{I}_{hj} = E\left[ -\frac{\partial^2 L(\vec{\beta})}{\partial \beta_h\, \partial \beta_j} \right] = \sum_{i=1}^N \frac{x_{ih} x_{ij}}{\mathrm{Var}(Y_i)} \left( \frac{\partial \mu_i}{\partial \eta_i} \right)^2$$
where $w_i$ denotes
$$w_i = \frac{1}{\mathrm{Var}(Y_i)} \left( \frac{\partial \mu_i}{\partial \eta_i} \right)^2$$
Asymptotic Covariance Matrix of $\hat{\beta}$:

◮ The information matrix $\mathcal{I}$ is equivalent to $\mathcal{I}_{hj} = \sum_{i=1}^N x_{ih} x_{ij} w_i$, i.e., $\mathcal{I} = X^T W X$.
◮ $W$ is a diagonal matrix with $w_i$ as the diagonal elements. In practice, $W$ is evaluated at the MLE $\hat{\beta}$ and depends on the link function.
◮ The square roots of the main diagonal elements of $(X^T W X)^{-1}$ are the estimated standard errors of $\hat{\beta}$.
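As a check on this formula, the sketch below (simulated Bernoulli data with a logit link, so $w_i = \hat{\mu}_i(1 - \hat{\mu}_i)$) forms $(X^T W X)^{-1}$ by hand and compares it with the covariance matrix that glm() reports.

    # Recover (X^T W X)^{-1} by hand for a Bernoulli logistic GLM (illustrative data).
    set.seed(1)
    x <- rnorm(500)
    y <- rbinom(500, 1, plogis(-0.3 + 0.8 * x))
    fit <- glm(y ~ x, family = binomial)

    X      <- model.matrix(fit)
    mu_hat <- fitted(fit)
    # Logit link, Bernoulli outcome: d(mu)/d(eta) = mu(1 - mu) and Var(Y_i) = mu(1 - mu),
    # so w_i = (d mu_i / d eta_i)^2 / Var(Y_i) = mu_i (1 - mu_i).
    w <- mu_hat * (1 - mu_hat)
    cov_manual <- solve(t(X) %*% (w * X))     # (X^T W X)^{-1}, scaling rows of X by w_i

    all.equal(unname(cov_manual), unname(vcov(fit)))  # matches glm's covariance (up to tolerance)
    sqrt(diag(cov_manual))                            # estimated standard errors of beta-hat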
Analogous to SLR

◮ $\widehat{\mathrm{Var}}(\hat{\beta}_i)$ — SLR: $\sigma^2 / \sum_{i=1}^N (x_i - \bar{x})^2$; GLM: the $i$th main diagonal element of $(X^T W X)^{-1}$
◮ $\widehat{\mathrm{Cov}}(\hat{\beta})$ — SLR: $\sigma^2 (X^T X)^{-1}$; GLM: $(X^T W X)^{-1}$
Residuals and Diagnostics

◮ Deviance Tests
  ◮ A measure of goodness of fit in GLMs based on the likelihood
  ◮ Most useful as a comparison between models (used as a screening method to identify important covariates)
  ◮ Use the saturated model as a baseline for comparison with other model fits
  ◮ For a Poisson or binomial GLM: $D = -2[L(\hat{\mu} \mid y) - L(y \mid y)]$
◮ Examples of deviance $D(y, \hat{\mu})$ by model:
  Gaussian: $\sum_i (y_i - \hat{\mu}_i)^2$
  Poisson: $2 \sum_i \left[ y_i \ln\left(\frac{y_i}{\hat{\mu}_i}\right) - (y_i - \hat{\mu}_i) \right]$
  Binomial: $2 \sum_i \left[ y_i \ln\left(\frac{y_i}{\hat{\mu}_i}\right) + (n_i - y_i) \ln\left(\frac{n_i - y_i}{n_i - \hat{\mu}_i}\right) \right]$
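For example, the Poisson row of this list can be verified directly against R's deviance(); the data below are simulated purely for illustration.

    # Check the Poisson deviance formula against deviance() on an illustrative fit.
    set.seed(7)
    x   <- rnorm(100)
    y   <- rpois(100, lambda = exp(0.5 + 0.3 * x))
    fit <- glm(y ~ x, family = poisson)

    mu_hat <- fitted(fit)
    # D(y, mu-hat) = 2 * sum[ y_i log(y_i / mu-hat_i) - (y_i - mu-hat_i) ],
    # taking y_i log(y_i / mu-hat_i) = 0 when y_i = 0
    dev_manual <- 2 * sum(ifelse(y == 0, 0, y * log(y / mu_hat)) - (y - mu_hat))
    all.equal(dev_manual, deviance(fit))   # TRUE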
Deviance tests for nested models

◮ Consider two models, $M_0$ with fitted values $\hat{\mu}_0$ and $M_1$ with fitted values $\hat{\mu}_1$, where $M_0$ is nested within $M_1$:
$$\eta_1 = \beta_0 + \beta_1 X_{11} + \beta_2 X_{12}, \qquad \eta_0 = \beta_0 + \beta_1 X_{11}$$
◮ Simpler models have smaller log-likelihoods and larger deviances: $L(\hat{\mu}_0 \mid y) \le L(\hat{\mu}_1 \mid y)$ and $D(y \mid \hat{\mu}_1) \le D(y \mid \hat{\mu}_0)$.
◮ The likelihood-ratio statistic comparing the two models is the difference between the deviances:
$$-2[L(\hat{\mu}_0 \mid y) - L(\hat{\mu}_1 \mid y)] = -2[L(\hat{\mu}_0 \mid y) - L(y \mid y)] - \left\{ -2[L(\hat{\mu}_1 \mid y) - L(y \mid y)] \right\} = D(y \mid \hat{\mu}_0) - D(y \mid \hat{\mu}_1)$$
Hypothesis test with differences in deviance

◮ $H_0: \beta_{i1} = \dots = \beta_{ij} = 0$; fit a full and a reduced model.
◮ Use the difference in deviance as the test statistic, where $df$ is the difference in the number of parameters between $\hat{\mu}_1$ and $\hat{\mu}_0$:
$$D(y \mid \hat{\mu}_0) - D(y \mid \hat{\mu}_1) \sim \chi^2_{df}$$
◮ Reject $H_0$ if the calculated chi-square value is larger than $\chi^2_{df,\, 1-\alpha}$.
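A minimal worked example of this test in R, with simulated data and hypothetical variable names: the reduced model drops x2, and the deviance difference is referred to a $\chi^2$ with $df$ equal to the number of dropped parameters.

    # Nested-model deviance (likelihood-ratio) test on illustrative data.
    set.seed(22)
    n  <- 300
    x1 <- rnorm(n)
    x2 <- rnorm(n)
    y  <- rbinom(n, 1, plogis(-0.2 + 0.7 * x1))     # x2 has no true effect here

    m0 <- glm(y ~ x1,      family = binomial)       # reduced model (null hypothesis)
    m1 <- glm(y ~ x1 + x2, family = binomial)       # full model

    lrt_stat <- deviance(m0) - deviance(m1)         # D(y | mu0-hat) - D(y | mu1-hat)
    df_diff  <- df.residual(m0) - df.residual(m1)   # number of extra parameters in m1
    pchisq(lrt_stat, df = df_diff, lower.tail = FALSE)  # p-value; reject H0 when small

    anova(m0, m1, test = "Chisq")                   # equivalent built-in comparison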
Residual Examinations

◮ Pearson residuals: $e^p_i = \frac{y_i - \hat{\mu}_i}{\sqrt{V(\hat{\mu}_i)}}$, where $\hat{\mu}_i = g^{-1}(\hat{\eta}_i) = g^{-1}(X_i \hat{\beta})$
◮ Deviance residuals: $e^d_i = \mathrm{sign}(y_i - \hat{\mu}_i)\sqrt{d_i}$, where $d_i$ is the deviance contribution of the $i$th observation and $\mathrm{sign}(x) = \begin{cases} 1 & x > 0 \\ -1 & x \le 0 \end{cases}$
◮ Standardized residuals: $r_i = \frac{e_i}{\sqrt{1 - \hat{h}_i}}$, where $e_i = \frac{y_i - \hat{\mu}_i}{\sqrt{V(\hat{\mu}_i)}}$, $\hat{h}_i$ is the measure of leverage, and approximately $r_i \sim N(0, 1)$
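In R, these residual types can be pulled from any fitted glm object; the fit below is purely illustrative.

    # Extracting the residual types above from a fitted GLM (illustrative fit).
    set.seed(5)
    x   <- rnorm(200)
    y   <- rbinom(200, 1, plogis(0.4 * x))
    fit <- glm(y ~ x, family = binomial)

    e_pearson  <- residuals(fit, type = "pearson")   # (y_i - mu-hat_i) / sqrt(V(mu-hat_i))
    e_deviance <- residuals(fit, type = "deviance")  # sign(y_i - mu-hat_i) * sqrt(d_i)
    r_std      <- rstandard(fit, type = "pearson")   # Pearson residual / sqrt(1 - h_i)
    h_i        <- hatvalues(fit)                     # leverage h_i of each observation
    head(cbind(e_pearson, e_deviance, r_std, h_i))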
Residual Plot

Problem: the raw residual plot is hard to interpret for logistic regression.

[Figure: raw residuals vs. expected values for a logistic regression fit]
Binned Residual Plot

◮ Group observations into ordered groups (by $x_j$, $\hat{y}$, or $x_{ij}$), with an equal number of observations per group.
◮ Compute the group-wise average of the raw residuals.
◮ Plot the average residuals vs. the predicted values; each dot represents a group (a by-hand sketch follows below).

[Figure: binned average residuals vs. expected values]
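Here is a minimal by-hand version of this construction on simulated data (bin count chosen arbitrarily); in practice the binnedplot() function shown on the next slide does the same thing.

    # By-hand binned residual plot (illustrative data; 40 equal-count bins).
    set.seed(12)
    x   <- rnorm(1000)
    y   <- rbinom(1000, 1, plogis(-0.5 + x))
    fit <- glm(y ~ x, family = binomial)

    fitted_vals <- fitted(fit)
    raw_resid   <- y - fitted_vals                       # raw residuals y_i - mu-hat_i
    n_bins      <- 40
    breaks      <- quantile(fitted_vals, probs = seq(0, 1, length.out = n_bins + 1))
    bins        <- cut(fitted_vals, breaks = breaks, include.lowest = TRUE)  # equal-count groups
    bin_fitted  <- tapply(fitted_vals, bins, mean)       # group-wise average fitted value
    bin_resid   <- tapply(raw_resid, bins, mean)         # group-wise average raw residual
    plot(bin_fitted, bin_resid, xlab = "Expected Values", ylab = "Average Residuals")
    abline(h = 0, lty = 2)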
Binned Residual Plot (Part 2)

◮ Red lines indicate ±2 standard-error bounds, within which one would expect about 95% of the binned residuals to fall.
◮ An R function is available:

    library(arm)
    binnedplot(x, y, nclass = ...)
    # x      <- expected (fitted) values
    # y      <- residual values
    # nclass <- number of bins

[Figure: binnedplot output, average residuals vs. expected values with ±2 SE bounds]
Binned Residual Plot (Part 3)

◮ In practice you may need to fiddle with the number of observations per group. By default, nclass is chosen according to the number of observations n:
  – if n ≥ 100, nclass = floor(sqrt(length(x)));
  – if 10 < n < 100, nclass = 10;
  – if n < 10, nclass = floor(n / 2).
Ex: Binned Residual Plot with different bin sizes

[Figure: four binned residual plots (average residuals vs. expected values) with bin sizes of 10, 50, 100, and 500]