An Extended Random-effects Approach to Analysing Repeated, Overdispersed Count Data

Clarice G. B. Demétrio
ESALQ/USP, Piracicaba, SP, Brasil
Clarice.demetrio@usp.br

joint work with
Geert Molenberghs, Hasselt University, Belgium
Geert Verbeke, Katholieke Universiteit Leuven, Belgium

VIII Encontro dos Alunos Pós-graduação em Estatística e Experimentação Agronômica
Piracicaba, SP, November 20, 2018
Outline
• Motivating application: a clinical trial in epileptic patients
• Generalized linear models
• Poisson regression models
• Overdispersion in GLMs
• Univariate overdispersed count data
• Longitudinal overdispersed count data
• Estimation
• Discussion of the example
• Final remarks
Motivation: A Clinical Trial in Epileptic Patients
• a randomized, double-blind, parallel-group, multi-center study comparing placebo with a new anti-epileptic drug (AED)
• after a 12-week baseline period, 45 epilepsy patients were assigned to the placebo group and 44 to the active (new) treatment group
• patients were measured weekly during 16 weeks (double-blind), and some up to 27 weeks in a long-term open-extension study
• outcome of interest: the number of epileptic seizures experienced during the last week, i.e., since the last time the outcome was measured
• key research question: whether or not the additional new treatment reduces the number of epileptic seizures
Considerations about the data
• a very skewed distribution, with the largest observed value equal to 73 seizures in week
• unstable behavior explained by:
– presence of extreme values,
– very few observations available at some of the time points, especially past week 20
• longitudinal count data:
– discrete data
– possible correlation between measurements for the same individual
Number of observations per week and group

Week   Placebo   Treatment   Total
1      45        44          89
5      42        42          84
10     41        40          81
15     40        38          78
16     40        37          77
17     18        17          35
20      2         8          10
27      0         3           3

• serious drop in the number of measurements past the end of the actual double-blind period, i.e., past week 16
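One way to quantify the drop-off is to express each week's count as a share of the 89 patients observed in week 1. The short Python sketch below does only this arithmetic on the numbers in the table; the variable names are illustrative.

```python
# Number of observations per week, from the table: week -> (placebo, treatment)
counts = {
    1: (45, 44), 5: (42, 42), 10: (41, 40), 15: (40, 38),
    16: (40, 37), 17: (18, 17), 20: (2, 8), 27: (0, 3),
}

baseline = sum(counts[1])  # 89 patients observed in week 1
retention = {week: sum(groups) / baseline for week, groups in counts.items()}

for week, share in retention.items():
    print(f"week {week:2d}: {sum(counts[week]):2d} observations ({share:.0%} of week 1)")
```

Between weeks 16 and 17 the share falls from about 87% to about 39%.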
Generalized Linear Models (GLMs)
– unifying framework for much statistical modelling (Nelder and Wedderburn, 1972)
– an extension of the standard normal-theory linear model
– three components:
• independent random variables $Y_i$, $i = 1, \ldots, n$, from a linear exponential family distribution with means $\mu_i$ and constant scale parameter $\phi$,
$$f(y) \equiv f(y \mid \theta, \phi) = \exp\left\{\phi^{-1}\left[y\theta - \psi(\theta)\right] + c(y, \phi)\right\},$$
where $\mu = E[Y] = \psi'(\theta)$ and $\mathrm{Var}(Y) = \phi\,\psi''(\theta)$;
• a linear predictor vector $\eta$ given by $\eta = X\beta$, where $\beta$ is a vector of $p$ unknown parameters and $X = [x_1, \ldots, x_n]^T$ is the design matrix;
• a link function $g(\cdot)$ relating the mean to the linear predictor, i.e., $g(\mu_i) = \eta_i = x_i^T\beta$.
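As a concrete instance of the first component, the Poisson distribution fits this form with $\theta = \log\mu$, $\psi(\theta) = e^\theta$, $\phi = 1$ and $c(y, \phi) = -\log y!$. The Python sketch below checks this numerically, recovering the mean and variance as derivatives of $\psi$ by finite differences; the value $\mu = 3.7$ and the helper names are arbitrary choices for illustration.

```python
import math

# Poisson in linear exponential family form:
# theta = log(mu), psi(theta) = exp(theta), phi = 1, c(y, phi) = -log(y!)
def psi(theta):
    return math.exp(theta)

def expfam_density(y, theta, phi=1.0):
    """exp{ phi^{-1} [y*theta - psi(theta)] + c(y, phi) } with the Poisson choice of c."""
    return math.exp((y * theta - psi(theta)) / phi - math.lgamma(y + 1))

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def d1(f, x, h=1e-5):
    """Central-difference first derivative."""
    return (f(x + h) - f(x - h)) / (2 * h)

def d2(f, x, h=1e-4):
    """Central-difference second derivative."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

mu = 3.7
theta = math.log(mu)
print("psi'(theta)  =", d1(psi, theta), "(mean mu =", mu, ")")
print("psi''(theta) =", d2(psi, theta), "(variance = phi * psi'' = mu)")
```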
Poisson regression models
If $Y_i$, $i = 1, \ldots, n$, are counts with means $\mu_i$, the standard Poisson model assumes that $Y_i \sim \mathrm{Pois}(\mu_i)$, with
$$f(y_i) = \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!},$$
so that $E(Y_i) = \mu_i$ and $\mathrm{Var}(Y_i) = \mu_i$ (too restrictive!).
The canonical link function is the log, $g(\mu_i) = \log(\mu_i) = \eta_i$, with $\eta_i = x_i^T\beta$.
For a well-fitting model (Hinde and Demétrio, 1998a,b):
Residual Deviance ≈ Residual d.f.
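To make the deviance check concrete, here is a minimal Python sketch, assuming a single covariate and synthetic data simulated inside the block (not the epilepsy data). It fits $\log\mu_i = \beta_0 + \beta_1 x_i$ by iteratively reweighted least squares, written out by hand rather than calling a statistics library, and compares the residual deviance with the residual degrees of freedom. The helper names (`rpois`, `fit_poisson_loglinear`, `poisson_deviance`) are hypothetical.

```python
import math
import random

def rpois(lam, rng):
    """One Poisson(lam) draw by Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def fit_poisson_loglinear(x, y, iters=25):
    """IRLS for log(mu_i) = b0 + b1 * x_i; with the log link the working
    weights are mu_i and the working response is eta_i + (y_i - mu_i) / mu_i."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iters):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # weighted least squares: solve the 2x2 normal equations (X'WX) b = X'Wz
        sw = sum(mu)
        swx = sum(m * xi for m, xi in zip(mu, x))
        swxx = sum(m * xi * xi for m, xi in zip(mu, x))
        swz = sum(m * zi for m, zi in zip(mu, z))
        swxz = sum(m * xi * zi for m, xi, zi in zip(mu, x, z))
        det = sw * swxx - swx ** 2
        b0, b1 = (swxx * swz - swx * swxz) / det, (sw * swxz - swx * swz) / det
    return b0, b1

def poisson_deviance(y, mu):
    """2 * sum[y log(y/mu) - (y - mu)], with y log(y/mu) = 0 when y = 0."""
    return 2 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                   for yi, mi in zip(y, mu))

rng = random.Random(2018)
x = [2 * i / 399 for i in range(400)]                   # synthetic covariate on [0, 2]
y = [rpois(math.exp(1.0 + 0.5 * xi), rng) for xi in x]  # truly Poisson responses
b0, b1 = fit_poisson_loglinear(x, y)
mu_hat = [math.exp(b0 + b1 * xi) for xi in x]
deviance, resid_df = poisson_deviance(y, mu_hat), len(y) - 2
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, deviance = {deviance:.1f} on {resid_df} d.f.")
```

Because the simulated data really are Poisson, the ratio of deviance to residual d.f. should come out close to 1. The score equations $\sum_i (y_i - \hat\mu_i) = 0$ and $\sum_i x_i(y_i - \hat\mu_i) = 0$, which hold exactly at the maximum likelihood estimate under the canonical log link, give a convenient convergence check.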
Overdispersion in GLMs
What if Residual Deviance ≫ Residual d.f.?
(i) Badly fitting model:
• omitted terms/variables
• incorrect relationship (link)
• outliers
(ii) Variation greater than predicted by the model ⇒ overdispersion:
• count data: $\mathrm{Var}(Y) > \mu$
• counted proportion data: $\mathrm{Var}(Y) > m\pi(1 - \pi)$
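A minimal Python sketch of symptom (ii), assuming an intercept-only Poisson model and synthetic data generated inside the block: counts drawn from a gamma mixture of Poissons give a residual deviance far above the residual d.f., while genuinely Poisson counts with the same mean do not. Sample size, parameter values and helper names are illustrative assumptions.

```python
import math
import random

def rpois(lam, rng):
    """One Poisson(lam) draw by Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def intercept_only_deviance_ratio(y):
    """Residual deviance / residual d.f. for the Poisson model log(mu) = beta0."""
    mu = sum(y) / len(y)  # MLE of the common mean
    dev = 2 * sum((yi * math.log(yi / mu) if yi > 0 else 0.0) - (yi - mu) for yi in y)
    return dev / (len(y) - 1)

rng = random.Random(1998)
n, mu, alpha = 2000, 5.0, 2.0
poisson_y = [rpois(mu, rng) for _ in range(n)]
# lambda_i ~ Gamma(shape=alpha, scale=mu/alpha): mean mu, variance mu^2/alpha
mixed_y = [rpois(rng.gammavariate(alpha, mu / alpha), rng) for _ in range(n)]

print(f"Poisson data:       deviance/d.f. = {intercept_only_deviance_ratio(poisson_y):.2f}")
print(f"gamma-Poisson data: deviance/d.f. = {intercept_only_deviance_ratio(mixed_y):.2f}")
```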
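Univariate Overdispersed Count Data
$Y_i$: counts with conditional means $\lambda_i$ (Hinde and Demétrio, 1998a,b)
Negative Binomial Type Variance
$$Y_i \mid \lambda_i \sim \mathrm{Pois}(\lambda_i), \qquad E(Y_i \mid \lambda_i) = \lambda_i, \qquad \mathrm{Var}(Y_i \mid \lambda_i) = \lambda_i, \qquad \log\mu_i = x_i^T\beta$$
• no particular distributional form: $E(\lambda_i) = \mu_i$ and $\mathrm{Var}(\lambda_i) = \sigma_i^2$ give
$$E(Y_i) = \mu_i, \qquad \mathrm{Var}(Y_i) = \mu_i + \sigma_i^2$$
• $\lambda_i \sim \Gamma(\alpha, \beta_i)$ (shape $\alpha$, scale $\beta_i$):
$$E[Y_i] = \mu_i = \alpha\beta_i, \qquad \mathrm{Var}(Y_i) = \alpha\beta_i(1 + \beta_i) = \mu_i + \frac{\mu_i^2}{\alpha} \qquad (\text{NegBinII})$$
• $\lambda_i \sim \Gamma(\alpha_i, \beta)$ (shape $\alpha_i$, scale $\beta$):
$$E[Y_i] = \mu_i = \alpha_i\beta, \qquad \mathrm{Var}(Y_i) = \mu_i(1 + \beta) = \phi\mu_i \qquad (\text{NegBinI})$$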
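Both gamma mixtures lead to the same marginal (negative binomial) probability function and differ only in which gamma parameter carries the dependence on $i$. The Python sketch below, a check under illustrative parameter values rather than part of the original analysis, evaluates that marginal probability function in closed form, sums it to obtain the mean and variance, and compares them with the NegBinII and NegBinI variance functions. The function names are hypothetical.

```python
import math

def gamma_poisson_pmf(y, shape, scale):
    """P(Y = y) when Y | lam ~ Pois(lam) and lam ~ Gamma(shape, scale)."""
    log_p = (math.lgamma(shape + y) - math.lgamma(shape) - math.lgamma(y + 1)
             + y * math.log(scale / (1.0 + scale)) - shape * math.log(1.0 + scale))
    return math.exp(log_p)

def marginal_moments(shape, scale, y_max=2000):
    """Mean and variance of Y from the (truncated) marginal probability function."""
    probs = [gamma_poisson_pmf(y, shape, scale) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

alpha = 2.0  # NegBinII: common shape, scale beta_i = mu_i / alpha varies with i
beta = 1.5   # NegBinI: common scale, shape alpha_i = mu_i / beta varies with i
for mu in (0.5, 3.0, 10.0):
    _, var2 = marginal_moments(alpha, mu / alpha)
    _, var1 = marginal_moments(mu / beta, beta)
    print(f"mu={mu:5.1f}  NegBinII var={var2:8.3f} (mu + mu^2/alpha = {mu + mu**2 / alpha:8.3f})"
          f"  NegBinI var={var1:7.3f} (mu(1+beta) = {mu * (1 + beta):7.3f})")
```

NegBinII's variance grows quadratically in $\mu$, whereas NegBinI keeps a constant variance-to-mean ratio $\phi = 1 + \beta$.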
Poisson-normal model
Individual-level random effect in the linear predictor:
$$Y_i \mid b_i \sim \mathrm{Pois}(\lambda_i), \qquad \log\lambda_i = x_i^T\beta + b_i, \qquad b_i \sim N(0, d),$$
which gives
$$E[Y_i] = e^{x_i^T\beta + \frac{1}{2}d} := \mu_i,$$
$$\mathrm{Var}(Y_i) = e^{x_i^T\beta + \frac{1}{2}d} + e^{2x_i^T\beta + d}\left(e^{d} - 1\right) = \mu_i + \mu_i^2\left(e^{d} - 1\right),$$
i.e., a variance function of the form $\mathrm{Var}(Y_i) = \mu_i + k\mu_i^2$, with $k = e^{d} - 1$.
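Both moments can be checked numerically. The Python sketch below, assuming illustrative values $x_i^T\beta = 1$ and $d = 0.5$, integrates over $b_i \sim N(0, d)$ with the trapezoidal rule and applies the law of total variance, $\mathrm{Var}(Y_i) = E(\lambda_i) + \mathrm{Var}(\lambda_i)$; the helper names are hypothetical.

```python
import math

def normal_pdf(b, d):
    """Density of N(0, d) at b."""
    return math.exp(-b * b / (2 * d)) / math.sqrt(2 * math.pi * d)

def expect(g, d, n_grid=4001, width=12.0):
    """E[g(b)] for b ~ N(0, d), by the trapezoidal rule on [-width*sd, width*sd]."""
    sd = math.sqrt(d)
    h = 2 * width * sd / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = -width * sd + k * h
        w = 0.5 if k in (0, n_grid - 1) else 1.0
        total += w * g(b) * normal_pdf(b, d)
    return total * h

eta, d = 1.0, 0.5                                       # illustrative x_i' beta and variance
e_lam = expect(lambda b: math.exp(eta + b), d)          # E(lambda_i)
e_lam2 = expect(lambda b: math.exp(2 * (eta + b)), d)   # E(lambda_i^2)
mean_y = e_lam
var_y = e_lam + (e_lam2 - e_lam ** 2)                   # E[Var(Y|b)] + Var[E(Y|b)]

mu = math.exp(eta + d / 2)
k = math.exp(d) - 1
print(mean_y, mu)
print(var_y, mu + k * mu ** 2)
```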
Longitudinal Overdispersed Count Data
$Y_{ij}$: the $j$th outcome for subject $i$, $i = 1, \ldots, N$, $j = 1, \ldots, n_i$
$Y_i = (Y_{i1}, \ldots, Y_{in_i})'$: the vector of measurements for subject $i$
Negative Binomial Type Variance extension
$$Y_{ij} \mid \lambda_{ij} \sim \mathrm{Poi}(\lambda_{ij}), \qquad \lambda_i = (\lambda_{i1}, \ldots, \lambda_{in_i})', \qquad E(\lambda_i) = \mu_i, \quad \mathrm{Var}(\lambda_i) = \Sigma_i$$
Unconditionally, $E(Y_i) = \mu_i$ and $\mathrm{Var}(Y_i) = M_i + \Sigma_i$, where $M_i$ is a diagonal matrix with the vector $\mu_i$ along the diagonal.
• the diagonal structure of $M_i$ reflects the conditional independence assumption: all dependence between measurements on the same unit stems from the random effects
• independent components of $\lambda_i$ (diagonal $\Sigma_i$): a pure overdispersion model, without correlation between the repeated measures
• $\lambda_{ij} = \lambda_i$ for all $j$, with $\mathrm{Var}(\lambda_i) = \sigma_i^2$ ⇒ $\mathrm{Var}(Y_i) = M_i + \sigma_i^2 J_{n_i}$, where $J_{n_i}$ is the $n_i \times n_i$ matrix of ones: a Poisson version of compound symmetry
• general correlation structures between the components of $\lambda_i$ can also be considered
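To see what these structures imply for the counts, the Python sketch below (plain nested lists, hypothetical helper names, illustrative values $\mu = 3$ and $\sigma^2 = 2$) builds $\mathrm{Var}(Y_i) = M_i + \Sigma_i$ for the compound-symmetry case and for a pure-overdispersion (diagonal) case, and reports the implied correlation between two counts of the same subject.

```python
import math

def marginal_cov(mu, sigma):
    """Var(Y_i) = M_i + Sigma_i, with M_i = diag(mu)."""
    n = len(mu)
    return [[sigma[j][k] + (mu[j] if j == k else 0.0) for k in range(n)] for j in range(n)]

def corr(V, j, k):
    return V[j][k] / math.sqrt(V[j][j] * V[k][k])

n_i, mu_val, s2 = 4, 3.0, 2.0      # illustrative values
mu = [mu_val] * n_i                # lambda_ij = lambda_i forces a common mean
cs = marginal_cov(mu, [[s2] * n_i for _ in range(n_i)])  # Sigma_i = s2 * J
od = marginal_cov(mu, [[s2 if j == k else 0.0 for k in range(n_i)] for j in range(n_i)])

print("compound symmetry:   corr =", round(corr(cs, 0, 1), 3))  # s2 / (mu + s2)
print("pure overdispersion: corr =", corr(od, 0, 1))
```

Under compound symmetry every pair of counts has correlation $\sigma^2/(\mu + \sigma^2)$, which is always below 1 because of the Poisson part on the diagonal.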
Poisson-normal model extension – a GLMM
$$Y_{ij} \mid b_i \sim \mathrm{Poi}(\lambda_{ij}), \qquad \ln(\lambda_{ij}) = x_{ij}'\beta + z_{ij}'b_i, \qquad b_i \sim N(0, D)$$
$x_{ij}$ and $z_{ij}$: $p$- and $q$-dimensional vectors of known covariate values
$\beta$: a $p$-dimensional vector of unknown fixed regression coefficients
Then, unconditionally, $\mu_i = E(Y_i)$ has components
$$\mu_{ij} = \exp\left(x_{ij}'\beta + \tfrac{1}{2}z_{ij}'Dz_{ij}\right)$$
and the variance-covariance matrix is
$$\mathrm{Var}(Y_i) = M_i + M_i\left(e^{Z_iDZ_i'} - J_{n_i}\right)M_i,$$
where $Z_i$ is the matrix with rows $z_{ij}'$ and the exponential is taken elementwise.
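A minimal Python sketch of these two formulas, assuming a random intercept and slope in time with illustrative values of $\beta$ and $D$ (all names hypothetical); the elementwise exponential appears as `math.exp` applied entry by entry. With a random intercept only ($z_{ij} = 1$, $D = d$), each diagonal element should reduce to the univariate Poisson-normal variance $\mu_{ij} + \mu_{ij}^2(e^{d} - 1)$.

```python
import math

def quad_form(u, D, v):
    """u' D v for small dense lists."""
    return sum(u[a] * D[a][b] * v[b] for a in range(len(u)) for b in range(len(v)))

def glmm_moments(X, Z, beta, D):
    """Marginal mean and covariance of Y_i for ln(lambda_ij) = x_ij' beta + z_ij' b_i,
    b_i ~ N(0, D): mu_ij = exp(x_ij' beta + z_ij' D z_ij / 2),
    Var(Y_i) = M_i + M_i (exp(Z_i D Z_i') - J) M_i with exp taken elementwise."""
    n = len(X)
    lin = [sum(x * b for x, b in zip(X[j], beta)) for j in range(n)]
    mu = [math.exp(lin[j] + 0.5 * quad_form(Z[j], D, Z[j])) for j in range(n)]
    V = [[mu[j] * mu[k] * (math.exp(quad_form(Z[j], D, Z[k])) - 1.0) for k in range(n)]
         for j in range(n)]
    for j in range(n):
        V[j][j] += mu[j]  # Poisson part: M_i on the diagonal
    return mu, V

times = [0.0, 1.0, 2.0, 3.0]
X = [[1.0, t] for t in times]  # intercept and time
Z = [[1.0, t] for t in times]  # random intercept and slope
mu, V = glmm_moments(X, Z, beta=[0.5, -0.1], D=[[0.3, 0.05], [0.05, 0.02]])
for row in V:
    print([round(v, 3) for v in row])

# Random intercept only: z_ij = 1, D = [[d]]
d = 0.4
mu_ri, V_ri = glmm_moments(X, [[1.0]] * 4, beta=[0.5, -0.1], D=[[d]])
```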
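Models Combining Overdispersion With Normal Random Effects
$$Y_{ij} \mid \theta_{ij}, b_i \sim \mathrm{Poi}(\lambda_{ij}), \qquad \lambda_{ij} = \theta_{ij}\exp\left(x_{ij}'\beta + z_{ij}'b_i\right), \qquad b_i \sim N(0, D),$$
$$E(\theta_i) = E[(\theta_{i1}, \ldots, \theta_{in_i})'] = \Phi_i, \qquad \mathrm{Var}(\theta_i) = \Sigma_i$$
Then $\mu_i = E(Y_i)$ has components
$$\mu_{ij} = \phi_{ij}\exp\left(x_{ij}'\beta + \tfrac{1}{2}z_{ij}'Dz_{ij}\right),$$
where $\phi_{ij}$ is the $j$th component of $\Phi_i$. The variance-covariance matrix is
$$\mathrm{Var}(Y_i) = M_i + M_i\left(P_i - J_{n_i}\right)M_i,$$
where the $(j,k)$th element of $P_i$ is
$$p_{i,jk} = \frac{\sigma_{i,jk} + \phi_{ij}\phi_{ik}}{\phi_{ij}\phi_{ik}}\exp\left(\tfrac{1}{2}z_{ij}'Dz_{ik}\right)\exp\left(\tfrac{1}{2}z_{ik}'Dz_{ij}\right),$$
with $\sigma_{i,jk}$ the $(j,k)$th element of $\Sigma_i$.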
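A Python sketch of $P_i$ and $\mathrm{Var}(Y_i)$, assuming $\theta_i$ independent of $b_i$ (as the factorized moments require) and illustrative parameter values; function names are hypothetical. Two reductions serve as checks: with $\Sigma_i = 0$ and $\Phi_i = 1$ every $p_{i,jk}$ equals the GLMM factor $e^{z_{ij}'Dz_{ik}}$, and with independent gamma $\theta_{ij}$ of mean 1 and variance $1/\alpha$ plus a random intercept of variance $d$, the diagonal becomes $\mu_{ij} + \mu_{ij}^2[(1 + 1/\alpha)e^{d} - 1]$.

```python
import math

def quad_form(u, D, v):
    """u' D v for small dense lists."""
    return sum(u[a] * D[a][b] * v[b] for a in range(len(u)) for b in range(len(v)))

def combined_moments(X, Z, beta, D, phi, Sigma):
    """Marginal mean, P_i and covariance for lambda_ij = theta_ij * exp(x_ij' beta + z_ij' b_i),
    with E(theta_i) = phi, Var(theta_i) = Sigma, b_i ~ N(0, D), theta_i independent of b_i."""
    n = len(X)
    lin = [sum(x * b for x, b in zip(X[j], beta)) for j in range(n)]
    mu = [phi[j] * math.exp(lin[j] + 0.5 * quad_form(Z[j], D, Z[j])) for j in range(n)]
    P = [[(Sigma[j][k] + phi[j] * phi[k]) / (phi[j] * phi[k])
          * math.exp(0.5 * quad_form(Z[j], D, Z[k])) * math.exp(0.5 * quad_form(Z[k], D, Z[j]))
          for k in range(n)] for j in range(n)]
    # Var(Y_i) = M_i + M_i (P_i - J) M_i
    V = [[mu[j] * mu[k] * (P[j][k] - 1.0) + (mu[j] if j == k else 0.0) for k in range(n)]
         for j in range(n)]
    return mu, P, V

times = [0.0, 1.0, 2.0]
X = [[1.0, t] for t in times]
Z = [[1.0] for _ in times]  # random intercept only
alpha, d = 2.0, 0.4         # gamma shape and random-intercept variance
phi = [1.0] * 3             # E(theta_ij) = 1
Sigma = [[1.0 / alpha if j == k else 0.0 for k in range(3)] for j in range(3)]  # independent gammas
mu, P, V = combined_moments(X, Z, beta=[0.5, -0.1], D=[[d]], phi=phi, Sigma=Sigma)
print([round(v, 3) for v in mu])
```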
Estimation for the Poisson-normal and Combined Models
• random-effects models are fitted by maximizing the marginal likelihood, obtained by integrating the random effects out of the conditional densities
• the likelihood contribution of subject $i$ is
$$f_i(y_i \mid \beta, D, \phi) = \int \prod_{j=1}^{n_i} f_{ij}(y_{ij} \mid b_i, \beta, \phi)\, f(b_i \mid D)\, db_i$$
• the likelihood for $\beta$, $D$, and $\phi$ is
$$L(\beta, D, \phi) = \prod_{i=1}^{N}\int \prod_{j=1}^{n_i} f_{ij}(y_{ij} \mid b_i, \beta, \phi)\, f(b_i \mid D)\, db_i.$$
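For the Poisson-normal model with a scalar random intercept ($q = 1$, $D = d$), each $f_i$ is a one-dimensional integral. The Python sketch below approximates it with the trapezoidal rule on a fixed grid; this is a simple stand-in chosen for illustration, not the adaptive Gaussian quadrature used by SAS NLMIXED. The fixed effects enter only through precomputed $\eta_{ij} = x_{ij}'\beta$, and the data and names are hypothetical. As a sanity check, the marginal probabilities of all outcome pairs for a two-occasion subject should sum to one.

```python
import math

def subject_likelihood(y, eta, d, n_grid=201, width=10.0):
    """f_i(y_i) = integral over b of prod_j Pois(y_ij; exp(eta_ij + b)) * N(b; 0, d),
    approximated by the trapezoidal rule on [-width*sd, width*sd]."""
    sd = math.sqrt(d)
    h = 2 * width * sd / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = -width * sd + k * h
        log_cond = sum(yj * (ej + b) - math.exp(ej + b) - math.lgamma(yj + 1)
                       for yj, ej in zip(y, eta))
        log_dens = -b * b / (2 * d) - 0.5 * math.log(2 * math.pi * d)
        w = 0.5 if k in (0, n_grid - 1) else 1.0
        total += w * math.exp(log_cond + log_dens)
    return total * h

def log_likelihood(data, d):
    """log L = sum over independent subjects of log f_i; eta_ij = x_ij' beta is precomputed."""
    return sum(math.log(subject_likelihood(y, eta, d)) for y, eta in data)

# Hypothetical data: two subjects, three weekly counts each, eta_ij = log(4)
data = [([3, 5, 4], [math.log(4.0)] * 3), ([0, 1, 0], [math.log(4.0)] * 3)]
print(f"log-likelihood at d = 0.5: {log_likelihood(data, 0.5):.4f}")
```

Maximizing `log_likelihood` over $d$ (and over $\beta$, through $\eta_{ij}$) would give the marginal maximum likelihood estimates.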
• key problem: the presence of $N$ integrals, for which in general no closed-form solution exists (Verbeke and Molenberghs, 2000; Molenberghs and Verbeke, 2005). Possible solutions:
– numerical integration (SAS procedure NLMIXED)
– series expansion methods (penalized quasi-likelihood, marginal quasi-likelihood), Laplace approximation, etc. (SAS procedure GLIMMIX)
– a hybrid between analytic and numerical integration
• in some special cases (linear mixed-effects model, Poisson-normal model), these integrals can be worked out analytically; this is also true for the combined model
• fully Bayesian inference
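The Laplace approximation replaces the integrand by a Gaussian centred at its mode. The Python sketch below compares it with a fine trapezoidal-rule integral for one subject of a random-intercept Poisson-normal model; the counts, $\eta_{ij}$ and $d$ are illustrative assumptions and the function names are hypothetical. It illustrates the approximation itself, not how GLIMMIX implements it.

```python
import math

def log_integrand(b, y, eta, d):
    """log of prod_j Pois(y_j; exp(eta_j + b)) * N(b; 0, d)."""
    log_cond = sum(yj * (ej + b) - math.exp(ej + b) - math.lgamma(yj + 1)
                   for yj, ej in zip(y, eta))
    return log_cond - b * b / (2 * d) - 0.5 * math.log(2 * math.pi * d)

def laplace(y, eta, d, iters=50):
    """Laplace approximation: Newton search for the mode, then a Gaussian fit there."""
    b = 0.0
    for _ in range(iters):
        grad = sum(yj - math.exp(ej + b) for yj, ej in zip(y, eta)) - b / d
        hess = -sum(math.exp(ej + b) for ej in eta) - 1.0 / d
        b -= grad / hess
    hess = -sum(math.exp(ej + b) for ej in eta) - 1.0 / d
    return math.exp(log_integrand(b, y, eta, d)) * math.sqrt(2 * math.pi / -hess)

def trapezoid(y, eta, d, n_grid=2001, width=10.0):
    """Reference value from a fine trapezoidal rule."""
    sd = math.sqrt(d)
    h = 2 * width * sd / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        w = 0.5 if k in (0, n_grid - 1) else 1.0
        total += w * math.exp(log_integrand(-width * sd + k * h, y, eta, d))
    return total * h

y, eta, d = [3, 5, 4, 6], [math.log(4.0)] * 4, 0.5  # illustrative subject
quad_value, laplace_value = trapezoid(y, eta, d), laplace(y, eta, d)
print(f"trapezoid {quad_value:.6e}, Laplace {laplace_value:.6e}, "
      f"relative error {laplace_value / quad_value - 1:+.3%}")
```

With moderately large counts the integrand is close to Gaussian and the Laplace error is small; it grows when counts are small or $d$ is large, which is one reason numerical integration is often preferred for such data.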
Full Marginal Density for the Combined Model
With independent gamma random effects $\theta_{ij} \sim \Gamma(\alpha_j, \beta_j)$ (shape $\alpha_j$, scale $\beta_j$), the joint probability of $Y_i$ takes the form
$$P(Y_i = y_i) = \sum_{t}\prod_{j=1}^{n_i}(-1)^{t_j}\binom{\alpha_j + y_{ij} + t_j - 1}{\alpha_j - 1}\binom{y_{ij} + t_j}{y_{ij}}\beta_j^{\,y_{ij} + t_j} \times \exp\left[\sum_{j=1}^{n_i}(y_{ij} + t_j)\,x_{ij}'\beta\right] \times \exp\left[\frac{1}{2}\left(\sum_{j=1}^{n_i}(y_{ij} + t_j)\,z_{ij}'\right)D\left(\sum_{j=1}^{n_i}(y_{ij} + t_j)\,z_{ij}\right)\right],$$
where $t = (t_1, \ldots, t_{n_i})$ ranges over all non-negative integer vectors
– special cases can be obtained very easily
– it can be used to implement maximum likelihood estimation, with numerical accuracy governed by the number of terms included in the series
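As an illustration of how easily special cases follow, take a single measurement ($n_i = 1$) and no normal random effect ($D = 0$), write $\kappa = \exp(x'\beta)$, and drop the subscripts. Using the identity $\binom{\alpha+y+t-1}{\alpha-1}\binom{y+t}{y} = \binom{\alpha+y-1}{y}\binom{\alpha+y+t-1}{t}$ and the negative binomial series $\sum_{t \ge 0}\binom{r+t-1}{t}x^t = (1-x)^{-r}$ for $|x| < 1$, the series sums in closed form (it converges for $\beta\kappa < 1$):

```latex
\begin{aligned}
P(Y = y) &= \sum_{t \ge 0} (-1)^{t} \binom{\alpha + y + t - 1}{\alpha - 1}\binom{y + t}{y} (\beta\kappa)^{y + t} \\
&= \binom{\alpha + y - 1}{y} (\beta\kappa)^{y} \sum_{t \ge 0} \binom{\alpha + y + t - 1}{t} (-\beta\kappa)^{t} \\
&= \binom{\alpha + y - 1}{y} (\beta\kappa)^{y} (1 + \beta\kappa)^{-(\alpha + y)} \\
&= \binom{\alpha + y - 1}{\alpha - 1} \left(\frac{\beta\kappa}{1 + \beta\kappa}\right)^{y} \left(\frac{1}{1 + \beta\kappa}\right)^{\alpha}
\end{aligned}
```

This is the gamma-Poisson (negative binomial) probability with mean $\alpha\beta\kappa$, the same form as the partial-marginalization probability for a fixed $b_i$.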
Partial marginalization
– integrate over the gamma random effects only, leaving the normal random effects untouched
The corresponding probability is
$$P(Y_{ij} = y_{ij} \mid b_i) = \binom{\alpha_j + y_{ij} - 1}{\alpha_j - 1}\left(\frac{\beta_j\kappa_{ij}}{1 + \kappa_{ij}\beta_j}\right)^{y_{ij}}\left(\frac{1}{1 + \kappa_{ij}\beta_j}\right)^{\alpha_j},$$
where $\kappa_{ij} = \exp[x_{ij}'\beta + z_{ij}'b_i]$
– we assume that the gamma random effects are independent within a subject
– the correlation is induced by the normal random effects
– the fully marginalized probability is easily obtained by numerically integrating the normal random effects out of $\prod_{j=1}^{n_i} P(Y_{ij} = y_{ij} \mid b_i)$, using SAS procedure NLMIXED
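A minimal Python sketch of this hybrid strategy for a random intercept ($z_{ij} = 1$, $D = d$), using the trapezoidal rule instead of the adaptive Gaussian quadrature of NLMIXED; parameter values and function names are illustrative assumptions. For a single measurement with $\alpha = 2$, $\beta = 0.5$ (so $E(\theta_{ij}) = \alpha\beta = 1$), $x_{ij}'\beta = \log 2$ and $d = 0.3$, the probabilities should sum to one and their mean should equal $\alpha\beta\exp(x_{ij}'\beta + d/2)$.

```python
import math

def nb_conditional_pmf(y, alpha, beta, kappa):
    """P(Y_ij = y | b_i) after integrating theta_ij ~ Gamma(alpha, beta) out of the Poisson."""
    log_p = (math.lgamma(alpha + y) - math.lgamma(alpha) - math.lgamma(y + 1)
             + y * math.log(beta * kappa / (1 + kappa * beta))
             - alpha * math.log(1 + kappa * beta))
    return math.exp(log_p)

def marginal_pmf(y, eta, alpha, beta, d, n_grid=201, width=10.0):
    """P(Y_i = y_i): integrate prod_j P(Y_ij = y_ij | b_i) against N(b_i; 0, d)
    for a random intercept (z_ij = 1), using the trapezoidal rule."""
    sd = math.sqrt(d)
    h = 2 * width * sd / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = -width * sd + k * h
        cond = 1.0
        for y_j, eta_j, a_j, s_j in zip(y, eta, alpha, beta):
            cond *= nb_conditional_pmf(y_j, a_j, s_j, math.exp(eta_j + b))
        dens = math.exp(-b * b / (2 * d)) / math.sqrt(2 * math.pi * d)
        w = 0.5 if k in (0, n_grid - 1) else 1.0
        total += w * cond * dens
    return total * h

alpha, beta, eta, d = 2.0, 0.5, math.log(2.0), 0.3  # illustrative values
probs = [marginal_pmf([y], [eta], [alpha], [beta], d) for y in range(251)]
mean = sum(y * p for y, p in enumerate(probs))
print(f"total probability {sum(probs):.8f}, mean {mean:.6f}, "
      f"theory {alpha * beta * math.exp(eta + d / 2):.6f}")
```

Because only the one-dimensional normal integral is done numerically, the gamma part costs nothing beyond evaluating a closed-form probability at each grid point.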