Analysis of variance and regression, May 13, 2008: Repeated measurements over time


  1. Analysis of variance and regression May 13, 2008

  2. Repeated measurements over time
     • Presentation of data
     • Traditional ways of analysis
     • Variance component model (the dogs revisited)
     • Random regression
     • Baseline considerations

  3. Lene Theil Skovgaard, Dept. of Biostatistics, Institute of Public Health, University of Copenhagen e-mail: L.T.Skovgaard@biostat.ku.dk http://staff.pubhealth.ku.dk/~pd/regression08_1

  4. Repeated measurements, May 2008
     Traditional presentation of longitudinal data.
     Ex: Aspirin absorption for healthy and ill subjects (Matthews et al., 1990).
     Comparison of groups for each time:
     • mass significance problem
     • tests are not independent
     • interpretation may be difficult
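
The mass significance problem can be illustrated with a little arithmetic. The sketch below is not from the slides: the number of time points `k` and the 5% level are hypothetical choices. It shows the chance of at least one false positive if the `k` tests were independent (which, as noted above, they are not), together with the Bonferroni upper bound `k * alpha`, which holds whatever the dependence between the tests.

```python
def familywise_error(k, alpha=0.05):
    """Return (rate under independence, Bonferroni upper bound) for k tests."""
    # Probability of at least one false positive among k independent tests.
    independent = 1 - (1 - alpha) ** k
    # Upper bound that is valid for any dependence structure between the tests.
    bonferroni_bound = min(1.0, k * alpha)
    return independent, bonferroni_bound


# Hypothetical numbers of time points at which the groups are compared.
for k in (1, 5, 10):
    indep, bound = familywise_error(k)
    print(f"k={k:2d}  independent tests: {indep:.3f}  upper bound: {bound:.3f}")
```

With correlated tests the true rate lies somewhere below the bound, but it is in general well above the nominal level of a single test.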

  5. What is the purpose of the investigation?
     • Description of time course
     • Comparison of groups: in which respect? Level, trend, ..., overall pattern

  6. Why is this difficult, or at least different from usual analyses?
     • We have several measurements on each individual
       – the traditional independence assumption is violated
       – repeated observations on the same individual are correlated (look alike)
       – ignoring this correlation may lead to bias, wrong standard errors and therefore potentially misleading conclusions
     • The time course may be quite irregular, with no obvious structure, and must then be treated as a class variable (using many parameters) in ANOVA-type models (variance component models)
     • The time course may vary between individuals: random regression

  7. Notation from multi-level models:

     level   unit                  covariate
     1       single observations   time effects
     2       individuals           treatment effects

     If we fail to take this correlation into account, we will experience:
     • possible bias in the mean value structure
     • low efficiency (type 2 error) for evaluation of level 1 covariates (time-related effects)
     • too small standard errors (type 1 error) for estimates of level 2 effects (treatments)

  8. Possible bias?
     [Figure: individual time courses and the average curve]
     Sometimes referred to as the healthy worker effect.

  9. Missing values
     • MCAR: missing completely at random
     • MAR: missing at random, may depend on past observations
     • NR: informative missing (non-random), depends on the missing value itself

  10. Level 1 covariates (unit: single observations), i.e.
      • Time itself
      • Covariates varying with time: blood pressure, heart rate, age

      If correlation is not taken into account, we ignore the paired situation, leading to low efficiency, i.e. too large P-values (type 2 error).
      Effects may go undetected!

  11. Level 2 covariates (unit: individuals), i.e.
      • Treatment
      • Gender, age

      If correlation is ignored, we act as if we have (a lot) more information than we actually have, leading to too small P-values (type 1 error).
      ’Noise’ may be taken to be real effects!
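
Both effects can be made concrete with a short derivation. This is a sketch under an assumed model that the slides do not state at this point: each observation is $Y_{it} = \mu_t + a_i + \varepsilon_{it}$, with an individual level $a_i$ of variance $\omega_B^2$, independent residuals $\varepsilon_{it}$ of variance $\sigma_W^2$, and $n$ measurements per individual.

```latex
% Level 1 (within-individual) comparison: the individual level a_i cancels.
\mathrm{Var}(Y_{it_1} - Y_{it_2}) = 2\sigma_W^2
\quad\text{whereas ignoring the pairing gives}\quad
2(\omega_B^2 + \sigma_W^2)

% Level 2 (between-individual) comparison: a_i does not average out.
\mathrm{Var}(\bar{Y}_{i\cdot}) = \omega_B^2 + \frac{\sigma_W^2}{n}
\quad\text{whereas assuming independence gives}\quad
\frac{\omega_B^2 + \sigma_W^2}{n}
```

The naive variance is too large for level 1 effects (low efficiency) and too small for level 2 effects (too small P-values).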

  12. Average curves may hide important structures!
      • They give no indication of the variation in the time profiles
      • Comparisons between groups should not be performed for each time point separately
      • Comparisons between time points cannot be judged from the curves (they are paired)

  13. The model must describe the characteristic differences between individuals, and the rest (noise, error) should be of an unsystematic, random nature.
      • Do not average over individual profiles unless these have identical shapes, i.e. only shifts in level are seen between individuals.
      • Alternative: calculate individual characteristics

  14. Individual time profiles (spaghettiogram), divided into groups.
      Do we see time profiles of identical shape? Are the averages representative?

  15. Commonly used characteristics
      • The response for selected times, e.g. endpoint
      • Average over a specific period of time
      • The slope, perhaps for a specific period
      • Peak value
      • Time to peak
      • The area under the curve (AUC)
      • A measure of cyclic behaviour

      These are analysed as new observations.
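
A minimal sketch of how one individual's profile might be reduced to such characteristics. The function name, the time points and the values are all hypothetical; the AUC uses the trapezoidal rule and the slope is an ordinary least-squares slope over the whole period, which is only one of several reasonable choices.

```python
def summary_measures(times, values):
    """Reduce one individual's time profile to a few summary characteristics."""
    peak_value = max(values)
    time_to_peak = times[values.index(peak_value)]  # first time the peak occurs

    # Area under the curve by the trapezoidal rule.
    points = list(zip(times, values))
    auc = sum((t1 - t0) * (v0 + v1) / 2
              for (t0, v0), (t1, v1) in zip(points, points[1:]))

    # Ordinary least-squares slope of value against time.
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in points)
    sxx = sum((t - t_mean) ** 2 for t in times)

    return {
        "endpoint": values[-1],
        "mean": v_mean,
        "slope": sxy / sxx,
        "peak_value": peak_value,
        "time_to_peak": time_to_peak,
        "auc": auc,
    }


# Hypothetical profile for a single individual.
profile = summary_measures([0, 1, 2, 4], [2.0, 6.0, 5.0, 3.0])
for name, value in profile.items():
    print(f"{name:13s} {value}")
```

Each individual then contributes one number per characteristic, and these numbers can be compared between groups with ordinary methods for independent observations.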

  16. Ex: Aspirin
      • time to peak
      • peak value

      Conclusion: P=0.02 for identity of peak values. Quantifications!

  17. Example: 2 groups of dogs (5 and 6 dogs, respectively). Average profiles of osmolality, measured 4 times (including treatments along the way).

  18. Do we have ’identical’ repetitions (except for level)?

  19. Model control
      Residual plot for 2-way ANOVA in (dog, treatment).
      We see a clear trumpet shape, because dogs with a high level also vary more than dogs with a low level: a multiplicative structure.
      Solution: make a logarithmic transformation!
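
Why the logarithm helps can be seen in a small sketch with made-up numbers (the levels and relative deviations below are hypothetical, not the dog data). If each measurement is a level multiplied by a relative deviation, the spread on the raw scale grows with the level, whereas on the log scale the level becomes an additive shift and the spread is the same for everyone.

```python
import math
import statistics


def spread(level, factors):
    """Return (SD on raw scale, SD on log scale) for measurements level * factor."""
    raw = [level * f for f in factors]
    logged = [math.log(y) for y in raw]
    return statistics.stdev(raw), statistics.stdev(logged)


# Hypothetical relative deviations, shared by a low-level and a high-level dog.
factors = [0.8, 1.0, 1.25]
for name, level in (("low level", 100.0), ("high level", 400.0)):
    sd_raw, sd_log = spread(level, factors)
    print(f"{name:10s}  SD raw = {sd_raw:7.2f}   SD log = {sd_log:.4f}")
```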

  20. Profiles on logarithmic scale, with corresponding residual plot:

  21. Multilevel model structure:

      level        1                     2
      unit         single measurements   individuals
      variation    within individuals    between individuals
      variance     $\sigma_W^2$          $\omega_B^2$
      covariates   x: time, grp*time     z: grp

      Multilevel models are part of a broader class of models: variance component models (which are not necessarily hierarchical).

  22. Two-level model:
      • Observations $Y_{gdt}$ (group, dog, time)
      • Random dog level, $\mathrm{Var}(a_{gd}) = \omega_B^2$
      • Residual variation within dogs, $\mathrm{Var}(\varepsilon_{gdt}) = \sigma_W^2$
      • Systematic effect of time and grp

          proc mixed data=dog;
            class grp time_no dog;
            /* fixed effects: group, time and their interaction */
            model losmol=grp time_no grp*time_no / ddfm=satterth;
            /* random level for each dog, nested within group */
            random dog(grp);
          run;

  23. This model assumes the so-called compound symmetry, i.e. that all measurements on the same individual are equally correlated:

      $$\mathrm{Corr}(Y_{gdt_1}, Y_{gdt_2}) = \rho = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2}$$

      This means that the distance in time is not taken into account!
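
The correlation follows in a few steps once the model is written out. The sketch below assumes the two-level model in the form $Y_{gdt} = \mu_{gt} + a_{gd} + \varepsilon_{gdt}$, where $\mu_{gt}$ is the systematic effect of group and time and the random terms $a_{gd}$ and $\varepsilon_{gdt}$ are mutually independent; the symbol $\mu_{gt}$ and the independence assumption are spelled out here for illustration and are not written explicitly on the slides.

```latex
% Two different times t_1 \neq t_2 on the same dog share only the dog level a_{gd}:
\mathrm{Cov}(Y_{gdt_1}, Y_{gdt_2}) = \mathrm{Var}(a_{gd}) = \omega_B^2

% Every single observation carries both variance components:
\mathrm{Var}(Y_{gdt}) = \mathrm{Var}(a_{gd}) + \mathrm{Var}(\varepsilon_{gdt})
                      = \omega_B^2 + \sigma_W^2

% Hence the correlation is the same for every pair of times:
\mathrm{Corr}(Y_{gdt_1}, Y_{gdt_2})
  = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2} = \rho
```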

  24. Two-level model with random dog level:

      Class     Levels   Values
      grp       2        1 2
      time_no   4        1 2 3 4
      dog       11       1 2 3 4 5 6 7 8 9 10 11

      Covariance Parameter Estimates
      Cov Parm   Estimate   Standard Error   Z Value   Pr Z
      dog(grp)   0.06587    0.03532          1.86      0.0311
      Residual   0.03554    0.009672         3.67      0.0001

      Type 3 Tests of Fixed Effects
      Effect        Num DF   Den DF   F Value   Pr > F
      grp           1        9        2.85      0.1257
      time_no       3        27       21.35     <.0001
      grp*time_no   3        27       2.50      0.0805

      P=0.08 for test of interaction, i.e. no convincing indication of this.

  25. Factor diagram:
      [Figure: factor diagram connecting [I] = [Dog*Time], [Dog], Grp*Time, Grp and Time]
      We have used the notation [ ] for the random effects, corresponding to variance components. We may note the following:
      • The effect of Grp*Time is evaluated against Dog*Time
      • If Grp*Time is not considered significant, we thereafter evaluate
        – Time against Dog*Time
        – Grp against Dog(Grp)

  26. The variance component model with random dog level specifies the covariance structure:

      $$
      \begin{pmatrix}
      \omega_B^2 + \sigma_W^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 \\
      \omega_B^2 & \omega_B^2 + \sigma_W^2 & \omega_B^2 & \omega_B^2 \\
      \omega_B^2 & \omega_B^2 & \omega_B^2 + \sigma_W^2 & \omega_B^2 \\
      \omega_B^2 & \omega_B^2 & \omega_B^2 & \omega_B^2 + \sigma_W^2
      \end{pmatrix}
      = (\omega_B^2 + \sigma_W^2)
      \begin{pmatrix}
      1 & \rho & \rho & \rho \\
      \rho & 1 & \rho & \rho \\
      \rho & \rho & 1 & \rho \\
      \rho & \rho & \rho & 1
      \end{pmatrix}
      $$

      called the compound symmetry structure. The correlation $\rho$ is here estimated to

      $$\rho = \mathrm{Corr}(Y_{gdt_1}, Y_{gdt_2}) = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2} \approx \frac{0.06587}{0.06587 + 0.03554} = 0.65$$
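
The same arithmetic as a short sketch, plugging in the two variance component estimates reported for the dogs (0.06587 between dogs, 0.03554 within dogs) and four time points. The function name is hypothetical, and the result is only as precise as these rounded estimates.

```python
def compound_symmetry(omega2_b, sigma2_w, n_times):
    """Build the compound symmetry covariance matrix and its common correlation."""
    # Diagonal: both variance components; off-diagonal: between-individual part only.
    cov = [[omega2_b + (sigma2_w if i == j else 0.0) for j in range(n_times)]
           for i in range(n_times)]
    rho = omega2_b / (omega2_b + sigma2_w)
    return cov, rho


cov, rho = compound_symmetry(omega2_b=0.06587, sigma2_w=0.03554, n_times=4)
for row in cov:
    print("  ".join(f"{value:.5f}" for value in row))
print(f"rho = {rho:.2f}")
```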

  27. Note that the specification ’random dog(grp);’ can be written in two other ways:

          random intercept / subject=dog(grp);

          repeated time / type=CS subject=dog(grp);

      In the following, we shall see generalisations of the constructions above.

  28. Compound symmetry analysis

          proc mixed data=dog;
            class grp time dog;
            model losmol=grp time grp*time / ddfm=satterth;
            repeated time / type=cs subject=dog(grp) rcorr;
          run;

      Covariance Parameter Estimates
      Cov Parm   Subject    Estimate
      CS         dog(grp)   0.06587
      Residual              0.03554

      Fit Statistics
      -2 Res Log Likelihood      14.8
      AIC (smaller is better)    18.8

      Estimated R Correlation Matrix for dog(grp) 1 1
      Row   Col1     Col2     Col3     Col4
      1     1.0000   0.6496   0.6496   0.6496
      2     0.6496   1.0000   0.6496   0.6496
      3     0.6496   0.6496   1.0000   0.6496
      4     0.6496   0.6496   0.6496   1.0000

      Type 3 Tests of Fixed Effects
      Effect        Num DF   Den DF   F Value   Pr > F
      grp           1        9        2.85      0.1257
      time_no       3        27       21.35     <.0001
      grp*time_no   3        27       2.50      0.0805

  29. The option ddfm=satterth (or ddfm=kenwardroger):
      • When the distributions are exact, it has no effect
        – in balanced situations
      • When approximations are necessary, these are considered best
        – in unbalanced situations, i.e. for almost all observational designs
        – in case of missing observations
      • It may give rise to fractional degrees of freedom
      • The computations may require a little more time, but in most cases this will not be noticeable
      • When in doubt, use it!
