
Analysis of variance and regression, December 4, 2007



  1. Analysis of variance and regression December 4, 2007

  2. Variance component models
     • Variance components
     • One-way anova with random variation
       – estimation
       – interpretations
     • Two-way anova with random variation
     • Crossed random effects
     • Ecological analyses

  3. Lene Theil Skovgaard, Dept. of Biostatistics, Institute of Public Health, University of Copenhagen e-mail: L.T.Skovgaard@biostat.ku.dk http://staff.pubhealth.ku.dk/~lts/regression07_2

  4. Variance Component Models, December 2007
     Terminology for correlated measurements:
     • Multivariate outcome: several outcomes (responses) for each individual, e.g. a number of hormone measurements that we want to study simultaneously.
     • Cluster design: the same outcome (response) measured on all individuals in a number of families, villages or school classes.
     • Repeated measurements: the same outcome (response) measured in different situations (or at different spots) for the same individual.
     • Longitudinal measurements: the same outcome (response) measured consecutively over time for each individual.

  5. Variance component models
     Generalisations of ANOVA-type models or regression models, involving several sources of random variation (variance components):
     • environmental variation: between regions, hospitals or countries
     • biological variation: between individuals, families or animals
     • within-individual variation: between arms, teeth, injection sites, days
     • variation due to uncontrollable circumstances: time of day, temperature, observer
     • measurement error

  6. Typical studies involve data from:
     • a number of family members from a sample of households
     • pupils from a sample of school classes
     • measurements on several spots of each individual
     Alternative name (for some of them): multilevel models
     • variation on each level (variance component)
     • possibly systematic effects (covariates) on each level

  7. Examples of hierarchies (individual → context/cluster):
     level 1    →  level 2     →  level 3
     subjects   →  twin pairs  →  countries
     subjects   →  families    →  regions
     students   →  classes     →  schools
     visits     →  subjects    →  centres

  8. Merits
     • Certain effects may be estimated more precisely, since some sources of variation are eliminated, e.g. by making comparisons within a family. This is analogous to the paired comparison situation.
     • When planning subsequent investigations, knowledge of the relative sizes of the variance components helps in deciding the number of repetitions needed at each level (if possible).

  9. Drawbacks
     • When making inference (estimation and testing), it is important to take all sources of variation into account, and effects have to be evaluated using the relevant variation!
     • Bias may result if one or more sources of variation are disregarded.

  10. Measurements 'belonging together' in the same cluster look alike (are correlated). If we fail to take this correlation into account, we will experience:
      • possible bias in the mean value structure
      • low efficiency (type 2 error) for evaluation of level 1 covariates (within-cluster effects)
      • too small standard errors (type 1 error) for estimates of level 2 effects (between-cluster effects)

  11. Concepts of the day:
      • advantage/necessity of random effects
      • generalisations of ANOVA-type models
      Examples with small data sets:
      • some of them too small to allow for trustworthy interpretations
      • illustrative precisely because of their limited size
      Illustrated with SAS PROC MIXED.

  12. One-way analysis of variance with random variation
      Comparison of k 'groups/clusters' satisfying:
      • The groups are not of individual interest, and it is of no interest to test whether they have identical means.
      • The groups may be thought of as representatives from a population that we want to describe.
      Example: 10 consecutive measurements of blood pressure on a sample of 50 women:
      • We 'know' that the women differ, and we do not care!
      • We only want to learn something about blood pressure in the female population in general.

  13. Example of one-way anova structure: 6 rabbits are vaccinated, each in 6 spots on the back.
      Response $Y$: swelling in cm².
      Model: swelling = 'grand mean' + 'rabbit deviation' + 'variation',
      $y_{rs} = \mu + \alpha_r + \varepsilon_{rs}, \qquad \varepsilon_{rs} \sim N(0, \sigma^2)$,
      where $r = 1, \dots, R = 6$ denotes the rabbit and $s = 1, \dots, S = 6$ denotes the spot.
      The variation can be regarded either as 'within-rabbit variation' or 'measurement error' (probably a combination of the two).

  14. Rabbit means: $\mu_r = \mu + \alpha_r$

  15. Anova table:
                 SS        df                MS = SS/df   F
      Between    12.8333   R − 1 = 5         2.5667       4.39
      Within     17.5266   R(S − 1) = 30     0.5842
      Total      30.3599   RS − 1 = 35       0.8674
      Test for identical rabbit means: $F = 4.39 \sim F(5, 30)$, $P = 0.004$.
      But: we are not interested in these particular 6 rabbits, only in rabbits in general, as a species! We assume these 6 rabbits to have been randomly selected from the species.
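
The anova table can be recomputed from the raw swelling measurements listed on the SAS data slide. The following is a minimal Python sketch (the slides themselves use SAS); the function name `one_way_anova` and the dictionary keys are illustrative choices, not part of the original material.

```python
# Swelling (cm^2): one row per spot (a-f), one column per rabbit (1-6),
# copied from the data listing on the SAS slide.
SWELLING = [
    [7.9, 8.7, 7.4, 7.4, 7.1, 8.2],
    [6.1, 8.2, 7.7, 7.1, 8.1, 5.9],
    [7.5, 8.1, 6.0, 6.4, 6.2, 7.5],
    [6.9, 8.5, 6.8, 7.7, 8.5, 8.5],
    [6.7, 9.9, 7.3, 6.4, 6.4, 7.3],
    [7.3, 8.3, 7.3, 5.8, 6.4, 7.7],
]


def one_way_anova(groups):
    """Balanced one-way anova; `groups` is a list of equally long lists."""
    n_groups = len(groups)
    n_per_group = len(groups[0])
    values = [y for group in groups for y in group]
    grand_mean = sum(values) / len(values)
    group_means = [sum(group) / n_per_group for group in groups]
    ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum(
        (y - m) ** 2 for group, m in zip(groups, group_means) for y in group
    )
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_groups * (n_per_group - 1))
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# Transpose so that each inner list holds the six spots of one rabbit.
rabbits = [list(column) for column in zip(*SWELLING)]
table = one_way_anova(rabbits)
for key, value in table.items():
    print(f"{key:>10}: {value:.4f}")
```

The P-value is left out because it needs the F distribution, which is not in the Python standard library.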

  16. We choose to model rabbit variation instead of rabbit levels:
      swelling = 'grand mean' + 'between-rabbit variation' + 'within-rabbit variation',
      $y_{rs} = \mu + a_r + \varepsilon_{rs}$,
      where the $a_r$'s and the $\varepsilon_{rs}$'s are assumed independent and normally distributed with
      $\operatorname{Var}(a_r) = \omega_B^2$ and $\operatorname{Var}(\varepsilon_{rs}) = \sigma_W^2$.
      The variation between rabbits has been made random. $\omega_B^2$ and $\sigma_W^2$ are variance components, and the model is also called a two-level model.

  17. Fixed vs. random effects?
      • Fixed:
        – all values of the factor are present (typically only a few, e.g. treatment)
        – allows inference for these particular factor values only
        – must include a reasonable number of observations for each factor value
      • Random:
        – a representative sample of values of the factor is present
        – allows inference to be extended beyond the values in the experiment, to the population of possible factor values (e.g. geographical areas, classes, rabbits)
        – is necessary when we have a covariate for this level

  18. Interpretation: all observations have common mean and variance,
      $y_{rs} \sim N(\mu, \omega_B^2 + \sigma_W^2)$,
      but measurements made on the same rabbit are correlated, with the intra-class correlation
      $\operatorname{Corr}(y_{r1}, y_{r2}) = \rho = \frac{\omega_B^2}{\omega_B^2 + \sigma_W^2}$.
      Measurements made on the same rabbit tend to look more alike than measurements made on different rabbits. All measurements on the same rabbit look equally much alike. This correlation structure is called compound symmetry (CS) or exchangeability.
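
The intra-class correlation can be checked by simulation. The sketch below, in Python rather than the SAS used on the slides, draws two measurements per rabbit from the random-effects model and compares their empirical correlation with $\omega_B^2 / (\omega_B^2 + \sigma_W^2)$. The variance values, the number of simulated rabbits, the seed and the helper names `simulate_pairs` and `pearson` are all assumptions made for illustration.

```python
import random


def simulate_pairs(n_rabbits, omega2_b, sigma2_w, mu=0.0, seed=2007):
    """Two measurements per rabbit from y_rs = mu + a_r + e_rs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_rabbits):
        a_r = rng.gauss(0.0, omega2_b ** 0.5)  # rabbit effect, shared by both spots
        y1 = mu + a_r + rng.gauss(0.0, sigma2_w ** 0.5)
        y2 = mu + a_r + rng.gauss(0.0, sigma2_w ** 0.5)
        pairs.append((y1, y2))
    return pairs


def pearson(pairs):
    """Ordinary sample correlation between the two coordinates."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5


OMEGA2_B, SIGMA2_W = 0.33, 0.58  # illustrative values, not estimates
rho_theory = OMEGA2_B / (OMEGA2_B + SIGMA2_W)
rho_simulated = pearson(simulate_pairs(5000, OMEGA2_B, SIGMA2_W))
print(f"theoretical ICC: {rho_theory:.3f}, simulated correlation: {rho_simulated:.3f}")
```

Because the rabbit effect `a_r` is the only thing the two measurements share, any pair of spots on the same rabbit gives the same correlation, which is what compound symmetry means.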

  19. Estimation of variance components
      The first step is to determine the expected values of the mean squares (in balanced situations, with $S$ measurements per rabbit; here $S = R = 6$):
      $\mathrm{E}(\mathrm{MS}_B) = S\,\omega_B^2 + \sigma_W^2, \qquad \mathrm{E}(\mathrm{MS}_W) = \sigma_W^2$,
      and from this we get the estimates
      $\tilde\sigma_W^2 = \mathrm{MS}_W, \qquad \tilde\omega_B^2 = \frac{\mathrm{MS}_B - \mathrm{MS}_W}{S}$.
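
A short sketch of where the expected mean squares come from, assuming the balanced layout with $S$ spots on each of $R$ rabbits. The notation $\bar y_{r\cdot}$ (mean of rabbit $r$) and $\bar y_{\cdot\cdot}$ (grand mean) is introduced here and is not on the slide. The factor in front of $\omega_B^2$ is the number of measurements per rabbit, which equals 6 in the rabbit example since $R = S = 6$.

```latex
\begin{align*}
\bar y_{r\cdot} &= \mu + a_r + \bar\varepsilon_{r\cdot},
  & \operatorname{Var}(\bar y_{r\cdot}) &= \omega_B^2 + \frac{\sigma_W^2}{S}, \\
\mathrm{MS}_B &= \frac{S}{R-1}\sum_{r=1}^{R}\bigl(\bar y_{r\cdot} - \bar y_{\cdot\cdot}\bigr)^2,
  & \mathrm{E}(\mathrm{MS}_B) &= S\,\operatorname{Var}(\bar y_{r\cdot}) = S\,\omega_B^2 + \sigma_W^2, \\
\mathrm{MS}_W &= \frac{1}{R(S-1)}\sum_{r=1}^{R}\sum_{s=1}^{S}\bigl(y_{rs} - \bar y_{r\cdot}\bigr)^2,
  & \mathrm{E}(\mathrm{MS}_W) &= \sigma_W^2 .
\end{align*}
```

The second line uses that the rabbit means are independent with a common variance, so their sample variance is unbiased for $\operatorname{Var}(\bar y_{r\cdot})$.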

  20. Note: it may happen that $\tilde\omega_B^2$ becomes negative!
      • by coincidence
      • as a result of competition between units belonging together, e.g. when measuring yield for plants grown in the same pot
      In this case, it will be reported as zero.
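
The moment estimates, including the truncation at zero, fit in a few lines. This is a Python sketch with an illustrative function name, `variance_components`. The first call uses the mean squares from the rabbit anova table; the second uses made-up mean squares chosen only to trigger a negative raw estimate.

```python
def variance_components(ms_between, ms_within, n_per_group):
    """Moment estimates (between, within) for a balanced one-way layout.

    A negative between-group estimate is reported as zero.
    """
    raw_between = (ms_between - ms_within) / n_per_group
    return max(raw_between, 0.0), ms_within


# Mean squares from the rabbit anova table, 6 spots per rabbit.
between, within = variance_components(2.5667, 0.5842, 6)
print(f"between: {between:.4f}, within: {within:.4f}")

# Hypothetical mean squares with MS_B < MS_W: the raw estimate is negative.
between_neg, _ = variance_components(0.40, 0.60, 6)
print(f"truncated between-group estimate: {between_neg:.4f}")
```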

  21. Reading in data in SAS:
          /* Wide layout: one row per spot, one column per rabbit */
          data rabbit_orig;
            input spot $ y1-y6;
            cards;
          a 7.9 8.7 7.4 7.4 7.1 8.2
          b 6.1 8.2 7.7 7.1 8.1 5.9
          c 7.5 8.1 6.0 6.4 6.2 7.5
          d 6.9 8.5 6.8 7.7 8.5 8.5
          e 6.7 9.9 7.3 6.4 6.4 7.3
          f 7.3 8.3 7.3 5.8 6.4 7.7
          ;
          run;

          /* Long layout: one record per rabbit and spot */
          data rabbit;
            set rabbit_orig;
            rabbit=1; swelling=y1; output;
            rabbit=2; swelling=y2; output;
            rabbit=3; swelling=y3; output;
            rabbit=4; swelling=y4; output;
            rabbit=5; swelling=y5; output;
            rabbit=6; swelling=y6; output;
          run;
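
The two data steps first read a wide table (one row per spot, one column per rabbit) and then stack it into one record per measurement. For readers without SAS, the same reshaping might look as follows in Python; the names `WIDE`, `SPOTS` and `long_format` are illustrative.

```python
SPOTS = "abcdef"
WIDE = [
    [7.9, 8.7, 7.4, 7.4, 7.1, 8.2],
    [6.1, 8.2, 7.7, 7.1, 8.1, 5.9],
    [7.5, 8.1, 6.0, 6.4, 6.2, 7.5],
    [6.9, 8.5, 6.8, 7.7, 8.5, 8.5],
    [6.7, 9.9, 7.3, 6.4, 6.4, 7.3],
    [7.3, 8.3, 7.3, 5.8, 6.4, 7.7],
]

# One record per (spot, rabbit), in the order the six `output`
# statements would write them for each input row.
long_format = [
    {"spot": spot, "rabbit": rabbit, "swelling": value}
    for spot, row in zip(SPOTS, WIDE)
    for rabbit, value in enumerate(row, start=1)
]

print(len(long_format))
print(long_format[0])
```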

  22. In SAS, the estimation can be performed as:
          proc mixed data=rabbit;
            class rabbit;
            model swelling = / s;
            random rabbit;
          run;
      Output:
          Covariance Parameter Estimates
          Cov Parm     Estimate
          rabbit       0.3304
          Residual     0.5842

          Solution for Fixed Effects
                                   Standard
          Effect       Estimate    Error      DF    t Value    Pr > |t|
          Intercept    7.3667      0.2670     5     27.59      <.0001
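
The standard error of the intercept follows from the two variance components: in the balanced design the grand mean has variance $(S\,\omega_B^2 + \sigma_W^2)/(RS)$. The Python sketch below recomputes it from the printed estimates. For comparison it also shows the smaller value that would result if all 36 measurements were treated as independent with variance $\omega_B^2 + \sigma_W^2$. That comparison is an added illustration of the "too small standard errors" point made earlier and does not appear in the SAS output.

```python
from math import sqrt

R, S = 6, 6                          # rabbits, spots per rabbit
OMEGA2_B, SIGMA2_W = 0.3304, 0.5842  # estimates from the PROC MIXED output
GRAND_MEAN = 7.3667                  # intercept from the PROC MIXED output

# Standard error of the grand mean under the random-effects model.
se_model = sqrt((S * OMEGA2_B + SIGMA2_W) / (R * S))
# Standard error if the correlation within rabbits were ignored.
se_naive = sqrt((OMEGA2_B + SIGMA2_W) / (R * S))
t_value = GRAND_MEAN / se_model

print(f"model-based SE: {se_model:.4f}")
print(f"naive SE:       {se_naive:.4f}")
print(f"t value:        {t_value:.2f}")
```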

  23. Interpretation of variance components:
      Variation   Variance component           Estimate   Proportion of variation
      Between     $\omega_B^2$                 0.3304     36%
      Within      $\sigma_W^2$                 0.5842     64%
      Total       $\omega_B^2 + \sigma_W^2$    0.9146     100%
      Typical differences (95% prediction intervals):
      • for spots on the same rabbit: $\pm 2 \times \sqrt{2 \times 0.5842} = \pm 2.16$ cm²
      • for spots on different rabbits: $\pm 2 \times \sqrt{2 \times 0.9146} = \pm 2.70$ cm²
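
The proportions and the two prediction intervals can be recomputed from the estimated variance components. The Python sketch below uses variable names chosen for illustration. The factor 2 under the square root reflects that a difference between two measurements has twice the variance of a single one.

```python
from math import sqrt

OMEGA2_B, SIGMA2_W = 0.3304, 0.5842  # estimates from the PROC MIXED output

total = OMEGA2_B + SIGMA2_W
icc = OMEGA2_B / total  # proportion of variation between rabbits

# Two spots on the same rabbit share the rabbit effect, which cancels
# in the difference: variance 2 * sigma_W^2.
half_width_same = 2 * sqrt(2 * SIGMA2_W)
# Two spots on different rabbits share nothing:
# variance 2 * (omega_B^2 + sigma_W^2).
half_width_diff = 2 * sqrt(2 * total)

print(f"between: {icc:.0%}, within: {1 - icc:.0%}")
print(f"same rabbit:       +/- {half_width_same:.2f} cm^2")
print(f"different rabbits: +/- {half_width_diff:.2f} cm^2")
```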

  24. Interpretation of the size of the variance components: approx. 2/3 of the variation in the measurements comes from the variation within rabbits. Maybe there is a systematic difference between the injection spots?
      Two-way anova:
      Source    DF   Type III SS   Mean Square   F Value   Pr > F
      rabbit    5    12.833333     2.566667      4.69      0.0037
      spot      5    3.833333      0.766667      1.40      0.2584
      It does not look as if there is any systematic difference (P = 0.26).
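
The two-way sums of squares and F values can be recomputed from the data on the SAS data slide. The Python sketch below assumes an additive model with rabbit and spot as factors and one observation per cell, so the residual has (6 − 1)(6 − 1) = 25 degrees of freedom. P-values are omitted because the F distribution is not in the Python standard library.

```python
# Swelling (cm^2): one row per spot (a-f), one column per rabbit (1-6).
SWELLING = [
    [7.9, 8.7, 7.4, 7.4, 7.1, 8.2],
    [6.1, 8.2, 7.7, 7.1, 8.1, 5.9],
    [7.5, 8.1, 6.0, 6.4, 6.2, 7.5],
    [6.9, 8.5, 6.8, 7.7, 8.5, 8.5],
    [6.7, 9.9, 7.3, 6.4, 6.4, 7.3],
    [7.3, 8.3, 7.3, 5.8, 6.4, 7.7],
]

n_spots = len(SWELLING)
n_rabbits = len(SWELLING[0])
grand_mean = sum(map(sum, SWELLING)) / (n_spots * n_rabbits)
spot_means = [sum(row) / n_rabbits for row in SWELLING]
rabbit_means = [sum(column) / n_spots for column in zip(*SWELLING)]

ss_total = sum((y - grand_mean) ** 2 for row in SWELLING for y in row)
ss_rabbit = n_spots * sum((m - grand_mean) ** 2 for m in rabbit_means)
ss_spot = n_rabbits * sum((m - grand_mean) ** 2 for m in spot_means)
ss_residual = ss_total - ss_rabbit - ss_spot  # what neither factor explains

df_rabbit, df_spot = n_rabbits - 1, n_spots - 1
ms_residual = ss_residual / (df_rabbit * df_spot)
f_rabbit = (ss_rabbit / df_rabbit) / ms_residual
f_spot = (ss_spot / df_spot) / ms_residual

print(f"rabbit: SS = {ss_rabbit:.4f}, F = {f_rabbit:.2f}")
print(f"spot:   SS = {ss_spot:.4f}, F = {f_spot:.2f}")
```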
