Analysis of variance and regression December 4, 2007
Variance component models
• Variance components
• One-way anova with random variation
  – estimation
  – interpretations
• Two-way anova with random variation
• Crossed random effects
• Ecological analyses
Lene Theil Skovgaard, Dept. of Biostatistics, Institute of Public Health, University of Copenhagen e-mail: L.T.Skovgaard@biostat.ku.dk http://staff.pubhealth.ku.dk/~lts/regression07_2
Variance Component Models, December 2007

Terminology for correlated measurements:
• Multivariate outcome: Several outcomes (responses) for each individual, e.g. a number of hormone measurements that we want to study simultaneously.
• Cluster design: Same outcome (response) measured on all individuals in a number of families/villages/school classes.
• Repeated measurements: Same outcome (response) measured in different situations (or at different spots) for the same individual.
• Longitudinal measurements: Same outcome (response) measured consecutively over time for each individual.
Variance component models

Generalisations of ANOVA-type models or regression models, involving several sources of random variation (variance components):
• environmental variation
  – between regions, hospitals or countries
• biological variation
  – variation between individuals, families or animals
• within-individual variation
  – variation between arms, teeth, injection sites, days
• variation due to uncontrollable circumstances
  – time of day, temperature, observer
• measurement error
Typical studies involve data from:
• a number of family members from a sample of households
• pupils from a sample of school classes
• measurements on several spots of each individual

Alternative name (for some of them): Multilevel models
• variation on each level (variance component)
• possibly systematic effects (covariates) on each level
Examples of hierarchies:

    individual → context/cluster

    level 1      level 2        level 3
    subjects   → twin pairs   → countries
    subjects   → families     → regions
    students   → classes      → schools
    visits     → subjects     → centres
Merits
• Certain effects may be estimated more precisely, since some sources of variation are eliminated, e.g. by making comparisons within a family. This is analogous to the paired comparison situation.
• When planning subsequent investigations, knowledge of the relative sizes of the variance components will help in deciding the number of repetitions needed at each level (if possible).
Drawbacks
• When making inference (estimation and testing), it is important to take all sources of variation into account, and effects have to be evaluated using the relevant variation!
• Bias may result if one or more sources of variation are disregarded.
Measurements 'belonging together' in the same cluster look alike (are correlated).

If we fail to take this correlation into account, we will experience:
• possible bias in the mean value structure
• low efficiency (type 2 error) for evaluation of level 1 covariates (within-cluster effects)
• too small standard errors (type 1 error) for estimates of level 2 effects (between-cluster effects)
Concepts of the day:
• advantage/necessity of random effects
• generalisations of ANOVA-type models

Examples with small data sets
• some of them too small to allow for trustworthy interpretations
• illustrative precisely because of their limited size

Illustrated with SAS PROC MIXED
One-way analysis of variance – with random variation:

Comparison of k 'groups/clusters', satisfying
• The groups are not of individual interest, and it is of no interest to test whether they have identical means.
• The groups may be thought of as representatives from a population that we want to describe.

Example: 10 consecutive measurements of blood pressure on a sample of 50 women:
• We 'know' that the women differ – and we do not care!
• We only want to learn something about blood pressure in the female population in general.
Example of one-way anova structure:

6 rabbits are vaccinated, each in 6 spots on the back.
Response Y: swelling in cm²

Model: swelling = 'grand mean' + 'rabbit deviation' + 'variation'

$y_{rs} = \mu + \alpha_r + \varepsilon_{rs}, \quad \varepsilon_{rs} \sim N(0, \sigma^2)$,

where $r = 1, \dots, R = 6$ denotes the rabbit and $s = 1, \dots, S = 6$ denotes the spot.

The variation can be regarded either as 'within-rabbit variation' or 'measurement error' (probably a combination of the two).
Rabbit means: $\mu_r = \mu + \alpha_r$
anova table:

               SS        df                MS = SS/df   F
    Between    12.8333   R − 1 = 5         2.5667       4.39
    Within     17.5266   R(S − 1) = 30     0.5842
    Total      30.3599   RS − 1 = 35       0.8674

Test for identical rabbit means: F = 4.39 ∼ F(5, 30), P = 0.004.

But: We are not interested in these particular 6 rabbits, only in rabbits in general, as a species! We assume these 6 rabbits to have been randomly selected from the species.
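The entries of the anova table can be reproduced from the swelling measurements of the rabbit example. The following is a minimal Python sketch, assuming the 6 × 6 data layout of the SAS input (rows are spots a to f, columns are rabbits 1 to 6). The function name `one_way_anova` and the dictionary keys are illustrative choices, not part of the lecture.

```python
# Swelling data (cm^2) for the rabbit example.
# Assumed layout: one row per spot (a-f), one column per rabbit (1-6).
DATA = [
    [7.9, 8.7, 7.4, 7.4, 7.1, 8.2],
    [6.1, 8.2, 7.7, 7.1, 8.1, 5.9],
    [7.5, 8.1, 6.0, 6.4, 6.2, 7.5],
    [6.9, 8.5, 6.8, 7.7, 8.5, 8.5],
    [6.7, 9.9, 7.3, 6.4, 6.4, 7.3],
    [7.3, 8.3, 7.3, 5.8, 6.4, 7.7],
]


def one_way_anova(rows):
    """Balanced one-way anova, treating each column of `rows` as one group."""
    S = len(rows)       # measurements per group (spots per rabbit)
    R = len(rows[0])    # number of groups (rabbits)
    groups = [[row[r] for row in rows] for r in range(R)]
    grand_mean = sum(map(sum, groups)) / (R * S)
    group_means = [sum(g) / S for g in groups]
    ss_between = S * sum((m - grand_mean) ** 2 for m in group_means)
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    ms_between = ss_between / (R - 1)
    ms_within = ss_within / (R * (S - 1))
    return {
        "ss_between": ss_between, "df_between": R - 1, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": R * (S - 1), "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


table = one_way_anova(DATA)
for key, value in table.items():
    print(f"{key:12s} {value:10.4f}")
```

The P-value is left out on purpose, since the F distribution is not available in the Python standard library.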
We choose to model rabbit variation instead of rabbit levels:

swelling = 'grand mean' + 'between-rabbit variation' + 'within-rabbit variation'

$y_{rs} = \mu + a_r + \varepsilon_{rs}$,

where the $a_r$'s and the $\varepsilon_{rs}$'s are assumed independent, normally distributed with

$\mathrm{Var}(a_r) = \omega_B^2, \quad \mathrm{Var}(\varepsilon_{rs}) = \sigma_W^2$.

The variation between rabbits has been made random.

$\omega_B^2$ and $\sigma_W^2$ are variance components, and the model is also called a two-level model.
Fixed vs. random effects?
• Fixed:
  – all values of the factor present (typically only a few, e.g. treatment)
  – allows inference for these particular factor values only
  – must include a reasonable number of observations for each factor value
• Random:
  – a representative sample of values of the factor is present
  – allows inference to be extended beyond the values in the experiment and to the population of possible factor values (e.g. geographical areas, classes, rabbits)
  – is necessary when we have a covariate for this level
Interpretation:

All observations have common mean and variance:

$y_{rs} \sim N(\mu, \omega_B^2 + \sigma_W^2)$

but: Measurements made on the same rabbit are correlated, with the intra-class correlation

$\mathrm{Corr}(y_{r1}, y_{r2}) = \rho = \dfrac{\omega_B^2}{\omega_B^2 + \sigma_W^2}$

Measurements made on the same rabbit tend to look more alike than measurements made on different rabbits. All measurements on the same rabbit look equally much alike. This correlation structure is called compound symmetry (CS) or exchangeability.
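To make the compound symmetry structure concrete, the sketch below builds the covariance matrix for the measurements on a single rabbit and computes the intra-class correlation. It is a minimal Python illustration that plugs in the estimates 0.3304 and 0.5842 reported for the rabbit data; the function names are hypothetical and chosen only for this example.

```python
def intraclass_correlation(omega2_b, sigma2_w):
    """Correlation between two measurements in the same cluster."""
    return omega2_b / (omega2_b + sigma2_w)


def compound_symmetry(n, omega2_b, sigma2_w):
    """n x n covariance matrix for n measurements in one cluster:
    total variance on the diagonal, between-cluster variance elsewhere."""
    total = omega2_b + sigma2_w
    return [[total if i == j else omega2_b for j in range(n)] for i in range(n)]


# Estimates reported for the rabbit data (between and within rabbits).
OMEGA2_B, SIGMA2_W = 0.3304, 0.5842

rho = intraclass_correlation(OMEGA2_B, SIGMA2_W)
cov = compound_symmetry(6, OMEGA2_B, SIGMA2_W)

print(f"intra-class correlation: {rho:.3f}")
for row in cov:
    print("  ".join(f"{v:.4f}" for v in row))
```

Every off-diagonal entry is the same, which is exactly the statement that all measurements on the same rabbit look equally much alike.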
Estimation of variance components

First step is to determine the expected values of the mean squares (in balanced situations):

$E(MS_B) = S\,\omega_B^2 + \sigma_W^2$
$E(MS_W) = \sigma_W^2$

where S is the number of measurements per rabbit (here S = R = 6, so the two coincide numerically). From this we get the estimates

$\tilde{\sigma}_W^2 = MS_W$
$\tilde{\omega}_B^2 = \dfrac{MS_B - MS_W}{S}$
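As a small worked example, the moment estimates can be computed from the mean squares of the rabbit anova table (2.5667 between, 0.5842 within). This is a minimal Python sketch; the function name `moment_estimates` is hypothetical, and the divisor is taken to be the number of measurements per rabbit, which is 6 in the rabbit example.

```python
def moment_estimates(ms_between, ms_within, n_per_group):
    """Moment estimates of the variance components in a balanced one-way layout.

    Returns (between-group component, within-group component)."""
    sigma2_w = ms_within
    omega2_b = (ms_between - ms_within) / n_per_group
    return omega2_b, sigma2_w


# Mean squares from the rabbit anova table, 6 measurements per rabbit.
omega2_b, sigma2_w = moment_estimates(2.5667, 0.5842, 6)
print(f"between rabbits: {omega2_b:.4f}")
print(f"within rabbits:  {sigma2_w:.4f}")
```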
Note: It may happen that $\tilde{\omega}_B^2$ becomes negative!
• by coincidence
• as a result of competition between units belonging together, e.g. when measuring yield for plants grown in the same pot

In this case, it will be reported as zero.
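How often a negative estimate arises by coincidence can be explored with a small simulation. The sketch below is only an illustration under assumed values: it simulates balanced data sets with 6 clusters of 6 measurements, a true between-cluster variance of zero (a hypothetical choice) and a within-cluster variance of 0.5842, and then counts negative moment estimates. The function name, the seed and the number of simulations are arbitrary. The competition mechanism mentioned above is not simulated.

```python
import random


def simulate_estimate(rng, R, S, omega2_b, sigma2_w):
    """Simulate one balanced data set and return the raw moment
    estimate of the between-cluster variance component."""
    groups = []
    for _ in range(R):
        a = rng.gauss(0.0, omega2_b ** 0.5)   # random cluster effect
        groups.append([a + rng.gauss(0.0, sigma2_w ** 0.5) for _ in range(S)])
    grand = sum(map(sum, groups)) / (R * S)
    means = [sum(g) / S for g in groups]
    ms_b = S * sum((m - grand) ** 2 for m in means) / (R - 1)
    ms_w = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (R * (S - 1))
    return (ms_b - ms_w) / S


rng = random.Random(2007)  # fixed seed so the run is reproducible
# Assumed true values: no between-cluster variation at all.
raw = [simulate_estimate(rng, 6, 6, 0.0, 0.5842) for _ in range(2000)]

share_negative = sum(e < 0 for e in raw) / len(raw)
reported = [max(0.0, e) for e in raw]   # negative estimates reported as zero

print(f"share of negative raw estimates: {share_negative:.2f}")
print(f"smallest reported estimate:      {min(reported):.4f}")
```

Under these assumptions a negative raw estimate simply means that the between mean square happened to fall below the within mean square.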
Reading in data in SAS:

    /* one row per spot, one column (y1-y6) per rabbit */
    data rabbit_orig;
    input spot $ y1-y6;
    cards;
    a 7.9 8.7 7.4 7.4 7.1 8.2
    b 6.1 8.2 7.7 7.1 8.1 5.9
    c 7.5 8.1 6.0 6.4 6.2 7.5
    d 6.9 8.5 6.8 7.7 8.5 8.5
    e 6.7 9.9 7.3 6.4 6.4 7.3
    f 7.3 8.3 7.3 5.8 6.4 7.7
    ;
    run;

    /* reshape to one observation per rabbit and spot */
    data rabbit;
    set rabbit_orig;
    rabbit=1; swelling=y1; output;
    rabbit=2; swelling=y2; output;
    rabbit=3; swelling=y3; output;
    rabbit=4; swelling=y4; output;
    rabbit=5; swelling=y5; output;
    rabbit=6; swelling=y6; output;
    run;
In SAS, the estimation can be performed as:

    proc mixed data=rabbit;
      class rabbit;
      model swelling = / s;
      random rabbit;
    run;

Covariance Parameter Estimates

    Cov Parm    Estimate
    rabbit      0.3304
    Residual    0.5842

Solution for Fixed Effects

                            Standard
    Effect      Estimate    Error      DF    t Value    Pr > |t|
    Intercept   7.3667      0.2670     5     27.59      <.0001
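The standard error of the intercept can be checked by hand. The sketch below assumes that, in this balanced design, the variance of the grand mean is $\omega_B^2/R + \sigma_W^2/(RS)$, which reproduces the reported 0.2670. For comparison it also computes the standard error obtained by treating all 36 measurements as independent; that naive calculation is added here only as an illustration of how ignoring the clustering gives a standard error that is too small.

```python
import math

R, S = 6, 6                           # rabbits, spots per rabbit
OMEGA2_B, SIGMA2_W = 0.3304, 0.5842   # estimates from the PROC MIXED output

# Variance of the grand mean under the random-effects model (balanced case).
se_mixed = math.sqrt(OMEGA2_B / R + SIGMA2_W / (R * S))

# Naive version: pretend all R*S measurements are independent.
se_naive = math.sqrt((OMEGA2_B + SIGMA2_W) / (R * S))

print(f"standard error with rabbit as random effect: {se_mixed:.4f}")
print(f"standard error ignoring the clustering:      {se_naive:.4f}")
```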
Interpretation of variance components:

                                                              Proportion of
    Variation   Variance component            Estimate        variation
    Between     $\omega_B^2$                  0.3304          36%
    Within      $\sigma_W^2$                  0.5842          64%
    Total       $\omega_B^2 + \sigma_W^2$     0.9146          100%

Typical differences (95% prediction intervals):
• for spots on the same rabbit: $\pm 2 \times \sqrt{2 \times 0.5842} = \pm 2.16$ cm²
• for spots on different rabbits: $\pm 2 \times \sqrt{2 \times 0.9146} = \pm 2.70$ cm²
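The proportions and the typical differences follow directly from the two estimates. The sketch below redoes the arithmetic in Python; the helper name `typical_difference` is hypothetical. The factor 2 under the square root comes from the difference of two measurements: for two spots on the same rabbit the rabbit effect cancels, leaving variance $2\sigma_W^2$, while for spots on different rabbits the variance is $2(\omega_B^2 + \sigma_W^2)$.

```python
import math

OMEGA2_B, SIGMA2_W = 0.3304, 0.5842   # estimated variance components
TOTAL = OMEGA2_B + SIGMA2_W


def typical_difference(variance, multiplier=2.0):
    """Half-width of an approximate 95% prediction interval for the
    difference between two measurements that each have this variance
    and share no random effects."""
    return multiplier * math.sqrt(2.0 * variance)


share_between = 100 * OMEGA2_B / TOTAL
share_within = 100 * SIGMA2_W / TOTAL
same_rabbit = typical_difference(SIGMA2_W)   # rabbit effect cancels out
different_rabbits = typical_difference(TOTAL)

print(f"between: {share_between:.0f}%, within: {share_within:.0f}%")
print(f"same rabbit:       +/- {same_rabbit:.2f} cm^2")
print(f"different rabbits: +/- {different_rabbits:.2f} cm^2")
```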
Interpretation of the size of the variance components:

Approx. 2/3 of the variation in the measurements comes from the variation within rabbits. Maybe there is a systematic difference between the injection spots?

Two-way anova:

    Source    DF    Type III SS    Mean Square    F Value    Pr > F
    rabbit    5     12.833333      2.566667       4.69       0.0037
    spot      5     3.833333       0.766667       1.40       0.2584

It does not look as if there is any systematic difference (P=0.26).
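The two-way table can be reproduced from the same 6 × 6 data by splitting the total sum of squares into rabbit, spot and residual parts. This is a minimal Python sketch for the balanced case with one measurement per rabbit and spot; the function name `two_way_anova` and the dictionary keys are illustrative, and the data layout (rows are spots, columns are rabbits) is assumed from the SAS input.

```python
# Swelling data (cm^2): one row per spot (a-f), one column per rabbit (1-6).
DATA = [
    [7.9, 8.7, 7.4, 7.4, 7.1, 8.2],
    [6.1, 8.2, 7.7, 7.1, 8.1, 5.9],
    [7.5, 8.1, 6.0, 6.4, 6.2, 7.5],
    [6.9, 8.5, 6.8, 7.7, 8.5, 8.5],
    [6.7, 9.9, 7.3, 6.4, 6.4, 7.3],
    [7.3, 8.3, 7.3, 5.8, 6.4, 7.7],
]


def two_way_anova(rows):
    """Balanced two-way anova without interaction, one value per cell."""
    n_spots, n_rabbits = len(rows), len(rows[0])
    grand = sum(map(sum, rows)) / (n_spots * n_rabbits)
    spot_means = [sum(row) / n_rabbits for row in rows]
    rabbit_means = [sum(row[r] for row in rows) / n_spots for r in range(n_rabbits)]
    ss_total = sum((y - grand) ** 2 for row in rows for y in row)
    ss_rabbit = n_spots * sum((m - grand) ** 2 for m in rabbit_means)
    ss_spot = n_rabbits * sum((m - grand) ** 2 for m in spot_means)
    ss_resid = ss_total - ss_rabbit - ss_spot
    ms_resid = ss_resid / ((n_rabbits - 1) * (n_spots - 1))
    return {
        "ss_rabbit": ss_rabbit, "ss_spot": ss_spot, "ss_resid": ss_resid,
        "F_rabbit": ss_rabbit / (n_rabbits - 1) / ms_resid,
        "F_spot": ss_spot / (n_spots - 1) / ms_resid,
    }


result = two_way_anova(DATA)
for key, value in result.items():
    print(f"{key:10s} {value:10.4f}")
```

With equal numbers in every cell the sums of squares do not depend on the order of the factors, so they agree with the Type III values in the SAS table.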