Analysis of variance and regression, November 22, 2007


  1. Analysis of variance and regression November 22, 2007

  2. Parametrisations:
     • Choice of parameters
     • Comparison of models
     • Test for linearity
     • Linear splines

  3. Lene Theil Skovgaard, Dept. of Biostatistics, Institute of Public Health, University of Copenhagen
     e-mail: L.T.Skovgaard@biostat.ku.dk
     http://staff.pubhealth.ku.dk/~lts/regression07_2

  4. Parametrisations, November 2007
     Parameter: an unknown quantity that we want to estimate (provide a good guess for), e.g.
     • the decrease in blood pressure following treatment A, or the difference in decrease between treatment A and placebo
     • the increase in insulin-like growth factor (IGF-1) with age
     Parametrisation: the choice of which parameters are to enter the model description
     Re-parametrisation: a shift to a new set of parameters

  5. The best-known choice of parametrisation:
     • Change of scale/units: do we measure height in cm or m?
       Take the relation of lung capacity to height: $\mathrm{fev1} = \alpha + \beta \times \mathrm{height}$
       If we change from measuring height in cm to m, we also change the regression coefficient (the parameter) from $\beta$ to $\beta^{*} = 100\beta$
     • Change of origin/intercept:
       – choice of another reference group in ANOVA
       – subtracting e.g. 170 cm from all height measurements
     Re-parametrisations do not change the model as such!
     • same fitted values
     • same confidence and prediction limits
     • but a possibility for interpretations of specific interest
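     A minimal SAS sketch of the two re-parametrisations just described. It assumes a hypothetical data set `lung` with variables `fev1` and `height` (in cm); the data set name and the derived variable names `height_m` and `height_170` are illustrative and do not come from the slides.

     ```sas
     /* Hypothetical input: data set lung with fev1 and height (cm) */
     data lung2;
       set lung;
       height_m   = height / 100;  /* change of units: cm -> m     */
       height_170 = height - 170;  /* change of origin: 170 cm = 0 */
     run;

     /* Three parametrisations of the same straight-line model */
     proc reg data=lung2;
       cm:      model fev1 = height;
       metres:  model fev1 = height_m;
       centred: model fev1 = height_170;
     run;
     quit;
     ```

     All three fits should give the same fitted values and residual variance. The slope for `height_m` should be 100 times the slope for `height`, and in the `centred` model only the intercept changes: it becomes the expected fev1 at a height of 170 cm.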

  6. What makes us choose a specific parametrisation?
     • Ease: the program has some default parametrisations
     • Estimation of specific quantities:
       – the potency of a drug, ED50 or ED90
     • Test of specific hypotheses:
       – difference between treatment and placebo
       – difference in height between boys and girls at the age of 14

  7. In more advanced situations (beyond linearity), such as non-linear regression, logistic regression and correlated observations:
     • Knowledge of distributional assumptions:
       – Some parameter estimates may be closer to normally distributed than others (and we like to be able to construct symmetric confidence intervals, using the standard error)
     In linear models the estimates have exact normal distributions (provided the model assumptions are met, of course...)

  8. Example: A group of 45 patients with rheumatoid arthritis is randomised to one of 6 possible treatments (treat):
     • Placebo
     • Aspirin
     • One of 4 doses (dose) of an active anti-inflammatory drug, which we shall denote X
     Outcome: An index (Index) summing up the effectiveness of the treatment (decrease in various symptoms)

  9. Outcome: Index values [table of Index values not reproduced]
     Reference: Woolson, R.F. & Clarke, W.R.: Statistical methods for the analysis of biomedical data. 2nd ed., Wiley, 2002. (Exercise 10.4, page 409)

  10. How do we represent these data in SAS?

      Obs   group     type      dose   index
        1   placebo   placebo      0     6.2
        2   placebo   placebo      0     5.8
        3   placebo   placebo      0     9.5
        4   placebo   placebo      0    10.2
        5   placebo   placebo      0     8.3
        6   placebo   placebo      0     7.9
        7   placebo   placebo      0     9.2
      ...
       38   x20       active      20    29.5
       39   x20       active      20    34.6
       40   x20       active      20    31.9
       41   x25       active      25    41.8
       42   x25       active      25    45.2
       43   x25       active      25    43.2
       44   x25       active      25    46.5
       45   x25       active      25    41.7

  11. Summary statistics

      The MEANS Procedure
      Analysis Variable : index

                  N
      group     Obs    N          Mean       Std Dev       Minimum       Maximum
      ---------------------------------------------------------------------------
      aspirin    11   11    23.2545455     2.9561338    17.1000000    26.6000000
      placebo     9    9     8.6222222     1.8369661     5.8000000    11.6000000
      x10         5    5     5.9600000     0.7635444     5.2000000     6.9000000
      x15         9    9    17.9444444     1.0607911    16.4000000    19.5000000
      x20         6    6    33.1333333     3.0051068    29.5000000    37.2000000
      x25         5    5    43.6800000     2.1182540    41.7000000    46.5000000
      ---------------------------------------------------------------------------
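      The layout of this table matches what PROC MEANS prints by default when a CLASS statement is used. The call itself is not shown on the slide, so the following is only an assumption about how such a summary could be produced from the data set `drug`:

      ```sas
      /* Assumed call: default statistics (N, mean, std dev, min, max) */
      /* of index, computed separately within each level of group      */
      proc means data=drug;
        class group;
        var index;
      run;
      ```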

  12. We start by looking at the 4 X-groups only.
      [Figure: the outcome Index plotted against dose group]

  13. Comparison of 4 dose groups: One-way ANOVA
      Model written as a multiple regression:
        $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$
      where the $x$'s are so-called "dummy" variables:
        $x_1$ is 1 if subject $i$ belongs to the first group, and 0 otherwise
        $x_2$ is 1 if subject $i$ belongs to the second group, and 0 otherwise
        $x_3$ is 1 if subject $i$ belongs to the third group, and 0 otherwise
      With this parametrisation, $\beta_0$ corresponds to the level for the last group (the reference group, here group 4):
        $\beta_1$ is the difference in level between group 1 and group 4
        $\beta_2$ is the difference in level between group 2 and group 4
        and so on...
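      The dummy coding can be sketched by hand in a data step. This assumes the data set `drug` with the variables `group`, `type` and `index` shown in the data listing, and that groups 1 to 4 are x10, x15, x20 and x25; the names `active`, `x1`, `x2` and `x3` are illustrative.

      ```sas
      /* Keep the 4 dose groups and code the dummies explicitly */
      data active;
        set drug;
        where type='active';
        x1 = (group='x10');  /* 1 if dose 10, 0 otherwise */
        x2 = (group='x15');  /* 1 if dose 15, 0 otherwise */
        x3 = (group='x20');  /* 1 if dose 20, 0 otherwise */
      run;

      /* Multiple regression on the dummies; x25 is the reference */
      proc reg data=active;
        model index = x1 x2 x3;
      run;
      quit;
      ```

      Here the intercept estimates the level of the x25 group, and the coefficients of `x1`, `x2` and `x3` estimate the differences from x25.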

  14. Traditional one-way ANOVA in SAS:

        proc glm data=drug;
          where type='active';
          class group;
          model index=group / solution;
        run;

      which yields the output:

        The GLM Procedure
        Class Level Information

        Class     Levels    Values
        group          4    x10 x15 x20 x25

        Number of Observations Used    25

  15. The GLM Procedure
      Dependent Variable: index

                                     Sum of
      Source             DF         Squares    Mean Square   F Value   Pr > F
      Model               3     4391.364444    1463.788148    412.97   <.0001
      Error              21       74.435556       3.544550
      Corrected Total    24     4465.800000

      R-Square   Coeff Var   Root MSE   index Mean
      0.983332    7.734994   1.882698     24.34000

      Source   DF      Type I SS    Mean Square   F Value   Pr > F
      group     3    4391.364444    1463.788148    412.97   <.0001

      Source   DF    Type III SS    Mean Square   F Value   Pr > F
      group     3    4391.364444    1463.788148    412.97   <.0001

  16.                                     Standard
      Parameter          Estimate            Error   t Value   Pr > |t|
      Intercept       43.68000000 B     0.84196796     51.88     <.0001
      group x10      -37.72000000 B     1.19072251    -31.68     <.0001
      group x15      -25.73555556 B     1.05011855    -24.51     <.0001
      group x20      -10.54666667 B     1.14003001     -9.25     <.0001
      group x25        0.00000000 B      .               .         .

      NOTE: The X'X matrix has been found to be singular, and a generalized inverse was used to solve the normal equations. Terms whose estimates are followed by the letter 'B' are not uniquely estimable.

      The 'B' to the right of the estimates is explained in the NOTE. It simply means: by renaming the group levels/names, we may get a different parametrisation!
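      One way to see the effect of renaming is to make another group sort last, so that it becomes the reference. The sketch below assumes the data set `drug` from the data listing; the data set `active2`, the variable `group2` and the label `z_x10` are illustrative choices, not taken from the slides.

      ```sas
      /* Rename dose 10 so that it sorts after x25 and becomes the reference */
      data active2;
        set drug;
        where type='active';
        length group2 $8;
        group2 = group;
        if group='x10' then group2='z_x10';
      run;

      proc glm data=active2;
        class group2;
        model index=group2 / solution;
      run;
      quit;
      ```

      Given the group means in the summary statistics, one would expect an intercept of 5.96 (the dose 10 level) and differences of about 11.98, 27.17 and 37.72 for x15, x20 and x25. The ANOVA table and the fitted values should be unchanged.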

  17. We here disregard the problem of variance heterogeneity:

        proc glm data=drug;
          where type='active';
          class group;
          model index=group / noint solution;
          means group / hovtest=levene;
        run;

      from which we get

        Levene's Test for Homogeneity of index Variance
        ANOVA of Squared Deviations from Group Means

                          Sum of        Mean
        Source    DF     Squares      Square   F Value   Pr > F
        group      3       192.7     64.2320      5.92   0.0043
        Error     21       228.0     10.8585

      A clear indication that the variance increases with dose. Logarithms?
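      The slide only hints at a log transformation. As an assumption about how this could be explored, the sketch below repeats Levene's test on the natural logarithm of the outcome; the data set `active_log` and the variable `logindex` are illustrative names.

      ```sas
      /* Log-transform the outcome (all index values are positive) */
      data active_log;
        set drug;
        where type='active';
        logindex = log(index);  /* natural logarithm */
      run;

      /* Repeat Levene's test on the log scale */
      proc glm data=active_log;
        class group;
        model logindex=group;
        means group / hovtest=levene;
      run;
      quit;
      ```

      Whether the transformation removes the heterogeneity cannot be read off the slides; the test output would have to be inspected.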

  18. Same model, now parametrised with one level for each group:

        proc glm data=drug;
          where type='active';
          class group;
          model index=group / noint solution;
        run;

      now yielding instead:

        Dependent Variable: index

                                         Sum of
        Source               DF         Squares    Mean Square   F Value   Pr > F
        Model                 4     19202.25444     4800.56361   1354.35   <.0001
        Error                21        74.43556        3.54455
        Uncorrected Total    25     19276.69000

  19. R-Square   Coeff Var   Root MSE   index Mean
      0.983332    7.734994   1.882698     24.34000

      Source   DF      Type I SS    Mean Square   F Value   Pr > F
      group     4    19202.25444     4800.56361   1354.35   <.0001

      Source   DF    Type III SS    Mean Square   F Value   Pr > F
      group     4    19202.25444     4800.56361   1354.35   <.0001

                                     Standard
      Parameter        Estimate         Error   t Value   Pr > |t|
      group x10      5.96000000    0.84196796      7.08     <.0001
      group x15     17.94444444    0.62756587     28.59     <.0001
      group x20     33.13333333    0.76860808     43.11     <.0001
      group x25     43.68000000    0.84196796     51.88     <.0001

      The tests now refer to the hypothesis of a zero level (which is not interesting)
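      The standard errors in this table can be checked by hand. With one level per group, each estimate is a group mean, so its standard error is the square root of the error mean square (3.54455) divided by the group size (5, 9, 6 and 5 for x10, x15, x20 and x25, taken from the summary statistics):

      ```latex
      \[
      \mathrm{SE}(\hat\mu_g) = \sqrt{\frac{\mathrm{MSE}}{n_g}}, \qquad
      \sqrt{\frac{3.54455}{5}} \approx 0.8420, \quad
      \sqrt{\frac{3.54455}{9}} \approx 0.6276, \quad
      \sqrt{\frac{3.54455}{6}} \approx 0.7686 .
      \]
      ```

      These agree with the values 0.84196796, 0.62756587 and 0.76860808 in the output.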

  20. Parametrisations in one-way ANOVA
      • One level ($\mu_4$) for the reference group (the last, numerically or alphabetically), supplemented with differences from this reference group to each of the remaining groups ($\beta_1, \beta_2, \beta_3$):
          $Y_{gi} = \mu_4 + \beta_g + \varepsilon_{gi}$, where $\beta_g = \mu_g - \mu_4$ (so $\beta_4 = 0$)
        – good for testing of identity and certain pairwise comparisons
      • One level for each group:
          $Y_{gi} = \mu_g + \varepsilon_{gi}$
        – good for estimation, not suited for testing!
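      As a worked check of how the two parametrisations relate, the differences in the reference-group output can be recovered from the group levels in the one-level-per-group output (groups 1 to 4 being x10, x15, x20 and x25):

      ```latex
      \begin{align*}
      \hat\beta_1 &= \hat\mu_1 - \hat\mu_4 = 5.9600 - 43.6800 = -37.7200 \\
      \hat\beta_2 &= \hat\mu_2 - \hat\mu_4 = 17.9444 - 43.6800 = -25.7356 \\
      \hat\beta_3 &= \hat\mu_3 - \hat\mu_4 = 33.1333 - 43.6800 = -10.5467
      \end{align*}
      ```

      These match the estimates for group x10, x15 and x20 in the output with x25 as the reference.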

  21. Estimate statements in GLM
      If we want to compare dose 10 with dose 15:

        proc glm data=drug;
          where type='active';
          class group;
          model index=group / noint solution;
          estimate 'dose 15 vs. dose 10' group -1 1 0 0;
        run;

      from which we get

                                              Standard
        Parameter                Estimate        Error   t Value   Pr > |t|
        dose 15 vs. dose 10    11.9844444   1.05011855     11.41     <.0001
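      The result of the estimate statement can be reproduced from numbers already shown: the two group means, the error mean square (3.54455) and the group sizes (5 for dose 10, 9 for dose 15):

      ```latex
      \begin{align*}
      \hat\mu_{15} - \hat\mu_{10} &= 17.9444 - 5.9600 = 11.9844 \\
      \mathrm{SE} &= \sqrt{3.54455 \left( \tfrac{1}{5} + \tfrac{1}{9} \right)} \approx 1.0501 \\
      t &= \frac{11.9844}{1.0501} \approx 11.41
      \end{align*}
      ```

      The coefficients -1 1 0 0 follow the order of the class levels (x10, x15, x20, x25), which is why dose 10 gets -1 and dose 15 gets +1.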

  22. We return to the scatter plot, now with a linear regression line.
      [Figure: Index plotted against dose, with fitted regression line]
      Can we use a simple model, saying that the dose effect is linear?
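      A sketch of how such a simple model could be fitted, assuming the data set `drug` with the numeric variable `dose` from the data listing. The call is not shown on this slide, so it is an illustration, not the author's code.

      ```sas
      /* dose is not in a CLASS statement, so it enters as a  */
      /* quantitative covariate: one intercept and one slope  */
      proc glm data=drug;
        where type='active';
        model index=dose / solution;
      run;
      quit;
      ```

      This model uses 2 parameters for the mean where the one-way ANOVA uses 4, so it is a special case of the ANOVA model.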
