Analysis of variance and regression, November 27, 2007



  1. Analysis of variance and regression November 27, 2007

  2. Other types of regression models
     • Counts (Poisson models)
     • Ordinal data
       – proportional odds models
       – model control
       – model interpretation
     • Survival analysis

  3. Lene Theil Skovgaard, Dept. of Biostatistics, Institute of Public Health, University of Copenhagen
     e-mail: L.T.Skovgaard@biostat.ku.dk
     http://staff.pubhealth.ku.dk/~lts/regression07_2

  4. Other types of regression, November 2007
     Until now, we have been looking at
     • regression for normally distributed data, where parameters describe
       – differences between groups
       – the effect of a one-unit increase in an explanatory variable
     • regression for binary data (logistic regression), where parameters describe
       – odds ratios for a one-unit increase in an explanatory variable

  5. What about something 'in between'?
     • counts (Poisson distribution)
       – number of cancer cases in each municipality per year
       – number of positive pneumococcal swabs
     • categorical variables with more than 2 categories, e.g.
       – degree of pain (none/mild/moderate/serious)
       – degree of liver fibrosis
     • non-normal quantitative measurements
       – censored data, survival analysis

  6. Generalised linear models: multiple regression models, on a scale suitable for the data.
     • Mean: $\mu$
     • Link function: $g(\mu)$, linear in the covariates, i.e.
       $g(\mu) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$
     An important class of distributions for these models is the exponential families, including
     • Normal distribution (link = identity): the general linear model
     • Binomial distribution (link = logit): logistic regression
     • Poisson distribution (link = log)
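
As a rough illustration of the three links listed on this slide, the sketch below maps one linear predictor back to the mean scale under each link. The coefficients `beta0`, `beta1` and the covariate value `x` are made-up numbers chosen for the example, not values from the course material.

```python
import math

# The logit link and its inverse (Binomial family).
def logit(p):
    return math.log(p / (1 - p))

def inv_logit(eta):
    return 1 / (1 + math.exp(-eta))

# Hypothetical coefficients and covariate value, for illustration only.
beta0, beta1 = -1.0, 0.5
x = 2.0
eta = beta0 + beta1 * x          # linear predictor g(mu)

mean_normal = eta                # identity link: mu = eta
mean_binomial = inv_logit(eta)   # logit link:    mu = exp(eta) / (1 + exp(eta))
mean_poisson = math.exp(eta)     # log link:      mu = exp(eta)

print(mean_normal, mean_binomial, mean_poisson)
```

The same linear predictor gives a mean on the whole real line, in (0, 1), or on the positive half-line, depending on the link.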

  7. Poisson distribution:
     • distribution on the numbers 0, 1, 2, 3, ...
     • limit of the Binomial distribution for $N$ large and $p$ small, with mean $\mu = Np$
       – e.g. cancer events in a certain region
     • probability of $k$ events: $P(Y = k) = e^{-\mu} \frac{\mu^k}{k!}$
     Example: positive swabs for 90 individuals from 18 families
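
The formula and the Binomial limit can be checked numerically. This is a minimal sketch: the mean `mu = 2.0` and the size `n = 100_000` are arbitrary illustrative choices, not values from the swab data.

```python
import math

def poisson_pmf(k, mu):
    # P(Y = k) = exp(-mu) * mu^k / k!
    return math.exp(-mu) * mu**k / math.factorial(k)

def binomial_pmf(k, n, p):
    # P(Y = k) for a Binomial(n, p) variable
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

mu = 2.0          # illustrative mean
n = 100_000       # "N large"
p = mu / n        # "p small", so that N * p = mu

# Largest pointwise difference between the two distributions for k = 0..10
max_gap = max(abs(poisson_pmf(k, mu) - binomial_pmf(k, n, p)) for k in range(11))

# The Poisson probabilities sum to 1 (the tail beyond k = 59 is negligible here)
total = sum(poisson_pmf(k, mu) for k in range(60))

print(max_gap, total)
```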

  9. Illustration of family profiles (we ignore the grouping of families here)
     [Figure: family profiles, plotted with the symbols O, U and C]

  10. We observe counts $y_{fn} \sim \mathrm{Poisson}(\mu_{fn})$.
      Additive model, corresponding to two-way ANOVA in family and name:
      $\log(\mu_{fn}) = \mu + \alpha_f + \beta_n$

          proc genmod;
            class family name;   /* both factors are categorical */
            model swabs=family name / dist=poisson link=log cl;   /* additive on the log scale */
          run;

  11. The GENMOD Procedure

          Model Information
          Data Set              WORK.A0
          Distribution          Poisson
          Link Function         Log
          Dependent Variable    swabs
          Observations Used     90
          Missing Values        1

          Class Level Information
          Class     Levels    Values
          family    18        1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
          name      5         child1 child2 child3 father mother

  12. Analysis Of Parameter Estimates

                                         Standard        Wald 95%         Chi-
          Parameter        DF  Estimate     Error   Confidence Limits   Square   Pr > ChiSq
          Intercept         1    1.5263    0.1845    1.1647    1.8879    68.43       <.0001
          family   1        1    0.4636    0.2044    0.0630    0.8641     5.14       0.0233
          family   2        1    0.9214    0.1893    0.5503    1.2925    23.68       <.0001
          family   3        1    0.4473    0.2050    0.0455    0.8492     4.76       0.0291
          ...
          family   16       1    0.2283    0.2146   -0.1923    0.6488     1.13       0.2875
          family   17       1   -0.5725    0.2666   -1.0951   -0.0499     4.61       0.0318
          family   18       0    0.0000    0.0000    0.0000    0.0000        .            .
          name     child1   1    0.3228    0.1281    0.0716    0.5739     6.34       0.0118
          name     child2   1    0.8990    0.1158    0.6721    1.1259    60.31       <.0001
          name     child3   1    0.9664    0.1147    0.7417    1.1912    71.04       <.0001
          name     father   1    0.0095    0.1377   -0.2604    0.2793     0.00       0.9451
          name     mother   0    0.0000    0.0000    0.0000    0.0000        .            .
          Scale             0    1.0000    0.0000    1.0000    1.0000

          NOTE: The scale parameter was held fixed.
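
The Wald columns of this output can be reproduced from the estimate and its standard error alone. The sketch below does so for the `family 1` row, assuming the usual Wald construction (estimate ± 1.96 standard errors, chi-square equal to the squared ratio). Small discrepancies in the last digit come from working with the rounded printed values.

```python
import math

# Values copied from the "family 1" row of the GENMOD output
est, se = 0.4636, 0.2044

z975 = 1.959964                      # 97.5% quantile of the standard normal
wald_chisq = (est / se) ** 2         # Wald chi-square on 1 degree of freedom
lower = est - z975 * se              # lower 95% confidence limit
upper = est + z975 * se              # upper 95% confidence limit

# Two-sided p-value: P(|Z| > |est/se|) for a standard normal Z
p_value = math.erfc(abs(est / se) / math.sqrt(2))

print(round(wald_chisq, 2), round(lower, 4), round(upper, 4), round(p_value, 4))
```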

  13. Interpretation of Poisson analysis:
      • The family-parameters are uninteresting
      • The name-parameters are interesting
      • The mothers serve as a reference group
      • The model is additive on a logarithmic scale, i.e. multiplicative on the original scale
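
To make "multiplicative on the original scale" concrete, the following sketch uses three estimates from the GENMOD output (intercept, family 1, child3) and shows that exponentiating the sum of the log-scale effects equals multiplying the exponentiated effects. The choice of family 1 and child3 is arbitrary.

```python
import math

# Estimates taken from the GENMOD parameter table
intercept = 1.5263   # reference cell: family 18, mother
family1 = 0.4636     # family 1 relative to family 18
child3 = 0.9664      # child3 relative to mother

# Expected count in the reference cell (mother, family 18)
mu_reference = math.exp(intercept)

# Expected count for child3 in family 1: additive on the log scale ...
mu_child3_family1 = math.exp(intercept + family1 + child3)

# ... which is the same as multiplying the factors on the original scale
mu_as_product = math.exp(intercept) * math.exp(family1) * math.exp(child3)

print(mu_reference, mu_child3_family1, mu_as_product)
```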

  14. Parameter estimates:

          name      estimate (CI)                 ratio (CI)
          child1    0.3228 (0.0716, 0.5739)       1.38 (1.07, 1.78)
          child2    0.8990 (0.6721, 1.1259)       2.46 (1.96, 3.08)
          child3    0.9664 (0.7417, 1.1912)       2.63 (2.10, 3.29)
          father    0.0095 (-0.2604, 0.2793)      1.01 (0.77, 1.32)
          mother    -                             -

      Interpretation: The youngest children have a 2- to 3-fold increased probability of infection, compared to their mother.
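
The ratio column is the exponential of the estimate column, applied to the point estimate and to both confidence limits. A short check using the printed estimates:

```python
import math

# (estimate, lower limit, upper limit) on the log scale, from the table
estimates = {
    "child1": (0.3228, 0.0716, 0.5739),
    "child2": (0.8990, 0.6721, 1.1259),
    "child3": (0.9664, 0.7417, 1.1912),
    "father": (0.0095, -0.2604, 0.2793),
}

# Back-transform to ratios relative to the mother (the reference group)
ratios = {
    name: tuple(round(math.exp(v), 2) for v in values)
    for name, values in estimates.items()
}

for name, (ratio, low, high) in ratios.items():
    print(f"{name}: {ratio:.2f} ({low:.2f}, {high:.2f})")
```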

  15. Ordinal data, e.g. level of pain
      • data on a rank scale
      • distance between response categories is not known / is undefined
      • often an imaginary underlying quantitative scale
      Covariates must describe the probability for each single response category.

  16. We are faced with a dilemma:
      • We may reduce to a binary outcome and use logistic regression
        – but there are several possible 'cuts'/thresholds
      • We can 'pretend' that we are dealing with normally distributed data
        – most reasonable when there are many response categories

  17. Example on liver fibrosis (degree 0, 1, 2 or 3), (Julia Johansen, KKHH)
      3 blood markers related to fibrosis:
      • HA
      • YKL40
      • PIIINP
      Problem: What can we say about the degree of fibrosis from the knowledge of these 3 blood markers?

  18. The MEANS Procedure

          Variable        N           Mean        Std Dev       Minimum       Maximum
          ---------------------------------------------------------------------------
          degree_fibr   129      1.4263566      0.9903850             0     3.0000000
          ykl40         129    533.5116279    602.2934049    50.0000000       4850.00
          piiinp        127     13.4149606     12.4887192     1.7000000    70.0000000
          ha            128    318.4531250    658.9499624    21.0000000       4730.00
          ---------------------------------------------------------------------------

  19. We start out simple, with one single blood marker $x_p$ for the $p$'th patient (here: $p = 1, \ldots, 126$).
      $Y_p$: the observed degree of fibrosis for the $p$'th patient.
      We wish to specify the probabilities
      $\pi_{pk} = P(Y_p = k), \quad k = 0, 1, 2, 3$
      and their dependence on certain covariates.
      Since $\pi_{p0} + \pi_{p1} + \pi_{p2} + \pi_{p3} = 1$, we have a total of 3 parameters for each individual.

  20. We start by defining the cumulative probabilities 'from the top':
      • divide between 2 and 3: model for $\gamma_{p3} = \pi_{p3}$
      • divide between 1 and 2: model for $\gamma_{p2} = \pi_{p2} + \pi_{p3}$
      • divide between 0 and 1: model for $\gamma_{p1} = \pi_{p1} + \pi_{p2} + \pi_{p3}$
      Logistic regression for each threshold.
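
As a covariate-free illustration of cumulating 'from the top', the sketch below uses the observed frequencies of fibrosis degree reported in the response profile of the LOGISTIC output (26, 40, 42 and 20 patients with degree 0, 1, 2 and 3). These are raw sample proportions, not fitted model probabilities.

```python
# Observed number of patients per degree of fibrosis (LOGISTIC response profile)
freq = {0: 26, 1: 40, 2: 42, 3: 20}
n = sum(freq.values())

# Category proportions pi_k
pi = {k: count / n for k, count in freq.items()}

# Cumulative proportions from the top: gamma_k = pi_k + ... + pi_3
gamma = {k: sum(pi[j] for j in range(k, 4)) for k in (1, 2, 3)}

for k in (3, 2, 1):
    print(f"gamma_{k} = P(degree >= {k}) = {gamma[k]:.4f}")
```

Each `gamma[k]` is the success probability of the binary outcome "degree at least k", which is what the logistic regression for that threshold models.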

  21. Proportional odds model, model for 'cumulative logits':
      $\mathrm{logit}(\gamma_{pk}) = \log\left(\frac{\gamma_{pk}}{1 - \gamma_{pk}}\right) = \alpha_k + \beta \times x_p$,
      or, on the original probability scale:
      $\gamma_{pk} = \gamma_k(x_p) = \frac{\exp(\alpha_k + \beta x_p)}{1 + \exp(\alpha_k + \beta x_p)}, \quad k = 1, 2, 3$

  22. Properties of the proportional odds model:
      • odds ratios do not depend on the cutpoint, only on the covariates:
        $\log\left(\frac{\gamma_k(x_1) / (1 - \gamma_k(x_1))}{\gamma_k(x_2) / (1 - \gamma_k(x_2))}\right) = \beta \times (x_1 - x_2)$
      • reversing the ordering of the categories only changes the sign of the parameters
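
The first property can be verified numerically. In this sketch the intercepts `alphas`, the slope `beta` and the two covariate values are hypothetical numbers chosen for illustration; they are not estimates from the fibrosis data.

```python
import math

def cumulative_prob(alpha_k, beta, x):
    # gamma_k(x) = exp(alpha_k + beta*x) / (1 + exp(alpha_k + beta*x))
    return 1 / (1 + math.exp(-(alpha_k + beta * x)))

def log_odds(prob):
    return math.log(prob / (1 - prob))

# Hypothetical parameters: one intercept per cutpoint, one common slope
alphas = {1: 1.0, 2: -0.5, 3: -2.0}
beta = 0.004
x1, x2 = 300.0, 100.0

# Log odds ratio between x1 and x2, computed separately at each cutpoint k
log_or = {
    k: log_odds(cumulative_prob(a, beta, x1)) - log_odds(cumulative_prob(a, beta, x2))
    for k, a in alphas.items()
}

print(log_or)                # the same value at every cutpoint
print(beta * (x1 - x2))      # ... namely beta * (x1 - x2)
```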

  23. Probabilities for each degree of fibrosis ($k$) can be calculated as successive differences:
      $\pi_3(x) = \gamma_3(x) = \frac{\exp(\alpha_3 + \beta x)}{1 + \exp(\alpha_3 + \beta x)}$
      $\pi_k(x) = \gamma_k(x) - \gamma_{k+1}(x), \quad k = 0, 1, 2$ (with $\gamma_0(x) = 1$)
      The cumulative probabilities $\gamma_k(x)$ are logistic curves.
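
A minimal sketch of the successive-difference calculation, again with hypothetical parameter values (`example_alphas`, `example_beta` and the marker value 250 are illustrative, not fitted). The intercepts must decrease with $k$ so that $\gamma_1 \geq \gamma_2 \geq \gamma_3$ and every category probability is non-negative.

```python
import math

def category_probs(alphas, beta, x):
    """Return {k: pi_k(x)} for k = 0..3 from cumulative logits.

    alphas maps cutpoint k (1, 2, 3) to its intercept alpha_k.
    """
    # Cumulative probabilities gamma_k(x) = P(Y >= k); gamma_0 = 1 by definition
    gamma = {0: 1.0}
    for k in (1, 2, 3):
        gamma[k] = 1 / (1 + math.exp(-(alphas[k] + beta * x)))
    # Successive differences; the top category keeps its cumulative probability
    probs = {k: gamma[k] - gamma[k + 1] for k in (0, 1, 2)}
    probs[3] = gamma[3]
    return probs

# Hypothetical parameters, decreasing intercepts
example_alphas = {1: 1.0, 2: -0.5, 3: -2.0}
example_beta = 0.004

probs = category_probs(example_alphas, example_beta, x=250.0)
print(probs, sum(probs.values()))
```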

  24. Cumulative probabilities: [figure]

  25. We start out using only the marker HA.
      Very skewed distributions, but the model does not demand anything about the distribution of a covariate.

  26. Proportional odds model in SAS:

          data fibrosis;
            infile 'julia.tal' firstobs=2;          /* skip the header line */
            input id degree_fibr ykl40 piiinp ha;
            if degree_fibr<0 then delete;           /* drop negative degree codes */
          run;

          proc logistic data=fibrosis descending;   /* cumulate from the highest degree */
            model degree_fibr=ha / link=logit clodds=pl;
          run;

  27. The LOGISTIC Procedure

          Model Information
          Data Set                     WORK.FIBROSIS
          Response Variable            degree_fibr
          Number of Response Levels    4
          Number of Observations       128
          Model                        cumulative logit
          Optimization Technique       Fisher's scoring

          Response Profile
          Ordered                      Total
          Value      degree_fibr       Frequency
          1          3                 20
          2          2                 42
          3          1                 40
          4          0                 26

          Probabilities modeled are cumulated over the lower Ordered Values.
