Analysis of variance and regression Other types of regression models
Other types of regression models
• Counts: Poisson models
• Ordinal data: Proportional odds models
• Survival analysis (censored, time-to-event data): Cox proportional hazards model
• (Other types of censored data)
Other types of regression

Until now, we have been looking at
• regression for normally distributed data, where parameters describe
  – differences between groups
  – the expected difference in outcome for a one-unit difference in an explanatory variable
• regression for binary data (logistic regression), where parameters describe
  – odds ratios for a one-unit difference in an explanatory variable
What about something "in between"?
• counts (Poisson distribution)
  – number of cancer cases in each municipality per year
  – number of positive pneumococcus swabs
• ordered categorical variables with more than 2 categories, e.g.,
  – degree of pain (none/mild/moderate/serious)
  – degree of liver fibrosis
Generalised linear models: multiple regression models, on a scale suitable for the data.

Mean: $M$
Link function: $g(M)$ is linear in the covariates, that is,
\[ g(M) = b_0 + b_1 x_1 + \cdots + b_k x_k \]

Some standard distributions (and link functions):
• Normal distribution (link=IDENTITY): the general linear model
• Binomial distribution (link=LOGIT): logistic regression
• Poisson distribution (link=LOG)
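As a small illustration of what the link function does, the sketch below evaluates one linear predictor and maps it back to the mean scale under each of the three links listed above. It is written in Python rather than SAS, and the coefficients and covariate values are hypothetical, chosen only for the arithmetic.

```python
import math

# Each link is stored as a pair (g, g_inverse).
LINKS = {
    "identity": (lambda m: m, lambda eta: eta),
    "logit": (lambda m: math.log(m / (1 - m)), lambda eta: 1 / (1 + math.exp(-eta))),
    "log": (math.log, math.exp),
}

def linear_predictor(b, x):
    """Return b0 + b1*x1 + ... + bk*xk."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))

# Hypothetical coefficients (b0, b1, b2) and covariates (x1, x2).
b = [0.5, 0.2, -0.1]
x = [1.0, 3.0]

eta = linear_predictor(b, x)
# The mean M implied by the same linear predictor under each link.
means = {name: inverse(eta) for name, (_, inverse) in LINKS.items()}

for name, mean in means.items():
    print(f"{name:8s} g(M) = {eta:.2f}  ->  M = {mean:.4f}")
```

The same value of the linear predictor gives a mean on the whole real line (identity), between 0 and 1 (logit), or on the positive numbers (log), which is why the link is chosen to suit the type of outcome.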
Poisson distribution:
• distribution on the numbers 0, 1, 2, 3, ...
• limit of the binomial distribution for $N$ large and $p$ small, with mean $M = Np$
  – e.g., CNS cancer cases among registered cell phone users
• probability of $k$ events:
\[ P(Y = k) = e^{-M} \, \frac{M^k}{k!} \]

Example: Positive swabs for 90 individuals from 18 families
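The binomial limit can be checked numerically. The following Python sketch (not part of the original SAS material) compares the Poisson probabilities with binomial probabilities for a large N and small p with the same mean; the values M = 2 and N = 10000 are arbitrary assumptions.

```python
import math

def poisson_pmf(k, mean):
    """P(Y = k) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def binomial_pmf(k, n, p):
    """P(Y = k) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

mean = 2.0      # hypothetical mean M
n = 10_000      # hypothetical large N
p = mean / n    # small p chosen so that N * p = M

# Largest absolute difference between the two distributions over k = 0..10.
max_diff = max(abs(poisson_pmf(k, mean) - binomial_pmf(k, n, p)) for k in range(11))

for k in range(6):
    print(k, f"{poisson_pmf(k, mean):.5f}", f"{binomial_pmf(k, n, p):.5f}")
```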
[Figure: Illustration of family profiles]
We observe counts (we ignore the grouping of families here):
\[ Y_{fn} \sim \mathrm{Poisson}(M_{fn}) \]

Additive model, corresponding to two-way ANOVA in family and name:
\[ \log(M_{fn}) = M + a_f + b_n \]

    PROC GENMOD;
      CLASS family name;
      MODEL swabs=family name / DIST=POISSON LINK=LOG CL;
    RUN;
The GENMOD Procedure

Model Information
  Data Set              WORK.A0
  Distribution          Poisson
  Link Function         Log
  Dependent Variable    swabs
  Observations Used     90
  Missing Values        1

Class Level Information
  Class     Levels    Values
  family        18    1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
  name           5    child1 child2 child3 father mother
Analysis Of Parameter Estimates

                               Standard       Wald 95%          Chi-
Parameter      DF   Estimate      Error   Confidence Limits    Square   Pr > ChiSq
Intercept       1     1.5263     0.1845    1.1647    1.8879     68.43       <.0001
family 1        1     0.4636     0.2044    0.0630    0.8641      5.14       0.0233
family 2        1     0.9214     0.1893    0.5503    1.2925     23.68       <.0001
family 3        1     0.4473     0.2050    0.0455    0.8492      4.76       0.0291
. . .
family 16       1     0.2283     0.2146   -0.1923    0.6488      1.13       0.2875
family 17       1    -0.5725     0.2666   -1.0951   -0.0499      4.61       0.0318
family 18       0     0.0000     0.0000    0.0000    0.0000       .          .
name child1     1     0.3228     0.1281    0.0716    0.5739      6.34       0.0118
name child2     1     0.8990     0.1158    0.6721    1.1259     60.31       <.0001
name child3     1     0.9664     0.1147    0.7417    1.1912     71.04       <.0001
name father     1     0.0095     0.1377   -0.2604    0.2793      0.00       0.9451
name mother     0     0.0000     0.0000    0.0000    0.0000       .          .
Scale           0     1.0000     0.0000    1.0000    1.0000

NOTE: The scale parameter was held fixed.
Interpretation of Poisson analysis:
• The family parameters are uninteresting
• The name parameters are interesting
• The mothers serve as the reference group
• The model is additive on a logarithmic scale, that is, multiplicative on the original scale
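To spell out the last point, exponentiating the additive model for the log mean gives a product of factors. Because the mother is the reference level (her name parameter is fixed at 0), the exponentiated name parameter is the ratio of expected counts relative to the mother within the same family. This is a worked restatement in the slides' notation, not additional output:

```latex
\[
  M_{fn} = \exp(M + a_f + b_n) = e^{M} \cdot e^{a_f} \cdot e^{b_n},
  \qquad
  \frac{M_{fn}}{M_{f,\mathrm{mother}}}
    = \frac{e^{M} \, e^{a_f} \, e^{b_n}}{e^{M} \, e^{a_f} \, e^{0}}
    = e^{b_n}.
\]
```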
Parameter estimates:

name      estimate (CI)                ratio (CI)
child1    0.3228 ( 0.0716, 0.5739)     1.38 (1.07, 1.78)
child2    0.8990 ( 0.6721, 1.1259)     2.46 (1.96, 3.08)
child3    0.9664 ( 0.7417, 1.1912)     2.63 (2.10, 3.29)
father    0.0095 (-0.2604, 0.2793)     1.01 (0.77, 1.32)
mother    -                            -

Interpretation: The youngest children have a 2-3 fold higher expected number of positive swabs (rate of infection) than their mother.
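The ratio column is simply the exponential of the estimate and of its confidence limits. A short Python sketch (the slides themselves use SAS) reproduces it from the numbers in the GENMOD output; the dictionary layout and function name are illustrative choices.

```python
import math

# (estimate, lower 95% limit, upper 95% limit) on the log scale,
# copied from the "name" rows of the GENMOD output; mother is the reference.
log_estimates = {
    "child1": (0.3228, 0.0716, 0.5739),
    "child2": (0.8990, 0.6721, 1.1259),
    "child3": (0.9664, 0.7417, 1.1912),
    "father": (0.0095, -0.2604, 0.2793),
}

def to_ratio_scale(values):
    """Exponentiate log-scale values and round to two decimals."""
    return tuple(round(math.exp(v), 2) for v in values)

ratios = {name: to_ratio_scale(values) for name, values in log_estimates.items()}

for name, (ratio, lower, upper) in ratios.items():
    print(f"{name:7s} {ratio:.2f} ({lower:.2f}, {upper:.2f})")
```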
Ordinal data, e.g., level of pain
• data on a rank (ordered) scale
• the distance between response categories is unknown or undefined
• often an imaginary underlying continuous scale

Covariates are intended to describe the probability of each response category, and the effect of each covariate is likely to be a general shift upwards or downwards (in contrast to, e.g., increasing or decreasing the probabilities of both extremes simultaneously).
Possibilities based on knowledge so far:
• We can pretend that we are dealing with normally distributed data
  – of course most reasonable when there are many response categories
• We may reduce to a two-category outcome and use logistic regression
  – but there are several possible cutpoints/thresholds

Alternative: Proportional odds
Example on liver fibrosis (degree 0, 1, 2 or 3) (Julia Johansen, KKHH)

3 blood markers related to fibrosis:
• ha
• ykl40
• pIIInp

Problem: What can we say about the degree of fibrosis from the knowledge of these 3 blood markers?
The MEANS Procedure

Variable         N          Mean       Std Dev       Minimum       Maximum
--------------------------------------------------------------------------
degree_fibr    129     1.4263566     0.9903850             0     3.0000000
ykl40          129   533.5116279   602.2934049    50.0000000       4850.00
pIIInp         127    13.4149606    12.4887192     1.7000000    70.0000000
ha             128   318.4531250   658.9499624    21.0000000       4730.00
--------------------------------------------------------------------------
$Y_i$: the observed degree of fibrosis for the $i$'th patient.

We wish to specify the probabilities
\[ p_{ik} = P(Y_i = k), \quad k = 0, 1, 2, 3 \]
and their dependence on certain covariates.

Since $p_{i0} + p_{i1} + p_{i2} + p_{i3} = 1$, we have a total of 3 free parameters for each individual.
We start by defining the cumulative probabilities from the top:
• split between 2 and 3: model for $q_{i3} = p_{i3}$
• split between 1 and 2: model for $q_{i2} = p_{i2} + p_{i3}$
• split between 0 and 1: model for $q_{i1} = p_{i1} + p_{i2} + p_{i3}$

Logistic regression model for each threshold.
We start out simple, with one single blood marker $x_i$ for the $i$'th patient (here: $i = 1, \ldots, 126$).

Proportional odds model, a model for "cumulative logits":
\[ \mathrm{logit}(q_{ik}) = \log\left( \frac{q_{ik}}{1 - q_{ik}} \right) = a_k + b \times x_i, \]
or, on the original probability scale:
\[ q_{ik} = q_k(x_i) = \frac{\exp(a_k + b x_i)}{1 + \exp(a_k + b x_i)}, \quad k = 1, 2, 3 \]
Properties of the proportional odds model:
• the odds ratio does not depend on the cut point, only on the covariates:
\[ \log\left( \frac{q_k(x_1) / (1 - q_k(x_1))}{q_k(x_2) / (1 - q_k(x_2))} \right) = b \times (x_1 - x_2) \]
• reversing the ordering of the categories only implies a change of sign for the log odds parameters
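The first property follows in one step from the cumulative logit model: for two covariate values $x_1$ and $x_2$, the threshold-specific intercept $a_k$ cancels in the difference of the two logits, so the same log odds ratio applies at every cut point $k$. As a worked line in the notation used above:

```latex
\[
  \mathrm{logit}\bigl(q_k(x_1)\bigr) - \mathrm{logit}\bigl(q_k(x_2)\bigr)
    = (a_k + b\,x_1) - (a_k + b\,x_2)
    = b \times (x_1 - x_2),
  \qquad k = 1, 2, 3.
\]
```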
Probabilities for each degree of fibrosis ($k$) can be calculated as successive differences:
\[ p_3(x) = q_3(x) = \frac{\exp(a_3 + b x)}{1 + \exp(a_3 + b x)} \]
\[ p_k(x) = q_k(x) - q_{k+1}(x), \quad k = 0, 1, 2 \]
where $q_0(x) = 1$, since every patient has degree 0 or higher.
We start out using only the marker HA.

Very skewed distributions, but does that matter? The model makes no distributional assumptions about the covariates.
Proportional odds model in SAS:

    DATA fibrosis;
      INFILE 'julia.tal' FIRSTOBS=2;
      INPUT id degree_fibr ykl40 pIIInp ha;
      IF degree_fibr<0 THEN DELETE;
    RUN;

    PROC LOGISTIC DATA=fibrosis DESCENDING;
      MODEL degree_fibr=ha / LINK=LOGIT CLODDS=PL;
    RUN;
The LOGISTIC Procedure

Model Information
  Data Set                      WORK.FIBROSIS
  Response Variable             degree_fibr
  Number of Response Levels     4
  Number of Observations        128
  Model                         cumulative logit
  Optimization Technique        Fisher's scoring

Response Profile
  Ordered                       Total
    Value    degree_fibr    Frequency
        1              3           20
        2              2           42
        3              1           40
        4              0           26

Probabilities modeled are cumulated over the lower Ordered Values.
Score Test for the Proportional Odds Assumption

  Chi-Square    DF    Pr > ChiSq
      5.1766     2        0.0751

Analysis of Maximum Likelihood Estimates

                              Standard        Wald
Parameter      DF  Estimate      Error  Chi-Square  Pr > ChiSq
Intercept 3     1   -2.3175     0.3113     55.4296      <.0001
Intercept 2     1   -0.4597     0.2029      5.1349      0.0234
Intercept 1     1    1.0945     0.2334     21.9935      <.0001
ha              1   0.00140   0.000383     13.3099      0.0003

Odds Ratio Estimates

             Point        95% Wald
Effect    Estimate    Confidence Limits
ha           1.001      1.001     1.002

Profile Likelihood Confidence Interval for Adjusted Odds Ratios

Effect      Unit    Estimate    95% Confidence Limits
ha        1.0000       1.001      1.001     1.002
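To see what the fitted model implies, the estimates above can be plugged into the cumulative logit formulas. The Python sketch below (an illustration, not part of the SAS analysis) computes the probability of each degree of fibrosis for two HA values and the odds ratio for a 100-unit difference in HA. The HA values 100 and 1000 are arbitrary choices within the observed range, and the function names are hypothetical.

```python
import math

# Fitted values copied from the PROC LOGISTIC output (DESCENDING order):
# "Intercept k" is a_k in logit(q_k) = a_k + b * ha, with q_k = P(degree >= k).
intercepts = {1: 1.0945, 2: -0.4597, 3: -2.3175}
slope_ha = 0.00140

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def cumulative_probs(ha):
    """q_k(ha) = P(degree >= k) for k = 0..3, with q_0 = 1 by definition."""
    q = {0: 1.0}
    for k, a_k in intercepts.items():
        q[k] = expit(a_k + slope_ha * ha)
    return q

def category_probs(ha):
    """p_k(ha) = P(degree = k), obtained as successive differences of the q_k."""
    q = cumulative_probs(ha)
    return {0: q[0] - q[1], 1: q[1] - q[2], 2: q[2] - q[3], 3: q[3]}

# Odds ratio for a 100-unit difference in HA; the per-unit value is tiny
# because HA ranges over thousands of units.
odds_ratio_per_100 = math.exp(100 * slope_ha)

for ha in (100, 1000):  # illustrative HA values
    probs = category_probs(ha)
    print(ha, {k: round(p, 3) for k, p in probs.items()})
print("OR per 100 units of HA:", round(odds_ratio_per_100, 3))
```

Reporting the odds ratio per 100 units is only a rescaling of the same slope; it is used here because the per-unit odds ratio of 1.001 is hard to read.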