1 Workshop 2: Building from Linear Models to Generalised Linear Models
Part 1: understanding LMs
2 What are linear models?
• Something you have met already!
• A model that explains one response variable in terms of one or more explanatory variables, using a linear relationship
• y ~ x
3 What are linear models?
Stage 1: response continuous - General Linear Model

Procedure                 Model formula  Response    Predictors
Single linear regression  y ~ x          Continuous  1 continuous/discrete
Two-sample t-test         y ~ x          Continuous  1 categorical (2 levels)
One-way ANOVA             y ~ x          Continuous  1 categorical (2 or more levels)
Two-way ANOVA             y ~ x1*x2      Continuous  2 categorical (2 or more levels each)

Stage 2: including other types of response - Generalised Linear Model
4 Key points
• T-tests, ANOVA and regression are fundamentally the same
• Collectively, they are the ‘general linear model’
• Can be extended to the ‘generalised linear model’ for different types of response
5 Single linear regression
y = 11.23 − 0.07*x

    > model <- lm(data = mydata, y ~ x)
    > summary(model)

    Call:
    lm(formula = y ~ x)

    Residuals:
        Min      1Q  Median      3Q     Max
    -7.2875 -2.4868 -0.4081  2.2612 10.7125

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 11.23481    1.65035   6.808  3.9e-07 ***
    x           -0.07373    0.02933  -2.514   0.0187 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.935 on 25 degrees of freedom
    Multiple R-squared:  0.2018, Adjusted R-squared:  0.1699
    F-statistic: 6.321 on 1 and 25 DF,  p-value: 0.01874

• Intercept: the (Intercept) estimate, 11.23481
• Slope: the x estimate, -0.07373
• Test of intercept: t value and Pr(>|t|) on the (Intercept) row
• Test of slope: t value and Pr(>|t|) on the x row
• % of variation in y explained by x: Multiple R-squared
• Test of model: the F-statistic line (same as the test of slope when there is one variable)
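The pieces of the summary() output annotated above can also be pulled out of the fitted object directly. The sketch below uses simulated data (an assumption; the workshop's mydata is not reproduced here), so the numbers will differ from the slide. It also checks the note that, with a single predictor, the test of the model is the same as the test of the slope: the F statistic equals the slope's t value squared.

```r
set.seed(1)

# Simulated data: an assumption standing in for the workshop's mydata
mydata <- data.frame(x = runif(27, min = 0, max = 100))
mydata$y <- 11 - 0.07 * mydata$x + rnorm(27, sd = 4)

model <- lm(y ~ x, data = mydata)
s <- summary(model)

coef(model)      # intercept and slope estimates
s$coefficients   # estimates, standard errors, t values and p values
s$r.squared      # proportion of variation in y explained by x

# With one predictor, model F statistic = (slope t value)^2
t_slope <- s$coefficients["x", "t value"]
f_model <- unname(s$fstatistic["value"])
all.equal(t_slope^2, f_model)
```

On the slide the same relationship holds up to rounding: (-2.514)^2 is approximately 6.32, matching the reported F-statistic of 6.321.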
6 Two-sample t-test

    t.test(y ~ x, data = mydata, paired = FALSE, var.equal = TRUE)

Is there a significant difference between the masses of male and female chaffinches?

    t.test(mass ~ sex, data = chaff, paired = FALSE, var.equal = TRUE)

        Two Sample t-test

    data:  mass by sex
    t = -2.6471, df = 38, p-value = 0.01175
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -3.167734 -0.422266
    sample estimates:
    mean in group females   mean in group males
                   20.480                22.275

Does treatment with Compound X affect cell growth compared to control treatment?

    t.test(cell$growth ~ cell$treatment, paired = FALSE, var.equal = TRUE)

        Two Sample t-test

    data:  cell$growth by cell$treatment
    t = 2.6471, df = 38, p-value = 0.01175
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     0.422266 3.167734
    sample estimates:
    mean in group control   mean in group withx
                   22.275                20.480
7 Two-sample t-test

Using t.test()

    > t.test(mass ~ sex, data = chaff, paired = FALSE, var.equal = TRUE)

        Two Sample t-test

    data:  mass by sex
    t = -2.6471, df = 38, p-value = 0.01175
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -3.167734 -0.422266
    sample estimates:
    mean in group females   mean in group males
                   20.480                22.275

Using lm()

    > mod <- lm(mass ~ sex, data = chaff)
    > summary(mod)

    Call:
    lm(formula = mass ~ sex, data = chaff)

    Residuals:
        Min      1Q  Median      3Q     Max
    -5.2750 -1.7000 -0.3775  1.6200  4.1250

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  20.4800     0.4795  42.712   <2e-16 ***
    sexmales      1.7950     0.6781   2.647   0.0118 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 2.144 on 38 degrees of freedom
    Multiple R-squared:  0.1557, Adjusted R-squared:  0.1335
    F-statistic: 7.007 on 1 and 38 DF,  p-value: 0.01175

• (Intercept) estimate: the intercept is the mean of the ‘lowest’ level of the factor (females)
• (Intercept) p-value: the female mean is significantly different from 0. Not important
• sexmales estimate: the difference between the intercept and the next level (i.e., the slope)
• sexmales p-value: the difference is significant

Why use lm()? Because it is extendable
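The equivalence between the two outputs can be checked numerically. This is a minimal sketch on simulated masses (an assumption; chaff_sim is a hypothetical stand-in for the chaff data), comparing the equal-variance t-test with the sex coefficient from lm(). The paired argument is left at its default here.

```r
set.seed(2)

# Simulated masses: an assumption, not the workshop's chaff data
chaff_sim <- data.frame(
  sex  = factor(rep(c("females", "males"), each = 20),
                levels = c("females", "males")),
  mass = c(rnorm(20, mean = 20.5, sd = 2), rnorm(20, mean = 22.3, sd = 2))
)

tt    <- t.test(mass ~ sex, data = chaff_sim, var.equal = TRUE)
mod   <- lm(mass ~ sex, data = chaff_sim)
coefs <- summary(mod)$coefficients

# t.test() reports females minus males; lm() reports males minus females,
# so the t statistics have opposite signs but the same p-value
t_ttest <- unname(tt$statistic)
t_lm    <- coefs["sexmales", "t value"]
all.equal(t_ttest, -t_lm)
all.equal(tt$p.value, coefs["sexmales", "Pr(>|t|)"])

# The intercept is the mean of the first factor level (females)
mean_females <- mean(chaff_sim$mass[chaff_sim$sex == "females"])
all.equal(coef(mod)[["(Intercept)"]], mean_females)
```

The sign flip mirrors the slide, where t.test() gives t = -2.6471 and lm() gives t = 2.647 for sexmales.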
8 One-way ANOVA

    mod <- aov(y ~ x, data = mydata)
    summary(mod)

    > modc <- aov(diameter ~ medium, data = culture)
    > summary(modc)
                Df Sum Sq Mean Sq F value  Pr(>F)
    medium       2 10.495  5.2473  6.1129 0.00646 **
    Residuals   27 23.177  0.8584
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
9 One-way ANOVA

Using aov()

    > modc <- aov(diameter ~ medium, data = culture)
    > summary(modc)
                Df Sum Sq Mean Sq F value  Pr(>F)
    medium       2 10.495  5.2473  6.1129 0.00646 **
    Residuals   27 23.177  0.8584
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Using lm()

    > modl <- lm(diameter ~ medium, data = culture)
    > summary(modl)

    Call:
    lm(formula = diameter ~ medium, data = culture)

    Residuals:
       Min     1Q Median     3Q    Max
    -1.541 -0.700 -0.080  0.424  1.949

    Coefficients:
                                   Estimate Std. Error t value Pr(>|t|)
    (Intercept)                     10.0700     0.2930  34.370  < 2e-16 ***
    mediumwith sugar                 0.1700     0.4143   0.410  0.68483
    mediumwith sugar + amino acids   1.3310     0.4143   3.212  0.00339 **
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.9265 on 27 degrees of freedom
    Multiple R-squared:  0.3117, Adjusted R-squared:  0.2607
    F-statistic: 6.113 on 2 and 27 DF,  p-value: 0.00646

• (Intercept): the intercept is the mean of the lowest level of the factor (control). Control mean = 10.07
• mediumwith sugar: difference between the intercept and ‘with sugar’
• mediumwith sugar + amino acids: difference between the intercept and ‘with sugar + amino acids’
• Pr(>|t|) column: whether the differences are significant
• F-statistic line: whether the ‘model’ (the one factor) is significant
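The link between the aov() table and the lm() coefficients can be verified on any one-factor data set. The sketch below simulates three media with ten colonies each (an assumption; culture_sim is hypothetical and its values are not the workshop's culture data), then checks that both fits give the same F value and that the group means can be rebuilt from the coefficients.

```r
set.seed(3)

# Simulated diameters: an assumption, not the workshop's culture data
media <- c("control", "with sugar", "with sugar + amino acids")
culture_sim <- data.frame(
  medium   = factor(rep(media, each = 10), levels = media),
  diameter = c(rnorm(10, 10.1, 0.9), rnorm(10, 10.2, 0.9), rnorm(10, 11.4, 0.9))
)

modc <- aov(diameter ~ medium, data = culture_sim)
modl <- lm(diameter ~ medium, data = culture_sim)

# F value for the factor from aov() equals the model F statistic from lm()
f_aov <- summary(modc)[[1]][["F value"]][1]
f_lm  <- unname(summary(modl)$fstatistic["value"])
all.equal(f_aov, f_lm)

# Intercept = control mean; other coefficients = differences from control
b <- unname(coef(modl))
means_from_coefs <- c(b[1], b[1] + b[2], b[1] + b[3])
group_means <- as.vector(tapply(culture_sim$diameter, culture_sim$medium, mean))
all.equal(means_from_coefs, group_means)
```

Applied to the slide's estimates, the same arithmetic gives group means of 10.07, 10.07 + 0.17 = 10.24 and 10.07 + 1.331 = 11.401.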
10 Two-way ANOVA
Effect of two factors on wing length of butterflies
– Response: wing length
– Predictors: sex and spp (categorical)

    mod <- lm(y ~ x1 * x2, data = mydata)    # 3 tests: 2 main effects and interaction
OR
    mod <- lm(y ~ x1 + x2, data = mydata)    # 2 tests: 2 main effects

Stage 1: aov(y ~ x1 * x2, data = mydata)

Mean wing length:
             F.concocti  F.flappa
    females       31.37     24.67
    males         24.97     23.45

    mod <- aov(winglen ~ sex * spp, data = butter)
    summary(mod)
                Df Sum Sq Mean Sq F value   Pr(>F)
    sex          1 145.16 145.161  9.2717 0.004334 **
    spp          1 168.92 168.921 10.7893 0.002280 **
    sex:spp      1  67.08  67.081  4.2846 0.045692 *
    Residuals   36 563.63  15.656
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
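The difference between the two formulas above comes down to which terms R expands them into. As a small illustration using only base R's terms(), the sketch below lists the term labels for each formula; the variable names y, x1 and x2 are the generic placeholders from the slide and no data are needed.

```r
# Term labels R derives from each formula (no data required)
labels_star <- attr(terms(y ~ x1 * x2), "term.labels")
labels_long <- attr(terms(y ~ x1 + x2 + x1:x2), "term.labels")
labels_plus <- attr(terms(y ~ x1 + x2), "term.labels")

labels_star   # "x1" "x2" "x1:x2": two main effects and the interaction
labels_plus   # "x1" "x2": two main effects only

# x1 * x2 is shorthand for x1 + x2 + x1:x2
identical(labels_star, labels_long)
```

Each term label corresponds to one row of the ANOVA table, which is why the first formula gives three tests and the second gives two.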
11 Two-way ANOVA

Mean wing length:
             F.concocti  F.flappa
    females       31.37     24.67
    males         24.97     23.45

    mod <- aov(winglen ~ sex * spp, data = butter)
    summary(mod)
                Df Sum Sq Mean Sq F value   Pr(>F)
    sex          1 145.16 145.161  9.2717 0.004334 **
    spp          1 168.92 168.921 10.7893 0.002280 **
    sex:spp      1  67.08  67.081  4.2846 0.045692 *
    Residuals   36 563.63  15.656
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    mod2 <- lm(winglen ~ sex * spp, data = butter)
    summary(mod2)

    Call:
    lm(formula = winglen ~ sex * spp, data = butter)

    Residuals:
       Min     1Q Median     3Q    Max
    -7.770 -3.095  0.090  2.920  6.530

    Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
    (Intercept)            31.370      1.251  25.071  < 2e-16 ***
    sexmales               -6.400      1.770  -3.617 0.000907 ***
    sppF.flappa            -6.700      1.770  -3.786 0.000560 ***
    sexmales:sppF.flappa    5.180      2.503   2.070 0.045692 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.957 on 36 degrees of freedom
    Multiple R-squared:  0.4034, Adjusted R-squared:  0.3537
    F-statistic: 8.115 on 3 and 36 DF,  p-value: 0.0002949

    anova(mod2)
    Analysis of Variance Table

    Response: winglen
              Df Sum Sq Mean Sq F value   Pr(>F)
    sex        1 145.16 145.161  9.2717 0.004334 **
    spp        1 168.92 168.921 10.7893 0.002280 **
    sex:spp    1  67.08  67.081  4.2846 0.045692 *
    Residuals 36 563.63  15.656

• The same: the interaction p-value (0.045692) is identical in the ANOVA table and the coefficient table, and anova(mod2) reproduces the aov() table
• Each explanatory variable: tested by the rows of the anova(mod2) table
• Whole model: tested by the F-statistic line of summary(mod2)
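The four cell means in the table can be recovered from the four coefficients reported by summary(mod2). The sketch below is plain arithmetic on the estimates printed on the slide; the vector name b and its element names are labels chosen here for illustration.

```r
# Coefficient estimates copied from the summary(mod2) output on the slide
b <- c(intercept   = 31.370,   # females, F.concocti (reference levels)
       sexmales    = -6.400,   # males minus females, within F.concocti
       sppF.flappa = -6.700,   # F.flappa minus F.concocti, within females
       interaction =  5.180)   # extra adjustment for males of F.flappa

# matrix() fills column by column: F.concocti first, then F.flappa
cell_means <- matrix(
  c(b[["intercept"]],
    b[["intercept"]] + b[["sexmales"]],
    b[["intercept"]] + b[["sppF.flappa"]],
    sum(b)),
  nrow = 2,
  dimnames = list(c("females", "males"), c("F.concocti", "F.flappa"))
)
cell_means
```

The result reproduces the table of means: 31.37 and 24.67 for females, 24.97 and 23.45 for males. Without the interaction term, the predicted mean for F.flappa males would be 31.37 - 6.40 - 6.70 = 18.27, so the interaction estimate of 5.18 measures how far the data depart from purely additive effects.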
12 What are linear models?
• Response can be continuous, discrete or categorical
• Predictors can be continuous or categorical
• The type of response (and its “errors”), the type of predictors and the relationship between them determine the type of model
13 Key points
• T-tests, ANOVA and regression are fundamentally the same
• Collectively, they are the ‘general linear model’

    mod <- lm(response ~ explanatory1, data = mydata)
  Use:
    summary(mod)

    mod <- lm(response ~ explanatory1 * explanatory2, data = mydata)
  Use:
    summary(mod)
    anova(mod)

• Can be extended to the ‘generalised linear model’ for different types of response
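One way to see the step from the general to the generalised linear model is that glm() with a Gaussian family and identity link fits the same model as lm(). This is a minimal sketch on simulated data (an assumption; the data frame d is hypothetical), not a demonstration of other response types.

```r
set.seed(4)

# Simulated data: an assumption used only for illustration
d <- data.frame(x = runif(30, min = 0, max = 10))
d$y <- 2 + 0.5 * d$x + rnorm(30)

fit_lm  <- lm(y ~ x, data = d)
fit_glm <- glm(y ~ x, data = d, family = gaussian(link = "identity"))

# Same formula, same data, same coefficient estimates
coef(fit_lm)
coef(fit_glm)
all.equal(coef(fit_lm), coef(fit_glm))
```

Changing the family argument is what lets glm() handle other types of response while keeping the same formula interface.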
14 Summary
– Regression, t-tests and ANOVA are linear models and have the assumptions of classical linear models
– summary() output
  • The intercept estimate is the mean of the lowest factor level
  • There is a significance test for each estimate
  • There is a significance test for the model as a whole