Workshop 2: Building from Linear Models to Generalised Linear Models

1 Workshop 2 Building from Linear Models to Generalised Linear Models Part 1: understanding LMs

2 What are linear models?
• Something you have met already!
• A model that explains one response variable as a linear function of one or more explanatory variables
• In R formula notation: y ~ x

3 What are linear models?
Stage 1: continuous response - General Linear Model

    Procedure                 Model formula  Response    Predictors
    Single linear regression  y ~ x          Continuous  1 continuous/discrete
    Two-sample t-test         y ~ x          Continuous  1 categorical (2 levels)
    One-way ANOVA             y ~ x          Continuous  1 categorical (2 or more levels)
    Two-way ANOVA             y ~ x1*x2      Continuous  2 categorical (2 or more levels each)

Stage 2: including other types of response - Generalised Linear Model

4 Key points
• T-tests, ANOVA and regression are fundamentally the same
• Collectively they are the ‘general linear model’
• They can be extended to the ‘generalised linear model’ for different types of response

5 Single linear regression
Fitted line: y = 11.23 − 0.07*x

    > model <- lm(y ~ x, data = mydata)
    > summary(model)

    Call:
    lm(formula = y ~ x)

    Residuals:
        Min      1Q  Median      3Q     Max
    -7.2875 -2.4868 -0.4081  2.2612 10.7125

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 11.23481    1.65035   6.808  3.9e-07 ***
    x           -0.07373    0.02933  -2.514   0.0187 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.935 on 25 degrees of freedom
    Multiple R-squared:  0.2018,  Adjusted R-squared:  0.1699
    F-statistic: 6.321 on 1 and 25 DF,  p-value: 0.01874

Reading the output:
• Intercept: the (Intercept) estimate, 11.23481
• Slope: the x estimate, -0.07373
• Test of intercept: t value and Pr(>|t|) on the (Intercept) row
• Test of slope: t value and Pr(>|t|) on the x row
• % of variation in y explained by x: Multiple R-squared (0.2018, about 20%)
• Test of model: the F-statistic line (same as the test of slope for one variable)
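The quantities annotated on this slide can also be pulled out of the fitted model directly. The following is a minimal sketch on simulated data, since the slide's `mydata` is not available; the names `sim`, `fit` and `s`, and the simulation settings, are assumptions made for illustration, so the numbers will not match the slide.

```r
# Simulated stand-in for the slide's data (assumption: 27 observations,
# which gives 25 residual degrees of freedom as in the slide output).
set.seed(1)
sim <- data.frame(x = runif(27, 20, 90))
sim$y <- 11 - 0.07 * sim$x + rnorm(27, sd = 4)

fit <- lm(y ~ x, data = sim)
s   <- summary(fit)

coef(fit)        # intercept and slope estimates
s$coefficients   # estimates, standard errors, t values and p-values
s$r.squared      # proportion of variation in y explained by x

# With a single predictor the whole-model F test and the slope t test agree:
# F equals t squared and the two p-values are identical.
f       <- s$fstatistic
p_model <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
p_slope <- s$coefficients["x", "Pr(>|t|)"]
t_slope <- s$coefficients["x", "t value"]

all.equal(p_model, p_slope)
all.equal(f[["value"]], t_slope^2)
```

The same relationship is visible on the slide: (-2.514)^2 is about 6.32, the reported F-statistic.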

6 Two-sample t-test

    t.test(y ~ x, data = mydata, var.equal = TRUE)

The samples are independent, so the default unpaired test applies; var.equal = TRUE requests the classical pooled-variance test.

Example 1: Is there a significant difference between the masses of male and female chaffinches?

    t.test(mass ~ sex, data = chaff, var.equal = TRUE)

        Two Sample t-test

    data:  mass by sex
    t = -2.6471, df = 38, p-value = 0.01175
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -3.167734 -0.422266
    sample estimates:
    mean in group females   mean in group males
                   20.480                22.275

Example 2: Does treatment with Compound X affect cell growth compared to control treatment?

    t.test(cell$growth ~ cell$treatment, var.equal = TRUE)

        Two Sample t-test

    data:  cell$growth by cell$treatment
    t = 2.6471, df = 38, p-value = 0.01175
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     0.422266 3.167734
    sample estimates:
    mean in group control   mean in group withx
                   22.275                20.480

7 Two-sample t-test

Using t.test()

    > t.test(mass ~ sex, data = chaff, var.equal = TRUE)

        Two Sample t-test

    data:  mass by sex
    t = -2.6471, df = 38, p-value = 0.01175
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -3.167734 -0.422266
    sample estimates:
    mean in group females   mean in group males
                   20.480                22.275

Using lm()

    > mod <- lm(mass ~ sex, data = chaff)
    > summary(mod)

    Call:
    lm(formula = mass ~ sex, data = chaff)

    Residuals:
        Min      1Q  Median      3Q     Max
    -5.2750 -1.7000 -0.3775  1.6200  4.1250

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  20.4800     0.4795  42.712   <2e-16 ***
    sexmales      1.7950     0.6781   2.647   0.0118 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 2.144 on 38 degrees of freedom
    Multiple R-squared:  0.1557,  Adjusted R-squared:  0.1335
    F-statistic: 7.007 on 1 and 38 DF,  p-value: 0.01175

Reading the lm() output:
• (Intercept) is the mean of the ‘lowest’ level of the factor (females, 20.48). Its test shows that the female mean differs significantly from 0, which is not important here.
• sexmales is the difference between the intercept and the next level (i.e., the slope): 22.275 − 20.480 = 1.795.
• The sexmales test shows that the difference is significant (p = 0.0118, the same test as the t-test above).
• Why use lm()? Because it is extendable.
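The equivalence between the two approaches can be checked numerically. This is a sketch on simulated data rather than the `chaff` data from the slide; the data frame `chaff_sim` and its group means and spread are assumptions chosen only to resemble the example.

```r
# Simulated stand-in for the chaffinch data (assumption: 20 birds per sex).
set.seed(2)
chaff_sim <- data.frame(
  sex  = factor(rep(c("females", "males"), each = 20)),
  mass = c(rnorm(20, mean = 20.5, sd = 2), rnorm(20, mean = 22.3, sd = 2))
)

tt  <- t.test(mass ~ sex, data = chaff_sim, var.equal = TRUE)
mod <- lm(mass ~ sex, data = chaff_sim)
cf  <- summary(mod)$coefficients

# Group means as a plain named vector.
group_means <- sapply(split(chaff_sim$mass, chaff_sim$sex), mean)

# Intercept = mean of the first factor level (females);
# sexmales  = male mean minus female mean.
all.equal(coef(mod)[["(Intercept)"]], group_means[["females"]])
all.equal(coef(mod)[["sexmales"]],
          group_means[["males"]] - group_means[["females"]])

# Same test: t.test() reports females minus males, lm() reports males minus
# females, so the t statistics differ only in sign and the p-values match.
all.equal(tt$statistic[["t"]], -cf["sexmales", "t value"])
all.equal(tt$p.value, cf["sexmales", "Pr(>|t|)"])
```

The match relies on `var.equal = TRUE`: the default Welch test does not pool the variances and would generally give a slightly different result from `lm()`.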

8 One-way ANOVA

    mod <- aov(y ~ x, data = mydata)
    summary(mod)

Example:

    > modc <- aov(diameter ~ medium, data = culture)
    > summary(modc)
                Df Sum Sq Mean Sq F value  Pr(>F)
    medium       2 10.495  5.2473  6.1129 0.00646 **
    Residuals   27 23.177  0.8584
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

9 One-way ANOVA

Using aov()

    > modc <- aov(diameter ~ medium, data = culture)
    > summary(modc)
                Df Sum Sq Mean Sq F value  Pr(>F)
    medium       2 10.495  5.2473  6.1129 0.00646 **
    Residuals   27 23.177  0.8584
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Using lm()

    > modl <- lm(diameter ~ medium, data = culture)
    > summary(modl)

    Call:
    lm(formula = diameter ~ medium, data = culture)

    Residuals:
       Min     1Q Median     3Q    Max
    -1.541 -0.700 -0.080  0.424  1.949

    Coefficients:
                                   Estimate Std. Error t value Pr(>|t|)
    (Intercept)                     10.0700     0.2930  34.370  < 2e-16 ***
    mediumwith sugar                 0.1700     0.4143   0.410  0.68483
    mediumwith sugar + amino acids   1.3310     0.4143   3.212  0.00339 **
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0.9265 on 27 degrees of freedom
    Multiple R-squared:  0.3117,  Adjusted R-squared:  0.2607
    F-statistic: 6.113 on 2 and 27 DF,  p-value: 0.00646

Reading the lm() output:
• (Intercept) is the mean of the lowest level of the factor (control): control mean = 10.07
• mediumwith sugar is the difference between the intercept and ‘with sugar’
• mediumwith sugar + amino acids is the difference between the intercept and ‘with sugar + amino acids’
• The Pr(>|t|) column shows whether the differences are significant
• The F-statistic line shows whether the ‘model’ (the one factor) is significant; it matches the aov() result
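A short check of the two claims on this slide: that `aov()` and `lm()` give the same F test, and that the `lm()` coefficients are differences from the first factor level. The data are simulated; `culture_sim` and its group means are assumptions, not the slide's `culture` data.

```r
# Simulated stand-in for the culture data (assumption: 10 colonies per medium).
set.seed(3)
culture_sim <- data.frame(
  medium = factor(rep(c("control", "with sugar", "with sugar + amino acids"),
                      each = 10)),
  diameter = c(rnorm(10, 10.1, 0.9), rnorm(10, 10.2, 0.9), rnorm(10, 11.4, 0.9))
)

modc <- aov(diameter ~ medium, data = culture_sim)
modl <- lm(diameter ~ medium, data = culture_sim)

# The F test for medium from aov() is the whole-model F test from lm().
f_aov <- summary(modc)[[1]][["F value"]][1]
f_lm  <- summary(modl)$fstatistic[["value"]]
all.equal(f_aov, f_lm)

# Group means as a plain named vector; "control" is the first factor level.
means <- sapply(split(culture_sim$diameter, culture_sim$medium), mean)

# Intercept = control mean; every other coefficient = that group's mean
# minus the control mean.
all.equal(coef(modl)[["(Intercept)"]], means[["control"]])
all.equal(unname(coef(modl)[-1]), unname(means[-1] - means[["control"]]))
```

Because the coefficients are comparisons with the first level only, the output does not test ‘with sugar’ against ‘with sugar + amino acids’ directly.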

10 Two-way ANOVA
Effect of two factors on wing length of butterflies
– Response: wing length
– Predictors: sex and spp (categorical)

Two possible models:

    mod <- lm(y ~ x1 * x2, data = mydata)   # 3 tests: 2 main effects and interaction
    mod <- lm(y ~ x1 + x2, data = mydata)   # 2 tests: 2 main effects

Stage 1:

    aov(y ~ x1 * x2, data = mydata)

Group means for wing length:

                 F.concocti  F.flappa
    females           31.37     24.67
    males             24.97     23.45

    mod <- aov(winglen ~ sex * spp, data = butter)
    summary(mod)
                Df Sum Sq Mean Sq F value   Pr(>F)
    sex          1 145.16 145.161  9.2717 0.004334 **
    spp          1 168.92 168.921 10.7893 0.002280 **
    sex:spp      1  67.08  67.081  4.2846 0.045692 *
    Residuals   36 563.63  15.656
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
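The difference between the `*` and `+` formulas can be made concrete by fitting both and comparing them. This is a sketch on simulated data: `butter_sim` is hypothetical, with 10 butterflies per group (matching the 36 residual degrees of freedom on the slide), cell means taken from the slide's table and an assumed standard deviation of 4.

```r
# Simulated stand-in for the butterfly data.
# expand.grid() varies rep fastest, then sex, then spp.
set.seed(4)
butter_sim <- expand.grid(rep = 1:10,
                          sex = c("females", "males"),
                          spp = c("F.concocti", "F.flappa"))
# Cell means in row order: females/concocti, males/concocti,
# females/flappa, males/flappa.
cell_mean <- c(31.37, 24.97, 24.67, 23.45)
butter_sim$winglen <- rnorm(40, mean = rep(cell_mean, each = 10), sd = 4)

full     <- lm(winglen ~ sex * spp, data = butter_sim)
additive <- lm(winglen ~ sex + spp, data = butter_sim)

# sex * spp is shorthand for sex + spp + sex:spp.
terms_full     <- attr(terms(full), "term.labels")      # "sex" "spp" "sex:spp"
terms_additive <- attr(terms(additive), "term.labels")  # "sex" "spp"

# Comparing the two nested models tests the interaction; the p-value is the
# same as the sex:spp row of the ANOVA table for the full model.
p_compare     <- anova(additive, full)[["Pr(>F)"]][2]
p_interaction <- anova(full)["sex:spp", "Pr(>F)"]
all.equal(p_compare, p_interaction)
```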

11 Two-way ANOVA

Group means for wing length:

                 F.concocti  F.flappa
    females           31.37     24.67
    males             24.97     23.45

Using aov()

    mod <- aov(winglen ~ sex * spp, data = butter)
    summary(mod)
                Df Sum Sq Mean Sq F value   Pr(>F)
    sex          1 145.16 145.161  9.2717 0.004334 **
    spp          1 168.92 168.921 10.7893 0.002280 **
    sex:spp      1  67.08  67.081  4.2846 0.045692 *
    Residuals   36 563.63  15.656
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Using lm()

    mod2 <- lm(winglen ~ sex * spp, data = butter)
    summary(mod2)

    Call:
    lm(formula = winglen ~ sex * spp, data = butter)

    Residuals:
       Min     1Q Median     3Q    Max
    -7.770 -3.095  0.090  2.920  6.530

    Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
    (Intercept)            31.370      1.251  25.071  < 2e-16 ***
    sexmales               -6.400      1.770  -3.617 0.000907 ***
    sppF.flappa            -6.700      1.770  -3.786 0.000560 ***
    sexmales:sppF.flappa    5.180      2.503   2.070 0.045692 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 3.957 on 36 degrees of freedom
    Multiple R-squared:  0.4034,  Adjusted R-squared:  0.3537
    F-statistic: 8.115 on 3 and 36 DF,  p-value: 0.0002949

    anova(mod2)
    Analysis of Variance Table

    Response: winglen
              Df Sum Sq Mean Sq F value   Pr(>F)
    sex        1 145.16 145.161  9.2717 0.004334 **
    spp        1 168.92 168.921 10.7893 0.002280 **
    sex:spp    1  67.08  67.081  4.2846 0.045692 *
    Residuals 36 563.63  15.656

Reading the output:
• The same: summary(mod) and anova(mod2) give the same table, and the interaction p-value (0.045692) also appears in summary(mod2)
• Each explanatory variable: tested by the rows of anova(mod2)
• Whole model: tested by the F-statistic line of summary(mod2)
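The coefficients in `summary(mod2)` and the table of group means carry the same information. As a worked check, the four group means can be rebuilt from the four estimates on the slide; the object names below are illustrative only.

```r
# Coefficient estimates copied from the summary(mod2) output on the slide.
b <- c(intercept   = 31.370,   # females, F.concocti (the reference group)
       sexmales    = -6.400,   # males minus females, within F.concocti
       sppF.flappa = -6.700,   # F.flappa minus F.concocti, within females
       interaction =  5.180)   # extra adjustment for males of F.flappa

females_concocti <- b[["intercept"]]
males_concocti   <- b[["intercept"]] + b[["sexmales"]]
females_flappa   <- b[["intercept"]] + b[["sppF.flappa"]]
males_flappa     <- sum(b)   # all four terms apply to males of F.flappa

# matrix() fills column by column: first F.concocti, then F.flappa.
cell_means <- matrix(
  c(females_concocti, males_concocti, females_flappa, males_flappa),
  nrow = 2,
  dimnames = list(c("females", "males"), c("F.concocti", "F.flappa"))
)
cell_means
#         F.concocti F.flappa
# females      31.37    24.67
# males        24.97    23.45
```

Without the interaction term the male F.flappa mean would be predicted as 31.37 − 6.40 − 6.70 = 18.27; the interaction estimate of 5.18 is the gap between that and the observed 23.45.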

12 What are linear models?
• Response can be continuous, discrete or categorical
• Predictors can be continuous or categorical
• The type of response (and its “errors”), the type of predictors and the relationship between them determine the type of model

13 Key points
• T-tests, ANOVA and regression are fundamentally the same
• Collectively they are the ‘general linear model’
• One explanatory variable: use summary()

    mod <- lm(response ~ explanatory1, data = mydata)
    summary(mod)

• Two explanatory variables: use summary() and anova()

    mod <- lm(response ~ explanatory1 * explanatory2, data = mydata)
    summary(mod)
    anova(mod)

• They can be extended to the ‘generalised linear model’ for different types of response
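One way to see the link to the generalised linear model: `glm()` with a Gaussian family and identity link (its default family) fits the same model as `lm()`. The sketch below uses simulated data; `sim`, `fit_lm` and `fit_glm` are illustrative names, and other response types would need a different `family`.

```r
# Simulated continuous response with one continuous predictor.
set.seed(5)
sim <- data.frame(x = runif(30, 0, 10))
sim$y <- 2 + 0.5 * sim$x + rnorm(30)

fit_lm  <- lm(y ~ x, data = sim)
fit_glm <- glm(y ~ x, family = gaussian(link = "identity"), data = sim)

# Same formula, same data, same coefficient estimates.
all.equal(coef(fit_lm), coef(fit_glm))
```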

14 Summary
– Regression, t-tests and ANOVA are linear models and have the assumptions of classical linear models
– summary() output:
  • Intercept estimate is the mean of the lowest factor level
  • There is a significance test for each estimate
  • There is a significance test for the model as a whole
