Linear regression and t-tests
Steve Bagley
somgen223.stanford.edu

Linear regression

Create data

    d <- tibble(height = 0:5,
                weight = 0.5 + 0:5 + runif(6, -0.5, 0.5))

• In this dataset, weight = 0.5 + height + a random error.
• runif(6, -0.5, 0.5) draws 6 random numbers from a uniform distribution between -0.5 and 0.5.
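
Because runif draws fresh values on every call, the numbers in the regression output will differ from run to run unless a seed is set. A minimal sketch of making the data reproducible and checking the error bounds; it assumes a base R data.frame in place of the tibble, and the seed 42 and the names d_demo and errors are illustrative, not from the slides:

```r
set.seed(42)  # arbitrary seed, chosen only for illustration

# Same construction as on the slide, using a base R data frame
d_demo <- data.frame(height = 0:5)
d_demo$weight <- 0.5 + d_demo$height + runif(6, -0.5, 0.5)

# Recover the random errors by subtracting the deterministic part
errors <- d_demo$weight - (0.5 + d_demo$height)
range(errors)  # both ends should lie within [-0.5, 0.5]
```

Re-running the block with the same seed reproduces the same six weights.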

Plot the data

    plot <- ggplot(d, aes(height, weight)) +
      geom_point() +
      geom_smooth(method = "lm", se = FALSE) +
      expand_limits(y = 0)
    plot

[Figure: scatter plot of weight (y-axis) against height (x-axis) with the fitted regression line]

How to do a linear regression

    reg <- lm(weight ~ height, data = d)
    reg

    Call:
    lm(formula = weight ~ height, data = d)

    Coefficients:
    (Intercept)       height
         0.4463       1.1150

• Note the use of ~ here: weight ~ height
• This is called the formula notation.
• The variable on the left is the dependent variable.
• The variable on the right is the independent variable.
• Both should be column names in the data argument.
• The result shows the y-intercept and the coefficient of the height variable.
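
For a single predictor, the two numbers lm reports can be checked against the closed-form least-squares estimates: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A small sketch of that check, using made-up data (the values and the names demo, fit, slope, and intercept are illustrative, not from the slides):

```r
# Hypothetical data, not the values from the slides
x <- c(0, 1, 2, 3, 4, 5)
y <- c(0.4, 1.7, 2.4, 3.8, 4.6, 5.4)
demo <- data.frame(height = x, weight = y)

# Fit with the formula notation: dependent ~ independent
fit <- lm(weight ~ height, data = demo)

# Closed-form least-squares estimates for one predictor
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# Compare lm's coefficients (names dropped) with the hand computation
all.equal(unname(coef(fit)), c(intercept, slope))
```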

How to get more information about the regression

    summary(reg)

    Call:
    lm(formula = weight ~ height, data = d)

    Residuals:
           1        2        3        4        5        6
    -0.01798  0.14520 -0.27676  0.14624  0.04688 -0.04359

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   0.4463     0.1272    3.51   0.0247 *
    height        1.1151     0.0420   26.55  1.2e-05 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.1757 on 4 degrees of freedom
    Multiple R-squared:  0.9944,  Adjusted R-squared:  0.9929
    F-statistic: 704.8 on 1 and 4 DF,  p-value: 1.197e-05
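
The printed summary is meant for reading, but the same quantities can be pulled out of the summary object for use in code. A minimal sketch, assuming made-up data in a base R data.frame (demo, fit, and s are illustrative names, not from the slides):

```r
# Hypothetical data, not the values from the slides
demo <- data.frame(height = 0:5,
                   weight = c(0.4, 1.7, 2.4, 3.8, 4.6, 5.4))
fit <- lm(weight ~ height, data = demo)
s <- summary(fit)

s$coefficients                        # matrix: Estimate, Std. Error, t value, Pr(>|t|)
s$coefficients["height", "Pr(>|t|)"]  # p-value for the slope
s$r.squared                           # Multiple R-squared
s$adj.r.squared                       # Adjusted R-squared
s$sigma                               # residual standard error
df.residual(fit)                      # residual degrees of freedom (6 points - 2 coefficients)
```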

How to extract the coefficients

    coefficients(reg)
    (Intercept)      height
      0.4463429   1.1150495

    coefficients(reg)[["(Intercept)"]]
    [1] 0.4463429

    coefficients(reg)[["height"]]
    [1] 1.115049

• coefficients returns a named vector.
• Use [[ ]] to extract the values without the names.
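
The difference between [ ] and [[ ]] on a named vector can be seen without fitting a model. A short sketch; the vector v is built by hand from the coefficient values printed above and is only a stand-in for coefficients(reg):

```r
# Stand-in for coefficients(reg), values copied from the slide output
v <- c("(Intercept)" = 0.4463429, height = 1.1150495)

v["height"]           # single brackets keep the name
v[["height"]]         # double brackets return the bare value

names(v["height"])    # the name survives
names(v[["height"]])  # no names left
```

Note that coef is a shorter alias for coefficients in base R, so coef(reg)[["height"]] is an equivalent spelling.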

Add regression line information

    plot +
      annotate("text", x = 1, y = 5,
               label = sprintf("y = %.4f + %.4f x",
                               coefficients(reg)[["(Intercept)"]],
                               coefficients(reg)[["height"]]))

[Figure: the scatter plot with regression line, annotated with the text "y = 0.4463 + 1.1150 x"]

Add regression line information (fancy)

    plot +
      annotate("text", x = 1, y = 5,
               label = sprintf("italic(y) == %.4f + %.4f * italic(x)",
                               coefficients(reg)[["(Intercept)"]],
                               coefficients(reg)[["height"]]),
               parse = TRUE)

[Figure: the scatter plot with regression line, annotated with the typeset equation y = 0.4463 + 1.115 x]

• See ?plotmath for details

Add other information (gratuitously ornate)

    plot +
      annotate("text", x = 1, y = 5,
               label = "e^{pi * i} + 1 == 0", parse = TRUE)

[Figure: the scatter plot with regression line, annotated with Euler's identity, e^{πi} + 1 = 0]

• See ?plotmath for details

Adding the regression info using package ggpubr

    library(ggpubr)
    ggscatter(d, x = "height", y = "weight",
              add = "reg.line", add.params = list(color = "blue")) +
      stat_regline_equation(label.x = 1, label.y = 5) +
      stat_cor(label.x = 1, label.y = 4.7)

[Figure: scatter plot of weight against height with a blue regression line, labelled "y = 0.45 + 1.1 x" and "R = 1, p = 1.2e-05"]

Simple statistical tests

Create data

    set.seed(13)
    n <- 50
    d2 <- tibble(value = c(rnorm(n, mean = 10, sd = 2),
                           rnorm(n, mean = 11, sd = 2)),
                 group = c(rep("control", times = n),
                           rep("treatment", times = n)))
    head(d2)
    # A tibble: 6 x 2
      value group
      <dbl> <chr>
    1 11.1  control
    2  9.44 control
    3 13.6  control
    4 10.4  control
    5 12.3  control
    6 10.8  control

• rnorm generates random numbers from a Gaussian distribution.
• rep builds a vector by repeating values.

Plot the data

    ggplot(d2, aes(value, color = group)) +
      geom_histogram(aes(fill = group), position = "dodge", binwidth = 0.5)

[Figure: dodged histogram of value (x-axis) against count (y-axis), colored by group (control, treatment)]

Two sample t-test

    d2_x <- d2 %>% filter(group == "control") %>% pull(value)
    d2_y <- d2 %>% filter(group == "treatment") %>% pull(value)
    t.test(d2_x, d2_y)

        Welch Two Sample t-test

    data:  d2_x and d2_y
    t = -2.247, df = 97.824, p-value = 0.02689
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -1.6164644 -0.1002824
    sample estimates:
    mean of x mean of y
     9.947163 10.805536

• Called this way, t.test takes two numeric vectors, not a data frame.
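
Besides the two-vector form, t.test also accepts a formula with a data argument, which avoids the filter and pull steps. A sketch comparing the two calls; it rebuilds the data with a base R data.frame rather than a tibble, and the names d2_demo, by_vectors, and by_formula are illustrative, not from the slides:

```r
# Rebuild the two-group data with the same seed and parameters as the slide
set.seed(13)
n <- 50
d2_demo <- data.frame(
  value = c(rnorm(n, mean = 10, sd = 2), rnorm(n, mean = 11, sd = 2)),
  group = rep(c("control", "treatment"), each = n)
)

# Two-vector form, as on the slide (base R subsetting instead of filter/pull)
by_vectors <- t.test(d2_demo$value[d2_demo$group == "control"],
                     d2_demo$value[d2_demo$group == "treatment"])

# Formula form: numeric response on the left, two-level grouping variable on the right
by_formula <- t.test(value ~ group, data = d2_demo)

# Both calls run the same Welch test, so the p-values should agree
all.equal(by_vectors$p.value, by_formula$p.value)
```

The formula form orders the groups alphabetically, so "control" plays the role of x and "treatment" the role of y, matching the two-vector call.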