Linear regression and t-tests Steve Bagley somgen223.stanford.edu

Linear regression

Create data

    d <- tibble(height = 0:5, weight = 0.5 + 0:5 + runif(6, -0.5, 0.5))

• In this dataset, weight = 0.5 + height + some random error.
• runif generates random numbers from a uniform distribution.
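Because runif draws new numbers on every call, the data (and every number derived from them) change from run to run. The following is a minimal sketch of making the simulation reproducible. The seed value 1, the name d_demo, and the use of a base data.frame instead of a tibble are assumptions made here for a self-contained example, not part of the slides.

```r
# Fix the random seed so the simulated data are identical on every run.
# The seed value 1 is an arbitrary choice, not taken from the slides.
set.seed(1)

height <- 0:5
noise  <- runif(6, -0.5, 0.5)   # uniform errors between -0.5 and 0.5

# Base data.frame used here so the example needs no extra packages.
d_demo <- data.frame(height = height,
                     weight = 0.5 + height + noise)

# Largest distance of any point from the true line weight = 0.5 + height;
# by construction it cannot exceed 0.5.
max_error <- max(abs(d_demo$weight - (0.5 + d_demo$height)))
max_error
```

Re-running the whole block gives the same d_demo each time; removing set.seed gives a different data set on each run.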

Plot the data

    plot <- ggplot(d, aes(height, weight)) +
      geom_point() +
      geom_smooth(method = "lm", se = FALSE) +
      expand_limits(y = 0)
    plot

[Figure: scatter plot of weight (y-axis) against height (x-axis) with the fitted regression line]

How to do a linear regression

    reg <- lm(weight ~ height, data = d)
    reg

    Call:
    lm(formula = weight ~ height, data = d)

    Coefficients:
    (Intercept)       height
         0.4463       1.1150

• Note use of ~ here: weight ~ height
• This is called the formula notation.
• The variable on the left is the dependent variable.
• The variable on the right is the independent variable.
• They should be column names in the data argument.
• The result shows the y-intercept and the coefficient of the height variable.
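For a single predictor, the least-squares line has a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below checks that these formulas agree with lm. The seed and the variable names x, y and fit are assumptions chosen for illustration; the data are simulated the same way as on the "Create data" slide.

```r
set.seed(1)                          # arbitrary seed, for a reproducible example
x <- 0:5
y <- 0.5 + x + runif(6, -0.5, 0.5)   # same recipe as the slide's data

# Closed-form least-squares estimates for one predictor
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# The same fit using the formula notation
fit <- lm(y ~ x)

# coef(fit) is ordered (Intercept), x; compare with the hand computation
agree <- all.equal(unname(coef(fit)), c(intercept, slope))
agree
```

The comparison uses all.equal rather than == because the two computations can differ in the last few floating-point digits.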

How to get more information about the regression

    summary(reg)

    Call:
    lm(formula = weight ~ height, data = d)

    Residuals:
           1        2        3        4        5        6
    -0.01798  0.14520 -0.27676  0.14624  0.04688 -0.04359

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   0.4463     0.1272    3.51   0.0247 *
    height        1.1151     0.0420   26.55  1.2e-05 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.1757 on 4 degrees of freedom
    Multiple R-squared:  0.9944,  Adjusted R-squared:  0.9929
    F-statistic: 704.8 on 1 and 4 DF,  p-value: 1.197e-05
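The printed summary is also available as an object, so individual numbers can be pulled out rather than copied by eye. The following sketch assumes freshly simulated data (seed 1 and the names demo, fit and s are illustrative, so the values will not match the output above). It also shows that each t value is the estimate divided by its standard error, which is how 1.1151 / 0.0420 gives the reported 26.55.

```r
set.seed(1)                                   # arbitrary seed for the example
demo <- data.frame(height = 0:5)
demo$weight <- 0.5 + demo$height + runif(6, -0.5, 0.5)

fit <- lm(weight ~ height, data = demo)
s   <- summary(fit)

# Coefficient table as a matrix with columns
# Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(s)
coef_table["height", "Estimate"]

s$r.squared      # multiple R-squared
s$sigma          # residual standard error
s$df[2]          # residual degrees of freedom (6 points - 2 coefficients)

# Each t value is the estimate divided by its standard error
t_by_hand <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]
t_by_hand
```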

How to extract the coefficients

    coefficients(reg)
    (Intercept)      height
      0.4463429   1.1150495

    coefficients(reg)[["(Intercept)"]]
    [1] 0.4463429

    coefficients(reg)[["height"]]
    [1] 1.115049

• coefficients returns a named vector.
• Use [[ ]] to extract the values without the names.
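The difference between [ ] and [[ ]] on a named vector can be seen without fitting a model. This small sketch builds a named vector v by hand, using the two coefficient values printed on the slide; the name v is chosen here for illustration.

```r
# A named numeric vector with the same shape as coefficients(reg);
# the values are the ones printed on the slide.
v <- c("(Intercept)" = 0.4463429, height = 1.1150495)

single <- v["height"]     # single brackets keep the name
double <- v[["height"]]   # double brackets return the bare number

names(single)             # "height"
names(double)             # NULL

# unname() strips the names from the whole vector at once
unname(v)
```

In base R, coef(reg) is the same function under a shorter name: coefficients is documented as an alias for coef.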

Add regression line information

    plot +
      annotate("text", x = 1, y = 5,
               label = sprintf("y = %.4f + %.4f x",
                               coefficients(reg)[["(Intercept)"]],
                               coefficients(reg)[["height"]]))

[Figure: the scatter plot of weight against height with the text label "y = 0.4463 + 1.1150 x"]

Add regression line information (fancy)

    plot +
      annotate("text", x = 1, y = 5,
               label = sprintf("italic(y) == %.4f + %.4f * italic(x)",
                               coefficients(reg)[["(Intercept)"]],
                               coefficients(reg)[["height"]]),
               parse = TRUE)

[Figure: the scatter plot of weight against height with the typeset label y = 0.4463 + 1.115 x]

• See ?plotmath for details
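With parse = TRUE the label string has to be a valid R expression, which is why the fancy version uses == and * where the plain version used = and a space. A quick way to check a label before plotting is to build it with sprintf and pass it to parse, as sketched below. The two numbers are the coefficients from the slides; the names label and expr are illustrative.

```r
# Build the plotmath label from the two coefficients reported on the slides
label <- sprintf("italic(y) == %.4f + %.4f * italic(x)",
                 0.4463429, 1.1150495)
label

# parse() raises an error if the string is not a valid R expression,
# so this is a cheap check before handing the label to annotate()
expr <- parse(text = label)
class(expr)
```

Once parsed, 1.1150 is an ordinary number, so its trailing zero is dropped when it is drawn. That is why the fancy plot shows 1.115 while the plain-text label shows 1.1150.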

Add other information (gratuitously ornate)

    plot +
      annotate("text", x = 1, y = 5,
               label = "e^{pi * i} + 1 == 0",
               parse = TRUE)

[Figure: the scatter plot of weight against height with Euler's identity, e^(πi) + 1 = 0, typeset as the label]

• See ?plotmath for details

Adding the regression info using package ggpubr

    library(ggpubr)
    ggscatter(d, x = "height", y = "weight",
              add = "reg.line",
              add.params = list(color = "blue")) +
      stat_regline_equation(label.x = 1, label.y = 5) +
      stat_cor(label.x = 1, label.y = 4.7)

[Figure: scatter plot of weight against height with a blue regression line and the labels "y = 0.45 + 1.1 x" and "R = 1, p = 1.2e-05"]

Simple statistical tests

Create data

    set.seed(13)
    n <- 50
    d2 <- tibble(value = c(rnorm(n, mean = 10, sd = 2),
                           rnorm(n, mean = 11, sd = 2)),
                 group = c(rep("control", times = n),
                           rep("treatment", times = n)))
    head(d2)

    # A tibble: 6 x 2
      value group
      <dbl> <chr>
    1 11.1  control
    2  9.44 control
    3 13.6  control
    4 10.4  control
    5 12.3  control
    6 10.8  control

• rnorm generates random numbers from a Gaussian distribution.
• rep builds a vector by repeating values.
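The two helper functions can be tried on their own. The sketch below shows the two common ways of calling rep and checks that a sample drawn with rnorm has roughly the requested mean and standard deviation. The names labels_times, labels_each and x are illustrative; the seed 13 is the one used on the slide.

```r
# rep with times repeats the whole vector; with each it repeats every element
labels_times <- rep("control", times = 3)
labels_each  <- rep(c("control", "treatment"), each = 2)
labels_times
labels_each

# rnorm draws from a Gaussian with the given mean and standard deviation
set.seed(13)                       # same seed as the slide
x <- rnorm(50, mean = 10, sd = 2)
mean(x)                            # close to 10, but not exactly 10
sd(x)                              # close to 2, but not exactly 2
```

With only 50 draws, the sample mean and standard deviation are estimates and will differ somewhat from the values passed to rnorm.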

Plot the data

    ggplot(d2, aes(value, color = group)) +
      geom_histogram(aes(fill = group), position = "dodge", binwidth = 0.5)

[Figure: dodged histogram of value (x-axis) against count (y-axis), colored by group (control, treatment)]

Two sample t-test

    d2_x <- d2 %>% filter(group == "control") %>% pull(value)
    d2_y <- d2 %>% filter(group == "treatment") %>% pull(value)
    t.test(d2_x, d2_y)

            Welch Two Sample t-test

    data:  d2_x and d2_y
    t = -2.247, df = 97.824, p-value = 0.02689
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -1.6164644 -0.1002824
    sample estimates:
    mean of x mean of y
     9.947163 10.805536

• Called this way, t.test takes two numeric vectors, not a data frame.
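t.test also accepts the formula notation introduced for lm, which avoids extracting the two vectors first. The sketch below rebuilds the data with a base data.frame (the names demo, welch, student, fit and lm_t are assumptions for illustration) and also shows how the two halves of this deck connect: the equal-variance t-test gives the same t statistic, up to sign, as a linear regression of value on group.

```r
set.seed(13)                       # same seed as the slide
n <- 50
demo <- data.frame(
  value = c(rnorm(n, mean = 10, sd = 2), rnorm(n, mean = 11, sd = 2)),
  group = rep(c("control", "treatment"), each = n)
)

# Formula interface: Welch test straight from the data frame
welch <- t.test(value ~ group, data = demo)

# Equal-variance (Student) version of the same test
student <- t.test(value ~ group, data = demo, var.equal = TRUE)

# Regressing value on group estimates the difference in group means
fit  <- lm(value ~ group, data = demo)
lm_t <- coef(summary(fit))["grouptreatment", "t value"]

# t.test reports control minus treatment, lm reports treatment minus
# control, so the two statistics differ only in sign
same_t <- all.equal(unname(student$statistic), -lm_t)
same_t
```

The Welch test (the default) does not assume equal variances, so its degrees of freedom are generally fractional and its statistic matches the regression only when var.equal = TRUE is set.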
