STAT 213 Regression Inference II
Colin Reimer Dawson
Oberlin College
18 February 2016
Outline
• Key Ideas: Last Time
  • Influence and Outliers
• Regression Inference
  • Simulation Approaches
  • Partitioning Variability
Reading Quiz

A regression equation was fit to a set of data for which the correlation, r, between X and Y was 0.6. Which of the following must be true?

(a) The slope of the regression line is 0.6.
(b) The regression model explains 60% of the variability in Y.
(c) The regression model explains 36% of the variability in Y.
(d) At least half of the residuals are smaller than 0.6 in absolute value.
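(A quick arithmetic note, not on the original slide: in simple linear regression the coefficient of determination equals the squared correlation, so R² = r² = 0.6² = 0.36, i.e., 36% of the variability in Y.)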
For Tuesday...

• Write and turn in: Ex. 1.10, 1.12, 1.26, 2.14a, 2.34
• Read: Ch. 2.4, 4.6
• Answer:
  1. Exercise 2.5
  2. Exercise 2.6
  3. In a randomization distribution to test whether a regression slope is significantly different from zero, the P-value is the proportion of ________ obtained by ________ that exceed ________.
Transformations and Outliers

Data transformations can be used to
• Address non-linearity
• Stabilize (homogenize) variance
• “Unskew” the residual distribution
• Reduce the influence of outliers
Brain and Body Weight of Terrestrial Mammals

library(mosaic)
BrainBodyWeight <- read.file(
  "http://colinreimerdawson.com/data/BrainBodyWeight.csv")
xyplot(
  brain.weight.grams ~ body.weight.kilograms,
  data = BrainBodyWeight,
  type = c("p", "r"))

[Scatterplot of brain.weight.grams vs. body.weight.kilograms with fitted regression line]
Brain and Body Weight of Terrestrial Mammals

brain.model <- lm(brain.weight.grams ~ body.weight.kilograms,
                  data = BrainBodyWeight)
par(mfrow = c(1, 2))  # to create a 1-by-2 plotting grid
plot(brain.model, which = 1)  # residuals by predicted
plot(brain.model, which = 2)  # quantile-quantile

[Residuals vs Fitted and Normal Q-Q plots for brain.model]
Log Brain and Log Body Weight

xyplot(
  log(brain.weight.grams) ~ log(body.weight.kilograms),
  data = BrainBodyWeight,
  type = c("p", "r"))

[Scatterplot of log(brain.weight.grams) vs. log(body.weight.kilograms) with fitted regression line]
Log Brain and Log Body Weight

log.brain.model <- lm(log(brain.weight.grams) ~ log(body.weight.kilograms),
                      data = BrainBodyWeight)
par(mfrow = c(1, 2))
plot(log.brain.model, which = 1)  # residuals by predicted
plot(log.brain.model, which = 2)  # quantile-quantile

[Residuals vs Fitted and Normal Q-Q plots for log.brain.model]
Percent Brain Weight by Body Weight

library(mosaic)
transform(
  BrainBodyWeight,
  # brain weight as a proportion of body weight (grams / grams)
  percent.brain = brain.weight.grams / (body.weight.kilograms * 1000)
) %>%
  xyplot(
    log(percent.brain) ~ log(body.weight.kilograms),
    data = .,
    type = c("p", "r"))

[Scatterplot of log(percent.brain) vs. log(body.weight.kilograms) with fitted regression line]
Percent Brain Weight by Body Weight

[Residuals vs Fitted and Normal Q-Q plots for the percent-brain model]
Unusual Cases

Detecting unusual cases:
• Residual plots
• Standardized/Studentized residuals
• Leverage measurement
(a short R sketch of these follows below)
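A minimal sketch (not from the slides) of computing these diagnostics for the log-log brain-weight model fit above; rstandard(), rstudent(), and hatvalues() are base-R functions.

plot(log.brain.model, which = 1)   # residual plot (residuals vs. fitted)
head(rstandard(log.brain.model))   # standardized residuals
head(rstudent(log.brain.model))    # studentized residuals
head(hatvalues(log.brain.model))   # leverage (hat) values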
Men’s Long Jump

library(Stat2Data)
data(LongJumpOlympics)
xyplot(
  Gold ~ Year,
  data = LongJumpOlympics,
  type = c("p", "r"),
  groups = (Year == 1968)  ## highlight the outlier
)

[Scatterplot of Gold vs. Year with fitted regression line; the 1968 case is highlighted]
Men’s Long Jump: Residuals

long.jump.model <- lm(Gold ~ Year, data = LongJumpOlympics)
par(mfrow = c(1, 2))
plot(long.jump.model, which = 1)  # residuals vs. fitted
plot(long.jump.model, which = 2)  # normal quantile-quantile

[Residuals vs Fitted and Normal Q-Q plots for long.jump.model; one case stands out with a large positive residual]
Influence

Two characteristics contribute to the influence of a data point on the regression line:
1. Distance in Y from the trend (think: the residual for a line fit without that point)
2. Distance of X from X̄ (think: distance from the center on a see-saw)
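A minimal sketch (not from the slides) of gauging influence directly: refit the long-jump regression without the unusual 1968 case and compare slopes. Variable names follow the LongJumpOlympics data used above.

no.1968.model <- lm(Gold ~ Year, data = subset(LongJumpOlympics, Year != 1968))
coef(long.jump.model)["Year"]  # slope using all cases
coef(no.1968.model)["Year"]    # slope with the 1968 case left out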
Standardized and Studentized Residuals

Standardized residuals:

    (y_i − ŷ_i) / σ̂_ε                (1)

“Studentized” residuals:

    (y_i − ŷ_i) / σ̂_ε(i)             (2)

where σ̂_ε(i) is the standard deviation of all the residuals other than residual i.
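A minimal sketch (not from the slides) of formula (2) for a single case of the long-jump model, taking σ̂_ε(i) as the residual standard error from a refit without case i. Note that R’s built-in rstudent() also divides by sqrt(1 − leverage), so its value will differ slightly from this simplified version.

i <- which.max(abs(resid(long.jump.model)))  # the case with the largest residual
loo.model <- lm(Gold ~ Year, data = LongJumpOlympics[-i, ])  # refit leaving out case i
resid(long.jump.model)[i] / summary(loo.model)$sigma  # formula (2): e_i / sigma.hat_(i)
rstudent(long.jump.model)[i]  # R's version (also adjusts for leverage)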