STAT 113 Simple Linear Regression Colin Reimer Dawson Oberlin College Sept. 16, 2015
Outline Prediction What’s a Good Prediction? Linear Prediction Equation Prediction Error Regression to the Mean
Prediction
◮ Correlations give us a description of the relationship between two numeric variables.
◮ However, when two variables are related, we can go further and use knowledge of one to make predictions about the other.
◮ Examples:
  ◮ Use SAT scores to predict college GPA
  ◮ Use economic indicators to predict stock prices
  ◮ Use credit score to predict probability of default on a loan
  ◮ Use biomarkers to predict disease progression
◮ What else?
What’s a Good Prediction?
[Figure: scatterplot of Y against X]
◮ Suppose I have this data.
◮ What would be a good prediction if I get a new X value of 12?
◮ What about an X value of 5.5?
Modeling relationships with a function
◮ We can capture all of our predictions by writing the y variable as a function of the x variable.
◮ Examples:
  ◮ $f(x) = x^2$
  ◮ $f(x) = 1.6x + 20$
  ◮ $f(x) = 5\cos(2\pi x)$
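A prediction function is just a rule that turns an x value into a predicted y. As a minimal Python sketch, the three example functions can be written out and evaluated at a new x value. The names `quadratic`, `linear`, and `cosine` are labels chosen here for illustration, not names from the slides.

```python
import math

def quadratic(x):
    """f(x) = x^2"""
    return x ** 2

def linear(x):
    """f(x) = 1.6x + 20"""
    return 1.6 * x + 20

def cosine(x):
    """f(x) = 5 cos(2 pi x)"""
    return 5 * math.cos(2 * math.pi * x)

# Each function gives its own prediction for the same new x value.
new_x = 12
for name, f in [("quadratic", quadratic), ("linear", linear), ("cosine", cosine)]:
    print(name, f(new_x))
```

Whether any of these is a good prediction depends on how well the function matches the data, which is the question the following slides take up.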
What’s a Good Prediction?
[Figure: scatterplot of Y against X with a candidate prediction function]
How about this function?
What’s a Good Prediction?
[Figure: scatterplot of Y against X with a candidate prediction function]
Or this?
What’s a Good Prediction?
[Figure: scatterplot of Y against X with a candidate prediction function]
◮ What about this?
◮ There’s a tradeoff between how well we can fit the data and how simple our model (i.e., prediction function) is.
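To make the tradeoff concrete, here is a rough Python sketch comparing a straight line with a curve that passes through every point. The data values are hypothetical (they are not the points plotted on the slide), the line is fit by least squares, and the wiggly curve is a Lagrange interpolating polynomial, chosen here only as one example of a very flexible model.

```python
def fit_line(xs, ys):
    """Return the least-squares (intercept, slope) for a straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def interpolate(xs, ys, x_new):
    """Evaluate the Lagrange polynomial that passes through every point."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x_new - xj) / (xi - xj)
        total += yi * weight
    return total

# Hypothetical data, made up for illustration.
xs = [4, 6, 8, 10, 14]
ys = [4.5, 7.0, 6.5, 9.0, 10.0]

a, b = fit_line(xs, ys)

# Sum of squared errors on the observed points for each model.
sse_line = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
sse_curve = sum((y - interpolate(xs, ys, x)) ** 2 for x, y in zip(xs, ys))
print("line  SSE:", sse_line)
print("curve SSE:", sse_curve)

# Predictions at a new x value that was not in the data.
print("line  prediction at x=12:", a + b * 12)
print("curve prediction at x=12:", interpolate(xs, ys, 12))
```

The curve fits the observed points perfectly, but it needs one coefficient per data point and can swing widely between them. The line misses some points but is described by just two numbers.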
What’s a Good Prediction?
[Figure: scatterplot of Y against X with a straight line]
◮ Pretty much the simplest model we can have is a straight line.
◮ Two things determine what line we have:
  ◮ The intercept
  ◮ The slope
Intercept Slope Form
◮ The intercept and slope are the parameters of our regression model.
◮ The general equation for a line is: $f(x) = a + bx$
◮ In statistics notation, we write $\hat{y}$ (“y hat”) to represent a predicted (or fitted) value.
◮ Given a value $x_i$, we predict using: $\hat{y}_i = \hat{a} + \hat{b} x_i$
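As a small illustration of the prediction equation, the Python sketch below plugs x values into a line. The intercept 3.0 and slope 0.5 are hypothetical values picked for the example, not estimates from the slide's data, and `predict` is a name chosen here.

```python
def predict(x, intercept, slope):
    """Return the fitted value y-hat = intercept + slope * x."""
    return intercept + slope * x

# Hypothetical parameter values, chosen only for illustration.
a_hat = 3.0
b_hat = 0.5

# The two new X values asked about earlier in the slides.
for x in [5.5, 12]:
    print("x =", x, "-> y-hat =", predict(x, a_hat, b_hat))
```

Once the two parameters are fixed, every prediction follows from them, which is why the intercept and slope fully describe the model.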
Hat Notation Figure: Source: brownsharpie.com
Systematic vs. Random
◮ We can split up each y value into two parts: a systematic (predictable) part and a “random” part.
◮ That is, we can write, for the y coordinate of the $i$th data point: $y_i = \hat{y}_i + \text{Error}_i$
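The decomposition can be checked numerically. This Python sketch uses hypothetical data and a hypothetical line (intercept 3.0, slope 0.5), computes the fitted value and the error for each point, and confirms that the two parts add back up to the observed y.

```python
# Hypothetical data and line, made up for illustration.
xs = [4, 6, 8, 10, 12]
ys = [5.0, 5.5, 7.5, 8.0, 9.5]
a_hat = 3.0
b_hat = 0.5

# Systematic part: the value the line predicts at each x.
fitted = [a_hat + b_hat * x for x in xs]

# "Random" part: whatever is left over after the prediction.
errors = [y - f for y, f in zip(ys, fitted)]

for x, y, f, e in zip(xs, ys, fitted, errors):
    print(f"x={x:>2}  y={y:4.1f}  y-hat={f:4.1f}  error={e:+.1f}")
```

A positive error means the point lies above the line, and a negative error means it lies below.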
What’s a Good Prediction?
[Figure: scatterplot of Y against X with a candidate line]
Every line will have a different set of errors associated with it.
What’s a Good Prediction?
[Figure: scatterplot of Y against X with a candidate line]
◮ Every line will have a different set of errors associated with it.
◮ Which is best?
◮ Intuitively, we want to minimize the overall “distance” between the line and the points.
The Prediction Equation
Prediction Function: $\hat{y}_i = \hat{a} + \hat{b} x_i$
Pick $\hat{a}$ and $\hat{b}$ that minimize the total distance. This is a calculus problem that the computer solves for us.
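The slides do not say exactly which "total distance" is minimized. The usual choice in simple linear regression is the sum of squared errors (least squares), and the Python sketch below assumes that. It uses the standard closed-form solution on hypothetical data, then nudges the intercept and slope to show that the total gets worse. The function names are chosen here for illustration.

```python
def sum_squared_errors(xs, ys, intercept, slope):
    """Total squared vertical distance between the line and the points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Closed-form least-squares (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data, made up for illustration.
xs = [4, 6, 8, 10, 12]
ys = [5.0, 5.5, 7.5, 8.0, 9.5]

a_hat, b_hat = least_squares(xs, ys)
best = sum_squared_errors(xs, ys, a_hat, b_hat)
print("intercept:", a_hat, "slope:", b_hat, "SSE:", best)

# Any small change to the intercept or slope increases the total.
nudged = [
    sum_squared_errors(xs, ys, a_hat + da, b_hat + db)
    for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]
]
print("SSE after nudging:", nudged)
```

In practice statistical software performs this calculation, as the slide notes. The closed form is shown only to make clear what is being computed.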
Regression Example
[Figure: scatterplot of Child's Adult Height (in.) against Mid-parent Height (in.), with regression line]
◮ The “father of regression”, Francis Galton, looked at parents’ and children’s heights.
◮ Here’s his data, with the associated regression line.
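Regression to the mean can be seen in the fitted line itself. The least-squares slope equals $r \cdot s_y / s_x$, so a parent who is 2 standard deviations above average has a predicted child only $2r$ standard deviations above average. The Python sketch below checks this on simulated heights. The data are randomly generated, not Galton's, and the mean of 68 inches, the spreads, and the 0.6 factor are all assumptions made for the example.

```python
import math
import random

random.seed(0)

# Simulated heights in inches (NOT Galton's data; parameters are assumptions).
parents = [random.gauss(68, 2) for _ in range(500)]
children = [68 + 0.6 * (p - 68) + random.gauss(0, 1.8) for p in parents]

n = len(parents)
mean_x = sum(parents) / n
mean_y = sum(children) / n
sxx = sum((x - mean_x) ** 2 for x in parents)
syy = sum((y - mean_y) ** 2 for y in children)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(parents, children))

# Sample standard deviations, correlation, and least-squares line.
s_x = math.sqrt(sxx / (n - 1))
s_y = math.sqrt(syy / (n - 1))
r = sxy / math.sqrt(sxx * syy)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Prediction for a parent 2 standard deviations above the parent mean.
x_tall = mean_x + 2 * s_x
y_hat = intercept + slope * x_tall
z_pred = (y_hat - mean_y) / s_y

print("correlation r:", r)
print("parent z-score: 2.0, predicted child z-score:", z_pred)
```

Because the correlation is below 1, the predicted child is still taller than average but less extreme than the parent.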
Example: Batting Average in Successive Seasons
Figure: Source: https://www.washingtonpost.com/opinions/why-our-childrens-future-no-longer-looks-so-bright/2011/10/14/gIQAofzlpL_story.html
“This fall, Lafley will step down for the second time, and no one will be mentioning Steve Jobs’s legendary return to Apple. Lafley hasn’t been bad – he slimmed the company down, selling off parts and getting out of less profitable businesses – but there’s been no dramatic turnaround. ... In other words, he’s been just O.K. How could someone who, according to Fortune, was known as ‘an all-time C.E.O. hero’ end up being just O.K.? Well, if commentators had looked at the track record of returning C.E.O.s – boomerang C.E.O.s, as they’re sometimes called – that’s precisely what they’d have predicted. A 2014 study found that profitability at companies run by boomerang C.E.O.s fell slightly, and an earlier study detected no significant difference in long-term performance between firms that reappointed a former C.E.O. and ones that hired someone new.”
Figure: Source: http://www.newyorker.com/magazine/2015/09/21/the-comeback-conundrum