Chapter 12: The Regression Line
We already know that the regression line goes through the point of ____________ and has slope = _________.
The equation for the line is
Y = intercept + (slope) X
where slope = r × (SD of Y) / (SD of X) and intercept = ave Y – (slope)(ave X)
Slope and intercept
The equation can be used to get a prediction by putting in the value for X and getting out the predicted value for Y.
The regression equation: put in X → Y = intercept + (slope) X → get out Y
Example 1:
Midterm: ave = 65, SD = 16
Final: ave = 60, SD = 10
r = 0.7
Find the equation of the regression line for estimating final exam score from midterm score. Estimate the final exam score for someone who got 50 on the midterm.
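One way to work Example 1 is sketched below in Python. The function name `regression_line` and the variable names are illustrative choices, not part of the slides; the formulas are slope = r × (SD of Y) / (SD of X) and intercept = ave Y – (slope)(ave X).

```python
def regression_line(ave_x, sd_x, ave_y, sd_y, r):
    """Return (slope, intercept) of the regression line for predicting y from x."""
    slope = r * sd_y / sd_x               # r x (SD of Y) / (SD of X)
    intercept = ave_y - slope * ave_x     # line passes through the point of averages
    return slope, intercept


# Example 1: x = midterm score, y = final exam score
slope, intercept = regression_line(ave_x=65, sd_x=16, ave_y=60, sd_y=10, r=0.7)

# Plug the midterm score into the equation to get the estimated final score
predicted_final = intercept + slope * 50

print(f"slope     = {slope:.4f} final points per midterm point")
print(f"intercept = {intercept:.4f} points")
print(f"estimated final for a midterm of 50: {predicted_final:.1f}")
```

The estimate comes out to about 53.4, which is closer to the final average of 60 than 50 is to the midterm average of 65, as expected with r below 1.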
Example 2:
For the men aged 18-24 in the HANES sample, the relationship between height and systolic blood pressure can be summarized as follows:
Average height ≈ 70", SD ≈ 3"
Average b.p. ≈ 124 mm, SD ≈ 14 mm
r = -0.2
Find the equation of the line for estimating blood pressure from height. Predict the blood pressure of a man who is 68" tall.
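As a cross-check on Example 2, the sketch below computes the prediction two ways: with the regression method in standard units, and with the regression equation. The variable names are illustrative; the two answers should agree because the line is just the regression method written as an equation.

```python
# Summary statistics from Example 2
ave_height, sd_height = 70.0, 3.0    # inches
ave_bp, sd_bp = 124.0, 14.0          # mm
r = -0.2

height = 68.0

# Method 1: regression method in standard units
z_height = (height - ave_height) / sd_height   # SDs above (+) or below (-) average
z_bp = r * z_height                            # predicted b.p. in standard units
predicted_bp = ave_bp + z_bp * sd_bp

# Method 2: regression equation
slope = r * sd_bp / sd_height                  # mm per inch
intercept = ave_bp - slope * ave_height        # mm
line_prediction = intercept + slope * height

print(f"slope = {slope:.3f} mm per inch, intercept = {intercept:.2f} mm")
print(f"regression method: {predicted_bp:.1f} mm")
print(f"regression line:   {line_prediction:.1f} mm")
```

Both give about 125.9 mm. Because r is negative, a man who is shorter than average is predicted to have slightly higher than average blood pressure.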
Example 3: California men, aged 25-29 in 2005
Education (years): ave = 12.5, SD = 3
Income: ave = $30,000, SD = $24,000
r = 0.25
Find the equation of the line for estimating income from education. Estimate the income of a California man with 4 years of education.
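A short Python sketch of Example 3 follows. It also checks that the line passes through the point of averages, which is a quick way to catch arithmetic slips. The variable names are illustrative.

```python
# Summary statistics from Example 3 (California men, aged 25-29 in 2005)
ave_educ, sd_educ = 12.5, 3.0               # years
ave_income, sd_income = 30_000.0, 24_000.0  # dollars
r = 0.25

slope = r * sd_income / sd_educ             # dollars per year of education
intercept = ave_income - slope * ave_educ   # dollars

# Sanity check: at average education the line should give average income
income_at_average = intercept + slope * ave_educ

# Estimate for a man with 4 years of education
income_at_4_years = intercept + slope * 4

print(f"income = {intercept:,.0f} + {slope:,.0f} x (years of education)")
print(f"at 12.5 years: ${income_at_average:,.0f}")
print(f"at 4 years:    ${income_at_4_years:,.0f}")
```

The slope works out to $2,000 per year of education and the estimate for 4 years is $13,000.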
California men
Example 4: California women, aged 25-29 in 2005
Education (years): ave = 13, SD = 3.4
Income: ave = $18,000, SD = $20,000
r = 0.37
Find the equation of the line for estimating income from education. Estimate the income of a California woman with 4 years of education.
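The same calculation for Example 4 is sketched below, with illustrative variable names. Here the estimate for 4 years of education comes out negative, which is not a sensible income. Four years is roughly 2.6 SDs below the average of 13, so the line is being stretched far from where most of the data lie.

```python
# Summary statistics from Example 4 (California women, aged 25-29 in 2005)
ave_educ, sd_educ = 13.0, 3.4               # years
ave_income, sd_income = 18_000.0, 20_000.0  # dollars
r = 0.37

slope = r * sd_income / sd_educ             # dollars per year of education
intercept = ave_income - slope * ave_educ   # dollars

years = 4
estimated_income = intercept + slope * years

# How far from average is 4 years, in standard units?
z_educ = (years - ave_educ) / sd_educ

print(f"slope     = ${slope:,.0f} per year of education")
print(f"intercept = ${intercept:,.0f}")
print(f"estimate at {years} years: ${estimated_income:,.0f}")
print(f"{years} years is {z_educ:.2f} SDs from the average")
```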
California women
Caution!
For an observational study, the regression line describes the data that you see, but it can NOT be relied on for predicting the results of INTERVENTIONS. In other words, we cannot treat it as a causal relationship.
e.g. For California women, the slope says that ASSOCIATED with each extra year of education, there is an increase of $2,176 in income, on average. Going to school for an extra year will not necessarily CAUSE an increase of $2,176 in income.
Those who have a 4-year degree earn, on average, ___________ more than those who only completed high school, but getting a degree will not necessarily cause someone’s salary to increase.
WHY NOT? CONFOUNDING FACTORS!
Notes
• We can’t rely on the slope to tell us how y will respond if the investigator changes x, unless the data come from a controlled experiment. In an observational study there are too many confounding factors.
• Sometimes the intercept will not make sense. For example, it might be negative when we would expect it to be zero or positive.
• Never use regression to predict outside the range of your data!
“Least Squares”
• Among all lines, the one with the smallest r.m.s. error is the regression line.
• We call the regression line the “least squares regression line”.
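The least squares property can be checked numerically. The sketch below uses a small made-up data set (the numbers are for illustration only and do not come from the slides), fits the regression line from the averages, SDs, and r, and then compares its r.m.s. error with that of several nearby lines. The helper names are illustrative.

```python
from math import sqrt

# Made-up data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """SD as the r.m.s. deviation from the average."""
    m = mean(values)
    return sqrt(mean([(v - m) ** 2 for v in values]))


def rms_error(slope, intercept):
    """r.m.s. of the vertical distances from the points to the line."""
    return sqrt(mean([(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]))


ave_x, ave_y = mean(xs), mean(ys)
sd_x, sd_y = sd(xs), sd(ys)

# r = average of the products of the standard units
r = mean([((x - ave_x) / sd_x) * ((y - ave_y) / sd_y) for x, y in zip(xs, ys)])

slope = r * sd_y / sd_x
intercept = ave_y - slope * ave_x
best = rms_error(slope, intercept)

# Nudge the slope and intercept; no nudged line should beat the regression line
nudged_errors = [
    rms_error(slope + d_slope, intercept + d_intercept)
    for d_slope in (-0.2, 0.0, 0.2)
    for d_intercept in (-0.5, 0.0, 0.5)
]

print(f"regression line r.m.s. error: {best:.4f}")
print(f"smallest r.m.s. error among nudged lines: {min(nudged_errors):.4f}")
```

The only nudged line that matches the regression line's r.m.s. error is the one with no nudge at all.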
A good regression example: Hooke’s Law
Hang a weight on a spring and measure the length of the spring. Hooke said the stretch is proportional to the load: doubling the load doubles the stretch.
Hooke’s law
Slope = 0.05 cm per kg
Intercept = 439.01 cm
Hooke’s law: length = mx + b, where x is the load in kg
We found slope = 0.05 and intercept = 439.01, so
• we estimate m by 0.05 cm per kg
• we estimate b by 439.01 cm
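With these estimates, the fitted line can be used to predict the spring's length under a given load. The sketch below assumes x is the load in kg (consistent with a slope in cm per kg); the loads of 2 kg and 4 kg are hypothetical values chosen for illustration, and the function names are not from the slides.

```python
M_HAT = 0.05      # estimated m: cm of stretch per kg of load
B_HAT = 439.01    # estimated b: length in cm with no load


def predicted_length(load_kg):
    """Predicted spring length in cm under the given load."""
    return B_HAT + M_HAT * load_kg


def stretch(load_kg):
    """Predicted stretch in cm: length under load minus the no-load length."""
    return M_HAT * load_kg


# Hypothetical loads, for illustration only
for load in (2.0, 4.0):
    print(f"load {load} kg: length {predicted_length(load):.2f} cm, "
          f"stretch {stretch(load):.2f} cm")
```

Note that doubling the load doubles the stretch, not the total length: the intercept b is the length of the spring with no load.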
A bad regression example Measure the area and perimeter of these rectangles
The correlation between area and perimeter is r = 0.98! The scatter diagram:
Calculations show r = 0.98, slope = 1.6, intercept = -10.51
area = -10.51 + 1.6 (perimeter) ?
RIDICULOUS!
Regression will not FIND an appropriate model: you have to do the THINKING.
Don’t substitute statistics for science!