stat 213 anova as multiple regression

STAT 213 ANOVA as Multiple Regression Colin Reimer Dawson Oberlin



  1. Outline Last Time One-Way ANOVA as Multiple Regression STAT 213 ANOVA as Multiple Regression Colin Reimer Dawson Oberlin College 5 April 2016

  2. Outline: Last Time; One-Way ANOVA as Multiple Regression

  3. Reflection Questions
      • When do you do a nested F-test, and what does it mean if it is statistically significant?
      • In a nested F-test, is MSE Full calculated from the more complex model or from both models?
      • Is the nested F-test only for comparing two models? How do we select among three or more models?

  4. Reflection Questions
      • How do you know when to use interaction terms in polynomial regression?
      • What’s the logic behind mutate() in R?

  5. Reading Quiz
      A group of middle school students performed an experiment to see whether each of two treatments helps lengthen the shelf life of strawberries: (1) spraying with lemon juice, and (2) putting the strawberries on paper towels to soak up the extra moisture. They compared these two treatments to a control treatment where they did nothing special to the strawberries. Write down the multiple regression model for this experiment (using indicator variables), and explain what each coefficient represents.
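
One way to set up the model asked for in the quiz is sketched below. The variable names ShelfLife, Lemon, and Towel are assumptions introduced here for illustration, and the control treatment is assumed to be the reference group; other codings are equally valid.

```latex
% Lemon = 1 for strawberries sprayed with lemon juice, 0 otherwise
% Towel = 1 for strawberries placed on paper towels, 0 otherwise
% Control strawberries have Lemon = 0 and Towel = 0
\[
  \text{ShelfLife} = \beta_0 + \beta_1\,\text{Lemon} + \beta_2\,\text{Towel} + \varepsilon
\]
% Interpretation in terms of the three group means:
\[
  \beta_0 = \mu_{\text{control}}, \qquad
  \beta_1 = \mu_{\text{lemon}} - \mu_{\text{control}}, \qquad
  \beta_2 = \mu_{\text{towel}} - \mu_{\text{control}}
\]
```

Under this coding, the overall F-test of $H_0: \beta_1 = \beta_2 = 0$ is the same hypothesis as the one-way ANOVA null that all three group means are equal.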

  6. For Thursday
      • Read: Ch. 4.2
      • Write: Finish the multicollinearity lab (not to turn in, but we will discuss it in class)
      • Answer:
         1. True or False: When selecting a set of predictors from a pool, we should prefer a model that yields a larger Mallows’s $C_p$ statistic, all else being equal.
         2. Suppose we have six candidate predictor variables that we might use to build a multiple regression model. How many models will we need to consider in total to find the best two-predictor model according to forward selection?

  7. Setup code
       library("mosaic"); library("Stat2Data"); data("Pulse")
       PulseWithBMI <- mutate(
         Pulse,
         BMI = Wgt / Hgt^2 * 703,
         InvActive = 1 / Active,
         InvRest = 1 / Rest,
         Male = 1 - Gender)
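
The mutate() call above returns a copy of the data frame with new columns computed from existing ones; the original data frame is left unchanged. The sketch below shows the same idea with base R's transform(), so it runs without mosaic or Stat2Data. The data frame `toy` and all of its values are hypothetical, invented here only to mirror the column names on the slide.

```r
# Hypothetical toy data with the same column names used on the slide;
# the numbers are invented for illustration only.
toy <- data.frame(
  Wgt    = c(150, 180, 130),  # weight in pounds
  Hgt    = c(65, 70, 62),     # height in inches
  Active = c(90, 110, 85),    # active pulse
  Rest   = c(60, 72, 58),     # resting pulse
  Gender = c(0, 1, 1)         # 0/1 indicator, as assumed by Male = 1 - Gender
)

# transform() is the base-R analogue of mutate(): each new column is
# computed row by row from existing columns, and a new data frame is returned.
toy_with_bmi <- transform(
  toy,
  BMI       = Wgt / Hgt^2 * 703,
  InvActive = 1 / Active,
  InvRest   = 1 / Rest,
  Male      = 1 - Gender
)

print(names(toy))           # the original columns only
print(names(toy_with_bmi))  # the original columns plus the four new ones
```

Because the result is assigned to a new name, later models can use either the original or the augmented data frame.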

  8. Testing multiple (but not all) predictors
      We can test:
      • one term at a time (t-test): $H_0: \beta_k = 0$ versus $H_1: \beta_k \neq 0$
      • all terms at once (F-test): $H_0: \beta_1 = \beta_2 = \cdots = \beta_K = 0$ versus $H_1: \text{some } \beta_k \neq 0$
      • What if we want to test a subset of the $\beta$s together?

  9. Nested Models
      If Model B has all the terms in Model A and then some, we say that Model A is nested in Model B.
      Model A: $\text{Active} = \beta_0 + \beta_1 \text{Rest}$
      Model B: $\text{Active} = \beta_0 + \beta_1 \text{Rest} + \beta_2 \text{Male} + \beta_3 \text{Male} \cdot \text{Rest}$
      Here Model A is nested in Model B.

  10. Comparing Nested Models
      • Is the improved fit for Model B “worth it”?
      • Some of $SS_{\text{Error}}$ for the simpler model moves to $SS_{\text{Model}}$ for the complex model.
      • Nested F-test: is this difference more than we would expect by chance?
      • $H_0: \beta_{K_A+1} = \cdots = \beta_{K_B} = 0$
      $$F_{\text{Comparison}} = \frac{MS_{\text{Comparison}}}{MSE_{\text{Full}}} = \frac{(\text{Increase in } SS_{\text{Model}}) / (\text{Increase in } df_{\text{Model}})}{MSE_{\text{Full}}}$$
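
The nested F statistic can be computed by hand from the two fitted models, which also makes clear that the denominator uses only the full (more complex) model. The sketch below uses simulated data; the data frame `sim`, its variables, and the seed are assumptions made for illustration and are not part of the course data.

```r
set.seed(213)  # arbitrary seed so the simulated data are reproducible

# Simulated (hypothetical) data: y depends on x1 only; x2 and x3 are noise.
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)
sim <- data.frame(y, x1, x2, x3)

reduced <- lm(y ~ x1, data = sim)            # plays the role of Model A
full    <- lm(y ~ x1 + x2 + x3, data = sim)  # plays the role of Model B

# SSE and error degrees of freedom for each model
sse_reduced <- sum(resid(reduced)^2)
sse_full    <- sum(resid(full)^2)
df_reduced  <- df.residual(reduced)
df_full     <- df.residual(full)

# SSTotal is the same for both models, so the increase in SSModel
# equals the decrease in SSE.
ms_comparison <- (sse_reduced - sse_full) / (df_reduced - df_full)
mse_full      <- sse_full / df_full  # from the full model only

f_manual <- ms_comparison / mse_full
p_manual <- pf(f_manual, df_reduced - df_full, df_full, lower.tail = FALSE)

# anova() on the two nested fits performs the same test
nested_table <- anova(reduced, full)
f_anova <- nested_table$F[2]

print(c(manual = f_manual, anova = f_anova))
```

The hand-computed statistic and the one reported by anova() should agree up to floating-point error.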

  11. Nested F-test
       modelA <- lm(Active ~ Rest, data = PulseWithBMI)
       modelB <- lm(Active ~ Rest + factor(Male) + factor(Male):Rest,
                    data = PulseWithBMI)
       anova(modelA, modelB)

       Analysis of Variance Table

       Model 1: Active ~ Rest
       Model 2: Active ~ Rest + factor(Male) + factor(Male):Rest
         Res.Df   RSS Df Sum of Sq      F Pr(>F)
       1    230 51953
       2    228 51335  2    617.27 1.3708  0.256
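
As a check on the output above, the F statistic can be reproduced from the printed table with the nested F-test formula. This is a worked calculation using only the rounded values shown in the table, so the last digits are approximate.

```latex
\[
  MS_{\text{Comparison}} = \frac{617.27}{2} = 308.635,
  \qquad
  MSE_{\text{Full}} = \frac{51335}{228} \approx 225.15
\]
\[
  F_{\text{Comparison}} = \frac{308.635}{225.15} \approx 1.371
  \quad \text{on } 2 \text{ and } 228 \text{ degrees of freedom}
\]
```

The printed RSS values differ by 618 rather than 617.27 because R rounds them for display. With a p-value of 0.256, the Male and Male:Rest terms do not give a significant improvement over Model A at conventional levels.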

  12. Conclusion of a Nested F-test
      If the nested F-test comes out significant, we have evidence that the additional predictor variables are collectively useful for predicting the response.

  13. Polynomial Regression
      We can create “new” predictors from old, e.g.:
      $$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \cdots + \beta_p X^p$$
      where $p = 1$ gives a linear model, $p = 2$ quadratic, $p = 3$ cubic, etc.

  14. R: Three Equivalent Methods
       library("mosaic"); library("mosaicData"); data("SAT")
       ## sat = mean SAT score

      Method 1: Explicit variable creation
       SAT.augmented <- mutate(SAT, frac.squared = frac^2)
       quadratic.model <- lm(sat ~ frac + frac.squared, data = SAT.augmented)

      Method 2: Inline transformation (note the use of I())
       quadratic.model <- lm(sat ~ frac + I(frac^2), data = SAT.augmented)

      Method 3: Using poly() to generate polynomials
       quadratic.model <- lm(sat ~ poly(frac, degree = 2, raw = TRUE),
                             data = SAT.augmented)

       Call:
       lm(formula = sat ~ frac + I(frac^2), data = SAT.augmented)

       Coefficients:
       (Intercept)         frac    I(frac^2)
        1094.09787     -6.52850      0.05242
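
The equivalence of the three methods, and the reason I() is needed, can be checked directly. The sketch below uses simulated data so that it runs without mosaic or mosaicData; the data frame `sim`, the coefficients used to generate it, and the seed are assumptions chosen only for illustration.

```r
set.seed(14)  # arbitrary seed for reproducibility

# Simulated (hypothetical) data with a curved relationship
x <- runif(50, 0, 80)
y <- 1100 - 6 * x + 0.05 * x^2 + rnorm(50, sd = 20)
sim <- data.frame(x, y)

# Method 1: create the squared term explicitly (base-R analogue of mutate())
sim_aug <- transform(sim, x_squared = x^2)
m1 <- lm(y ~ x + x_squared, data = sim_aug)

# Method 2: inline transformation, protected by I()
m2 <- lm(y ~ x + I(x^2), data = sim)

# Method 3: raw polynomial terms from poly()
m3 <- lm(y ~ poly(x, degree = 2, raw = TRUE), data = sim)

# Without I(), ^ is interpreted as formula syntax, so x^2 reduces to x
# and the quadratic term is silently dropped.
m_wrong <- lm(y ~ x + x^2, data = sim)

print(rbind(method1 = unname(coef(m1)),
            method2 = unname(coef(m2)),
            method3 = unname(coef(m3))))
print(coef(m_wrong))  # only an intercept and a slope
```

Note that poly() without raw = TRUE uses orthogonal polynomials, which give the same fitted values but different coefficients.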

  15. Example: State SAT Scores
       f.hat <- makeFun(quadratic.model)
       xyplot(sat ~ frac, data = SAT)
       plotFun(f.hat(frac) ~ frac, add = TRUE)

       plot(quadratic.model, which = 1)

      [Figure, two panels: scatterplot of sat against frac with the fitted quadratic curve; Residuals vs Fitted plot (Residuals against Fitted values) for lm(sat ~ frac + I(frac^2)).]

  16. Selecting Polynomial Order
      • Start with a higher-order model, then remove the highest-order term if it is not significant.
      • Repeat until the highest-order term is significant.
      • To be safe: run a nested F-test between the final model and the highest-order model.
      • Don’t remove lower-order terms even if nonsignificant!
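
The procedure above can be sketched as a short loop. This is one possible implementation under stated assumptions, not a prescribed course method: the simulated data in `sim`, the starting degree of 4, and the 0.05 cutoff are all choices made here for illustration.

```r
set.seed(16)  # arbitrary seed for reproducibility

# Simulated (hypothetical) data whose true relationship is quadratic
x <- runif(80, -2, 2)
y <- 1 + x - 0.5 * x^2 + rnorm(80, sd = 0.5)
sim <- data.frame(x, y)

max_degree <- 4     # assumed starting order
alpha      <- 0.05  # assumed significance cutoff

# Fit polynomial models of degree 1 through max_degree.
# Every model keeps all lower-order terms.
fits <- lapply(1:max_degree, function(d) {
  lm(y ~ poly(x, degree = d, raw = TRUE), data = sim)
})

# Step down from the highest order until the top term is significant.
degree <- max_degree
while (degree > 1) {
  coefs     <- summary(fits[[degree]])$coefficients
  p_highest <- coefs[nrow(coefs), "Pr(>|t|)"]  # last row = highest-order term
  if (p_highest < alpha) break
  degree <- degree - 1
}
chosen <- fits[[degree]]
print(degree)

# Safety check: nested F-test of the chosen model against the
# highest-order model (only meaningful if terms were removed).
if (degree < max_degree) {
  print(anova(chosen, fits[[max_degree]]))
}
```

A nonsignificant result in the final nested F-test suggests that the dropped higher-order terms are not collectively useful.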

  17. Interaction Terms and Second-Order Models
      Consider the model:
      $$\text{sat} = \beta_0 + \beta_1 \cdot \text{frac} + \beta_2 \cdot \text{expend} + \beta_3 \cdot \text{frac} \cdot \text{expend} + \varepsilon$$
      where expend is state education expenditure per pupil. $\beta_3$ represents the change in the slope for expend for each unit increase in frac (or vice versa).
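
The interpretation of $\beta_3$ follows from regrouping the terms of the model above. The short derivation below uses only the symbols already defined on the slide.

```latex
% Group the terms that multiply expend:
\[
  \text{sat} = (\beta_0 + \beta_1\,\text{frac})
             + (\beta_2 + \beta_3\,\text{frac})\,\text{expend} + \varepsilon
\]
% So, at a fixed value of frac, the slope for expend is
\[
  \beta_2 + \beta_3\,\text{frac},
\]
% which changes by beta_3 for each one-unit increase in frac.
% Grouping by frac instead gives the "vice versa" reading:
\[
  \text{sat} = (\beta_0 + \beta_2\,\text{expend})
             + (\beta_1 + \beta_3\,\text{expend})\,\text{frac} + \varepsilon
\]
```

In particular, $\beta_2$ alone is the slope for expend only when frac equals zero.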

  18. So many models...
      • How to decide among all these models?
         1. Understand the subject area! Build sensible models.
         2. Nested F-tests

  19. One-Way ANOVA as Multiple Regression: Worksheet
