

  1. BUS41100 Applied Regression Analysis. Week 3: Finish SLR Inference, Then Multiple Linear Regression. I. Confidence and Prediction Intervals. II. Polynomials, log transformation, categorical variables, interactions & main effects. Max H. Farrell, The University of Chicago Booth School of Business.

  2. Quick Recap. I. We drew a line through a cloud of points: $\hat{Y} = b_0 + b_1 X$ and $Y - \hat{Y} = e$. ◮ It was a good line because: 1. It minimized the SSE 2. It extracted all linear information 3. It implemented the model. II. The regression model helped us understand uncertainty: $Y = \beta_0 + \beta_1 X + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$. ◮ Sampling distribution: estimates change as data changes, $b_1 \sim N(\beta_1, \sigma^2_{b_1})$, where $\sigma^2_{b_1} = \sigma^2 / ((n-1) s_x^2)$.
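A minimal sketch of this recap, on simulated data (all settings below are hypothetical, not from the course): fit a line with lm() and check that R's reported standard error for $b_1$ matches the slide's formula with $s$ plugged in for $\sigma$.

set.seed(41100)                            # hypothetical simulation settings
n <- 100
x <- rnorm(n, mean = 2, sd = 0.5)
y <- 1 + 2*x + rnorm(n, sd = 0.3)          # true beta0 = 1, beta1 = 2
fit <- lm(y ~ x)
s <- summary(fit)$sigma                    # estimate of sigma
sqrt(s^2 / ((n - 1) * var(x)))             # slide formula for s_b1
coef(summary(fit))["x", "Std. Error"]      # matches R's reported SE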

  3. Our work today. I. Finish SLR ◮ Put the sampling distribution to work ◮ Communicable summaries of uncertainty. II. Multiple Linear Regression: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$ ◮ Everything carries over from SLR ◮ Interpretation requires one extra piece.

  4. Summarizing the sampling distribution. Remember the two types of regression questions: 1. Model: $Y = \beta_0 + \beta_1 X + \varepsilon$ 2. Prediction: $\hat{Y} = b_0 + b_1 X$ and $Y = b_0 + b_1 X + e$. 1. Properties of $\beta_k$ ◮ Sign: Does Y go up when X goes up? ◮ Magnitude: By how much? ⇒ A confidence interval captures uncertainty about $\beta$. 2. Predicting Y ◮ Best guess for Y given (or "conditional on") X. ⇒ A prediction interval captures uncertainty about Y.

  5. Confidence Intervals and Testing. Suppose we think that the true $\beta_j$ is equal to some value $\beta_j^0$ (often 0). Does the data support that guess? We can rephrase this in terms of competing hypotheses: (Null) $H_0: \beta_j = \beta_j^0$ (Alternative) $H_1: \beta_j \neq \beta_j^0$. Our hypothesis test will either reject or fail to reject the null hypothesis ◮ If the hypothesis test rejects the null hypothesis, we have statistical support for our claim ◮ Gives only a "yes" or "no" answer! ◮ You choose the "probability" of false rejection: $\alpha$.

  6. We use $b_j$ for our test about $\beta_j$. ◮ Reject $H_0$ if $b_j$ is "far" from $\beta_j^0$; assume $H_0$ when it is close ◮ What we really care about is how many standard errors $b_j$ is away from $\beta_j^0$ ◮ standard error $= s_{b_j}$, cf. $\sigma_{b_j}$. The t-statistic for this test is $z_{b_j} = (b_j - \beta_j^0)/s_{b_j} \overset{H_0}{\sim} N(0, 1)$. "Big" $|z_{b_j}|$ makes our guess $\beta_j^0$ look silly ⇒ reject ◮ If $H_0$ is true, then $P[|z_{b_j}| > 2] < 0.05 = \alpha$. But: $|z_{b_j}| = \left| (b_j - \beta_j^0)/s_{b_j} \right| > 2 \Leftrightarrow \beta_j^0 \notin (b_j \pm 2 s_{b_j})$.
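A sketch of this arithmetic with made-up numbers (nothing here comes from the course data): compute the statistic, its two-sided p-value, and the |z| > 2 rule directly.

bj <- 0.8; sbj <- 0.3; bj0 <- 0            # hypothetical estimate, SE, null value
zbj <- (bj - bj0) / sbj                    # SEs between estimate and null
2 * pnorm(-abs(zbj))                       # two-sided p-value
abs(zbj) > 2                               # reject at alpha ~ 0.05?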

  7. Confidence intervals. Since $b_j \sim N(\beta_j, \sigma^2_{b_j})$, $1 - \alpha = P\left[ z_{\alpha/2} < (b_j - \beta_j)/s_{b_j} < z_{1-\alpha/2} \right] = P\left[ \beta_j \in (b_j \pm z_{\alpha/2}\, s_{b_j}) \right]$. Why should we care about confidence intervals? ◮ The confidence interval completely captures the information in the data about the parameter. ◮ Center is your estimate ◮ Length is how sure you are about your estimate ◮ Any value outside would be rejected by a test!
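A small sketch of this formula on simulated data (settings hypothetical): build the normal-based interval by hand and compare to confint(), which uses the t distribution with n − 2 degrees of freedom.

set.seed(1)
n <- 50
x <- rnorm(n); y <- 1 + 2*x + rnorm(n)     # hypothetical data
fit <- lm(y ~ x)
b <- coef(fit)["x"]
sb <- coef(summary(fit))["x", "Std. Error"]
b + c(-1, 1) * qnorm(0.975) * sb           # hand-built 95% interval
confint(fit, level = 0.95)["x", ]          # R's t-based version, very close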

  8. Real life or pretend? Is $P\left[ \beta_1 \in (b_1 \pm 2\sigma_{b_1}) \right] = 95\%$, or is $P\left[ \beta_1 \in (b_1 \pm 2\sigma_{b_1}) \right]$ equal to 0 or 1? [Figure: an interval $(b_1 \pm 2\sigma_{b_1})$ plotted against the single true $\beta_1$; in any one sample the interval either covers $\beta_1$ or it does not.]

  9. Level, Size, and p-values. The p-value is $P[|Z| > |z_{b_j}|]$. ◮ A test with size/level equal to the p-value almost rejects ◮ The CI of level $1 - (\text{p-value})$ just excludes $\beta_j^0$. [Figure: standard normal density with critical values $z_{\alpha/2}$ and $z_{1-\alpha/2}$ marking level $\alpha$, and tail areas of $p/2$ beyond $\pm|z_{b_j}|$ making up the p-value.]
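A quick numeric check of this duality (made-up numbers again): the interval of level 1 − (p-value) puts the null value exactly on its boundary.

bj <- 0.8; sbj <- 0.3; bj0 <- 0            # hypothetical estimate, SE, null
pval <- 2 * pnorm(-abs((bj - bj0)/sbj))    # two-sided p-value
bj + c(-1, 1) * qnorm(1 - pval/2) * sbj    # lower endpoint lands on bj0 = 0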

  10. Example: revisit the CAPM regression for the Windsor fund. Does Windsor have a non-zero intercept? (i.e., does it make/lose money independent of the market?) $H_0: \beta_0 = 0$ vs. $H_1: \beta_0 \neq 0$. ◮ Recall: the intercept estimate $b_0$ is the stock's "alpha".

> summary(windsor.reg)  ## output abbreviated
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.003647   0.001409   2.588   0.0105 *
mfund$valmrkt  0.935717   0.029150  32.100   <2e-16 ***
> 2*pnorm(-abs(0.003647/.001409))
[1] 0.009643399

We reject the null at $\alpha = .05$, so Windsor does have an "alpha" over the market. ◮ Why set $\alpha = .05$? What about $\alpha = 0.01$?
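For a sharper match to R's printed Pr(>|t|), the same test can use the t distribution rather than the normal; df = n − 2 = 178 is an assumption taken from the confint() call on a later slide.

2 * pt(-abs(0.003647/0.001409), df = 178)  # ~0.0104, in line with R's 0.0105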

  11. Now let's ask whether or not Windsor moves in a different way than the market (e.g., is it more conservative?). ◮ Recall that the slope estimate $b_1$ is the "beta" of the stock. This is a rare case where the null hypothesis is not zero: $H_0: \beta_1 = 1$, Windsor is just the market (+ alpha); $H_1: \beta_1 \neq 1$, Windsor softens or exaggerates market moves. This time, R's output t / p values are not what we want (why?).

> summary(windsor.reg)  ## output abbreviated
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.003647   0.001409   2.588   0.0105 *
mfund$valmrkt  0.935717   0.029150  32.100   <2e-16 ***

  12. But we can get the appropriate values easily. ◮ Test and p-value:

> b1 <- 0.935717; sb1 <- 0.029150
> zb1 <- (b1 - 1)/sb1
> zb1
[1] -2.205249
> 2*pnorm(-abs(zb1))
[1] 0.02743665

◮ Confidence interval:

> confint(windsor.reg, level=0.95)
                     2.5 %      97.5 %
(Intercept)    0.000865657 0.006428105
mfund$valmrkt  0.878193149 0.993240873

Reject at $\alpha = .05$, so Windsor softens market moves ($b_1 < 1$). ◮ What about other values of $\alpha$?

> confint(windsor.reg, level=0.99)
> confint(windsor.reg, level=(1-2*pt(-abs(zb1), df=178)))

  13. Forecasting & Prediction Intervals. The conditional forecasting problem: ◮ Given covariate $X_f$ and sample data $\{X_i, Y_i\}_{i=1}^n$, predict the "future" observation $Y_f$. The solution is to use our LS fitted value: $\hat{Y}_f = b_0 + b_1 X_f$. ◮ That's the easy bit. The hard (and very important!) part of forecasting is assessing uncertainty about our predictions. One method is to specify a prediction interval ◮ a range of Y values that are likely, given an X value.

  14. The least squares line is a prediction rule: read $\hat{Y}$ off the line for a new X. ◮ It's not a perfect prediction: $\hat{Y}$ is what we expect. [Figure: scatter plot of price vs. size with the fitted line; $\hat{Y}$ is the height of the line at the new X.]

  15. If we use $\hat{Y}_f$, our prediction error has two pieces: $e_f = Y_f - \hat{Y}_f = Y_f - b_0 - b_1 X_f$. [Figure: at $X_f$, the true line $E[Y_f | X_f] = \beta_0 + \beta_1 X_f$ and the fitted value $\hat{Y}_f = b_0 + b_1 X_f$ differ by the fit error, while $Y_f$ sits $\varepsilon$ away from the true line; together these make up $e_f$.]

  16. We can decompose $e_f$ into two sources of error: ◮ Inherent idiosyncratic randomness (due to $\varepsilon$). ◮ Estimation error in the intercept and slope (i.e., discrepancy between our line and "the truth"). $e_f = Y_f - \hat{Y}_f = (Y_f - E[Y_f | X_f]) + (E[Y_f | X_f] - \hat{Y}_f) = \varepsilon_f + (E[Y_f | X_f] - \hat{Y}_f) = \varepsilon_f + (\beta_0 - b_0) + (\beta_1 - b_1) X_f$. The variance of our prediction error is thus $\mathrm{var}(e_f) = \mathrm{var}(\varepsilon_f) + \mathrm{var}(E[Y_f | X_f] - \hat{Y}_f) = \sigma^2 + \mathrm{var}(\hat{Y}_f)$.
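A simulation sketch of this decomposition (all settings hypothetical): across repeated samples, var(e_f) should come out close to sigma^2 + var(Yhat_f).

set.seed(42)
n <- 30; sigma <- 1; Xf <- 2               # hypothetical design
yhat_f <- ef <- numeric(5000)
for (r in 1:5000) {
  x <- runif(n, 0, 3)
  y <- 1 + 2*x + rnorm(n, sd = sigma)      # true line: 1 + 2x
  fit <- lm(y ~ x)
  yhat_f[r] <- sum(coef(fit) * c(1, Xf))   # b0 + b1*Xf
  Yf <- 1 + 2*Xf + rnorm(1, sd = sigma)    # a new draw at Xf
  ef[r] <- Yf - yhat_f[r]
}
var(ef); sigma^2 + var(yhat_f)             # the two should roughly agree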

  17. From the sampling distributions derived earlier, $\mathrm{var}(\hat{Y}_f) = \mathrm{var}(b_0 + b_1 X_f) = \mathrm{var}(b_0) + X_f^2 \mathrm{var}(b_1) + 2 X_f \mathrm{cov}(b_0, b_1) = \sigma^2 \left[ \frac{1}{n} + \frac{(X_f - \bar{X})^2}{(n-1) s_x^2} \right]$. Replacing $\sigma^2$ with $s^2$ gives the standard error for $\hat{Y}_f$. And hence the variance of our predictive error is $\mathrm{var}(e_f) = \sigma^2 \left[ 1 + \frac{1}{n} + \frac{(X_f - \bar{X})^2}{(n-1) s_x^2} \right]$.

  18. Putting it all together, we have that $Y_f \sim N\left( \hat{Y}_f,\ \sigma^2 \left[ 1 + \frac{1}{n} + \frac{(X_f - \bar{X})^2}{(n-1) s_x^2} \right] \right)$. A $(1-\alpha)100\%$ confidence/prediction interval for $Y_f$ is thus $b_0 + b_1 X_f \pm z_{\alpha/2} \times s \sqrt{ 1 + \frac{1}{n} + \frac{(X_f - \bar{X})^2}{(n-1) s_x^2} }$.
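A by-hand sketch of this interval on simulated stand-ins for the housing data (the course data are not reproduced here): the normal-based interval comes out slightly narrower than predict()'s t-based one.

set.seed(7)
n <- 15
size <- runif(n, 1, 3.5)
price <- 40 + 35*size + rnorm(n, sd = 12)  # hypothetical housing-like data
fit <- lm(price ~ size)
s <- summary(fit)$sigma
Xf <- 2.5
spred <- s * sqrt(1 + 1/n + (Xf - mean(size))^2/((n - 1)*var(size)))
sum(coef(fit) * c(1, Xf)) + c(-1, 1)*qnorm(0.975)*spred
predict(fit, newdata = data.frame(size = Xf), interval = "prediction")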

  19. Looking closer at what we'll call $s_{pred} = s \sqrt{ 1 + \frac{1}{n} + \frac{(X_f - \bar{X})^2}{(n-1) s_x^2} } = \sqrt{ s^2 + s_{fit}^2 }$. A large predictive error variance (high uncertainty) comes from ◮ Large $s$ (i.e., large $\varepsilon$'s). ◮ Small $n$ (not enough data). ◮ Small $s_x$ (not enough observed spread in covariates). ◮ Large $(X_f - \bar{X})$. The first three are familiar... what about the last one?

  20. For $X_f$ far from our $\bar{X}$, the space between lines is magnified... [Figure: the true and estimated lines cross near the point of means $(\bar{X}, \bar{Y})$; the gap between them is small when $(X_f - \bar{X})$ is small and large when $(X_f - \bar{X})$ is large.]

  21. ⇒ The prediction (conf.) interval needs to widen away from $\bar{X}$.
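A sketch that draws the flaring bands, reusing the same hypothetical simulated data as above: evaluate the prediction interval on a grid of X values and overlay it on the scatter.

set.seed(7)
n <- 15
size <- runif(n, 1, 3.5)
price <- 40 + 35*size + rnorm(n, sd = 12)  # hypothetical housing-like data
fit <- lm(price ~ size)
grid <- data.frame(size = seq(0.5, 4, length.out = 100))
pi <- predict(fit, newdata = grid, interval = "prediction")
plot(size, price)
matlines(grid$size, pi, lty = c(1, 2, 2))  # bands widen away from mean(size)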

  22. Returning to our housing data for an example:

> Xf <- data.frame(size=c(mean(size), 2.5, max(size)))
> cbind(Xf, predict(reg, newdata=Xf, interval="prediction"))
      size      fit       lwr      upr
1 1.853333 104.4667  72.92080 136.0125
2 2.500000 127.3496  95.18501 159.5142
3 3.500000 162.7356 127.36982 198.1013

◮ interval="prediction" gives lwr and upr; otherwise we just get fit ◮ $s_{pred}$ is not shown in this output.

  23. We can get $s_{pred}$ from the predict output:

> p <- predict(reg, newdata=Xf, se.fit=TRUE)
> s <- p$residual.scale
> sfit <- p$se.fit
> spred <- sqrt(s^2 + sfit^2)
> b <- reg$coef
> b[1] + b[2]*Xf[1,] + c(0, -1, 1)*qnorm(.975)*spred[1]
         [,1]     [,2]     [,3]
[1,] 104.4667 75.84713 133.0862
> b[1] + b[2]*Xf[1,] + c(0, -1, 1)*qt(.975, df=n-2)*spred[1]
[1,] 104.4667 72.92080 136.0125

◮ Or, we can calculate it by hand [see R code]. Notice that $s_{pred} = \sqrt{s^2 + s_{fit}^2}$; you need to square before summing.
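A sketch of the hand calculation the slide points to, again on simulated stand-ins rather than the course's housing data: the $s_{fit}$ formula from slide 17 matches predict()'s se.fit, and $s_{pred}$ follows by squaring before summing.

set.seed(7)
n <- 15
size <- runif(n, 1, 3.5)
price <- 40 + 35*size + rnorm(n, sd = 12)  # hypothetical housing-like data
reg <- lm(price ~ size)
Xf <- 2.5
p <- predict(reg, newdata = data.frame(size = Xf), se.fit = TRUE)
s <- p$residual.scale
sfit <- s * sqrt(1/n + (Xf - mean(size))^2/((n - 1)*var(size)))
c(sfit, p$se.fit)                          # the two agree
sqrt(s^2 + sfit^2)                         # s_pred: square before summing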

  24. Summary. Uncertainty matters! Captured by the sampling distribution. ◮ Quantifies uncertainty from the data ◮ ...only within the model, assumed before we see data. ◮ Which factors matter for signal-to-noise? Reporting: ◮ Confidence interval: completely captures the information in the data about the parameter. ◮ Testing/p-value: only a yes/no answer. (Don't abuse p-values!)
