BUS41100 Applied Regression Analysis
Week 2: Inference for SLR
Inference: sampling distributions, testing, confidence intervals, and prediction intervals
Max H. Farrell
The University of Chicago Booth School of Business
Back to House Prices
Understand the relationship between price and size. How?
Last week we fit a line through a bunch of points: price = 39 + 35 × size.
[Figure: scatterplot of price (vertical axis, 60 to 160) against size (horizontal axis, 1.0 to 3.5).]
CAPM
Another example of conditional distributions: individual returns given the market return.
The Capital Asset Pricing Model (CAPM) for asset A relates the return
$R_{At} = \dfrac{V_{At} - V_{A,t-1}}{V_{A,t-1}}$
to the “market” return, $R_{Mt}$. In particular, the relationship is given by the regression model
$R_{At} = \alpha + \beta R_{Mt} + \varepsilon$
with observations at times $t = 1, \dots, T$ (more on $(\alpha, \beta)$ vs $(b_0, b_1)$ vs $(\beta_0, \beta_1)$ in a minute).
When asset A is a mutual fund, this CAPM regression can be used as a performance benchmark for fund managers.
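The return formula above can be checked on a tiny example. This is only a sketch: the values in `V` are made-up numbers, and reading $V_{At}$ as the value of asset A at time $t$ is an assumption, since the slide does not define it.

```r
# Hypothetical asset values V_0, ..., V_4 (made-up numbers, not from mfunds.csv)
V <- c(100, 102, 99.96, 104.958, 104.958)

# Simple return: R_t = (V_t - V_{t-1}) / V_{t-1}
# diff(V) gives V_t - V_{t-1}; head(V, -1) drops the last value to give V_{t-1}
R <- diff(V) / head(V, -1)
print(R)
```

One return is lost relative to the number of values, because the first value has no predecessor.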
> mfund <- read.csv("mfunds.csv", stringsAsFactors=TRUE)
> mu <- apply(mfund, 2, mean)
> mu
     drefus       fidel     keystne    Putnminc     scudinc
0.006767000 0.004696739 0.006542550 0.005517072 0.004432333
    windsor     valmrkt       tbill
0.010021906 0.006812983 0.005978333
> stdev <- apply(mfund, 2, sd)
> stdev
     drefus       fidel     keystne    Putnminc     scudinc
0.047237111 0.056587091 0.084236450 0.030079074 0.035969261
    windsor     valmrkt       tbill
0.048639473 0.048000146 0.002522863
> plot(mu, stdev, col=0)
> text(x=mu, y=stdev, labels=names(mfund), col=4)
[Figure: fund names plotted at (mu, stdev), with mu on the horizontal axis (0.005 to 0.010) and stdev on the vertical axis (0.00 to 0.08).]
Let's look at just windsor (which dominates the market).
> windsor.reg <- lm(mfund$windsor ~ mfund$valmrkt)
> plot(mfund$valmrkt, mfund$windsor, pch=20)
> abline(windsor.reg, col="green")
[Figure: scatterplot of mfund$windsor against mfund$valmrkt with the fitted least squares line; b_0 = 0.0036, b_1 = 0.9357.]
What is a good line? Statistics version!
In a happy coincidence, the least squares line makes good statistical sense too. To see why, we need a model and we need to remember the conditional distribution.
We will also use the model to talk about uncertainty.
Okay, so lm(Y ~ X) makes a great line, but how “likely” is it that our answer is useful?
◮ The concept of a sampling distribution is the fundamental idea in all of statistics, and understanding it is our main job today.
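Normal Distribution – Quick Review
Why do we like the Normal distribution?
◮ Symmetric
◮ Concentration around the mean!
   → About 95% of the probability lies within 2 s.d. of the mean.
[Figure: Normal density with the horizontal axis marked from −3 sd to +3 sd around the mean; the central 95% lies between $Z_{0.025}$ and $Z_{0.975}$, with 2.5% in each tail.]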
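The "95% within 2 s.d." rule can be checked directly with base R's `pnorm` and `qnorm`. A quick sketch; the variable names are just for illustration.

```r
# Probability that a Normal draw lands within 2 standard deviations of its mean
p_within_2sd <- pnorm(2) - pnorm(-2)

# Cutoffs that leave exactly 2.5% in each tail of a standard Normal
z025 <- qnorm(0.025)
z975 <- qnorm(0.975)

print(p_within_2sd)
print(c(z025, z975))
```

The exact 95% cutoffs sit slightly inside ±2, which is why "2 s.d." is a convenient rounding rather than an exact value.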
Simple linear regression (SLR) model
$Y = \beta_0 + \beta_1 X + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)$
What’s important?
◮ It is a model, so we are assuming this relationship holds for some fixed but unknown values of $\beta_0$, $\beta_1$.
◮ It is linear.
◮ The error $\varepsilon$ is independent & mean zero
   1. $E[\varepsilon] = 0 \Leftrightarrow E[Y \mid X] = \beta_0 + \beta_1 X$
   2. Fixed but unknown variance $\sigma^2$; constant over $X$
   3. Most things are approx. Normal (Central Limit Theorem)
   4. $\varepsilon$ represents anything left, not captured in linear fcn of $X$
◮ It just works! This is a very robust model for the world.
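One way to get a feel for the SLR model is to simulate from it and see how close `lm` gets to the values we chose. This is a sketch under stated assumptions: the "true" $\beta_0 = 39$ and $\beta_1 = 35$ are borrowed from last week's fitted house-price line, while $\sigma = 10$, the sample size, and the range of `size` are made up for illustration.

```r
set.seed(41100)

# Assumed "true" parameters (illustrative only; in practice these are unknown)
n     <- 200
beta0 <- 39
beta1 <- 35
sigma <- 10

# Generate data exactly as the model says: Y = beta0 + beta1 * X + eps
size  <- runif(n, min = 1, max = 3.5)
eps   <- rnorm(n, mean = 0, sd = sigma)   # independent of size, mean zero
price <- beta0 + beta1 * size + eps

# Least squares estimates b0, b1 of beta0, beta1
fit <- lm(price ~ size)
print(coef(fit))
```

Changing the seed gives a different sample and therefore different estimates `b0` and `b1`, even though `beta0` and `beta1` never change.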
Remember the two types of regression questions:
1. Prediction: $\hat{Y} = b_0 + b_1 X$
2. Model: $Y = \beta_0 + \beta_1 X + \varepsilon$
$Y = b_0 + b_1 X + e$ (fitted line plus residual $e$)

1. Predicting $Y$
   ◮ Best guess for $Y$ given (or “conditional on”) $X$.
2. Properties of $\beta_k$
   ◮ Sign: Does $Y$ go up when $X$ goes up?
   ◮ Magnitude: By how much?
Conditional distributions
Regression models are really all about modeling the conditional distribution of Y given X.
Why are conditional distributions important? We want to develop models for forecasting. What we are doing is exploiting the information in the conditional distribution of Y given X.
The conditional distribution is obtained by “slicing” the point cloud in the scatterplot to obtain the distribution of Y conditional on various ranges of X values.
Conditional v. marginal distribution
Consider a regression of house price on size:
[Figure, top: scatterplot of price (100 to 400) against size (0.5 to 3.5) with one “slice” of data highlighted, shown alongside the marginal distribution of price and the conditional distribution of price given 3 < size < 3.5.]
[Figure, bottom: boxplots of price for the marginal distribution (marg) and for the size slices 1−1.5, 1.5−2, 2−2.5, 2.5−3, 3−3.5, with the regression line overlaid.]
Key observations from these plots:
◮ Conditional distributions answer the forecasting problem: if I know that a house is between 1,000 and 1,500 sq. ft. (size between 1 and 1.5), then the conditional distribution (second boxplot) gives me a point forecast (the mean) and a prediction interval.
◮ The conditional means (medians) seem to line up along the regression line.
◮ The conditional distributions have much smaller dispersion than the marginal distribution.
This suggests two general points:
◮ If X has no forecasting power, then the marginal and conditional distributions will be the same.
◮ If X has some forecasting information, then the conditional means will differ from the marginal (overall) mean, and the conditional standard deviation of Y given X will be less than the marginal standard deviation of Y.
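The “slicing” idea can be sketched on simulated data. Everything here is an assumption made for illustration: the data are generated from price = 39 + 35 × size plus Normal noise with s.d. 10, not read from the actual house-price data, and the slice boundaries mirror the boxplot bins.

```r
set.seed(1)

# Simulated stand-in for the house data (illustrative values only)
n     <- 500
size  <- runif(n, min = 1, max = 3.5)
price <- 39 + 35 * size + rnorm(n, mean = 0, sd = 10)

# Slice the point cloud into size ranges: [1,1.5], (1.5,2], ..., (3,3.5]
slice <- cut(size, breaks = seq(1, 3.5, by = 0.5), include.lowest = TRUE)

# Conditional mean and s.d. of price within each slice
cond_mean <- tapply(price, slice, mean)
cond_sd   <- tapply(price, slice, sd)

# Marginal (overall) mean and s.d. of price
marg_mean <- mean(price)
marg_sd   <- sd(price)

print(round(cond_mean, 1))
print(round(cond_sd, 1))
print(round(c(marg_mean = marg_mean, marg_sd = marg_sd), 1))
```

Because size carries forecasting information in this simulation, the slice means climb with size and every slice has a smaller s.d. than the marginal one.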
Intuition from an example where X has no predictive power.
House price (Y) v. number of stop signs within a two-block radius of a house (X).
[Figure, top: scatterplot of price (100 to 400) against # stops (0 to 4). Bottom: boxplots of price for the marginal distribution (marg) and for each value of # stops, 0 through 4.]
See that in this case the marginal and conditionals are not all that different.
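The same slicing can be sketched for the no-predictive-power case. The data are simulated under assumptions chosen for illustration: `stops` is drawn uniformly from 0 to 4 and `price` is drawn from a Normal with mean 250 and s.d. 70 without any reference to `stops`.

```r
set.seed(2)

# Simulated data where X is unrelated to Y by construction (illustrative values)
n     <- 500
stops <- sample(0:4, size = n, replace = TRUE)
price <- rnorm(n, mean = 250, sd = 70)

# Conditional mean and s.d. of price for each number of stop signs
cond_mean <- tapply(price, stops, mean)
cond_sd   <- tapply(price, stops, sd)

# Marginal mean and s.d. of price
marg_mean <- mean(price)
marg_sd   <- sd(price)

# The fitted slope should be small relative to the spread of price
fit <- lm(price ~ stops)

print(round(cond_mean, 1))
print(round(cond_sd, 1))
print(round(c(marg_mean = marg_mean, marg_sd = marg_sd), 1))
print(coef(fit))
```

Here the conditional means and standard deviations differ from the marginal ones only by sampling noise, which is the pattern the boxplots on this slide show.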
Before looking at any data, the model specifies
◮ how $Y$ varies with $X$ on average: $E[Y \mid X] = \beta_0 + \beta_1 X$; i.e. what’s the trend?
◮ and the influence of factors other than $X$: $\varepsilon \sim N(0, \sigma^2)$, independently of $X$.
[Figure: the line $E[Y \mid X] = \beta_0 + \beta_1 X$ drawn on axes $Y$ against $X$, with $\varepsilon$ marked.]