Lecture 3: Multivariate Regression

Homework review
Question C2.4 asks you to estimate a simple bivariate regression using IQ to predict wages.
In Stata this looks like
. reg wage IQ
not
. reg IQ wage
What does the latter command give you?

Homework review
What is the predicted increase in monthly salary for a 15-point increase in IQ?
Common mistake: 8.3*15 + 117
Why is this wrong?
What is the predicted monthly salary for IQs of 100, 115, 145?
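The distinction between a predicted change and a predicted level can be sketched in a few lines of Python. This is an illustration only: it assumes, as the "common mistake" expression implies, that 8.3 is the estimated slope on IQ and 117 is the estimated intercept; the function names are hypothetical.

```python
# Rounded estimates implied by the slide's expression 8.3*15 + 117
# (assumption: 8.3 is the slope on IQ, 117 is the intercept).
INTERCEPT = 117.0
SLOPE = 8.3


def predicted_wage(iq):
    """Predicted LEVEL of monthly wage at a given IQ."""
    return INTERCEPT + SLOPE * iq


def predicted_change(delta_iq):
    """Predicted CHANGE in monthly wage for a change in IQ.

    The intercept cancels when two predictions are differenced,
    so only the slope enters.
    """
    return SLOPE * delta_iq


print(f"Change for +15 IQ points: {predicted_change(15):.1f}")
for iq in (100, 115, 145):
    print(f"Predicted wage at IQ {iq}: {predicted_wage(iq):.1f}")
```

Adding the intercept to 8.3*15 mixes the two questions: it gives the predicted wage level at an IQ of 15, not the increase associated with 15 additional points.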

Explaining State Homicide Rates, cont.
Two weeks ago, we modeled state homicide rates as depending on one variable: poverty. In reality, we know that state homicide rates depend on numerous variables.
Our estimation of homicide rates using multiple regression will look something like this:
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik} + \varepsilon_i$$
This allows us to estimate the “effect” of any one factor while holding “all else constant.”

Explaining State Homicide Rates, cont.
The “true” model:
$$Y_i = \alpha_0 + \beta_1 E_{i1} + \beta_2 E_{i2} + \dots + \beta_p E_{ip} + R_i = \alpha_0 + \sum_{j=1}^{p} \beta_j E_{ij} + R_i$$
Our estimation model:
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik} + \varepsilon_i = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ij} + \varepsilon_i$$

Explaining State Homicide Rates, cont.
Usually, the independent variables in our estimation model are some subset of the “true” model.
We can rewrite the “true” model in terms of k observed and p-k unobserved variables:
$$Y_i = \alpha_0 + \sum_{j=1}^{k} \beta_j X_{ij} + \sum_{j=k+1}^{p} \beta_j E_{ij} + R_i$$

Explaining State Homicide Rates, cont.
Re-arranging the “true” equation:
$$\sum_{j=1}^{k} \beta_j X_{ij} = (Y_i - \alpha_0) - \sum_{j=k+1}^{p} \beta_j E_{ij} - R_i$$
Re-arranging the estimation equation:
$$\varepsilon_i = Y_i - \beta_0 - \sum_{j=1}^{k} \beta_j X_{ij}$$
And substituting:
$$\varepsilon_i = Y_i - \beta_0 - Y_i + \alpha_0 + \sum_{j=k+1}^{p} \beta_j E_{ij} + R_i$$
$$\varepsilon_i = (\alpha_0 - \beta_0) + \sum_{j=k+1}^{p} \beta_j E_{ij} + R_i$$

Explaining State Homicide Rates, cont.
This means that the error term in a regression reflects both the random component in the dependent variable and the impact of all excluded variables.
Variables besides poverty thought to influence homicide rates: region, high school graduation, incarceration, unemployment, gun ownership, female-headed households, population heterogeneity, income, welfare, law enforcement officers, IQ, smokers, other crime.

Explaining State Homicide Rates, example
Recall, in a bivariate regression, we found the following:
$$E(\text{homrate}_i) = -.973 + .475\,\text{poverty}_i + u_i$$
Download multivariate homicide rate data “murder_multi.dta” from www.public.asu.edu/~gasweete/crj604/data/
Adding imprisonment rate and rate of female-headed households to the model yields the following:
$$E(\text{homrate}_i) = -7.34 - .005\,\text{poverty}_i + .0077\,\text{prison}_i + .89\,\text{femhh}_i + u_i$$

Explaining State Homicide Rates, example
Add imprisonment rate and rate of female-headed households to the regression model predicting homicide rates.
You should get a model like this:
$$E(\text{homrate}_i) = -7.34 - .005\,\text{poverty}_i + .0077\,\text{prison}_i + .89\,\text{femhh}_i + u_i$$
What happened to the relationship between poverty and homicide? Why?
What does it mean that our intercept is now -7.34?

Explaining State Homicide Rates, example
$$E(\text{homrate}_i) = -7.34 - .005\,\text{poverty}_i + .0077\,\text{prison}_i + .89\,\text{femhh}_i + u_i$$
Of the three predictors in our model, which is the “strongest”?
Poverty is no longer statistically significant. How precise is our estimate of the poverty effect? Hint: what is the 95% confidence interval?
Does this interval contain large effects? Another hint: what is the 95% confidence interval for the standardized coefficient?

Explaining State Homicide Rates, example
In the bivariate regression, imprisonment rates and rates of female-headed households were in the error term, and assumed to be uncorrelated with poverty rates.
This assumption was false. In fact, explicitly controlling for just these two variables reduces the estimate for the effect of poverty on homicide rates from .475 to -.005.
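The mechanism can be illustrated with a small simulation. The sketch below does not use the homicide data: it generates hypothetical data in which an omitted variable z drives y and is correlated with x, while x has no direct effect. The helper simple_ols, the coefficients, and the seed are all assumptions made for illustration. The controlled slope is obtained by partialling z out of both x and y and regressing residual on residual.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


random.seed(604)  # arbitrary seed so the sketch is reproducible

# Hypothetical data-generating process: y depends on z only,
# and x is correlated with z (the direct effect of x is zero).
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + random.gauss(0, 0.6) for zi in z]
y = [2.0 * zi + random.gauss(0, 1) for zi in z]

# Bivariate regression: z sits in the error term.
_, short_slope = simple_ols(x, y)

# Controlling for z: remove the part of x and of y explained by z,
# then regress the y residuals on the x residuals.
a_x, b_x = simple_ols(z, x)
a_y, b_y = simple_ols(z, y)
x_resid = [xi - (a_x + b_x * zi) for xi, zi in zip(x, z)]
y_resid = [yi - (a_y + b_y * zi) for yi, zi in zip(y, z)]
_, long_slope = simple_ols(x_resid, y_resid)

print(f"slope on x, z omitted:    {short_slope:.3f}")
print(f"slope on x, z controlled: {long_slope:.3f}")
```

With z omitted, the slope on x is far from zero because x stands in for z; once z is controlled, the slope on x is close to zero, which mirrors the drop in the poverty coefficient.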

Explaining State Homicide Rates, example
$$E(\text{homrate}_i) = -7.34 - .005\,\text{poverty}_i + .0077\,\text{prison}_i + .89\,\text{femhh}_i + u_i$$
It’s important to know how to interpret the regression results.
-7.34 is the expected homicide rate if poverty rates, imprisonment rates, and female-headed household rates were zero. This is never the case, so it’s not a meaningful estimate.
.0077 is the effect of a 1 point increase in the imprisonment rate on the homicide rate, holding poverty and femhh constant.
.89 is the effect of a 1 point increase in the female-headed household rate on the homicide rate, holding poverty and prison constant.
See Wooldridge pp. 78-9 (partialling out)

Explaining State Homicide Rates, example
$$E(\text{homrate}_i) = -7.34 - .005\,\text{poverty}_i + .0077\,\text{prison}_i + .89\,\text{femhh}_i + u_i$$
Is the effect of female-headed households 115 times bigger than the effect of the imprisonment rate?
prison: mean=404, s.d.=141
femhh: mean=10.2, s.d.=1.4
Because the standard deviation of prison is 100 times larger than that of femhh, it’s not easy to directly compare the two estimates unless we calculate standardized effects:
prison: .422, femhh: .499
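A standardized effect rescales a coefficient by the ratio of standard deviations, b * sd(x) / sd(y). The sketch below applies that formula to the rounded figures on the slide. The standard deviation of the homicide rate is not stated on this slide, so it is taken here, as an assumption, from the Total MS (6.63846936) of the bivariate Stata output for the same 50 states.

```python
import math

# Coefficients and standard deviations as rounded on the slide.
coefs = {"prison": 0.0077, "femhh": 0.89}
sds = {"prison": 141.0, "femhh": 1.4}

# Assumption: Total MS from the bivariate output is the sample
# variance of homrate, so its square root is the s.d. of homrate.
sd_homrate = math.sqrt(6.63846936)


def standardized(b, sd_x, sd_y):
    """Change in y, in s.d. units, per one-s.d. increase in x."""
    return b * sd_x / sd_y


betas = {name: standardized(coefs[name], sds[name], sd_homrate) for name in coefs}
for name, beta in betas.items():
    print(f"{name}: {beta:.3f}")
```

With these rounded inputs the two standardized effects come out at roughly .42 and .48, close to the .422 and .499 reported on the slide; the remaining gap is plausibly due to rounding in the quoted coefficients and standard deviations. Either way, the raw 115-fold difference shrinks to effects of similar size.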

Explaining State Homicide Rates, example
$$E(\text{homrate}_i) = -7.34 - .005\,\text{poverty}_i + .0077\,\text{prison}_i + .89\,\text{femhh}_i + u_i$$
The fitted value (or predicted value) for each state is the expected homicide rate given the poverty, imprisonment and female-headed household rate.
For Arizona:
$$E(\text{homrate}_i) = -7.34 - .005 \times 15.2 + .0077 \times 529 + .89 \times 10.06$$
$$= -7.34 - .076 + 4.07 + 8.95 = 5.60$$

Explaining State Homicide Rates, example
$$E(\text{homrate}_i) = -7.34 - .005\,\text{poverty}_i + .0077\,\text{prison}_i + .89\,\text{femhh}_i + u_i$$
The actual homicide rate in Arizona was 7.5, so the residual is 1.9:
$$\hat{u}_i = y_i - \hat{y}_i = 7.5 - 5.6 = 1.9$$
That’s just one of 50 residuals. The sum of all residuals is zero.
The sum of the squares of all residuals is as small as possible. That’s how the estimates are chosen.
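The Arizona arithmetic can be reproduced with a short Python sketch. It uses only the rounded coefficients and Arizona values quoted on the slides; the dictionary layout and the function name fitted_value are hypothetical.

```python
# Coefficients as rounded on the slide.
COEFS = {"_cons": -7.34, "poverty": -0.005, "prison": 0.0077, "femhh": 0.89}

# Arizona's values as quoted on the slide.
arizona = {"poverty": 15.2, "prison": 529, "femhh": 10.06}
actual_homrate = 7.5


def fitted_value(state):
    """Predicted homicide rate: intercept plus each slope times its value."""
    return COEFS["_cons"] + sum(COEFS[name] * value for name, value in state.items())


yhat = fitted_value(arizona)
residual = actual_homrate - yhat  # actual minus fitted
print(f"fitted: {yhat:.2f}, residual: {residual:.2f}")
```

Carrying full precision gives a fitted value of about 5.61 rather than 5.60, because the slide rounds each term before adding; the residual is about 1.9 in both cases.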

Explaining State Homicide Rates, example
Rather than calculating the predicted values and residuals “by hand”, you can have Stata do it.
For predicted values, after your regression model (“homhat” is the name of the new variable; it can be anything you want to call it):
. predict homhat
For residuals (again, “resid” can be anything):
. predict resid, residuals

Explaining State Homicide Rates, example
You can also estimate predicted values for hypothetical cases.
For example, if we wanted to look at the “average state”:

Explaining State Homicide Rates, example
We can also look at a more disadvantaged hypothetical state:
Or an unusual state, where poverty and imprisonment rates are low but the female-headed household rate is high:
Is this last prediction reasonable?

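Explaining State Homicide Rates, example
[Figure: scatterplot with poverty on the horizontal axis (ticks 5 to 20) and vertical-axis ticks from 8 to 14; a “?” marks one point.]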

$R^2$
Estimating and interpreting $R^2$ remains the same in multivariate regression, where SSE is the explained sum of squares and SST is the total sum of squares:
$$R^2 = \frac{SSE}{SST} = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}$$
As more variables are included in the model, $R^2$ will either stay the same or increase.
One danger is overfitting, where variables are included in the model that are “explaining” noise or random error in the dependent variable.

$R^2$, example

. reg hom pov

      Source |       SS       df       MS              Number of obs =      50
-------------+------------------------------           F(  1,    48) =   21.36
       Model |  100.175656     1  100.175656           Prob > F      =  0.0000
    Residual |  225.109343    48  4.68977798           R-squared     =  0.3080
-------------+------------------------------           Adj R-squared =  0.2935
       Total |  325.284999    49  6.63846936           Root MSE      =  2.1656

------------------------------------------------------------------------------
     homrate |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     poverty |    .475025   .1027807     4.62   0.000     .2683706    .6816795
       _cons |  -.9730529   1.279803    -0.76   0.451     -3.54627    1.600164
------------------------------------------------------------------------------
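The summary statistics in the Stata output above can be recomputed from the sums of squares and degrees of freedom it reports. The following Python sketch does this as a check on how the pieces fit together; the variable names are hypothetical, and the formulas are the standard ones for OLS with an intercept.

```python
import math

# Sums of squares and degrees of freedom from the Stata output.
ss_model, df_model = 100.175656, 1
ss_resid, df_resid = 225.109343, 48
ss_total, df_total = ss_model + ss_resid, df_model + df_resid

# Share of total variation explained by the model.
r_squared = ss_model / ss_total
# Adjusted R-squared compares mean squares, penalizing extra predictors.
adj_r_squared = 1 - (ss_resid / df_resid) / (ss_total / df_total)
# F statistic: model mean square over residual mean square.
f_stat = (ss_model / df_model) / (ss_resid / df_resid)
# Root MSE: square root of the residual mean square.
root_mse = math.sqrt(ss_resid / df_resid)
# t statistic for poverty: coefficient divided by its standard error.
t_poverty = 0.475025 / 0.1027807

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"F(1, 48)      = {f_stat:.2f}")
print(f"Root MSE      = {root_mse:.4f}")
print(f"t (poverty)   = {t_poverty:.2f}")
```

With a single predictor, the F statistic equals the square of the t statistic on that predictor, which the last two lines make easy to confirm.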
