Multiple Regression

James H. Steiger

February 13, 2006
1. Goals for this Module

In this module, we will discuss:

1. The general multiple linear regression model.

2. Statistical assumptions of multiple regression.

3. The "best estimate" of the multiple regression equation.

4. Statistical tests in multiple regression.

5. Regression diagnostics.
2. The Multiple Regression Model

In bivariate linear regression, we learned to predict a single dependent variable y from a single independent variable x with the equation

    y = \hat{y} + \varepsilon = b_1 x + b_0 + \varepsilon

In multiple linear regression, we predict the dependent variable from several independent variables x_1, ..., x_k using the equation

    y = b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_k x_k + b_0 + \varepsilon    (1)

Dealing with multiple predictors is considerably more challenging than dealing with only a single predictor. Some of the problems include:

1. Choosing the best model. In multiple regression, often several different sets of variables perform equally well in predicting a criterion. Which set should you use?

2. Interactions between variables. In some cases, independent variables interact, and the regression equation will not be accurate unless this interaction is taken into account.

3. Much greater difficulty visualizing the regression relationships. With only one independent variable, the regression line can be plotted neatly in two dimensions. With two predictors, there is a regression surface instead of a regression line, and with three predictors and one criterion, you run out of dimensions for plotting.

4. Model interpretation becomes substantially more difficult. The multiple regression equation changes as each new variable is added to the model. Since the regression weights for each variable are modified by the other variables, and hence depend on what is in the model, the substantive interpretation of the regression equation is problematic.

As an example, consider the following data from the Kleinbaum, Kupper, and Muller text on regression analysis. These data show the weight, height, and age of a random sample of 12 nutritionally deficient children.

Suppose we wish to investigate how weight is related to height and age for these children. We may want to consider only the simple model

    y = b_1 x_1 + b_2 x_2 + b_0 + \varepsilon

but we have several other alternatives. For example, we might want to examine both first- and second-order terms for x_1, in which case our model would be

    y = b_1 x_1 + b_2 x_2 + b_3 x_1^2 + b_0 + \varepsilon = \hat{y} + \varepsilon
    WGT (y)   HGT (x_1)   AGE (x_2)
       64        57           8
       71        59          10
       53        49           6
       67        62          11
       55        51           8
       58        50           7
       77        55          10
       57        48           9
       56        42          10
       51        42           6
       76        61          12
       68        57           9

    Table 1: Data for 12 children
Note, however, that this nonlinear model can also be written in the form

    y = b_1 x_1 + b_2 x_2 + b_3 x_3 + b_0 + \varepsilon

where x_3 = x_1^2, and so it can be viewed, in a sense, through the "lens" of the more basic linear model.
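To make this concrete, here is a minimal sketch in Python (a language choice of this revision, not the original text) that fits both models to the Table 1 data by ordinary least squares; the variable names are illustrative, and the estimation criterion itself is discussed later in the module.

    import numpy as np

    # Table 1 data: weight (y), height (x1), and age (x2) for 12 children
    wgt = np.array([64, 71, 53, 67, 55, 58, 77, 57, 56, 51, 76, 68], dtype=float)
    hgt = np.array([57, 59, 49, 62, 51, 50, 55, 48, 42, 42, 61, 57], dtype=float)
    age = np.array([ 8, 10,  6, 11,  8,  7, 10,  9, 10,  6, 12,  9], dtype=float)

    # Simple model: y = b1*x1 + b2*x2 + b0 + e
    X = np.column_stack([hgt, age, np.ones_like(wgt)])
    b, *_ = np.linalg.lstsq(X, wgt, rcond=None)
    print("simple model  b1, b2, b0:", b)

    # Polynomial model, rewritten as a linear model with x3 = x1**2
    X_poly = np.column_stack([hgt, age, hgt**2, np.ones_like(wgt)])
    b_poly, *_ = np.linalg.lstsq(X_poly, wgt, rcond=None)
    print("polynomial model  b1, b2, b3, b0:", b_poly)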
3. The Multiple Correlation Coefficient

The correlation between the predicted scores and the criterion scores is called the "multiple correlation coefficient," and is almost universally denoted R. Curiously, many writers use this notation whether a sample or a population value is being referred to, which creates some problems for readers. We can eliminate this ambiguity by using either \rho^2 or R^2_pop to signify the population value. Since R is always positive, and R^2 is the "percentage of variance in y accounted for by the predictors" (in the colloquial sense), most discussions center on R^2 rather than R. When it is necessary for clarity, one can denote the squared multiple correlation as R^2_{y|x_1 x_2} to indicate that variates x_1 and x_2 have been included in the regression equation.
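Continuing the earlier sketch (the wgt, hgt, and age arrays from Section 2), the following lines illustrate that the squared correlation between \hat{y} and y matches the variance-accounted-for definition of R^2 when the model includes an intercept:

    # Refit the two-predictor model, then compute R^2 two equivalent ways
    X = np.column_stack([hgt, age, np.ones_like(wgt)])
    b, *_ = np.linalg.lstsq(X, wgt, rcond=None)
    y_hat = X @ b

    R = np.corrcoef(y_hat, wgt)[0, 1]                        # corr(y-hat, y)
    R2_var = 1 - np.sum((wgt - y_hat)**2) / np.sum((wgt - wgt.mean())**2)
    print(R**2, R2_var)   # the two values agree: R^2_{y|x1 x2}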
4. The Partial Correlation Coefficient

The partial correlation coefficient is a measure of the strength of the linear relationship between two variables after the contribution of other variables has been "partialled out" or "controlled for" using linear regression. We will use the notation r_{yx|w_1, w_2, ..., w_p} to stand for the partial correlation between y and x with the w's partialled out. This correlation is simply the Pearson correlation between the regression residual \varepsilon_{y|w_1, w_2, ..., w_p} for y with the w's as predictors and the regression residual \varepsilon_{x|w_1, w_2, ..., w_p} for x with the w's as predictors.
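The residual definition translates directly into code. Here is a sketch, again using the Section 2 arrays, that computes the partial correlation between weight and height with age partialled out; the residuals() helper is an illustrative name of this revision, not the text's.

    def residuals(target, predictors):
        # Residuals from an OLS regression of target on predictors (with intercept)
        Z = np.column_stack([predictors, np.ones(len(target))])
        coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
        return target - Z @ coef

    e_y = residuals(wgt, age)   # y with the w (here, age) partialled out
    e_x = residuals(hgt, age)   # x with the w partialled out
    r_partial = np.corrcoef(e_y, e_x)[0, 1]
    print("r_{y x1 | x2} =", r_partial)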
5. The Semi-Partial (Part) Correlation

This is similar to the partial correlation, except that the variables "controlled for" are partialled out of only one of the two variables. We use the notation r_{y(x_1|x_2)} to stand for the correlation between y and the residual of x_1 after x_2 has been partialled from it.
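In code (reusing the residuals() helper from the previous sketch), the only change from the partial correlation is that the raw y scores replace the y residuals:

    e_x1 = residuals(hgt, age)                  # x2 partialled out of x1 only
    r_semipartial = np.corrcoef(wgt, e_x1)[0, 1]
    print("r_{y(x1|x2)} =", r_semipartial)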
6. Statistical Assumptions of Multiple Regression

1. Homoscedasticity. The conditional variance of y given any specific combination of values of x_1, ..., x_k is the same, i.e., \sigma^2_\varepsilon.

2. Existence. For each combination of values of the basic independent variables x_1, ..., x_k, y is a univariate random variable having a certain probability distribution with finite mean and variance.

3. Independence. The y observations are statistically independent.

4. Linearity. The expected value of y conditional on any specific combination of values of x_1, ..., x_k is a linear function of the x's, and follows the linear regression rule. For example, if k = 2,

    \mu_{y | x_1 = a_1, x_2 = a_2} = b_1 a_1 + b_2 a_2 + b_0

5. Normality. The conditional distribution of y for any combination of values of x_1, ..., x_k is normal, or Gaussian.

Note how these assumptions are quite similar to those for the bivariate case. Again, the conditional distribution of y given the x's is simply normal, with a mean that may be computed from the regression equation and a variance that remains constant over all conditional values of the x's. A mnemonic for the above, suggested by Kleinbaum, Kupper, and Muller (1989) in their textbook on regression, is HEIL GAUSS.
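A short simulation makes the assumptions concrete. The sketch below generates data satisfying all five; the coefficients, predictor ranges, and error standard deviation are illustrative choices, not values from the text.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative values: intercept, two slopes, and the error s.d.
    b0, b1, b2, sigma = 10.0, 0.5, 2.0, 3.0
    n = 200

    x1 = rng.uniform(40, 65, size=n)
    x2 = rng.uniform(5, 13, size=n)

    # HEIL GAUSS: the conditional mean is linear in x1 and x2 (Linearity),
    # draws are independent (Independence), the error variance sigma**2 is
    # constant across all (x1, x2) combinations (Homoscedasticity), errors
    # are Gaussian (Normality), and the conditional distribution of y has
    # finite mean and variance (Existence).
    y = b1 * x1 + b2 * x2 + b0 + rng.normal(0.0, sigma, size=n)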