ST 430/514 Introduction to Regression Analysis/Statistics for Management and the Social Sciences II

Multiple Linear Regression

General Form

Recall: a regression model describes how a dependent variable (or response) $Y$ is affected, on average, by one or more independent variables (or factors, or covariates).

The general equation is

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k.$$

I shall sometimes write $E(Y)$ as $E(Y \mid x_1, x_2, \ldots, x_k)$, to emphasize that $E(Y)$ changes with the values of the terms $x_1, x_2, \ldots, x_k$:

$$E(Y \mid x_1, x_2, \ldots, x_k) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k.$$

As always, we can write

$$\epsilon = Y - E(Y), \quad \text{or} \quad Y = E(Y) + \epsilon,$$

where the random error $\epsilon$ has expected value zero:

$$E(\epsilon) = E(\epsilon \mid x_1, x_2, \ldots, x_k) = 0.$$

So the general equation can also be written

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon.$$

Each term on the right hand side may be an independent variable, or a function of one or more independent variables.

For instance,

$$E(Y) = \beta_0 + \beta_1 x + \beta_2 x^2$$

has two terms on the right hand side (not counting the intercept $\beta_0$), but only one independent variable.

We write it in the general form as

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2,$$

with $x_1 = x$ and $x_2 = x^2$.
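
As a rough sketch of how such a model might be fitted in R: the simulated data and the names `x`, `y` and `fit` below are my own illustrative assumptions, not course data. The squared term is supplied to `lm()` through `I()`, so that one independent variable yields two terms.

```r
# Simulated data (an assumption, for illustration only)
set.seed(1)
x <- seq(0, 10, length.out = 50)
y <- 2 + 1.5 * x - 0.2 * x^2 + rnorm(50, sd = 0.5)

# One independent variable x, but two terms: x1 = x and x2 = x^2
fit <- lm(y ~ x + I(x^2))
coef(fit)   # estimates of beta0, beta1, beta2
```

The fitted object has three coefficients (intercept, `x`, `I(x^2)`) even though only one variable was measured.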

Interpreting the parameters: $\beta_0$

$\beta_0$ is still called the intercept, but now its interpretation is the expected value of $Y$ when all independent variables are zero:

$$\beta_0 = E(Y \mid x_1 = 0, x_2 = 0, \ldots, x_k = 0).$$

In some cases, these values cannot all be achieved at the same time; in these cases, $\beta_0$ has only a hypothetical meaning.

Interpreting the parameters: $\beta_i$, $i > 0$

For $1 \le i \le k$, $\beta_i$ measures the change in $E(Y)$ as $x_i$ increases by 1 with all the other independent variables held fixed.

Again, in some cases it is not possible to change one variable and none of the others, so $\beta_i$ may also have only a hypothetical meaning.

You will sometimes find, for instance, some $\beta_i < 0$ when you expect that $Y$ should increase, not decrease, when $x_i$ increases. That is usually because, when $x_i$ changes, other variables also change.
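
The sign change described here can be illustrated with a small simulation. This is only a sketch: the data are simulated, and the names `x1`, `x2`, `b_alone` and `b_partial` and the chosen coefficients are assumptions of mine. Here `x2` tends to move with `x1`, and the true coefficient of `x1` is negative.

```r
# Simulated data (an assumption, for illustration only)
set.seed(2)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.5)        # x2 tends to change when x1 changes
y  <- -1 * x1 + 3 * x2 + rnorm(n)    # true coefficient of x1 is -1

# Slope for x1 when x2 is left out (x2 is not held fixed)
b_alone <- coef(lm(y ~ x1))["x1"]

# Slope for x1 when x2 is in the model (x2 is held fixed)
b_partial <- coef(lm(y ~ x1 + x2))["x1"]

c(alone = unname(b_alone), partial = unname(b_partial))
```

On its own, `x1` appears to increase `y`, because it carries the effect of `x2` with it; with `x2` held fixed, its coefficient is negative.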

Quantitative and Qualitative Variables

Some variables are measured quantities (i.e., on an interval or ratio scale), and are called quantitative.

Others are the result of classification into categories (i.e., on a nominal or ordinal scale), and are called qualitative.

Some terms may be functions of independent variables: distance and distance$^2$, or sine and cosine of (month/12).

The simplest case is when all variables are quantitative, and no mathematical functions appear: the first-order model.

Example: Grandfather clocks

Dependence of auction price of antique clocks on their age, and the number of bidders at the auction. Data for 32 clocks.

Get the data and plot them:

    clocks = read.table("Text/Exercises&Examples/GFCLOCKS.txt", header = TRUE)
    pairs(clocks[, c("PRICE", "AGE", "NUMBIDS")])

The first-order model is

$$E(\text{PRICE}) = \beta_0 + \beta_1 \times \text{AGE} + \beta_2 \times \text{NUMBIDS}.$$
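
A first-order model of this form could be fitted with `lm()`. The sketch below does not use the real file: `clocks_sim` is a synthetic stand-in that I have simulated with the same column names, and its coefficients and noise level are arbitrary assumptions, so the output says nothing about the actual auction data.

```r
# Synthetic stand-in with the same column names as GFCLOCKS.txt;
# the values are simulated, NOT the real auction data.
set.seed(3)
clocks_sim <- data.frame(AGE     = round(runif(32, 100, 200)),
                         NUMBIDS = sample(5:15, 32, replace = TRUE))
clocks_sim$PRICE <- 10 * clocks_sim$AGE + 80 * clocks_sim$NUMBIDS +
  rnorm(32, sd = 100)

# First-order model: E(PRICE) = beta0 + beta1 * AGE + beta2 * NUMBIDS
fit <- lm(PRICE ~ AGE + NUMBIDS, data = clocks_sim)
summary(fit)
```

With the real data loaded as `clocks`, the same call with `data = clocks` should apply.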

Fitting the model: least squares

As in the case $k = 1$, the most common way of fitting a multiple regression model is by least squares.

That is, find $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ so that

$$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \cdots + \hat\beta_k x_k$$

minimizes

$$SS_E = \sum (y_i - \hat y_i)^2.$$

As noted earlier, other criteria such as $\sum |y_i - \hat y_i|$ are sometimes used instead.

Calculus leads to $k + 1$ linear equations in the $k + 1$ estimates $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$.

These equations are always consistent; that is, they always have a solution.

Usually, they are also non-singular; that is, the solution is unique.

If they are singular, we can find a unique solution by either imposing constraints on the parameters or leaving out redundant variables.
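
To see what the singular case looks like in practice, here is a small sketch with simulated data (the variable names and the construction of `x3` are assumptions of mine). `x3` is an exact linear combination of `x1` and `x2`, so it is redundant, and `lm()` handles this by leaving it out.

```r
# Simulated data (an assumption, for illustration only)
set.seed(4)
x1 <- rnorm(20)
x2 <- rnorm(20)
x3 <- x1 + x2                 # redundant: exact linear combination
y  <- 1 + x1 + x2 + rnorm(20)

X <- cbind(1, x1, x2, x3)
qr(X)$rank                    # rank 3 with 4 columns: the equations are singular

fit <- lm(y ~ x1 + x2 + x3)
coef(fit)                     # the coefficient of x3 is reported as NA
```

The `NA` marks the variable that was left out to obtain a unique solution.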

The equations are:

$$
\begin{aligned}
n \hat\beta_0 + \sum x_{i,1} \hat\beta_1 + \cdots + \sum x_{i,k} \hat\beta_k &= \sum y_i \\
\sum x_{i,1} \hat\beta_0 + \sum x_{i,1}^2 \hat\beta_1 + \cdots + \sum x_{i,1} x_{i,k} \hat\beta_k &= \sum x_{i,1} y_i \\
&\;\;\vdots \\
\sum x_{i,k} \hat\beta_0 + \sum x_{i,1} x_{i,k} \hat\beta_1 + \cdots + \sum x_{i,k}^2 \hat\beta_k &= \sum x_{i,k} y_i
\end{aligned}
$$

where $x_{i,j}$ is the value in the $i$th observation of the $j$th variable, $1 \le i \le n$, $1 \le j \le k$.

We usually write these more compactly using matrix notation, and solve them using matrix methods.

Matrix formulation of least squares

Write $X$ for the $n \times (k + 1)$ matrix of values of the independent variables (including a column of 1's for the intercept):

$$
X = \begin{pmatrix}
1 & x_{1,1} & x_{1,2} & \cdots & x_{1,k} \\
1 & x_{2,1} & x_{2,2} & \cdots & x_{2,k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n,1} & x_{n,2} & \cdots & x_{n,k}
\end{pmatrix}
$$

Also write $y$ for the $n \times 1$ vector of values of the dependent variable:

$$
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
$$

Finally, write $\hat\beta$ for the $(k + 1) \times 1$ vector of parameter estimates:

$$
\hat\beta = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{pmatrix}
$$

Then the equations for the parameter estimates can be written

$$X'X \hat\beta = X'y.$$
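
The matrix form of the equations can be checked numerically. In this sketch the data are simulated and the names `X`, `beta_hat` and `fit` are my own; `crossprod(X)` computes $X'X$ and `crossprod(X, y)` computes $X'y$.

```r
# Simulated data (an assumption, for illustration only)
set.seed(5)
n  <- 30
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)

X <- cbind(1, x1, x2)          # n x (k + 1), first column of 1's

# Solve the linear system X'X b = X'y for b
beta_hat <- solve(crossprod(X), crossprod(X, y))

fit <- lm(y ~ x1 + x2)
cbind(matrix_form = drop(beta_hat), lm = coef(fit))
```

The two columns should agree up to rounding error.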

The equations are non-singular when $(X'X)^{-1}$ exists, and the solution may be written

$$\hat\beta = (X'X)^{-1} X'y.$$

However, computing first $X'X$ and then its inverse $(X'X)^{-1}$ can lead to large numerical errors.

Using a transformation of $X$ such as the QR decomposition or the singular value decomposition gives better numerical performance.
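
For comparison, here is a sketch of both routes in R, on simulated data with names of my own choosing. For a well-conditioned $X$ like this one the answers agree; the numerical advantage of QR shows up only when the columns of $X$ are nearly dependent, which this small example does not demonstrate.

```r
# Simulated data (an assumption, for illustration only)
set.seed(6)
n  <- 30
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)
X  <- cbind(1, x1, x2)

# Explicit inverse of X'X, as in the formula
b_inverse <- solve(t(X) %*% X) %*% t(X) %*% y

# QR decomposition of X; X'X is never formed
b_qr <- qr.coef(qr(X), y)

all.equal(as.numeric(b_inverse), as.numeric(b_qr))
```

R's `lm()` itself works from a QR decomposition rather than from the explicit inverse.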

Model Assumptions

No assumptions are needed to find least squares estimates.

To use them to make statistical inferences, we need these assumptions:

The random errors $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ are uncorrelated and have common variance $\sigma^2$;

For small-sample validity, the random errors are normally distributed, at least approximately.

Estimating Error Variance

As before, we estimate $\sigma^2$ using

$$SS_E = \sum (y_i - \hat y_i)^2.$$

We can show that

$$E[SS_E] = (n - p)\sigma^2,$$

where $p = k + 1$ is the number of $\beta$s in the model, so the unbiased estimator is

$$s^2 = \frac{SS_E}{df_E} = \frac{SS_E}{n - p} = \frac{SS_E}{n - (k + 1)}.$$
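
A short sketch of this calculation in R, on simulated data with names of my own choosing. `summary(fit)$sigma` is the residual standard error reported by R, so its square should match $s^2$ computed by hand.

```r
# Simulated data (an assumption, for illustration only)
set.seed(7)
n  <- 40
k  <- 2
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n, sd = 1.5)
fit <- lm(y ~ x1 + x2)

SSE <- sum(residuals(fit)^2)   # sum of squared residuals
p   <- k + 1                   # number of betas, including the intercept
s2  <- SSE / (n - p)           # unbiased estimate of sigma^2

c(by_hand = s2, from_lm = summary(fit)$sigma^2)
```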

Hypothesis Tests

Testing the Utility of a Model

Usually, the first test is an overall test of the model:

$H_0$: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$.

$H_a$: at least one $\beta_i \ne 0$.

$H_0$ asserts that none of the independent variables affects $Y$; if this hypothesis is not rejected, the model is worthless. For instance, its predictions perform no better than $\bar y$.

The test statistic is usually denoted $F$, and $P$-values are found from the $F$-distribution with $k$ and $n - p = n - (k + 1)$ degrees of freedom.
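
The slide does not give the formula for $F$; the sketch below uses the standard form, $F = \dfrac{(SS_{yy} - SS_E)/k}{SS_E/(n - p)}$, where $SS_{yy} = \sum (y_i - \bar y)^2$, and compares it with what `summary()` reports. The data are simulated and the variable names are my own assumptions.

```r
# Simulated data (an assumption, for illustration only)
set.seed(8)
n  <- 40
k  <- 2
p  <- k + 1
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

SSE  <- sum(residuals(fit)^2)
SSyy <- sum((y - mean(y))^2)    # error sum of squares when predicting by y-bar

F_stat  <- ((SSyy - SSE) / k) / (SSE / (n - p))
p_value <- pf(F_stat, df1 = k, df2 = n - p, lower.tail = FALSE)

c(F = F_stat, p_value = p_value)
summary(fit)$fstatistic         # value, numdf, dendf as reported by R
```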

Inferences About Individual Parameters

Individual parameters may also be tested:

$H_0$: $\beta_i = 0$.

$H_a$: $\beta_i \ne 0$.

The test statistic is

$$t = \frac{\hat\beta_i}{\text{standard error of } \hat\beta_i}.$$

It is tested using the $t$-distribution with $n - p$ degrees of freedom.
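
As a sketch of where these numbers come from: the standard error of $\hat\beta_i$ is not spelled out above, so I assume the usual result that it is the square root of $s^2$ times the corresponding diagonal element of $(X'X)^{-1}$. The data are simulated and the names are my own.

```r
# Simulated data (an assumption, for illustration only)
set.seed(9)
n  <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + rnorm(n)     # x2 has no effect in this simulation
fit <- lm(y ~ x1 + x2)

X  <- model.matrix(fit)         # includes the column of 1's
p  <- ncol(X)
s2 <- sum(residuals(fit)^2) / (n - p)

se      <- sqrt(s2 * diag(solve(crossprod(X))))   # standard errors
t_stat  <- coef(fit) / se
p_value <- 2 * pt(abs(t_stat), df = n - p, lower.tail = FALSE)

cbind(t_stat, p_value)
summary(fit)$coefficients       # R's own table, for comparison
```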