2005 SAMSI Undergraduate Workshop
Statistical View of Linear Least Squares
Minjung Kyung, mkyung@stat.ncsu.edu
May 22, 2005
Introduction to Linear Regression

• A functional relation between two variables is expressed by a mathematical formula.
  – X denotes the independent variable
  – Y denotes the dependent variable
  – A functional relation has the form Y = f(X).
  – Given a particular value of X, the function f indicates the corresponding value of Y.
Example of Functional Relation

[Figure: plot of Y against X illustrating an exact functional (straight-line) relation]
Introduction to Linear Regression

• A statistical relation between two variables:
  – is not a perfect one
  – in general, the observations for a statistical relation do not fall directly on the curve of relationship
Scatter Plot and Line of Statistical Relationship

[Figure: two panels plotting Work Hrs against Lot Size; the right panel adds the fitted line y = 62.37 + 3.57x]
Curvilinear Statistical Relation Example

[Figure: plot of Prognosis against Days showing a curvilinear statistical relation]
Introduction to Linear Regression

• A regression model is a formal means of expressing the two essential ingredients of a statistical relation:
  1. A tendency of the response variable Y to vary with the predictor variable X in a systematic fashion.
  2. A scattering of points around the curve of statistical relationship.
Simple Linear Regression Model

Y_i = \beta_0 + \beta_1 X_i + \epsilon_i

• Y_i is the value of the response variable in the i-th trial
• \beta_0 and \beta_1 are parameters (the regression coefficients)
• X_i is the value of the predictor variable in the i-th trial
• \epsilon_i is a random error term
Simple Linear Regression Model

Model Assumptions
1. The error terms are normally distributed with mean 0 and variance \sigma^2 for all values of i:
   \epsilon_i \sim N(0, \sigma^2)
2. The error terms \epsilon_i and \epsilon_j are independent if i \neq j.
3. Although the model explicitly allows for measurement error in Y, measurements made on X are known precisely (there is no measurement error).
Simple Linear Regression Model

Important Features of the Simple Linear Regression Model
1. The response Y_i is the sum of two components: the deterministic term \beta_0 + \beta_1 X_i and the random error term \epsilon_i. Therefore, Y_i is a random variable.
2. The response Y_i comes from a probability distribution whose mean is E[Y_i] = \beta_0 + \beta_1 X_i.
3. The response Y_i exceeds or falls short of the value of the regression function by the error term amount \epsilon_i.
4. The responses Y_i have the same constant variance as the error term \epsilon_i:
   var[Y_i] = var[\beta_0 + \beta_1 X_i + \epsilon_i] = \sigma^2.
5. The responses Y_i and Y_j are uncorrelated, since the error terms \epsilon_i and \epsilon_j are uncorrelated.

In summary, the responses Y_i come from normal distributions with mean E[Y_i] = \beta_0 + \beta_1 X_i and variance \sigma^2, the same for all levels of X. Further, any two responses Y_i and Y_j are uncorrelated:

Y_i \sim N(\beta_0 + \beta_1 X_i, \sigma^2), independently for each i
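The model summarized above can be simulated directly. The sketch below, assuming NumPy is available, draws responses from N(\beta_0 + \beta_1 X_i, \sigma^2) at fixed predictor values; the parameter values are hypothetical illustrations, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true parameters (illustrative values only)
beta0, beta1, sigma = 62.0, 3.5, 10.0

X = np.linspace(20, 120, 25)                 # fixed predictor values (no measurement error)
eps = rng.normal(0.0, sigma, size=X.size)    # i.i.d. N(0, sigma^2) error terms
Y = beta0 + beta1 * X + eps                  # responses: mean beta0 + beta1*X, variance sigma^2

print(Y[:3])
```

Each Y_i scatters around the regression line beta0 + beta1*X by exactly its error term eps_i.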
Steps for Selecting an Appropriate Regression Model
1. Exploratory data analysis
2. Develop one or more tentative regression models
3. Examine the tentative regression models for their appropriateness for the data at hand, and revise them (or develop new models) as needed
4. Make inferences on the basis of the selected regression model
Estimation of the Regression Function

Method of least squares
• For the observations (X_i, Y_i), consider the deviation of Y_i from its expected value:
  Y_i - (\beta_0 + \beta_1 X_i).
• Consider the sum of the squared deviations:

  Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2.     (1)

• The estimators of \beta_0 and \beta_1 are those values \hat{\beta}_0 and \hat{\beta}_1 that minimize Q for the given sample observations (X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n).
Estimation of the Regression Function

Least squares estimators
• The estimators \hat{\beta}_0 and \hat{\beta}_1 that satisfy the least squares criterion can be found in two ways:
  1. Numerical search procedures
  2. Analytical procedures
• We will use the analytical approach.
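The numerical-search route can be sketched as follows: minimize Q from equation (1) with a general-purpose optimizer and compare with the closed-form answer. This is a sketch assuming NumPy and SciPy are available; the (X, Y) values are hypothetical toy data.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data
X = np.array([30.0, 50.0, 70.0, 90.0, 110.0])
Y = np.array([170.0, 250.0, 300.0, 390.0, 450.0])

def Q(b):
    """Sum of squared deviations from equation (1)."""
    b0, b1 = b
    return np.sum((Y - b0 - b1 * X) ** 2)

# Numerical search: minimize Q over (beta0, beta1)
res = minimize(Q, x0=[1.0, 1.0], method="Nelder-Mead",
               options={"xatol": 1e-6, "fatol": 1e-6, "maxiter": 5000})
b0_hat, b1_hat = res.x

# Analytical least squares solution for comparison
b1_exact = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0_exact = Y.mean() - b1_exact * X.mean()
print(b0_hat, b1_hat)
```

Both routes land on the same minimizer; the analytical formulas are preferred here because they are exact and cheap.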
Estimation of the Regression Function

Least squares estimators
• The values of \beta_0 and \beta_1 that minimize Q can be derived by differentiating (1) with respect to \beta_0 and \beta_1 and setting the results equal to 0:

  \frac{\partial Q}{\partial \beta_0} = -2 \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i) = 0

  \frac{\partial Q}{\partial \beta_1} = -2 \sum_{i=1}^{n} X_i (Y_i - \beta_0 - \beta_1 X_i) = 0

• Simplifying, we get the normal equations:

  \sum_{i=1}^{n} Y_i - n\beta_0 - \beta_1 \sum_{i=1}^{n} X_i = 0
  \sum_{i=1}^{n} X_i Y_i - \beta_0 \sum_{i=1}^{n} X_i - \beta_1 \sum_{i=1}^{n} X_i^2 = 0

• The normal equations can be solved simultaneously to get estimates of the parameters \beta_0 and \beta_1:

  \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}

  \hat{\beta}_0 = \frac{1}{n} \left( \sum_{i=1}^{n} Y_i - \hat{\beta}_1 \sum_{i=1}^{n} X_i \right) = \bar{Y} - \hat{\beta}_1 \bar{X}

  where \bar{X} and \bar{Y} are the means of the X and Y observations, respectively.
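The closed-form estimators above translate directly into code. A minimal sketch, assuming NumPy, on hypothetical toy data:

```python
import numpy as np

# Hypothetical toy data
X = np.array([30.0, 50.0, 70.0, 90.0, 110.0])
Y = np.array([170.0, 250.0, 300.0, 390.0, 450.0])

Xbar, Ybar = X.mean(), Y.mean()

# Least squares estimates from solving the normal equations
b1 = np.sum((X - Xbar) * (Y - Ybar)) / np.sum((X - Xbar) ** 2)
b0 = Ybar - b1 * Xbar
print(b0, b1)   # → 67.0 3.5
```

Note that only sums of products and squares of the data are needed, which is why these formulas were practical long before computers.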
Estimation of the Regression Function

Residuals
• The fitted value for the i-th case:
  \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i
• The i-th residual is the difference between the observed value Y_i and the fitted value \hat{Y}_i:
  e_i = Y_i - \hat{Y}_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i).
• Model error term: \epsilon_i = Y_i - (\beta_0 + \beta_1 X_i)
  – Represents the vertical deviation of Y_i from the unknown true regression line.
• Residual: e_i = Y_i - \hat{Y}_i
  – Represents the vertical deviation of Y_i from the fitted value \hat{Y}_i on the estimated regression line.
  – Residuals are useful for studying whether a given regression model is appropriate for the given data.
Properties of the Fitted Regression Line

• \sum_{i=1}^{n} e_i = 0.
• \sum_{i=1}^{n} e_i^2 is a minimum.
• \sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} \hat{Y}_i.
• \sum_{i=1}^{n} X_i e_i = 0.
• \sum_{i=1}^{n} \hat{Y}_i e_i = 0.
• The regression line always goes through the point (\bar{X}, \bar{Y}).
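These properties can be verified numerically. A sketch assuming NumPy, reusing hypothetical toy data: fit the line, form the residuals, and check that the sums above vanish (up to floating-point rounding).

```python
import numpy as np

# Hypothetical toy data
X = np.array([30.0, 50.0, 70.0, 90.0, 110.0])
Y = np.array([170.0, 250.0, 300.0, 390.0, 450.0])

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

Yhat = b0 + b1 * X          # fitted values
e = Y - Yhat                # residuals

print(e.sum())                          # sum of residuals: 0
print((X * e).sum())                    # sum of X_i * e_i: 0
print((Yhat * e).sum())                 # sum of Yhat_i * e_i: 0
print(b0 + b1 * X.mean() - Y.mean())    # line passes through (Xbar, Ybar): 0
```

These identities follow directly from the normal equations, so they hold for any data set, not just this one.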
Estimation of \sigma^2

• A variety of inferences concerning the regression function require an estimate of \sigma^2.
  – To get an estimate of \sigma^2, first compute the error sum of squares (residual sum of squares):

    SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} e_i^2.

  – The mean square error (MSE) is computed as

    MSE = \frac{SSE}{n - 2}.

  – It can be shown that MSE is an unbiased estimator of \sigma^2.
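The SSE and MSE computations can be sketched as follows, assuming NumPy and the same hypothetical toy data; the divisor n - 2 reflects the two estimated parameters.

```python
import numpy as np

# Hypothetical toy data
X = np.array([30.0, 50.0, 70.0, 90.0, 110.0])
Y = np.array([170.0, 250.0, 300.0, 390.0, 450.0])
n = X.size

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
e = Y - (b0 + b1 * X)       # residuals

SSE = np.sum(e ** 2)        # error (residual) sum of squares
MSE = SSE / (n - 2)         # unbiased estimator of sigma^2
print(SSE, MSE)
```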
Matrix Approach to Least Squares

• The regression model Y_i = \beta_0 + \beta_1 X_i + \epsilon_i can be written in matrix notation as

  \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}

  where

  \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad
  \mathbf{X} = \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix}, \quad
  \boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad
  \boldsymbol{\epsilon} = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}
Matrix Approach to Least Squares

• The normal equations in matrix form are

  \mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \mathbf{X}'\mathbf{Y}

• The model parameters can be estimated as follows:

  \hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}
Matrix Approach to Least Squares

• The residuals are computed using

  \mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}, \quad
  \hat{\mathbf{Y}} = \begin{pmatrix} \hat{\beta}_0 + \hat{\beta}_1 X_1 \\ \vdots \\ \hat{\beta}_0 + \hat{\beta}_1 X_n \end{pmatrix}

• The estimate of \sigma^2 is computed as

  \hat{\sigma}^2 = \frac{\mathbf{e}'\mathbf{e}}{n - 2}
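The matrix formulation above maps onto a few lines of linear algebra code. A sketch assuming NumPy and the same hypothetical toy data; solving the normal equations directly is numerically preferable to forming the inverse (X'X)^{-1} explicitly.

```python
import numpy as np

# Hypothetical toy data
x = np.array([30.0, 50.0, 70.0, 90.0, 110.0])
Y = np.array([170.0, 250.0, 300.0, 390.0, 450.0])
n = x.size

X = np.column_stack([np.ones(n), x])      # design matrix with rows (1, X_i)

# Solve the normal equations X'X beta = X'Y
beta = np.linalg.solve(X.T @ X, X.T @ Y)  # equals (X'X)^{-1} X'Y

e = Y - X @ beta                          # residual vector e = Y - X beta_hat
sigma2 = (e @ e) / (n - 2)                # estimate of sigma^2: e'e / (n - 2)
print(beta, sigma2)
```

The same code handles multiple predictors unchanged: just add columns to the design matrix.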
Inferences in Regression Analysis

Inferences concerning \beta_1
• Point estimator of \beta_1:

  \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}

• Estimate of the standard error of \hat{\beta}_1:

  SE[\hat{\beta}_1] = \sqrt{\frac{MSE}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}

• Confidence interval for \beta_1:

  \hat{\beta}_1 \pm t_{1-\alpha/2;\, n-2} \, SE[\hat{\beta}_1]
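A 95% confidence interval for beta_1 can be sketched as follows, assuming NumPy and SciPy and the same hypothetical toy data; `stats.t.ppf` supplies the t_{1-alpha/2; n-2} quantile.

```python
import numpy as np
from scipy import stats

# Hypothetical toy data
X = np.array([30.0, 50.0, 70.0, 90.0, 110.0])
Y = np.array([170.0, 250.0, 300.0, 390.0, 450.0])
n = X.size

Sxx = np.sum((X - X.mean()) ** 2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()
e = Y - (b0 + b1 * X)
MSE = np.sum(e ** 2) / (n - 2)

se_b1 = np.sqrt(MSE / Sxx)                # standard error of b1
tcrit = stats.t.ppf(0.975, df=n - 2)      # t_{1-alpha/2; n-2} for alpha = 0.05
ci = (b1 - tcrit * se_b1, b1 + tcrit * se_b1)
print(ci)
```

The interval is wide here because n - 2 = 3 degrees of freedom make the t critical value large.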
Inferences in Regression Analysis

Inferences concerning \beta_1
• To test H_0: \beta_1 = 0 vs. H_a: \beta_1 \neq 0:
  – Test statistic:

    t = \frac{\hat{\beta}_1 - 0}{SE[\hat{\beta}_1]}

  – p-value: the probability that a t_{n-2} random variable exceeds |t| in absolute value
  – If the p-value < \alpha, we reject H_0.
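The test can be sketched as follows, assuming NumPy and SciPy and the same hypothetical toy data; `stats.t.sf` gives the upper-tail probability of the t_{n-2} distribution, doubled for a two-sided test.

```python
import numpy as np
from scipy import stats

# Hypothetical toy data
X = np.array([30.0, 50.0, 70.0, 90.0, 110.0])
Y = np.array([170.0, 250.0, 300.0, 390.0, 450.0])
n = X.size

Sxx = np.sum((X - X.mean()) ** 2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()
e = Y - (b0 + b1 * X)
MSE = np.sum(e ** 2) / (n - 2)
se_b1 = np.sqrt(MSE / Sxx)

t = (b1 - 0.0) / se_b1                    # test statistic for H0: beta1 = 0
p = 2 * stats.t.sf(abs(t), df=n - 2)      # two-sided p-value
print(t, p)                               # reject H0 when p < alpha
```

Here the slope is large relative to its standard error, so H_0 is rejected at any conventional alpha.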