simple linear regression

STAT 401A - Statistical Methods for Research Workers. Jarad Niemi, Iowa State University. October 4, 2013.


  1. Simple linear regression. STAT 401A - Statistical Methods for Research Workers. Jarad Niemi, Iowa State University. October 4, 2013.

  2. Model: Simple Linear Regression

     Recall the one-way ANOVA model:

     $$Y_{ij} \stackrel{ind}{\sim} N(\mu_i, \sigma^2)$$

     where $Y_{ij}$ is the observation for individual $j$ in group $i$.

     The simple linear regression model is

     $$Y_i \stackrel{ind}{\sim} N(\beta_0 + \beta_1 X_i, \sigma^2)$$

     where $Y_i$ and $X_i$ are the response and explanatory variable, respectively, for individual $i$.

     Terminology (the terms in each column are equivalent):

       response     explanatory
       outcome      covariate
       dependent    independent
       endogenous   exogenous
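To make the model statement concrete, here is a minimal Python sketch that draws independent responses from $N(\beta_0 + \beta_1 X_i, \sigma^2)$. The function name `simulate_slr` and the parameter values are hypothetical, chosen only for illustration; they are not from the slides.

```python
import random

def simulate_slr(x_values, beta0, beta1, sigma, seed=0):
    """Draw one response per x, independently, from N(beta0 + beta1 * x, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [rng.gauss(beta0 + beta1 * x, sigma) for x in x_values]

# Hypothetical explanatory values and parameters (illustration only)
x = list(range(1, 13))
y = simulate_slr(x, beta0=1.4, beta1=-0.03, sigma=0.15)
```

Setting `sigma=0` returns the mean line itself, which is a quick way to see that only the mean depends on `x`.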

  3. Model. [Figure: scatterplot of telomere length vs years post diagnosis. x-axis: Years post diagnosis (jittered), ticks from 2 to 12; y-axis: Telomere length, ticks from 1.0 to 1.6. Data: R package abd, data set Telomeres.]

  4. Model: Interpretation

     $$E[Y_i \mid X_i = x] = \beta_0 + \beta_1 x, \qquad V[Y_i \mid X_i = x] = \sigma^2$$

     If $X_i = 0$, then $E[Y_i \mid X_i = 0] = \beta_0$. So $\beta_0$ is the expected response when the explanatory variable is zero.

     If $X_i$ increases from $x$ to $x + 1$, then

     $$E[Y_i \mid X_i = x + 1] - E[Y_i \mid X_i = x] = (\beta_0 + \beta_1 x + \beta_1) - (\beta_0 + \beta_1 x) = \beta_1$$

     So $\beta_1$ is the expected increase in the response for each unit increase in the explanatory variable.

     $\sigma$ is the standard deviation of the response for a fixed value of the explanatory variable.
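As a small illustration of these interpretations, the sketch below plugs in the estimates reported in the SAS output on slide 9 (intercept 1.36768, slope -0.02637). The helper name `expected_response` and the choice of years 5 and 6 are assumptions made for the example.

```python
def expected_response(x, beta0, beta1):
    """Mean of the response at explanatory value x under the linear model."""
    return beta0 + beta1 * x

# Estimates copied from the SAS output on slide 9
b0, b1 = 1.36768, -0.02637

# Expected telomere length at 0 years post diagnosis is the intercept
at_zero = expected_response(0, b0, b1)

# A one-year increase changes the expected response by the slope
diff = expected_response(6, b0, b1) - expected_response(5, b0, b1)
```

Here `diff` equals the slope up to floating-point rounding, whichever pair of consecutive years is used.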

  5. Model: Estimators

     Remove the mean:

     $$Y_i = \beta_0 + \beta_1 X_i + e_i, \qquad e_i \stackrel{iid}{\sim} N(0, \sigma^2)$$

     So $e_i = Y_i - (\beta_0 + \beta_1 X_i)$, which we approximate by the residual

     $$r_i = \hat{e}_i = Y_i - (\hat\beta_0 + \hat\beta_1 X_i)$$

     The least squares, maximum likelihood, and Bayesian estimators are

     $$\hat\beta_1 = SXY / SXX$$
     $$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$$
     $$\hat\sigma^2 = SSE / (n - 2), \qquad d.f. = n - 2$$

     where

     $$\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i, \qquad \bar{Y} = \frac{1}{n} \sum_{i=1}^n Y_i$$
     $$SXY = \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})$$
     $$SXX = \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X}) = \sum_{i=1}^n (X_i - \bar{X})^2$$
     $$SSE = \sum_{i=1}^n r_i^2$$
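The estimator formulas translate almost line by line into code. The following is a minimal sketch in plain Python; the function name `fit_slr` and the five-point data set are made up for illustration and are not the telomere data.

```python
def fit_slr(x, y):
    """Return the estimates (beta0_hat, beta1_hat, sigma2_hat) and the SSE."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 observations")
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope: SXY / SXX
    b0 = ybar - b1 * xbar               # intercept: Ybar - b1 * Xbar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(r * r for r in residuals)
    sigma2 = sse / (n - 2)              # error variance estimate on n - 2 d.f.
    return b0, b1, sigma2, sse

# Hypothetical toy data (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1, sigma2, sse = fit_slr(x, y)
```

For this toy data the hand calculation gives SXX = 10 and SXY = 6, so the slope is 0.6 and the intercept is 4 - 0.6 * 3 = 2.2.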

  6. Model: Standard errors

     How certain are we that $\hat\beta_0$ and $\hat\beta_1$ are close to $\beta_0$ and $\beta_1$? We quantify this uncertainty using their standard errors:

     $$SE(\hat\beta_0) = \hat\sigma \sqrt{\frac{1}{n} + \frac{\bar{X}^2}{(n-1) s_X^2}}, \qquad d.f. = n - 2$$
     $$SE(\hat\beta_1) = \hat\sigma \sqrt{\frac{1}{(n-1) s_X^2}}, \qquad d.f. = n - 2$$

     where

     $$s_X^2 = SXX / (n - 1), \qquad s_Y^2 = SYY / (n - 1), \qquad SYY = \sum_{i=1}^n (Y_i - \bar{Y})^2$$

     Correlation coefficient:

     $$r_{XY} = \frac{SXY / (n - 1)}{s_X s_Y}$$

     Coefficient of determination:

     $$R^2 = r_{XY}^2 = \frac{SST - SSE}{SST}, \qquad SST = SYY = \sum_{i=1}^n (Y_i - \bar{Y})^2$$

     The coefficient of determination is the proportion (often quoted as a percentage) of the total response variation explained by the explanatory variable(s).
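A minimal Python sketch of the standard error, correlation, and $R^2$ formulas follows. The function name `slr_uncertainty` and the toy data are hypothetical; the code recomputes the estimates internally so that it stands alone.

```python
import math

def slr_uncertainty(x, y):
    """Return (SE of intercept, SE of slope, r_XY, R^2) for a simple linear regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)          # SST = SYY
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma = math.sqrt(sse / (n - 2))
    s2x, s2y = sxx / (n - 1), syy / (n - 1)           # sample variances
    se_b0 = sigma * math.sqrt(1 / n + xbar ** 2 / ((n - 1) * s2x))
    se_b1 = sigma * math.sqrt(1 / ((n - 1) * s2x))
    r_xy = (sxy / (n - 1)) / math.sqrt(s2x * s2y)
    r2 = (syy - sse) / syy
    return se_b0, se_b1, r_xy, r2

# Hypothetical toy data (illustration only)
se_b0, se_b1, r_xy, r2 = slr_uncertainty([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

In this example the two routes to the coefficient of determination agree: `r_xy ** 2` and `(SST - SSE) / SST` both give 0.6.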

  7. Model: Pvalues and confidence intervals

     We can compute two-sided pvalues via

     $$2 P\left( t_{n-2} > \left| \frac{\hat\beta_0}{SE(\hat\beta_0)} \right| \right) \quad \text{and} \quad 2 P\left( t_{n-2} > \left| \frac{\hat\beta_1}{SE(\hat\beta_1)} \right| \right)$$

     These test the null hypothesis that the corresponding parameter is zero.

     We can construct $100(1 - \alpha)\%$ confidence intervals via

     $$\hat\beta_0 \pm t_{n-2}(1 - \alpha/2) \, SE(\hat\beta_0) \quad \text{and} \quad \hat\beta_1 \pm t_{n-2}(1 - \alpha/2) \, SE(\hat\beta_1)$$

     These provide ranges of the parameter consistent with the data.
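The pvalue and interval formulas can be sketched with the Python standard library alone. The t tail probability below is obtained by Simpson's rule on the t density and the quantile by bisection; this is a simple numerical stand-in chosen for the example, not the method SAS uses, and it is only meant for moderate t values. All function names are hypothetical. The inputs are the `years` estimate and standard error from the SAS output on slide 9, with 37 degrees of freedom.

```python
import math

def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_upper_tail(t, df):
    """P(T > t) for t >= 0, integrating the density on [0, t] by Simpson's rule."""
    if t == 0:
        return 0.5
    n = 2 * max(100, int(100 * t))        # even number of subintervals
    h = t / n
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3

def t_quantile(p, df):
    """Quantile t_df(p) for 0.5 < p < 1, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while 1 - t_upper_tail(hi, df) < p:   # widen the bracket until it contains p
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_upper_tail(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def two_sided_pvalue(estimate, se, df):
    return 2 * t_upper_tail(abs(estimate / se), df)

def confidence_interval(estimate, se, df, alpha=0.05):
    half_width = t_quantile(1 - alpha / 2, df) * se
    return estimate - half_width, estimate + half_width

# Slope estimate and standard error copied from the SAS output (n = 39, d.f. = 37)
p_years = two_sided_pvalue(-0.02637, 0.00909, 37)
ci_years = confidence_interval(-0.02637, 0.00909, 37)
```

Up to rounding of the inputs, these should agree with the pvalue and 95% confidence limits that SAS reports for `years`.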

  8. Model: Pvalues and confidence intervals. [Figure: Telomere length vs years post diagnosis. x-axis: Years post diagnosis (jittered), ticks from 2 to 12; y-axis: Telomere length, ticks from 1.0 to 1.6.]

  9. Model: Pvalues and confidence intervals

     SAS code:

         DATA t;
           INFILE 'telomeres.csv' DSD FIRSTOBS=2;
           INPUT years length;

         PROC REG DATA=t;
           MODEL length = years;
           RUN;

     Output:

         The REG Procedure
         Model: MODEL1
         Dependent Variable: length

         Number of Observations Read    39
         Number of Observations Used    39

         Analysis of Variance
                                 Sum of       Mean
         Source            DF    Squares      Square     F Value   Pr > F
         Model              1    0.22777      0.22777    8.42      0.0062
         Error             37    1.00033      0.02704
         Corrected Total   38    1.22810

         Root MSE          0.16443     R-Square   0.1855
         Dependent Mean    1.22026     Adj R-Sq   0.1634
         Coeff Var        13.47473

         Parameter Estimates
                          Parameter   Standard
         Variable    DF   Estimate    Error      t Value   Pr > |t|   95% Confidence Limits
         Intercept    1    1.36768    0.05721    23.91     <.0001      1.25176    1.48360
         years        1   -0.02637    0.00909    -2.90     0.0062     -0.04479   -0.00796
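The summary lines of this output can be cross-checked from the sums of squares alone. The sketch below types in the rounded values from the SAS output and recomputes the derived quantities using their standard definitions; the variable names are made up for the example, and small discrepancies from rounding of the printed inputs are expected.

```python
import math

# Values copied from the SAS output above
n = 39
ss_model, ss_error, ss_total = 0.22777, 1.00033, 1.22810
dependent_mean = 1.22026
slope, se_slope = -0.02637, 0.00909

ms_model = ss_model / 1                 # model d.f. = 1
ms_error = ss_error / (n - 2)           # error d.f. = n - 2 = 37
f_value = ms_model / ms_error           # SAS reports 8.42
root_mse = math.sqrt(ms_error)          # the estimate of sigma; SAS reports 0.16443
r_square = ss_model / ss_total          # (SST - SSE) / SST; SAS reports 0.1855
adj_r_square = 1 - (1 - r_square) * (n - 1) / (n - 2)   # SAS reports 0.1634
coeff_var = 100 * root_mse / dependent_mean             # SAS reports 13.47473
t_years = slope / se_slope              # SAS reports -2.90
```

With a single explanatory variable the F statistic is the square of the slope's t statistic, which is why both rows show the same pvalue, 0.0062.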
