Statistical Analysis of Corpus Data with R
A short introduction to regression and linear models




1. Statistical Analysis of Corpus Data with R
   A short introduction to regression and linear models

   Designed by Marco Baroni (1) and Stefan Evert (2)
   (1) Center for Mind/Brain Sciences (CIMeC), University of Trento
   (2) Institute of Cognitive Science (IKW), University of Osnabrück

2. Outline

   1. Regression
      ◮ Simple linear regression
      ◮ General linear regression
   2. Linear statistical models
      ◮ A statistical model of linear regression
      ◮ Statistical inference
   3. Generalised linear models

3. Regression / Simple linear regression: Linear regression

   Can random variable Y be predicted from random variable X?
   ☞ focus on a linear relationship between the variables

   Linear predictor: Y ≈ β₀ + β₁ · X
   ◮ β₀ = intercept of the regression line
   ◮ β₁ = slope of the regression line

   Least-squares regression minimises the prediction error

       Q = \sum_{i=1}^{n} \bigl( y_i - (\beta_0 + \beta_1 x_i) \bigr)^2

   for data points (x_1, y_1), ..., (x_n, y_n)
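A minimal sketch of this in R with lm(); the built-in cars data set (speed vs. stopping distance) is an illustrative assumption, not part of the original slides:

```r
# Fit a simple least-squares regression Y ~ b0 + b1*X with lm().
# The built-in 'cars' data set is used purely for illustration.
model <- lm(dist ~ speed, data = cars)
coef(model)                     # beta_0 (intercept) and beta_1 (slope)
plot(dist ~ speed, data = cars)
abline(model)                   # overlay the fitted regression line
```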

4. Regression / Simple linear regression: Simple linear regression

   Coefficients of the least-squares line
   (x̄_n and ȳ_n denote the sample means of the x_i and y_i):

       \hat\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - n \bar{x}_n \bar{y}_n}{\sum_{i=1}^{n} x_i^2 - n \bar{x}_n^2}, \qquad
       \hat\beta_0 = \bar{y}_n - \hat\beta_1 \bar{x}_n

   Mathematical derivation of the regression coefficients
   ◮ the minimum of Q(β₀, β₁) satisfies ∂Q/∂β₀ = ∂Q/∂β₁ = 0
   ◮ this leads to the normal equations (a system of 2 linear equations):

       -2 \sum_{i=1}^{n} \bigl[ y_i - (\beta_0 + \beta_1 x_i) \bigr] = 0
           ➜ \beta_0 n + \beta_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i

       -2 \sum_{i=1}^{n} \bigl[ y_i - (\beta_0 + \beta_1 x_i) \bigr] x_i = 0
           ➜ \beta_0 \sum_{i=1}^{n} x_i + \beta_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i

   ◮ regression coefficients \hat\beta_0, \hat\beta_1 = the unique solution of this system
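The closed-form coefficients can be computed directly and checked against lm(); a sketch, again assuming the illustrative cars data:

```r
# Closed-form least-squares coefficients vs. lm()'s estimates.
x <- cars$speed; y <- cars$dist; n <- length(x)
b1 <- (sum(x * y) - n * mean(x) * mean(y)) / (sum(x^2) - n * mean(x)^2)
b0 <- mean(y) - b1 * mean(x)
c(intercept = b0, slope = b1)   # manual estimates
coef(lm(y ~ x))                 # identical up to rounding
```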

5. Regression / Simple linear regression: The Pearson correlation coefficient

   Measuring the "goodness of fit" of the linear prediction
   ◮ variation among the observed values of Y = sum of squares S_y^2
   ◮ closely related to the (sample estimate of the) variance of Y:

       S_y^2 = \sum_{i=1}^{n} (y_i - \bar{y}_n)^2

   ◮ residual variation w.r.t. the linear prediction: S_resid^2 = Q

   Pearson correlation = amount of variation "explained" by X:

       R^2 = 1 - \frac{S_{\text{resid}}^2}{S_y^2} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y}_n)^2}

   ☞ correlation vs. slope of the regression line: R^2 = \hat\beta_1(y ∼ x) · \hat\beta_1(x ∼ y)
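The three characterisations of R² can be verified numerically; a sketch with the same illustrative data:

```r
# R^2 three ways: squared correlation, lm() summary, product of slopes.
x <- cars$speed; y <- cars$dist
cor(x, y)^2                               # squared Pearson correlation
summary(lm(y ~ x))$r.squared              # R^2 reported by lm()
coef(lm(y ~ x))[2] * coef(lm(x ~ y))[2]   # slope(y ~ x) * slope(x ~ y)
```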

6. Regression / General linear regression: Multiple linear regression

   Linear regression with multiple predictor variables:

       Y ≈ β₀ + β₁ X₁ + ⋯ + β_k X_k

   minimises

       Q = \sum_{i=1}^{n} \bigl( y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}) \bigr)^2

   for data points (x_{11}, ..., x_{1k}, y_1), ..., (x_{n1}, ..., x_{nk}, y_n)

   Multiple linear regression fits a k-dimensional hyperplane instead of a regression line
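In R, additional predictors are simply added to the model formula; a sketch assuming the built-in mtcars data (not from the slides):

```r
# Multiple linear regression: two predictors instead of one.
model <- lm(mpg ~ wt + hp, data = mtcars)
coef(model)   # beta_0, beta_1 (wt), beta_2 (hp)
```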

7. Regression / General linear regression: Multiple linear regression, the design matrix

   Matrix notation of the linear regression problem:  y ≈ Zβ

   "Design matrix" Z of the regression data:

       Z = \begin{pmatrix}
             1 & x_{11} & x_{12} & \cdots & x_{1k} \\
             1 & x_{21} & x_{22} & \cdots & x_{2k} \\
             \vdots & \vdots & \vdots & & \vdots \\
             1 & x_{n1} & x_{n2} & \cdots & x_{nk}
           \end{pmatrix}

       y = (y_1, y_2, \ldots, y_n)'
       β = (β_0, β_1, β_2, \ldots, β_k)'

   ☞ A' denotes the transpose of a matrix; y and β are column vectors
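R builds this design matrix automatically from a model formula, and model.matrix() exposes it; a sketch with the illustrative mtcars data:

```r
# The design matrix Z: a column of 1s, then one column per predictor.
Z <- model.matrix(mpg ~ wt + hp, data = mtcars)
head(Z)
```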

8. Regression / General linear regression: General linear regression

   Matrix notation of the linear regression problem:  y ≈ Zβ

   Residual error:

       Q = (y - Zβ)'(y - Zβ)

   System of normal equations satisfying ∇_β Q = 0:

       Z'Zβ = Z'y

   This leads to the regression coefficients

       \hat{β} = (Z'Z)^{-1} Z'y
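A sketch that solves the normal equations directly and compares the result with lm(), again on the illustrative mtcars data; solve(A, b) is used instead of explicit matrix inversion for numerical stability:

```r
# Solve Z'Z beta = Z'y and compare with lm()'s coefficients.
y <- mtcars$mpg
Z <- model.matrix(mpg ~ wt + hp, data = mtcars)
beta.hat <- solve(t(Z) %*% Z, t(Z) %*% y)
cbind(beta.hat, coef(lm(mpg ~ wt + hp, data = mtcars)))  # same values
```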

9. Regression / General linear regression: General linear regression

   Predictor variables can also be functions of the observed variables
   ➜ the regression only has to be linear in the coefficients β

   E.g. polynomial regression, with design matrix

       Z = \begin{pmatrix}
             1 & x_1 & x_1^2 & \cdots & x_1^k \\
             1 & x_2 & x_2^2 & \cdots & x_2^k \\
             \vdots & \vdots & \vdots & & \vdots \\
             1 & x_n & x_n^2 & \cdots & x_n^k
           \end{pmatrix}

   corresponding to the regression model

       Y ≈ β_0 + β_1 X + β_2 X^2 + \cdots + β_k X^k
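A polynomial fit in R; I(x^2), or equivalently poly(x, 2, raw = TRUE), adds the squared term as an extra design-matrix column (illustrative cars data):

```r
# Quadratic regression: linear in the coefficients, not in x.
x <- cars$speed; y <- cars$dist
quad <- lm(y ~ x + I(x^2))     # Y ~ b0 + b1*X + b2*X^2
coef(quad)
```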

10. Linear statistical models / A statistical model of linear regression: Linear statistical models

   Linear statistical model (ε = random error):

       Y = β_0 + β_1 x_1 + \cdots + β_k x_k + ε,    ε ∼ N(0, σ²)

   ☞ x_1, ..., x_k are not treated as random variables!
   ◮ ∼ = "is distributed as"; N(μ, σ²) = normal distribution with mean μ and variance σ²

   Mathematical notation:

       Y | x_1, ..., x_k ∼ N(β_0 + β_1 x_1 + \cdots + β_k x_k, σ²)

   Assumptions
   ◮ the error terms ε_i are i.i.d. (independent, same distribution)
   ◮ the error terms follow normal (Gaussian) distributions
   ◮ equal (but unknown) variance σ² = homoscedasticity
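A sketch that simulates data from this model and recovers the parameters; all numeric values are illustrative assumptions, not from the slides:

```r
# Simulate Y = b0 + b1*x + eps with eps ~ N(0, sigma^2), then refit.
set.seed(42)                       # illustrative seed and parameter values
n <- 100; b0 <- 2; b1 <- 0.5; sigma <- 1
x <- runif(n, 0, 10)               # the x values are then treated as fixed
y <- b0 + b1 * x + rnorm(n, sd = sigma)
coef(lm(y ~ x))                    # estimates should be close to (2, 0.5)
```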

11. Linear statistical models / A statistical model of linear regression: Linear statistical models

   Probability density function for the simple linear model:

       \Pr(y \mid x) = \frac{1}{(2\pi\sigma^2)^{n/2}} \cdot \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \right)

   ◮ y = (y_1, ..., y_n) = observed values of Y (sample size n)
   ◮ x = (x_1, ..., x_n) = observed values of X

   The log-likelihood has a familiar form:

       \log \Pr(y \mid x) = C - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 = C - \frac{Q}{2\sigma^2}

   ➥ the MLE parameter estimates \hat\beta_0, \hat\beta_1 coincide with the least-squares estimates from linear regression
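In R, logLik() returns the maximised log-likelihood of a fitted model, and the residual sum of squares is the minimised Q; a sketch with the illustrative cars data:

```r
# Maximising the likelihood = minimising the least-squares error Q.
fit <- lm(dist ~ speed, data = cars)
logLik(fit)              # maximised log-likelihood
sum(residuals(fit)^2)    # the minimised Q
```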

12. Linear statistical models / Statistical inference: Statistical inference for linear models

   Model comparison with ANOVA techniques
   ◮ Is the variance reduced significantly by taking a specific explanatory factor into account?
   ◮ intuitive: proportion of variance explained (like R²)
   ◮ mathematical: F statistic ➜ p-value
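In R, nested models are compared with anova(), which reports the F statistic and p-value; a sketch assuming the illustrative mtcars predictors:

```r
# Does adding 'hp' significantly reduce the residual variance?
m0 <- lm(mpg ~ wt, data = mtcars)
m1 <- lm(mpg ~ wt + hp, data = mtcars)
anova(m0, m1)   # F test for the added explanatory factor
```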
