Lecture 1. From Linear Regression


  1. Lecture 1. From Linear Regression
     Nan Ye
     School of Mathematics and Physics, University of Queensland

  2. Quiz. Q1. Which dataset is linear regression of y against x suitable for?
     [Four scatter plots of y against x, panels (a)-(d)]

  3. Q2. If there is a unique least squares regression line $y = \beta^\top x$ on $(x_1, y_1), \ldots, (x_n, y_n) \in \mathbb{R}^d \times \mathbb{R}$, what is $\beta$?
     (a) $(X^\top X)^{-1} X^\top y$   (b) $(X X^\top)^{-1} X y$   (c) $X^\top y$   (d) $X y$
     where $X$ is the $n \times d$ design matrix with $x_i$ as the $i$-th row, and $y = (y_1, \ldots, y_n)^\top$.
     [Scatter plot of a dataset]

  4. Q3. Suggest possible models for the data shown in the figures.
     [Four scatter plots: (a) continuous response, labelled "Linear regression"; (b) binary response; (c) cardinal (count) response; (d) nonnegative continuous response]

  5. Q3 (continued). We will study some options in this course!
     [Same four scatter plots as on the previous slide]

  6. Your Tasks
     Assignment 4: 14%; out 18 Sep, due 12pm 2 Oct
     Assignment 5: 14%; out 2 Oct, due 12pm 16 Oct
     Consulting Project (project description + data, out):
       2.5%: half-time check, due 6pm 1 Oct
       7.5%: seminar, during a lecture in the week of 22 Oct
       20%: report, due 6pm on 26 Oct
     There are bonus questions in lectures and assignments.

  7. Our Problem: Regression

  8. Course Objective
     • Understand the general theory of generalized linear models: model structure, parameter estimation, asymptotic normality, prediction
     • Be able to recognize and apply generalized linear models and extensions for regression on different types of data
     • Be able to determine the goodness of fit and the prediction quality of a model
     Put simply: be able to do regression using generalized linear models and extensions.

  9. Course Overview
     Generalized linear models (GLMs)
     • Building blocks: systematic and random components, exponential families
     • Prediction and parameter estimation
     • Specific models for different types of data: continuous response, binary response, count response...
     • Modelling process and model diagnostics
     Extensions of GLMs
     • Quasi-likelihood models
     • Nonparametric models
     • Mixed models and marginal models
     Time series

  10. This Lecture
      • Revisit the basics of OLS
      • Systematic and random components of OLS
      • Extensions of OLS to other types of data
      • A glimpse of generalized linear models

  11. Revisiting OLS
      The objective function. Ordinary least squares (OLS) finds a hyperplane minimizing the sum of squared errors (SSE):
      $\beta_n = \arg\min_{\beta \in \mathbb{R}^d} \sum_{i=1}^{n} (x_i^\top \beta - y_i)^2,$
      where each $x_i \in \mathbb{R}^d$ and each $y_i \in \mathbb{R}$.
      Terminology. $x$: input, independent variables, covariate vector, observation, predictors, explanatory variables, features. $y$: output, dependent variable, response.
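To make the objective concrete, here is a minimal NumPy sketch that evaluates the SSE for a candidate $\beta$. The data is simulated and all names are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))              # design matrix, one x_i per row
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

def sse(beta, X, y):
    """Sum of squared errors: sum_i (x_i^T beta - y_i)^2."""
    residuals = X @ beta - y
    return residuals @ residuals

print(sse(beta_true, X, y))              # small: beta_true nearly minimizes SSE
print(sse(np.zeros(d), X, y))            # much larger for a poor candidate
```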

  12. Solution
      The solution to OLS is
      $\beta_n = (X^\top X)^{-1} X^\top y,$
      where $X$ is the $n \times d$ design matrix with $x_i$ as the $i$-th row, and $y = (y_1, \ldots, y_n)^\top$.
      The formula holds when $X^\top X$ is non-singular. When $X^\top X$ is singular, there are infinitely many possible values for $\beta_n$; they can be obtained by solving the linear system $(X^\top X)\beta = X^\top y$.
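A sketch of computing $\beta_n$ in NumPy on simulated data: `np.linalg.solve` applies the closed form when $X^\top X$ is non-singular, while `np.linalg.lstsq` solves the same least squares problem and also copes with a singular $X^\top X$ by returning one of the infinitely many solutions (the minimum-norm one):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 3
X = rng.normal(size=(n, d))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n)

# Closed form, valid when X^T X is non-singular:
beta_closed = np.linalg.solve(X.T @ X, X.T @ y)

# lstsq minimizes the SSE directly and handles rank deficiency:
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_closed, beta_lstsq))  # True on this full-rank example
```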

  13. Justification as MLE
      • Assumption: $y_i \mid x_i \overset{\text{ind}}{\sim} N(x_i^\top \beta, \sigma^2)$.
      • Derivation: the log-likelihood of $\beta$ is given by
        $\ln p(y_1, \ldots, y_n \mid x_1, \ldots, x_n, \beta) = \sum_i \ln p(y_i \mid x_i, \beta)$
        $= \sum_i \ln \left( \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -(y_i - x_i^\top \beta)^2 / 2\sigma^2 \right) \right)$
        $= \text{const.} - \frac{1}{2\sigma^2} \sum_i (y_i - x_i^\top \beta)^2.$
      Thus minimizing the SSE is the same as maximizing the log-likelihood, i.e. OLS performs maximum likelihood estimation (MLE).
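One way to check this equivalence numerically, assuming SciPy is available: minimize the negative Gaussian log-likelihood directly and compare with the closed-form OLS solution. The simulated data and the known $\sigma$ are illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
n, d = 100, 2
X = rng.normal(size=(n, d))
y = X @ np.array([1.5, -0.7]) + rng.normal(scale=0.5, size=n)
sigma = 0.5  # treated as known here

def neg_log_lik(beta):
    # -sum_i log N(y_i | x_i^T beta, sigma^2)
    return -norm.logpdf(y, loc=X @ beta, scale=sigma).sum()

beta_mle = minimize(neg_log_lik, x0=np.zeros(d)).x
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(beta_mle, beta_ols, atol=1e-3))  # should print True
```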

  14. An Alternative View
      • OLS has two orthogonal components:
        $E(Y \mid x) = \beta^\top x$. (systematic)
        $Y \mid x$ is normally distributed with variance $\sigma^2$. (random)
      • This has two key features:
        • The expected value of $Y$ given $x$ is a function of $\beta^\top x$.
        • The parameters of the conditional distribution of $Y$ given $x$ can be determined from $E(Y \mid x)$.
      • This defines a conditional distribution $p(y \mid x, \beta)$, with parameters estimated using MLE.

  15. Generalization
      $E(Y \mid x) = g(\beta^\top x)$. (systematic)
      $Y \mid x$ is normally/Poisson/Bernoulli/... distributed. (random)
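The following sketch simulates responses from this template for three choices of $g$ and the random component; the coefficients and data are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=(100, 2))
beta = np.array([0.8, -0.4])
eta = x @ beta  # linear predictor beta^T x

# systematic component: E(Y | x) = g(eta); random component: a distribution
# with that mean
y_normal = rng.normal(loc=eta, scale=1.0)               # g = identity, Normal
y_poisson = rng.poisson(lam=np.exp(eta))                # g = exp, Poisson
y_bernoulli = rng.binomial(1, 1 / (1 + np.exp(-eta)))   # g = logistic, Bernoulli
```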

  16. Example 1. Logistic regression for binary response
      • When $Y$ takes value 0 or 1, we can use the logistic function to squash $x^\top \beta$ to $[0, 1]$, and use the Bernoulli distribution to model $Y \mid x$, as follows:
        $E(Y \mid x) = \mathrm{logistic}(\beta^\top x) = \frac{1}{1 + e^{-\beta^\top x}}$. (systematic)
        $Y \mid x$ is Bernoulli distributed. (random)
      • Or more compactly,
        $Y \mid x \sim B\left( \frac{1}{1 + e^{-\beta^\top x}} \right),$
        where $B(p)$ is the Bernoulli distribution with parameter $p$.
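In practice such a model can be fitted by MLE with standard software; below is a sketch using statsmodels' GLM interface on simulated data (assuming statsmodels is installed; the coefficients and data are illustrative):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2))
beta = np.array([1.0, -2.0])
p = 1 / (1 + np.exp(-X @ beta))  # E(Y | x) = logistic(beta^T x)
y = rng.binomial(1, p)           # random component: Bernoulli

model = sm.GLM(y, X, family=sm.families.Binomial())
print(model.fit().params)        # estimates should be near (1, -2)
```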

  17. Example 2. Poisson regression for count response
      • When $Y$ is a count, we can use exponentiation to map $\beta^\top x$ to a non-negative value, and use the Poisson distribution to model $Y \mid x$, as follows:
        $E(Y \mid x) = \exp(\beta^\top x)$. (systematic)
        $Y \mid x$ is Poisson distributed. (random)
      • Or more compactly,
        $Y \mid x \sim \mathrm{Po}(\exp(\beta^\top x)),$
        where $\mathrm{Po}(\lambda)$ is the Poisson distribution with parameter $\lambda$.
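A matching fitting sketch, again with statsmodels on simulated data (illustrative coefficients; the log link is the statsmodels default for the Poisson family):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))
beta = np.array([0.5, 1.0])
lam = np.exp(X @ beta)  # E(Y | x) = exp(beta^T x)
y = rng.poisson(lam)    # random component: Poisson

model = sm.GLM(y, X, family=sm.families.Poisson())
print(model.fit().params)  # estimates should be near (0.5, 1)
```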

  18. Example 3. Gamma regression for non-negative response
      • When $Y$ is a non-negative continuous random variable, we can choose the systematic and random components as follows:
        $E(Y \mid x) = \exp(\beta^\top x)$. (systematic)
        $Y \mid x$ is Gamma distributed. (random)
      • We further assume the variance of the Gamma distribution is $\mu^2/\nu$ (with $\nu$ treated as known), thus
        $Y \mid x \sim \Gamma(\mu = \exp(\beta^\top x), \mathrm{var} = \mu^2/\nu),$
        where $\Gamma(\mu = a, \mathrm{var} = b)$ denotes a Gamma distribution with mean $a$ and variance $b$.
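A fitting sketch for this case, using statsmodels' Gamma family with an explicit log link to match the systematic component above (illustrative simulated data; the link class name may differ across statsmodels versions):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
X = rng.normal(size=(500, 2))
beta = np.array([0.3, -0.6])
mu = np.exp(X @ beta)                    # E(Y | x) = exp(beta^T x)
nu = 4.0                                 # shape parameter; var = mu^2 / nu
y = rng.gamma(shape=nu, scale=mu / nu)   # Gamma with mean mu, variance mu^2/nu

model = sm.GLM(y, X, family=sm.families.Gamma(link=sm.families.links.Log()))
print(model.fit().params)                # estimates should be near (0.3, -0.6)
```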
