Lecture 6: GLMs
Author: Nicholas Reich
Transcribed by Nutcha Wattanachit; edited by Bianca Doone
Course: Categorical Data Analysis (BIOSTATS 743)
Made available under the Creative Commons Attribution-ShareAlike 4.0 International License.
Generalized Linear Models (GLMs)
GLMs are extensions of linear regression models that encompass non-normal response distributions and the modeling of functions of the mean.
- Example of an ordinary linear model: $Y = X\beta + \epsilon$, with $\epsilon_i \overset{iid}{\sim} N(0, \sigma^2)$.
The best-fit line on the following plot represents $E(Y \mid X)$.
[Figure: scatterplot of y versus x with the fitted regression line.]
Overview of GLMs
◮ Early versions of GLMs were used by Fisher in the 1920s, and GLM theory was unified in the 1970s.
◮ A fairly flexible parametric framework, good at describing relationships and associations between variables.
◮ Fairly simple ("transparent") and interpretable, but basic GLMs are not generally seen as the best approach for prediction.
◮ Both frequentist and Bayesian methods can be used, for parametric and nonparametric models.
GLMs: Parametric vs. Nonparametric Models
◮ Parametric models: assume the data follow a fixed distribution defined by some parameters. GLMs are examples of parametric models. If the assumed model is "close to" the truth, these methods are both accurate and precise.
◮ Nonparametric models: do not assume the data follow a fixed distribution, and can therefore be a better approach if the parametric assumptions are violated.
Components of GLMs
1. Random Component: Response variable Y with N observations from a distribution in the exponential family:
◮ One-parameter form: $f(y_i \mid \theta_i) = a(\theta_i)\, b(y_i) \exp\{y_i Q(\theta_i)\}$
◮ Two-parameter form: $f(y_i \mid \theta_i, \Phi) = \exp\left\{ \frac{y_i \theta_i - b(\theta_i)}{a(\Phi)} + c(y_i, \Phi) \right\}$, where $\Phi$ is a dispersion parameter fixed for all observations
◮ $Q(\theta_i)$ is the natural parameter
2. Systematic Component: The linear predictor relating $\eta_i$ to $X_i$:
◮ $\eta_i = X_i \beta$
3. Link Function: Connects the random and systematic components:
◮ $\mu_i = E(Y_i)$
◮ $\eta_i = g(\mu_i) = g(E(Y_i \mid X_i)) = X_i \beta$
◮ $g(\mu_i)$ is the link function of $\mu_i$; $g(\mu) = \mu$, called the identity link, has $\eta_i = \mu_i$ (a linear model for the mean itself).
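As an aside (not from the lecture), R's family objects encode the random component and the link function directly, which makes the mapping to glm() explicit; the systematic component is supplied through the model formula.
fam <- poisson(link = "log")   # random component: Poisson; link function: log
fam$linkfun(3)                 # g(mu) = log(mu) = 1.0986...
fam$linkinv(log(3))            # inverse link recovers mu = 3
# e.g., glm(y ~ x, family = fam) adds the systematic component eta = b0 + b1*x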
Example 1: Normal Distribution (with fixed variance)
Suppose $y_i$ follows a normal distribution with
◮ mean $\mu_i = \hat{y}_i = E(Y_i \mid X_i)$
◮ fixed variance $\sigma^2$.
The pdf is
$f(y_i \mid \mu_i, \sigma^2) = (2\pi\sigma^2)^{-1/2} \exp\left\{ -\frac{(y_i - \mu_i)^2}{2\sigma^2} \right\} = (2\pi\sigma^2)^{-1/2} \exp\left\{ -\frac{y_i^2}{2\sigma^2} \right\} \exp\left\{ -\frac{\mu_i^2}{2\sigma^2} \right\} \exp\left\{ \frac{y_i \mu_i}{\sigma^2} \right\}$
◮ Where:
◮ $\theta_i = \mu_i$
◮ $a(\mu_i) = \exp\{-\mu_i^2 / (2\sigma^2)\}$
◮ $b(y_i) = (2\pi\sigma^2)^{-1/2} \exp\{-y_i^2 / (2\sigma^2)\}$
◮ $Q(\mu_i) = \mu_i / \sigma^2$
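A quick sketch (simulated data, not from the lecture) showing that a Gaussian GLM with the identity link reproduces ordinary least squares:
set.seed(1)
x <- runif(50, 0, 20)
y <- 3 + 2 * x + rnorm(50, sd = 4)
coef(glm(y ~ x, family = gaussian(link = "identity")))   # same estimates as...
coef(lm(y ~ x))                                          # ...ordinary linear regression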
Example 2: Binomial Logit for binary outcome data
◮ $Pr(Y_i = 1) = \pi_i = E(Y_i \mid X_i)$
◮ $f(y_i \mid \theta_i) = \pi_i^{y_i} (1 - \pi_i)^{1 - y_i} = (1 - \pi_i) \left( \frac{\pi_i}{1 - \pi_i} \right)^{y_i} = (1 - \pi_i) \exp\left\{ y_i \log\left( \frac{\pi_i}{1 - \pi_i} \right) \right\}$
◮ Where:
◮ $\theta_i = \pi_i$
◮ $a(\pi_i) = 1 - \pi_i$
◮ $b(y_i) = 1$
◮ $Q(\pi_i) = \log\left( \frac{\pi_i}{1 - \pi_i} \right)$
◮ The natural parameter $Q(\pi_i)$ implies the canonical link function: $\text{logit}(\pi_i) = \log\left( \frac{\pi_i}{1 - \pi_i} \right)$
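As a small sketch of the canonical link in R (an aside, not from the lecture): qlogis() computes the logit and plogis() its inverse, and family = binomial uses the logit link by default.
p <- 0.7
qlogis(p)            # log(0.7 / 0.3), the natural parameter Q(pi)
plogis(qlogis(p))    # the inverse logit recovers p = 0.7
# e.g., glm(y ~ x, family = binomial) fits a logistic regression (logit link by default)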
Example 3: Poisson for count outcome data
◮ $Y_i \sim \text{Pois}(\mu_i)$
◮ $f(y_i \mid \mu_i) = \frac{e^{-\mu_i} \mu_i^{y_i}}{y_i!} = e^{-\mu_i} \left( \frac{1}{y_i!} \right) \exp\{ y_i \log \mu_i \}$
◮ Where:
◮ $\theta_i = \mu_i$
◮ $a(\mu_i) = e^{-\mu_i}$
◮ $b(y_i) = \frac{1}{y_i!}$
◮ $Q(\mu_i) = \log \mu_i$
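A minimal sketch (simulated counts, not from the lecture) of a Poisson GLM with its canonical log link:
set.seed(2)
x <- rnorm(200)
y <- rpois(200, lambda = exp(1 + 0.3 * x))
coef(glm(y ~ x, family = poisson))   # estimates should be close to the true values (1, 0.3)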
Deviance
For a particular GLM for observations $y = (y_1, \ldots, y_N)$, let $L(\mu \mid y)$ denote the log-likelihood function expressed in terms of the means $\mu = (\mu_1, \ldots, \mu_N)$. The deviance of a Poisson or binomial GLM is
$D = -2[L(\hat{\mu} \mid y) - L(y \mid y)]$
◮ $L(\hat{\mu} \mid y)$ denotes the maximum of the log-likelihood for $y_1, \ldots, y_N$, expressed in terms of $\hat{\mu}_1, \ldots, \hat{\mu}_N$.
◮ $L(y \mid y)$ is the log-likelihood of the saturated model (a perfect fit with $\hat{\mu}_i = y_i$, representing the "best case" scenario). The saturated model is not useful in itself, since it provides no data reduction, but it serves as a baseline for comparison with other model fits.
◮ Relationship with LRTs: the deviance is the likelihood-ratio statistic for testing the null hypothesis that the model holds against the general alternative (i.e., the saturated model).
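A sketch (simulated data, not part of the lecture) computing the deviance of a Poisson fit directly from the definition above and comparing it with the value glm() reports:
set.seed(3)
x <- rnorm(100)
y <- rpois(100, lambda = exp(1 + 0.5 * x))
fit <- glm(y ~ x, family = poisson)
mu.hat <- fitted(fit)
ll.fit <- sum(dpois(y, mu.hat, log = TRUE))   # log-likelihood at the fitted means
ll.sat <- sum(dpois(y, y, log = TRUE))        # saturated model: mu.hat_i = y_i
c(by.hand = -2 * (ll.fit - ll.sat), from.glm = deviance(fit))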
Logistic Regression
For the "simple" one-predictor case where $Y_i \sim \text{Bernoulli}(\pi_i)$ and $Pr(Y_i = 1) = \pi_i$:
$\text{logit}(\pi_i) = \log\left( \frac{\pi_i}{1 - \pi_i} \right) = \text{logit}(Pr(Y_i = 1)) = \text{logit}(E[Y_i]) = g(E[Y_i]) = X\beta = \beta_0 + \beta_1 x_i,$
which implies $Pr(Y_i = 1) = \frac{e^{X\beta}}{1 + e^{X\beta}}$.
◮ $g$ does not have to be a linear function (a linear model means linear with respect to $\beta$).
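To make the inverse-logit relationship concrete, here is a small sketch using the snoring-data coefficients reported later in this lecture (-3.87 and 0.40); the numbers are illustrative only.
b0 <- -3.87; b1 <- 0.40
x <- 0:5
exp(b0 + b1 * x) / (1 + exp(b0 + b1 * x))   # Pr(Y = 1) = exp(Xb) / (1 + exp(Xb))
plogis(b0 + b1 * x)                         # same values via R's built-in inverse logit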
Logistic Regression (Cont.)
The graphs below illustrate the correspondence between the linear systematic component and the logit link. The inverse-logit transformation restricts $\pi_i = E(Y_i)$ to lie between 0 and 1.
[Figure: two panels showing lines with $\beta_1 > 0$ and $\beta_1 < 0$ on the log-odds scale, logit($\pi_i$), and the corresponding S-shaped curves on the probability scale, $\pi_i$.]
Example: Linear Probability vs. Logistic Regression Models
◮ For a binary response, the linear probability model $\pi(x) = \alpha + \beta_1 X_1 + \ldots + \beta_p X_p$ with independent observations is a GLM with a binomial random component and the identity link function.
◮ The logistic regression model is a GLM with a binomial random component and the logit link function.
Example: Linear Probability vs. Logistic Regression Models (Cont.)
Data come from an epidemiological survey of 2484 subjects investigating snoring as a risk factor for heart disease.
n <- c(1379, 638, 213, 254)
snoring <- rep(c(0, 2, 4, 5), n)
y <- rep(rep(c(1, 0), 4), c(24, 1355, 35, 603, 21, 192, 30, 224))
Example: Linear Probability vs. Logistic Regression Models (Cont.) library (MASS) logitreg <- function (x, y, wt = rep (1, length (y)), intercept = T, start = rep (0, p), ...) { fmin <- function (beta, X, y, w) { p <- plogis (X %*% beta) -sum (2 * w * ifelse (y, log (p), log (1 - p))) } gmin <- function (beta, X, y, w) { eta <- X %*% beta; p <- plogis (eta) t ( - 2 * (w *dlogis (eta) * ifelse (y, 1 / p, -1 / (1 - p)))) %*% X } if ( is.null ( dim (x))) dim (x) <- c ( length (x), 1) dn <- dimnames (x)[[2]] if ( !length (dn)) dn <- paste ("Var", 1 :ncol (x), sep="") p <- ncol (x) + intercept if (intercept) {x <- cbind (1, x); dn <- c ("(Intercept)", dn)} if ( is.factor (y)) y <- ( unclass (y) != 1) fit <- optim (start, fmin, gmin, X = x, y = y, w = wt, ...) names (fit $ par) <- dn invisible (fit) } logit.fit<- logitreg (x=snoring, y=y, hessian=T, method="BFGS") logit.fit $ par ## (Intercept) Var1 ## -3.866245 0.397335 - Logistic regression model fit: logit [ˆ π ( x )]= - 3.87 + 0.40x
Example: Linear Probability vs. Logistic Regression Models (Cont.) lpmreg <- function (x, y, wt = rep (1, length (y)), intercept = T, start = rep (0, p), ...) { fmin <- function (beta, X, y, w) { p <- X %*% beta -sum (2 * w * ifelse (y, log (p), log (1 - p))) } gmin <- function (beta, X, y, w) { p <- X %*% beta; t ( - 2 * (w * ifelse (y, 1 / p, -1 / (1 - p)))) %*% X } if ( is.null ( dim (x))) dim (x) <- c ( length (x), 1) dn <- dimnames (x)[[2]] if ( !length (dn)) dn <- paste ("Var", 1 :ncol (x), sep="") p <- ncol (x) + intercept if (intercept) {x <- cbind (1, x); dn <- c ("(Intercept)", dn)} if ( is.factor (y)) y <- ( unclass (y) != 1) fit <- optim (start, fmin, gmin, X = x, y = y, w = wt, ...) names (fit $ par) <- dn invisible (fit) } lpm.fit<- lpmreg (x=snoring, y=y, start= c (.05,.05), hessian=T, method="BFGS") lpm.fit $ par ## (Intercept) Var1 ## 0.01724645 0.01977784 - Linear probability model fit: ˆ π ( x ) = 0.0172 + 0.0198x
Example: Linear Probability vs. Logistic Regression Models (Cont.)
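The original slide presumably displayed a figure comparing the two fits; here is a minimal sketch to draw such a comparison, using the coefficient estimates reported above.
xs <- seq(0, 5, length.out = 100)
plot(xs, plogis(-3.87 + 0.40 * xs), type = "l", ylim = c(0, 0.25),
     xlab = "Snoring score", ylab = "Estimated Pr(heart disease)")
lines(xs, 0.0172 + 0.0198 * xs, lty = 2)   # linear probability model
legend("topleft", c("Logistic", "Linear probability"), lty = 1:2)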
Coefficient Interpretation in Logistic Regression
Our goal is to say in words what $\beta_j$ is. Consider
$\text{logit}(Pr(Y_i = 1)) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots$
◮ The logit at $X_1 = k$ and at a one-unit increase to $X_1 = k + 1$ (holding $X_2 = z$ fixed) is:
$\text{logit}(Pr(Y_i = 1 \mid X_1 = k, X_2 = z)) = \beta_0 + \beta_1 k + \beta_2 z$
$\text{logit}(Pr(Y_i = 1 \mid X_1 = k + 1, X_2 = z)) = \beta_0 + \beta_1 (k + 1) + \beta_2 z$
Coefficient Interpretation in Logistic Regression (Cont.)
◮ Subtracting the first equation from the second:
$\log[\text{odds}(\pi_i \mid X_1 = k + 1, X_2 = z)] - \log[\text{odds}(\pi_i \mid X_1 = k, X_2 = z)] = \beta_1$
◮ The difference can be expressed as
$\log\left[ \frac{\text{odds}(\pi_i \mid X_1 = k + 1, X_2 = z)}{\text{odds}(\pi_i \mid X_1 = k, X_2 = z)} \right] = \log \text{OR}$
◮ We can write $\log \text{OR} = \beta_1$, or equivalently $\text{OR} = e^{\beta_1}$.
Coefficient Interpretation in Logistic Regression (Cont.)
◮ For continuous $X_1$: for every one-unit increase in $X_1$, the estimated odds of the outcome change by a factor of $e^{\beta_1}$, or by $[(e^{\beta_1} - 1) \times 100]\%$, controlling for the other variables.
◮ For categorical $X$: group B has $e^{\beta_1}$ times the estimated odds of the outcome compared to group A (the reference group), controlling for the other variables.
[Figure: log-odds scale, logit($\pi_i$), for a continuous X (comparing $X_1 = k$ and $X_1 = k + 1$) and for a categorical X (comparing group A and group B), illustrating that $\beta_1$ compares log-odds, i.e., an odds ratio.]
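As a concrete aside, the snoring coefficient from the logistic fit above gives the estimated odds ratio per one-unit increase in snoring score:
b1 <- 0.397335
exp(b1)                 # about 1.49: the estimated odds multiply by ~1.49 per unit increase
(exp(b1) - 1) * 100     # about a 49% increase in the estimated odds of heart disease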