Generalized Linear Models
DS-GA 1003
David Rosenberg, New York University
April 12, 2015
Gaussian Regression

- Input space $\mathcal{X} = \mathbb{R}^d$, output space $\mathcal{Y} = \mathbb{R}$.
- Hypothesis space consists of functions $f: x \mapsto \mathcal{N}(w^T x, \sigma^2)$. For each $x$, $f(x)$ returns a particular Gaussian density with variance $\sigma^2$.
- The choice of $w$ determines the function. For some parameter $w \in \mathbb{R}^d$, we can write our prediction function as
  $$[f_w(x)](y) = p_w(y \mid x) = \mathcal{N}(y \mid w^T x, \sigma^2), \quad \text{where } \sigma^2 > 0.$$
- Given some i.i.d. data $\mathcal{D} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$, how do we assess the fit?
Gaussian Regression: Likelihood Scoring

- Suppose we have data $\mathcal{D} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$.
- Compute the model likelihood for $\mathcal{D}$:
  $$p_w(\mathcal{D}) = \prod_{i=1}^n p_w(y_i \mid x_i) \quad \text{[by independence]}$$
- Maximum likelihood estimation (MLE) finds the $w$ maximizing $p_w(\mathcal{D})$.
- Equivalently, maximize the data log-likelihood:
  $$w^* = \operatorname*{argmax}_{w \in \mathbb{R}^d} \sum_{i=1}^n \log p_w(y_i \mid x_i)$$
- Let's start solving this!
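To make likelihood scoring concrete, here is a minimal Python sketch, not from the slides: the function name, the toy arrays `X` and `y`, and the candidate `w` are all illustrative. It evaluates the conditional Gaussian log-likelihood for a given $w$:

```python
import numpy as np
from scipy.stats import norm

def gaussian_log_likelihood(w, X, y, sigma=1.0):
    """Sum of log N(y_i | w^T x_i, sigma^2) over the dataset."""
    mu = X @ w  # predicted means w^T x_i
    return norm.logpdf(y, loc=mu, scale=sigma).sum()

# Toy data: n = 3 points in R^2 (illustrative values only).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0.9, 2.1, 3.2])
print(gaussian_log_likelihood(np.array([1.0, 2.0]), X, y))
```

Higher values indicate a better fit; MLE searches for the $w$ that maximizes this score.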
Gaussian Regression: MLE

- The conditional log-likelihood is:
  $$\sum_{i=1}^n \log p_w(y_i \mid x_i) = \sum_{i=1}^n \log\left[\frac{1}{\sigma\sqrt{2\pi}} \exp\left(\frac{-(y_i - w^T x_i)^2}{2\sigma^2}\right)\right] = \underbrace{\sum_{i=1}^n \log\frac{1}{\sigma\sqrt{2\pi}}}_{\text{independent of } w} + \sum_{i=1}^n \frac{-(y_i - w^T x_i)^2}{2\sigma^2}$$
- The MLE is the $w$ where this is maximized.
- Note that $\sigma^2$ is irrelevant to finding the maximizing $w$.
- We can drop the negative sign and turn this into a minimization problem.
Gaussian Regression: MLE

- The MLE is
  $$w^* = \operatorname*{argmin}_{w \in \mathbb{R}^d} \sum_{i=1}^n (y_i - w^T x_i)^2$$
- This is exactly the objective function for least squares.
- From here, we can use the usual approaches to solve for $w^*$ (linear algebra, calculus, iterative methods, etc.); a sketch of the linear-algebra route follows below.
- NOTE: the parameter vector $w$ interacts with $x$ only through the inner product $w^T x$.
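Since the Gaussian MLE coincides with least squares, a minimal sketch of the linear-algebra solution (assuming NumPy; the toy data is illustrative):

```python
import numpy as np

# Solve w* = argmin_w ||y - X w||^2; lstsq is a numerically
# stable alternative to forming the normal equations directly.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0.9, 2.1, 3.2])
w_star, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w_star)  # the Gaussian MLE; note it does not depend on sigma^2
```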
Poisson Regression: Setup

- Input space $\mathcal{X} = \mathbb{R}^d$, output space $\mathcal{Y} = \{0, 1, 2, 3, 4, \ldots\}$.
- Hypothesis space consists of functions $f: x \mapsto \text{Poisson}(\lambda(x))$. That is, for each $x$, $f(x)$ returns a Poisson distribution with mean $\lambda(x)$.
- Which function $\lambda$? Recall $\lambda > 0$. GLMs (of which Poisson regression is a special case) have a linear dependence on $x$.
- The standard approach is to take
  $$\lambda(x) = \exp(w^T x),$$
  for some parameter vector $w$.
- Note that the range of $\lambda(x)$ is $(0, \infty)$, appropriate for the Poisson parameter.
Poisson Regression: Likelihood Scoring

- Suppose we have data $\mathcal{D} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$.
- Last time we found the log-likelihood for the Poisson was:
  $$\log p(\mathcal{D}; \lambda) = \sum_{i=1}^n [y_i \log\lambda - \lambda - \log(y_i!)]$$
- Plugging in $\lambda(x_i) = \exp(w^T x_i)$, we get
  $$\log p(\mathcal{D}; w) = \sum_{i=1}^n \left[y_i \log\exp(w^T x_i) - \exp(w^T x_i) - \log(y_i!)\right] = \sum_{i=1}^n \left[y_i w^T x_i - \exp(w^T x_i) - \log(y_i!)\right]$$
- Maximize this with respect to $w$ to find the Poisson regression fit.
- There is no closed form for the optimum, but the objective is concave, so it is easy to optimize; a numerical sketch follows below.
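A minimal numerical sketch (assuming NumPy and SciPy; the data arrays are illustrative, and the constant $\log(y_i!)$ terms are dropped since they do not depend on $w$):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(w, X, y):
    """Negative Poisson log-likelihood, omitting the log(y_i!) constants."""
    eta = X @ w  # linear predictor w^T x_i
    return np.exp(eta).sum() - y @ eta

# Toy count data (illustrative values only).
X = np.array([[1.0, 0.2], [1.0, 0.5], [1.0, 0.9], [1.0, 1.3]])
y = np.array([1.0, 2.0, 2.0, 5.0])
result = minimize(neg_log_likelihood, x0=np.zeros(X.shape[1]), args=(X, y))
print(result.x)  # MLE for w; the objective is convex, so this is the global optimum
```

Minimizing the negative log-likelihood is equivalent to maximizing the (concave) log-likelihood, which is why a generic optimizer finds the global solution here.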
Bernoulli Regression: Linear Probabilistic Classifiers

- Setting: $\mathcal{X} = \mathbb{R}^d$, $\mathcal{Y} = \{0, 1\}$.
- For each $x$, $p(Y = 1 \mid x) = \theta$ (i.e., $Y$ has a $\text{Bernoulli}(\theta)$ distribution), where $\theta$ may vary with $x$.
- For each $x \in \mathbb{R}^d$, we just want to predict $\theta \in [0, 1]$.
- Two steps:
  $$\underbrace{x}_{\in \mathbb{R}^d} \mapsto \underbrace{w^T x}_{\in \mathbb{R}} \mapsto \underbrace{f(w^T x)}_{\in [0, 1]},$$
  where $f: \mathbb{R} \to [0, 1]$ is called the transfer function or inverse link function.
- The probability model is then
  $$p(Y = 1 \mid x) = f(w^T x).$$
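The slide leaves $f$ unspecified at this point; one standard choice, the logistic sigmoid $f(z) = 1/(1 + e^{-z})$, yields logistic regression. A minimal sketch of the two-step prediction (the parameter and input values are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Logistic transfer function: maps R into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_prob(w, x):
    """p(Y = 1 | x) = f(w^T x), here with f = sigmoid."""
    return sigmoid(w @ x)

# Illustrative parameter vector and input.
w = np.array([0.5, -1.0])
x = np.array([2.0, 1.0])
print(predict_prob(w, x))  # a probability in (0, 1)
```

Any monotone $f$ mapping $\mathbb{R}$ onto $[0, 1]$ fits the template; the sigmoid is just the most common choice.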