Generalized Linear Models (GLIMs) Probabilistic Graphical Models - PowerPoint PPT Presentation

Exponential family & Generalized Linear Models (GLIMs) Probabilistic Graphical Models Sharif University of Technology Spring 2018 Soleymani

Outline
• Exponential family
  • Many standard distributions are in this family
  • Similarities among learning algorithms for different models in this family:
    • ML estimation has a simple form for exponential families: moment matching of sufficient statistics
    • Bayesian learning is simplest for exponential families
• GLIMs parameterize conditional distributions that have an exponential family distribution on a variable for each value of its parents

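Exponential family: canonical parameterization
$$P(\mathbf{x} \mid \boldsymbol{\eta}) = \frac{1}{Z(\boldsymbol{\eta})}\, h(\mathbf{x}) \exp\{\boldsymbol{\eta}^T T(\mathbf{x})\}$$
$$Z(\boldsymbol{\eta}) = \int h(\mathbf{x}) \exp\{\boldsymbol{\eta}^T T(\mathbf{x})\}\, d\mathbf{x}$$
$$P(\mathbf{x} \mid \boldsymbol{\eta}) = h(\mathbf{x}) \exp\{\boldsymbol{\eta}^T T(\mathbf{x}) - A(\boldsymbol{\eta})\}, \qquad A(\boldsymbol{\eta}) = \ln Z(\boldsymbol{\eta})$$
• $A(\boldsymbol{\eta})$: log partition function
• $T: \mathcal{X} \to \mathbb{R}^K$: sufficient statistics function
• $\boldsymbol{\eta}$: natural or canonical parameters
• $h: \mathcal{X} \to \mathbb{R}^+$: reference measure, independent of the parameters
• $Z$: normalization factor or partition function ($0 < Z(\boldsymbol{\eta}) < \infty$)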

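Example: Bernoulli
$$P(x \mid \theta) = \theta^{x} (1-\theta)^{1-x} = \exp\left\{ \ln\!\left(\frac{\theta}{1-\theta}\right) x + \ln(1-\theta) \right\}$$
• $\eta = \ln\frac{\theta}{1-\theta} \;\Rightarrow\; \theta = \frac{e^{\eta}}{e^{\eta}+1} = \frac{1}{1+e^{-\eta}}$
• $T(x) = x$
• $A(\eta) = -\ln(1-\theta) = \ln(1 + e^{\eta})$
• $h(x) = 1$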
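As a quick numerical check of the Bernoulli example, the sketch below compares the standard form of the distribution with its canonical exponential-family form. The function names are illustrative only and are not part of the original slides.

```python
import math

def bernoulli_pmf(x, theta):
    """Standard form: theta^x * (1 - theta)^(1 - x)."""
    return theta ** x * (1.0 - theta) ** (1 - x)

def bernoulli_expfam(x, eta):
    """Canonical form h(x) * exp(eta * T(x) - A(eta)) with h(x) = 1 and T(x) = x."""
    log_partition = math.log1p(math.exp(eta))  # A(eta) = ln(1 + e^eta)
    return math.exp(eta * x - log_partition)

theta = 0.3
eta = math.log(theta / (1.0 - theta))       # natural parameter (log-odds)
theta_back = 1.0 / (1.0 + math.exp(-eta))   # the logistic function inverts the map

for x in (0, 1):
    print(x, bernoulli_pmf(x, theta), bernoulli_expfam(x, eta))
```

Both columns should agree up to floating-point error, and `theta_back` should recover the original mean parameter.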

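Example: Gaussian
$$P(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}$$
• $\boldsymbol{\eta} = \begin{bmatrix} \eta_1 \\ \eta_2 \end{bmatrix} = \begin{bmatrix} \mu/\sigma^2 \\ -1/(2\sigma^2) \end{bmatrix} \;\Rightarrow\; \mu = -\frac{\eta_1}{2\eta_2}, \quad \sigma^2 = -\frac{1}{2\eta_2}$
• $T(x) = \begin{bmatrix} x \\ x^2 \end{bmatrix}$
• $A(\boldsymbol{\eta}) = -\ln\left( \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{\mu^2}{2\sigma^2} \right\} \right) = \frac{1}{2}\ln 2\pi - \frac{1}{2}\ln(-2\eta_2) - \frac{\eta_1^2}{4\eta_2}$
• $h(x) = 1$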
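The conversion between the mean/variance parameters and the natural parameters can be sketched as follows. This is a minimal illustration that assumes the reference measure is 1, so the constant involving 2π is absorbed into the log partition function. All function names are hypothetical.

```python
import math

def to_natural(mu, sigma2):
    """Map (mean, variance) to the natural parameters (eta1, eta2)."""
    return mu / sigma2, -1.0 / (2.0 * sigma2)

def to_moment(eta1, eta2):
    """Inverse map; requires eta2 < 0."""
    return -eta1 / (2.0 * eta2), -1.0 / (2.0 * eta2)

def log_partition(eta1, eta2):
    """Log partition function of the Gaussian when h(x) = 1."""
    return (0.5 * math.log(2.0 * math.pi)
            - 0.5 * math.log(-2.0 * eta2)
            - eta1 ** 2 / (4.0 * eta2))

def log_density_expfam(x, eta1, eta2):
    """ln h(x) + eta^T T(x) - A(eta) with T(x) = (x, x^2)."""
    return eta1 * x + eta2 * x * x - log_partition(eta1, eta2)

def log_density_standard(x, mu, sigma2):
    """Usual Gaussian log density, for comparison."""
    return -0.5 * math.log(2.0 * math.pi * sigma2) - (x - mu) ** 2 / (2.0 * sigma2)

eta1, eta2 = to_natural(1.5, 0.8)
print(log_density_expfam(2.3, eta1, eta2), log_density_standard(2.3, 1.5, 0.8))
```

The two printed values should coincide, which confirms that the two parameterizations describe the same density.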

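Example: Multinomial
$$P(\mathbf{x} \mid \boldsymbol{\theta}) = \prod_{k=1}^{K} \theta_k^{x_k}, \qquad \sum_{k=1}^{K} \theta_k = 1$$
$$P(\mathbf{x} \mid \boldsymbol{\theta}) = \exp\left\{ \sum_{k=1}^{K} x_k \ln \theta_k \right\} = \exp\left\{ \sum_{k=1}^{K-1} x_k \ln \theta_k + \left(1 - \sum_{k=1}^{K-1} x_k\right) \ln\left(1 - \sum_{k=1}^{K-1} \theta_k\right) \right\}$$
• $\boldsymbol{\eta} = [\eta_1, \ldots, \eta_{K-1}]^T = \left[ \ln\frac{\theta_1}{1 - \sum_{k=1}^{K-1}\theta_k}, \ldots, \ln\frac{\theta_{K-1}}{1 - \sum_{k=1}^{K-1}\theta_k} \right]^T$
• $\boldsymbol{\eta} = \left[ \ln\frac{\theta_1}{\theta_K}, \ldots, \ln\frac{\theta_{K-1}}{\theta_K} \right]^T \;\Rightarrow\; \theta_k = \frac{e^{\eta_k}}{\sum_{j=1}^{K} e^{\eta_j}}$, with the convention $\eta_K = 0$
• $T(\mathbf{x}) = [x_1, \ldots, x_{K-1}]^T$
• $A(\boldsymbol{\eta}) = -\ln \theta_K = -\ln\left(1 - \sum_{k=1}^{K-1} \theta_k\right) = \ln \sum_{k=1}^{K} e^{\eta_k}$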
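The map between category probabilities and natural parameters in the multinomial example is the softmax with the last category as reference. A minimal sketch, assuming the natural parameter of the last category is fixed to zero (function names are illustrative):

```python
import math

def natural_from_probs(theta):
    """Log-ratio of each of the first K-1 probabilities to the last one."""
    return [math.log(t / theta[-1]) for t in theta[:-1]]

def probs_from_natural(eta):
    """Softmax over the K-1 natural parameters plus a fixed 0 for category K."""
    full = list(eta) + [0.0]
    shift = max(full)                       # subtract the max for numerical stability
    weights = [math.exp(e - shift) for e in full]
    total = sum(weights)
    return [w / total for w in weights]

def log_partition(eta):
    """Log of the sum of exponentials, including exp(0) for category K."""
    return math.log(sum(math.exp(e) for e in list(eta) + [0.0]))

theta = [0.2, 0.5, 0.3]
eta = natural_from_probs(theta)
print(probs_from_natural(eta), log_partition(eta), -math.log(theta[-1]))
```

The round trip should return the original probabilities, and the log partition function should equal the negative log of the last probability.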

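Well-behaved parameter space
• Multiple exponential families may encode the same set of distributions
• We want the parameter space $\{\boldsymbol{\eta} \mid 0 < Z(\boldsymbol{\eta}) < \infty\}$ to be:
  • a convex set
  • non-redundant: $\boldsymbol{\eta} \neq \boldsymbol{\eta}' \Rightarrow P(\mathbf{x} \mid \boldsymbol{\eta}) \neq P(\mathbf{x} \mid \boldsymbol{\eta}')$
• The function from $\boldsymbol{\theta}$ to $\boldsymbol{\eta}$ is invertible
  • Example: the invertible function from $\theta$ to $\eta$ in the Bernoulli example, $\theta = \frac{1}{1+e^{-\eta}}$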

Examples of distributions outside the exponential family
• Uniform (with unknown support)
• Laplace (with unknown location)
• Student t-distribution

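Moments
$$A(\boldsymbol{\eta}) = \ln Z(\boldsymbol{\eta}), \qquad Z(\boldsymbol{\eta}) = \int h(\mathbf{x}) \exp\{\boldsymbol{\eta}^T T(\mathbf{x})\}\, d\mathbf{x}$$
$$\nabla_{\boldsymbol{\eta}} A(\boldsymbol{\eta}) = \frac{\nabla_{\boldsymbol{\eta}} Z(\boldsymbol{\eta})}{Z(\boldsymbol{\eta})} = \frac{\int h(\mathbf{x})\, T(\mathbf{x}) \exp\{\boldsymbol{\eta}^T T(\mathbf{x})\}\, d\mathbf{x}}{Z(\boldsymbol{\eta})} = \int T(\mathbf{x})\, \frac{h(\mathbf{x}) \exp\{\boldsymbol{\eta}^T T(\mathbf{x})\}}{Z(\boldsymbol{\eta})}\, d\mathbf{x} = E_{P(\mathbf{x} \mid \boldsymbol{\eta})}[T(\mathbf{x})]$$
$\Rightarrow \nabla_{\boldsymbol{\eta}} A(\boldsymbol{\eta}) = E_{\boldsymbol{\eta}}[T(\mathbf{x})]$: the first derivative of $A(\boldsymbol{\eta})$ is the mean of the sufficient statistics.
$$\nabla^2_{\boldsymbol{\eta}} A(\boldsymbol{\eta}) = E_{\boldsymbol{\eta}}[T(\mathbf{x})\, T(\mathbf{x})^T] - E_{\boldsymbol{\eta}}[T(\mathbf{x})]\, E_{\boldsymbol{\eta}}[T(\mathbf{x})]^T = \mathrm{Cov}_{\boldsymbol{\eta}}[T(\mathbf{x})]$$
In general, the i-th derivative of $A(\boldsymbol{\eta})$ gives the i-th cumulant of the sufficient statistics; the second and third cumulants coincide with the corresponding centered moments.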
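The relation between derivatives of the log partition function and the moments of the sufficient statistic can be checked numerically. The sketch below uses the Bernoulli log partition function and plain finite differences; the step sizes and helper names are assumptions made for illustration.

```python
import math

def log_partition(eta):
    """Bernoulli log partition function ln(1 + e^eta)."""
    return math.log1p(math.exp(eta))

def first_derivative(f, x, h=1e-5):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

eta = 0.7
mean = 1.0 / (1.0 + math.exp(-eta))   # E[x] for a Bernoulli variable
variance = mean * (1.0 - mean)        # Var[x] for a Bernoulli variable

print(first_derivative(log_partition, eta), mean)
print(second_derivative(log_partition, eta), variance)
```

Each printed pair should agree to several decimal places, up to finite-difference error.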

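Properties
• The moment parameters $\boldsymbol{\theta}$ can be derived as a function of the natural or canonical parameters: $\nabla_{\boldsymbol{\eta}} A(\boldsymbol{\eta}) = E_{\boldsymbol{\eta}}[T(\mathbf{x})]$
  • For many distributions we have $\boldsymbol{\theta} \equiv E_{\boldsymbol{\eta}}[T(\mathbf{x})] \Rightarrow \nabla_{\boldsymbol{\eta}} A(\boldsymbol{\eta}) = \boldsymbol{\theta}$
• $A(\boldsymbol{\eta})$ is convex since $\nabla^2_{\boldsymbol{\eta}} A(\boldsymbol{\eta}) = \mathrm{Cov}_{\boldsymbol{\eta}}[T(\mathbf{x})] \succeq 0$
  • A covariance matrix is always positive semi-definite $\Rightarrow$ the Hessian $\nabla^2_{\boldsymbol{\eta}} A(\boldsymbol{\eta})$ is positive semi-definite, and hence $A(\boldsymbol{\eta}) = \ln Z(\boldsymbol{\eta})$ is a convex function of $\boldsymbol{\eta}$.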

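Exponential family: moment parameterization
• A distribution in the exponential family can also be parameterized by the moment parameterization:
$$P(\mathbf{x} \mid \boldsymbol{\theta}) = \frac{1}{Z(\boldsymbol{\theta})}\, h(\mathbf{x}) \exp\{\psi(\boldsymbol{\theta})^T T(\mathbf{x})\}, \qquad \boldsymbol{\eta} = \psi(\boldsymbol{\theta})$$
$$Z(\boldsymbol{\theta}) = \int h(\mathbf{x}) \exp\{\psi(\boldsymbol{\theta})^T T(\mathbf{x})\}\, d\mathbf{x}$$
• $\psi$ maps the parameters $\boldsymbol{\theta}$ to the space of sufficient statistics
• $\boldsymbol{\theta} \equiv E_{\boldsymbol{\eta}}[T(\mathbf{x})] = \nabla_{\boldsymbol{\eta}} A(\boldsymbol{\eta})$
• If $\nabla^2_{\boldsymbol{\eta}} A(\boldsymbol{\eta}) \succ 0$, then $\nabla_{\boldsymbol{\eta}} A(\boldsymbol{\eta})$ is strictly increasing $\Rightarrow$ $\psi^{-1}(\boldsymbol{\eta}) = \boldsymbol{\theta} = \nabla_{\boldsymbol{\eta}} A(\boldsymbol{\eta})$ is strictly increasing and thus 1-to-1
• The mapping from the moments to the canonical parameters is invertible (1-to-1 relationship): $\boldsymbol{\eta} = \psi(\boldsymbol{\theta})$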

Sufficiency
• A statistic is a function of a random variable
• Suppose that the distribution of $X$ depends on a parameter $\theta$
• "$T(X)$ is a sufficient statistic for $\theta$ if there is no information in $X$ regarding $\theta$ beyond that in $T(X)$"
• Sufficiency in both frequentist and Bayesian frameworks implies a factorization of $P(x \mid \theta)$ (Neyman factorization theorem):
$$P(x, T(x), \theta) = f(T(x), \theta)\, g(x, T(x))$$
$$P(x, \theta) = f(T(x), \theta)\, g(x, T(x))$$
$$P(x \mid \theta) = f'(T(x), \theta)\, g(x, T(x))$$

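Sufficient statistic
• Sufficient statistic and the exponential family:
$$P(\mathbf{x} \mid \boldsymbol{\eta}) = h(\mathbf{x}) \exp\{\boldsymbol{\eta}^T T(\mathbf{x}) - A(\boldsymbol{\eta})\}$$
• The sufficient statistic in the case of i.i.d. sampling can be obtained easily for a set $\mathcal{D}$ of $N$ observations from a distribution:
$$P(\mathcal{D} \mid \boldsymbol{\eta}) = \prod_{n=1}^{N} h(\mathbf{x}^{(n)}) \exp\{\boldsymbol{\eta}^T T(\mathbf{x}^{(n)}) - A(\boldsymbol{\eta})\} = \left( \prod_{n=1}^{N} h(\mathbf{x}^{(n)}) \right) \exp\left\{ \boldsymbol{\eta}^T \sum_{n=1}^{N} T(\mathbf{x}^{(n)}) - N A(\boldsymbol{\eta}) \right\}$$
• The distribution of $\mathcal{D}$ is itself in the exponential family, with sufficient statistic $\sum_{n=1}^{N} T(\mathbf{x}^{(n)})$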
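To illustrate what sufficiency means for i.i.d. data, the sketch below evaluates a Bernoulli log-likelihood in two ways: observation by observation, and from the summed statistic together with the sample size. The datasets and function names are made up for the example.

```python
import math

def log_likelihood(data, eta):
    """Sum over observations of eta * x - A(eta), with h(x) = 1."""
    return sum(eta * x - math.log1p(math.exp(eta)) for x in data)

def log_likelihood_from_stats(sum_t, n, eta):
    """Same quantity computed only from the summed statistic and the sample size."""
    return eta * sum_t - n * math.log1p(math.exp(eta))

data_a = [1, 0, 0, 1, 1]
data_b = [1, 1, 1, 0, 0]   # different order, same sum of sufficient statistics
eta = 0.4

print(log_likelihood(data_a, eta))
print(log_likelihood(data_b, eta))
print(log_likelihood_from_stats(sum(data_a), len(data_a), eta))
```

All three values should match: once the sum of the sufficient statistics and the number of observations are known, the individual observations carry no further information about the parameter.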

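MLE for exponential family
$$\ell(\boldsymbol{\eta}; \mathcal{D}) = \ln P(\mathcal{D} \mid \boldsymbol{\eta}) = \ln \prod_{n=1}^{N} h(\mathbf{x}^{(n)}) \exp\{\boldsymbol{\eta}^T T(\mathbf{x}^{(n)}) - A(\boldsymbol{\eta})\} = \sum_{n=1}^{N} \ln h(\mathbf{x}^{(n)}) + \boldsymbol{\eta}^T \sum_{n=1}^{N} T(\mathbf{x}^{(n)}) - N A(\boldsymbol{\eta})$$
This is a concave function of $\boldsymbol{\eta}$, since $A(\boldsymbol{\eta})$ is convex.
$$\nabla_{\boldsymbol{\eta}} \ell(\boldsymbol{\eta}; \mathcal{D}) = 0 \;\Rightarrow\; \sum_{n=1}^{N} T(\mathbf{x}^{(n)}) - N \nabla_{\boldsymbol{\eta}} A(\boldsymbol{\eta}) = 0$$
$$\Rightarrow\; \nabla_{\boldsymbol{\eta}} A(\hat{\boldsymbol{\eta}}) = \frac{\sum_{n=1}^{N} T(\mathbf{x}^{(n)})}{N}$$
$$\Rightarrow\; E_{\hat{\boldsymbol{\eta}}}[T(\mathbf{x})] = \frac{\sum_{n=1}^{N} T(\mathbf{x}^{(n)})}{N} \qquad \text{(moment matching)}$$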
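Moment matching can be sketched for the Gaussian case, where the sufficient statistics are the observation and its square. The code below is an illustrative assumption of how one might compute the estimate: it averages the sufficient statistics, solves for the natural parameters, and then evaluates the gradient of the log partition function at the estimate. The sample values are arbitrary.

```python
def average_sufficient_stats(samples):
    """Empirical averages of T(x) = (x, x^2)."""
    n = len(samples)
    return sum(samples) / n, sum(x * x for x in samples) / n

def mle_natural_parameters(samples):
    """Solve grad A(eta) = average of T(x) for the Gaussian natural parameters."""
    m1, m2 = average_sufficient_stats(samples)
    mu = m1
    sigma2 = m2 - m1 * m1          # maximum-likelihood (biased) variance
    return mu / sigma2, -1.0 / (2.0 * sigma2)

def grad_log_partition(eta1, eta2):
    """Gradient of the Gaussian log partition function, i.e. (E[x], E[x^2])."""
    return (-eta1 / (2.0 * eta2),
            -1.0 / (2.0 * eta2) + eta1 ** 2 / (4.0 * eta2 ** 2))

samples = [2.0, 3.5, 1.0, 4.5, 3.0]
eta_hat = mle_natural_parameters(samples)
print(grad_log_partition(*eta_hat), average_sufficient_stats(samples))
```

The model moments at the estimate should equal the empirical averages, which is exactly the moment-matching condition.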

Exponential family: summary
• Many well-known distributions are in the exponential family
• Important properties for learning with exponential families:
  • The gradient of the log partition function gives the expected sufficient statistics, or moments, for some models
  • Moments of any distribution in the exponential family can be easily computed by taking derivatives of the log normalizer
  • The Hessian of the log partition function is positive semi-definite, and so the log partition function is convex
• Exponential families are important for modeling distributions of Markov networks

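Generalized linear models (GLIMs)
• Conditional relationship between $Y$ and $X$
• Examples:
  • Linear regression: $P(y \mid \mathbf{x}, \mathbf{w}, \sigma^2) = \mathcal{N}(y \mid \mathbf{w}^T \mathbf{x}, \sigma^2)$
  • Discriminative linear classifier (two class)
    • Logistic regression: $P(y \mid \mathbf{x}, \mathbf{w}) = \mathrm{Ber}(y \mid \sigma(\mathbf{w}^T \mathbf{x}))$
    • Probit regression: $P(y \mid \mathbf{x}, \mathbf{w}) = \mathrm{Ber}(y \mid \Phi(\mathbf{w}^T \mathbf{x}))$, where $\Phi$ is the cdf of $\mathcal{N}(0, 1)$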

Generalized linear models (GLIMs)
• $P(y \mid \mathbf{x})$ is a generalized linear model if:
  • $\mathbf{x}$ enters into the model via a linear combination $\boldsymbol{\theta}^T \mathbf{x}$
  • The conditional mean of $P(y \mid \mathbf{x})$ is expressed as $f(\boldsymbol{\theta}^T \mathbf{x})$:
    • $f$ is called the response function
    • $\mu = E[y \mid \mathbf{x}] = f(\boldsymbol{\theta}^T \mathbf{x})$
  • The distribution of $y$ is characterized by an exponential family distribution (with conditional mean $f(\boldsymbol{\theta}^T \mathbf{x})$)
• We have two choices in the specification of a GLIM:
  • The choice of the exponential family distribution
    • Usually constrained by the nature of $Y$
  • The choice of the response function $f$
    • This is the principal degree of freedom in the specification of a GLIM
    • However, we need to impose constraints on this function (e.g., the range of $f$ must lie in $[0, 1]$ for a Bernoulli distribution on $y$)
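The two ingredients of a GLIM, a linear combination of the inputs and a response function applied to it, can be sketched for a binary output. The example below shows two response functions whose range is the unit interval, the logistic function and the standard normal cdf (as in logistic and probit regression). The weights and inputs are arbitrary illustrative values.

```python
import math

def linear_predictor(weights, x):
    """Linear combination of the inputs."""
    return sum(w * xi for w, xi in zip(weights, x))

def logistic(z):
    """Logistic response function."""
    return 1.0 / (1.0 + math.exp(-z))

def probit(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def conditional_mean(weights, x, response):
    """Conditional mean of y given x under the chosen response function."""
    return response(linear_predictor(weights, x))

weights = [0.5, -1.0]
x = [1.0, 0.2]
print(conditional_mean(weights, x, logistic))
print(conditional_mean(weights, x, probit))
```

Both functions keep the conditional mean inside the unit interval, which is the constraint a Bernoulli output imposes on the response function.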

The relation between variables in a GLIM [figure]

Canonical response function
• Canonical response function: $f(\cdot) = \psi^{-1}(\cdot)$, or equivalently $\xi = \eta$, where $\xi = \boldsymbol{\theta}^T \mathbf{x}$ is the linear combination of the inputs
• In this case, the choice of the exponential family density completely determines the GLIM
• The constraints on the range of $f$ are automatically satisfied:
  • $\mu = f(\eta)$ is guaranteed to be a possible value of the conditional expectation (i.e., $f(\eta) = \psi^{-1}(\eta) = \frac{dA(\eta)}{d\eta} = E[Y \mid \eta]$)
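As a worked illustration (not taken from the slides), the canonical response function can be derived for two familiar cases by differentiating the log partition function. The Gaussian case assumes unit variance, so that the mean is the only parameter.

```latex
% Bernoulli: A(\eta) = \ln(1 + e^{\eta})
f(\eta) = \frac{dA(\eta)}{d\eta} = \frac{e^{\eta}}{1 + e^{\eta}} = \frac{1}{1 + e^{-\eta}}
\quad\text{(logistic function, giving logistic regression)}

% Gaussian with unit variance:
% P(y \mid \mu) = \tfrac{1}{\sqrt{2\pi}} e^{-y^2/2} \exp\{\mu y - \mu^2/2\},
% so \eta = \mu and A(\eta) = \eta^2 / 2
f(\eta) = \frac{dA(\eta)}{d\eta} = \eta
\quad\text{(identity, giving linear regression)}
```

In both cases the range of the response function matches the set of admissible means, with no extra constraint needed.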

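Log likelihood for GLIMs
$$\ell(\boldsymbol{\eta}; \mathcal{D}) = \ln P(\mathcal{D} \mid \boldsymbol{\eta}) = \ln \prod_{n=1}^{N} h(y^{(n)}) \exp\{\eta^{(n)} y^{(n)} - A(\eta^{(n)})\} = \sum_{n=1}^{N} \ln h(y^{(n)}) + \sum_{n=1}^{N} \left( \eta^{(n)} y^{(n)} - A(\eta^{(n)}) \right)$$
• $\eta^{(n)} = \psi(\mu^{(n)})$ and $\mu^{(n)} = f(\boldsymbol{\theta}^T \mathbf{x}^{(n)})$
• In the case of the canonical response function, $\eta^{(n)} = \boldsymbol{\theta}^T \mathbf{x}^{(n)}$:
$$\ell(\boldsymbol{\theta}; \mathcal{D}) = \sum_{n=1}^{N} \ln h(y^{(n)}) + \boldsymbol{\theta}^T \sum_{n=1}^{N} \mathbf{x}^{(n)} y^{(n)} - \sum_{n=1}^{N} A(\boldsymbol{\theta}^T \mathbf{x}^{(n)})$$
• $\sum_{n=1}^{N} \mathbf{x}^{(n)} y^{(n)}$: sufficient statistics for $\boldsymbol{\theta}$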
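With the canonical response function, differentiating the log-likelihood with respect to the weights gives the sum over observations of (output minus conditional mean) times input. The sketch below uses this gradient in plain gradient ascent for a Bernoulli output. Gradient ascent, the step size, the iteration count, and the toy data are assumptions made for illustration, not a method prescribed by the slides.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def log_likelihood(weights, xs, ys):
    """Sum over n of eta_n * y_n - A(eta_n), with eta_n the linear combination."""
    return sum(dot(weights, x) * y - math.log1p(math.exp(dot(weights, x)))
               for x, y in zip(xs, ys))

def gradient(weights, xs, ys):
    """Sum over n of (y_n - mu_n) * x_n, where mu_n is the logistic of eta_n."""
    grad = [0.0] * len(weights)
    for x, y in zip(xs, ys):
        mu = 1.0 / (1.0 + math.exp(-dot(weights, x)))
        for j, xj in enumerate(x):
            grad[j] += (y - mu) * xj
    return grad

def fit(xs, ys, step=0.1, iterations=2000):
    """Plain gradient ascent on the concave log-likelihood, starting from zero."""
    weights = [0.0] * len(xs[0])
    for _ in range(iterations):
        g = gradient(weights, xs, ys)
        weights = [w + step * gj for w, gj in zip(weights, g)]
    return weights

# Toy, non-separable data; the first input component is a constant bias term.
xs = [[1.0, -2.0], [1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 0.5]]
ys = [0, 0, 1, 0, 1, 1]
weights_hat = fit(xs, ys)
print(weights_hat, gradient(weights_hat, xs, ys))
```

At the fitted weights the gradient is close to zero, so the sum of inputs weighted by the observed outputs matches the sum of inputs weighted by the predicted means. This is the GLIM counterpart of moment matching.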
