Exponential Family Distributions CMSC 691 UMBC
Exponential Family Form
p_η(x) = h(x) exp(η^T f(x) − A(η))
Exponential Family Form: Support function h(x)
• Formally necessary, often irrelevant (e.g., Gaussian distributions), except when it isn’t (e.g., Dirichlet distributions)
Exponential Family Form: Distribution parameters η
• Natural parameters
• Feature weights
Exponential Family Form: Sufficient statistics f(x)
• Feature function(s)
Exponential Family Form: Log-normalizer A(η)
Discrete x:   A(η) = log Σ_{x'} h(x') exp(η^T f(x'))
Continuous x: A(η) = log ∫ h(x') exp(η^T f(x')) dx'
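For a finite support, the log-normalizer is just a log-sum-exp over the support. A minimal sketch (assuming a categorical distribution with one-hot f(x), so η^T f(x) = η[x], and h(x) = 1; the η values are hypothetical):

```python
import math

def log_normalizer(eta):
    """A(eta) = log sum_x h(x) * exp(eta^T f(x)) for a categorical
    distribution, where f(x) is one-hot and h(x) = 1.
    Computed with the log-sum-exp trick for numerical stability."""
    m = max(eta)
    return m + math.log(sum(math.exp(e - m) for e in eta))

# exp(eta[x] - A(eta)) is then a properly normalized distribution.
eta = [0.5, -1.2, 2.0]   # hypothetical natural parameters
A = log_normalizer(eta)
probs = [math.exp(e - A) for e in eta]
print(sum(probs))        # sums to 1
```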
Why Bother with This? • A common form for common distributions • “Easily” compute gradients of likelihood wrt parameters • “Easily” compute expectations, especially entropy and KL divergence • “Easy” posterior inference via conjugate distributions
Why? Capture Common Distributions
Bernoulli/Binomial, Categorical/Multinomial, Poisson, Normal, Gamma, … can all be written in this “common” form (with different h, f, and A functions):
p_η(x) = h(x) exp(η^T f(x) − A(η))
See a good stats book, or https://en.wikipedia.org/wiki/Exponential_family#Table_of_distributions
Why? Capture Common Distributions: Discrete/Categorical (finite distributions)
“Traditional” form: p_θ(X = x) = ∏_k θ_k^{1[x = k]}, where 1[a] = 1 if a is true and 0 if a is false
Exponential family form: η = ???, f(x) = ???, h(x) = ???
Why? Capture Common Distributions: Discrete/Categorical (finite distributions)
How do we find this? Use ab = exp(log a + log b):
p_θ(X = x) = ∏_k θ_k^{1[x = k]}
           = exp(Σ_k 1[x = k] · log θ_k)
           = exp([1[x = 1], …, 1[x = L]]^T log θ)
Why? Capture Common Distributions: Discrete/Categorical (finite distributions)
Exponential family form:
η = (log θ_1, …, log θ_L)
f(x) = (1[x = 1], …, 1[x = L])
h(x) = 1
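This mapping is easy to check numerically. A sketch (with hypothetical θ values) comparing the traditional form θ_x against h(x) exp(η^T f(x) − A(η)):

```python
import math

theta = [0.2, 0.5, 0.3]             # hypothetical categorical parameters
eta = [math.log(t) for t in theta]  # natural parameters: log theta

def f(x, L):
    """Sufficient statistics: one-hot indicator vector."""
    return [1.0 if x == k else 0.0 for k in range(L)]

def A(eta):
    """Log-normalizer: log sum_x exp(eta[x]); equals 0 when theta sums to 1."""
    m = max(eta)
    return m + math.log(sum(math.exp(e - m) for e in eta))

for x in range(3):
    exp_family = math.exp(sum(e * fx for e, fx in zip(eta, f(x, 3))) - A(eta))
    print(x, theta[x], exp_family)   # the two columns agree
```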
Why? Capture Common Distributions: Gaussian
“Traditional” form: p(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))
Exponential family form:
η = (μ/σ², −1/(2σ²))
f(x) = (x, x²)
h(x) = 1
A(η) = μ²/(2σ²) + ½ log(2πσ²)
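The standard Gaussian decomposition (η = (μ/σ², −1/(2σ²)), f(x) = (x, x²), h(x) = 1, A(η) = μ²/(2σ²) + ½ log(2πσ²)) can also be verified numerically. A sketch with hypothetical μ and σ²:

```python
import math

mu, sigma2 = 1.5, 0.8   # hypothetical mean and variance

def gauss_pdf(x):
    """Gaussian density in its "traditional" form."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Exponential family pieces (with h(x) = 1):
eta = (mu / sigma2, -1.0 / (2 * sigma2))                           # natural parameters
A = mu ** 2 / (2 * sigma2) + 0.5 * math.log(2 * math.pi * sigma2)  # log-normalizer

for x in (-1.0, 0.0, 2.3):
    ef = math.exp(eta[0] * x + eta[1] * x * x - A)  # f(x) = (x, x^2)
    print(x, gauss_pdf(x), ef)                      # the two forms agree
```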
Why? Capture Common Distributions: Dirichlet
“Traditional” form: p_α(x) = (1/B(α)) ∏_l x_l^{α_l − 1}
Exponential family form (η = α − 1, f(x) = (log x_1, …, log x_L)):
• If we assume x ∈ Δ^{L−1}: h(x) = 1
• If we explicitly enforce x ∈ Δ^{L−1}: h(x) = 1[Σ_l x_l = 1]
Why? Capture Common Distributions
• Discrete (finite distributions)
• Dirichlet (distributions over (finite) distributions)
• Gaussian
• Gamma, Exponential, Poisson, Negative Binomial, Laplace, log-Normal, …
Why? “Easy” Gradients
Gradient of the log-likelihood:
∇_η log p_η(x) = f(x) − E_{p_η}[f(x)]
• f(x): observed sufficient statistics (feature counts); a “count” w.r.t. the empirical distribution
• E_{p_η}[f(x)]: expected sufficient statistics (feature counts); a “count” w.r.t. the current model parameters
Why? “Easy” Expectations
The expectation of the sufficient statistics is the gradient of the log-normalizer:
E_{p_η}[f(x)] = ∇_η A(η)
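This identity is easy to check numerically. A sketch for the categorical case (hypothetical η), comparing a finite-difference gradient of A(η) against the exact expected sufficient statistics:

```python
import math

def A(eta):
    """Categorical log-normalizer via stable log-sum-exp."""
    m = max(eta)
    return m + math.log(sum(math.exp(e - m) for e in eta))

eta = [0.3, -0.7, 1.1]   # hypothetical natural parameters
probs = [math.exp(e - A(eta)) for e in eta]

# For one-hot f(x), E[f(x)] is just the probability vector itself.
expected_stats = probs

# Finite-difference gradient of A w.r.t. each eta_k.
eps = 1e-6
grad_A = []
for k in range(len(eta)):
    bumped = list(eta)
    bumped[k] += eps
    grad_A.append((A(bumped) - A(eta)) / eps)

print(expected_stats)
print(grad_A)   # matches expected_stats up to finite-difference error
```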
Why Bother with This? • A common form for common distributions • “Easily” compute gradients of likelihood wrt parameters • “Easily” compute expectations, especially entropy and KL divergence • “Easy” posterior inference via conjugate distributions
Conjugate Distributions
• Let θ ∼ p, and let x | θ ∼ q
• If p is the conjugate prior for q, then the posterior distribution p(θ | x) is of the same type/family as the prior p(θ)
Why? “Easy” Posterior Inference
• p is the conjugate prior for q: the posterior p(θ | x) has the same form as the prior p(θ)
• All exponential family models have a conjugate prior (in theory)
Why? “Easy” Posterior Inference
Posterior        | Likelihood           | Prior
Dirichlet (Beta) | Discrete (Bernoulli) | Dirichlet (Beta)
Normal           | Normal (fixed var.)  | Normal
Gamma            | Exponential          | Gamma
…
Conjugate Prior Example
• p(θ) = Dir(α); x_i | θ ∼ Cat(θ), i.i.d.
• Let f(x) be the Cat sufficient statistic function
• p(θ | x) = Dir(α + Σ_i f(x_i))
p(θ | x_1, …, x_N) ∝ q(x_1, …, x_N | θ) p(θ)
Conjugate Prior Example: derivation
p(θ | x_1, …, x_N) ∝ q(x_1, …, x_N | θ) p(θ)
Rewrite q and p with their exponential family forms, using the specific natural parameters and sufficient statistic functions (Cat: η = log θ; Dirichlet: η = α − 1, f(θ) = log θ):
= [∏_i exp((log θ)^T f(x_i))] · exp((α − 1)^T log θ − A(α))
Notice the common terms that can be grouped together:
= exp(Σ_i (log θ)^T f(x_i) + (α − 1)^T log θ − A(α))
= exp((α − 1 + Σ_i f(x_i))^T log θ − A(α))
Notice: this is the form of a Dirichlet, with parameters α + Σ_i f(x_i)
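Computationally, the whole derivation collapses to “add the observed counts to α”. A minimal sketch of the Dirichlet-Categorical update (hypothetical α and observations):

```python
alpha = [1.0, 2.0, 1.0]   # hypothetical Dirichlet prior parameters
data = [0, 2, 2, 1, 2]    # hypothetical observed categories x_1..x_N

def f(x, L):
    """Cat sufficient statistics: one-hot indicator vector."""
    return [1.0 if x == k else 0.0 for k in range(L)]

# Posterior Dirichlet parameters: alpha + sum_i f(x_i),
# i.e. prior pseudo-counts plus observed counts.
posterior = list(alpha)
for x in data:
    for k, fk in enumerate(f(x, len(alpha))):
        posterior[k] += fk

print(posterior)   # [2.0, 3.0, 4.0]
```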
Why Bother with This? • A common form for common distributions • “Easily” compute gradients of likelihood wrt parameters • “Easily” compute expectations, especially entropy and KL divergence • “Easy” posterior inference via conjugate distributions