Factor Analysis & Exact Inference for Gaussian Networks
Probabilistic Graphical Models
Sharif University of Technology, Spring 2016
Soleymani
Multivariate Gaussian distribution

$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left\{ -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right\}$$

- The natural, canonical, or information parameterization of a Gaussian distribution arises from the quadratic form:
$$\mathcal{N}(x \mid \eta, \Lambda) \propto \exp\left\{ -\tfrac{1}{2} x^T \Lambda x + \eta^T x \right\}, \qquad \Lambda = \Sigma^{-1}, \quad \eta = \Sigma^{-1} \mu$$
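A minimal sketch of the two parameterizations (the dimensions and numbers below are illustrative assumptions, not from the slides): the moment form and the canonical form should differ only by an $x$-independent normalization constant.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
A = rng.standard_normal((2, 2))
Sigma = A @ A.T + 2 * np.eye(2)               # a random SPD covariance

Lambda = np.linalg.inv(Sigma)                 # precision  Lambda = Sigma^{-1}
eta = Lambda @ mu                             # natural mean  eta = Sigma^{-1} mu

def log_moment(x):                            # moment form  N(x | mu, Sigma)
    return (-0.5 * np.log(np.linalg.det(2 * np.pi * Sigma))
            - 0.5 * (x - mu) @ Lambda @ (x - mu))

def log_canonical(x):                         # canonical form, up to a constant
    return -0.5 * x @ Lambda @ x + eta @ x

x1, x2 = np.array([0.5, 0.3]), np.array([-1.0, 2.0])
# the two forms differ only by an x-independent constant
assert np.isclose(log_moment(x1) - log_canonical(x1),
                  log_moment(x2) - log_canonical(x2))
```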
Joint Gaussian distribution: block elements

- If we partition the vector $x$ into $x_1$ and $x_2$:
$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}, \qquad \Sigma_{21} = \Sigma_{12}^T$$
  $\Sigma_{11}$ and $\Sigma_{22}$ are symmetric.
$$\mathcal{N}(x \mid \mu, \Sigma) = \mathcal{N}\left( \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \,\middle|\, \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right)$$
Marginal and conditional of Gaussian

[Figure from Bishop: joint density $p(y_1, y_2)$, the marginal $p(y_1)$, and the conditional $p(y_1 \mid y_2 = 0.7)$]

- For a multivariate Gaussian distribution, all marginal and conditional distributions are also Gaussian.
Matrix inverse lemma

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} M^{-1} & -M^{-1} B D^{-1} \\ -D^{-1} C M^{-1} & D^{-1} + D^{-1} C M^{-1} B D^{-1} \end{bmatrix}, \qquad M = A - B D^{-1} C$$
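A small numerical check of the partitioned inverse and of the matrix inversion lemma used later in the slides (block sizes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 4 * np.eye(3)
B = rng.standard_normal((3, 2))
C = rng.standard_normal((2, 3))
D = rng.standard_normal((2, 2)) + 4 * np.eye(2)

full = np.block([[A, B], [C, D]])
inv = np.linalg.inv(full)

M = A - B @ np.linalg.inv(D) @ C                 # Schur complement of D
# top-left block of the inverse of the partitioned matrix is M^{-1}
assert np.allclose(inv[:3, :3], np.linalg.inv(M))

# matrix inversion lemma:
# (A - B D^{-1} C)^{-1} = A^{-1} + A^{-1} B (D - C A^{-1} B)^{-1} C A^{-1}
Ainv = np.linalg.inv(A)
lemma = Ainv + Ainv @ B @ np.linalg.inv(D - C @ Ainv @ B) @ C @ Ainv
assert np.allclose(np.linalg.inv(M), lemma)
```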
Precision matrix

- In many situations, it is convenient to work with $\Lambda = \Sigma^{-1}$, known as the precision matrix:
$$\Lambda = \Sigma^{-1} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}^{-1} = \begin{bmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{bmatrix}, \qquad \Lambda_{21} = \Lambda_{12}^T$$
  $\Lambda_{11}$ and $\Lambda_{22}$ are symmetric.
- Relation between the inverse of a partitioned matrix and the inverses of its partitions (using the matrix inverse lemma):
$$\Lambda_{11} = \left( \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right)^{-1}$$
$$\Lambda_{12} = -\left( \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21} \right)^{-1} \Sigma_{12} \Sigma_{22}^{-1}$$
Marginal and conditional distributions based on block elements of $\Lambda$

$$\Lambda = \Sigma^{-1} = \begin{bmatrix} \Lambda_{11} & \Lambda_{12} \\ \Lambda_{21} & \Lambda_{22} \end{bmatrix}$$

- Conditional:
$$p(x_1 \mid x_2) = \mathcal{N}(x_1 \mid \mu_{1|2}, \Sigma_{1|2})$$
$$\mu_{1|2} = \mu_1 - \Lambda_{11}^{-1} \Lambda_{12} (x_2 - \mu_2) \qquad \text{(linear-Gaussian model)}$$
$$\Sigma_{1|2} = \Lambda_{11}^{-1}$$
- Marginal:
$$p(x_1) = \mathcal{N}(x_1 \mid \mu_1, \Sigma_1), \qquad \Sigma_1 = \left( \Lambda_{11} - \Lambda_{12} \Lambda_{22}^{-1} \Lambda_{21} \right)^{-1}$$
Marginal and conditional distributions based on block elements of $\Sigma$

- Conditional distributions:
$$p(x_1 \mid x_2) = \mathcal{N}(x_1 \mid \mu_{1|2}, \Sigma_{1|2})$$
$$\mu_{1|2} = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2)$$
$$\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$$
- Marginal distributions based on block elements of $\mu$ and $\Sigma$:
$$p(x_1) = \mathcal{N}(x_1 \mid \mu_1, \Sigma_{11}), \qquad p(x_2) = \mathcal{N}(x_2 \mid \mu_2, \Sigma_{22})$$
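A sketch comparing the two routes to $p(x_1 \mid x_2)$, via covariance blocks (this slide) and via precision blocks (previous slide); the partition sizes and random joint Gaussian are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d1, d2 = 2, 3
A = rng.standard_normal((d1 + d2, d1 + d2))
Sigma = A @ A.T + np.eye(d1 + d2)                # SPD joint covariance
mu = rng.standard_normal(d1 + d2)
mu1, mu2 = mu[:d1], mu[d1:]
S11, S12 = Sigma[:d1, :d1], Sigma[:d1, d1:]
S21, S22 = Sigma[d1:, :d1], Sigma[d1:, d1:]

x2 = rng.standard_normal(d2)                     # value we condition on

# covariance-block formulas
m_cov = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
V_cov = S11 - S12 @ np.linalg.solve(S22, S21)

# precision-block formulas
Lam = np.linalg.inv(Sigma)
L11, L12 = Lam[:d1, :d1], Lam[:d1, d1:]
L21, L22 = Lam[d1:, :d1], Lam[d1:, d1:]
m_prec = mu1 - np.linalg.solve(L11, L12 @ (x2 - mu2))
V_prec = np.linalg.inv(L11)

assert np.allclose(m_cov, m_prec) and np.allclose(V_cov, V_prec)
# marginal of x1 is N(mu1, Sigma11); equivalently (Lam11 - Lam12 Lam22^{-1} Lam21)^{-1}
assert np.allclose(S11, np.linalg.inv(L11 - L12 @ np.linalg.solve(L22, L21)))
```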
Factor analysis

- Gaussian latent variable $z$ ($M$-dimensional)
  - Continuous latent variable
- Observed variable $x$ ($D$-dimensional)
$$p(z) = \mathcal{N}(z \mid 0, I)$$
$$p(x \mid z) = \mathcal{N}(x \mid \mu + W z, \Psi)$$
  $z \in \mathbb{R}^M$, $x \in \mathbb{R}^D$
  $W$: factor loading $D \times M$ matrix
  $\Psi$: diagonal covariance matrix
Marginal distribution

$$p(z) = \mathcal{N}(z \mid 0, I), \qquad p(x \mid z) = \mathcal{N}(x \mid \mu + W z, \Psi)$$
$$x = \mu + W z + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \Psi), \quad \epsilon \text{ independent of } z$$

- The product of Gaussian distributions is Gaussian, as is the marginal of a Gaussian; thus $p(x) = \int p(z)\, p(x \mid z)\, dz$ is Gaussian.
$$E[x] = E[\mu + W z + \epsilon] = \mu + W E[z] = \mu$$
$$\mathrm{Cov}[x] = E\left[(x - \mu)(x - \mu)^T\right] = E\left[(W z + \epsilon)(W z + \epsilon)^T\right] = W E[z z^T] W^T + \Psi = W W^T + \Psi$$
$$\Rightarrow \quad p(x) = \mathcal{N}(x \mid \mu, W W^T + \Psi)$$
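A sketch (with assumed toy dimensions) that samples from the FA generative model and checks that the sample covariance approaches $W W^T + \Psi$, as derived above.

```python
import numpy as np

rng = np.random.default_rng(3)
D, M, N = 5, 2, 200_000
W = rng.standard_normal((D, M))                  # factor loading matrix
mu = rng.standard_normal(D)
psi = rng.uniform(0.1, 1.0, size=D)              # diagonal of Psi

Z = rng.standard_normal((N, M))                  # z ~ N(0, I)
eps = rng.standard_normal((N, D)) * np.sqrt(psi) # eps ~ N(0, Psi)
X = mu + Z @ W.T + eps                           # x = mu + W z + eps

emp_cov = np.cov(X, rowvar=False)
model_cov = W @ W.T + np.diag(psi)
print(np.max(np.abs(emp_cov - model_cov)))       # small Monte Carlo error
assert np.allclose(emp_cov, model_cov, atol=0.1)
```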
Joint Gaussian distribution

$$\Sigma_{zx} = \mathrm{Cov}(z, x) = E\left[ z (W z + \epsilon)^T \right] = W^T$$
$$\Sigma_{xx} = W W^T + \Psi, \qquad \Sigma_{zz} = I$$
$$p\left( \begin{bmatrix} z \\ x \end{bmatrix} \right) = \mathcal{N}\left( \begin{bmatrix} z \\ x \end{bmatrix} \,\middle|\, \begin{bmatrix} 0 \\ \mu \end{bmatrix}, \begin{bmatrix} I & W^T \\ W & W W^T + \Psi \end{bmatrix} \right)$$
Conditional distributions

- Recall: $p(x_1 \mid x_2) = \mathcal{N}(x_1 \mid \mu_{1|2}, \Sigma_{1|2})$ with $\mu_{1|2} = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$ and $\Sigma_{1|2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$.
- Applied to the FA joint:
$$p(z \mid x) = \mathcal{N}(z \mid \mu_{z|x}, \Sigma_{z|x})$$
$$\mu_{z|x} = W^T (W W^T + \Psi)^{-1} (x - \mu)$$
$$\Sigma_{z|x} = I - W^T (W W^T + \Psi)^{-1} W$$
- A $D \times D$ matrix is required to be inverted. If $M < D$, it is preferable to use:
$$\mu_{z|x} = (I + W^T \Psi^{-1} W)^{-1} W^T \Psi^{-1} (x - \mu) = \Sigma_{z|x} W^T \Psi^{-1} (x - \mu)$$
$$\Sigma_{z|x} = (I + W^T \Psi^{-1} W)^{-1}$$
- The posterior covariance does not depend on the observed data $x$! Computing the posterior mean is a linear operation.

Matrix inversion lemma: $(A - B D^{-1} C)^{-1} = A^{-1} + A^{-1} B (D - C A^{-1} B)^{-1} C A^{-1}$
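A sketch (toy dimensions assumed) checking that the two expressions for the FA posterior $p(z \mid x)$ agree: the direct $D \times D$ inverse and the cheaper $M \times M$ inverse obtained from the matrix inversion lemma.

```python
import numpy as np

rng = np.random.default_rng(4)
D, M = 6, 2
W = rng.standard_normal((D, M))
mu = rng.standard_normal(D)
Psi = np.diag(rng.uniform(0.2, 1.0, size=D))
x = rng.standard_normal(D)

# form 1: invert the D x D marginal covariance W W^T + Psi
C = W @ W.T + Psi
m1 = W.T @ np.linalg.solve(C, x - mu)
V1 = np.eye(M) - W.T @ np.linalg.solve(C, W)

# form 2: invert only an M x M matrix (matrix inversion lemma)
V2 = np.linalg.inv(np.eye(M) + W.T @ np.linalg.solve(Psi, W))
m2 = V2 @ W.T @ np.linalg.solve(Psi, x - mu)

assert np.allclose(V1, V2) and np.allclose(m1, m2)
```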
Geometric illustration

[Figure from Jordan: the low-dimensional manifold spanned by $W$ embedded in the data space ($y_1, y_2, y_3$ axes), with $p(z)$ along the manifold and $p(x \mid z)$ as noise around it]

- To generate data, first generate a point within the manifold, then add noise.
Factor analysis: dimensionality reduction

- FA is just a constrained Gaussian model.
  - If $\Psi$ were not diagonal, then we could model any Gaussian.
- FA is a low-rank parameterization of a multivariate Gaussian.
  - Since $p(x) = \mathcal{N}(x \mid \mu, W W^T + \Psi)$, FA approximates the covariance matrix of the visible vector using the low-rank decomposition $W W^T$ and the diagonal matrix $\Psi$.
  - $W W^T + \Psi$ is the product of two low-rank matrices plus a diagonal matrix (i.e., $O(MD)$ parameters instead of $O(D^2)$).
- Given $\{x^{(1)}, \dots, x^{(N)}\}$ (the observations of high-dimensional data), by learning from incomplete data we find $W$ for transforming the data to a lower-dimensional space.
Incomplete likelihood

$$\ell(\theta; \mathcal{D}) = -\frac{N}{2} \log\left| W W^T + \Psi \right| - \frac{1}{2} \sum_{n=1}^{N} (x^{(n)} - \mu)^T (W W^T + \Psi)^{-1} (x^{(n)} - \mu)$$
$$= -\frac{N}{2} \log\left| W W^T + \Psi \right| - \frac{1}{2} \mathrm{tr}\left[ (W W^T + \Psi)^{-1} S \right]$$
$$S = \sum_{n=1}^{N} (x^{(n)} - \mu)(x^{(n)} - \mu)^T, \qquad \mu = \frac{1}{N} \sum_{n=1}^{N} x^{(n)}$$
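A sketch (assumed toy data) evaluating the incomplete log-likelihood in both forms of the slide, the per-sample sum and the trace form with the scatter matrix $S$; both omit the constant $-\tfrac{ND}{2}\log 2\pi$.

```python
import numpy as np

rng = np.random.default_rng(5)
D, M, N = 4, 2, 50
W = rng.standard_normal((D, M))
Psi = np.diag(rng.uniform(0.2, 1.0, size=D))
X = rng.standard_normal((N, D))                  # arbitrary data for the check

mu = X.mean(axis=0)
C = W @ W.T + Psi                                # model covariance
Cinv = np.linalg.inv(C)
Xc = X - mu

ll_sum = (-0.5 * N * np.linalg.slogdet(C)[1]
          - 0.5 * np.einsum('nd,de,ne->', Xc, Cinv, Xc))
S = Xc.T @ Xc                                    # scatter matrix
ll_trace = -0.5 * N * np.linalg.slogdet(C)[1] - 0.5 * np.trace(Cinv @ S)
assert np.isclose(ll_sum, ll_trace)
```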
E-step: expected sufficient statistics

$$E_{p(h \mid x, \theta^t)}\left[ \log p(x, h \mid \theta) \right] = \sum_{n=1}^{N} E_{p(z^{(n)} \mid x^{(n)}, \theta^t)}\left[ \log p(z^{(n)} \mid \theta) + \log p(x^{(n)} \mid z^{(n)}, \theta) \right]$$

- Expected complete-data log-likelihood (centered data, constants dropped):
$$E\left[ \log p(x, h \mid \theta) \right] = -\frac{N}{2} \log|\Psi| - \frac{1}{2} \sum_{n=1}^{N} \mathrm{tr}\left( E[z^{(n)} z^{(n)T}] \right) - \frac{1}{2} \sum_{n=1}^{N} E\left[ (x^{(n)} - W z^{(n)})^T \Psi^{-1} (x^{(n)} - W z^{(n)}) \right]$$
$$E\left[ (x^{(n)} - W z^{(n)})(x^{(n)} - W z^{(n)})^T \right] = x^{(n)} x^{(n)T} - W E[z^{(n)}]\, x^{(n)T} - x^{(n)} E[z^{(n)}]^T W^T + W\, E[z^{(n)} z^{(n)T}]\, W^T$$
- Expected sufficient statistics:
$$E_{p(z^{(n)} \mid x^{(n)}, \theta^t)}[z^{(n)}] = \mu_{z|x^{(n)}} = \Sigma_{z|x} W^T \Psi^{-1} (x^{(n)} - \mu)$$
$$E_{p(z^{(n)} \mid x^{(n)}, \theta^t)}[z^{(n)} z^{(n)T}] = \Sigma_{z|x} + \mu_{z|x^{(n)}} \mu_{z|x^{(n)}}^T$$
$$\text{where } \Sigma_{z|x} = (I + W^T \Psi^{-1} W)^{-1}$$
M-step

$$W^{t+1} = \left( \sum_{n=1}^{N} x^{(n)} E[z^{(n)}]^T \right) \left( \sum_{n=1}^{N} E[z^{(n)} z^{(n)T}] \right)^{-1}$$
$$\Psi^{t+1} = \frac{1}{N} \,\mathrm{diag}\left( \sum_{n=1}^{N} E\left[ (x^{(n)} - W^{t+1} z^{(n)})(x^{(n)} - W^{t+1} z^{(n)})^T \right] \right) = \frac{1}{N} \,\mathrm{diag}\left( \sum_{n=1}^{N} x^{(n)} x^{(n)T} - W^{t+1} E[z^{(n)}]\, x^{(n)T} \right)$$
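A compact EM sketch that puts the E-step statistics and this M-step together, under the assumptions of these slides ($\mu$ fixed to the sample mean, toy synthetic data). The incomplete log-likelihood should increase monotonically, which is a useful sanity check on any implementation.

```python
import numpy as np

def fa_em(X, M, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    W = rng.standard_normal((D, M))
    psi = np.var(Xc, axis=0)                       # diagonal of Psi
    lls = []
    for _ in range(iters):
        # E-step: posterior statistics E[z^(n)] and sum_n E[z^(n) z^(n)^T]
        Sz = np.linalg.inv(np.eye(M) + W.T @ (W / psi[:, None]))   # Sigma_{z|x}
        Ez = Xc @ (W / psi[:, None]) @ Sz                          # N x M
        Ezz_sum = N * Sz + Ez.T @ Ez
        # M-step: closed-form updates of W and the diagonal Psi
        W = (Xc.T @ Ez) @ np.linalg.inv(Ezz_sum)
        psi = np.diag(Xc.T @ Xc - W @ Ez.T @ Xc) / N
        # track the incomplete log-likelihood (constants dropped)
        C = W @ W.T + np.diag(psi)
        lls.append(-0.5 * N * np.linalg.slogdet(C)[1]
                   - 0.5 * np.trace(np.linalg.solve(C, Xc.T @ Xc)))
    return W, psi, lls

X = np.random.default_rng(6).standard_normal((500, 5)) @ np.diag([3, 2, 1, 1, 1])
W, psi, lls = fa_em(X, M=2)
assert all(b >= a - 1e-6 for a, b in zip(lls, lls[1:]))            # monotone EM
```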
Unidentifiability

- $W$ only appears as the product $W W^T$, thus the model is invariant to rotations and axis flips of the latent space.
- $W$ can be replaced with $W R$ for any orthonormal matrix $R$, and the model, which depends only on $W W^T$, remains the same.
- Thus, FA is an unidentifiable model.
  - The likelihood objective on a data set will not have a unique maximum (an infinite number of parameter settings attain the maximum score).
  - It cannot be guaranteed to identify the same parameters.
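A quick sketch (toy sizes assumed) of the rotation non-identifiability: $W$ and $W R$ give exactly the same marginal covariance, hence the same likelihood.

```python
import numpy as np

rng = np.random.default_rng(7)
D, M = 5, 2
W = rng.standard_normal((D, M))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthonormal: R R^T = I
assert np.allclose((W @ R) @ (W @ R).T, W @ W.T)  # same W W^T, same model
```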
Probabilistic PCA & FA

- Data is a linear function of low-dimensional latent coordinates, plus Gaussian noise.
$$p(z) = \mathcal{N}(z \mid 0, I), \qquad p(x \mid z) = \mathcal{N}(x \mid W z + \mu, \Psi), \qquad p(x) = \mathcal{N}(x \mid \mu, W W^T + \Psi)$$
- Factor analysis: $\Psi$ is a general diagonal matrix.
- Probabilistic PCA: $\Psi = \sigma^2 I$.

[Graphical model from Bishop: $z \rightarrow x$]
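For probabilistic PCA the maximum-likelihood parameters have a known closed form (Tipping & Bishop): $W_{ML}$ is built from the top-$M$ eigenpairs of the sample covariance $S$, and $\sigma^2_{ML}$ is the average of the discarded eigenvalues. A sketch (data and sizes are assumptions) verifying the characteristic property that the implied covariance keeps the top-$M$ eigenvalues of $S$ and flattens the rest to $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(8)
D, M, N = 6, 2, 5000
X = rng.standard_normal((N, D)) @ rng.standard_normal((D, D))
S = np.cov(X, rowvar=False)

evals, evecs = np.linalg.eigh(S)
evals, evecs = evals[::-1], evecs[:, ::-1]         # sort descending
sigma2 = evals[M:].mean()                          # average discarded variance
W_ml = evecs[:, :M] @ np.diag(np.sqrt(evals[:M] - sigma2))

C = W_ml @ W_ml.T + sigma2 * np.eye(D)             # implied ML covariance
c_evals = np.sort(np.linalg.eigvalsh(C))[::-1]
# C keeps the top-M eigenvalues of S and replaces the rest by sigma^2
assert np.allclose(c_evals[:M], evals[:M]) and np.allclose(c_evals[M:], sigma2)
```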
PPCA

- The posterior mean is not an orthogonal projection, since it is shrunk somewhat towards the prior mean. [Murphy]
Exact inference for Gaussian networks
Multivariate Gaussian distribution

$$p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left\{ -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right\}$$
$$p(x) \propto \exp\left\{ -\tfrac{1}{2} x^T \Lambda x + (\Lambda \mu)^T x \right\}, \qquad \Lambda = \Sigma^{-1}$$

- $p$ is normalizable (i.e., the normalization constant is finite) and defines a legal Gaussian distribution if and only if $\Lambda$ is positive definite.
- Directed model
  - Linear-Gaussian model
- Undirected model
  - Gaussian MRF
Linear-Gaussian model

- Linear-Gaussian model for CPDs:
$$p(x_i \mid x_{pa_i}) = \mathcal{N}\left( x_i \,\middle|\, \sum_{j \in pa_i} w_{ij} x_j + b_i, \; v_i \right)$$
- The joint distribution is Gaussian:
$$\ln p(x_1, \dots, x_D) = -\sum_{i=1}^{D} \frac{1}{2 v_i} \left( x_i - \sum_{j \in pa_i} w_{ij} x_j - b_i \right)^2 + \text{const}$$
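A sketch (an assumed toy network $x_1 \rightarrow x_2 \rightarrow x_3$ with made-up weights) showing that linear-Gaussian CPDs compose into one joint Gaussian: writing $x = Bx + b + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \mathrm{diag}(v))$ gives mean $(I-B)^{-1}b$ and covariance $(I-B)^{-1}\,\mathrm{diag}(v)\,(I-B)^{-T}$, checked here against ancestral sampling from the CPDs.

```python
import numpy as np

rng = np.random.default_rng(9)
# CPDs: x1 ~ N(1, 0.5);  x2 ~ N(2*x1 - 1, 1.0);  x3 ~ N(-x2 + 3, 0.2)
B = np.array([[0., 0., 0.],
              [2., 0., 0.],
              [0., -1., 0.]])                       # weights w_ij (parents only)
b = np.array([1., -1., 3.])
v = np.array([0.5, 1.0, 0.2])

I = np.eye(3)
mu = np.linalg.solve(I - B, b)
Sigma = np.linalg.solve(I - B, np.diag(v)) @ np.linalg.inv(I - B).T

# ancestral sampling from the CPDs
N = 200_000
X = np.zeros((N, 3))
X[:, 0] = 1 + np.sqrt(0.5) * rng.standard_normal(N)
X[:, 1] = 2 * X[:, 0] - 1 + np.sqrt(1.0) * rng.standard_normal(N)
X[:, 2] = -X[:, 1] + 3 + np.sqrt(0.2) * rng.standard_normal(N)

assert np.allclose(X.mean(axis=0), mu, atol=0.02)
assert np.allclose(np.cov(X, rowvar=False), Sigma, atol=0.05)
```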