Factor analysis & Exact inference for Gaussian networks


1. Factor analysis & Exact inference for Gaussian networks. Probabilistic Graphical Models, Sharif University of Technology, Spring 2017. Soleymani.

2. Multivariate Gaussian distribution

$$\mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\Big\{ -\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu}) \Big\}$$

- The natural, canonical, or information parameterization of a Gaussian distribution arises from the quadratic form:

$$\mathcal{N}(\boldsymbol{x} \mid \boldsymbol{h}, \boldsymbol{J}) \propto \exp\Big\{ -\frac{1}{2}\boldsymbol{x}^{T}\boldsymbol{J}\boldsymbol{x} + \boldsymbol{h}^{T}\boldsymbol{x} \Big\}, \qquad \boldsymbol{\Lambda} = \boldsymbol{J} = \boldsymbol{\Sigma}^{-1}, \quad \boldsymbol{h} = \boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$$
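The two parameterizations carry the same information and are straightforward to convert between. A minimal NumPy sketch (the helper names `to_canonical` and `to_moment` are illustrative, not from any library):

```python
import numpy as np

def to_canonical(mu, Sigma):
    """Moment form (mu, Sigma) -> canonical/information form (h, J)."""
    J = np.linalg.inv(Sigma)   # precision matrix J = Sigma^{-1}
    h = J @ mu                 # potential vector h = Sigma^{-1} mu
    return h, J

def to_moment(h, J):
    """Canonical/information form (h, J) -> moment form (mu, Sigma)."""
    Sigma = np.linalg.inv(J)
    return Sigma @ h, Sigma

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
h, J = to_canonical(mu, Sigma)
mu2, Sigma2 = to_moment(h, J)
assert np.allclose(mu, mu2) and np.allclose(Sigma, Sigma2)
```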

3. Joint Gaussian distribution: block elements

- If we partition the vector $\boldsymbol{x}$ into $\boldsymbol{x}_1$ and $\boldsymbol{x}_2$:

$$\boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}, \qquad \boldsymbol{\Sigma} = \begin{bmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{bmatrix}, \qquad \boldsymbol{\Sigma}_{21} = \boldsymbol{\Sigma}_{12}^{T}$$

$\boldsymbol{\Sigma}_{11}$ and $\boldsymbol{\Sigma}_{22}$ are symmetric.

$$p\!\left( \begin{bmatrix} \boldsymbol{x}_1 \\ \boldsymbol{x}_2 \end{bmatrix} \right) = \mathcal{N}\!\left( \begin{bmatrix} \boldsymbol{x}_1 \\ \boldsymbol{x}_2 \end{bmatrix} \,\middle|\, \begin{bmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{bmatrix}, \begin{bmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{bmatrix} \right)$$

4. Marginal and conditional of Gaussian

[Figure from Bishop: contours of the joint $p(x_1, x_2)$ together with the marginal $p(x_1)$ and the conditional $p(x_1 \mid x_2 = 0.7)$.]

For a multivariate Gaussian distribution, all marginal and conditional distributions are also Gaussian.

5. Matrix inversion lemma

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} M & -MBD^{-1} \\ -D^{-1}CM & \;D^{-1} + D^{-1}CMBD^{-1} \end{bmatrix}, \qquad M = \big(A - BD^{-1}C\big)^{-1}$$
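The identity is easy to verify numerically. A small NumPy check that builds a random symmetric positive-definite matrix, inverts it blockwise via the Schur complement, and compares against a direct inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 3, 4
# Random symmetric positive-definite matrix, partitioned into blocks.
X = rng.standard_normal((n1 + n2, n1 + n2))
P = X @ X.T + (n1 + n2) * np.eye(n1 + n2)
A, B = P[:n1, :n1], P[:n1, n1:]
C, D = P[n1:, :n1], P[n1:, n1:]

Dinv = np.linalg.inv(D)
M = np.linalg.inv(A - B @ Dinv @ C)   # inverse of the Schur complement of D
blockwise = np.vstack([
    np.hstack([M,             -M @ B @ Dinv]),
    np.hstack([-Dinv @ C @ M, Dinv + Dinv @ C @ M @ B @ Dinv]),
])
assert np.allclose(blockwise, np.linalg.inv(P))
```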

6. Precision matrix

- In many situations it is convenient to work with $\boldsymbol{\Lambda} = \boldsymbol{\Sigma}^{-1}$, known as the precision matrix:

$$\boldsymbol{\Lambda} = \boldsymbol{\Sigma}^{-1} = \begin{bmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{bmatrix}^{-1} = \begin{bmatrix} \boldsymbol{\Lambda}_{11} & \boldsymbol{\Lambda}_{12} \\ \boldsymbol{\Lambda}_{21} & \boldsymbol{\Lambda}_{22} \end{bmatrix}, \qquad \boldsymbol{\Lambda}_{21} = \boldsymbol{\Lambda}_{12}^{T}$$

$\boldsymbol{\Lambda}_{11}$ and $\boldsymbol{\Lambda}_{22}$ are symmetric.

- Relation between the inverse of a partitioned matrix and the inverses of its partitions (using the matrix inversion lemma):

$$\boldsymbol{\Lambda}_{11} = \big( \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21} \big)^{-1}, \qquad \boldsymbol{\Lambda}_{12} = -\big( \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21} \big)^{-1} \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}$$

7. Marginal and conditional distributions based on block elements of $\boldsymbol{\Lambda}$

- Conditional:

$$p(\boldsymbol{x}_1 \mid \boldsymbol{x}_2) = \mathcal{N}(\boldsymbol{x}_1 \mid \boldsymbol{\mu}_{1|2}, \boldsymbol{\Sigma}_{1|2})$$
$$\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 - \boldsymbol{\Lambda}_{11}^{-1}\boldsymbol{\Lambda}_{12}(\boldsymbol{x}_2 - \boldsymbol{\mu}_2) \quad \text{(a linear-Gaussian model)}, \qquad \boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Lambda}_{11}^{-1}$$

- Marginal:

$$p(\boldsymbol{x}_1) = \mathcal{N}(\boldsymbol{x}_1 \mid \boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1), \qquad \boldsymbol{\Sigma}_1 = \big( \boldsymbol{\Lambda}_{11} - \boldsymbol{\Lambda}_{12}\boldsymbol{\Lambda}_{22}^{-1}\boldsymbol{\Lambda}_{21} \big)^{-1}$$

8. Marginal and conditional distributions based on block elements of $\boldsymbol{\Sigma}$

- Conditional distributions:

$$p(\boldsymbol{x}_1 \mid \boldsymbol{x}_2) = \mathcal{N}(\boldsymbol{x}_1 \mid \boldsymbol{\mu}_{1|2}, \boldsymbol{\Sigma}_{1|2})$$
$$\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\boldsymbol{x}_2 - \boldsymbol{\mu}_2), \qquad \boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$$

- Marginal distributions based on block elements of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$:

$$p(\boldsymbol{x}_1) = \mathcal{N}(\boldsymbol{x}_1 \mid \boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11}), \qquad p(\boldsymbol{x}_2) = \mathcal{N}(\boldsymbol{x}_2 \mid \boldsymbol{\mu}_2, \boldsymbol{\Sigma}_{22})$$
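Slides 7 and 8 give two equivalent routes to the same conditional. A NumPy sketch that conditions a random Gaussian both ways, via the $\boldsymbol{\Sigma}$ blocks and via the $\boldsymbol{\Lambda}$ blocks, and checks that they agree:

```python
import numpy as np

rng = np.random.default_rng(1)
d1, d2 = 2, 3
X = rng.standard_normal((d1 + d2, d1 + d2))
Sigma = X @ X.T + (d1 + d2) * np.eye(d1 + d2)   # random SPD covariance
mu = rng.standard_normal(d1 + d2)
mu1, mu2 = mu[:d1], mu[d1:]
S11, S12 = Sigma[:d1, :d1], Sigma[:d1, d1:]
S21, S22 = Sigma[d1:, :d1], Sigma[d1:, d1:]
x2 = rng.standard_normal(d2)                    # an arbitrary observed value

# Conditional p(x1 | x2) from the covariance blocks (slide 8):
mu_cond = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
S_cond = S11 - S12 @ np.linalg.solve(S22, S21)

# The same conditional from the precision blocks (slide 7):
Lam = np.linalg.inv(Sigma)
L11, L12 = Lam[:d1, :d1], Lam[:d1, d1:]
mu_cond2 = mu1 - np.linalg.solve(L11, L12 @ (x2 - mu2))
S_cond2 = np.linalg.inv(L11)

assert np.allclose(mu_cond, mu_cond2) and np.allclose(S_cond, S_cond2)
```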

9. Factor analysis

- Gaussian latent variable $\boldsymbol{z}$ ($L$-dimensional)
- Continuous latent variable
- Can be used for dimensionality reduction
- Observed variable $\boldsymbol{x}$ ($D$-dimensional, $L < D$)

$$p(\boldsymbol{z}) = \mathcal{N}(\boldsymbol{z} \mid \boldsymbol{0}, \boldsymbol{I}), \qquad p(\boldsymbol{x} \mid \boldsymbol{z}) = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu} + \boldsymbol{A}\boldsymbol{z}, \boldsymbol{\Psi})$$

$\boldsymbol{z} \in \mathbb{R}^{L}$, $\boldsymbol{x} \in \mathbb{R}^{D}$; $\boldsymbol{A}$ is the $D \times L$ factor loading matrix, and $\boldsymbol{\Psi}$ is a diagonal covariance matrix. [Graphical model: $\boldsymbol{z} \rightarrow \boldsymbol{x}$, with parameters $\boldsymbol{\mu}, \boldsymbol{A}, \boldsymbol{\Psi}$.]

10. Marginal distribution

$$p(\boldsymbol{z}) = \mathcal{N}(\boldsymbol{z} \mid \boldsymbol{0}, \boldsymbol{I}), \qquad p(\boldsymbol{x} \mid \boldsymbol{z}) = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu} + \boldsymbol{A}\boldsymbol{z}, \boldsymbol{\Psi})$$

Equivalently, $\boldsymbol{x} = \boldsymbol{\mu} + \boldsymbol{A}\boldsymbol{z} + \boldsymbol{w}$, where the noise $\boldsymbol{w} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{\Psi})$ is independent of $\boldsymbol{z}$.

The product of Gaussian distributions is Gaussian, as is the marginal of a Gaussian; thus $p(\boldsymbol{x}) = \int p(\boldsymbol{z})\, p(\boldsymbol{x} \mid \boldsymbol{z})\, d\boldsymbol{z}$ is Gaussian:

$$\boldsymbol{\mu}_x = E[\boldsymbol{x}] = E[\boldsymbol{\mu} + \boldsymbol{A}\boldsymbol{z} + \boldsymbol{w}] = \boldsymbol{\mu} + \boldsymbol{A}E[\boldsymbol{z}] = \boldsymbol{\mu}$$
$$\boldsymbol{\Sigma}_{xx} = E\big[(\boldsymbol{x} - \boldsymbol{\mu})(\boldsymbol{x} - \boldsymbol{\mu})^{T}\big] = E\big[(\boldsymbol{A}\boldsymbol{z} + \boldsymbol{w})(\boldsymbol{A}\boldsymbol{z} + \boldsymbol{w})^{T}\big] = \boldsymbol{A}E[\boldsymbol{z}\boldsymbol{z}^{T}]\boldsymbol{A}^{T} + \boldsymbol{\Psi} = \boldsymbol{A}\boldsymbol{A}^{T} + \boldsymbol{\Psi}$$
$$\Rightarrow\; p(\boldsymbol{x}) = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{A}\boldsymbol{A}^{T} + \boldsymbol{\Psi})$$

11. Joint Gaussian distribution

$$\boldsymbol{\Sigma}_{zx} = \mathrm{Cov}(\boldsymbol{z}, \boldsymbol{x}) = E\big[\boldsymbol{z}(\boldsymbol{A}\boldsymbol{z} + \boldsymbol{w})^{T}\big] = \boldsymbol{A}^{T}, \qquad \boldsymbol{\Sigma}_{xx} = \boldsymbol{A}\boldsymbol{A}^{T} + \boldsymbol{\Psi}, \qquad \boldsymbol{\Sigma}_{zz} = \boldsymbol{I}$$

$$E\begin{bmatrix} \boldsymbol{z} \\ \boldsymbol{x} \end{bmatrix} = \begin{bmatrix} \boldsymbol{0} \\ \boldsymbol{\mu} \end{bmatrix} \;\Rightarrow\; p\!\left( \begin{bmatrix} \boldsymbol{z} \\ \boldsymbol{x} \end{bmatrix} \right) = \mathcal{N}\!\left( \begin{bmatrix} \boldsymbol{z} \\ \boldsymbol{x} \end{bmatrix} \,\middle|\, \begin{bmatrix} \boldsymbol{0} \\ \boldsymbol{\mu} \end{bmatrix}, \begin{bmatrix} \boldsymbol{I} & \boldsymbol{A}^{T} \\ \boldsymbol{A} & \boldsymbol{A}\boldsymbol{A}^{T} + \boldsymbol{\Psi} \end{bmatrix} \right)$$
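These moments can be sanity-checked by sampling from the generative process. A Monte Carlo sketch (the tolerances are loose because the moments are only estimated from samples):

```python
import numpy as np

rng = np.random.default_rng(2)
D, L, N = 5, 2, 200_000
A = rng.standard_normal((D, L))            # factor loading matrix (D x L)
mu = rng.standard_normal(D)
psi = rng.uniform(0.1, 1.0, size=D)        # diagonal of the noise covariance Psi

z = rng.standard_normal((N, L))            # z ~ N(0, I)
w = rng.standard_normal((N, D)) * np.sqrt(psi)   # w ~ N(0, Psi)
x = mu + z @ A.T + w                       # x = mu + A z + w

# Empirical moments should approach E[x] = mu and Cov(x) = A A^T + Psi.
print(np.abs(x.mean(axis=0) - mu).max())                      # small, e.g. ~1e-2
print(np.abs(np.cov(x.T) - (A @ A.T + np.diag(psi))).max())   # small, e.g. ~1e-2
```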

  12. 𝑄 π’š 1 |π’š 2 = π’ͺ π’š 1 |𝝂 1|2 , 𝜯 1|2 βˆ’1 π’š 2 βˆ’ 𝝂 2 𝝂 1|2 = 𝝂 1 + 𝜯 12 𝜯 22 Conditional distributions βˆ’1 𝜯 21 𝜯 1|2 = 𝜯 11 βˆ’ 𝜯 12 𝜯 22 𝑄 π’œ|π’š = π’ͺ π’œ|𝝂 π’œ|π’š , 𝚻 π’œ|π’š 𝑱 𝑩 𝜯 = 𝝂 π’œ|π’š = 𝑩 π‘ˆ 𝑩𝑩 π‘ˆ + 𝛀 βˆ’1 π’š βˆ’ 𝝂 𝑩𝑩 π‘ˆ + 𝜴 𝑩 π‘ˆ 𝚻 π’œ|π’š = 𝑱 βˆ’ 𝑩 π‘ˆ 𝑩𝑩 π‘ˆ + 𝛀 βˆ’1 𝑩  𝐸 Γ— 𝐸 matrix is required to be inverted. If 𝑀 < 𝐸 , it is preferred to use: 𝝂 π’œ|π’š = 𝑱 + 𝑩 π‘ˆ 𝛀 βˆ’1 𝑩 βˆ’1 𝑩 π‘ˆ 𝛀 βˆ’1 π’š βˆ’ 𝝂 = 𝚻 π’œ|π’š 𝑩 π‘ˆ 𝛀 βˆ’1 π’š βˆ’ 𝝂 𝚻 π’œ|π’š = 𝑱 + 𝑩 π‘ˆ 𝛀 βˆ’1 𝑩 βˆ’1 Posterior covariance does not depend on observed data π’š ! Computing the posterior mean is a linear operation 12 𝐡 βˆ’ 𝐢𝐸 βˆ’1 𝐷 βˆ’1 = 𝐡 βˆ’1 + 𝐡 βˆ’1 𝐢 𝐸 βˆ’ 𝐷𝐡 βˆ’1 𝐢 βˆ’1 𝐷𝐡 βˆ’1

13. Geometric illustration

[Figure from Jordan: the latent prior $p(\boldsymbol{z})$ spans a low-dimensional manifold in the data space $(x_1, x_2, x_3)$, with the posterior $p(\boldsymbol{z} \mid \boldsymbol{x})$ shown for an observation $\boldsymbol{x}$.]

To generate data, first generate a point within the manifold, then add noise.

14. FA example

- Data is a linear function of the low-dimensional latent coordinates, plus Gaussian noise:

$$p(\boldsymbol{z}) = \mathcal{N}(\boldsymbol{z} \mid \boldsymbol{0}, \boldsymbol{I}), \qquad p(\boldsymbol{x} \mid \boldsymbol{z}) = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{A}\boldsymbol{z} + \boldsymbol{\mu}, \boldsymbol{\Psi}), \qquad p(\boldsymbol{x}) = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{A}\boldsymbol{A}^{T} + \boldsymbol{\Psi})$$

[Figure from Bishop illustrating the generative process.]

15. Factor analysis: dimensionality reduction

- FA is just a constrained Gaussian model: if $\boldsymbol{\Psi}$ were not diagonal, we could model any Gaussian.
- FA is a low-rank parameterization of a multivariate Gaussian.
- Since $p(\boldsymbol{x}) = \mathcal{N}(\boldsymbol{x} \mid \boldsymbol{\mu}, \boldsymbol{A}\boldsymbol{A}^{T} + \boldsymbol{\Psi})$, FA approximates the covariance matrix of the visible vector using the low-rank decomposition $\boldsymbol{A}\boldsymbol{A}^{T}$ and the diagonal matrix $\boldsymbol{\Psi}$.
- $\boldsymbol{A}\boldsymbol{A}^{T} + \boldsymbol{\Psi}$ is the outer product of two low-rank matrices plus a diagonal matrix, i.e., $O(LD)$ parameters instead of $O(D^2)$.
- Given observations $\{\boldsymbol{x}^{(1)}, \ldots, \boldsymbol{x}^{(N)}\}$ of high-dimensional data, we learn $\boldsymbol{A}$ from incomplete data and use it to transform the data to a lower-dimensional space.

16. Incomplete likelihood

$$\ell(\boldsymbol{\theta}; \mathcal{D}) = -\frac{N}{2}\log\big|\boldsymbol{A}\boldsymbol{A}^{T} + \boldsymbol{\Psi}\big| - \frac{1}{2}\sum_{n=1}^{N} (\boldsymbol{x}^{(n)} - \boldsymbol{\mu})^{T}(\boldsymbol{A}\boldsymbol{A}^{T} + \boldsymbol{\Psi})^{-1}(\boldsymbol{x}^{(n)} - \boldsymbol{\mu})$$
$$= -\frac{N}{2}\log\big|\boldsymbol{A}\boldsymbol{A}^{T} + \boldsymbol{\Psi}\big| - \frac{1}{2}\,\mathrm{tr}\big[(\boldsymbol{A}\boldsymbol{A}^{T} + \boldsymbol{\Psi})^{-1}\boldsymbol{S}\big], \qquad \boldsymbol{S} = \sum_{n=1}^{N} (\boldsymbol{x}^{(n)} - \boldsymbol{\mu})(\boldsymbol{x}^{(n)} - \boldsymbol{\mu})^{T}$$

$$\boldsymbol{\mu}_{ML} = \frac{1}{N}\sum_{n=1}^{N} \boldsymbol{x}^{(n)}$$
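The trace form is convenient to evaluate directly. A sketch (the name `fa_loglik` is my own; additive constants in $2\pi$ are dropped):

```python
import numpy as np

def fa_loglik(X, mu, A, psi):
    """FA marginal log-likelihood, up to the constant -N*D/2 * log(2*pi)."""
    N, D = X.shape
    C = A @ A.T + np.diag(psi)          # model covariance A A^T + Psi
    Xc = X - mu
    S = Xc.T @ Xc                       # scatter matrix S
    _, logdet = np.linalg.slogdet(C)
    return -N / 2 * logdet - 0.5 * np.trace(np.linalg.solve(C, S))

rng = np.random.default_rng(4)
X = rng.standard_normal((500, 5))
mu_ml = X.mean(axis=0)                  # the ML estimate of mu is the sample mean
print(fa_loglik(X, mu_ml, np.zeros((5, 2)), np.ones(5)))
```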

17. E-step: expected sufficient statistics

$$E_{p(\mathcal{H} \mid \mathcal{D}, \boldsymbol{\theta})}\big[\log p(\mathcal{D}, \mathcal{H} \mid \boldsymbol{\theta})\big] = \sum_{n=1}^{N} E_{p(\boldsymbol{z}^{(n)} \mid \boldsymbol{x}^{(n)}, \boldsymbol{\theta})}\big[\log p(\boldsymbol{z}^{(n)} \mid \boldsymbol{\theta}) + \log p(\boldsymbol{x}^{(n)} \mid \boldsymbol{z}^{(n)}, \boldsymbol{\theta})\big]$$

$$E\big[\log p(\mathcal{D}, \mathcal{H} \mid \boldsymbol{\theta})\big] = -\frac{N}{2}\log|\boldsymbol{\Psi}| - \frac{1}{2}\sum_{n=1}^{N} \mathrm{tr}\big(E[\boldsymbol{z}^{(n)}\boldsymbol{z}^{(n)T}]\big) - \frac{1}{2}\sum_{n=1}^{N} \mathrm{tr}\Big(\boldsymbol{\Psi}^{-1} E\big[(\boldsymbol{x}^{(n)} - \boldsymbol{A}\boldsymbol{z}^{(n)})(\boldsymbol{x}^{(n)} - \boldsymbol{A}\boldsymbol{z}^{(n)})^{T}\big]\Big) + c$$

where

$$E\big[(\boldsymbol{x}^{(n)} - \boldsymbol{A}\boldsymbol{z}^{(n)})(\boldsymbol{x}^{(n)} - \boldsymbol{A}\boldsymbol{z}^{(n)})^{T}\big] = \boldsymbol{x}^{(n)}\boldsymbol{x}^{(n)T} - \boldsymbol{A}E[\boldsymbol{z}^{(n)}]\boldsymbol{x}^{(n)T} - \boldsymbol{x}^{(n)}E[\boldsymbol{z}^{(n)}]^{T}\boldsymbol{A}^{T} + \boldsymbol{A}E[\boldsymbol{z}^{(n)}\boldsymbol{z}^{(n)T}]\boldsymbol{A}^{T}$$

- Expected sufficient statistics:

$$E_{p(\boldsymbol{z}^{(n)} \mid \boldsymbol{x}^{(n)}, \boldsymbol{\theta})}\big[\boldsymbol{z}^{(n)}\big] = \boldsymbol{\mu}_{z|x^{(n)}} = \boldsymbol{\Sigma}_{z|x}\boldsymbol{A}^{T}\boldsymbol{\Psi}^{-1}(\boldsymbol{x}^{(n)} - \boldsymbol{\mu})$$
$$E_{p(\boldsymbol{z}^{(n)} \mid \boldsymbol{x}^{(n)}, \boldsymbol{\theta})}\big[\boldsymbol{z}^{(n)}\boldsymbol{z}^{(n)T}\big] = \boldsymbol{\Sigma}_{z|x} + \boldsymbol{\mu}_{z|x^{(n)}}\boldsymbol{\mu}_{z|x^{(n)}}^{T}, \qquad \boldsymbol{\Sigma}_{z|x} = (\boldsymbol{I} + \boldsymbol{A}^{T}\boldsymbol{\Psi}^{-1}\boldsymbol{A})^{-1}$$

18. M-step

$$\boldsymbol{A}^{t+1} = \left[ \sum_{n=1}^{N} \boldsymbol{x}^{(n)} E[\boldsymbol{z}^{(n)}]^{T} \right] \left[ \sum_{n=1}^{N} E[\boldsymbol{z}^{(n)}\boldsymbol{z}^{(n)T}] \right]^{-1}$$

$$\boldsymbol{\Psi}^{t+1} = \frac{1}{N}\,\mathrm{diag}\!\left[ \sum_{n=1}^{N} E\big[ (\boldsymbol{x}^{(n)} - \boldsymbol{A}^{t+1}\boldsymbol{z}^{(n)})(\boldsymbol{x}^{(n)} - \boldsymbol{A}^{t+1}\boldsymbol{z}^{(n)})^{T} \big] \right] = \frac{1}{N}\,\mathrm{diag}\!\left( \sum_{n=1}^{N} \boldsymbol{x}^{(n)}\boldsymbol{x}^{(n)T} - \boldsymbol{A}^{t+1}\sum_{n=1}^{N} E[\boldsymbol{z}^{(n)}]\,\boldsymbol{x}^{(n)T} \right)$$

(Here the $\boldsymbol{x}^{(n)}$ are taken to be centered by $\boldsymbol{\mu}_{ML}$ from the previous slide.)
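Putting the E-step and M-step together, a compact didactic NumPy implementation of EM for factor analysis (a sketch of the updates above, not an optimized or regularized library routine; `fa_em` is my own name):

```python
import numpy as np

def fa_em(X, L, n_iter=100, seed=0):
    """EM for factor analysis: a didactic sketch of the updates above."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X.mean(axis=0)                  # ML estimate of mu: the sample mean
    Xc = X - mu                          # work with centered data
    A = 0.1 * rng.standard_normal((D, L))
    psi = Xc.var(axis=0)                 # init for the diagonal of Psi
    for _ in range(n_iter):
        # E-step: shared posterior covariance and per-point posterior means.
        PinvA = A / psi[:, None]                       # Psi^{-1} A (Psi is diagonal)
        S_post = np.linalg.inv(np.eye(L) + A.T @ PinvA)
        Ez = Xc @ PinvA @ S_post                       # N x L matrix of E[z^(n)]
        Ezz = N * S_post + Ez.T @ Ez                   # sum_n E[z^(n) z^(n)T]
        # M-step: closed-form updates for A and the diagonal of Psi.
        A = np.linalg.solve(Ezz, Ez.T @ Xc).T          # (sum x E[z]^T)(sum E[zz^T])^{-1}
        psi = ((Xc ** 2).sum(axis=0) - (A * (Xc.T @ Ez)).sum(axis=1)) / N
    return mu, A, psi

# Usage sketch: recover a 2-factor structure from synthetic data.
rng = np.random.default_rng(1)
A_true = rng.standard_normal((10, 2))
Z = rng.standard_normal((2000, 2))
X = 5.0 + Z @ A_true.T + 0.3 * rng.standard_normal((2000, 10))
mu, A, psi = fa_em(X, L=2)
# A is identifiable only up to rotation (next slide): compare A A^T + diag(psi)
# with A_true A_true^T + 0.09 I rather than comparing A with A_true directly.
```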

19. Unidentifiability

- $\boldsymbol{A}$ only appears through the outer product $\boldsymbol{A}\boldsymbol{A}^{T}$, thus the model is invariant to rotations and axis flips of the latent space.
- $\boldsymbol{A}$ can be replaced with $\boldsymbol{A}\boldsymbol{Q}$ for any orthonormal matrix $\boldsymbol{Q}$: since $(\boldsymbol{A}\boldsymbol{Q})(\boldsymbol{A}\boldsymbol{Q})^{T} = \boldsymbol{A}\boldsymbol{Q}\boldsymbol{Q}^{T}\boldsymbol{A}^{T} = \boldsymbol{A}\boldsymbol{A}^{T}$, the model, which depends on $\boldsymbol{A}$ only through $\boldsymbol{A}\boldsymbol{A}^{T}$, remains the same.
- Thus, FA is an unidentifiable model.
- The likelihood objective on a data set does not have a unique maximum: an infinite number of parameter settings attain the maximum score, so learning is not guaranteed to identify the same parameters each time (see the sketch below).
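A short numerical illustration: rotating $\boldsymbol{A}$ by a random orthonormal matrix leaves $\boldsymbol{A}\boldsymbol{A}^{T}$, and hence the likelihood, unchanged:

```python
import numpy as np

rng = np.random.default_rng(5)
D, L = 6, 3
A = rng.standard_normal((D, L))
# A random orthonormal matrix via QR decomposition.
Q, _ = np.linalg.qr(rng.standard_normal((L, L)))
assert np.allclose(Q @ Q.T, np.eye(L))

A_rot = A @ Q
assert np.allclose(A @ A.T, A_rot @ A_rot.T)   # same A A^T, same likelihood
```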

20. Probabilistic PCA (PPCA)

- Factor analysis: $\boldsymbol{\Psi}$ is a general diagonal matrix.
- Probabilistic PCA: $\boldsymbol{\Psi} = \sigma^{2}\boldsymbol{I}$ and $\boldsymbol{A}$ is orthogonal.

Unlike in PCA, the posterior mean is not an orthogonal projection, since it is shrunk somewhat towards the prior mean. [Murphy]
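A small check of the shrinkage claim, assuming for illustration that $\boldsymbol{\mu} = \boldsymbol{0}$ and that $\boldsymbol{A}$ has orthonormal columns: with $\boldsymbol{\Psi} = \sigma^{2}\boldsymbol{I}$ the posterior mean is the orthogonal projection $\boldsymbol{A}^{T}\boldsymbol{x}$ scaled by $1/(1 + \sigma^{2})$, i.e., pulled towards the prior mean $\boldsymbol{0}$:

```python
import numpy as np

rng = np.random.default_rng(6)
D, L, sigma2 = 5, 2, 0.5
A, _ = np.linalg.qr(rng.standard_normal((D, L)))   # orthonormal columns
x = rng.standard_normal(D)                         # observation (mu = 0)

proj = A.T @ x                                     # orthogonal projection coordinates
S_post = np.linalg.inv(np.eye(L) + A.T @ A / sigma2)
post_mean = S_post @ A.T @ x / sigma2

# With orthonormal A, the posterior mean is the projection shrunk by 1/(1+sigma^2):
assert np.allclose(post_mean, proj / (1 + sigma2))
```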

21. Exact inference for Gaussian networks

22. Multivariate Gaussian distribution

$$p(\boldsymbol{x}) = \frac{1}{(2\pi)^{d/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\Big\{ -\frac{1}{2}(\boldsymbol{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\boldsymbol{x}-\boldsymbol{\mu}) \Big\}$$
$$p(\boldsymbol{x}) \propto \exp\Big\{ -\frac{1}{2}\boldsymbol{x}^{T}\boldsymbol{J}\boldsymbol{x} + (\boldsymbol{J}\boldsymbol{\mu})^{T}\boldsymbol{x} \Big\}, \qquad \boldsymbol{J} = \boldsymbol{\Sigma}^{-1}$$

- $p$ is normalizable (i.e., the normalization constant is finite) and defines a legal Gaussian distribution if and only if $\boldsymbol{J}$ is positive definite.
- Directed model: linear Gaussian model
- Undirected model: Gaussian MRF
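In the information form, checking normalizability therefore amounts to checking positive definiteness of $\boldsymbol{J}$. A minimal sketch using a Cholesky factorization, which succeeds exactly for symmetric positive-definite matrices (`is_normalizable` is my own helper name):

```python
import numpy as np

def is_normalizable(J):
    """exp{-1/2 x^T J x + h^T x} has a finite normalizer iff J (symmetric) is PD."""
    try:
        np.linalg.cholesky(J)      # raises LinAlgError for non-PD matrices
        return True
    except np.linalg.LinAlgError:
        return False

J_good = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1 and 3: legal Gaussian
J_bad = np.array([[1.0, 2.0], [2.0, 1.0]])      # eigenvalues 3 and -1: not normalizable
assert is_normalizable(J_good) and not is_normalizable(J_bad)
```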
