  1. CS 3750 Advanced Machine Learning
     Latent Variable Generative Models II
     Ahmad Diab, AHD23@cs.pitt.edu, Feb 4, 2020
     Based on slides of Professor Milos Hauskrecht

     Outline
     • Latent Variable Generative Models
     • Cooperative Vector Quantizer (CVQ) Model
       • Model Formulation
       • Expectation Maximization (EM)
       • Variational Approximation
     • Noisy-OR Component Analyzer (NOCA)
       • Model Formulation
       • Variational EM for NOCA
     • References

  2. Latent Variable Generative Models
     • Generative models: unsupervised learning models that study the underlying structure (e.g., interesting patterns) and causal structure of data in order to generate new data like it.
     • Latent (hidden) variables are random variables that are hard to observe directly (e.g., length is measured, but intelligence is not) and are assumed to affect the response variables.
     • The idea: introduce an unobserved latent variable s and use it to build the complex distribution out of tractable, less complex pieces:
       $p(x, s) = p(x \mid s)\, p(s)$
       where p(x) is the complex distribution and p(x | s), p(s) are the simpler distributions.
     • Assumption: the observable variables are independent given the latent variables.
       [Figure: latent sources s_1, s_2, ..., s_q, each with edges to the observed variables x_1, x_2, ..., x_{d-1}, x_d.]
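
As a concrete illustration of this factorization, here is a minimal sketch in Python/NumPy (all parameter values are made up, not from the slides): a toy model with one binary latent variable s and two observed dimensions that are independent given s. Generation is by ancestral sampling, s ~ p(s) then x ~ p(x | s), and the marginal p(x) = Σ_s p(x | s) p(s) is a simple two-term sum.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: one binary latent s, two observed dims x1, x2,
# which are independent of each other given s (the conditional-independence assumption).
p_s = 0.3                                   # p(s = 1)
means = {0: np.array([0.0, 0.0]),           # mean of p(x | s = 0)
         1: np.array([3.0, -2.0])}          # mean of p(x | s = 1)
sigma = 1.0                                 # shared per-dimension noise std

def sample(n):
    """Ancestral sampling: s ~ p(s), then x ~ p(x | s)."""
    s = rng.random(n) < p_s
    x = np.stack([means[int(si)] for si in s]) + sigma * rng.standard_normal((n, 2))
    return s, x

def log_px(x):
    """Marginal log p(x) = log sum_s p(x | s) p(s): a tractable two-term sum here."""
    log_terms = []
    for s_val, prior in [(0, 1 - p_s), (1, p_s)]:
        diff = x - means[s_val]
        log_lik = -0.5 * np.sum(diff**2) / sigma**2 - np.log(2 * np.pi * sigma**2)
        log_terms.append(np.log(prior) + log_lik)
    return np.logaddexp(*log_terms)

s, x = sample(5)
print(s, x, log_px(x[0]))
```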

  3. Cooperative Vector Quantizer (CVQ)
     • Latent variables s: binary variables, dimensionality k
     • Observed variables x: real-valued variables, dimensionality d
       [Figure: binary sources s_1, s_2, ..., s_k, each with edges to the observed variables x_1, x_2, ..., x_{d-1}, x_d.]

     CVQ – Model Description
     • Model: $x = \sum_{j=1}^{k} s_j w_j + \varepsilon$, where $w_j$ (the j-th column of W) is the weight vector output by source $s_j$ and ε is zero-mean Gaussian noise.
     • Latent variables (s: k binary variables):
       • $s_j$ ~ Bernoulli distribution with parameter $\pi_j$
       • $P(s_j \mid \pi_j) = \pi_j^{s_j} (1 - \pi_j)^{1 - s_j}$
     • Observed variables (x: d real-valued variables):
       • x | s ~ Normal distribution with parameters W, Σ, where
         $W = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1k} \\ w_{21} & & & \vdots \\ \vdots & & \ddots & \\ w_{d1} & \cdots & & w_{dk} \end{pmatrix}$
       • $P(x \mid s) = N(Ws, \Sigma)$, and we assume $\Sigma = \sigma^2 I$
     • Joint probability for one instance of s and x:
       $P(x, s \mid \Theta) = (2\pi\sigma^2)^{-d/2} \exp\!\left\{ -\frac{1}{2\sigma^2}\, (x - Ws)^T (x - Ws) \right\} \prod_{j=1}^{k} \pi_j^{s_j} (1 - \pi_j)^{1 - s_j}$
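
To make the generative process and the joint probability above concrete, the following sketch samples (x, s) from a CVQ model and evaluates log P(x, s | Θ). The dimensions and parameter values (k, d, W, π, σ) are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up CVQ parameters: k binary sources, d-dimensional observations.
k, d = 4, 6
W = rng.standard_normal((d, k))      # columns w_j: weight vector contributed by source s_j
pi = np.full(k, 0.25)                # Bernoulli parameters pi_j
sigma = 0.5                          # noise std; Sigma = sigma^2 * I

def sample_cvq(W, pi, sigma):
    """Generate one (x, s) pair: s_j ~ Bernoulli(pi_j), x = W s + Gaussian noise."""
    d, k = W.shape
    s = (rng.random(k) < pi).astype(float)
    x = W @ s + sigma * rng.standard_normal(d)
    return x, s

def log_joint(x, s, W, pi, sigma):
    """log P(x, s | Theta) = log N(x; W s, sigma^2 I) + sum_j log Bernoulli(s_j; pi_j)."""
    d = x.shape[0]
    resid = x - W @ s
    log_gauss = -0.5 * d * np.log(2 * np.pi * sigma**2) - resid @ resid / (2 * sigma**2)
    log_bern = np.sum(s * np.log(pi) + (1 - s) * np.log(1 - pi))
    return log_gauss + log_bern

x, s = sample_cvq(W, pi, sigma)
print(log_joint(x, s, W, pi, sigma))
```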

  4. CVQ – Model Description
     • Objective: learn the parameters of the model: W, π, σ.
     • If both x and s are observable:
       • Use the complete-data log-likelihood:
         $l(\Theta) = \sum_{n=1}^{N} \log P(x^n, s^n \mid \Theta) = \sum_{n=1}^{N} \Big[ -d \log \sigma - \frac{1}{2\sigma^2} (x^n - W s^n)^T (x^n - W s^n) + \sum_{j=1}^{k} \big( s_j^{(n)} \log \pi_j + (1 - s_j^{(n)}) \log(1 - \pi_j) \big) \Big] + c$
       • The solution is nice and easy.

     CVQ – Model Description
     • Objective: learn the parameters of the model: W, π, σ.
     • If only x is observable:
       • Log-likelihood of the data:
         $l(D, \Theta) = \sum_{n=1}^{N} \log P(x^n \mid \Theta) = \sum_{n=1}^{N} \log \sum_{\{s^n\}} P(x^n, s^n \mid \Theta)$
       • The solution is hard: the sum over the hidden configurations sits inside the log, so we can no longer benefit from the decomposition.
       • Use Expectation Maximization (EM).
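
The difficulty in the incomplete-data case is that sum over all 2^k source configurations inside the log. A brute-force sketch of the marginal (reusing the hypothetical log_joint from the previous sketch) makes the exponential cost explicit; it is feasible only for small k:

```python
from itertools import product
import numpy as np

def log_marginal(x, W, pi, sigma):
    """log P(x | Theta) = log of the sum of P(x, s | Theta) over all 2^k binary vectors s.
    Reuses log_joint from the previous sketch."""
    k = W.shape[1]
    log_terms = [log_joint(x, np.array(bits, dtype=float), W, pi, sigma)
                 for bits in product([0, 1], repeat=k)]          # 2^k terms
    return np.logaddexp.reduce(log_terms)                        # stable log-sum-exp

def log_likelihood(X, W, pi, sigma):
    """Incomplete-data log-likelihood: sum over the data set of log P(x^n | Theta)."""
    return sum(log_marginal(x, W, pi, sigma) for x in X)
```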

  5. Expectation Maximization (EM)
     • Let H be the set of all variables with hidden or missing values, and D the observed data.
     • $P(H, D \mid \Theta, \xi) = P(H \mid D, \Theta, \xi)\, P(D \mid \Theta, \xi)$
     • $\log P(H, D \mid \Theta, \xi) = \log P(H \mid D, \Theta, \xi) + \log P(D \mid \Theta, \xi)$
     • $\log P(D \mid \Theta, \xi) = \log P(H, D \mid \Theta, \xi) - \log P(H \mid D, \Theta, \xi)$
     • Average both sides with respect to $P(H \mid D, \Theta', \xi)$ for some fixed Θ′:
       $E_{H \mid D, \Theta'}\big[\log P(D \mid \Theta, \xi)\big] = E_{H \mid D, \Theta'}\big[\log P(H, D \mid \Theta, \xi)\big] - E_{H \mid D, \Theta'}\big[\log P(H \mid D, \Theta, \xi)\big]$
       so the log-likelihood of the data decomposes as
       $\log P(D \mid \Theta, \xi) = E(\Theta \mid \Theta') + H(\Theta \mid \Theta')$
     • EM uses the true posterior $P(H \mid D, \Theta', \xi)$.

     Expectation Maximization (EM)
     • General EM algorithm:
       • Initialize the parameters Θ
       • Set Θ′ = Θ
       • Expectation step:
         $E(\Theta \mid \Theta') = \big\langle \log P(H, D \mid \Theta, \xi) \big\rangle_{P(H \mid D, \Theta')}$
       • Maximization step:
         $\Theta = \arg\max_{\Theta} E(\Theta \mid \Theta')$
       • Repeat until there is no (or only a small) improvement in Θ (Θ = Θ′)
     • Problem (see the sketch below):
       • $P(H \mid D, \Theta') = \prod_{n=1}^{N} P(s^n \mid x^n, \Theta')$
       • Each data point requires us to calculate $2^k$ probabilities.
       • If k is large, then this is a bottleneck.
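
The bottleneck can be seen directly in code: the exact E-step has to enumerate all 2^k configurations per data point to form the true posterior P(s^n | x^n, Θ′). Below is a sketch that reuses the hypothetical log_joint and parameters from the earlier CVQ sketches; the M-step is omitted.

```python
from itertools import product
import numpy as np

def exact_posterior(x, W, pi, sigma):
    """True posterior P(s | x, Theta'): normalize P(x, s | Theta') over all 2^k configurations."""
    k = W.shape[1]
    configs = np.array(list(product([0, 1], repeat=k)), dtype=float)   # shape (2^k, k)
    log_p = np.array([log_joint(x, s, W, pi, sigma) for s in configs])
    log_p -= np.logaddexp.reduce(log_p)                                # normalize in log space
    return configs, np.exp(log_p)

def e_step(X, W, pi, sigma):
    """Expected sufficient statistics <s> and <s s^T> per data point, as an M-step would need."""
    stats = []
    for x in X:
        configs, post = exact_posterior(x, W, pi, sigma)
        E_s = post @ configs                                   # <s> under the true posterior
        E_ssT = (configs * post[:, None]).T @ configs          # <s s^T> under the true posterior
        stats.append((E_s, E_ssT))
    return stats
```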

  6. Variational Approximation
     • An alternative to approximate-inference methods based on stochastic sampling.
     • Let H be the set of all variables with hidden or missing values.
     • $\log P(D \mid \Theta, \xi) = \log P(H, D \mid \Theta, \xi) - \log P(H \mid D, \Theta, \xi)$
     • Average both sides using a distribution $Q(H \mid \lambda)$ [a surrogate posterior]:
       $E_{H \mid \lambda}\big[\log P(D \mid \Theta, \xi)\big] = E_{H \mid \lambda}\big[\log P(H, D \mid \Theta, \xi)\big] - E_{H \mid \lambda}\big[\log Q(H \mid \lambda)\big] + E_{H \mid \lambda}\big[\log Q(H \mid \lambda)\big] - E_{H \mid \lambda}\big[\log P(H \mid D, \Theta, \xi)\big]$
       $\log P(D \mid \Theta, \xi) = F(Q, \Theta) + KL(Q, P)$
       $F(Q, \Theta) = \sum_{\{H\}} Q(H \mid \lambda) \log P(H, D \mid \Theta, \xi) - \sum_{\{H\}} Q(H \mid \lambda) \log Q(H \mid \lambda)$
       $KL(Q, P) = \sum_{\{H\}} Q(H \mid \lambda) \big[ \log Q(H \mid \lambda) - \log P(H \mid D, \Theta) \big]$

     Variational Approximation
     • With the same decomposition, $\log P(D \mid \Theta, \xi) = F(Q, \Theta) + KL(Q, P)$:
     • Approximation: maximize F(Q, Θ).
     • Parameters: Θ, λ.
     • Since KL(Q, P) ≥ 0, maximization of F pushes up the lower bound on the log-likelihood:
       $\log P(D \mid \Theta, \xi) \ge F(Q, \Theta)$.
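
The decomposition log P(D | Θ, ξ) = F(Q, Θ) + KL(Q, P) holds for any choice of Q. The sketch below checks it numerically for a single CVQ observation and an arbitrary factorized Bernoulli Q(s | λ), reusing the hypothetical x, W, π, σ and log_joint from the earlier sketches.

```python
from itertools import product
import numpy as np

def decomposition_check(x, lam, W, pi, sigma):
    """Check log P(x) = F(Q, Theta) + KL(Q, P) for a factorized Bernoulli Q(s | lam)."""
    k = W.shape[1]
    configs = np.array(list(product([0, 1], repeat=k)), dtype=float)
    log_pxs = np.array([log_joint(x, s, W, pi, sigma) for s in configs])   # log P(x, s)
    log_px = np.logaddexp.reduce(log_pxs)                                  # exact log P(x)
    log_post = log_pxs - log_px                                            # log P(s | x)
    log_q = configs @ np.log(lam) + (1 - configs) @ np.log(1 - lam)        # log Q(s | lam)
    q = np.exp(log_q)
    F = np.sum(q * (log_pxs - log_q))        # free energy / lower bound
    KL = np.sum(q * (log_q - log_post))      # KL(Q || P(s | x)) >= 0
    return log_px, F, KL

# For any lam in (0, 1)^k, F + KL equals log P(x) and F <= log P(x), e.g.:
# log_px, F, KL = decomposition_check(x, np.full(W.shape[1], 0.5), W, pi, sigma)
```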

  7. Kullback-Leibler (KL) divergence
     • A measure of the difference between two probability distributions over the same variable x.
     • $KL(P \,\|\, Q)$, where the "||" operator indicates "divergence", i.e., P's divergence from Q.
     • Entropy: the average amount of information carried by a probability distribution:
       $H(P) = E_P[I_P(X)] = -\sum_{i=1}^{n} P(i) \log P(i)$
     • $KL(P \,\|\, Q) = H(P, Q) - H(P) = -\sum_{i=1}^{n} P(i) \log Q(i) + \sum_{i=1}^{n} P(i) \log P(i) = \sum_{i=1}^{n} P(i) \log\!\frac{P(i)}{Q(i)}$
     • If we have some theoretical target distribution P, we want to find an approximation Q that gets as close to it as possible by minimizing the KL divergence.

     Variational EM
     • To use variational EM, we hope that if we choose $Q(H \mid \lambda)$ well, the optimization of both λ and Θ becomes easy.
     • A well-behaved choice for $Q(H \mid \lambda)$ is the mean field approximation.
     • Let H be the set of all variables with hidden or missing values:
       • E-step: compute the expectation over the hidden variables.
         Optimize F(Q, Θ) with respect to λ while keeping Θ fixed.
       • M-step: maximize the expected log-likelihood.
         Optimize F(Q, Θ) with respect to Θ while keeping λ fixed.
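
The definitions above translate directly into NumPy; the example distributions P and Q below are made up:

```python
import numpy as np

def entropy(p):
    """H(P) = -sum_i P(i) log P(i)."""
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    """H(P, Q) = -sum_i P(i) log Q(i)."""
    return -np.sum(p * np.log(q))

def kl(p, q):
    """KL(P || Q) = H(P, Q) - H(P) = sum_i P(i) log(P(i) / Q(i)); >= 0, and 0 iff P == Q."""
    return np.sum(p * np.log(p / q))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])
print(kl(p, q), cross_entropy(p, q) - entropy(p))   # the two values agree
print(kl(p, q), kl(q, p))                           # KL is not symmetric
```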

  8. Mean Field Approximation
     • To find the distribution Q, we use the mean field approximation.
     • Assumptions:
       • $Q(H \mid \lambda)$ is the mean field approximation
       • the variables $H_i$ in the Q(H) distribution are independent
       • Q is completely factorized: $Q(H \mid \lambda) = \prod_i Q_i(H_i \mid \lambda_i)$
     • For our CVQ model, the hidden variables are the binary sources:
       $Q(H \mid \lambda) = \prod_{n=1 \ldots N} Q(s^n \mid \lambda^n)$
       $Q(s^n \mid \lambda^n) = \prod_{j=1 \ldots k} Q(s_j^{(n)} \mid \lambda_j^n)$
       $Q(s_j^{(n)} \mid \lambda_j^n) = (\lambda_j^n)^{s_j^n} (1 - \lambda_j^n)^{1 - s_j^n}$

     Mean Field Approximation
     • Functional F for the mean field:
       $F(Q, \Theta) = \sum_{\{H\}} Q(H \mid \lambda) \log P(H, D \mid \Theta, \xi) - \sum_{\{H\}} Q(H \mid \lambda) \log Q(H \mid \lambda)$
       $F(Q, \Theta) = \sum_{n=1}^{N} \sum_{\{s^n\}} Q(s^n \mid \lambda^n) \big[ \log P(x^n, s^n \mid \Theta) - \log Q(s^n \mid \lambda^n) \big]$
     • Assume just one data point x and its corresponding s; then F decomposes into three terms (a fixed-point update for λ derived from this F is sketched below):
       $F(Q, \Theta) = -d \log \sigma - \frac{1}{2\sigma^2} \big\langle (x - Ws)^T (x - Ws) \big\rangle_{Q(s \mid \lambda)}$  (1)
       $\quad + \big\langle \textstyle\sum_{j=1}^{k} s_j \log \pi_j + (1 - s_j) \log(1 - \pi_j) \big\rangle_{Q(s \mid \lambda)}$  (2)
       $\quad - \big\langle \textstyle\sum_{j=1}^{k} s_j \log \lambda_j + (1 - s_j) \log(1 - \lambda_j) \big\rangle_{Q(s \mid \lambda)}$  (3)
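
Maximizing this F with respect to one λ_j at a time (the others held fixed) yields a closed-form fixed-point update. The update used below is derived here by setting ∂F/∂λ_j = 0 under the factorized Bernoulli Q; it is not stated on the slides, so treat it as an illustrative sketch rather than the lecture's result.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_field_lambdas(x, W, pi, sigma, n_iters=50):
    """Coordinate-ascent fixed-point updates for the variational parameters lambda of the
    factorized Q(s | lambda), for one observation x. Assumed update (from d F / d lambda_j = 0):
    lambda_j = sigmoid((w_j^T x - sum_{i != j} w_j^T w_i lambda_i - ||w_j||^2 / 2) / sigma^2
                       + log(pi_j / (1 - pi_j)))."""
    d, k = W.shape
    lam = np.full(k, 0.5)                 # start from the uninformative setting
    gram = W.T @ W                        # pairwise inner products w_i^T w_j
    wx = W.T @ x
    prior_log_odds = np.log(pi / (1 - pi))
    for _ in range(n_iters):
        for j in range(k):
            cross = gram[j] @ lam - gram[j, j] * lam[j]     # sum_{i != j} w_j^T w_i lambda_i
            z = (wx[j] - cross - 0.5 * gram[j, j]) / sigma**2 + prior_log_odds[j]
            lam[j] = sigmoid(z)           # update lambda_j with the other lambdas held fixed
    return lam

# Hypothetical usage with the toy CVQ quantities from the earlier sketches:
# lam = mean_field_lambdas(x, W, pi, sigma)   # lam[j] approximates P(s_j = 1 | x)
```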
