RBM, DBN, and DBM M. Soleymani Sharif University of Technology Fall 2017 Slides are based on Salakhutdinov's lectures, CMU 2017 and Hugo Larochelle's class on Neural Networks: https://sites.google.com/site/deeplearningsummerschool2016/.
Energy-based models • Gibbs distribution: p(x) = exp(−E(x)) / Z, where E(x) is the energy of configuration x and Z = Σ_x exp(−E(x)) is the partition function.
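As a quick illustration (my own toy sketch, not from the slides), the code below turns an arbitrary energy function over binary vectors into a Gibbs distribution by exact enumeration; the `gibbs_distribution` helper and the coupling matrix `A` are assumptions for this example only, and enumeration is only feasible for a handful of units.

```python
import numpy as np

def gibbs_distribution(energy, n_units):
    # Enumerate all 2^n binary configurations.
    states = np.array([[(s >> i) & 1 for i in range(n_units)]
                       for s in range(2 ** n_units)], dtype=float)
    energies = np.array([energy(x) for x in states])
    unnorm = np.exp(-energies)
    Z = unnorm.sum()                      # partition function
    return states, unnorm / Z

# Example energy: E(x) = -x^T A x with a fixed coupling matrix A.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
states, probs = gibbs_distribution(lambda x: -x @ A @ x, 3)
print(probs.sum())  # ~1.0: a valid probability distribution
```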
Boltzmann Machine
How a causal model generates data • In a causal model we generate data in two sequential steps: – First pick the hidden states from p(h). – Then pick the visible states from p(v|h). • The probability of generating a visible vector v is computed by summing over all possible hidden states: p(v) = Σ_h p(h) p(v|h). This slide has been adapted from Hinton's lectures, "Neural Networks for Machine Learning", Coursera, 2015.
How a Boltzmann Machine generates data • It is not a causal generative model. • Instead, everything is defined in terms of the energies of joint configurations of the visible and hidden units. • The energies of joint configurations are related to their probabilities: we can simply define the probability to be p(v, h) = e^{−E(v,h)} / Z.
Restricted Boltzmann Machines A Restricted Boltzmann Machine (RBM) is an undirected graphical model with hidden and visible layers. E(v, h) = −v^T W h − b^T v − c^T h = −Σ_{i,j} W_{ij} v_i h_j − Σ_i b_i v_i − Σ_j c_j h_j Learnable parameters are the bias vectors b and c, which are linear weight vectors for v and h, and the matrix W, which models the interaction between them.
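A minimal sketch of the energy function above (my own illustration; the function name and random initialization are assumptions, not part of the slides):

```python
import numpy as np

def rbm_energy(v, h, W, b, c):
    # E(v, h) = -v^T W h - b^T v - c^T h
    return -(v @ W @ h) - b @ v - c @ h

rng = np.random.default_rng(0)
n_vis, n_hid = 6, 4
W = 0.01 * rng.normal(size=(n_vis, n_hid))  # interaction weights
b = np.zeros(n_vis)                          # visible biases
c = np.zeros(n_hid)                          # hidden biases
v = rng.integers(0, 2, n_vis).astype(float)
h = rng.integers(0, 2, n_hid).astype(float)
print(rbm_energy(v, h, W, b, c))
```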
Restricted Boltzmann Machines All hidden units are conditionally independent given the visible units, and vice versa.
Restricted Boltzmann Machines RBM probabilities: p(v|h) = Π_i p(v_i|h) and p(h|v) = Π_j p(h_j|v), with p(v_i = 1 | h) = σ(W_{i·} h + b_i) and p(h_j = 1 | v) = σ(W_{·j}^T v + c_j), where σ(z) = 1 / (1 + e^{−z}) is the logistic sigmoid.
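A hedged sketch of these conditionals (illustrative naming, not an official implementation): because each layer is conditionally independent given the other, a whole layer can be sampled in one vectorized step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_h_given_v(v, W, c, rng):
    p = sigmoid(v @ W + c)               # p(h_j = 1 | v) for all j at once
    return (rng.random(p.shape) < p).astype(float), p

def sample_v_given_h(h, W, b, rng):
    p = sigmoid(h @ W.T + b)             # p(v_i = 1 | h) for all i at once
    return (rng.random(p.shape) < p).astype(float), p
```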
Probabilistic Analog of Autoencoder • An RBM can be seen as a probabilistic analog of an autoencoder: p(h|v) plays the role of a stochastic encoder and p(v|h) that of a stochastic decoder.
RBM: Image input [figure: an RBM whose visible layer v is the image's binary pixels]
MNIST: Learned features Larochelle et al., JMLR 2009
Restricted Boltzmann Machines E(v, h) = −v^T W h − b^T v − c^T h = −Σ_{i,j} W_{ij} v_i h_j − Σ_i b_i v_i − Σ_j c_j h_j The effect of the latent variables can be appreciated by considering the marginal distribution over the visible units.
Marginal distribution p(v) = Σ_h p(v, h) = Σ_h e^{−E(v,h)} / Z = e^{−F(v)} / Z, where F(v) is the free energy: F(v) = −b^T v − Σ_j log(1 + exp(c_j + W_{·j}^T v)). Because each binary h_j can be summed out independently, the sum over exponentially many hidden configurations factorizes into a product of per-unit terms, which is where the softplus log(1 + e^z) comes from.
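A sketch of the free energy above (assumed helper name; np.logaddexp(0, z) computes log(1 + e^z) in a numerically stable way):

```python
import numpy as np

def free_energy(v, W, b, c):
    # F(v) = -b^T v - sum_j softplus(c_j + W_{.j}^T v)
    # Works for a single vector v or a batch of rows.
    return -(v @ b) - np.logaddexp(0.0, v @ W + c).sum(axis=-1)
```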
Model Learning
RBM learning: Stochastic gradient descent ∂/∂θ log p(v^(t)) = ∂/∂θ log Σ_h exp( v^(t)T W h + b^T v^(t) + c^T h ) − ∂/∂θ log Z (positive phase) (negative phase) • Second term: intractable due to the exponential number of configurations: Z = Σ_v Σ_h exp( v^T W h + b^T v + c^T h )
Positive phase ∂/∂W log Σ_h exp( v^(t)T W h + b^T v^(t) + c^T h ) = [ Σ_h v^(t) h^T exp( v^(t)T W h + b^T v^(t) + c^T h ) ] / [ Σ_h exp( v^(t)T W h + b^T v^(t) + c^T h ) ] = E_{h ~ p(h | v^(t))} [ v^(t) h^T ]
RBM learning: Stochastic gradient descent Maximize log p(v^(t)) with respect to the parameters: ∂/∂W_{ij} log p(v^(t)) = E[v_i h_j | v = v^(t)] − E[v_i h_j] ∂/∂c_j log p(v^(t)) = E[h_j | v = v^(t)] − E[h_j] ∂/∂b_i log p(v^(t)) = E[v_i | v = v^(t)] − E[v_i]
RBM learning: Stochastic gradient descent ∂/∂W_{ij} log p(v^(t)) = E[v_i h_j | v = v^(t)] − E[v_i h_j] Positive statistic: E[v_i h_j | v = v^(t)] = E[h_j | v = v^(t)] v_i^(t) = v_i^(t) / (1 + exp(−(Σ_i W_{ij} v_i^(t) + c_j))) • Note that to compute E[v_i h_j] (the negative statistic) we ideally need to integrate over all configurations (however, a sampler run over time can be used to get an estimate of the gradients).
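A sketch of the positive statistics (illustrative code following the formulas above): the expectation over h given v is available in closed form, so no sampling is needed for the positive phase.

```python
import numpy as np

def positive_stats(v, W, b, c):
    ph = 1.0 / (1.0 + np.exp(-(v @ W + c)))   # E[h_j | v] = p(h_j = 1 | v)
    # Returns E[v_i h_j | v], E[v_i | v], E[h_j | v].
    return np.outer(v, ph), v, ph
```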
Approximate Learning • Replace the average over all possible input configurations by samples: E_{p(v,h)}[ −∂E(v, h)/∂θ ] = −Σ_{v,h} p(v, h) ∂E(v, h)/∂θ • Run an MCMC chain (Gibbs sampling) starting from the observed examples.
RBM learning: Contrastive divergence Getting an unbiased sample of the second term is very difficult. It can be done by starting at any random state of the visible units and performing Gibbs sampling for a very long time. Block-Gibbs MCMC: Initialize v_0 = v; Sample h_0 from P(h | v_0); For t = 1:T: Sample v_t from P(v | h_{t−1}), Sample h_t from P(h | v_t).
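A self-contained sketch of the block-Gibbs chain above (my own illustration; function and variable names are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def block_gibbs(v, W, b, c, T, rng):
    """Run T steps of block Gibbs sampling starting from visible vector v."""
    h = (rng.random(c.shape) < sigmoid(v @ W + c)).astype(float)
    for _ in range(T):
        v = (rng.random(b.shape) < sigmoid(h @ W.T + b)).astype(float)
        h = (rng.random(c.shape) < sigmoid(v @ W + c)).astype(float)
    return v, h
```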
Negative statistic E[v_i h_j] ≈ (1/N) Σ_{n=1}^{N} v_i^{(n)} h_j^{(n)}, with (v^{(n)}, h^{(n)}) ~ p(v, h) • Initializing N independent Markov chains, each at a data point, and running them until convergence (T steps): E[v_i h_j] ≈ (1/N) Σ_{t=1}^{N} v_i^{(t),T} h_j^{(t),T} where v^{(t),0} = v^{(t)}, h^{(t),l} ~ p(h | v = v^{(t),l}) for l ≥ 0, and v^{(t),l} ~ p(v | h = h^{(t),l−1}) for l ≥ 1.
Contrastive Divergence [figure: Gibbs chain v^(t) = v^0 → h^0 → v^1 → … → ṽ = v^k] p(v|h) = Π_i p(v_i|h), p(h|v) = Π_j p(h_j|v), p(v_i = 1 | h) = σ(W_{i·} h + b_i), p(h_j = 1 | v) = σ(W_{·j}^T v + c_j)
CD-k Algorithm • CD-k: contrastive divergence with k iterations of Gibbs sampling • In general, the bigger k is, the less biased the estimate of the gradient will be • In practice, k = 1 works well for learning good features and for pre-training
RBM inference: Block-Gibbs MCMC
CD-k Algorithm • Repeat until the stopping criterion is met: – For each training sample v^(t): • Generate a negative sample ṽ using k steps of Gibbs sampling starting at v^(t) • Update the model parameters, using ∂/∂W log p(v^(t)) ≈ v^(t) ĥ(v^(t))^T − ṽ ĥ(ṽ)^T: • W ← W + α ( v^(t) ĥ(v^(t))^T − ṽ ĥ(ṽ)^T ) • c ← c + α ( ĥ(v^(t)) − ĥ(ṽ) ) • b ← b + α ( v^(t) − ṽ ) where ĥ(v) denotes the vector of conditional probabilities p(h_j = 1 | v) and α is the learning rate.
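A compact, self-contained sketch of one CD-k update under the notation above (illustrative code; hyperparameters and names are assumptions, not a reference implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd_k_update(v0, W, b, c, rng, k=1, alpha=0.01):
    # Positive phase: exact expectation of h given the data point.
    ph0 = sigmoid(v0 @ W + c)
    # Negative phase: k steps of block Gibbs sampling starting at v0.
    h = (rng.random(ph0.shape) < ph0).astype(float)
    for _ in range(k):
        pv = sigmoid(h @ W.T + b)
        v = (rng.random(pv.shape) < pv).astype(float)
        ph = sigmoid(v @ W + c)
        h = (rng.random(ph.shape) < ph).astype(float)
    # Parameter updates (in place), following the slide's update rules.
    W += alpha * (np.outer(v0, ph0) - np.outer(v, ph))
    c += alpha * (ph0 - ph)
    b += alpha * (v0 - v)
    return W, b, c
```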
Positive phase vs. negative phase
Contrastive Divergence Since convergence to the final distribution takes time, good initialization can speed things up dramatically. Contrastive divergence uses a training image to initialize the visible units, then runs Gibbs sampling for only a few iterations (even k = 1), not to "equilibrium." This gives acceptable estimates of the expected values in the gradient update formula.
Tricks and Debugging • Unfortunately, it is not easy to debug the training of RBMs (e.g., using gradient checks) • We instead rely on approximate "tricks": – we plot the average stochastic reconstruction error ‖v^(t) − ṽ‖ and check that it tends to decrease – for inputs that correspond to images, we visualize the connections coming into each hidden unit as if they formed an image; this gives an idea of the type of visual feature each hidden unit detects – we can also try to approximate the partition function Z and see whether the (approximated) NLL decreases Salakhutdinov, Murray, ICML 2008.
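A sketch of the first debugging trick (assumed monitoring code, not from the slides): track the average one-step reconstruction error over a batch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def avg_reconstruction_error(V, W, b, c, rng):
    """V: (batch, n_visible) binary data matrix."""
    H = (rng.random((V.shape[0], c.size)) < sigmoid(V @ W + c)).astype(float)
    V_recon = sigmoid(H @ W.T + b)        # mean-field reconstruction
    return np.mean(np.sum((V - V_recon) ** 2, axis=1))
```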
RBM inference • Block-Gibbs MCMC
Gaussian Bernoulli RBMs
Gaussian Bernoulli RBMs • Let v represent a real-valued (unbounded) input, and add a quadratic term to the energy function: E(v, h) = −v^T W h − b^T v − c^T h + (1/2) v^T v • In this case p(v|h) becomes a Gaussian distribution with mean μ = b + W h and identity covariance matrix • It is recommended to normalize the training set by: – subtracting the mean of each input – dividing each input by the training set standard deviation • One should use a smaller learning rate than in the regular RBM
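A sketch of the Gaussian visible conditional above (illustrative, under the stated identity-covariance assumption): with the quadratic term, p(v | h) = N(b + W h, I), so sampling v is just the mean plus unit Gaussian noise; the hidden units stay Bernoulli as before.

```python
import numpy as np

def sample_v_given_h_gaussian(h, W, b, rng):
    mean = b + W @ h                      # conditional mean of p(v | h)
    return mean + rng.normal(size=mean.shape)
```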
Deep Belief Networks • One of the first non-convolutional models to successfully admit training of deep architectures (2007)
Pre-training • We will use a greedy, layer-wise procedure • Train one layer at a time with an unsupervised criterion • Fix the parameters of the previous hidden layers • Previous layers are viewed as feature extractors
Pre-training • Unsupervised pre-training: – first layer: find hidden unit features that are more common in training inputs than in random inputs – second layer: find combinations of hidden unit features that are more common than random hidden unit features – third layer: find combinations of combinations of ... • Pre-training initializes the parameters in a region from which we can reach better parameters (see the sketch below)
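A hedged sketch of the greedy layer-wise procedure (my own illustration; `train_rbm_cd` is an assumed helper implementing the CD-k updates shown earlier). Each RBM is trained on the hidden representation produced by the layer below it, with the lower layers' parameters kept fixed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def greedy_pretrain(X, layer_sizes, train_rbm_cd):
    """X: (n_samples, n_visible) data; layer_sizes: hidden sizes, bottom-up."""
    params, H = [], X
    for n_hid in layer_sizes:
        W, b, c = train_rbm_cd(H, n_hid)  # train this layer's RBM, lower layers fixed
        params.append((W, b, c))
        H = sigmoid(H @ W + c)            # features fed to the next layer
    return params
```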