The bridge between deep learning and probabilistic machine learning
Petru Rebeja
2020-07-15
About me
• PhD student at Al. I. Cuza, Faculty of Computer Science
• Passionate about AI
• Iaşi AI member
• Technical Lead at Centric IT Solutions Romania
Why the strange title?
• Based on my own experience
• Variational Autoencoders do bridge the two domains
• To have a full picture we must look from both perspectives
Introduction: Autoencoders
• A neural network composed of two parts:
  • an encoder and
  • a decoder
Autoencoders
How it works:
1. The encoder accepts as input X ∈ ℝ^D
2. It encodes it into z ∈ ℝ^K, where K ≪ D, by learning a function g: ℝ^D → ℝ^K
3. The decoder receives z and reconstructs the original X from it by learning a function f: ℝ^K → ℝ^D such that f(g(X)) ≈ X
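As an illustration of the two learned functions g and f, here is a minimal sketch of a dense autoencoder in Keras; the layer sizes and the dimensions D = 784, K = 32 are assumptions made for the example, not values from the slides.

```python
import tensorflow as tf
from tensorflow.keras import layers

D, K = 784, 32  # assumed dimensions, e.g. flattened 28x28 images and a 32-d code

inputs = tf.keras.Input(shape=(D,))
# encoder g: R^D -> R^K
h = layers.Dense(128, activation="relu")(inputs)
z = layers.Dense(K, activation="relu", name="encoding")(h)
# decoder f: R^K -> R^D
h = layers.Dense(128, activation="relu")(z)
x_hat = layers.Dense(D, activation="sigmoid")(h)

autoencoder = tf.keras.Model(inputs, x_hat)
# trained so that f(g(X)) ≈ X, i.e. the input is its own target
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X_train, X_train, epochs=10, batch_size=128)
```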
Autoencoders — Architecture
How it looks: [figure: autoencoder architecture diagram]
Autoencoders — example usage
Autoencoders can be used in anomaly detection [1, 2]:
• For points akin to those in the training set (i.e. normal points) the encoder will produce an efficient encoding and the decoder will be able to decode it,
• For outliers, the encoder will still produce an encoding, but the decoder will fail to reconstruct the input.
[1] Anomaly Detection Using Autoencoders with Nonlinear Dimensionality Reduction
[2] Anomaly Detection with Robust Deep Autoencoders
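In practice this usually comes down to thresholding the reconstruction error. A small sketch, reusing the autoencoder from the previous sketch; the test data X_test and the percentile-based threshold are illustrative assumptions:

```python
import numpy as np

# reconstruction error per sample for unseen data
X_hat = autoencoder.predict(X_test)
errors = np.mean(np.square(X_test - X_hat), axis=1)

# flag points whose error is unusually high (the 99th percentile is an arbitrary choice)
threshold = np.percentile(errors, 99)
anomalies = errors > threshold
```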
Variational Autoencoders — bird's-eye view
From a high-level perspective, Variational Autoencoders (VAE) have the same structure as an autoencoder:
• an encoder, which determines the latent representation (z) from the input X, and
• a decoder, which reconstructs the input X from z.
VAE architecture — high-level
[figure: high-level VAE architecture diagram]
Variational Autoencoders — zoom in
• Unlike autoencoders, a VAE does not transform the input into an encoding and back.
• Rather, it assumes that the data is generated from a distribution governed by latent variables and tries to infer the parameters of that distribution in order to generate similar data.
Latent variables
• Represent fundamental traits of each data point fed to the model,
• Are inferred by the model (VAE) in order to
• Drive the decision of what exactly to generate.
Example (Handwritten digits): to draw handwritten digits, a model will decide upon the digit being drawn, the stroke, the thickness, etc.
What exactly are Variational Autoencoders?
• Not only generative models [3],
• A way to both postulate and infer complex data-generative processes [3]
[3] Variational auto-encoders do not train complex generative models
VAE from a deep learning perspective
Like an autoencoder with:
• A more complex architecture
• Two input nodes, one of which takes in random numbers
• A complicated loss function
VAE architecture
• The encoder infers the parameters (µ, σ) of the distribution that generates X
• The decoder learns two functions:
  • A function that maps a random point drawn from a normal distribution to a point in the space of latent representations,
  • A function that reconstructs the input from its latent representation.
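A sketch of how these pieces could look in Keras, assuming a Gaussian encoder with diagonal covariance and a standard-normal source of randomness; the layer and latent sizes are illustrative, not taken from the slides:

```python
import tensorflow as tf
from tensorflow.keras import layers

D, K = 784, 2  # assumed input and latent dimensions

class Sampling(layers.Layer):
    """Maps a standard-normal draw to the latent space (the reparameterization trick)."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        eps = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps

# encoder: infers the parameters (mu, log sigma^2) of the distribution behind X
enc_in = tf.keras.Input(shape=(D,))
h = layers.Dense(256, activation="relu")(enc_in)
z_mean = layers.Dense(K, name="z_mean")(h)
z_log_var = layers.Dense(K, name="z_log_var")(h)
z = Sampling()([z_mean, z_log_var])

# decoder: reconstructs the input from its latent representation
h_dec = layers.Dense(256, activation="relu")(z)
x_hat = layers.Dense(D, activation="sigmoid", name="reconstruction")(h_dec)

vae = tf.keras.Model(enc_in, [x_hat, z_mean, z_log_var])
```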
Loss function
For each data point the following beast of a loss function is calculated:
l_i(θ, φ) = −E_{z ∼ q_θ(z|x_i)}[log p_φ(x_i|z)] + KL(q_θ(z|x_i) ‖ p(z))
Where:
• −E_{z ∼ q_θ(z|x_i)}[log p_φ(x_i|z)] is the reconstruction loss (the expected negative log-likelihood of x_i under the decoder), and
• KL(q_θ(z|x_i) ‖ p(z)) measures how close the approximate posterior q_θ(z|x_i) is to the prior p(z).
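A sketch of this loss for the model above, under the assumed choices of a Bernoulli decoder for the reconstruction term and a standard-normal prior p(z), for which the KL term has a closed form:

```python
import tensorflow as tf

def vae_loss(x, x_hat, z_mean, z_log_var):
    eps = 1e-7  # numerical safety for the logarithms
    # reconstruction term: -E_q[log p(x|z)] under a Bernoulli decoder
    recon = -tf.reduce_sum(
        x * tf.math.log(x_hat + eps) + (1.0 - x) * tf.math.log(1.0 - x_hat + eps),
        axis=-1)
    # KL(q_theta(z|x_i) || N(0, I)) in closed form for a diagonal Gaussian
    kl = -0.5 * tf.reduce_sum(
        1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)
    # the loss is the negative ELBO, averaged over the batch
    return tf.reduce_mean(recon + kl)
```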
Loss function — a quirk
• The loss function is the negative of the Evidence Lower Bound (ELBO).
• Minimizing the loss therefore means maximizing the ELBO, which leads to awkward constructs like optimizer.optimize(-elbo) [4]
[4] What is a variational autoencoder?
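In TensorFlow 2 the sign flip looks roughly like this, reusing the vae model and vae_loss function from the previous sketches (both assumed names):

```python
optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(x_batch):
    with tf.GradientTape() as tape:
        x_hat, z_mean, z_log_var = vae(x_batch)
        loss = vae_loss(x_batch, x_hat, z_mean, z_log_var)  # loss = -ELBO
    grads = tape.gradient(loss, vae.trainable_variables)
    # minimizing -ELBO is the same as maximizing the ELBO
    optimizer.apply_gradients(zip(grads, vae.trainable_variables))
    return loss
```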
A probabilistic generative model
• Each data point comes from a probability distribution p(x)
• p(x) is governed by a distribution of latent variables p(z)
• To generate a new point the model:
  • draws a latent variable z_i ∼ p(z)
  • draws the new data point x_i ∼ p(x|z_i)
• Our goal is to compute the posterior p(z|x), which is intractable.
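A small sketch of this two-step generative story, assuming a standard-normal prior and using an untrained stand-in for the decoder (in a real VAE this would be the trained decoder; the sizes are assumptions):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

K, D = 2, 784  # assumed latent and data dimensions

# stand-in decoder mapping latent codes to data space
decoder = tf.keras.Sequential([
    tf.keras.Input(shape=(K,)),
    layers.Dense(256, activation="relu"),
    layers.Dense(D, activation="sigmoid"),
])

# 1. draw latent variables from the prior, z_i ~ p(z) = N(0, I)
z_samples = np.random.normal(size=(16, K)).astype("float32")
# 2. map them through the decoder to obtain new data points x_i ~ p(x|z_i)
#    (here we keep the Bernoulli means instead of sampling individual pixels)
x_new = decoder.predict(z_samples)
```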
VAE as a probabilistic encoder/decoder
• The inference network encodes x into (an approximation of) p(z|x)
• The generative model decodes x from p(x|z) by:
  • drawing a point from a normal distribution
  • mapping it through a function to p(x|z)
Inference network
• Approximates the parameters (µ_i, σ_i) of the distributions that generate each data point x_i
• Determines a distribution q_φ(z|x) which is closest to p(z|x)
Maximizing ELBO
• The inference network uses the KL divergence to approximate the posterior
• The KL divergence depends on the marginal likelihood and is intractable
• Instead, we maximize the ELBO, which:
  • minimizes the KL divergence, and
  • is tractable.
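The step behind the last bullet is a standard identity (not spelled out on the slide): log p(x) = ELBO + KL(q(z|x) ‖ p(z|x)), where ELBO = E_{z ∼ q(z|x)}[log p(x, z) − log q(z|x)]. Since log p(x) does not depend on the variational distribution q, maximizing the ELBO is equivalent to minimizing the KL divergence to the true posterior, and unlike that divergence the ELBO can be evaluated without the intractable marginal p(x).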
Instead of a demo
• Unfortunately, the experiment I'm working on is not ready for the stage
• It is still stuck in the data-preparation stage (removing garbage)
• Instead, you can have a look at an elegant implementation provided by Louis Tiao.
Questions?
More info
• Kingma, D. P. and Welling, M. (2014). Auto-Encoding Variational Bayes
• Rezende, D. J., Mohamed, S. and Wierstra, D. (2014). Stochastic Backpropagation and Approximate Inference in Deep Generative Models
• Doersch, C. (2016). Tutorial on Variational Autoencoders
• Altosaar, J. What is a variational autoencoder?
• Tiao, L. Implementing Variational Autoencoders in Keras: Beyond the Quickstart Tutorial
Thank you! Please provide feedback!