Implicit Posterior Variational Inference for Deep Gaussian Processes (IPVI DGP)
Haibin Yu*, Yizhou Chen*, Zhongxiang Dai, Bryan Kian Hsiang Low, and Patrick Jaillet
Department of Computer Science, National University of Singapore
Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology
* indicates equal contribution
NeurIPS 2019
Gaussian Processes (GP) vs. Deep Gaussian Processes (DGP)
• A (zero-mean) GP is fully specified by its kernel function.
• Common kernels: RBF (a universal approximator), Matérn, Brownian, linear, polynomial, ...
Gaussian Processes (GP) vs. Deep Gaussian Processes (DGP)
• Composing GPs f(x) and g(x) into (f ∘ g)(x) significantly boosts the expressive power.
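To make the composition concrete, here is a minimal sketch of my own (not from the paper) that draws one sample from a two-layer composition of zero-mean GPs with RBF kernels; the lengthscales, grid, and function names are illustrative assumptions.

```python
# Minimal sketch: one sample from a composition of two zero-mean GPs, (f o g)(x).
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel k(a, b) = variance * exp(-|a - b|^2 / (2 * lengthscale^2)).
    sq_dists = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dists / lengthscale ** 2)

def sample_gp(x, lengthscale=1.0, jitter=1e-6):
    # Draw one function sample at inputs x from GP(0, k) via a Cholesky factor.
    K = rbf_kernel(x, x, lengthscale) + jitter * np.eye(len(x))
    return np.linalg.cholesky(K) @ np.random.randn(len(x))

x = np.linspace(-3.0, 3.0, 200)
g_x = sample_gp(x, lengthscale=1.0)        # inner GP sample g(x)
f_g_x = sample_gp(g_x, lengthscale=0.5)    # outer GP evaluated at g(x): (f o g)(x)
```

Plotting f_g_x against x typically shows sharper, non-stationary structure than either single-layer sample, which is the expressiveness gain the slide refers to.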
Existing DGP models
• Approximation methods based on inducing variables:
  • Variational inference: Damianou and Lawrence (AISTATS 2013), Hensman and Lawrence (arXiv 2014), Salimbeni and Deisenroth (NeurIPS 2017)
  • Expectation propagation: Bui et al. (ICML 2016)
  • MCMC: Havasi et al. (NeurIPS 2018)
• Random-feature approximation methods: Cutajar et al. (ICML 2017)
Deep Gaussian Processes (DGP)
• (Figure: a DGP mapping input X through latent layers F_1, F_2, F_3 to output y, with inducing variables U = {U_1, ..., U_L} attached to each layer.)
• The posterior p(U | y) is intractable.
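For context, each DGP layer is a sparse GP whose behaviour is summarized by inducing inputs Z_ℓ and inducing outputs U_ℓ. Below is a hedged NumPy sketch of the standard sparse-GP conditional used to propagate a layer's inputs given sampled U_ℓ (my own illustration, not the paper's code; the kernel, jitter, and shapes are assumptions).

```python
# Sparse-GP conditional p(f(x) | U): mean K_xz K_zz^{-1} U, cov K_xx - K_xz K_zz^{-1} K_zx.
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    # k(a, b) = variance * exp(-|a - b|^2 / (2 * lengthscale^2)) for rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def layer_conditional(x, Z, U, jitter=1e-6):
    K_zz = rbf(Z, Z) + jitter * np.eye(len(Z))
    K_xz, K_xx = rbf(x, Z), rbf(x, x)
    A = np.linalg.solve(K_zz, K_xz.T).T     # K_xz K_zz^{-1}
    return A @ U, K_xx - A @ K_xz.T

# Illustrative shapes: 100 one-dimensional inputs, 20 inducing points, scalar outputs.
x, Z, U = np.random.randn(100, 1), np.random.randn(20, 1), np.random.randn(20, 1)
mean, cov = layer_conditional(x, Z, U)
```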
DGP Inference
• Exact inference is intractable in DGPs. Two main approaches:
• Variational inference: q* = argmin_{q ∈ Q} KL[q(θ) || p(θ | X)], restricted to a variational family Q. Biased; prone to local minima; relies on simple approximating families.
• Sampling: E_{p(θ | X)}[f(θ)] ≈ (1/T) Σ_{t=1}^{T} f(θ_t) with θ_t ∼ p(θ | X), which in effect ranges over all probability distributions. Ideally unbiased; prone to local modes; efficiency is the main concern.
DGP Inference: Variational Inference
• VI restricts q to a simple variational family Q, typically a Gaussian or mean-field approximation, and solves q* = argmin_{q ∈ Q} KL[q(θ) || p(θ | X)].
• This makes inference efficient, but the restricted family generally biases the posterior approximation.
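As a toy illustration of this trade-off (a minimal sketch of my own, not from the paper), fitting a single Gaussian q(θ) = N(m, s²) to a bimodal target by stochastic gradient ascent on the ELBO typically collapses onto one mode, which is exactly the bias that a restricted variational family induces; the target, step size, and iteration count below are illustrative assumptions.

```python
# Toy VI: fit q(theta) = N(m, s^2) to a two-mode target with reparameterized ELBO gradients.
import numpy as np

def grad_log_p_tilde(theta):
    # Gradient of log of an unnormalized mixture exp(-(theta-2)^2/2) + exp(-(theta+2)^2/2).
    a = -0.5 * (theta - 2.0) ** 2
    b = -0.5 * (theta + 2.0) ** 2
    w = 1.0 / (1.0 + np.exp(b - a))              # responsibility of the mode at +2
    return w * (-(theta - 2.0)) + (1 - w) * (-(theta + 2.0))

m, log_s = 0.0, 0.0
lr, T = 1e-2, 64
for _ in range(2000):
    eps = np.random.randn(T)
    s = np.exp(log_s)
    theta = m + s * eps                           # reparameterized samples from q
    g = grad_log_p_tilde(theta)
    m += lr * g.mean()                            # pathwise gradient of the ELBO w.r.t. m
    log_s += lr * ((g * eps * s).mean() + 1.0)    # w.r.t. log s; entropy term contributes +1
```

Inspecting m after the loop shows it settling near one of the two modes (about +2 or -2) rather than covering both.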
DGP Inference: Variational Inference
• Efficient, but biased by the restriction to the variational family Q.
DGP Inference: Sampling
• Ideally unbiased, since θ_t ∼ p(θ | X), but not efficient.
DGP: Variational Inference vs. Sampling
• VI is efficient but biased; sampling is ideally unbiased but not efficient.
• Goal: an unbiased posterior belief with the efficiency of VI.
Implicit Posterior Variational Inference (IPVI)
• A generator g_Φ(·) maps random noise to samples of an implicit variational posterior q_Φ(U).
• ELBO = E_{q(F_L)}[log p(y | F_L)] − KL[q_Φ(U) || p(U)]
Implicit Posterior Variational Inference (IPVI)
• Evaluating the KL term requires the density of the implicit posterior, which is unavailable in closed form:
  KL[q_Φ(U) || p(U)] = E_{q_Φ(U)}[log (q_Φ(U) / p(U))]
Implicit Posterior Variational Inference (IPVI)
• A discriminator T(U) is trained to tell samples of q_Φ(U) (produced by the generator) apart from samples of the prior p(U).
• Proposition 1. The optimal discriminator exactly recovers the log-density ratio log (q_Φ(U) / p(U)).
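The density-ratio trick behind Proposition 1 can be checked on a 1-D toy problem (a hedged sketch of my own, not the authors' code): with q and p both Gaussian, the exact log-ratio is quadratic in u, so a logistic-regression discriminator with features [1, u, u²] can represent it exactly, and its logit approximately recovers log q(u)/p(u) after training.

```python
# Train a logistic-regression "discriminator" on samples from q and p; its logit
# approximates log q(u)/p(u) at the optimum (labels: 1 for q, 0 for p).
import numpy as np

rng = np.random.default_rng(0)
q_samples = rng.normal(1.0, 0.5, size=5000)   # stand-in for generator samples ~ q
p_samples = rng.normal(0.0, 1.0, size=5000)   # stand-in for prior samples ~ p

def features(u):
    return np.stack([np.ones_like(u), u, u ** 2], axis=1)

X = features(np.concatenate([q_samples, p_samples]))
y = np.concatenate([np.ones_like(q_samples), np.zeros_like(p_samples)])

w = np.zeros(3)
for _ in range(5000):                          # plain gradient ascent on the log-likelihood
    logits = X @ w
    grad = X.T @ (y - 1.0 / (1.0 + np.exp(-logits))) / len(y)
    w += 0.5 * grad

u = np.linspace(-2.0, 3.0, 5)
learned_log_ratio = features(u) @ w            # discriminator logit T(u)
true_log_ratio = (-0.5 * ((u - 1.0) / 0.5) ** 2 - np.log(0.5)) - (-0.5 * u ** 2)
print(np.round(learned_log_ratio, 2), np.round(true_log_ratio, 2))
```

The two printed rows should be close (finite samples and finite optimization keep them from matching exactly). The same principle is what lets IPVI estimate the otherwise intractable KL term from samples alone.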
Implicit Posterior Variational Inference (IPVI): Two-Player Game
• Player 1 (discriminator): max_Ψ E_{p(U)}[log(1 − σ(T_Ψ(U)))] + E_{q_Φ(U)}[log σ(T_Ψ(U))]
• Player 2 (DGP hyperparameters θ and generator Φ): max_{θ, Φ} E_{q_Φ(U)}[L(θ, X, y, U) − T_Ψ(U)]
• Best-response dynamics (BRD) is used to search for a Nash equilibrium.
• Proposition 2. A Nash equilibrium recovers the true posterior p(U | y).
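In code, the best-response dynamics amounts to alternating ascent steps on each player's objective. The sketch below is a simplified stand-in of my own (not the authors' implementation): data_fit plays the role of L(θ, X, y, U), the prior p(U) is taken to be N(0, I), and the architectures, batch sizes, and step sizes are illustrative assumptions.

```python
# Alternating (BRD-style) updates for the two players in the slide's game.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim_u, dim_noise = 8, 8
generator = nn.Sequential(nn.Linear(dim_noise, 64), nn.Tanh(), nn.Linear(64, dim_u))
discriminator = nn.Sequential(nn.Linear(dim_u, 64), nn.Tanh(), nn.Linear(64, 1))
theta = torch.zeros(dim_u, requires_grad=True)          # stand-in for DGP hyperparameters

def data_fit(theta, U):
    # Illustrative stand-in for the data-fit term L(theta, X, y, U).
    return -((U - theta) ** 2).sum(dim=-1)

opt_psi = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
opt_phi = torch.optim.Adam(list(generator.parameters()) + [theta], lr=1e-3)

for step in range(10000):
    # Player 1: push sigma(T) -> 1 on generator samples and -> 0 on prior samples.
    for _ in range(2):
        U_q = generator(torch.randn(128, dim_noise)).detach()
        U_p = torch.randn(128, dim_u)                    # prior p(U) taken as N(0, I) here
        loss_psi = -(F.logsigmoid(discriminator(U_q)).mean()
                     + F.logsigmoid(-discriminator(U_p)).mean())
        opt_psi.zero_grad(); loss_psi.backward(); opt_psi.step()
    # Player 2: maximize E_q[ L(theta, X, y, U) - T(U) ] via reparameterized samples.
    U_q = generator(torch.randn(128, dim_noise))
    loss_phi = -(data_fit(theta, U_q) - discriminator(U_q).squeeze(-1)).mean()
    opt_phi.zero_grad(); loss_phi.backward(); opt_phi.step()
```

In the actual IPVI DGP, the term played by data_fit corresponds to the data-fit part E_{q(F_L)}[log p(y | F_L)] of the ELBO above, evaluated by propagating inputs through the DGP layers conditioned on the sampled U.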
Architecture of the Generator and Discriminator
Naive design for layer ℓ:
• Fails to adequately capture the dependency of the inducing output variables U = {U_1, ..., U_L} on the corresponding inducing inputs Z = {Z_1, ..., Z_L}.
• Requires a relatively large number of parameters, resulting in overfitting, optimization difficulty, etc.
Architecture of the Generator and Discriminator
Our parameter-tying design for layer ℓ (sketched below):
• Concatenates the inducing inputs Z_ℓ with the generator's noise input.
• Generates posterior samples with a single shared parameter setting φ_ℓ.
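A minimal sketch of how such a parameter-tying generator layer could look (my reading of the idea, not the authors' code; the dimensions, activations, and noise size are assumptions): each inducing input is concatenated with fresh noise and pushed through one shared network, so the parameter count does not grow with the number of inducing points and U_ℓ depends explicitly on Z_ℓ.

```python
# One generator layer with tied parameters phi_l shared across all inducing inputs.
import torch
import torch.nn as nn

class TiedGeneratorLayer(nn.Module):
    def __init__(self, dim_in, dim_out, dim_noise=16, hidden=64):
        super().__init__()
        self.dim_noise = dim_noise
        self.net = nn.Sequential(                      # shared parameters phi_l
            nn.Linear(dim_in + dim_noise, hidden), nn.ReLU(),
            nn.Linear(hidden, dim_out))

    def forward(self, Z):                              # Z: (num_inducing, dim_in)
        eps = torch.randn(Z.shape[0], self.dim_noise)  # fresh noise per inducing input
        return self.net(torch.cat([Z, eps], dim=-1))   # one sample of U_l: (num_inducing, dim_out)

# Illustrative usage: 128 inducing inputs in a 5-D layer with 5-D inducing outputs.
layer_gen = TiedGeneratorLayer(dim_in=5, dim_out=5)
U_sample = layer_gen(torch.randn(128, 5))
```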
Experimental Results
• Evaluation metric: MLL (mean log-likelihood).
• Algorithms for comparison:
  • DSVI DGP: doubly stochastic variational inference DGP [Salimbeni and Deisenroth, 2017]
  • SGHMC DGP: stochastic gradient Hamiltonian Monte Carlo DGP [Havasi et al., 2018]
Experimental Results
• Synthetic experiment: learning a multi-modal posterior belief.
Experimental Results
• MLL on UCI benchmark regression and real-world regression datasets (figure: our IPVI DGP vs. SGHMC DGP and DSVI DGP).
• Our IPVI DGP generally performs the best.
Experimental Results
Mean test accuracy (%) for 3 classification datasets:

            MNIST            Fashion-MNIST     CIFAR-10
            SGP     DGP 4    SGP     DGP 4     SGP     DGP 4
DSVI        97.41   97.32    86.98   87.99     47.15   51.79
SGHMC       96.41   97.55    85.84   87.08     47.32   52.81
IPVI        97.02   97.80    87.29   88.90     48.07   53.27

Our IPVI DGP generally performs the best.
Experimental Results: Time Efficiency
Time incurred by training and sampling from a 4-layer DGP model for the Airline dataset:

                                       IPVI        SGHMC
Average training time (per iter.)      0.35 sec.   3.18 sec.
U generation (100 samples)             0.28 sec.   143.7 sec.

(Figure: MLL vs. total incurred time to train a 4-layer DGP model for the Airline dataset.)
IPVI is much faster than SGHMC in terms of both training and sampling.
Conclusion
• A novel IPVI DGP framework:
  • Can ideally recover an unbiased posterior belief.
  • Preserves time efficiency.
• Casts DGP inference as a two-player game and searches for a Nash equilibrium using BRD.
• Parameter-tying architecture:
  • Alleviates overfitting.
  • Speeds up training and prediction.
• More details in our paper:
  • Detailed architecture of the generator and discriminator.
  • Detailed analysis of our BRD algorithm.
  • More experimental results.