Deep Gaussian Processes with Importance-Weighted Variational Inference
Hugh Salimbeni, Vincent Dutordoir, James Hensman, Marc P. Deisenroth
Problem setting
• Bimodal density
• Changes with input
• Skewness (e.g. bus arrival times, confounding variables)
A possible approach
A neural network f_φ with a per-point latent variable w_n, concatenated with the inputs:
    y_n = N(f_φ([x_n, w_n]), σ²),    w_n ∼ N(0, 1)
[Plate diagram: x_n and w_n feed into f_φ, producing y_n; plate over the N data points. Plots: training data and test samples.]
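To make the generative model concrete, here is a minimal sketch (not the authors' code) of sampling y given x under this approach; the two-layer tanh MLP standing in for f_φ and all sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative MLP weights for f_phi (in practice phi is learned).
D_in, D_hidden = 1, 32
W1 = rng.standard_normal((D_in + 1, D_hidden))   # +1 input for w_n
b1 = np.zeros(D_hidden)
W2 = rng.standard_normal((D_hidden, 1))
sigma = 0.1                                      # observation noise

def f_phi(xw):
    """Deterministic network applied to the concatenated input [x_n, w_n]."""
    return np.tanh(xw @ W1 + b1) @ W2

def sample_y(x):
    """One draw from p(y | x): sample w_n ~ N(0, 1), then add Gaussian noise."""
    w = rng.standard_normal((x.shape[0], 1))     # per-point latent variable
    xw = np.concatenate([x, w], axis=1)          # concatenation with inputs
    return f_phi(xw) + sigma * rng.standard_normal((x.shape[0], 1))

x = np.linspace(-2, 2, 5).reshape(-1, 1)
print(sample_y(x).ravel())   # repeated calls trace out the density p(y | x)
```

Because w_n is resampled on every call, repeated draws at the same x trace out a full conditional density, which is how the model can capture bimodality and skewness.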
A possible approach: drawbacks
• Overfitting: only a small number of examples per input x_n
• Unreliable extrapolation: f_φ is a deterministic function
Another possible approach
Replace the neural network with a non-parametric prior (a Gaussian process):
    y_n = N(f([x_n, w_n]), σ²),    w_n ∼ N(0, 1),    f ∼ GP(µ, k)
• Better extrapolation
• But: underfitting
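A minimal sketch of one joint draw from this model, assuming a zero mean function and an RBF kernel for k; the lengthscale, noise level, and jitter are illustrative choices:

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(1)

def rbf(A, B, lengthscale=1.0):
    """RBF kernel k(a, b) = exp(-||a - b||^2 / (2 * lengthscale^2))."""
    return np.exp(-0.5 * cdist(A, B, "sqeuclidean") / lengthscale**2)

N = 50
x = np.linspace(-2, 2, N).reshape(-1, 1)
w = rng.standard_normal((N, 1))            # w_n ~ N(0, 1)
xw = np.concatenate([x, w], axis=1)        # GP input is the concatenation

K = rbf(xw, xw) + 1e-8 * np.eye(N)         # jitter for numerical stability
f = np.linalg.cholesky(K) @ rng.standard_normal(N)   # f(xw) ~ N(0, K)
y = f + 0.1 * rng.standard_normal(N)       # y_n = N(f([x_n, w_n]), sigma^2)
```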
Our model
A deep Gaussian process with the latent variable concatenated at the input:
    y_n = N(f(g([x_n, w_n])), σ²),    w_n ∼ N(0, 1)
    g ∼ GP(µ₂, k₂),    f ∼ GP(µ₁, k₁)
• Extrapolates gracefully
• Better data fit
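The same sampling sketch extends to the two-layer model: draw g at the concatenated inputs, then draw f at the resulting hidden values. Again the zero means, RBF kernels, and hyperparameters are illustrative assumptions, not values from the paper:

```python
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(2)

def rbf(A, B, lengthscale=1.0):
    """RBF kernel, as in the previous sketch."""
    return np.exp(-0.5 * cdist(A, B, "sqeuclidean") / lengthscale**2)

N = 50
jitter = 1e-8 * np.eye(N)
x = np.linspace(-2, 2, N).reshape(-1, 1)
w = rng.standard_normal((N, 1))                   # w_n ~ N(0, 1)
xw = np.concatenate([x, w], axis=1)

# Inner layer: g ~ GP(0, k2), evaluated jointly at [x_n, w_n].
g = np.linalg.cholesky(rbf(xw, xw) + jitter) @ rng.standard_normal(N)

# Outer layer: f ~ GP(0, k1), evaluated at the hidden values g([x_n, w_n]).
h = g.reshape(-1, 1)
f = np.linalg.cholesky(rbf(h, h) + jitter) @ rng.standard_normal(N)

y = f + 0.1 * rng.standard_normal(N)              # observation noise
```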
Contributions
• New architecture: latent variables enter by concatenation, not addition (contrasted in the sketch below)
• Importance-weighted variational inference, exploiting analytic results
• An extensive empirical comparison on all 41 UCI regression datasets
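To make the first bullet concrete, here is an illustrative contrast between two ways a per-point latent variable can enter a layer; the tiny stand-in layer is hypothetical, and the addition variant shown is one common reading of prior latent-variable DGPs:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 1))     # inputs x_n
w = rng.standard_normal((4, 1))     # latent variables w_n ~ N(0, 1)

def layer(z):
    """Hypothetical stand-in for the first GP layer g."""
    return np.tanh(z @ np.ones((z.shape[1], 3)))

h_concat = layer(np.concatenate([x, w], axis=1))  # this paper: g([x_n, w_n])
h_add = layer(x) + w                              # addition: g(x_n) + w_n
```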
A few details
    y_n = N(f(g([x_n, w_n])), σ²)
• w_n ∼ N(0, 1): importance weighting, with a Gaussian proposal
• f ∼ GP(µ₁, k₁), g ∼ GP(µ₂, k₂): variational inference, with a sparse GP posterior
Our approach exploits analytic results, leading to a tighter bound.
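A minimal sketch of the K-sample importance-weighted bound on log p(y_n | x_n) with a Gaussian proposal q(w) = N(µ, s²). Here a toy log-joint stands in for the GP layers (which the actual method handles with sparse variational inference and analytic results); all specifics are illustrative:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

rng = np.random.default_rng(3)

def log_joint(y, w):
    """Toy log p(y, w | x) = log N(y; f(w), 0.1^2) + log N(w; 0, 1)."""
    f = np.sin(3 * w)                      # stand-in for the DGP layers
    return norm.logpdf(y, f, 0.1) + norm.logpdf(w, 0.0, 1.0)

def iw_bound(y, mu, s, K=50):
    """One Monte Carlo estimate of E_q[log (1/K) sum_k p(y, w_k)/q(w_k)]."""
    w = mu + s * rng.standard_normal(K)    # K samples from the proposal q
    log_ratio = log_joint(y, w) - norm.logpdf(w, mu, s)
    return logsumexp(log_ratio) - np.log(K)

y = 0.5
print(iw_bound(y, 0.0, 1.0, K=1))    # K = 1 recovers the standard ELBO term
print(iw_bound(y, 0.0, 1.0, K=100))  # larger K gives a tighter bound
```

With K = 1 this is the usual single-sample ELBO estimator; increasing K tightens the bound, which is the sense in which importance-weighted VI improves on plain VI.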
Results
• Latent variables in the DGP are highly beneficial
• Sometimes depth is enough; sometimes latent variables are enough; some datasets need both
• Importance-weighted VI outperforms VI
Thanks for listening! Poster #218
• New architecture • Importance-weighted VI • 41 datasets