Robust, Deep Recurrent Gaussian Processes
Andreas Damianou, with César Lincoln Mattos, Zhenwen Dai, Neil Lawrence, Jeremy Forth, Guilherme Barreto
Royal Society, 06 June 2016
Challenge: Learn patterns from sequences
◮ Recurrent Gaussian Processes (RGP): a family of recurrent Bayesian nonparametric models (data efficient, uncertainty handling).
◮ Latent deep RGP: a deep RGP with latent states (simultaneous representation + dynamical learning).
◮ REcurrent VARiational Bayes (REVARB) framework (efficient inference + coherent propagation of uncertainty).
◮ Extension: RNN-based sequential recognition models (regularization + parameter reduction).
◮ Extension: robustness to outliers.
◮ Comparison with LSTMs, parametric and non-latent models.
NARX model

A standard NARX model considers an input vector $\mathbf{x}_i \in \mathbb{R}^D$ comprised of $L_y$ past observed outputs $y_i \in \mathbb{R}$ and $L_u$ past exogenous inputs $u_i \in \mathbb{R}$:

$$\mathbf{x}_i = [y_{i-1}, \cdots, y_{i-L_y}, u_{i-1}, \cdots, u_{i-L_u}]^\top,$$
$$y_i = f(\mathbf{x}_i) + \epsilon_i^{(y)}, \qquad \epsilon_i^{(y)} \sim \mathcal{N}(\epsilon_i^{(y)} \mid 0, \sigma_y^2).$$

State-space model:

$$\mathbf{x}_i = f(\mathbf{x}_{i-1}, \cdots, \mathbf{x}_{i-L_x}, u_{i-1}, \cdots, u_{i-L_u}) + \epsilon_i^{(x)} \quad \text{(transition)},$$
$$y_i = \mathbf{x}_i + \epsilon_i^{(y)} \quad \text{(emission)}.$$

Non-linear emission: $y_i = g(\mathbf{x}_i) + \epsilon_i^{(y)}$.
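To make the construction concrete, here is a minimal sketch (not the authors' code) of assembling NARX regressor vectors from scalar output and input series; the function name and shapes are illustrative:

```python
import numpy as np

def narx_inputs(y, u, L_y, L_u):
    """Stack lagged outputs and inputs into NARX regressor vectors."""
    start = max(L_y, L_u)               # first index with full lag windows
    X, targets = [], []
    for i in range(start, len(y)):
        past_y = y[i - L_y:i][::-1]     # y_{i-1}, ..., y_{i-L_y}
        past_u = u[i - L_u:i][::-1]     # u_{i-1}, ..., u_{i-L_u}
        X.append(np.concatenate([past_y, past_u]))
        targets.append(y[i])
    return np.array(X), np.array(targets)

# Any regressor f (e.g. a GP) can then be fit on (X, t):
y = np.sin(np.linspace(0, 10, 200))
u = np.cos(np.linspace(0, 10, 200))
X, t = narx_inputs(y, u, L_y=3, L_u=2)
print(X.shape, t.shape)                 # (197, 5) (197,)
```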
NARX vs State-space
◮ Latent inputs allow for simultaneous representation learning and dynamical learning.
◮ Latent inputs mean that noisy predictions are not fed back into the model.
(Deep) RGP

Start from a deep GP:

[Figure: graphical model with exogenous input u feeding hidden layers x^(1), x^(2), ..., x^(H) and output y.]
(Deep) RGP

Latent states formed from a lagged window of length L:

[Figure: the deep GP graphical model as before, with lagged latent windows.]

For one layer: $\bar{\mathbf{x}}_i = [x_i, \cdots, x_{i-L+1}]^\top$, $x_j \in \mathbb{R}$.
(Deep) RGP

Add recursion in the latent states:

[Figure: the deep GP graphical model as before, now with recurrent connections within the latent layers.]

For one layer: $\bar{\mathbf{x}}_i = [x_i, \cdots, x_{i-L+1}]^\top$, $x_j \in \mathbb{R}$, so that:

$$x_i = f(\bar{\mathbf{x}}_{i-1}, \bar{\mathbf{u}}_{i-1}) + \epsilon_i^{(x)},$$
$$y_i = g(\bar{\mathbf{x}}_i) + \epsilon_i^{(y)}.$$
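The recursion can be pictured with a short free-simulation loop for one latent layer. This sketch is purely illustrative: `f` and `g` below are stand-in maps, whereas a trained RGP would use the GP predictive distributions of each layer.

```python
import numpy as np

def free_simulation(u, L, f, g, sigma_x=0.01, sigma_y=0.01, seed=0):
    """Roll the recursion forward: x_i = f(xbar_{i-1}, ubar_{i-1}) + eps_x,
    y_i = g(xbar_i) + eps_y, with xbar_i the lagged latent window."""
    rng = np.random.default_rng(seed)
    x_bar = np.zeros(L)                      # initial lagged latent window
    ys = []
    for i in range(L, len(u)):
        u_bar = u[i - L:i][::-1]             # lagged exogenous inputs
        x_new = f(np.concatenate([x_bar, u_bar])) + sigma_x * rng.standard_normal()
        x_bar = np.concatenate([[x_new], x_bar[:-1]])  # shift window, newest first
        ys.append(g(x_bar) + sigma_y * rng.standard_normal())
    return np.array(ys)

# Stand-in transition/emission maps (a trained RGP would use GP predictions):
u = np.sin(np.linspace(0, 8, 100))
y_sim = free_simulation(u, L=2, f=lambda v: np.tanh(v).sum(), g=lambda v: v[0])
```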
REVARB: REcurrent VARiational Bayes

Extend the joint probability with inducing points $\mathbf{z}$:

$$p(\text{joint}) = p\left(\mathbf{y}, \mathbf{f}^{(H+1)}, \mathbf{z}^{(H+1)}, \left\{\mathbf{x}^{(h)}, \mathbf{f}^{(h)}, \mathbf{z}^{(h)}\right\}_{h=1}^{H}\right).$$

Lower bound:

$$\log p(\mathbf{y}) \geq \int_{\mathbf{f}, \mathbf{x}, \mathbf{z}} Q \log \frac{p(\text{joint})}{Q}, \qquad Q = \prod_{h} q\left(\mathbf{x}^{(h)}, \mathbf{f}^{(h)}, \mathbf{z}^{(h)}\right).$$

Posterior marginal:

$$q\left(\mathbf{x}^{(h)}\right) = \prod_{i=1}^{N} \mathcal{N}\left(x_i^{(h)} \mid \mu_i^{(h)}, \lambda_i^{(h)}\right).$$

Mean-field for $q(\mathbf{x})$ allows an analytical solution without having to resort to sampling. Additional layers compensate for the uncorrelated posterior.
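One standard ingredient behind such an analytic bound is that, for Gaussian $q(x_i)$ and an RBF kernel, expectations of the kernel under $q$ have closed form (the "psi statistics" of sparse variational GPs). A minimal sketch with illustrative names, assuming diagonal Gaussian marginals:

```python
import numpy as np

def psi1_rbf(mu, lam, Z, variance=1.0, lengthscale=1.0):
    """E_{N(x | mu, diag(lam))}[k_RBF(x, z_m)] for each inducing input z_m."""
    ell2 = lengthscale ** 2
    denom = lam + ell2                      # per-dimension lam_q + lengthscale^2
    diff2 = (mu[None, :] - Z) ** 2          # (M, Q) squared distances to the mean
    log_psi = -0.5 * np.sum(diff2 / denom + np.log(denom / ell2), axis=1)
    return variance * np.exp(log_psi)       # shape (M,)

mu = np.array([0.3, -0.1])                  # variational mean of one x_i
lam = np.array([0.05, 0.02])                # variational variances lambda_i
Z = np.random.default_rng(1).standard_normal((10, 2))  # inducing inputs
print(psi1_rbf(mu, lam, Z))
```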
RNN-based recognition model

Reduce the number of variational parameters by reparameterizing the variational means $\mu_i^{(h)}$ using RNNs:

$$\mu_i^{(h)} = g^{(h)}\left(\hat{\mathbf{x}}_{i-1}^{(h)}\right),$$

where

$$g(\mathbf{x}) = \mathbf{V}_{L_N}^\top \phi_{L_N}\left(\mathbf{W}_{L_N - 1} \phi_{L_N - 1}\left(\cdots \mathbf{W}_2 \phi_1\left(\mathbf{U}_1 \mathbf{x}\right)\right)\right).$$

Amortized inference also regularizes the optimization procedure.
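A minimal sketch of this reparameterization: one shared map produces every mean $\mu_i^{(h)}$ from the previous lagged state, so the number of variational parameters no longer grows with the sequence length. The weight names, sizes and two-layer structure below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
L, H1, H2 = 2, 8, 8                        # lag window and hidden widths (illustrative)
U1 = 0.1 * rng.standard_normal((H1, L))    # input-to-hidden weights
W2 = 0.1 * rng.standard_normal((H2, H1))   # hidden-to-hidden weights
V = 0.1 * rng.standard_normal(H2)          # hidden-to-output weights

def recognition_mu(x_bar_prev):
    """mu_i = V^T phi(W2 phi(U1 xbar_{i-1})): a two-layer instance of g."""
    h1 = np.tanh(U1 @ x_bar_prev)
    h2 = np.tanh(W2 @ h1)
    return V @ h2

# One shared map replaces N free variational means per layer:
mu_i = recognition_mu(np.array([0.5, -0.2]))
```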
Robustness to outliers

Recall the RGP variant with parametric emission:

$$\mathbf{x}_i = f(\mathbf{x}_{i-1}, \cdots, \mathbf{x}_{i-L_x}, u_{i-1}, \cdots, u_{i-L_u}) + \epsilon_i^{(x)},$$
$$y_i = \mathbf{x}_i + \epsilon_i^{(y)},$$
$$\epsilon_i^{(x)} \sim \mathcal{N}(\epsilon_i^{(x)} \mid 0, \sigma_x^2),$$
$$\epsilon_i^{(y)} \sim \mathcal{N}(\epsilon_i^{(y)} \mid 0, \tau_i^{-1}), \qquad \tau_i \sim \Gamma(\tau_i \mid \alpha, \beta).$$

◮ "Switching off" outliers by including the above (marginally Student-t) likelihood.
◮ Modified REVARB allows for an analytic solution.
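To see why the Gamma prior on the per-point precisions yields a Student-t likelihood, the sketch below samples the scale mixture (hyperparameter values are illustrative): integrating $\tau_i$ out of $\mathcal{N}(0, \tau_i^{-1})$ with $\tau_i \sim \Gamma(\alpha, \beta)$ gives a Student-t with $2\alpha$ degrees of freedom, whose heavy tails absorb outliers.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, n = 3.0, 3.0, 100_000
tau = rng.gamma(shape=alpha, scale=1.0 / beta, size=n)  # tau_i ~ Gamma(alpha, rate beta)
eps = rng.normal(0.0, 1.0 / np.sqrt(tau))               # eps_i | tau_i ~ N(0, 1/tau_i)

# Marginally, eps_i is Student-t with 2*alpha degrees of freedom: points that
# draw a small tau_i get a wide Gaussian, i.e. outliers are "switched off".
print(np.mean(eps ** 4) / np.mean(eps ** 2) ** 2)       # kurtosis > 3: heavier tails than Gaussian
```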
Robust GP autoregressive model: demonstration

Figure: RMSE values for free simulation on test data with different levels of contamination by outliers (panels: Artificial 1, Artificial 2, Artificial 3, Artificial 4, Artificial 5).
Results

Results in nonlinear system identification:

1. artificial datasets
2. "drive" dataset: generated by a system with two electric motors that drive a pulley using a flexible belt.
   ◮ input: the sum of the voltages applied to the motors
   ◮ output: the speed of the belt
[Figure: free simulation results comparing RGP, GPNARX, MLP-NARX and LSTM.]
Avatar control

Figure: The generated motion with a step function control signal, starting with walking (blue), switching to running (red) and switching back to walking (blue).

Videos:
◮ https://youtu.be/FR-oeGxV6yY (switching between learned speeds)
◮ https://youtu.be/AT0HMtoPgjc (interpolating (un)seen speed)
◮ https://youtu.be/FuF-uZ83VMw (constant unseen speed)