

1. Robust, Deep Recurrent Gaussian Processes. Andreas Damianou, with César Lincoln Mattos, Zhenwen Dai, Neil Lawrence, Jeremy Forth, Guilherme Barreto. Royal Society, 06 June 2016.

2. Challenge: learn patterns from sequences
- Recurrent Gaussian Processes (RGP): a family of recurrent Bayesian non-parametric models (data efficient, handle uncertainty).
- Latent deep RGP: a deep RGP with latent states (simultaneous representation and dynamical learning).
- REcurrent VARiational Bayes (REVARB) framework (efficient inference, coherent propagation of uncertainty).
- Extension: RNN-based sequential recognition models (regularization and parameter reduction).
- Extension: robustness to outliers.
- Comparison with LSTMs, parametric and non-latent models.

3. NARX model. A standard NARX model considers an input vector $x_i \in \mathbb{R}^D$ comprising $L_y$ past observed outputs $y_i \in \mathbb{R}$ and $L_u$ past exogenous inputs $u_i \in \mathbb{R}$:
$x_i = [y_{i-1}, \cdots, y_{i-L_y}, u_{i-1}, \cdots, u_{i-L_u}]^\top$,
$y_i = f(x_i) + \epsilon_i^{(y)}$, with $\epsilon_i^{(y)} \sim \mathcal{N}(\epsilon_i^{(y)} \mid 0, \sigma_y^2)$.
State-space model:
$x_i = f(x_{i-1}, \cdots, x_{i-L_x}, u_{i-1}, \cdots, u_{i-L_u}) + \epsilon_i^{(x)}$ (transition),
$y_i = x_i + \epsilon_i^{(y)}$ (emission).
Non-linear emission: $y_i = g(x_i) + \epsilon_i^{(y)}$.
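A minimal GP-NARX sketch of the construction above, not the authors' implementation: the lagged regressor $x_i$ is assembled from past outputs and exogenous inputs and a GP is fitted from $x_i$ to $y_i$. The lag orders, the toy system and the use of scikit-learn's GP regressor are all assumptions made for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def narx_inputs(y, u, Ly, Lu):
    """Stack lagged outputs and exogenous inputs into NARX regressors."""
    L = max(Ly, Lu)
    X = [np.r_[y[i-Ly:i][::-1], u[i-Lu:i][::-1]] for i in range(L, len(y))]
    return np.array(X), y[L:]

# Toy data: a noisy nonlinear system driven by a random input signal (assumed).
rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, 300)
y = np.zeros(300)
for i in range(2, 300):
    y[i] = 0.7 * np.tanh(y[i-1]) - 0.3 * y[i-2] + 0.5 * u[i-1] + 0.05 * rng.standard_normal()

X, t = narx_inputs(y, u, Ly=2, Lu=2)
gp = GaussianProcessRegressor(RBF() + WhiteKernel(), normalize_y=True).fit(X, t)
mean, std = gp.predict(X[-5:], return_std=True)  # one-step-ahead predictions with uncertainty
```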


5. NARX vs state-space
- Latent inputs allow for simultaneous representation learning and dynamical learning.
- Latent inputs mean that noisy predictions are not fed back to the model.

6. (Deep) RGP. Start from a deep GP: the output $y$ and the input $u$ are connected through hidden layers $x^{(1)}, x^{(2)}, \cdots, x^{(H)}$.

7. (Deep) RGP. Latent states are formed from a lagged window of length $L$. For one layer: $\bar{x}_i = [x_i, \cdots, x_{i-L+1}]^\top$, with $x_j \in \mathbb{R}$.

8. (Deep) RGP. Add recursion in the latent states. For one layer, with $\bar{x}_i = [x_i, \cdots, x_{i-L+1}]^\top$, $x_j \in \mathbb{R}$:
$x_i = f(\bar{x}_{i-1}, \bar{u}_{i-1}) + \epsilon_i^{x}$,
$y_i = g(\bar{x}_i) + \epsilon_i^{y}$.
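A sketch of this recurrence for one hidden layer, with stand-in nonlinearities rather than the learned GP posteriors: the latent window $\bar{x}_i$ is rolled forward by the transition $f$ and read out by the emission $g$. The window length, noise scales and the placeholder functions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 3                                                     # lag window length (assumed)
f = lambda xbar, ubar: np.tanh(xbar.sum() + ubar.sum())   # placeholder transition function
g = lambda xbar: np.sin(xbar[0])                          # placeholder emission function

u = rng.uniform(-1, 1, (100, L))      # lagged exogenous inputs ubar_{i-1}
xbar = np.zeros(L)                    # initial latent window
ys = []
for i in range(100):
    x_new = f(xbar, u[i]) + 0.01 * rng.standard_normal()  # x_i = f(xbar_{i-1}, ubar_{i-1}) + eps_x
    xbar = np.r_[x_new, xbar[:-1]]                        # shift window: [x_i, ..., x_{i-L+1}]
    ys.append(g(xbar) + 0.01 * rng.standard_normal())     # y_i = g(xbar_i) + eps_y
```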

9. REVARB: REcurrent VARiational Bayes. Extend the joint probability with inducing points:
$p(\text{joint}) = p\big(y, f^{(H+1)}, z^{(H+1)}, \{x^{(h)}, f^{(h)}, z^{(h)}\}_{h=1}^{H}\big)$.
Lower bound: $\log p(y) \geq \int_{f,x,z} Q \log \frac{p(\text{joint})}{Q}$, with $Q = q(x^{(h)}, f^{(h)}, z^{(h)}),\ \forall h$.
Posterior marginal: $q(x^{(h)}) = \prod_{i=1}^{N} \mathcal{N}\big(x_i^{(h)} \mid \mu_i^{(h)}, \lambda_i^{(h)}\big)$.
Mean-field for $q(x)$ allows an analytical solution without having to resort to sampling. Additional layers compensate for the uncorrelated posterior.
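A minimal sketch of the mean-field parameterization only, not the full REVARB bound: each latent state $x_i^{(h)}$ gets its own variational mean and variance, so expectations under $q$ factorize across time steps and layers and terms such as the entropy of $q(x)$ can be evaluated in closed form. The sequence length, number of layers and initial values are assumptions.

```python
import numpy as np

N, H = 200, 2                              # sequence length and number of hidden layers (assumed)
mu  = [np.zeros(N) for _ in range(H)]      # variational means  mu_i^(h), one per latent state
lam = [np.ones(N)  for _ in range(H)]      # variational variances lam_i^(h)

# Example analytic term entering the lower bound: the entropy of the
# fully factorized Gaussian q(x) = prod_{h,i} N(x_i^(h) | mu_i^(h), lam_i^(h)).
entropy = sum(0.5 * np.sum(np.log(2 * np.pi * np.e * lam_h)) for lam_h in lam)
```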




13. RNN-based recognition model. Reduce the number of variational parameters by reparameterizing the variational means $\mu_i^{(h)}$ using RNNs:
$\mu_i^{(h)} = g^{(h)}\big(\hat{x}_{i-1}^{(h)}\big)$,
where $g(x) = V_{L_N}^\top \phi_{L_N}\big(W_{L_N-1} \phi_{L_N-1}(\cdots W_2 \phi_1(U_1 x))\big)$.
Amortized inference also regularizes the optimization procedure.
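A sketch of this reparameterization with a small two-layer instance of the map above; the weight sizes, random initial weights and window length are assumptions, and in practice the weights would be optimized jointly with the bound. Instead of $N$ free means per layer, the variational means are produced by a shared recurrent map, which reduces the parameter count to the network weights.

```python
import numpy as np

rng = np.random.default_rng(2)
L, hidden = 3, 16
U1 = rng.standard_normal((hidden, L)) * 0.1        # input-to-hidden weights (assumed size)
W2 = rng.standard_normal((hidden, hidden)) * 0.1   # hidden-to-hidden weights
V  = rng.standard_normal(hidden) * 0.1             # readout weights
phi = np.tanh

def g(x_window):
    """mu_i = V^T phi(W2 phi(U1 x)): a two-layer instance of the slide's map."""
    return V @ phi(W2 @ phi(U1 @ x_window))

x_hat = np.zeros(L)                     # estimated previous state window x_hat_{i-1}
mus = []
for i in range(100):
    mu_i = g(x_hat)                     # mu_i^(h) = g^(h)(x_hat_{i-1}^(h))
    x_hat = np.r_[mu_i, x_hat[:-1]]     # feed the estimate back recurrently
    mus.append(mu_i)
```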


15. Robustness to outliers. Recall the RGP variant with parametric emission:
$x_i = f(x_{i-1}, \cdots, x_{i-L_x}, u_{i-1}, \cdots, u_{i-L_u}) + \epsilon_i^{(x)}$,
$y_i = x_i + \epsilon_i^{(y)}$,
$\epsilon_i^{(x)} \sim \mathcal{N}(\epsilon_i^{(x)} \mid 0, \sigma_x^2)$,
$\epsilon_i^{(y)} \sim \mathcal{N}(\epsilon_i^{(y)} \mid 0, \tau_i^{-1})$, with $\tau_i \sim \Gamma(\tau_i \mid \alpha, \beta)$.
- "Switching off" outliers by including the Student-t likelihood induced above.
- A modified REVARB allows for an analytic solution.
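A sketch of why this noise model is robust: the per-point precisions $\tau_i \sim \Gamma(\alpha, \beta)$ with Gaussian emission noise $\mathcal{N}(0, \tau_i^{-1})$ integrate to a Student-t likelihood, so points that draw a very small $\tau_i$ (outliers) contribute little to the fit. The values of $\alpha$, $\beta$ and the latent signal below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, beta, N = 2.0, 2.0, 1000               # Gamma hyperparameters (assumed)
x = np.sin(0.05 * np.arange(N))               # some latent signal
tau = rng.gamma(alpha, 1.0 / beta, N)         # per-point precision tau_i ~ Gamma(alpha, beta)
y = x + rng.standard_normal(N) / np.sqrt(tau) # heavy-tailed observations y_i = x_i + eps_i^(y)

# Marginalizing tau_i, y_i - x_i follows a Student-t distribution with
# 2*alpha degrees of freedom and scale sqrt(beta / alpha).
```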


17. Robust GP autoregressive model: demonstration. Figure: RMSE values for free simulation on test data with different levels of contamination by outliers. Panels: (a) Artificial 1, (b) Artificial 2.

18. [Figure, continued] Panels: (c) Artificial 3, (d) Artificial 4, (e) Artificial 5.

19. Results. Results on nonlinear system identification:
1. artificial datasets;
2. the "drive" dataset, generated by a system with two electric motors that drive a pulley using a flexible belt:
- input: the sum of the voltages applied to the motors;
- output: the speed of the belt.
A sketch of the free-simulation evaluation used in these comparisons follows below.
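The sketch below illustrates free simulation for an autoregressive model (the model, lag orders and data are placeholders): after seeding the lag window with true test outputs, the model's own predictions are fed back as regressors, and RMSE is computed against the held-out outputs.

```python
import numpy as np

def free_simulation(predict, y_test, u_test, Ly, Lu):
    """Roll the model forward on its own predictions and return the RMSE."""
    start = max(Ly, Lu)
    y_sim = list(y_test[:start])                       # seed the window with true outputs
    for i in range(start, len(y_test)):
        x_i = np.r_[y_sim[i-Ly:i][::-1], u_test[i-Lu:i][::-1]]
        y_sim.append(float(predict(x_i)))              # feed the prediction back, not y_test
    y_sim = np.array(y_sim)
    return np.sqrt(np.mean((y_sim[start:] - y_test[start:]) ** 2))
```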

20. [Figure: free-simulation predictions comparing RGP, GPNARX, MLP-NARX and LSTM.]

21. [Figure: further free-simulation comparison of RGP, GPNARX, MLP-NARX and LSTM.]

22. Avatar control. Figure: the generated motion with a step-function control signal, starting with walking (blue), switching to running (red) and switching back to walking (blue).
Videos:
- https://youtu.be/FR-oeGxV6yY (switching between learned speeds)
- https://youtu.be/AT0HMtoPgjc (interpolating (un)seen speed)
- https://youtu.be/FuF-uZ83VMw (constant unseen speed)
