Robust, Deep Recurrent Gaussian Processes
Andreas Damianou, with César Lincoln Mattos, Zhenwen Dai, Neil Lawrence, Jeremy Forth, Guilherme Barreto
Royal Society, 06 June 2016
Challenge: Learn patterns from sequences
◮ Recurrent Gaussian Processes (RGP): a family of recurrent Bayesian nonparametric models (data efficient, uncertainty handling).
◮ Latent deep RGP: a deep RGP with latent states (simultaneous representation + dynamical learning).
◮ REcurrent VARiational Bayes (REVARB) framework (efficient inference + coherent propagation of uncertainty).
◮ Extension: RNN-based sequential recognition models (regularization + parameter reduction).
◮ Extension: robustness to outliers.
◮ Comparison with LSTMs, parametric and non-latent models.
NARX model

A standard NARX model considers an input vector $\mathbf{x}_i \in \mathbb{R}^D$ comprised of $L_y$ past observed outputs $y_i \in \mathbb{R}$ and $L_u$ past exogenous inputs $u_i \in \mathbb{R}$:

$$\mathbf{x}_i = [y_{i-1}, \cdots, y_{i-L_y}, u_{i-1}, \cdots, u_{i-L_u}]^\top,$$
$$y_i = f(\mathbf{x}_i) + \epsilon_i^{(y)}, \qquad \epsilon_i^{(y)} \sim \mathcal{N}(\epsilon_i^{(y)} \mid 0, \sigma_y^2).$$

State-space model:

$$\mathbf{x}_i = f(\mathbf{x}_{i-1}, \cdots, \mathbf{x}_{i-L_x}, u_{i-1}, \cdots, u_{i-L_u}) + \epsilon_i^{(x)} \quad \text{(transition)},$$
$$y_i = \mathbf{x}_i + \epsilon_i^{(y)} \quad \text{(emission)}.$$

Non-linear emission: $y_i = g(\mathbf{x}_i) + \epsilon_i^{(y)}$.
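To make the construction concrete, here is a minimal sketch (not the authors' code) of assembling NARX regressor vectors from scalar output and input series; the function name and shapes are illustrative:

```python
import numpy as np

def narx_inputs(y, u, L_y, L_u):
    """Stack lagged outputs and inputs into NARX regressor vectors."""
    start = max(L_y, L_u)               # first index with full lag windows
    X, targets = [], []
    for i in range(start, len(y)):
        past_y = y[i - L_y:i][::-1]     # y_{i-1}, ..., y_{i-L_y}
        past_u = u[i - L_u:i][::-1]     # u_{i-1}, ..., u_{i-L_u}
        X.append(np.concatenate([past_y, past_u]))
        targets.append(y[i])
    return np.array(X), np.array(targets)

# Any regressor f (e.g. a GP) can then be fit on (X, t):
y = np.sin(np.linspace(0, 10, 200))
u = np.cos(np.linspace(0, 10, 200))
X, t = narx_inputs(y, u, L_y=3, L_u=2)
print(X.shape, t.shape)                 # (197, 5) (197,)
```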
NARX vs State-space
◮ Latent inputs allow for simultaneous representation learning and dynamical learning.
◮ Latent inputs mean that noisy predictions are not fed back into the model.
(Deep) RGP

Start from a deep GP:

[Figure: graphical model with exogenous input u feeding hidden layers x^(1), x^(2), ..., x^(H) and output y.]
(Deep) RGP

Latent states formed from a lagged window of length L:

[Figure: the deep GP graphical model as before, with lagged latent windows.]

For one layer: $\bar{\mathbf{x}}_i = [x_i, \cdots, x_{i-L+1}]^\top$, $x_j \in \mathbb{R}$.
(Deep) RGP

Add recursion in the latent states:

[Figure: the deep GP graphical model as before, now with recurrent connections within the latent layers.]

For one layer: $\bar{\mathbf{x}}_i = [x_i, \cdots, x_{i-L+1}]^\top$, $x_j \in \mathbb{R}$, so that:

$$x_i = f(\bar{\mathbf{x}}_{i-1}, \bar{\mathbf{u}}_{i-1}) + \epsilon_i^{(x)},$$
$$y_i = g(\bar{\mathbf{x}}_i) + \epsilon_i^{(y)}.$$
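The recursion can be pictured with a short free-simulation loop for one latent layer. This sketch is purely illustrative: `f` and `g` below are stand-in maps, whereas a trained RGP would use the GP predictive distributions of each layer.

```python
import numpy as np

def free_simulation(u, L, f, g, sigma_x=0.01, sigma_y=0.01, seed=0):
    """Roll the recursion forward: x_i = f(xbar_{i-1}, ubar_{i-1}) + eps_x,
    y_i = g(xbar_i) + eps_y, with xbar_i the lagged latent window."""
    rng = np.random.default_rng(seed)
    x_bar = np.zeros(L)                      # initial lagged latent window
    ys = []
    for i in range(L, len(u)):
        u_bar = u[i - L:i][::-1]             # lagged exogenous inputs
        x_new = f(np.concatenate([x_bar, u_bar])) + sigma_x * rng.standard_normal()
        x_bar = np.concatenate([[x_new], x_bar[:-1]])  # shift window, newest first
        ys.append(g(x_bar) + sigma_y * rng.standard_normal())
    return np.array(ys)

# Stand-in transition/emission maps (a trained RGP would use GP predictions):
u = np.sin(np.linspace(0, 8, 100))
y_sim = free_simulation(u, L=2, f=lambda v: np.tanh(v).sum(), g=lambda v: v[0])
```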
REVARB: REcurrent VARiational Bayes

Extend the joint probability with inducing points $\mathbf{z}$:

$$p(\text{joint}) = p\left(\mathbf{y}, \mathbf{f}^{(H+1)}, \mathbf{z}^{(H+1)}, \left\{\mathbf{x}^{(h)}, \mathbf{f}^{(h)}, \mathbf{z}^{(h)}\right\}_{h=1}^{H}\right).$$

Lower bound:

$$\log p(\mathbf{y}) \geq \int_{\mathbf{f}, \mathbf{x}, \mathbf{z}} Q \log \frac{p(\text{joint})}{Q}, \qquad Q = \prod_{h} q\left(\mathbf{x}^{(h)}, \mathbf{f}^{(h)}, \mathbf{z}^{(h)}\right).$$

Posterior marginal:

$$q\left(\mathbf{x}^{(h)}\right) = \prod_{i=1}^{N} \mathcal{N}\left(x_i^{(h)} \mid \mu_i^{(h)}, \lambda_i^{(h)}\right).$$

Mean-field for $q(\mathbf{x})$ allows an analytical solution without having to resort to sampling. Additional layers compensate for the uncorrelated posterior.
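One standard ingredient behind such an analytic bound is that, for Gaussian $q(x_i)$ and an RBF kernel, expectations of the kernel under $q$ have closed form (the "psi statistics" of sparse variational GPs). A minimal sketch with illustrative names, assuming diagonal Gaussian marginals:

```python
import numpy as np

def psi1_rbf(mu, lam, Z, variance=1.0, lengthscale=1.0):
    """E_{N(x | mu, diag(lam))}[k_RBF(x, z_m)] for each inducing input z_m."""
    ell2 = lengthscale ** 2
    denom = lam + ell2                      # per-dimension lam_q + lengthscale^2
    diff2 = (mu[None, :] - Z) ** 2          # (M, Q) squared distances to the mean
    log_psi = -0.5 * np.sum(diff2 / denom + np.log(denom / ell2), axis=1)
    return variance * np.exp(log_psi)       # shape (M,)

mu = np.array([0.3, -0.1])                  # variational mean of one x_i
lam = np.array([0.05, 0.02])                # variational variances lambda_i
Z = np.random.default_rng(1).standard_normal((10, 2))  # inducing inputs
print(psi1_rbf(mu, lam, Z))
```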
RNN-based recognition model

Reduce the number of variational parameters by reparameterizing the variational means $\mu_i^{(h)}$ using RNNs:

$$\mu_i^{(h)} = g^{(h)}\left(\hat{\mathbf{x}}_{i-1}^{(h)}\right),$$

where

$$g(\mathbf{x}) = \mathbf{V}_{L_N}^\top \phi_{L_N}\left(\mathbf{W}_{L_N - 1} \phi_{L_N - 1}\left(\cdots \mathbf{W}_2 \phi_1\left(\mathbf{U}_1 \mathbf{x}\right)\right)\right).$$

Amortized inference also regularizes the optimization procedure.
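A minimal sketch of this reparameterization: one shared map produces every mean $\mu_i^{(h)}$ from the previous lagged state, so the number of variational parameters no longer grows with the sequence length. The weight names, sizes and two-layer structure below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
L, H1, H2 = 2, 8, 8                        # lag window and hidden widths (illustrative)
U1 = 0.1 * rng.standard_normal((H1, L))    # input-to-hidden weights
W2 = 0.1 * rng.standard_normal((H2, H1))   # hidden-to-hidden weights
V = 0.1 * rng.standard_normal(H2)          # hidden-to-output weights

def recognition_mu(x_bar_prev):
    """mu_i = V^T phi(W2 phi(U1 xbar_{i-1})): a two-layer instance of g."""
    h1 = np.tanh(U1 @ x_bar_prev)
    h2 = np.tanh(W2 @ h1)
    return V @ h2

# One shared map replaces N free variational means per layer:
mu_i = recognition_mu(np.array([0.5, -0.2]))
```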
Robustness to outliers

Recall the RGP variant with parametric emission:

$$\mathbf{x}_i = f(\mathbf{x}_{i-1}, \cdots, \mathbf{x}_{i-L_x}, u_{i-1}, \cdots, u_{i-L_u}) + \epsilon_i^{(x)},$$
$$y_i = \mathbf{x}_i + \epsilon_i^{(y)},$$
$$\epsilon_i^{(x)} \sim \mathcal{N}(\epsilon_i^{(x)} \mid 0, \sigma_x^2),$$
$$\epsilon_i^{(y)} \sim \mathcal{N}(\epsilon_i^{(y)} \mid 0, \tau_i^{-1}), \qquad \tau_i \sim \Gamma(\tau_i \mid \alpha, \beta).$$

◮ "Switching off" outliers by including the above (marginally Student-t) likelihood.
◮ Modified REVARB allows for an analytic solution.
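To see why the Gamma prior on the per-point precisions yields a Student-t likelihood, the sketch below samples the scale mixture (hyperparameter values are illustrative): integrating $\tau_i$ out of $\mathcal{N}(0, \tau_i^{-1})$ with $\tau_i \sim \Gamma(\alpha, \beta)$ gives a Student-t with $2\alpha$ degrees of freedom, whose heavy tails absorb outliers.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, n = 3.0, 3.0, 100_000
tau = rng.gamma(shape=alpha, scale=1.0 / beta, size=n)  # tau_i ~ Gamma(alpha, rate beta)
eps = rng.normal(0.0, 1.0 / np.sqrt(tau))               # eps_i | tau_i ~ N(0, 1/tau_i)

# Marginally, eps_i is Student-t with 2*alpha degrees of freedom: points that
# draw a small tau_i get a wide Gaussian, i.e. outliers are "switched off".
print(np.mean(eps ** 4) / np.mean(eps ** 2) ** 2)       # kurtosis > 3: heavier tails than Gaussian
```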
Robust GP autoregressive model: demonstration

Figure: RMSE values for free simulation on test data with different levels of contamination by outliers (panels: Artificial 1, Artificial 2, Artificial 3, Artificial 4, Artificial 5).
Results

Results in nonlinear system identification:

1. artificial datasets
2. "drive" dataset: generated by a system with two electric motors that drive a pulley using a flexible belt.
   ◮ input: the sum of the voltages applied to the motors
   ◮ output: the speed of the belt
[Figure: free simulation results comparing RGP, GPNARX, MLP-NARX and LSTM.]
Avatar control

Figure: The generated motion with a step function control signal, starting with walking (blue), switching to running (red) and switching back to walking (blue).

Videos:
◮ https://youtu.be/FR-oeGxV6yY (switching between learned speeds)
◮ https://youtu.be/AT0HMtoPgjc (interpolating (un)seen speed)
◮ https://youtu.be/FuF-uZ83VMw (constant unseen speed)