State Space Expectation Propagation: Efficient Inference Schemes for Temporal Gaussian Processes
William Wilkinson*, Paul Chang*, Michael Riis Andersen†, Arno Solin*
Aalto University*, Technical University of Denmark†
ICML 2020
Motivation
• We're interested in long temporal and spatio-temporal data with interesting non-conjugate GP models (e.g. classification, log-Gaussian Cox processes).
• Idea: we should treat the temporal dimension in a fundamentally different manner to the other dimensions.
State Space Expectation Propagation, Wilkinson et al., 1/10
Approximate Inference in Temporal GPs
There exists a dual kernel / SDE form for most popular Gaussian process (GP) models:

$f(t) \sim \mathcal{GP}(0, K_\theta(t, t')) \quad \Leftrightarrow \quad f_k = A_{\theta,k} f_{k-1} + q_k, \quad q_k \sim \mathrm{N}(0, Q_k)$
$y_k \sim p(y_k \mid f(t_k)) \quad \Leftrightarrow \quad y_k = h(f_k, \sigma_k), \quad \sigma_k \sim \mathrm{N}(0, \Sigma_k)$

This admits inference in $O(n)$ via Kalman filtering and smoothing.
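As a concrete sketch of this duality (our own minimal example, not the paper's implementation): the Matérn-1/2 (exponential) kernel $k(t,t') = \sigma^2 \exp(-|t-t'|/\ell)$ has an exact discrete-time state-space form with scalar state, and with a conjugate Gaussian likelihood the filter runs in $O(n)$:

```python
import numpy as np

def ou_discretise(dt, lengthscale, variance):
    """Exact discretisation of the Matern-1/2 (Ornstein-Uhlenbeck) SDE
    over a time step of length dt."""
    A = np.exp(-dt / lengthscale)          # transition coefficient
    Q = variance * (1.0 - A**2)            # process noise variance
    return A, Q

def kalman_filter(t, y, lengthscale=1.0, variance=1.0, noise=0.1):
    """O(n) filtering for a GP with exponential kernel and Gaussian likelihood."""
    m, P = 0.0, variance                   # stationary prior moments
    means, covs = [], []
    for k in range(len(t)):
        if k > 0:                          # predict step
            A, Q = ou_discretise(t[k] - t[k - 1], lengthscale, variance)
            m, P = A * m, A * P * A + Q
        S = P + noise                      # innovation variance
        K = P / S                          # Kalman gain
        m, P = m + K * (y[k] - m), P - K * P   # update step
        means.append(m)
        covs.append(P)
    return np.array(means), np.array(covs)
```

Higher-order Matérn kernels follow the same pattern with a small state vector and matrix-valued $A_{\theta,k}, Q_k$.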
Approximate Inference
Kalman filter update step:

$p(f_k \mid y_{1:k}) \propto \mathrm{N}(m_k^{\text{predict}}, P_k^{\text{predict}}) \, p(y_k \mid f(t_k))$
$\approx \mathrm{N}(m_k^{\text{predict}}, P_k^{\text{predict}}) \, \underbrace{\mathrm{N}(m_k^{\text{site}}, P_k^{\text{site}})}_{\text{``site''}}$

Approximate inference amounts to selecting the site parameters.

[Figure: filtered GP posterior $f(t)$ over $t \in [0, 300]$.]
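Once the site is chosen, the update is just a product of two Gaussians. A minimal scalar sketch (a hypothetical helper, assuming scalar moments): combining $\mathrm{N}(m^{\text{predict}}, P^{\text{predict}})$ with the site $\mathrm{N}(m^{\text{site}}, P^{\text{site}})$ in precision form gives the filtered posterior:

```python
def gaussian_product(m_pred, P_pred, m_site, P_site):
    """Combine predictive and site Gaussians in precision (natural) form:
    N(m_pred, P_pred) * N(m_site, P_site) is proportional to N(m_post, P_post)."""
    precision = 1.0 / P_pred + 1.0 / P_site    # precisions add
    P_post = 1.0 / precision
    m_post = P_post * (m_pred / P_pred + m_site / P_site)
    return m_post, P_post
```

For a Gaussian likelihood the site is exact and this reduces to the standard Kalman update; for non-conjugate likelihoods the schemes below differ only in how they pick $m^{\text{site}}, P^{\text{site}}$.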
Approximate Inference
Smoothing:
• Update the posterior with future observations: $p(f_k \mid y_{1:N}) = \mathrm{N}(m_k^{\text{post.}}, P_k^{\text{post.}})$

[Figure: smoothed GP posterior $f(t)$ over $t \in [0, 300]$.]
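The smoothing pass can be sketched as a scalar Rauch-Tung-Striebel backward recursion (an illustrative sketch, assuming the filtered and one-step-ahead predicted moments from a forward pass are available):

```python
import numpy as np

def rts_smoother(A, m_filt, P_filt, m_pred, P_pred):
    """Scalar RTS backward pass: update each filtered marginal with
    information from future observations. A[k] maps state k-1 to k;
    m_pred[k], P_pred[k] are the one-step-ahead predictive moments."""
    ms, Ps = m_filt.copy(), P_filt.copy()
    for k in range(len(m_filt) - 2, -1, -1):
        G = P_filt[k] * A[k + 1] / P_pred[k + 1]            # smoother gain
        ms[k] = m_filt[k] + G * (ms[k + 1] - m_pred[k + 1])
        Ps[k] = P_filt[k] + G**2 * (Ps[k + 1] - P_pred[k + 1])
    return ms, Ps
```

The last marginal is already the posterior, and each earlier marginal is corrected by how much smoothing changed its successor.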
Our Contribution: given the marginal posterior $\mathrm{N}(m_k^{\text{post.}}, P_k^{\text{post.}})$, we show how approximate inference amounts to a simple site parameter update rule during smoothing. This encompasses:
• Power Expectation Propagation
• Variational Inference (with natural gradients)
• Extended Kalman Smoothing
• Unscented / Gauss-Hermite Kalman Smoothing
• Posterior Linearisation
Parameter Update Rules (writing $\nabla L_k = \mathrm{d}L_k / \mathrm{d}m_k$)

Power Expectation Propagation:
$q^{\text{cavity}}(f_k) = q^{\text{post.}}(f_k) / q_{\text{site}}^{\alpha}(f_k)$
$L_k = \log \mathbb{E}_{q^{\text{cavity}}}\big[p^{\alpha}(y_k \mid f_k)\big]$
$P_k^{\text{site}} = -\alpha \big((\nabla^2 L_k)^{-1} + P_k^{\text{cavity}}\big)$
$m_k^{\text{site}} = m_k^{\text{cavity}} - (\nabla^2 L_k)^{-1} \nabla L_k$

Variational Inference (natural gradients):
$L_k = \mathbb{E}_{q^{\text{post.}}}\big[\log p(y_k \mid f_k)\big]$
$(P_k^{\text{site}})^{-1} = -\nabla^2 L_k$
$m_k^{\text{site}} = m_k^{\text{post.}} - (\nabla^2 L_k)^{-1} \nabla L_k$

Extended Kalman Smoother:
$v_k = y_k - h(m_k^{\text{post.}}, 0)$
$S_k = H_f P_k^{\text{post.}} H_f^{\top} + H_\sigma \Sigma_k H_\sigma^{\top}$
$(P_k^{\text{site}})^{-1} = H_f^{\top} \big(H_\sigma \Sigma_k H_\sigma^{\top}\big)^{-1} H_f$
$m_k^{\text{site}} = m_k^{\text{post.}} + \big(P_k^{\text{site}} + P_k^{\text{post.}}\big) H_f^{\top} S_k^{-1} v_k$
for $H_f = \mathrm{d}h/\mathrm{d}f$ and $H_\sigma = \mathrm{d}h/\mathrm{d}\sigma$ evaluated at $(m_k^{\text{post.}}, 0)$, with $\sigma_k \sim \mathrm{N}(0, \Sigma_k)$.
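To make the VI rule concrete: for a Poisson likelihood with log link, $L_k = \mathbb{E}_{\mathrm{N}(m,P)}[y f - e^f - \log y!] = y m - e^{m + P/2} - \log y!$, so $\nabla L_k$ and $\nabla^2 L_k$ are available in closed form. A hedged sketch (our own worked example, not code from the paper):

```python
import numpy as np

def vi_site_update(m_post, P_post, y):
    """Natural-gradient VI site update for a Poisson likelihood with log link.
    With L = y*m - exp(m + P/2) - log(y!), the derivatives w.r.t. m are
    closed form, and the site follows the update rule:
        (P_site)^{-1} = -d2L/dm2,   m_site = m_post - (d2L/dm2)^{-1} dL/dm."""
    rate = np.exp(m_post + 0.5 * P_post)   # E[exp(f)] under N(m, P)
    grad = y - rate                        # dL/dm
    hess = -rate                           # d2L/dm2
    P_site = -1.0 / hess
    m_site = m_post - grad / hess
    return m_site, P_site
```

The PEP rule has the same structure but takes the derivatives under the cavity and tempers the likelihood by $\alpha$, which generally requires quadrature rather than a closed form.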
A Unifying Perspective
• For sequential data, the EKF / UKF / GHKF are equivalent to single-sweep EP where the moment matching is solved via linearisation.
• The iterated Kalman smoothers (EKS / UKS / GHKS) can also be recovered under certain parameter choices, but note that they optimise a different objective to EP (see the paper for details).
• We show that the natural gradient VI updates are surprisingly similar to the EP updates (when using a similar parametrisation).
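To illustrate moment matching via linearisation: for a scalar model $y = h(f) + \sigma$, $\sigma \sim \mathrm{N}(0, \Sigma)$, a first-order expansion of $h$ around the current mean yields a Gaussian site in closed form. A sketch under those assumptions (the function names and the choice $h = \exp$ are our own, purely illustrative):

```python
import numpy as np

def ekf_site(m_post, P_post, y, h, dh, noise_var):
    """EKF-style site construction for a scalar model y = h(f) + sigma,
    sigma ~ N(0, noise_var), by linearising h at the posterior mean."""
    Hf = dh(m_post)                        # Jacobian of h at m_post
    v = y - h(m_post)                      # innovation
    S = Hf * P_post * Hf + noise_var       # innovation variance
    P_site = noise_var / Hf**2             # (P_site)^{-1} = Hf^T noise^{-1} Hf
    m_site = m_post + (P_site + P_post) * Hf * v / S
    return m_site, P_site
```

When $h$ is linear this site is exact and a single sweep recovers the Kalman smoother; iterating the linearisation point is what distinguishes the iterated smoothers.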
New Algorithms
• We propose to mix the beneficial properties of EP with the efficiency of classical smoothers.