State space methods for temporal GPs

Arno Solin
Assistant Professor in Machine Learning
Department of Computer Science, Aalto University
Gaussian Process Summer School, September 11, 2019
@arnosolin, arno.solin.fi

Outline
◮ Motivation: Temporal models
◮ Recap: Three views into GPs
◮ State space models
◮ General likelihoods
◮ Spatio-temporal GPs
◮ Further extensions

Motivation: Temporal models
◮ One-dimensional problems (the data has a natural ordering)
◮ Spatio-temporal models (something developing over time)
◮ Long / unbounded data (sensor data streams, daily observations, etc.)

Three views into GPs
◮ Kernel (moment)
◮ Spectral (Fourier)
◮ State space (path)

Kernel (moment) representation
$f(t) \sim \mathcal{GP}(\mu(t), \kappa(t, t'))$ (GP prior)
$\mathbf{y} \mid \mathbf{f} \sim \prod_i p(y_i \mid f(t_i))$ (likelihood)
◮ Let's focus on the GP prior only.
◮ A temporal Gaussian process (GP) is a random function $f(t)$ such that the joint distribution of $f(t_1), \ldots, f(t_n)$ is always Gaussian.
◮ Mean and covariance functions have the form: $\mu(t) = \mathrm{E}[f(t)]$, $\kappa(t, t') = \mathrm{E}[(f(t) - \mu(t))(f(t') - \mu(t'))^\mathsf{T}]$.
◮ Convenient for model specification, but expanding the kernel to a covariance matrix can be problematic (the notorious $\mathcal{O}(n^3)$ scaling).

Spectral (Fourier) representation
◮ The Fourier transform of a function $f(t) : \mathbb{R} \to \mathbb{R}$ is $\mathcal{F}[f](\mathrm{i}\omega) = \int_{\mathbb{R}} f(t) \exp(-\mathrm{i}\omega t) \, \mathrm{d}t$
◮ For a stationary GP, the covariance function can be written in terms of the difference between two inputs: $\kappa(t, t') \triangleq \kappa(t - t')$
◮ Wiener–Khinchin: If $f(t)$ is a stationary Gaussian process with covariance function $\kappa(t)$, then its spectral density is $S(\omega) = \mathcal{F}[\kappa]$.
◮ Spectral representation of a GP in terms of the spectral density function: $S(\omega) = \mathrm{E}[\tilde{f}(\mathrm{i}\omega) \tilde{f}^\mathsf{T}(-\mathrm{i}\omega)]$

State space (path) representation [1/3]
◮ Path or state space representation as solution to a linear time-invariant (LTI) stochastic differential equation (SDE): $\mathrm{d}\mathbf{f} = F \mathbf{f} \, \mathrm{d}t + L \, \mathrm{d}\boldsymbol{\beta}$, where $\mathbf{f} = (f, \mathrm{d}f/\mathrm{d}t, \ldots)$ and $\boldsymbol{\beta}(t)$ is a vector of Wiener processes.
◮ Equivalently, but more informally: $\frac{\mathrm{d}\mathbf{f}(t)}{\mathrm{d}t} = F \mathbf{f}(t) + L \mathbf{w}(t)$, where $\mathbf{w}(t)$ is white noise.
◮ The model now consists of a drift matrix $F \in \mathbb{R}^{m \times m}$, a diffusion matrix $L \in \mathbb{R}^{m \times s}$, and the spectral density matrix of the white noise process $Q_c \in \mathbb{R}^{s \times s}$.
◮ The scalar-valued GP can be recovered by $f(t) = \mathbf{h}^\mathsf{T} \mathbf{f}(t)$.

State space (path) representation [2/3]
◮ The initial state is given by a stationary state $\mathbf{f}(0) \sim \mathrm{N}(\mathbf{0}, P_\infty)$ which fulfils $F P_\infty + P_\infty F^\mathsf{T} + L Q_c L^\mathsf{T} = 0$
◮ The covariance function at the stationary state can be recovered by
$\kappa(t, t') = \mathbf{h}^\mathsf{T} P_\infty \exp((t' - t) F)^\mathsf{T} \mathbf{h}$ for $t' \geq t$,
$\kappa(t, t') = \mathbf{h}^\mathsf{T} \exp((t - t') F) P_\infty \mathbf{h}$ for $t' < t$,
where $\exp(\cdot)$ denotes the matrix exponential function. In both cases the matrix exponential is evaluated at the non-negative lag $|t' - t|$.
◮ The spectral density function at the stationary state can be recovered by $S(\omega) = \mathbf{h}^\mathsf{T} (F + \mathrm{i}\omega I)^{-1} L Q_c L^\mathsf{T} (F - \mathrm{i}\omega I)^{-\mathsf{T}} \mathbf{h}$
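
The three recovery formulas on this slide can be checked numerically. The sketch below is an illustration under assumptions: it uses the Matérn ν = 3/2 model in its standard state space form (not given on this slide), with made-up values for `sigma2` and `ell`, and the helper names are hypothetical. It solves the Lyapunov equation with SciPy and compares the recovered covariance and spectral density against the closed-form Matérn expressions.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# Assumed example model (not from the slide): Matern nu=3/2 with
# magnitude sigma2 and length-scale ell in its standard state space form.
sigma2, ell = 1.5, 0.7
lam = np.sqrt(3.0) / ell
F = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])  # drift matrix
L = np.array([[0.0], [1.0]])                       # diffusion matrix
Qc = np.array([[4.0 * lam**3 * sigma2]])           # white noise spectral density
h = np.array([1.0, 0.0])                           # picks f(t) out of the state

# Stationary covariance: solve F P + P F^T + L Qc L^T = 0.
# SciPy solves A X + X A^H = Q, hence the minus sign.
P_inf = solve_continuous_lyapunov(F, -L @ Qc @ L.T)

def kernel_from_ss(t, t_prime):
    """Covariance function recovered from (F, P_inf, h)."""
    if t_prime >= t:
        return h @ P_inf @ expm((t_prime - t) * F).T @ h
    return h @ expm((t - t_prime) * F) @ P_inf @ h

def spectral_density_from_ss(omega):
    """S(w) = h^T (F + i w I)^{-1} L Qc L^T (F - i w I)^{-T} h."""
    eye = np.eye(F.shape[0])
    left = np.linalg.inv(F + 1j * omega * eye)
    right = np.linalg.inv(F - 1j * omega * eye).T
    return (h @ left @ L @ Qc @ L.T @ right @ h).real

def matern32(tau):
    """Closed-form Matern 3/2 covariance, used only as a reference."""
    a = lam * abs(tau)
    return sigma2 * (1.0 + a) * np.exp(-a)
```

For this model the Lyapunov solution should come out as diag(σ², λ²σ²), and `kernel_from_ss` should agree with `matern32` for both orderings of the two time points.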

State space (path) representation [3/3]
◮ Just as the kernel has to be evaluated into a covariance matrix for computations, the SDE can be solved for discrete time points $\{t_i\}_{i=1}^{n}$.
◮ The resulting model is a discrete state space model: $\mathbf{f}_i = A_{i-1} \mathbf{f}_{i-1} + \mathbf{q}_{i-1}$, $\mathbf{q}_i \sim \mathrm{N}(\mathbf{0}, Q_i)$, where $\mathbf{f}_i = \mathbf{f}(t_i)$.
◮ The discrete-time model matrices are given by: $A_i = \exp(F \, \Delta t_i)$, $Q_i = \int_0^{\Delta t_i} \exp(F (\Delta t_i - \tau)) \, L Q_c L^\mathsf{T} \exp(F (\Delta t_i - \tau))^\mathsf{T} \, \mathrm{d}\tau$, where $\Delta t_i = t_{i+1} - t_i$
◮ If the model is stationary, $Q_i$ is given by $Q_i = P_\infty - A_i P_\infty A_i^\mathsf{T}$
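
A minimal sketch of this discretization, assuming a stationary model and using a Matérn ν = 3/2 state space model with unit magnitude as a stand-in example (the model and the function names are illustrative assumptions, not taken from the slides). It computes the process noise covariance both from the stationary shortcut and by numerical quadrature of the integral, so the two expressions can be compared.

```python
import numpy as np
from scipy.integrate import quad_vec
from scipy.linalg import expm, solve_continuous_lyapunov

def discretize(F, L, Qc, dt):
    """Return A = exp(F dt), Q = P_inf - A P_inf A^T, and P_inf (stationary case)."""
    P_inf = solve_continuous_lyapunov(F, -L @ Qc @ L.T)
    A = expm(F * dt)
    Q = P_inf - A @ P_inf @ A.T
    return A, Q, P_inf

def process_noise_by_quadrature(F, L, Qc, dt):
    """Q as the integral over [0, dt] of exp(F(dt-s)) L Qc L^T exp(F(dt-s))^T ds."""
    def integrand(s):
        E = expm(F * (dt - s))
        return E @ L @ Qc @ L.T @ E.T
    Q, _ = quad_vec(integrand, 0.0, dt)
    return Q

# Assumed example: Matern nu=3/2, unit magnitude, lam = sqrt(3) / length-scale.
lam = np.sqrt(3.0)
F = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])
L = np.array([[0.0], [1.0]])
Qc = np.array([[4.0 * lam**3]])

A, Q, P_inf = discretize(F, L, Qc, dt=0.4)
Q_quad = process_noise_by_quadrature(F, L, Qc, dt=0.4)
```

The stationary shortcut avoids the integral entirely, which is why it is the form usually preferred when the time steps vary and `A` and `Q` must be recomputed for every step.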

Three views into GPs
[Figure, three panels: covariance function $\kappa(\tau)$ against $\tau = t - t'$; spectral density function $S(\omega)$ against $\omega$; sample functions, output $f(t)$ against input $t$.]

Example: Exponential covariance function
◮ Exponential covariance function (Ornstein-Uhlenbeck process): $\kappa(t, t') = \exp(-\lambda |t - t'|)$
◮ Spectral density function: $S(\omega) = \frac{2}{\lambda + \omega^2 / \lambda}$
◮ Path representation: Stochastic differential equation (SDE) $\frac{\mathrm{d}f(t)}{\mathrm{d}t} = -\lambda f(t) + w(t)$, or using the notation from before: $F = -\lambda$, $L = 1$, $Q_c = 2\lambda$, $h = 1$, and $P_\infty = 1$. (The value $Q_c = 2\lambda$ follows from the stationarity condition $F P_\infty + P_\infty F^\mathsf{T} + L Q_c L^\mathsf{T} = 0$ with $P_\infty = 1$; it equals 2 only when $\lambda = 1$.)
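
As a quick sanity check of this example, the sketch below plugs the scalar model into the general recovery formulas. It assumes Q_c = 2λ, the value the stationarity (Lyapunov) condition requires for P_∞ = 1; the value of `lam` is arbitrary and the function names are illustrative.

```python
import numpy as np

lam = 0.8                                  # arbitrary illustrative decay rate
F, L, Qc, h = -lam, 1.0, 2.0 * lam, 1.0    # assumes Qc = 2 * lam

# Scalar Lyapunov equation 2 F P + L Qc L = 0 solved for P.
P_inf = L * Qc * L / (-2.0 * F)

def kernel(tau):
    """h P_inf exp(|tau| F) h, the stationary covariance at lag tau."""
    return h * P_inf * np.exp(abs(tau) * F) * h

def spectral_density(omega):
    """h (F + i w)^{-1} L Qc L (F - i w)^{-1} h for the scalar model."""
    return (h * L * Qc * L * h / ((F + 1j * omega) * (F - 1j * omega))).real

def spectral_density_slide(omega):
    """The closed form quoted on the slide: 2 / (lam + w^2 / lam)."""
    return 2.0 / (lam + omega**2 / lam)
```

With these values `P_inf` evaluates to 1, and the two spectral density functions agree for any frequency, which is a useful consistency check when setting up a new model.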

Examples of applicable GP priors

Applicable GP priors
◮ The covariance function needs to be Markovian (or approximated as such).
◮ Covers many common stationary and non-stationary models.
◮ Sums of kernels: $\kappa(t, t') = \kappa_1(t, t') + \kappa_2(t, t')$
  • Stacking of the state spaces
  • State dimension: $m = m_1 + m_2$
◮ Product of kernels: $\kappa(t, t') = \kappa_1(t, t') \, \kappa_2(t, t')$
  • Kronecker sum of the models
  • State dimension: $m = m_1 m_2$
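
The sum and product constructions can be sketched as follows. The two component models (an exponential kernel and a Matérn ν = 3/2 kernel, both with unit magnitude) and all variable names are assumptions chosen for illustration. Only the quantities needed to evaluate the stationary covariance (F, P_∞, h) are combined here; the diffusion terms are left out of the sketch.

```python
import numpy as np
from scipy.linalg import block_diag, expm

# Component 1 (assumed): exponential / Ornstein-Uhlenbeck, state dimension 1.
lam1 = 0.5
F1, P1, h1 = np.array([[-lam1]]), np.array([[1.0]]), np.array([1.0])

# Component 2 (assumed): Matern nu=3/2 with unit magnitude, state dimension 2.
lam2 = 2.0
F2 = np.array([[0.0, 1.0], [-lam2**2, -2.0 * lam2]])
P2 = np.diag([1.0, lam2**2])
h2 = np.array([1.0, 0.0])

def kernel(F, P_inf, h, tau):
    """Stationary covariance h^T P_inf exp(tau F)^T h for a lag tau >= 0."""
    return h @ P_inf @ expm(tau * F).T @ h

# Sum of kernels: stack the state spaces (block-diagonal), m = m1 + m2.
F_sum = block_diag(F1, F2)
P_sum = block_diag(P1, P2)
h_sum = np.concatenate([h1, h2])

# Product of kernels: Kronecker sum of the drift matrices, m = m1 * m2.
F_prod = np.kron(F1, np.eye(2)) + np.kron(np.eye(1), F2)
P_prod = np.kron(P1, P2)
h_prod = np.kron(h1, h2)
```

The product case works because the matrix exponential of a Kronecker sum factorizes into the Kronecker product of the individual matrix exponentials, so the covariance of the combined model is the product of the component covariances.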


Example: GP regression, $\mathcal{O}(n^3)$
◮ Consider the GP regression problem with input–output training pairs $\{(t_i, y_i)\}_{i=1}^{n}$: $f(t) \sim \mathcal{GP}(0, \kappa(t, t'))$, $y_i = f(t_i) + \varepsilon_i$, $\varepsilon_i \sim \mathrm{N}(0, \sigma_n^2)$
◮ The posterior mean and variance for an unseen test input $t_*$ is given by (see previous lectures): $\mathrm{E}[f_*] = \mathbf{k}_* (K + \sigma_n^2 I)^{-1} \mathbf{y}$, $\mathrm{V}[f_*] = K_{**} - \mathbf{k}_* (K + \sigma_n^2 I)^{-1} \mathbf{k}_*^\mathsf{T}$
◮ Note the inversion of the $n \times n$ matrix.
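
For reference, the batch solution can be sketched in a few lines. The exponential kernel, the toy data, and the function names below are assumptions made for illustration; the Cholesky factorization of the n × n matrix is the step responsible for the cubic cost.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def exp_kernel(t, t_prime, lam=1.0):
    """Exponential covariance exp(-lam |t - t'|) between two 1-D input arrays."""
    return np.exp(-lam * np.abs(t[:, None] - t_prime[None, :]))

def gp_predict(t, y, t_star, kernel, sigma2_n):
    """Posterior mean and variance at t_star; O(n^3) in the number of data points."""
    K = kernel(t, t)
    k_star = kernel(t_star, t)
    k_ss = np.diag(kernel(t_star, t_star))
    # Cholesky factor of the n x n matrix (K + sigma_n^2 I): the cubic-cost step.
    chol = cho_factor(K + sigma2_n * np.eye(len(t)), lower=True)
    mean = k_star @ cho_solve(chol, y)
    var = k_ss - np.sum(k_star * cho_solve(chol, k_star.T).T, axis=1)
    return mean, var

# Toy data (made up for the example).
t = np.linspace(0.0, 5.0, 30)
y = np.sin(t)
t_star = np.array([2.5, 100.0])
mean, var = gp_predict(t, y, t_star, exp_kernel, sigma2_n=0.1)
```

The second test input lies far from all data, so its prediction should fall back to the prior (mean close to 0, variance close to 1).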


Example: GP regression, $\mathcal{O}(n)$
◮ The sequential solution (goes under the name 'Kalman filter') considers one data point at a time, hence the linear time-scaling.
◮ Start from $\mathbf{m}_0 = \mathbf{0}$ and $P_0 = P_\infty$ and for each data point iterate the following steps.
◮ Kalman prediction: $\mathbf{m}_{i|i-1} = A_{i-1} \mathbf{m}_{i-1|i-1}$, $P_{i|i-1} = A_{i-1} P_{i-1|i-1} A_{i-1}^\mathsf{T} + Q_{i-1}$.
◮ Kalman update: $v_i = y_i - \mathbf{h}^\mathsf{T} \mathbf{m}_{i|i-1}$, $S_i = \mathbf{h}^\mathsf{T} P_{i|i-1} \mathbf{h} + \sigma_n^2$, $K_i = P_{i|i-1} \mathbf{h} S_i^{-1}$, $\mathbf{m}_{i|i} = \mathbf{m}_{i|i-1} + K_i v_i$, $P_{i|i} = P_{i|i-1} - K_i S_i K_i^\mathsf{T}$.
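
The prediction and update steps above can be sketched as a single loop. This is an illustrative implementation under assumptions: the model is stationary (so the process noise is taken as P_∞ − A P_∞ Aᵀ), observations are scalar, and the function name, the exponential-kernel example model, and the toy data are not from the slides. The loop also accumulates the log marginal likelihood from the innovations.

```python
import numpy as np
from scipy.linalg import expm

def kalman_filter(t, y, F, P_inf, h, sigma2_n):
    """Filtering means/covariances and log marginal likelihood in O(n)."""
    m = np.zeros(F.shape[0])          # m_0 = 0
    P = P_inf.copy()                  # P_0 = P_inf
    means, covs, log_lik = [], [], 0.0
    t_prev = t[0]                     # zero step before the first data point
    for t_i, y_i in zip(t, y):
        # Prediction: move the state from t_prev to t_i.
        A = expm(F * (t_i - t_prev))
        Q = P_inf - A @ P_inf @ A.T
        m = A @ m
        P = A @ P @ A.T + Q
        # Update: condition on the scalar observation y_i.
        v = y_i - h @ m               # innovation
        S = h @ P @ h + sigma2_n      # innovation variance
        K = P @ h / S                 # Kalman gain
        m = m + K * v
        P = P - np.outer(K, K) * S
        log_lik -= 0.5 * (np.log(2.0 * np.pi * S) + v**2 / S)
        means.append(m)
        covs.append(P)
        t_prev = t_i
    return np.array(means), np.array(covs), log_lik

# Assumed example: exponential (Ornstein-Uhlenbeck) prior with unit variance.
lam, sigma2_n = 1.2, 0.1
F, P_inf, h = np.array([[-lam]]), np.array([[1.0]]), np.array([1.0])
t = np.linspace(0.0, 5.0, 40)
y = np.sin(t)
means, covs, log_lik = kalman_filter(t, y, F, P_inf, h, sigma2_n)
```

The returned log marginal likelihood should match the value obtained from the dense n × n Gaussian density, and the filtering mean at the last time point should match the batch posterior mean there, since at that point the filter has seen all the data.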

Example: GP regression, $\mathcal{O}(n)$
◮ To condition all time-marginals on all data, run a backward sweep (Rauch–Tung–Striebel smoother): $\mathbf{m}_{i+1|i} = A_i \mathbf{m}_{i|i}$, $P_{i+1|i} = A_i P_{i|i} A_i^\mathsf{T} + Q_i$, $G_i = P_{i|i} A_i^\mathsf{T} P_{i+1|i}^{-1}$, $\mathbf{m}_{i|n} = \mathbf{m}_{i|i} + G_i (\mathbf{m}_{i+1|n} - \mathbf{m}_{i+1|i})$, $P_{i|n} = P_{i|i} + G_i (P_{i+1|n} - P_{i+1|i}) G_i^\mathsf{T}$.
◮ The marginal mean and variance can be recovered by: $\mathrm{E}[f_i] = \mathbf{h}^\mathsf{T} \mathbf{m}_{i|n}$, $\mathrm{V}[f_i] = \mathbf{h}^\mathsf{T} P_{i|n} \mathbf{h}$
◮ The log marginal likelihood can be evaluated as a by-product of the Kalman update: $\log p(\mathbf{y}) = -\frac{1}{2} \sum_{i=1}^{n} \left( \log |2\pi S_i| + v_i^\mathsf{T} S_i^{-1} v_i \right)$
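
A compact sketch of the forward filter followed by the backward sweep, again assuming a stationary model and scalar observations. The function name, the Matérn ν = 3/2 example model, and the toy data are illustrative assumptions. Note one indexing difference from the slide: in the code `A[i]` moves the state from `t[i-1]` to `t[i]`.

```python
import numpy as np
from scipy.linalg import expm

def filter_and_smooth(t, y, F, P_inf, h, sigma2_n):
    """Posterior marginal mean and variance of f(t_i) given all data, in O(n)."""
    n, d = len(t), F.shape[0]
    dts = np.diff(t, prepend=t[0])                # first step has zero length
    A = [expm(F * dt) for dt in dts]              # A[i]: from t[i-1] to t[i]
    Q = [P_inf - Ai @ P_inf @ Ai.T for Ai in A]   # stationary process noise
    mf, Pf = np.zeros((n, d)), np.zeros((n, d, d))
    m, P = np.zeros(d), P_inf.copy()
    for i in range(n):                            # forward pass (Kalman filter)
        m, P = A[i] @ m, A[i] @ P @ A[i].T + Q[i]
        S = h @ P @ h + sigma2_n
        K = P @ h / S
        m = m + K * (y[i] - h @ m)
        P = P - np.outer(K, K) * S
        mf[i], Pf[i] = m, P
    ms, Ps = mf.copy(), Pf.copy()
    for i in range(n - 2, -1, -1):                # backward pass (RTS smoother)
        m_pred = A[i + 1] @ mf[i]
        P_pred = A[i + 1] @ Pf[i] @ A[i + 1].T + Q[i + 1]
        # G = Pf A^T P_pred^{-1}, computed with a solve instead of an inverse.
        G = np.linalg.solve(P_pred, A[i + 1] @ Pf[i]).T
        ms[i] = mf[i] + G @ (ms[i + 1] - m_pred)
        Ps[i] = Pf[i] + G @ (Ps[i + 1] - P_pred) @ G.T
    return ms @ h, np.einsum('i,nij,j->n', h, Ps, h)

# Assumed example: Matern nu=3/2 prior with unit magnitude.
lam, sigma2_n = np.sqrt(3.0), 0.05
F = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])
P_inf = np.diag([1.0, lam**2])
h = np.array([1.0, 0.0])
t = np.linspace(0.0, 4.0, 25)
y = np.sin(2.0 * t) + 0.1 * np.cos(7.0 * t)
mean, var = filter_and_smooth(t, y, F, P_inf, h, sigma2_n)
```

The smoothed marginals should agree with the batch GP posterior evaluated at the training inputs, which gives a direct way to validate a state space implementation against the naive solution on a small data set.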


Basic regression example
◮ Number of births in the US (from BDA3 by Gelman et al.)
◮ Daily data between 1969–1988 ($n = 7305$)
◮ GP regression with a prior covariance function: $\kappa(t, t') = \kappa_{\mathrm{Mat.}}^{\nu=5/2}(t, t') + \kappa_{\mathrm{Mat.}}^{\nu=3/2}(t, t') + \kappa_{\mathrm{Per.}}^{\mathrm{year}}(t, t') \, \kappa_{\mathrm{Mat.}}^{\nu=3/2}(t, t') + \kappa_{\mathrm{Per.}}^{\mathrm{week}}(t, t') \, \kappa_{\mathrm{Mat.}}^{\nu=3/2}(t, t')$
◮ Learn hyperparameters by optimizing the marginal likelihood

Explaining changes in number of births in the US

Connection to banded precision matrices

Precision matrices
◮ Covariance (Gram) matrix: $K = \kappa(X, X)$
◮ Precision matrix: $Q = K^{-1}$
[Figure, two panels: heat maps of $K = k(X, X)$ and $Q = k(X, X)^{-1}$.]
◮ For Markovian models the precision is sparse (block tri-diagonal); see Durrande et al. (2019).
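
The sparsity claim is easy to see on a small example. The sketch below assumes the exponential (Ornstein-Uhlenbeck) kernel, whose state dimension is 1, so the blocks are scalars and the precision matrix should be plain tridiagonal; the grid and the variable names are made up for illustration.

```python
import numpy as np

lam = 1.0
t = np.linspace(0.0, 3.0, 7)                          # seven ordered inputs
K = np.exp(-lam * np.abs(t[:, None] - t[None, :]))    # dense covariance matrix
Q = np.linalg.inv(K)                                  # precision matrix

# Entries more than one step away from the diagonal.
idx = np.arange(len(t))
off_band = np.abs(idx[:, None] - idx[None, :]) > 1
max_off_band = np.max(np.abs(Q[off_band]))
```

Every entry of `K` is strictly positive, while the entries of `Q` outside the three central diagonals vanish up to rounding error. Explicit inversion is used here only to make the structure visible on a tiny matrix.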
