State space methods for temporal GPs (Arno Solin)

  1. State space methods for temporal GPs. Arno Solin, Assistant Professor in Machine Learning, Department of Computer Science, Aalto University. Gaussian Process Summer School, September 11, 2019. @arnosolin, arno.solin.fi

  2. Outline (topics shown in the outline diagram): Motivation: Temporal models; Three views into GPs; State space models; General likelihoods; Spatio-temporal GPs; Further extensions; Recap

  3. Motivation: Temporal models
     ◮ One-dimensional problems (the data has a natural ordering)
     ◮ Spatio-temporal models (something developing over time)
     ◮ Long / unbounded data (sensor data streams, daily observations, etc.)

  4. Three views into GPs [Diagram: a GP seen through three representations: kernel (moment), spectral (Fourier), and state space (path)]

  5. Kernel (moment) representation
     $f(t) \sim \mathcal{GP}(\mu(t), \kappa(t, t'))$ (GP prior)
     $\mathbf{y} \mid \mathbf{f} \sim \prod_i p(y_i \mid f(t_i))$ (likelihood)
     ◮ Let's focus on the GP prior only.
     ◮ A temporal Gaussian process (GP) is a random function $f(t)$ such that the joint distribution of $f(t_1), \ldots, f(t_n)$ is always Gaussian.
     ◮ The mean and covariance functions have the form: $\mu(t) = \mathrm{E}[f(t)]$, $\kappa(t, t') = \mathrm{E}[(f(t) - \mu(t))(f(t') - \mu(t'))^\top]$.
     ◮ Convenient for model specification, but expanding the kernel to a covariance matrix can be problematic (the notorious $O(n^3)$ scaling).

  6. Spectral (Fourier) representation
     ◮ The Fourier transform of a function $f(t): \mathbb{R} \to \mathbb{R}$ is $\mathcal{F}[f](\mathrm{i}\omega) = \int_{\mathbb{R}} f(t) \exp(-\mathrm{i}\omega t) \,\mathrm{d}t$.
     ◮ For a stationary GP, the covariance function can be written in terms of the difference between two inputs: $\kappa(t, t') \triangleq \kappa(t - t')$.
     ◮ Wiener–Khinchin: If $f(t)$ is a stationary Gaussian process with covariance function $\kappa(t)$, then its spectral density is $S(\omega) = \mathcal{F}[\kappa]$.
     ◮ Spectral representation of a GP in terms of the spectral density function: $S(\omega) = \mathrm{E}[\tilde{f}(\mathrm{i}\omega) \tilde{f}^\top(-\mathrm{i}\omega)]$.
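
The Wiener–Khinchin relation can be checked numerically. The sketch below is an illustration rather than part of the slides: it assumes the exponential covariance $\kappa(\tau) = \exp(-\lambda|\tau|)$ from slide 11, and the function names, the truncation point `t_max`, and the midpoint rule are choices made here. Because $\kappa$ is even, the Fourier integral reduces to a cosine integral over $\tau \geq 0$.

```python
import math


def exp_kernel(tau, lam):
    """Exponential covariance kappa(tau) = exp(-lam * |tau|)."""
    return math.exp(-lam * abs(tau))


def spectral_density_numeric(omega, lam, t_max=40.0, n=200_000):
    """Approximate S(omega) = int kappa(tau) exp(-i omega tau) dtau.

    kappa is even, so the integral equals 2 * int_0^inf kappa(tau) cos(omega tau) dtau.
    The upper limit is truncated at t_max (an assumption that holds when
    exp(-lam * t_max) is negligible) and the midpoint rule is used.
    """
    h = t_max / n
    total = 0.0
    for k in range(n):
        tau = (k + 0.5) * h
        total += exp_kernel(tau, lam) * math.cos(omega * tau)
    return 2.0 * h * total


def spectral_density_closed_form(omega, lam):
    """Closed form 2 / (lam + omega^2 / lam), written as 2 lam / (lam^2 + omega^2)."""
    return 2.0 * lam / (lam ** 2 + omega ** 2)


if __name__ == "__main__":
    lam, omega = 1.5, 2.0
    print(spectral_density_numeric(omega, lam))
    print(spectral_density_closed_form(omega, lam))
```

The two printed values should agree to several decimal places for moderate $\lambda$ and $\omega$.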

  7. State space (path) representation [1/3]
     ◮ Path or state space representation as solution to a linear time-invariant (LTI) stochastic differential equation (SDE): $\mathrm{d}\mathbf{f} = \mathbf{F} \mathbf{f} \,\mathrm{d}t + \mathbf{L} \,\mathrm{d}\boldsymbol{\beta}$, where $\mathbf{f} = (f, \mathrm{d}f/\mathrm{d}t, \ldots)$ and $\boldsymbol{\beta}(t)$ is a vector of Wiener processes.
     ◮ Equivalently, but more informally: $\frac{\mathrm{d}\mathbf{f}(t)}{\mathrm{d}t} = \mathbf{F} \mathbf{f}(t) + \mathbf{L} \mathbf{w}(t)$, where $\mathbf{w}(t)$ is white noise.
     ◮ The model now consists of a drift matrix $\mathbf{F} \in \mathbb{R}^{m \times m}$, a diffusion matrix $\mathbf{L} \in \mathbb{R}^{m \times s}$, and the spectral density matrix of the white noise process $\mathbf{Q}_c \in \mathbb{R}^{s \times s}$.
     ◮ The scalar-valued GP can be recovered by $f(t) = \mathbf{h}^\top \mathbf{f}(t)$.

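  8. State space (path) representation [2/3]
     ◮ The initial state is given by a stationary state $\mathbf{f}(0) \sim \mathrm{N}(\mathbf{0}, \mathbf{P}_\infty)$, which fulfils the Lyapunov equation $\mathbf{F} \mathbf{P}_\infty + \mathbf{P}_\infty \mathbf{F}^\top + \mathbf{L} \mathbf{Q}_c \mathbf{L}^\top = \mathbf{0}$.
     ◮ The covariance function at the stationary state can be recovered by
       $\kappa(t, t') = \mathbf{h}^\top \mathbf{P}_\infty \exp((t' - t)\mathbf{F})^\top \mathbf{h}$ for $t' \geq t$, and
       $\kappa(t, t') = \mathbf{h}^\top \exp((t - t')\mathbf{F}) \mathbf{P}_\infty \mathbf{h}$ for $t' < t$,
       where $\exp(\cdot)$ denotes the matrix exponential function. In both cases the time argument of the matrix exponential is non-negative.
     ◮ The spectral density function at the stationary state can be recovered by $S(\omega) = \mathbf{h}^\top (\mathbf{F} + \mathrm{i}\omega \mathbf{I})^{-1} \mathbf{L} \mathbf{Q}_c \mathbf{L}^\top (\mathbf{F} - \mathrm{i}\omega \mathbf{I})^{-\top} \mathbf{h}$.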
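
As a hedged illustration of the stationary-state formulas, the sketch below checks them on a Matérn $\nu = 3/2$ model. The Matérn matrices are not given on these slides; they are taken here as an assumption from the standard literature, with $\lambda = \sqrt{3}/\ell$ and kernel $\sigma^2 (1 + \lambda|\tau|) \exp(-\lambda|\tau|)$. The helper names and the small Taylor-series matrix exponential are choices made for this example, which uses only the Python standard library.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def expm(A, squarings=10, terms=20):
    """Matrix exponential by scaling and squaring with a Taylor series (small matrices only)."""
    n = len(A)
    scaled = [[a / 2.0 ** squarings for a in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = add(result, term)
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def matern32_model(lam, sigma2):
    """Assumed state space form of the Matern-3/2 kernel (standard literature, not from the slides)."""
    F = [[0.0, 1.0], [-lam ** 2, -2.0 * lam]]
    L = [[0.0], [1.0]]
    Qc = [[4.0 * lam ** 3 * sigma2]]
    Pinf = [[sigma2, 0.0], [0.0, lam ** 2 * sigma2]]
    h = [[1.0], [0.0]]
    return F, L, Qc, Pinf, h


def lyapunov_residual(F, L, Qc, Pinf):
    """F Pinf + Pinf F^T + L Qc L^T, which should be the zero matrix."""
    LQL = matmul(matmul(L, Qc), transpose(L))
    return add(add(matmul(F, Pinf), matmul(Pinf, transpose(F))), LQL)


def stationary_covariance(tau, F, Pinf, h):
    """kappa(t, t') with tau = t' - t, using the two-branch formula."""
    if tau >= 0:
        M = matmul(Pinf, transpose(expm([[tau * x for x in row] for row in F])))
    else:
        M = matmul(expm([[-tau * x for x in row] for row in F]), Pinf)
    return matmul(matmul(transpose(h), M), h)[0][0]


if __name__ == "__main__":
    F, L, Qc, Pinf, h = matern32_model(lam=2.0, sigma2=1.5)
    print(lyapunov_residual(F, L, Qc, Pinf))
    print(stationary_covariance(0.7, F, Pinf, h), 1.5 * (1 + 1.4) * math.exp(-1.4))
```

The residual should be the zero matrix, and the state space covariance should match the closed-form Matérn value for both signs of $\tau$.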

  9. State space (path) representation [3/3]
     ◮ Similarly as the kernel has to be evaluated into a covariance matrix for computations, the SDE can be solved for discrete time points $\{t_i\}_{i=1}^n$.
     ◮ The resulting model is a discrete state space model: $\mathbf{f}_i = \mathbf{A}_{i-1} \mathbf{f}_{i-1} + \mathbf{q}_{i-1}$, $\mathbf{q}_i \sim \mathrm{N}(\mathbf{0}, \mathbf{Q}_i)$, where $\mathbf{f}_i = \mathbf{f}(t_i)$.
     ◮ The discrete-time model matrices are given by:
       $\mathbf{A}_i = \exp(\mathbf{F} \Delta t_i)$,
       $\mathbf{Q}_i = \int_0^{\Delta t_i} \exp(\mathbf{F}(\Delta t_i - \tau)) \mathbf{L} \mathbf{Q}_c \mathbf{L}^\top \exp(\mathbf{F}(\Delta t_i - \tau))^\top \,\mathrm{d}\tau$,
       where $\Delta t_i = t_{i+1} - t_i$.
     ◮ If the model is stationary, $\mathbf{Q}_i$ is given by $\mathbf{Q}_i = \mathbf{P}_\infty - \mathbf{A}_i \mathbf{P}_\infty \mathbf{A}_i^\top$.
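
A minimal sketch of this discretization for the scalar Ornstein-Uhlenbeck model, assuming $F = -\lambda$, $L = 1$, $P_\infty = 1$ and $Q_c = 2\lambda$ (the value that satisfies the Lyapunov equation for $P_\infty = 1$). The function names are hypothetical. The sketch compares the stationary shortcut $Q_i = P_\infty - A_i P_\infty A_i^\top$ with a direct quadrature of the integral, and draws one sample path at irregular time points.

```python
import math
import random


def discretize_ou(lam, dt):
    """Return (A, Q) for the OU model over a step of length dt."""
    A = math.exp(-lam * dt)  # A_i = exp(F * dt) with F = -lam
    Q = 1.0 - A * A          # P_inf - A P_inf A^T with P_inf = 1
    return A, Q


def process_noise_by_quadrature(lam, dt, n=10_000):
    """Midpoint-rule evaluation of the integral form of Q_i, with Qc = 2 * lam."""
    q_c = 2.0 * lam
    h = dt / n
    return sum(math.exp(-2.0 * lam * (dt - (k + 0.5) * h)) * q_c * h for k in range(n))


def sample_path(times, lam, seed=0):
    """Draw f(t_1), ..., f(t_n) by running the discrete model forward in time."""
    rng = random.Random(seed)
    f = rng.gauss(0.0, 1.0)  # stationary initial state, f ~ N(0, P_inf)
    path = [f]
    for t0, t1 in zip(times, times[1:]):
        A, Q = discretize_ou(lam, t1 - t0)
        f = A * f + rng.gauss(0.0, math.sqrt(Q))
        path.append(f)
    return path


if __name__ == "__main__":
    A, Q = discretize_ou(lam=2.0, dt=0.3)
    print(Q, process_noise_by_quadrature(lam=2.0, dt=0.3))
    print(sample_path([0.0, 0.1, 0.5, 2.0], lam=2.0))
```

Nothing in the recursion requires equally spaced inputs: each step only uses its own $\Delta t_i$.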

  10. Three views into GPs [Figure with three panels: covariance function $\kappa(\tau)$ against $\tau = t - t'$; spectral density function $S(\omega)$ against $\omega$; sample functions, output $f(t)$ against input $t$]

  11. Example: Exponential covariance function
     ◮ Exponential covariance function (Ornstein-Uhlenbeck process): $\kappa(t, t') = \exp(-\lambda |t - t'|)$
     ◮ Spectral density function: $S(\omega) = \frac{2}{\lambda + \omega^2/\lambda}$
     ◮ Path representation: stochastic differential equation (SDE) $\frac{\mathrm{d}f(t)}{\mathrm{d}t} = -\lambda f(t) + w(t)$, or using the notation from before: $F = -\lambda$, $L = 1$, $Q_c = 2\lambda$, $h = 1$, and $P_\infty = 1$. (The value $Q_c = 2$ holds only for $\lambda = 1$; the Lyapunov equation $2 F P_\infty + Q_c = 0$ gives $Q_c = 2\lambda$ in general.)
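
The three views of this example can be tied together by substituting the scalar model into the stationary-state formulas of slide 8. The worked equations below are a sketch under the assumption $P_\infty = 1$, for which the Lyapunov equation fixes $Q_c = 2\lambda$:

```latex
\begin{align}
  F P_\infty + P_\infty F^\top + L Q_c L^\top
    &= -2\lambda P_\infty + Q_c = 0
    \quad\Rightarrow\quad Q_c = 2\lambda \ \text{for } P_\infty = 1, \\
  \kappa(t, t')
    &= h^\top P_\infty \exp\big((t' - t) F\big)^\top h
     = \exp\big(-\lambda (t' - t)\big), \qquad t' \geq t, \\
  S(\omega)
    &= h^\top (F + \mathrm{i}\omega)^{-1} L Q_c L^\top (F - \mathrm{i}\omega)^{-\top} h
     = \frac{2\lambda}{(-\lambda + \mathrm{i}\omega)(-\lambda - \mathrm{i}\omega)}
     = \frac{2\lambda}{\lambda^2 + \omega^2}
     = \frac{2}{\lambda + \omega^2/\lambda}.
\end{align}
```

The case $t' < t$ follows by symmetry, which gives $\kappa(t, t') = \exp(-\lambda |t - t'|)$.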

  12. Examples of applicable GP priors

  13. Applicable GP priors
     ◮ The covariance function needs to be Markovian (or approximated as such).
     ◮ Covers many common stationary and non-stationary models.
     ◮ Sums of kernels: $\kappa(t, t') = \kappa_1(t, t') + \kappa_2(t, t')$
       • Stacking of the state spaces
       • State dimension: $m = m_1 + m_2$
     ◮ Product of kernels: $\kappa(t, t') = \kappa_1(t, t') \, \kappa_2(t, t')$
       • Kronecker sum of the models
       • State dimension: $m = m_1 m_2$
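
The two composition rules can be sketched with plain nested lists. This is an illustration under stated assumptions, not the construction used in the talk: a model is represented here as a hypothetical dict with keys `F`, `Pinf` and `h`; the sum stacks the components block-diagonally; the product uses the Kronecker sum $F = F_1 \otimes I + I \otimes F_2$ together with $P_\infty = P_{\infty,1} \otimes P_{\infty,2}$ and $h = h_1 \otimes h_2$. The Matérn-3/2 matrices in the usage part are assumed from the standard literature.

```python
def eye(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]


def kron(A, B):
    """Kronecker product of two matrices given as nested lists."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def block_diag(A, B):
    """Block-diagonal stacking of two matrices."""
    pad_right = [0.0] * len(B[0])
    pad_left = [0.0] * len(A[0])
    return [list(ra) + pad_right for ra in A] + [pad_left + list(rb) for rb in B]


def kron_sum(A, B):
    """Kronecker sum: A (x) I + I (x) B."""
    left = kron(A, eye(len(B)))
    right = kron(eye(len(A)), B)
    return [[x + y for x, y in zip(rl, rr)] for rl, rr in zip(left, right)]


def sum_model(m1, m2):
    """State space model of kappa_1 + kappa_2 (state dimension m1 + m2)."""
    return {
        "F": block_diag(m1["F"], m2["F"]),
        "Pinf": block_diag(m1["Pinf"], m2["Pinf"]),
        "h": m1["h"] + m2["h"],
    }


def product_model(m1, m2):
    """State space model of kappa_1 * kappa_2 (state dimension m1 * m2)."""
    return {
        "F": kron_sum(m1["F"], m2["F"]),
        "Pinf": kron(m1["Pinf"], m2["Pinf"]),
        "h": [a * b for a in m1["h"] for b in m2["h"]],
    }


if __name__ == "__main__":
    ou = {"F": [[-1.0]], "Pinf": [[1.0]], "h": [1.0]}  # lambda = 1
    matern32 = {"F": [[0.0, 1.0], [-4.0, -4.0]],       # lambda = 2, sigma^2 = 1
                "Pinf": [[1.0, 0.0], [0.0, 4.0]], "h": [1.0, 0.0]}
    print(sum_model(ou, matern32)["F"])      # 3 x 3 drift matrix
    print(product_model(ou, matern32)["F"])  # 2 x 2 drift matrix
```

The printed drift matrices have dimensions $1 + 2 = 3$ and $1 \cdot 2 = 2$, matching the state dimensions stated on the slide.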

  14. Example: GP regression, $O(n^3)$

  15. Example: GP regression, $O(n^3)$
     ◮ Consider the GP regression problem with input–output training pairs $\{(t_i, y_i)\}_{i=1}^n$: $f(t) \sim \mathcal{GP}(0, \kappa(t, t'))$, $y_i = f(t_i) + \varepsilon_i$, $\varepsilon_i \sim \mathrm{N}(0, \sigma_n^2)$.
     ◮ The posterior mean and variance for an unseen test input $t_*$ are given by (see previous lectures): $\mathrm{E}[f_*] = \mathbf{k}_* (\mathbf{K} + \sigma_n^2 \mathbf{I})^{-1} \mathbf{y}$, $\mathrm{V}[f_*] = K_{**} - \mathbf{k}_* (\mathbf{K} + \sigma_n^2 \mathbf{I})^{-1} \mathbf{k}_*^\top$.
     ◮ Note the inversion of the $n \times n$ matrix.
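
For reference, the batch solution can be sketched in a few lines. This is a minimal illustration, assuming the exponential kernel of slide 11 and a hand-written Cholesky factorization in place of a linear algebra library; the function names and default parameter values are choices made here. The factorization of the $n \times n$ matrix is the cubic-cost step.

```python
import math


def exp_kernel(t, s, lam=1.0):
    return math.exp(-lam * abs(t - s))


def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix (O(n^3))."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(t_train, y, t_star, lam=1.0, noise_var=0.1):
    """Posterior mean and variance of f(t_star) given noisy observations y."""
    K = [[exp_kernel(a, b, lam) + (noise_var if i == j else 0.0)
          for j, b in enumerate(t_train)] for i, a in enumerate(t_train)]
    L = cholesky(K)
    k_star = [exp_kernel(t_star, a, lam) for a in t_train]
    alpha = chol_solve(L, y)        # (K + sigma_n^2 I)^{-1} y
    v = chol_solve(L, k_star)       # (K + sigma_n^2 I)^{-1} k_*^T
    mean = sum(ks * al for ks, al in zip(k_star, alpha))
    var = exp_kernel(t_star, t_star, lam) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, var


if __name__ == "__main__":
    print(gp_predict([0.0, 1.0], [1.0, 0.5], t_star=1.0))
```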

  16. Example: GP regression, $O(n^3)$

  17. Example: GP regression, $O(n)$
     ◮ The sequential solution (goes under the name 'Kalman filter') considers one data point at a time, hence the linear time-scaling.
     ◮ Start from $\mathbf{m}_0 = \mathbf{0}$ and $\mathbf{P}_0 = \mathbf{P}_\infty$ and for each data point iterate the following steps.
     ◮ Kalman prediction:
       $\mathbf{m}_{i|i-1} = \mathbf{A}_{i-1} \mathbf{m}_{i-1|i-1}$,
       $\mathbf{P}_{i|i-1} = \mathbf{A}_{i-1} \mathbf{P}_{i-1|i-1} \mathbf{A}_{i-1}^\top + \mathbf{Q}_{i-1}$.
     ◮ Kalman update:
       $v_i = y_i - \mathbf{h}^\top \mathbf{m}_{i|i-1}$,
       $S_i = \mathbf{h}^\top \mathbf{P}_{i|i-1} \mathbf{h} + \sigma_n^2$,
       $\mathbf{K}_i = \mathbf{P}_{i|i-1} \mathbf{h} S_i^{-1}$,
       $\mathbf{m}_{i|i} = \mathbf{m}_{i|i-1} + \mathbf{K}_i v_i$,
       $\mathbf{P}_{i|i} = \mathbf{P}_{i|i-1} - \mathbf{K}_i S_i \mathbf{K}_i^\top$.
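
The prediction and update steps above can be sketched for the scalar Ornstein-Uhlenbeck model of slide 11. The sketch assumes $h = 1$, $P_\infty = 1$, $A_i = \exp(-\lambda \Delta t_i)$ and the stationary form $Q_i = 1 - A_i^2$; the function name and default parameters are hypothetical. The first data point is handled with a zero-length step, so the prediction returns the stationary prior.

```python
import math


def kalman_filter_ou(times, ys, lam=1.0, noise_var=0.1):
    """Filtered means m_{i|i} and variances P_{i|i} for the OU prior; cost is O(n)."""
    m, P = 0.0, 1.0  # m_0 = 0, P_0 = P_inf = 1
    t_prev = times[0]
    means, variances = [], []
    for t, y in zip(times, ys):
        # Kalman prediction
        A = math.exp(-lam * (t - t_prev))
        Q = 1.0 - A * A
        m = A * m
        P = A * P * A + Q
        # Kalman update
        v = y - m          # innovation
        S = P + noise_var  # innovation variance
        K = P / S          # Kalman gain
        m = m + K * v
        P = P - K * S * K
        means.append(m)
        variances.append(P)
        t_prev = t
    return means, variances


if __name__ == "__main__":
    means, variances = kalman_filter_ou([0.0, 1.0], [1.0, 0.5])
    print(means[-1], variances[-1])
```

The filtered estimate at the last time point coincides with the batch GP posterior at that input, since both condition on all of the data; earlier time points need the backward sweep to reach the same result.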

  18. Example: GP regression, $O(n)$
     ◮ To condition all time-marginals on all data, run a backward sweep (Rauch–Tung–Striebel smoother):
       $\mathbf{m}_{i+1|i} = \mathbf{A}_i \mathbf{m}_{i|i}$,
       $\mathbf{P}_{i+1|i} = \mathbf{A}_i \mathbf{P}_{i|i} \mathbf{A}_i^\top + \mathbf{Q}_i$,
       $\mathbf{G}_i = \mathbf{P}_{i|i} \mathbf{A}_i^\top \mathbf{P}_{i+1|i}^{-1}$,
       $\mathbf{m}_{i|n} = \mathbf{m}_{i|i} + \mathbf{G}_i (\mathbf{m}_{i+1|n} - \mathbf{m}_{i+1|i})$,
       $\mathbf{P}_{i|n} = \mathbf{P}_{i|i} + \mathbf{G}_i (\mathbf{P}_{i+1|n} - \mathbf{P}_{i+1|i}) \mathbf{G}_i^\top$.
     ◮ The marginal mean and variance can be recovered by: $\mathrm{E}[f_i] = \mathbf{h}^\top \mathbf{m}_{i|n}$, $\mathrm{V}[f_i] = \mathbf{h}^\top \mathbf{P}_{i|n} \mathbf{h}$.
     ◮ The log marginal likelihood can be evaluated as a by-product of the Kalman update: $\log p(\mathbf{y}) = -\frac{1}{2} \sum_{i=1}^{n} \left[ \log |2\pi S_i| + v_i^\top S_i^{-1} v_i \right]$.
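
A minimal sketch of the forward filter, the backward sweep, and the log marginal likelihood, again for the scalar Ornstein-Uhlenbeck model of slide 11 with $h = 1$, $P_\infty = 1$ and $Q_i = 1 - A_i^2$. The function name and default parameters are assumptions made for this example.

```python
import math


def filter_smooth_ou(times, ys, lam=1.0, noise_var=0.1):
    """Return smoothed means m_{i|n}, smoothed variances P_{i|n}, and log p(y)."""
    n = len(times)
    m, P = 0.0, 1.0  # stationary prior
    t_prev = times[0]
    mf, Pf = [], []
    log_lik = 0.0
    # Forward pass: Kalman filter, accumulating the log marginal likelihood.
    for t, y in zip(times, ys):
        A = math.exp(-lam * (t - t_prev))
        Q = 1.0 - A * A
        m, P = A * m, A * P * A + Q
        v, S = y - m, P + noise_var
        K = P / S
        m, P = m + K * v, P - K * S * K
        log_lik += -0.5 * (math.log(2.0 * math.pi * S) + v * v / S)
        mf.append(m)
        Pf.append(P)
        t_prev = t
    # Backward pass: Rauch-Tung-Striebel smoother.
    ms, Ps = mf[:], Pf[:]
    for i in range(n - 2, -1, -1):
        A = math.exp(-lam * (times[i + 1] - times[i]))
        Q = 1.0 - A * A
        m_pred = A * mf[i]           # m_{i+1|i}
        P_pred = A * Pf[i] * A + Q   # P_{i+1|i}
        G = Pf[i] * A / P_pred       # smoother gain
        ms[i] = mf[i] + G * (ms[i + 1] - m_pred)
        Ps[i] = Pf[i] + G * (Ps[i + 1] - P_pred) * G
    return ms, Ps, log_lik


if __name__ == "__main__":
    ms, Ps, log_lik = filter_smooth_ou([0.0, 1.0], [1.0, 0.5])
    print(ms, Ps, log_lik)
```

For two observations the result can be compared by hand against the batch formulas of slide 15: the smoothed mean at the first input and the log marginal likelihood both match the values computed from the $2 \times 2$ matrix $\mathbf{K} + \sigma_n^2 \mathbf{I}$.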

  19. Example: GP regression, $O(n)$

  20. Basic regression example
     ◮ Number of births in the US (from BDA3 by Gelman et al.)
     ◮ Daily data between 1969–1988 ($n = 7305$)
     ◮ GP regression with a prior covariance function: $\kappa(t, t') = \kappa_{\mathrm{Mat.}}^{\nu=5/2}(t, t') + \kappa_{\mathrm{Mat.}}^{\nu=3/2}(t, t') + \kappa_{\mathrm{Per.}}^{\mathrm{year}}(t, t') \, \kappa_{\mathrm{Mat.}}^{\nu=3/2}(t, t') + \kappa_{\mathrm{Per.}}^{\mathrm{week}}(t, t') \, \kappa_{\mathrm{Mat.}}^{\nu=3/2}(t, t')$
     ◮ Learn hyperparameters by optimizing the marginal likelihood

  21. Basic regression example (continued) [Figure: Explaining changes in number of births in the US]

  22. Connection to banded precision matrices

  23. Precision matrices
     ◮ Covariance (Gram) matrix: $\mathbf{K} = \kappa(\mathbf{X}, \mathbf{X})$
     ◮ Precision matrix: $\mathbf{Q} = \mathbf{K}^{-1}$
     [Figure: heat maps of $\mathbf{K} = k(\mathbf{X}, \mathbf{X})$ and $\mathbf{Q} = k(\mathbf{X}, \mathbf{X})^{-1}$ for seven inputs]
     ◮ For Markovian models the precision is sparse (block tri-diagonal); see Durrande et al. (2019).
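
The sparsity can be made concrete for the scalar Ornstein-Uhlenbeck model of slide 11. The sketch below is an illustration with hypothetical function names: it assembles the precision matrix directly from the discrete state space transitions $f_{i+1} \mid f_i \sim \mathrm{N}(A_i f_i, Q_i)$ with $f_1 \sim \mathrm{N}(0, P_\infty)$, assuming $P_\infty = 1$, so only the diagonal and first off-diagonals are ever written. Multiplying by the dense covariance matrix then recovers the identity.

```python
import math


def ou_covariance(times, lam=1.0):
    """Dense covariance matrix K with entries exp(-lam * |t_i - t_j|)."""
    return [[math.exp(-lam * abs(a - b)) for b in times] for a in times]


def ou_precision(times, lam=1.0):
    """Tridiagonal precision matrix built from the Markov factorization of the prior."""
    n = len(times)
    Q = [[0.0] * n for _ in range(n)]
    Q[0][0] = 1.0  # contribution of f_1 ~ N(0, P_inf) with P_inf = 1
    for i in range(n - 1):
        a = math.exp(-lam * (times[i + 1] - times[i]))  # A_i
        q = 1.0 - a * a                                 # Q_i (process noise variance)
        # Quadratic form of (f_{i+1} - a f_i)^2 / q
        Q[i][i] += a * a / q
        Q[i + 1][i + 1] += 1.0 / q
        Q[i][i + 1] -= a / q
        Q[i + 1][i] -= a / q
    return Q


def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


if __name__ == "__main__":
    times = [0.0, 0.5, 1.2, 3.0, 3.1]
    product = matmul(ou_covariance(times), ou_precision(times))
    for row in product:
        print([round(x, 6) for x in row])
```

With a state dimension larger than one, the scalar entries become $m \times m$ blocks, which gives the block tri-diagonal structure mentioned on the slide.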