Stochastic (Partial) Differential Equations and Gaussian Processes
Simo Särkkä, Aalto University, Finland
Why use S(P)DE solvers for GPs?
The O(n³) computational complexity is always a challenge. Latent force models combine PDEs/ODEs with GPs.
What do we get:
- Sparse approximations developed for SPDEs.
- Reduced-rank Fourier/basis-function approximations.
- The use of Markov properties and Markov approximations.
- State-space methods for SDEs/SPDEs.
- A path to non-Gaussian processes.
Downsides:
- Approximations of non-parametric models with parametric models.
- Approximations of non-Markovian models as Markovian.
- The mathematics can become messy.
Kernel vs. SPDE representations of GPs
GP (kernel) model and the equivalent S(P)DE model:
- Homogeneous k(x, x'), x ∈ R^d: static SPDE model L f(x) = w(x).
- Stationary k(t, t'), t ∈ R: state-space/Itô SDE model df(t) = A f(t) dt + L dW(t).
- Homogeneous/stationary k(x, t; x', t'): stochastic evolution equation df(x, t) = A_x f(x, t) dt + L dW(x, t).
Basic idea of SPDE inference on GPs [1/2]
Consider, e.g., the stochastic partial differential equation
  ∂²f(x, y)/∂x² + ∂²f(x, y)/∂y² − λ² f(x, y) = w(x, y).
Fourier transforming gives the spectral density
  S(ω_x, ω_y) ∝ (λ² + ω_x² + ω_y²)^(−2).
The inverse Fourier transform gives the covariance function
  k(x, y; x', y') ∝ λ √((x − x')² + (y − y')²) K_1(λ √((x − x')² + (y − y')²)).
But this is just the Matérn covariance function (ν = 1). The corresponding RKHS is actually a Sobolev space.
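A hedged numerical check of the Fourier pair above (not from the slides): the sketch evaluates S(ω_x, ω_y) ∝ (λ² + ω_x² + ω_y²)^(−2) on a grid, inverse-FFTs it, and compares the normalized result against the Matérn ν = 1 form λ r K_1(λ r). The grid size, spacing, and value of λ are arbitrary choices.

```python
# Numerical sanity check: inverse FFT of the SPDE spectral density vs. the Matern nu=1 kernel.
import numpy as np
from scipy.special import kv  # modified Bessel function of the second kind, K_nu

lam = 2.0
N, dx = 512, 0.05                          # grid resolution (assumed, not from the slides)
w = 2 * np.pi * np.fft.fftfreq(N, d=dx)    # angular frequencies in FFT order
wx, wy = np.meshgrid(w, w, indexing="ij")
S = (lam**2 + wx**2 + wy**2) ** (-2)       # spectral density of the SPDE solution

# Inverse transform -> (unnormalized) covariance on the spatial grid, zero lag at the centre.
k_grid = np.fft.fftshift(np.real(np.fft.ifft2(S)))
c = N // 2
r = np.arange(1, 60) * dx                  # radial lags along one axis
k_fft = k_grid[c, c + 1:c + 60] / k_grid[c, c]

# Matern nu=1 covariance normalized to 1 at r -> 0 (since z*K_1(z) -> 1 as z -> 0).
k_matern = lam * r * kv(1, lam * r)
print(np.max(np.abs(k_fft - k_matern)))    # small, up to discretization/truncation error
```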
Basic idea of SPDE inference on GPs [2/2]
More generally, consider an SPDE for some linear operator L:
  L f(x) = w(x).
Then f is a GP with precision and covariance operators
  K^(−1) = L* L,   K = (L* L)^(−1).
Idea: approximate L or L^(−1) using PDE/ODE methods:
1. Finite-differences/FEM methods lead to sparse precision approximations.
2. Fourier/basis-function methods lead to reduced-rank covariance approximations.
3. Spectral factorization leads to state-space (Kalman) methods, which are time-recursive (or sparse in precision).
Finite-differences/FEM – sparse precision
Basic idea:
  ∂f(x)/∂x ≈ (f(x + h) − f(x))/h,
  ∂²f(x)/∂x² ≈ (f(x + h) − 2 f(x) + f(x − h))/h².
We get an SPDE approximation L ≈ L̃, where the matrix L̃ is sparse. The precision operator approximation is then sparse:
  K^(−1) ≈ L̃ᵀ L̃ = sparse.
L may need to be approximated as an integro-differential operator. Requires formation of a grid, but parallelizes well.
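A minimal sketch of the sparse-precision construction in 1D, assuming the operator L = d²/dx² − λ² and a crude truncation at the grid boundary; the grid size, spacing, and λ are illustrative values only.

```python
# Finite-difference discretization of L = d^2/dx^2 - lambda^2 and the resulting sparse precision.
import numpy as np
import scipy.sparse as sp

lam, n, h = 1.0, 200, 0.05                  # example values, not from the slides
main = -2.0 * np.ones(n) / h**2 - lam**2    # diagonal of the discretized operator
off = np.ones(n - 1) / h**2                 # off-diagonals from the second difference
L = sp.diags([off, main, off], offsets=[-1, 0, 1], format="csc")

Q = (L.T @ L).tocsc()                       # precision approximation K^{-1} ~ L^T L
print(Q.nnz, "nonzeros out of", n * n)      # banded (5 diagonals), i.e. sparse
```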
Classical and random Fourier methods – reduced-rank approximations and FFT
Approximation:
  f(x) ≈ Σ_k c_k exp(2πi kᵀx),   c_k ~ Gaussian.
We use fewer coefficients c_k than the number of data points. This leads to reduced-rank covariance approximations
  k(x, x') ≈ Σ_{|k| ≤ N} σ_k² exp(2πi kᵀx) [exp(2πi kᵀx')]*.
Truncated series, random frequencies, FFT, ...
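One concrete instance of the "random frequencies" variant is the random Fourier feature construction of Rahimi and Recht. The sketch below is not from the slides and assumes a squared-exponential kernel purely for illustration; the sizes are arbitrary.

```python
# Random Fourier features: reduced-rank approximation of a squared-exponential kernel.
import numpy as np

rng = np.random.default_rng(0)
ell, sigma2, m = 0.5, 1.0, 500             # lengthscale, variance, number of features (assumed)
x = np.linspace(-3, 3, 50)[:, None]        # 1D inputs

# The spectral density of exp(-r^2/(2 ell^2)) is Gaussian, so sample omega ~ N(0, 1/ell^2).
omega = rng.normal(0.0, 1.0 / ell, size=m)
b = rng.uniform(0.0, 2 * np.pi, size=m)
Phi = np.sqrt(2.0 * sigma2 / m) * np.cos(x @ omega[None, :] + b)   # n x m feature matrix

K_approx = Phi @ Phi.T                                             # reduced-rank covariance
K_exact = sigma2 * np.exp(-0.5 * (x - x.T) ** 2 / ell**2)
print(np.max(np.abs(K_approx - K_exact)))                          # error shrinks as m grows
```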
Hilbert-space/Galerkin methods – reduced-rank approximations
Approximation:
  f(x) ≈ Σ_i c_i φ_i(x),   ⟨φ_i, φ_j⟩_H ≈ δ_ij,   e.g. ∇²φ_i = −λ_i φ_i.
Again, we use fewer coefficients than the number of data points. This gives reduced-rank covariance approximations such as
  k(x, x') ≈ Σ_{i=1}^{N} σ_i² φ_i(x) φ_i(x').
Wavelets, Galerkin, finite elements, ...
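A hedged sketch in the spirit of the Laplacian-eigenfunction (Hilbert-space) reduced-rank construction: Dirichlet eigenfunctions of ∇² on an interval [−Lb, Lb], weighted by the Matérn ν = 3/2 spectral density. The domain size, number of basis functions, and choice of kernel are assumptions made for illustration.

```python
# Reduced-rank covariance k(x,x') ~ sum_i S(sqrt(lambda_i)) phi_i(x) phi_i(x').
import numpy as np

sigma2, ell, Lb, m = 1.0, 0.5, 4.0, 64       # hyperparameters and domain size (assumed)
lam = np.sqrt(3.0) / ell                     # Matern-3/2 inverse lengthscale
x = np.linspace(-2, 2, 50)[:, None]

j = np.arange(1, m + 1)
freqs = np.pi * j / (2 * Lb)                                 # sqrt of the Laplacian eigenvalues
Phi = np.sqrt(1.0 / Lb) * np.sin(freqs[None, :] * (x + Lb))  # eigenfunctions on [-Lb, Lb]
S = sigma2 * 4.0 * lam**3 / (lam**2 + freqs**2) ** 2         # Matern-3/2 spectral density

K_approx = (Phi * S[None, :]) @ Phi.T                        # sum_i S_i phi_i(x) phi_i(x')
r = np.abs(x - x.T)
K_exact = sigma2 * (1 + lam * r) * np.exp(-lam * r)          # exact Matern-3/2 kernel
print(np.max(np.abs(K_approx - K_exact)))                    # small away from the boundary
```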
State-space methods – Kalman filters and sparse precision
[Figure: the state at time t is the spatial function f(·, t); axes Location (x) and Time (t).]
Approximation of the spectral density as a rational function:
  S(ω) ≈ (b_0 + b_1 ω² + ··· + b_M ω^(2M)) / (a_0 + a_1 ω² + ··· + a_N ω^(2N)).
This results in a linear stochastic differential equation (SDE)
  df(t) = A f(t) dt + L dW.
More generally, stochastic evolution equations. O(n) GP regression with Kalman filters and smoothers. Parallel block-sparse precision methods → O(log n).
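The spectral-factorization step can be made concrete numerically: substitute ω² → −s² in the denominator polynomial, keep the left-half-plane roots, and read the companion-form drift matrix off the stable factor. The sketch below (not from the slides) does this for the Matérn ν = 3/2 density S(ω) ∝ 1/(λ² + ω²)² and recovers the drift matrix of the next example.

```python
# Spectral factorization of a rational spectral density -> companion-form SDE drift matrix.
import numpy as np

lam = 1.5
# Denominator (lambda^2 + omega^2)^2 = omega^4 + 2 lambda^2 omega^2 + lambda^4,
# given as coefficients in omega^2, highest power first:
a = np.array([1.0, 2.0 * lam**2, lam**4])

# Substitute omega^2 -> -s^2 to get a polynomial in s (only even powers appear).
n = len(a) - 1
p_s = np.zeros(2 * n + 1)
p_s[::2] = a * (-1.0) ** np.arange(n, -1, -1)
roots = np.roots(p_s)
stable = roots[roots.real < 0]              # keep the left-half-plane (stable) roots
g = np.real(np.poly(stable))                # monic stable factor G(s)

# Companion (controllable canonical) form drift matrix for the SDE with transfer 1/G(s).
A = np.diag(np.ones(n - 1), k=1)
A[-1, :] = -g[:0:-1]
print(A)                                    # expect [[0, 1], [-lam**2, -2*lam]]
```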
State-space methods – Kalman filters and sparse precision (cont.)
Example (Matérn class 1d). The Matérn class of covariance functions is
  k(t, t') = σ² 2^(1−ν)/Γ(ν) (√(2ν) |t − t'| / ℓ)^ν K_ν(√(2ν) |t − t'| / ℓ).
When, e.g., ν = 3/2 (so that λ = √3/ℓ), we have
  df(t) = [0, 1; −λ², −2λ] f(t) dt + [0; q^(1/2)] dW(t),
  f(t) = [1, 0] f(t).
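Below is a hedged sketch of the resulting O(n) GP regression: the Matérn 3/2 SDE above is discretized exactly between the sorted input locations and a standard Kalman filter processes the noisy observations. The data, hyperparameters, and noise level are made up for illustration.

```python
# O(n) GP regression with the Matern-3/2 state-space model and a Kalman filter.
import numpy as np
from scipy.linalg import expm

sigma2, ell, noise = 1.0, 0.5, 0.1**2
lam = np.sqrt(3.0) / ell
A = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])   # drift matrix from the example above
H = np.array([[1.0, 0.0]])                          # observe the first state component
Pinf = np.diag([sigma2, lam**2 * sigma2])           # stationary covariance of the state

rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0.0, 5.0, 200))
y = np.sin(2 * t) + np.sqrt(noise) * rng.normal(size=200)   # toy data

m, P, means = np.zeros(2), Pinf.copy(), []
for k in range(len(t)):
    if k > 0:                                       # predict across the gap dt
        F = expm(A * (t[k] - t[k - 1]))
        Q = Pinf - F @ Pinf @ F.T                   # exact discrete-time process noise
        m, P = F @ m, F @ P @ F.T + Q
    S = H @ P @ H.T + noise                         # Kalman update with observation y[k]
    K = P @ H.T / S
    m = m + (K * (y[k] - H @ m)).ravel()
    P = P - K @ H @ P
    means.append(m[0])

print(np.mean((np.array(means) - np.sin(2 * t)) ** 2))   # error of the filtered mean
```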
State-space methods – Kalman filters and sparse precision (cont.)
Example (2D Matérn covariance function). Consider a space-time Matérn covariance function
  k(x, t; x', t') = σ² 2^(1−ν)/Γ(ν) (√(2ν) ρ / ℓ)^ν K_ν(√(2ν) ρ / ℓ),
where ρ = √((t − t')² + (x − x')²), ν = 1, and d = 2. We get the following representation:
  df(x, t) = [0, 1; −(λ² − ∂²/∂x²), −2 √(λ² − ∂²/∂x²)] f(x, t) dt + [0; 1] dW(x, t).
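A rough sketch of how this evolution equation can be turned into a finite-dimensional SDE: replace ∂²/∂x² by a finite-difference matrix on a spatial grid, so the operator blocks become ordinary matrices (and a matrix square root plays the role of √(λ² − ∂²/∂x²)). The grid, boundary treatment, and λ are assumptions for illustration.

```python
# Finite-dimensional drift matrix for the space-time Matern evolution equation.
import numpy as np
from scipy.linalg import sqrtm

lam, mgrid, h = 1.0, 50, 0.1                        # example grid and parameter values
I = np.eye(mgrid)
D2 = (np.diag(-2.0 * np.ones(mgrid)) +
      np.diag(np.ones(mgrid - 1), 1) +
      np.diag(np.ones(mgrid - 1), -1)) / h**2       # discretized d^2/dx^2 (Dirichlet-type)

G = lam**2 * I - D2                                 # discretization of (lambda^2 - d^2/dx^2)
A = np.block([[np.zeros((mgrid, mgrid)), I],
              [-G, -2.0 * np.real(sqrtm(G))]])      # drift of the 2*mgrid-dimensional SDE
print(A.shape)                                      # (100, 100): the space-time GP as one big SDE
```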
What then?
- Inducing point methods = basis function methods.
- Inference on the basis functions/point locations/etc.
- Non-Gaussian processes, non-Gaussian likelihoods.
- Combined first-principles and nonparametric models – latent force models (LFMs).
- Inverse problems – operators in the measurement model.
- State-space stochastic control in Gaussian processes and LFMs.
- SPDE methods for SVMs.
- Kernel embedding of S(P)DEs.
- Deep S(P)DE models.