  1. Bayesian Learning from Sequential Data using Gaussian Processes with Signature Covariances. Csaba Toth, joint work with Harald Oberhauser. Mathematical Institute, University of Oxford. International Conference on Machine Learning, July 2020.

  2. Overview

  3. Overview: Purpose of this work

  1. Define a Gaussian process (GP) [6] over sequences/time series
  ◮ To model functions of sequences $\{\mathrm{Seq}(\mathbb{R}^d) \to \mathbb{R}\}$: $(f_x)_{x \in \mathrm{Seq}(\mathbb{R}^d)} \sim \mathcal{GP}(m(\cdot), k(\cdot, \cdot))$ (a small prior-sampling sketch follows below)
  ◮ Find a suitable covariance kernel $k : \mathrm{Seq}(\mathbb{R}^d) \times \mathrm{Seq}(\mathbb{R}^d) \to \mathbb{R}$
  ◮ $\mathrm{Seq}(\mathbb{R}^d) := \{(x_{t_1}, \ldots, x_{t_L}) \mid (t_i, x_{t_i}) \in \mathbb{R}_+ \times \mathbb{R}^d,\ L \in \mathbb{N}\}$
  2. Develop an efficient inference framework
  ◮ Standard challenges: intractable posteriors, $O(N^3)$ scaling in the number of training examples
  ◮ Additional challenge: potentially very high-dimensional inputs (long sequences)
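
To make the modelling object concrete, here is a minimal sketch (not from the paper) of drawing from such a zero-mean GP prior over a finite collection of sequences, given any positive-definite sequence kernel k; the helper name and jitter value are illustrative:

```python
import numpy as np

def sample_gp_prior(sequences, k, n_samples=3, jitter=1e-8):
    # Draw joint samples of (f_x) over a finite set of sequences from the
    # GP prior with zero mean and sequence kernel k. Here k can be any
    # positive-definite function on pairs of sequences; the paper's choice
    # is the signature covariance introduced on the next slides.
    N = len(sequences)
    K = np.array([[k(x, y) for y in sequences] for x in sequences])
    L = np.linalg.cholesky(K + jitter * np.eye(N))   # jitter for stability
    return L @ np.random.randn(N, n_samples)         # each column: one draw of f
```

Each column of the result is one realisation of the function values $(f_{x_1}, \ldots, f_{x_N})$ at the given sequences.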

  4. Overview: Suitable feature map? Signatures from stochastic analysis [2]!

  Signatures can be used to transform vector-kernels into sequence-kernels:
  ◮ Let $\kappa : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ be a kernel for vector-valued data.
  ◮ [4] used signatures to define, for $x, y \in \mathrm{Seq}(\mathbb{R}^d)$, the kernel (a NumPy sketch follows below)
    $$k(x, y) = \sum_{m=0}^{M} \sigma_m^2 \sum_{\substack{1 \le i_1 < \cdots < i_m \le L_x \\ 1 \le j_1 < \cdots < j_m \le L_y}} c(i)\, c(j) \prod_{l=1}^{m} \Delta_{i_l, j_l} \kappa(x_{i_l}, y_{j_l})$$
    for some explicitly given constants $c(i) = c(i_1, \ldots, i_m)$ and $c(j) = c(j_1, \ldots, j_m)$, where
    $$\Delta_{i,j}\, \kappa(x_i, y_j) = \kappa(x_{i+1}, y_{j+1}) - \kappa(x_i, y_{j+1}) - \kappa(x_{i+1}, y_j) + \kappa(x_i, y_j).$$
  ◮ Strong theoretical properties!
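
The double sum over increasing index tuples need not be enumerated directly. Under the simplifying assumptions $c(i) = c(j) = 1$ and $\sigma_m = 1$ (the paper's general weights are omitted here), it can be evaluated by a dynamic program over the matrix of second-order differences in $O(M \cdot L_x \cdot L_y)$ time. A NumPy sketch under those assumptions, with an RBF choice of $\kappa$:

```python
import numpy as np

def rbf(X, Y, lengthscale=1.0):
    # Static kernel kappa on R^d, evaluated on all pairs of observations.
    sq = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-sq / (2.0 * lengthscale**2))

def signature_kernel(x, y, M=4):
    # Truncated sequence kernel up to signature level M, for x of shape
    # (L_x, d) and y of shape (L_y, d); c(i) = c(j) = 1 and sigma_m = 1.
    K = rbf(x, y)
    # Second-order difference Delta_{i,j} kappa(x_i, y_j).
    A = K[1:, 1:] - K[:-1, 1:] - K[1:, :-1] + K[:-1, :-1]
    k_val = 1.0          # level m = 0: the empty product contributes 1
    Km = A.copy()        # level-1 summands: single factors Delta_{i,j}
    k_val += Km.sum()
    for _ in range(2, M + 1):
        # C[i, j] = sum of level-(m-1) summands over i' <= i, j' <= j
        C = np.cumsum(np.cumsum(Km, axis=0), axis=1)
        Km = A.copy()
        Km[1:, 1:] *= C[:-1, :-1]   # index tuples must be strictly increasing
        Km[0, :] = 0.0
        Km[:, 0] = 0.0
        k_val += Km.sum()
    return k_val
```

The recursion tracks, for each pair $(i, j)$, the sum over all length-$m$ chains of index pairs ending at $(i, j)$, so each truncation level costs one pass over the difference matrix.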

  5. Overview: Our contributions

  ◮ Bringing GPs and signatures together (+ analysis)
  ◮ Developing a tractable, efficient inference scheme:
    1. Sparse variational inference (VI) [3]: handles non-conjugate likelihoods and large $N \in \mathbb{N}$
    2. Inter-domain inducing points: handles long sequences ($\sup_{x \in X} L_x$ large)
  ◮ GPflow implementation, thorough experimental evaluation (a minimal sketch of the sparse VI setup follows below)
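
Since the paper ships a GPflow implementation, here is a rough GPflow 2 sketch of what such a sparse VI setup looks like. This is not the paper's code: a standard squared-exponential kernel on flattened, fixed-length sequences stands in for the signature covariance, and the data, sizes, and training loop are purely illustrative.

```python
import numpy as np
import tensorflow as tf
import gpflow

# Placeholder data: N sequences of length L in R^d, flattened to vectors.
N, L, d, M_ind = 1000, 50, 3, 64
X = np.random.randn(N, L * d)
Y = np.random.randn(N, 1)

# Inducing inputs initialised from the data.
Z = X[np.random.choice(N, M_ind, replace=False)].copy()

model = gpflow.models.SVGP(
    kernel=gpflow.kernels.SquaredExponential(),   # stand-in for the signature covariance
    likelihood=gpflow.likelihoods.Gaussian(),
    inducing_variable=Z,
    num_data=N,          # needed to rescale the minibatch ELBO
)

opt = tf.keras.optimizers.Adam(0.01)
for _ in range(500):
    idx = np.random.choice(N, 128)               # minibatch of sequences
    with tf.GradientTape() as tape:
        loss = -model.elbo((X[idx], Y[idx]))     # maximise the ELBO
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))
```

The sparse variational construction keeps the per-step cost at $O(M_{\mathrm{ind}}^2 \cdot \text{batch})$ instead of the $O(N^3)$ of exact GP inference.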

  6. Signatures

  7. Signatures: What are signatures?

  Signatures are defined on continuous-time objects, paths:
  ◮ $\mathrm{Paths}(\mathbb{R}^d) = \left\{ x \in C([0, T], \mathbb{R}^d) \mid x_0 = 0,\ \|x\|_{\mathrm{bv}} < +\infty \right\}$
  ◮ $\Phi_m(x) = \int_{0 < t_1 < \cdots < t_m < T} \dot{x}_{t_1} \otimes \cdots \otimes \dot{x}_{t_m} \, \mathrm{d}t_1 \cdots \mathrm{d}t_m$
  ◮ $\Phi_m(x) \in (\mathbb{R}^d)^{\otimes m}$ is what is known as a tensor of degree $m \in \mathbb{N}$
  ◮ $\Phi(x) = (\Phi_m(x))_{m \ge 0}$ is an infinite collection of tensors of increasing degree
  ◮ A generalization of polynomials from vector-valued data to paths (and sequences!); a computational sketch follows below
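
For a sequence viewed as a piecewise-linear path, the truncated signature can be computed exactly by multiplying truncated tensor exponentials of the increments, using Chen's identity: the signature of a concatenation of two paths is the tensor-algebra product of their signatures. A minimal NumPy sketch with illustrative function names and truncation level:

```python
import numpy as np

def tensor_exp(delta, M):
    # Truncated tensor exponential: the level-m term is delta^{(x) m} / m!,
    # which is the signature of a single linear segment with increment delta.
    levels = [np.array(1.0)]
    term = np.array(1.0)
    for m in range(1, M + 1):
        term = np.multiply.outer(term, delta) / m
        levels.append(term)
    return levels

def signature(x, M=3):
    # Truncated signature (Phi_0, ..., Phi_M) of the piecewise-linear path
    # through the rows of x (shape (L, d)), folded together with Chen's
    # identity: Sig(concatenation) = Sig(first) (x) Sig(second).
    d = x.shape[1]
    S = [np.array(1.0)] + [np.zeros((d,) * m) for m in range(1, M + 1)]
    for delta in np.diff(x, axis=0):
        E = tensor_exp(delta, M)
        S = [sum(np.multiply.outer(S[a], E[m - a]) for a in range(m + 1))
             for m in range(M + 1)]
    return S

# Example: the first signature level is the total increment x_T - x_0.
x = np.cumsum(np.random.randn(10, 2), axis=0)
x = x - x[0]                       # paths are pinned at x_0 = 0
sig = signature(x, M=3)
assert np.allclose(sig[1], x[-1] - x[0])
```

The final assertion checks the standard fact that $\Phi_1(x) = \int_0^T \dot{x}_t \, \mathrm{d}t = x_T - x_0$.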
