
Kernel Design. GP Summer School, Sheffield, September 2016. Nicolas Durrande

  1. Kernel Design. GP Summer School, Sheffield, September 2016. Nicolas Durrande, Mines St-Étienne, durrande@emse.fr

  2. Outline: Introduction; What is a kernel?; Choosing the appropriate kernel; Making new from old; Effect of linear operators; Application: periodicity detection; Conclusion

  3. Outline (current section: Introduction)

  4. We have seen during the introduction lectures that the distribution of a GP Z depends on two functions: the mean $m(x) = \mathrm{E}(Z(x))$ and the covariance $k(x, x') = \mathrm{cov}(Z(x), Z(x'))$. In this talk, we will focus on the covariance function, which is often called the kernel.

  5. We assume we have observed a function f for a limited number of time points $x_1, \dots, x_n$: [Figure: observations f(x) for x ∈ [0, 1]]. The observations are denoted by $f_i = f(x_i)$ (or $F = f(X)$).

  6. Since f is unknown, we make the general assumption that it is a sample path of a Gaussian process Z: [Figure: sample paths of Z(x) for x ∈ [0, 1]]
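A minimal sketch of drawing such sample paths, assuming a zero-mean GP and a squared exponential kernel with σ² = 1 and θ = 0.2 (arbitrary illustrative values). The helper names `squared_exp` and `sample_gp_prior` are hypothetical, not taken from any library.

```python
import numpy as np


def squared_exp(x, y, sigma2=1.0, theta=0.2):
    """Squared exponential kernel matrix k(x_i, y_j) for 1-D input arrays."""
    d = x[:, None] - y[None, :]
    return sigma2 * np.exp(-d ** 2 / (2 * theta ** 2))


def sample_gp_prior(x, kernel, n_samples=5, seed=0):
    """Draw n_samples sample paths of a zero-mean GP evaluated at the points x."""
    K = kernel(x, x)
    # Factorise K = L L^T with an eigendecomposition instead of a Cholesky:
    # squared exponential Gram matrices are numerically singular, and tiny
    # negative eigenvalues caused by rounding are clipped to zero.
    eigval, eigvec = np.linalg.eigh(K)
    L = eigvec * np.sqrt(np.clip(eigval, 0.0, None))
    rng = np.random.default_rng(seed)
    # If eps ~ N(0, I), then L @ eps ~ N(0, L L^T) = N(0, K).
    return L @ rng.standard_normal((len(x), n_samples))


x = np.linspace(0.0, 1.0, 101)
paths = sample_gp_prior(x, squared_exp)  # one column per sample path
```

Each column of `paths` can be plotted against `x` to obtain a figure like the one on this slide.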

  7. Combining these two pieces of information means keeping only the samples that interpolate the data points: [Figure: sample paths of Z(x) | Z(X) = F for x ∈ [0, 1]]

  8. The conditional distribution is still Gaussian, with moments:
      $m(x) = \mathrm{E}(Z(x) \mid Z(X) = F) = k(x, X)\, k(X, X)^{-1} F$
      $c(x, x') = \mathrm{cov}(Z(x), Z(x') \mid Z(X) = F) = k(x, x') - k(x, X)\, k(X, X)^{-1} k(X, x')$
      It can be represented as a mean function with confidence intervals. [Figure: conditional mean and confidence intervals of Z(x) | Z(X) = F for x ∈ [0, 1]]
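These two formulas can be sketched in a few lines of NumPy, assuming a zero prior mean (as the formulas do) and a squared exponential kernel. The observation points `X` and values `F` are made-up illustrative numbers, and `gp_condition` is a hypothetical helper name. Linear systems are solved with `np.linalg.solve` rather than forming $k(X, X)^{-1}$ explicitly, which is cheaper and numerically safer.

```python
import numpy as np


def squared_exp(x, y, sigma2=1.0, theta=0.2):
    """Squared exponential kernel matrix k(x_i, y_j) for 1-D input arrays."""
    d = x[:, None] - y[None, :]
    return sigma2 * np.exp(-d ** 2 / (2 * theta ** 2))


def gp_condition(x, X, F, kernel):
    """Conditional mean m(x) and covariance c(x, x') of a zero-mean GP given Z(X) = F."""
    kxX = kernel(x, X)
    kXX = kernel(X, X)
    mean = kxX @ np.linalg.solve(kXX, F)                    # k(x, X) k(X, X)^{-1} F
    cov = kernel(x, x) - kxX @ np.linalg.solve(kXX, kxX.T)  # k(x, x') - k(x, X) k(X, X)^{-1} k(X, x')
    return mean, cov


X = np.array([0.1, 0.3, 0.5, 0.8])   # hypothetical observation points
F = np.array([0.5, -0.2, 1.0, 0.3])  # hypothetical observed values
x = np.linspace(0.0, 1.0, 101)
m, c = gp_condition(x, X, F, squared_exp)
sd = np.sqrt(np.clip(np.diag(c), 0.0, None))  # clip rounding noise below zero
lower, upper = m - 1.96 * sd, m + 1.96 * sd    # approximate 95% pointwise intervals
```

At the observation points the conditional mean equals F and the conditional variance vanishes, which is why the conditioned sample paths interpolate the data.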

  9. Outline (current section: What is a kernel?)

  10. Let Z be a random process with kernel k. Some properties of kernels can be obtained directly from their definition.
      Example:
      $k(x, x) = \mathrm{cov}(Z(x), Z(x)) = \mathrm{var}(Z(x)) \geq 0 \Rightarrow k(x, x)$ is non-negative.
      $k(x, y) = \mathrm{cov}(Z(x), Z(y)) = \mathrm{cov}(Z(y), Z(x)) = k(y, x) \Rightarrow k$ is symmetric.
      We can obtain a finer result...

  11. We introduce the random variable $T = \sum_{i=1}^{n} a_i Z(x_i)$, where $n$, $a_i$ and $x_i$ are arbitrary. Computing the variance of T gives:
      $\mathrm{var}(T) = \mathrm{cov}\left(\sum_i a_i Z(x_i), \sum_j a_j Z(x_j)\right) = \sum_i \sum_j a_i a_j \, \mathrm{cov}(Z(x_i), Z(x_j)) = \sum_i \sum_j a_i a_j k(x_i, x_j)$
      Since a variance is non-negative, we have
      $\sum_i \sum_j a_i a_j k(x_i, x_j) \geq 0$
      for any arbitrary $n$, $a_i$ and $x_i$.
      Definition: the functions satisfying the above inequality for all $n \in \mathbb{N}$, all $x_i \in D$ and all $a_i \in \mathbb{R}$ are called positive semi-definite functions.
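The inequality can be probed numerically: the sketch below evaluates $\sum_i \sum_j a_i a_j k(x_i, x_j)$ for random choices of $n$, $a_i$ and $x_i$. Such a search can only expose a counterexample; it cannot prove positive semi-definiteness. The Brownian kernel $\min(x, y)$ on $[0, 1]$ and the non-valid function $(x - y)^2$ are illustrative choices, and `quadratic_form` is a hypothetical helper.

```python
import numpy as np


def quadratic_form(kernel, x, a):
    """Return sum_i sum_j a_i a_j k(x_i, x_j), i.e. var(T) when k is a covariance."""
    K = kernel(x[:, None], x[None, :])
    return a @ K @ a


def brownian(x, y):
    return np.minimum(x, y)


def squared_distance(x, y):
    # Symmetric and non-negative, yet NOT positive semi-definite.
    return (x - y) ** 2


rng = np.random.default_rng(0)
smallest = np.inf
for _ in range(1000):
    n = int(rng.integers(1, 10))       # arbitrary n
    x = rng.uniform(0.0, 1.0, size=n)  # arbitrary x_i in D = [0, 1]
    a = rng.normal(size=n)             # arbitrary a_i
    smallest = min(smallest, quadratic_form(brownian, x, a))

# A single choice of points and weights is enough to disprove psd.
counterexample = quadratic_form(squared_distance, np.array([0.0, 1.0]), np.array([1.0, -1.0]))
print(smallest >= -1e-12, counterexample)  # True -2.0
```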

  12. We have just seen: k is a covariance ⇒ k is a positive semi-definite function. The reverse is also true:
      Theorem (Loève): k corresponds to the covariance of a GP ⇔ k is a symmetric positive semi-definite function.

  13. Proving that a function is psd is often intractable. However, many functions have already been proven to be psd:
      squared exp.: $k(x, y) = \sigma^2 \exp\left(-\frac{(x - y)^2}{2\theta^2}\right)$
      Matérn 5/2: $k(x, y) = \sigma^2 \left(1 + \frac{\sqrt{5}|x - y|}{\theta} + \frac{5|x - y|^2}{3\theta^2}\right) \exp\left(-\frac{\sqrt{5}|x - y|}{\theta}\right)$
      Matérn 3/2: $k(x, y) = \sigma^2 \left(1 + \frac{\sqrt{3}|x - y|}{\theta}\right) \exp\left(-\frac{\sqrt{3}|x - y|}{\theta}\right)$
      exponential: $k(x, y) = \sigma^2 \exp\left(-\frac{|x - y|}{\theta}\right)$
      Brownian: $k(x, y) = \sigma^2 \min(x, y)$
      white noise: $k(x, y) = \sigma^2 \delta_{x,y}$
      constant: $k(x, y) = \sigma^2$
      linear: $k(x, y) = \sigma^2 xy$
      When k is a function of x − y, the kernel is called stationary. $\sigma^2$ is called the variance and $\theta$ the lengthscale.
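A minimal NumPy transcription of the list of kernels above, for one-dimensional inputs; the function names and default parameter values are illustrative assumptions. Evaluating each kernel on a grid and checking that the smallest eigenvalue of the Gram matrix is non-negative (up to rounding) is a quick sanity check, not a proof of positive semi-definiteness.

```python
import numpy as np

# Each kernel takes broadcastable arrays x, y and returns k(x, y) elementwise.


def squared_exp(x, y, sigma2=1.0, theta=0.2):
    return sigma2 * np.exp(-(x - y) ** 2 / (2 * theta ** 2))


def matern52(x, y, sigma2=1.0, theta=0.2):
    r = np.sqrt(5) * np.abs(x - y) / theta
    return sigma2 * (1 + r + r ** 2 / 3) * np.exp(-r)  # r^2 / 3 = 5|x-y|^2 / (3 theta^2)


def matern32(x, y, sigma2=1.0, theta=0.2):
    r = np.sqrt(3) * np.abs(x - y) / theta
    return sigma2 * (1 + r) * np.exp(-r)


def exponential(x, y, sigma2=1.0, theta=0.2):
    return sigma2 * np.exp(-np.abs(x - y) / theta)


def brownian(x, y, sigma2=1.0):
    return sigma2 * np.minimum(x, y)  # defined for x, y >= 0


def white_noise(x, y, sigma2=1.0):
    return sigma2 * np.asarray(x == y, dtype=float)


def constant(x, y, sigma2=1.0):
    return sigma2 * np.ones(np.broadcast(x, y).shape)


def linear(x, y, sigma2=1.0):
    return sigma2 * x * y


KERNELS = [squared_exp, matern52, matern32, exponential,
           brownian, white_noise, constant, linear]

x = np.linspace(0.01, 1.0, 30)
X, Y = x[:, None], x[None, :]
min_eig = {k.__name__: np.linalg.eigvalsh(k(X, Y)).min() for k in KERNELS}
for name, value in min_eig.items():
    print(f"{name:12s} smallest eigenvalue {value: .2e}")
```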

  15. If k is stationary, that is $k(x, y) = \tilde{k}(x - y)$, being psd implies further results:
      Properties: if $\tilde{k}$ is n times differentiable at 0, then it is n times differentiable everywhere. The maximum value of $\tilde{k}(t)$ is reached at $t = 0$.
      Example: the following functions are not valid covariance structures: [Figure: three plots of candidate functions K(t) against t]
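A short argument for the second property, with a numerical check: for two points at distance $t$, the Gram matrix of a stationary kernel is $\begin{pmatrix} \tilde{k}(0) & \tilde{k}(t) \\ \tilde{k}(t) & \tilde{k}(0) \end{pmatrix}$, whose eigenvalues are $\tilde{k}(0) \pm \tilde{k}(t)$, so psd forces $|\tilde{k}(t)| \leq \tilde{k}(0)$. The function `bump` below, which peaks at $|t| = 1$, is a hypothetical candidate chosen for illustration; it is not one of the functions plotted on the slide.

```python
import numpy as np


def two_point_eigenvalues(k_tilde, t):
    """Eigenvalues of the Gram matrix of k(x, y) = k_tilde(x - y) at the points 0 and t."""
    K = np.array([[k_tilde(0.0), k_tilde(t)],
                  [k_tilde(t), k_tilde(0.0)]])
    return np.linalg.eigvalsh(K)  # ascending: k_tilde(0) - |k_tilde(t)|, k_tilde(0) + |k_tilde(t)|


def gaussian(t):
    return np.exp(-t ** 2 / 2)  # maximum at t = 0: a valid kernel


def bump(t):
    return np.exp(-(np.abs(t) - 1.0) ** 2)  # maximum at |t| = 1: not valid


print(two_point_eigenvalues(gaussian, 1.0))  # both non-negative
print(two_point_eigenvalues(bump, 1.0))      # smallest is exp(-1) - 1 < 0
```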

  16. For a few kernels, it is possible to prove they are psd directly from the definition, for example $k(x, y) = \delta_{x,y}$ and $k(x, y) = 1$. For most of them, a direct proof from the definition is not possible. The following theorem is helpful for stationary kernels:
      Theorem (Bochner): a continuous stationary function $k(x, y) = \tilde{k}(|x - y|)$ is positive semi-definite if and only if $\tilde{k}$ is the Fourier transform of a finite positive measure:
      $\tilde{k}(t) = \int_{\mathbb{R}} e^{-i \omega t} \, d\mu(\omega)$

  17. Example: we consider the following measure: [Figure: the measure μ(ω)]. Its Fourier transform gives $\tilde{k}(t) = \frac{\sin(t)}{t}$: [Figure: plot of $\tilde{k}(t)$]. As a consequence, $k(x, y) = \frac{\sin(x - y)}{x - y}$ is a valid covariance function.
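The measure itself is only shown graphically. The sketch below assumes it is uniform on $[-1, 1]$ with density $1/2$, one measure whose Fourier transform is exactly $\sin(t)/t$, since $\int_{-1}^{1} \tfrac{1}{2} e^{-i\omega t} \, d\omega = \sin(t)/t$. It approximates the integral with the trapezoidal rule, compares it with the closed form, and checks that a Gram matrix of the resulting kernel has no significantly negative eigenvalue. The helper names are hypothetical.

```python
import numpy as np


def k_tilde_numeric(t, n=20001):
    """Trapezoidal approximation of int e^{-i w t} dmu(w), mu uniform on [-1, 1] with density 1/2."""
    w = np.linspace(-1.0, 1.0, n)
    vals = 0.5 * np.exp(-1j * np.outer(t, w))
    dw = w[1] - w[0]
    integral = (vals.sum(axis=1) - 0.5 * (vals[:, 0] + vals[:, -1])) * dw
    return integral.real  # the imaginary part vanishes because mu is symmetric


def sinc_kernel(x, y):
    """k(x, y) = sin(x - y) / (x - y), with value 1 at x = y."""
    d = x[:, None] - y[None, :]
    return np.sinc(d / np.pi)  # np.sinc(u) = sin(pi u) / (pi u)


t = np.array([0.5, 2.0, 7.0])
print(np.abs(k_tilde_numeric(t) - np.sin(t) / t).max())  # small: trapezoidal error only

x = np.linspace(0.0, 10.0, 50)
print(np.linalg.eigvalsh(sinc_kernel(x, x)).min())       # non-negative up to rounding
```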

  18. Usual kernels: Bochner's theorem can be used to prove the positive semi-definiteness of many usual stationary kernels. The Gaussian is (up to scaling) the Fourier transform of itself ⇒ it is psd. Matérn kernels are (up to scaling) the Fourier transforms of $\frac{1}{(1 + \omega^2)^p}$ ⇒ they are psd.
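A numerical illustration of the Gaussian case, using the $e^{-i\omega t}$ convention of Bochner's theorem: taking $\mu$ with the standard normal density $\frac{1}{\sqrt{2\pi}} e^{-\omega^2/2}$ gives back $\tilde{k}(t) = e^{-t^2/2}$, a Gaussian again. The integration range and grid size are arbitrary choices; the integrand is negligible beyond $|\omega| = 12$. `fourier_of_density` is a hypothetical helper name.

```python
import numpy as np


def fourier_of_density(density, t, w_max=12.0, n=6001):
    """Trapezoidal approximation of k(t) = int e^{-i w t} density(w) dw over [-w_max, w_max]."""
    w = np.linspace(-w_max, w_max, n)
    vals = np.exp(-1j * np.outer(t, w)) * density(w)
    dw = w[1] - w[0]
    integral = (vals.sum(axis=1) - 0.5 * (vals[:, 0] + vals[:, -1])) * dw
    return integral.real


def gaussian_density(w):
    return np.exp(-w ** 2 / 2) / np.sqrt(2 * np.pi)


t = np.linspace(0.0, 4.0, 9)
k_numeric = fourier_of_density(gaussian_density, t)
k_exact = np.exp(-t ** 2 / 2)  # the transform of a Gaussian is a Gaussian
print(np.abs(k_numeric - k_exact).max())
```

The same function can be applied to other spectral densities. For instance, the Cauchy density $1/(\pi(1 + \omega^2))$ (the case $p = 1$) transforms into the exponential kernel $e^{-|t|}$, but its heavy tails require a much wider integration range.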

  19. Unusual kernels: the inverse Fourier transform of a (symmetrised) sum of Gaussians gives (A. Wilson, ICML 2013): [Figure: spectral measure μ(ω) and the resulting kernel $\tilde{k}(t)$, linked by the Fourier transform $\mathcal{F}$]. The obtained kernel is parametrised by its spectrum.
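A sketch of such a kernel in one dimension, assuming the spectral measure is a symmetrised mixture of Gaussians with weights $w_q$, means $\pm\mu_q$ and variances $v_q$. With the $e^{-i\omega t}$ convention of Bochner's theorem, each symmetrised component transforms into $\exp(-v_q t^2/2)\cos(\mu_q t)$, so $\tilde{k}(t) = \sum_q w_q \exp(-v_q t^2/2)\cos(\mu_q t)$. The parameter values are arbitrary; the paper cited on the slide writes the same family with frequencies in cycles rather than radians, so its formulas carry extra $2\pi$ factors.

```python
import numpy as np

WEIGHTS = np.array([1.0, 0.5])    # w_q  (illustrative values)
MEANS = np.array([3.0, 10.0])     # mu_q (angular frequencies)
VARIANCES = np.array([0.5, 2.0])  # v_q


def spectral_mixture(t, w=WEIGHTS, mu=MEANS, v=VARIANCES):
    """k(t) = sum_q w_q exp(-v_q t^2 / 2) cos(mu_q t)."""
    t = np.asarray(t, dtype=float)[..., None]  # trailing axis runs over components q
    return np.sum(w * np.exp(-v * t ** 2 / 2) * np.cos(mu * t), axis=-1)


def spectral_density(om, w=WEIGHTS, mu=MEANS, v=VARIANCES):
    """Symmetrised mixture: sum_q w_q [N(om; mu_q, v_q) + N(om; -mu_q, v_q)] / 2."""
    om = np.asarray(om, dtype=float)[..., None]
    gauss = lambda m: np.exp(-(om - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
    return np.sum(w * 0.5 * (gauss(mu) + gauss(-mu)), axis=-1)


# Check the closed form against a trapezoidal Fourier transform of the spectrum.
om = np.linspace(-30.0, 30.0, 12001)
t = np.linspace(0.0, 5.0, 11)
vals = np.exp(-1j * np.outer(t, om)) * spectral_density(om)
d_om = om[1] - om[0]
k_numeric = ((vals.sum(axis=1) - 0.5 * (vals[:, 0] + vals[:, -1])) * d_om).real
print(np.abs(k_numeric - spectral_mixture(t)).max())

# Gram matrix and a few sample paths on [0, 5].
x = np.linspace(0.0, 5.0, 200)
K = spectral_mixture(x[:, None] - x[None, :])
eigval, eigvec = np.linalg.eigh(K)
L = eigvec * np.sqrt(np.clip(eigval, 0.0, None))  # K = L L^T, rounding noise clipped
paths = L @ np.random.default_rng(0).standard_normal((len(x), 3))
```

The product of a Gaussian envelope and a cosine is what makes the sample paths quasi-periodic, as on slide 20.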

  20. Unusual kernels: the sample paths have the following shape: [Figure: sample paths for x ∈ [0, 5]]

  21. Outline (current section: Choosing the appropriate kernel)

  22. Changing the kernel has a huge impact on the model: [Figure: models obtained with a Gaussian kernel and with an exponential kernel]

  23. This is because changing the kernel implies changing the prior: [Figure: prior sample paths with a Gaussian kernel and with an exponential kernel]
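One way to quantify this change of prior, assuming zero-mean stationary GPs with $\sigma^2 = 1$ and $\theta = 0.2$ (illustrative values), is the mean-square increment $\mathrm{E}[(Z(x + h) - Z(x))^2] = \mathrm{var}(Z(x + h)) + \mathrm{var}(Z(x)) - 2\,\mathrm{cov}(Z(x + h), Z(x)) = 2(\tilde{k}(0) - \tilde{k}(h))$. For the Gaussian kernel it shrinks like $h^2$ (smooth sample paths); for the exponential kernel only like $h$ (continuous but rough sample paths).

```python
import numpy as np

SIGMA2, THETA = 1.0, 0.2  # illustrative variance and lengthscale


def gaussian_k(t):
    return SIGMA2 * np.exp(-t ** 2 / (2 * THETA ** 2))


def exponential_k(t):
    return SIGMA2 * np.exp(-np.abs(t) / THETA)


def mean_square_increment(k_tilde, h):
    """E[(Z(x + h) - Z(x))^2] = 2 (k(0) - k(h)) for a zero-mean stationary GP."""
    return 2.0 * (k_tilde(0.0) - k_tilde(h))


for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    g = mean_square_increment(gaussian_k, h)
    e = mean_square_increment(exponential_k, h)
    # g / h^2 tends to SIGMA2 / THETA^2 = 25 and e / h tends to 2 SIGMA2 / THETA = 10
    print(f"h={h:g}  gaussian/h^2={g / h ** 2:.4f}  exponential/h={e / h:.4f}")
```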
