More on kernels
Marcel Lüthi
Graphics and Vision Research Group
Department of Mathematics and Computer Science, University of Basel
Kernels everywhere
• Integral and differential equations
  • Aronszajn, Nachman. "Theory of reproducing kernels." Transactions of the American Mathematical Society 68 (1950): 337-404.
• Numerical analysis, approximation and interpolation theory
  • Wahba, Grace. Spline Models for Observational Data. Vol. 59. SIAM, 1990.
  • Schaback, Robert, and Holger Wendland. "Kernel techniques: From machine learning to meshless methods." Acta Numerica 15 (2006): 543-639.
  • Hennig, Philipp, and Michael A. Osborne. Probabilistic numerics.
• Geostatistics (Gaussian processes)
  • Stein, Michael L. Interpolation of Spatial Data: Some Theory for Kriging. Springer Science & Business Media, 1999.
Kernels everywhere
• Learning theory / machine learning
  • Vapnik, Vladimir. Statistical Learning Theory. Vol. 1. New York: Wiley, 1998.
  • Hofmann, Thomas, Bernhard Schölkopf, and Alexander J. Smola. "Kernel methods in machine learning." The Annals of Statistics 36.3 (2008): 1171-1220.
• Shape modelling / image analysis
  • Grenander, Ulf, and Michael I. Miller. "Computational anatomy: An emerging discipline." Quarterly of Applied Mathematics 56.4 (1998): 617-694.
  • Younes, Laurent. Shapes and Diffeomorphisms. Springer, 2010.
What do they have in common?
• The solution space has a rich structure, so that we are able to:
  • predict unseen values,
  • deal with noisy or incomplete data,
  • capture a pattern.
• Kernels are ideally suited to define such structure.
• The resulting space of functions is mathematically “nice”.
(Figure: the fields in which kernels appear: machine learning, image analysis, statistics, differential equations, numerics.)
Back to basics: Scalar-valued GPs
• Vector-valued (this course): samples u are deformation fields u: 𝒳 → ℝ^d
• Scalar-valued (more common): samples f are real-valued functions f: 𝒳 → ℝ
Scalar-valued Gaussian processes
• Vector-valued (this course): u ∼ GP(μ, 𝐤), with mean μ: 𝒳 → ℝ^d and kernel 𝐤: 𝒳 × 𝒳 → ℝ^{d×d}
• Scalar-valued (more common): f ∼ GP(μ, k), with mean μ: 𝒳 → ℝ and kernel k: 𝒳 × 𝒳 → ℝ
A connection
Matrix-valued kernels can be reinterpreted as scalar-valued kernels:
• Matrix-valued kernel: 𝐤: 𝒳 × 𝒳 → ℝ^{d×d}
• Scalar-valued kernel: k: (𝒳 × {1, …, d}) × (𝒳 × {1, …, d}) → ℝ
• Bijection: define k((x, i), (x′, j)) = 𝐤(x, x′)_{ij}
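A minimal sketch of this bijection in Python (the 2×2 kernel below is a made-up example, not one from the slides):

```python
import numpy as np

# Hypothetical 2x2 matrix-valued kernel on the real line: a squared
# exponential, scaled differently in each output dimension.
def k_mat(x, xp):
    return np.exp(-(x - xp) ** 2) * np.diag([1.0, 0.5])

# The scalar-valued view: each input is a pair (point, output index),
# and k((x, i), (x', j)) is entry (i, j) of the matrix k_mat(x, x').
def k_scal(xi, xpj):
    (x, i), (xp, j) = xi, xpj
    return k_mat(x, xp)[i, j]

# Example: k_scal((0.0, 0), (1.0, 1)) == k_mat(0.0, 1.0)[0, 1]
```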
Vector/scalar-valued kernel matrices
For d = 2 and points x₁, …, xₙ, the two views yield the same kernel matrix:

𝑲 = ⎡ k₁₁(x₁,x₁)  k₁₂(x₁,x₁)  ⋯  k₁₁(x₁,xₙ)  k₁₂(x₁,xₙ) ⎤
    ⎢ k₂₁(x₁,x₁)  k₂₂(x₁,x₁)  ⋯  k₂₁(x₁,xₙ)  k₂₂(x₁,xₙ) ⎥
    ⎢      ⋮                          ⋮                  ⎥
    ⎢ k₁₁(xₙ,x₁)  k₁₂(xₙ,x₁)  ⋯  k₁₁(xₙ,xₙ)  k₁₂(xₙ,xₙ) ⎥
    ⎣ k₂₁(xₙ,x₁)  k₂₂(xₙ,x₁)  ⋯  k₂₁(xₙ,xₙ)  k₂₂(xₙ,xₙ) ⎦

K = ⎡ k((x₁,1),(x₁,1))  k((x₁,1),(x₁,2))  ⋯  k((x₁,1),(xₙ,1))  k((x₁,1),(xₙ,2)) ⎤
    ⎢ k((x₁,2),(x₁,1))  k((x₁,2),(x₁,2))  ⋯  k((x₁,2),(xₙ,1))  k((x₁,2),(xₙ,2)) ⎥
    ⎢        ⋮                                      ⋮                            ⎥
    ⎢ k((xₙ,1),(x₁,1))  k((xₙ,1),(x₁,2))  ⋯  k((xₙ,1),(xₙ,1))  k((xₙ,1),(xₙ,2)) ⎥
    ⎣ k((xₙ,2),(x₁,1))  k((xₙ,2),(x₁,2))  ⋯  k((xₙ,2),(xₙ,1))  k((xₙ,2),(xₙ,2)) ⎦
A connection
With the identification k((x, i), (x′, j)) = 𝐤(x, x′)_{ij}, all the theory developed for scalar-valued GPs also holds for vector-valued GPs!
The sampling space
The space of samples
Sampling from GP(μ, k) is done using the corresponding normal distribution N(μ, K).
Algorithm (slightly inefficient):
1. Compute an SVD: K = U D² Uᵀ
2. Draw a normal vector α ∼ N(0, I_{n×n})
3. Compute μ + U D α
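A minimal numpy sketch of this algorithm (the function name and the use of numpy.linalg.svd are my choices):

```python
import numpy as np

def sample_gp(mu, K, rng=None):
    """Draw one sample from N(mu, K) via the SVD route above."""
    rng = np.random.default_rng() if rng is None else rng
    U, s, _ = np.linalg.svd(K)            # K = U diag(s) U^T, so D = diag(sqrt(s))
    alpha = rng.standard_normal(len(mu))  # alpha ~ N(0, I)
    return mu + U @ (np.sqrt(s) * alpha)  # mu + U D alpha
```

In practice a Cholesky factorisation of K would be the cheaper route; the SVD variant mirrors the slide and leads to the next observation.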
The space of samples
• From K = U D² Uᵀ (using Uᵀ U = I) we have K U D⁻¹ = U D.
• A sample s = μ + U D α = μ + K U D⁻¹ α therefore corresponds to a linear combination of the columns of K.
• K is symmetric → rows/columns can be used interchangeably.
Example: Squared exponential
k(x, x′) = exp(−‖x − x′‖² / σ²)
(Figures: samples for σ = 1 and σ = 3.)
Multi-scale signals
k(x, x′) = exp(−‖x − x′‖² / 1) + 0.1 exp(−‖x − x′‖² / 0.1)
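A sketch of these kernels in code (the factory-function style is my own):

```python
import numpy as np

def squared_exponential(s2):
    """Returns the kernel k(x, x') = exp(-(x - x')**2 / s2)."""
    return lambda x, xp: np.exp(-((x - xp) ** 2) / s2)

# Multi-scale kernel: a smooth large-scale component plus a small,
# rapidly varying one.
def k_multiscale(x, xp):
    return squared_exponential(1.0)(x, xp) + 0.1 * squared_exponential(0.1)(x, xp)
```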
Periodic kernels
• Define u(x) = (cos(x), sin(x))ᵀ
• k(x, x′) = exp(−‖u(x) − u(x′)‖² / σ²) = exp(−4 sin²((x − x′)/2) / σ²)
Symmetric kernels
• Enforce that f(x) = f(−x)
• k_sym(x, x′) = k(−x, x′) + k(x, x′)
Changepoint kernels
• k(x, x′) = s(x) k₁(x, x′) s(x′) + (1 − s(x)) k₂(x, x′) (1 − s(x′))
• s(x) = 1 / (1 + exp(−x))
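Minimal sketches of the three constructions above, all built on a squared exponential base kernel (that choice and the parameter defaults are mine):

```python
import numpy as np

def se(x, xp, s2=1.0):
    return np.exp(-((x - xp) ** 2) / s2)

def k_periodic(x, xp, s2=1.0):
    # exp(-||u(x) - u(x')||^2 / s2) with u(x) = (cos x, sin x),
    # which simplifies to exp(-4 sin^2((x - x')/2) / s2).
    return np.exp(-4.0 * np.sin((x - xp) / 2.0) ** 2 / s2)

def k_symmetric(x, xp):
    # Samples satisfy f(x) = f(-x).
    return se(-x, xp) + se(x, xp)

def k_changepoint(x, xp, k1=se, k2=k_periodic):
    # The sigmoid s blends k1 (active where s ~ 1, i.e. x >> 0)
    # into k2 (active where s ~ 0, i.e. x << 0).
    s = lambda t: 1.0 / (1.0 + np.exp(-t))
    return s(x) * k1(x, xp) * s(xp) + (1 - s(x)) * k2(x, xp) * (1 - s(xp))
```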
Combining existing functions
• k(x, x′) = f(x) f(x′), with f(x) = x
• k(x, x′) = f(x) f(x′), with f(x) = sin(x)
• k(x, x′) = Σᵢ fᵢ(x) fᵢ(x′), with {f₁(x) = x, f₂(x) = sin(x)}
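A sketch of this construction (names are mine); samples from a GP with such a degenerate kernel are linear combinations of the fᵢ:

```python
import numpy as np

def kernel_from_functions(fs):
    # k(x, x') = sum_i f_i(x) f_i(x')
    return lambda x, xp: sum(f(x) * f(xp) for f in fs)

k_lin_sin = kernel_from_functions([lambda x: x, np.sin])
```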
Reproducing Kernel Hilbert Space
• Define the space of functions
  H = { f | f(x) = Σᵢ₌₁ⁿ αᵢ k(x, xᵢ), n ∈ ℕ, xᵢ ∈ 𝒳, αᵢ ∈ ℝ }
• For f(x) = Σᵢ αᵢ k(xᵢ, x) and g(x) = Σⱼ α′ⱼ k(xⱼ, x) we define the inner product
  ⟨f, g⟩ = Σᵢ,ⱼ αᵢ α′ⱼ k(xᵢ, xⱼ)
• The space H (more precisely, its completion) is called a Reproducing Kernel Hilbert Space (RKHS).
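A sketch of this inner product for two finite expansions (function and argument names are mine):

```python
def rkhs_inner(alpha_f, xs_f, alpha_g, xs_g, k):
    # <f, g> = sum_{i,j} alpha_i alpha'_j k(x_i, x_j) for
    # f = sum_i alpha_i k(x_i, .) and g = sum_j alpha'_j k(x_j, .)
    return sum(a * b * k(xi, xj)
               for a, xi in zip(alpha_f, xs_f)
               for b, xj in zip(alpha_g, xs_g))
```

Taking g = k(x, ·), i.e. a single coefficient α′₁ = 1, gives ⟨f, k(x, ·)⟩ = Σᵢ αᵢ k(xᵢ, x) = f(x), the reproducing property that gives the space its name.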
Two different bases for the RKHS
k(x, x′) = exp(−‖x − x′‖² / 9)
• Kernel basis
• Eigenbasis (KL basis)
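Both bases can be read off a kernel matrix evaluated on a grid; a small sketch (the grid is my own choice):

```python
import numpy as np

xs = np.linspace(-5.0, 5.0, 100)
K = np.exp(-((xs[:, None] - xs[None, :]) ** 2) / 9.0)

kernel_basis = K                   # column i is k(., x_i) on the grid
evals, evecs = np.linalg.eigh(K)   # columns of evecs: eigenbasis (KL basis)
```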
Gaussian process regression
Gaussian process regression
• Given: observations {(x₁, y₁), …, (xₙ, yₙ)}
• Goal: compute p(y∗ | x∗, x₁, …, xₙ, y₁, …, yₙ)
(Figure: observed values at x₁, x₂, …, xₙ and the unknown value y∗ at a new point x∗.)
Gaussian process regression
• The solution is given by the posterior process GP(μₚ, kₚ) with
  μₚ(x∗) = K(x∗, X)(K(X, X) + σ²I)⁻¹ y
  kₚ(x∗, x∗′) = k(x∗, x∗′) − K(x∗, X)(K(X, X) + σ²I)⁻¹ K(X, x∗′)
• We can sample from the posterior.
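A minimal numpy sketch of these formulas (function and variable names are my own):

```python
import numpy as np

def gp_posterior(X, y, Xs, k, sigma=0.1):
    """Posterior mean at the points Xs, and posterior covariance."""
    K_XX = np.array([[k(a, b) for b in X] for a in X])
    K_sX = np.array([[k(a, b) for b in X] for a in Xs])   # K(x*, X)
    K_ss = np.array([[k(a, b) for b in Xs] for a in Xs])  # k(x*, x*')
    A = K_XX + sigma ** 2 * np.eye(len(X))                # K(X, X) + sigma^2 I
    mu_p = K_sX @ np.linalg.solve(A, np.asarray(y, float))
    cov_p = K_ss - K_sX @ np.linalg.solve(A, K_sX.T)
    return mu_p, cov_p
```

Samples from the posterior can then be drawn from N(μₚ, covₚ) exactly as in the sampling algorithm above.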
Examples
(Figures: Gaussian process regression with different kernels.)
• Gaussian kernel (σ = 1)
• Gaussian kernel (σ = 5)
• Periodic kernel
• Changepoint kernel
• Symmetric kernel
• Linear kernel
Observations about the solution
kₚ(x∗, x∗′) = k(x∗, x∗′) − K(x∗, X)(K(X, X) + σ²I)⁻¹ K(X, x∗′)
• The posterior covariance is independent of the observed values y at the training points.
Kernels and associated structures