More on kernels: Marcel Lüthi, Graphics and Vision Research Group

> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE
> More on kernels
> Marcel Lüthi, Graphics and Vision Research Group, Department of Mathematics and Computer Science, University of Basel

> Kernels everywhere
> • Integral and differential equations
>   • Aronszajn, Nachman. "Theory of reproducing kernels." Transactions of the American Mathematical Society (1950): 337-404.
> • Numerical analysis, approximation and interpolation theory
>   • Wahba, Grace. Spline models for observational data. Vol. 59. SIAM, 1990.
>   • Schaback, Robert, and Holger Wendland. "Kernel techniques: From machine learning to meshless methods." Acta Numerica 15 (2006): 543-639.
>   • Hennig, Philipp, and Michael Osborne. Probabilistic numerics.
> • Geostatistics (Gaussian processes)
>   • Stein, Michael L. Interpolation of spatial data: some theory for kriging. Springer Science & Business Media, 1999.

> Kernels everywhere
> • Learning theory / machine learning
>   • Vapnik, Vladimir. Statistical learning theory. Vol. 1. New York: Wiley, 1998.
>   • Hofmann, Thomas, Bernhard Schölkopf, and Alexander J. Smola. "Kernel methods in machine learning." The Annals of Statistics (2008): 1171-1220.
> • Shape modelling / image analysis
>   • Grenander, Ulf, and Michael I. Miller. "Computational anatomy: An emerging discipline." Quarterly of Applied Mathematics 56.4 (1998): 617-694.
>   • Younes, Laurent. Shapes and diffeomorphisms. Springer, 2010.

> What do they have in common?
> • The solution space has a rich structure, needed to be able to:
>   • Predict unseen values
>   • Deal with noisy or incomplete data
>   • Capture a pattern
> • Kernels are ideally suited to define such structure.
> • The resulting space of functions is mathematically "nice".
> [Diagram labels: ML, image analysis, statistics, numerics, differential equations]

> Back to basics: Scalar-valued GPs
> • Vector-valued (this course): samples $u$ are deformation fields, $u: \mathcal{X} \to \mathbb{R}^d$
> • Scalar-valued (more common): samples $f$ are real-valued functions, $f: \mathcal{X} \to \mathbb{R}$

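> Scalar-valued Gaussian processes
> • Vector-valued (this course): $u \sim GP(\vec{\mu}, \mathbf{k})$ with $\vec{\mu}: \mathcal{X} \to \mathbb{R}^d$ and $\mathbf{k}: \mathcal{X} \times \mathcal{X} \to \mathbb{R}^{d \times d}$
> • Scalar-valued (more common): $f \sim GP(\mu, k)$ with $\mu: \mathcal{X} \to \mathbb{R}$ and $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$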

> A connection
> Matrix-valued kernels can be reinterpreted as scalar-valued kernels:
> • Matrix-valued kernel: $\mathbf{k}: \mathcal{X} \times \mathcal{X} \to \mathbb{R}^{d \times d}$
> • Scalar-valued kernel: $k: (\mathcal{X} \times \{1, \ldots, d\}) \times (\mathcal{X} \times \{1, \ldots, d\}) \to \mathbb{R}$
> • Bijection: define $k((x, i), (x', j)) = \mathbf{k}(x, x')_{i,j}$

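> Vector/scalar valued kernel matrices (entries shown for $d = 2$)
>
> Matrix-valued kernel, assembled from $2 \times 2$ blocks:
> $$\mathbf{K} = \begin{pmatrix} k_{11}(x_1, x_1) & k_{12}(x_1, x_1) & \cdots & k_{11}(x_1, x_n) & k_{12}(x_1, x_n) \\ k_{21}(x_1, x_1) & k_{22}(x_1, x_1) & \cdots & k_{21}(x_1, x_n) & k_{22}(x_1, x_n) \\ \vdots & & & & \vdots \\ k_{11}(x_n, x_1) & k_{12}(x_n, x_1) & \cdots & k_{11}(x_n, x_n) & k_{12}(x_n, x_n) \\ k_{21}(x_n, x_1) & k_{22}(x_n, x_1) & \cdots & k_{21}(x_n, x_n) & k_{22}(x_n, x_n) \end{pmatrix}$$
>
> Scalar-valued kernel on (point, component) pairs:
> $$K = \begin{pmatrix} k((x_1, 1), (x_1, 1)) & k((x_1, 1), (x_1, 2)) & \cdots & k((x_1, 1), (x_n, 1)) & k((x_1, 1), (x_n, 2)) \\ k((x_1, 2), (x_1, 1)) & k((x_1, 2), (x_1, 2)) & \cdots & k((x_1, 2), (x_n, 1)) & k((x_1, 2), (x_n, 2)) \\ \vdots & & & & \vdots \\ k((x_n, 1), (x_1, 1)) & k((x_n, 1), (x_1, 2)) & \cdots & k((x_n, 1), (x_n, 1)) & k((x_n, 1), (x_n, 2)) \\ k((x_n, 2), (x_1, 1)) & k((x_n, 2), (x_1, 2)) & \cdots & k((x_n, 2), (x_n, 1)) & k((x_n, 2), (x_n, 2)) \end{pmatrix}$$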

> A connection
> • With the identification $k((x, i), (x', j)) = \mathbf{k}(x, x')_{i,j}$, a matrix-valued kernel $\mathbf{k}: \mathcal{X} \times \mathcal{X} \to \mathbb{R}^{d \times d}$ becomes a scalar-valued kernel on $\mathcal{X} \times \{1, \ldots, d\}$.
> • All the theory developed for scalar-valued GPs also holds for vector-valued GPs!
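
The identification above can be checked numerically. The following is a minimal sketch, not the lecture's implementation: the coupling matrix `A`, the Gaussian base kernel and all function names are assumptions made for illustration, and components are indexed from 0 rather than 1. It assembles the kernel matrix once from $d \times d$ blocks and once entry by entry from the scalar-valued kernel on (point, component) pairs.

```python
import numpy as np

# Hypothetical 2 x 2 positive semi-definite matrix coupling the output components.
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])

def matrix_kernel(x, xp, sigma=1.0):
    """Matrix-valued kernel k(x, x') in R^{d x d}: scalar Gaussian times A."""
    return np.exp(-np.sum((x - xp) ** 2) / sigma**2) * A

def scalar_kernel(x, i, xp, j):
    """Scalar-valued kernel on (point, component) pairs: k((x, i), (x', j))."""
    return matrix_kernel(x, xp)[i, j]

def block_kernel_matrix(points):
    """Assemble the nd x nd matrix from d x d blocks k(x_a, x_b)."""
    n, d = len(points), A.shape[0]
    K = np.zeros((n * d, n * d))
    for a in range(n):
        for b in range(n):
            K[a * d:(a + 1) * d, b * d:(b + 1) * d] = matrix_kernel(points[a], points[b])
    return K

def scalar_kernel_matrix(points):
    """Assemble the same matrix entry by entry from the scalar-valued kernel."""
    d = A.shape[0]
    pairs = [(x, i) for x in points for i in range(d)]
    return np.array([[scalar_kernel(x, i, xp, j) for (xp, j) in pairs]
                     for (x, i) in pairs])

points = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, -1.0]])
K_block = block_kernel_matrix(points)
K_scalar = scalar_kernel_matrix(points)
```

Both constructions give the same $nd \times nd$ matrix, so any routine written for scalar-valued kernel matrices can be reused unchanged.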

> The sampling space

> The space of samples
> Sampling from $GP(\mu, k)$ is done using the corresponding normal distribution $N(\vec{\mu}, K)$.
> Algorithm (slightly inefficient):
> 1. Do an SVD: $K = U D^2 U^T$
> 2. Draw a normal vector $\alpha \sim N(0, I_{n \times n})$
> 3. Compute $\vec{\mu} + U D \alpha$
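
The three steps above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions: the squared exponential kernel, the evaluation grid `xs`, the zero mean and the function names are not taken from the slides. Since `np.linalg.svd` returns the singular values of $K$, which correspond to $D^2$, the code takes their square root to obtain $D$.

```python
import numpy as np

def sq_exp(x, xp, sigma=1.0):
    """Squared exponential kernel (assumed here for illustration)."""
    return np.exp(-(x - xp) ** 2 / sigma**2)

def sample_gp(mu, K, rng):
    # Step 1: SVD of the symmetric positive semi-definite matrix K = U D^2 U^T.
    U, s, _ = np.linalg.svd(K)
    D = np.diag(np.sqrt(s))
    # Step 2: draw alpha ~ N(0, I).
    alpha = rng.standard_normal(K.shape[0])
    # Step 3: the sample is mu + U D alpha.
    return mu + U @ D @ alpha

xs = np.linspace(-5, 5, 50)              # hypothetical evaluation grid
K = sq_exp(xs[:, None], xs[None, :])     # n x n kernel matrix
mu = np.zeros_like(xs)                   # zero mean, an assumption
rng = np.random.default_rng(0)
sample = sample_gp(mu, K, rng)
```

A Cholesky factorisation is the more common and cheaper choice in practice, but it can fail when $K$ is numerically singular, which the SVD route tolerates.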

> The space of samples
> • From $K = U D^2 U^T$ (using that $U^T U = I$) we have that $K U D^{-1} = U D$.
> • A sample $s = \vec{\mu} + U D \alpha = \vec{\mu} + K U D^{-1} \alpha$ is therefore the mean plus a linear combination of the columns of $K$.
> • $K$ is symmetric, so rows and columns can be used interchangeably.
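
Written out step by step, and assuming $D$ is invertible (that is, $K$ is strictly positive definite), the argument reads as follows. The coefficient vector $\beta$ is a name introduced here for illustration.

```latex
\begin{align*}
K U D^{-1} &= U D^2 U^T U D^{-1} = U D^2 D^{-1} = U D, \\
s &= \vec{\mu} + U D \alpha = \vec{\mu} + K \beta,
  \qquad \beta := U D^{-1} \alpha, \\
s(x_j) - \mu(x_j) &= \sum_{i=1}^{n} \beta_i \, k(x_j, x_i).
\end{align*}
```

The last line shows that, on the sampled points, every sample minus the mean is a finite combination of kernel functions $k(\cdot, x_i)$.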

> Example: Squared exponential
> $k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{\sigma^2}\right)$, with $\sigma = 1$

> Example: Squared exponential
> $k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{\sigma^2}\right)$, with $\sigma = 3$
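
To make the role of $\sigma$ concrete, the sketch below evaluates the squared exponential kernel for the two values used in the examples. The function name and the test points are assumptions chosen for illustration.

```python
import numpy as np

def squared_exponential(x, xp, sigma):
    """k(x, x') = exp(-||x - x'||^2 / sigma^2)."""
    x, xp = np.atleast_1d(x), np.atleast_1d(xp)
    return np.exp(-np.sum((x - xp) ** 2) / sigma**2)

# Correlation between the function values at x = 0 and x' = 2 for both sigmas.
corr = {sigma: squared_exponential(0.0, 2.0, sigma) for sigma in (1.0, 3.0)}
```

With $\sigma = 1$ the two points are almost uncorrelated ($e^{-4}$), while with $\sigma = 3$ they remain strongly correlated ($e^{-4/9}$), which is why samples vary more slowly for larger $\sigma$.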

> Multi-scale signals
> • $k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{1}\right) + 0.1 \exp\left(-\frac{\|x - x'\|^2}{0.1}\right)$
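
A multi-scale kernel of this kind is a sum of two squared exponential kernels, and a sum of positive semi-definite kernels is again positive semi-definite. The sketch below checks this numerically. It assumes that the constants 1 and 0.1 divide the squared distance, as in the squared exponential examples; the grid and the names are hypothetical.

```python
import numpy as np

def multi_scale(x, xp):
    """Coarse component (weight 1) plus a weaker fine-scale component (weight 0.1)."""
    r2 = (x - xp) ** 2
    return np.exp(-r2 / 1.0) + 0.1 * np.exp(-r2 / 0.1)

xs = np.linspace(0, 5, 40)                 # hypothetical evaluation grid
K = multi_scale(xs[:, None], xs[None, :])
eigvals = np.linalg.eigvalsh(K)            # eigenvalues of the symmetric matrix K
```

Samples drawn with this kernel show a smooth large-scale trend with small, rapidly varying detail superimposed.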

> Periodic kernels
> • Define $u(x) = \begin{pmatrix} \cos(x) \\ \sin(x) \end{pmatrix}$
> • $k(x, x') = \exp\left(-\frac{\|u(x) - u(x')\|^2}{\sigma^2}\right) = \exp\left(-\frac{4 \sin^2\left(\frac{\|x - x'\|}{2}\right)}{\sigma^2}\right)$
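
The closed form follows from $\|u(x) - u(x')\|^2 = 2 - 2\cos(x - x') = 4\sin^2\left(\frac{x - x'}{2}\right)$. The sketch below compares the kernel obtained through the embedding $u$ with the closed form. The placement of $\sigma^2$ as a divisor of the squared distance is an assumption made for consistency with the squared exponential kernel, and the function names are hypothetical.

```python
import numpy as np

def u(x):
    """Embed a scalar x on the unit circle."""
    return np.array([np.cos(x), np.sin(x)])

def periodic_via_embedding(x, xp, sigma=1.0):
    """Squared exponential kernel applied to the embedded points."""
    return np.exp(-np.sum((u(x) - u(xp)) ** 2) / sigma**2)

def periodic_closed_form(x, xp, sigma=1.0):
    """Same kernel written with the half-angle identity."""
    return np.exp(-4.0 * np.sin((x - xp) / 2.0) ** 2 / sigma**2)

a, b = 0.3, 2.1
same = np.isclose(periodic_via_embedding(a, b), periodic_closed_form(a, b))
shifted = periodic_closed_form(a, b + 2 * np.pi)   # shift by one full period
```

Because the kernel only sees $x$ through $u(x)$, it has period $2\pi$, and so do all samples drawn from the corresponding GP.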

> Symmetric kernels
> • Enforce that $f(x) = f(-x)$
> • $k_{\mathrm{sym}}(x, x') = k(-x, x') + k(x, x')$, where $k$ is a given base kernel
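
One way to see that this construction enforces $f(x) = f(-x)$ is to compute the variance of $f(x) - f(-x)$ under the symmetrised kernel, which should vanish. The sketch below does this for a squared exponential base kernel. The base kernel, the test point and the names are assumptions, and the argument relies on the base kernel depending only on $|x - x'|$.

```python
import numpy as np

def base(x, xp, sigma=1.0):
    """Base kernel, assumed stationary and isotropic."""
    return np.exp(-(x - xp) ** 2 / sigma**2)

def symmetric(x, xp):
    """Symmetrised kernel: k(-x, x') + k(x, x')."""
    return base(-x, xp) + base(x, xp)

x = 1.3
# Var[f(x) - f(-x)] = k(x, x) - 2 k(x, -x) + k(-x, -x)
var_diff = symmetric(x, x) - 2 * symmetric(x, -x) + symmetric(-x, -x)

xs = np.linspace(-3, 3, 31)                # grid that is symmetric around 0
K = symmetric(xs[:, None], xs[None, :])
```

A variance of zero means that $f(x)$ and $f(-x)$ agree with probability one; equivalently, the rows of `K` belonging to $x$ and $-x$ coincide.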

> Changepoint kernels
> • $k(x, x') = s(x)\, k_1(x, x')\, s(x') + (1 - s(x))\, k_2(x, x')\, (1 - s(x'))$
> • $s(x) = \frac{1}{1 + \exp(-x)}$
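
The sigmoid $s$ blends two kernels: far to the right of the changepoint $s \approx 1$ and $k_1$ dominates, far to the left $s \approx 0$ and $k_2$ dominates. The sketch below illustrates this. The two component kernels (squared exponentials with length scales 1 and 3) are hypothetical choices, not taken from the slides.

```python
import numpy as np

def s(x):
    """Sigmoid weight, close to 0 for x << 0 and close to 1 for x >> 0."""
    return 1.0 / (1.0 + np.exp(-x))

def k1(x, xp):
    return np.exp(-(x - xp) ** 2 / 1.0**2)   # hypothetical: short length scale

def k2(x, xp):
    return np.exp(-(x - xp) ** 2 / 3.0**2)   # hypothetical: long length scale

def changepoint(x, xp):
    return s(x) * k1(x, xp) * s(xp) + (1 - s(x)) * k2(x, xp) * (1 - s(xp))

right = changepoint(10.0, 11.0)     # both points far right of the changepoint
left = changepoint(-10.0, -11.0)    # both points far left of the changepoint
```

Each term has the form $g(x)\, k_i(x, x')\, g(x')$, which keeps it positive semi-definite, so the sum is a valid kernel.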

> Combining existing functions
> $k(x, x') = f(x) f(x')$ with $f(x) = x$

> Combining existing functions
> $k(x, x') = f(x) f(x')$ with $f(x) = \sin(x)$

> Combining existing functions
> $k(x, x') = \sum_i f_i(x) f_i(x')$ with $\{f_1(x) = x,\ f_2(x) = \sin(x)\}$
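
A kernel built from finitely many functions has a kernel matrix of the form $K = \Phi \Phi^T$, where column $i$ of $\Phi$ holds $f_i$ evaluated at the sample points. The sketch below builds this matrix for $f_1(x) = x$ and $f_2(x) = \sin(x)$ and checks that its rank equals the number of functions. The grid and the names are assumptions.

```python
import numpy as np

features = [lambda x: x, np.sin]           # f_1(x) = x, f_2(x) = sin(x)

def feature_kernel(x, xp):
    """k(x, x') = sum_i f_i(x) f_i(x')."""
    return sum(f(x) * f(xp) for f in features)

xs = np.linspace(-5, 5, 30)                # hypothetical evaluation grid
Phi = np.stack([f(xs) for f in features], axis=1)   # n x 2 feature matrix
K = Phi @ Phi.T
rank = np.linalg.matrix_rank(K, tol=1e-8)
```

Because the rank is 2, every sample from the corresponding zero-mean GP has the form $a\,x + b\,\sin(x)$ with $a, b \sim N(0, 1)$: the kernel restricts the model to the span of the chosen functions.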

> Reproducing Kernel Hilbert Space
> • Define the space of functions
> $$H = \left\{ f \;\middle|\; f(x) = \sum_{i=1}^{n} \alpha_i k(x, x_i),\ n \in \mathbb{N},\ x_i \in X,\ \alpha_i \in \mathbb{R} \right\}$$
> • For $f(x) = \sum_i \alpha_i k(x_i, x)$ and $g(x) = \sum_j \alpha'_j k(x_j, x)$ we define the inner product
> $$\langle f, g \rangle_k = \sum_{i,j} \alpha_i \alpha'_j k(x_i, x_j)$$
> • The space $H$ (completed with respect to this inner product) is called a Reproducing Kernel Hilbert Space (RKHS).
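
The name comes from the reproducing property $\langle f, k(\cdot, x_0) \rangle_k = f(x_0)$, which follows directly from the definition of the inner product by taking $g = k(\cdot, x_0)$. The sketch below verifies it numerically for a small example. The kernel, the centers, the coefficients and the function names are assumptions, and $g$ is allowed its own set of centers, which is slightly more general than the notation on the slide.

```python
import numpy as np

def k(x, xp, sigma=1.0):
    """Squared exponential kernel (assumed here for illustration)."""
    return np.exp(-(x - xp) ** 2 / sigma**2)

def evaluate(alpha, centers, x):
    """f(x) = sum_i alpha_i k(x, x_i)."""
    return np.sum(alpha * k(x, centers))

def inner(alpha_f, centers_f, alpha_g, centers_g):
    """<f, g>_k = sum_{i,j} alpha_i alpha'_j k(x_i, x'_j)."""
    return alpha_f @ k(centers_f[:, None], centers_g[None, :]) @ alpha_g

centers = np.array([-1.0, 0.0, 2.0])
alpha = np.array([0.5, -1.0, 2.0])
x0 = 0.7

# g = k(., x0) has the single center x0 with coefficient 1.
lhs = inner(alpha, centers, np.array([1.0]), np.array([x0]))
rhs = evaluate(alpha, centers, x0)
norm_sq = inner(alpha, centers, alpha, centers)   # ||f||_k^2
```

Here `lhs` and `rhs` agree, and `norm_sq` is the squared RKHS norm $\alpha^T K \alpha$ of $f$.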

> Two different bases for the RKHS
> $k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{9}\right)$
> • Kernel basis
> • Eigenbasis (KL basis)

> Gaussian process regression

> Gaussian process regression
> • Given: observations $\{(x_1, y_1), \ldots, (x_n, y_n)\}$
> • Goal: compute $p(y_* \mid x_*, x_1, \ldots, x_n, y_1, \ldots, y_n)$
> [Figure: observations at $x_1, x_2, \ldots, x_n$ and the unknown value $y_*$ at $x_*$]

> Gaussian process regression
> • The solution is given by the posterior process $GP(\mu_p, k_p)$ with
> $$\mu_p(x_*) = K(x_*, X)\left[K(X, X) + \sigma^2 I\right]^{-1} y$$
> $$k_p(x_*, x_*') = k(x_*, x_*') - K(x_*, X)\left[K(X, X) + \sigma^2 I\right]^{-1} K(X, x_*')$$
> • We can sample from the posterior.
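
The posterior formulas translate almost line by line into NumPy. The following is a minimal sketch assuming a zero prior mean and a squared exponential kernel; the training data, the test grid and the names `gp_posterior` and `noise_var` (the $\sigma^2$ of the noise term, kept separate from the kernel width) are hypothetical.

```python
import numpy as np

def k(x, xp, sigma=1.0):
    """Squared exponential kernel (assumed here for illustration)."""
    return np.exp(-(x - xp) ** 2 / sigma**2)

def gp_posterior(X, y, X_star, noise_var=1e-2):
    """Posterior mean and covariance at X_star for a zero-mean GP prior."""
    K_xx = k(X[:, None], X[None, :]) + noise_var * np.eye(len(X))
    K_sx = k(X_star[:, None], X[None, :])
    K_ss = k(X_star[:, None], X_star[None, :])
    # Solve linear systems instead of forming the inverse explicitly.
    mean = K_sx @ np.linalg.solve(K_xx, y)
    cov = K_ss - K_sx @ np.linalg.solve(K_xx, K_sx.T)
    return mean, cov

X = np.array([-2.0, 0.0, 1.5])       # hypothetical training inputs
y = np.sin(X)                        # hypothetical observations
X_star = np.linspace(-3, 3, 25)      # test inputs
mean, cov = gp_posterior(X, y, X_star)
```

Sampling from the posterior then amounts to sampling from $N(\text{mean}, \text{cov})$, for example with the SVD-based procedure used for the prior.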

> Examples

> Examples: Gaussian kernel ($\sigma = 1$)

> Examples: Gaussian kernel ($\sigma = 5$)

> Examples: Periodic kernel

> Examples: Changepoint kernel

> Examples: Symmetric kernel

> Examples: Linear kernel

> Observations about the solution
> $$k_p(x_*, x_*') = k(x_*, x_*') - K(x_*, X)\left[K(X, X) + \sigma^2 I\right]^{-1} K(X, x_*')$$
> • The posterior covariance is independent of the observed values $y$ at the training points; it depends only on their locations $X$.
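
Since $y$ does not appear in the formula, the posterior uncertainty can be computed before any values are observed. The sketch below makes this explicit with a function that takes only locations as input. The kernel, the training locations and the names are assumptions.

```python
import numpy as np

def k(x, xp, sigma=1.0):
    """Squared exponential kernel (assumed here for illustration)."""
    return np.exp(-(x - xp) ** 2 / sigma**2)

def posterior_cov(X, X_star, noise_var=1e-2):
    """Posterior covariance; note that no observed values y are needed."""
    K_xx = k(X[:, None], X[None, :]) + noise_var * np.eye(len(X))
    K_sx = k(X_star[:, None], X[None, :])
    K_ss = k(X_star[:, None], X_star[None, :])
    return K_ss - K_sx @ np.linalg.solve(K_xx, K_sx.T)

X = np.array([-2.0, 0.0, 1.5])       # hypothetical training locations
X_star = np.linspace(-3, 3, 25)      # X_star[12] is 0.0, X_star[0] is -3.0
cov = posterior_cov(X, X_star)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))   # clip tiny negative round-off
```

The standard deviation is small at the training location $0$ and grows towards the prior value of 1 away from the data, whatever the observed values turn out to be.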

> Kernels and associated structures
