Lecture 1: Introduction to RKHS
MLSS Cadiz, 2016
Gatsby Unit, CSML, UCL
May 12, 2016

Sections: Feature space; Basics of reproducing kernel Hilbert spaces; Kernel Ridge Regression
Kernels and feature space (1): XOR example

[Figure: red and blue points in the plane; axes $x_1$ and $x_2$, each ranging from −5 to 5.]

No linear classifier separates red from blue.
Map points to a higher-dimensional feature space:
$\phi(x) = \begin{bmatrix} x_1 & x_2 & x_1 x_2 \end{bmatrix}^\top \in \mathbb{R}^3$
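The feature map above can be tried on a few points. The following is a minimal sketch, assuming hypothetical XOR-style data (label +1 when $x_1$ and $x_2$ share a sign, −1 otherwise) and a hand-picked separating direction `w`; neither the data nor `w` comes from the slide.

```python
import numpy as np

def phi(x):
    """Feature map from the slide: (x1, x2) -> (x1, x2, x1 * x2)."""
    x = np.asarray(x, dtype=float)
    return np.array([x[0], x[1], x[0] * x[1]])

# Hypothetical XOR-style data: +1 when x1 and x2 share a sign, -1 otherwise.
points = np.array([[1.0, 2.0], [-2.0, -1.0], [3.0, -1.0], [-1.0, 4.0]])
labels = np.array([1, 1, -1, -1])

# In feature space, the hyperplane with normal w = (0, 0, 1) through the
# origin separates the classes: the third coordinate x1 * x2 has the sign
# of the label.
w = np.array([0.0, 0.0, 1.0])
predictions = np.sign([w @ phi(p) for p in points]).astype(int)
print(predictions)
```

A linear rule in the three-dimensional feature space is a nonlinear rule in the original plane, which is the point of the example.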
Kernels and feature space (2): smoothing

[Figure: three panels, each with horizontal axis from −0.5 to 1.5 and vertical axis from −1 to 0.6.]

Kernel methods can control smoothness and avoid overfitting/underfitting.
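One way to see the smoothness trade-off numerically is kernel ridge regression, the topic of the last section of the lecture. The sketch below is an illustration only: the Gaussian kernel, the bandwidth `sigma`, the regularisation values `lam`, and the synthetic data are all assumptions and are not taken from the slide's figure.

```python
import numpy as np

def gaussian_gram(a, b, sigma):
    """Gaussian kernel matrix between two 1-D point sets a and b."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * sigma**2))

def krr_fit_predict(x_train, y_train, x_test, sigma, lam):
    """Solve (K + lam * I) alpha = y, then predict with k(x_test, x_train) @ alpha."""
    K = gaussian_gram(x_train, x_train, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(x_train)), y_train)
    return gaussian_gram(x_test, x_train, sigma) @ alpha

# Synthetic noisy data on the same horizontal range as the figure (assumed).
rng = np.random.default_rng(0)
x = np.linspace(-0.5, 1.5, 30)
y = np.sin(4.0 * x) + 0.1 * rng.standard_normal(x.shape)

def train_error(lam):
    """Mean squared error on the training points for a given regulariser."""
    fit = krr_fit_predict(x, y, x, sigma=0.2, lam=lam)
    return float(np.mean((fit - y) ** 2))

err_small = train_error(1e-3)  # weak regularisation: follows the data closely
err_large = train_error(10.0)  # strong regularisation: much smoother fit
print(err_small, err_large)
```

A small `lam` tracks the noise (risking overfitting), while a large `lam` flattens the fit (risking underfitting); the training error grows with `lam`.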
Basics of reproducing kernel Hilbert spaces
Subsections: What is a kernel?; Constructing new kernels; Positive definite functions; Reproducing kernel Hilbert space

Outline: reproducing kernel Hilbert space
We will describe in order:
1. Hilbert space
2. Kernel (lots of examples: e.g. you can build kernels from simpler kernels)
3. Reproducing property
Hilbert space

Definition (Inner product)
Let $\mathcal{H}$ be a vector space over $\mathbb{R}$. A function $\langle \cdot, \cdot \rangle_{\mathcal{H}} : \mathcal{H} \times \mathcal{H} \to \mathbb{R}$ is an inner product on $\mathcal{H}$ if
1. Linear: $\langle \alpha_1 f_1 + \alpha_2 f_2, g \rangle_{\mathcal{H}} = \alpha_1 \langle f_1, g \rangle_{\mathcal{H}} + \alpha_2 \langle f_2, g \rangle_{\mathcal{H}}$
2. Symmetric: $\langle f, g \rangle_{\mathcal{H}} = \langle g, f \rangle_{\mathcal{H}}$
3. $\langle f, f \rangle_{\mathcal{H}} \ge 0$ and $\langle f, f \rangle_{\mathcal{H}} = 0$ if and only if $f = 0$.

Norm induced by the inner product: $\|f\|_{\mathcal{H}} := \sqrt{\langle f, f \rangle_{\mathcal{H}}}$

Definition (Hilbert space)
Inner product space containing Cauchy sequence limits.
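The three axioms and the induced norm can be checked numerically for the most familiar case. This is a small sketch that assumes $\mathcal{H} = \mathbb{R}^4$ with the ordinary dot product; the helper names `inner` and `norm` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
f1, f2, g = rng.standard_normal((3, 4))  # three random vectors in R^4
a1, a2 = 2.0, -0.5

def inner(u, v):
    """Inner product on R^4: the ordinary dot product."""
    return float(np.dot(u, v))

def norm(u):
    """Norm induced by the inner product: sqrt(<u, u>)."""
    return float(np.sqrt(inner(u, u)))

# 1. Linearity in the first argument.
linear = np.isclose(inner(a1 * f1 + a2 * f2, g),
                    a1 * inner(f1, g) + a2 * inner(f2, g))
# 2. Symmetry.
symmetric = np.isclose(inner(f1, g), inner(g, f1))
# 3. Non-negativity, with <f, f> = 0 for the zero vector.
positive = inner(f1, f1) > 0 and inner(np.zeros(4), np.zeros(4)) == 0.0

print(linear, symmetric, positive, norm(f1))
```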
Kernel

Definition
Let $\mathcal{X}$ be a non-empty set. A function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a kernel if there exists an $\mathbb{R}$-Hilbert space $\mathcal{H}$ and a map $\phi : \mathcal{X} \to \mathcal{H}$ such that $\forall x, x' \in \mathcal{X}$,
$k(x, x') := \langle \phi(x), \phi(x') \rangle_{\mathcal{H}}$.

Almost no conditions on $\mathcal{X}$ (e.g., $\mathcal{X}$ itself doesn't need an inner product; it could be a set of documents).
A single kernel can correspond to several possible features. A trivial example for $\mathcal{X} := \mathbb{R}$:
$\phi_1(x) = x$ and $\phi_2(x) = \begin{bmatrix} x/\sqrt{2} \\ x/\sqrt{2} \end{bmatrix}$
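The trivial example can be verified directly: both feature maps give the kernel $k(x, x') = x x'$. A minimal sketch, where the helper `kernel_from` and the test points are illustrative:

```python
import numpy as np

def phi1(x):
    """One-dimensional feature map: phi1(x) = x."""
    return np.array([x])

def phi2(x):
    """Two-dimensional feature map: phi2(x) = (x / sqrt(2), x / sqrt(2))."""
    return np.array([x / np.sqrt(2.0), x / np.sqrt(2.0)])

def kernel_from(phi, x, y):
    """Kernel induced by a feature map: the dot product of the features."""
    return float(phi(x) @ phi(y))

x, y = 1.5, -2.0
k1 = kernel_from(phi1, x, y)
k2 = kernel_from(phi2, x, y)
print(k1, k2)  # both equal x * y up to rounding
```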
New kernels from old: sums, transformations

Theorem (Sums of kernels are kernels)
Given $\alpha > 0$ and $k$, $k_1$ and $k_2$ all kernels on $\mathcal{X}$, then $\alpha k$ and $k_1 + k_2$ are kernels on $\mathcal{X}$.
(Proof via positive definiteness: later!)
A difference of kernels may not be a kernel (why?)

Theorem (Mappings between spaces)
Let $\mathcal{X}$ and $\widetilde{\mathcal{X}}$ be sets, and define a map $A : \mathcal{X} \to \widetilde{\mathcal{X}}$. Define the kernel $k$ on $\widetilde{\mathcal{X}}$. Then $k(A(x), A(x'))$ is a kernel on $\mathcal{X}$.
Example: $k(x, x') = x^2 (x')^2$.
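These rules can be probed numerically through Gram matrices: a kernel must give $k(x, x) = \|\phi(x)\|^2 \ge 0$ and a Gram matrix with no negative eigenvalues. The sketch below assumes the linear kernel $x x'$ on $\mathbb{R}$ as the base kernel and $A(x) = x^2$ as the map; the sample points and the particular sum and difference are illustrative choices.

```python
import numpy as np

def gram(kernel, xs):
    """Gram matrix K[i, j] = kernel(xs[i], xs[j])."""
    return np.array([[kernel(a, b) for b in xs] for a in xs])

def min_eig(kernel, xs):
    """Smallest eigenvalue of the (symmetric) Gram matrix."""
    return float(np.linalg.eigvalsh(gram(kernel, xs)).min())

def linear(x, y):
    return x * y

def mapped(x, y):
    # Linear kernel composed with A(x) = x^2, giving x^2 (x')^2.
    return linear(x**2, y**2)

def summed(x, y):
    # Positive scaling plus a sum of kernels: still a kernel.
    return 3.0 * linear(x, y) + mapped(x, y)

def difference(x, y):
    # Difference of two kernels: k(3, 3) = 9 - 81 < 0, so not a kernel.
    return linear(x, y) - mapped(x, y)

xs = np.array([-2.0, -0.5, 1.0, 3.0])
print(min_eig(mapped, xs), min_eig(summed, xs), min_eig(difference, xs))
```

The negative value $k(3, 3)$ for the difference hints at the answer to the "why?" on the slide: it would have to equal a squared norm.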
New kernels from old: products

Theorem (Products of kernels are kernels)
Given $k_1$ on $\mathcal{X}_1$ and $k_2$ on $\mathcal{X}_2$, then $k_1 \times k_2$ is a kernel on $\mathcal{X}_1 \times \mathcal{X}_2$. If $\mathcal{X}_1 = \mathcal{X}_2 = \mathcal{X}$, then $k := k_1 \times k_2$ is a kernel on $\mathcal{X}$.

Proof: Main idea only!
$\mathcal{H}_1$ space of kernels between shapes,
$\phi_1(x) = \begin{bmatrix} I_{\square} \\ I_{\triangle} \end{bmatrix}, \quad \phi_1(\square) = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad k_1(\square, \triangle) = 0.$
$\mathcal{H}_2$ space of kernels between colors (the two colors are written $\bullet_1$ and $\bullet_2$),
$\phi_2(x) = \begin{bmatrix} I_{\bullet_1} \\ I_{\bullet_2} \end{bmatrix}, \quad \phi_2(\bullet_2) = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad k_2(\bullet_2, \bullet_2) = 1.$
New kernels from old: products

"Natural" feature space for colored shapes (the two colors are written $\bullet_1$ and $\bullet_2$):
$\Phi(x) = \begin{bmatrix} I_{\bullet_1} I_{\square} & I_{\bullet_1} I_{\triangle} \\ I_{\bullet_2} I_{\square} & I_{\bullet_2} I_{\triangle} \end{bmatrix} = \begin{bmatrix} I_{\bullet_1} \\ I_{\bullet_2} \end{bmatrix} \begin{bmatrix} I_{\square} & I_{\triangle} \end{bmatrix} = \phi_2(x) \phi_1^\top(x)$

Kernel is:
$\begin{aligned}
k(x, x') &= \sum_{i \in \{\bullet_1, \bullet_2\}} \sum_{j \in \{\square, \triangle\}} \Phi_{ij}(x) \Phi_{ij}(x') = \mathrm{tr}\Big( \phi_1(x) \underbrace{\phi_2^\top(x) \phi_2(x')}_{k_2(x, x')} \phi_1^\top(x') \Big) \\
&= \mathrm{tr}\Big( \underbrace{\phi_1^\top(x') \phi_1(x)}_{k_1(x, x')} \Big) k_2(x, x') = k_1(x, x') k_2(x, x')
\end{aligned}$
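The product construction above can be checked exhaustively on the four colored shapes. This sketch assumes one-hot indicator features; the labels `"square"`, `"triangle"`, `"color_a"` and `"color_b"` are hypothetical stand-ins for the symbols on the slide.

```python
import itertools
import numpy as np

SHAPES = ["square", "triangle"]  # hypothetical shape labels
COLORS = ["color_a", "color_b"]  # hypothetical color labels

def phi1(shape):
    """Indicator (one-hot) feature for the shape."""
    return np.array([float(shape == s) for s in SHAPES])

def phi2(color):
    """Indicator (one-hot) feature for the color."""
    return np.array([float(color == c) for c in COLORS])

def Phi(x):
    """Joint feature for x = (shape, color): the outer product phi2 phi1^T."""
    shape, color = x
    return np.outer(phi2(color), phi1(shape))

def k_joint(x, y):
    """Sum over all entries of Phi_ij(x) * Phi_ij(y)."""
    return float(np.sum(Phi(x) * Phi(y)))

def k_product(x, y):
    """Product of the shape kernel k1 and the color kernel k2."""
    k1 = float(phi1(x[0]) @ phi1(y[0]))
    k2 = float(phi2(x[1]) @ phi2(y[1]))
    return k1 * k2

objects = list(itertools.product(SHAPES, COLORS))
agree = all(np.isclose(k_joint(x, y), k_product(x, y))
            for x in objects for y in objects)
print(agree)
```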
Sums and products ⟹ polynomials

Theorem (Polynomial kernels)
Let $x, x' \in \mathbb{R}^d$ for $d \ge 1$, and let $m \ge 1$ be an integer and $c \ge 0$ be a non-negative real. Then
$k(x, x') := \left( \langle x, x' \rangle + c \right)^m$
is a valid kernel.

To prove: expand into a sum (with non-negative scalars) of kernels $\langle x, x' \rangle$ raised to integer powers. These individual terms are valid kernels by the product rule.
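For a concrete case, the expansion can be written out as an explicit finite feature map. The sketch below assumes $d = 2$ and $m = 2$; the feature map `phi_poly2` follows from expanding $(x_1 x_1' + x_2 x_2' + c)^2$ and is not given in the lecture.

```python
import numpy as np

def poly_kernel(x, y, c, m):
    """Polynomial kernel (<x, y> + c)^m."""
    return float((np.dot(x, y) + c) ** m)

def phi_poly2(x, c):
    """Explicit feature map for d = 2, m = 2 (requires c >= 0)."""
    x1, x2 = x
    return np.array([
        x1**2,
        x2**2,
        np.sqrt(2.0) * x1 * x2,
        np.sqrt(2.0 * c) * x1,
        np.sqrt(2.0 * c) * x2,
        c,
    ])

x = np.array([0.5, -1.0])
y = np.array([2.0, 3.0])
c = 1.5

k_direct = poly_kernel(x, y, c, m=2)
k_features = float(phi_poly2(x, c) @ phi_poly2(y, c))
print(k_direct, k_features)  # equal up to rounding
```

Each coordinate of `phi_poly2` corresponds to one term of the expansion, and the non-negative scalars appear as the square-root factors.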
Infinite sequences

The kernels we've seen so far are dot products between finitely many features. E.g.
$k(x, y) = \begin{bmatrix} \sin(x) & x^3 & \log x \end{bmatrix} \begin{bmatrix} \sin(y) & y^3 & \log y \end{bmatrix}^\top$
where $\phi(x) = \begin{bmatrix} \sin(x) & x^3 & \log x \end{bmatrix}$.
Can a kernel be a dot product between infinitely many features?
Infinite sequences

Definition
The space $\ell_2$ (square summable sequences) comprises all sequences $a := (a_i)_{i \ge 1}$ for which
$\|a\|_{\ell_2}^2 = \sum_{i=1}^{\infty} a_i^2 < \infty$.

Definition
Given a sequence of functions $(\phi_i(x))_{i \ge 1}$ in $\ell_2$, where $\phi_i : \mathcal{X} \to \mathbb{R}$ is the $i$th coordinate of $\phi(x)$. Then
$k(x, x') := \sum_{i=1}^{\infty} \phi_i(x) \phi_i(x')$    (1)
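As an illustration of equation (1), one possible choice of features on $\mathcal{X} = \mathbb{R}$ is $\phi_i(x) = x^i / \sqrt{i!}$. This choice is an assumption made here for the example, not one taken from the lecture. The sequence is square summable, since $\sum_{i \ge 1} x^{2i} / i! = e^{x^2} - 1$, and the kernel sums to $e^{x x'} - 1$. The sketch truncates the series at `n_terms`.

```python
import math
import numpy as np

def phi_truncated(x, n_terms):
    """First n_terms coordinates phi_i(x) = x**i / sqrt(i!), for i = 1..n_terms."""
    return np.array([x**i / math.sqrt(math.factorial(i))
                     for i in range(1, n_terms + 1)])

def k_truncated(x, y, n_terms):
    """Truncated version of equation (1): sum of phi_i(x) * phi_i(y)."""
    return float(phi_truncated(x, n_terms) @ phi_truncated(y, n_terms))

x, y = 0.7, -1.2
approx = k_truncated(x, y, 30)
exact = math.exp(x * y) - 1.0  # closed form of the infinite sum

# Squared l2 norm of the truncated feature sequence, close to exp(x^2) - 1.
sq_norm = float(np.sum(phi_truncated(x, 30) ** 2))
print(approx, exact, sq_norm)
```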