Machine Learning: Kernel Methods
Hamid R. Rabiee, Mohammad H. Rohban
Spring 2015
http://ce.sharif.edu/courses/93-94/2/ce717-1
Agenda
- Motivations
- Kernel Definition
- Mercer's Theorem
- Kernel Matrix
- Kernel Construction
Motivations
Learning linear classifiers can be done efficiently (SVM, Perceptron, ...), so we would like to generalize these efficient linear classifiers to non-linear ones. It may be hard to classify data points in the original feature space; instead, use an appropriate high-dimensional non-linear map to change the feature space.
Kernel Definition
Consider data $x$ lying in $\mathbb{R}^n$. Use a high-dimensional mapping $\Phi: \mathbb{R}^n \to \mathbb{R}^N$ with $N > n$, and define the kernel function $K(x, x') = \Phi(x)^T \Phi(x')$. That is, the kernel function is the dot product in the new feature space. Since the dot product measures the similarity of two data points, $K(x, x')$ measures the similarity of $x$ and $x'$. It is efficient to use $K$ instead of $\Phi$ when the dimensionality of $\Phi$ is high, because $K$ can often be evaluated directly in the original space without ever forming the $N$-dimensional vectors $\Phi(x)$.
Kernel Definition (cont.)
A simple example: consider $x = (x_1, x_2)$ lying in the 2-dimensional plane, and $\Phi: \mathbb{R}^2 \to \mathbb{R}^3$ defined by
$$\Phi(x_1, x_2) = (z_1, z_2, z_3) = (x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2).$$
A linear classifier in the new space becomes ($w'$ is a vector in the new space):
$$g(x) = w'^T x' + w'_0 = w'^T \Phi(x) + w'_0 = w'_1 x_1^2 + \sqrt{2}\,w'_2 x_1 x_2 + w'_3 x_2^2 + w'_0.$$
What is the shape of the separating curve in the original space? The boundary
$$w'_1 x_1^2 + \sqrt{2}\,w'_2 x_1 x_2 + w'_3 x_2^2 + w'_0 = 0$$
is a conic section in the original $(x_1, x_2)$ space.
Kernel Definition (cont.)
What is the kernel function in the previous example?
$$K(u, v) = \Phi(u)^T \Phi(v) = \begin{pmatrix} u_1^2 \\ \sqrt{2}\,u_1 u_2 \\ u_2^2 \end{pmatrix}^T \begin{pmatrix} v_1^2 \\ \sqrt{2}\,v_1 v_2 \\ v_2^2 \end{pmatrix} = u_1^2 v_1^2 + 2 u_1 v_1 u_2 v_2 + u_2^2 v_2^2 = (u_1 v_1 + u_2 v_2)^2 = (u^T v)^2.$$
The dot product in the new space is the square of the dot product in the original space. Can we construct an arbitrary conic section in the original feature space this way? Not quite: this map contains only the pure second-order terms, so the linear terms of a general conic are missing. We instead use $K(u, v) = (u^T v + 1)^2$.
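A minimal numerical check of this identity, written as a Python/NumPy sketch (the function names are my own):

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map R^2 -> R^3 from the example above."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def kernel(u, v):
    """The same quantity computed in the original space, without forming phi."""
    return float(u @ v) ** 2

rng = np.random.default_rng(0)
u, v = rng.normal(size=2), rng.normal(size=2)
# The dot product in the 3-D feature space equals the squared dot product in R^2.
assert np.isclose(phi(u) @ phi(v), kernel(u, v))
```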
Kernel Definition (cont.)
Some typical kernels include:
- Linear: $K(u, v) = u^T v$
- Polynomial: $K(u, v) = (u^T v + c)^d$, $c \ge 0$
- Sigmoid: $K(u, v) = \tanh(a\,u^T v + b)$
- Gaussian RBF: $K(u, v) = \exp\left(-\|u - v\|^2 / 2\sigma^2\right)$
Can any function $K(u, v)$ be a valid kernel function? That is, does there exist a function $\Phi$ with $K(u, v) = \Phi(u)^T \Phi(v)$? If $K$ satisfies Mercer's condition, it is a valid kernel function.
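Straightforward NumPy implementations of the kernels listed above, as a sketch (the parameter names and defaults c, d, a, b, sigma are illustrative choices, not values fixed by the lecture):

```python
import numpy as np

def linear(u, v):
    return u @ v

def polynomial(u, v, c=1.0, d=2):
    return (u @ v + c) ** d

def sigmoid(u, v, a=1.0, b=0.0):
    # Note: the sigmoid kernel is not positive semi-definite for all a, b,
    # so it is not always a valid Mercer kernel.
    return np.tanh(a * (u @ v) + b)

def gaussian_rbf(u, v, sigma=1.0):
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))
```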
Mercer's Theorem
If for every square-integrable function $f(\cdot)$ we have
$$\int_{\mathbb{R}^n} \int_{\mathbb{R}^n} K(x, x')\, f(x)\, f(x')\, dx\, dx' \ge 0,$$
then $K(x, x')$ is a valid kernel function. In this case the components of the corresponding map $\Phi$ are proportional to the eigenfunctions of $K$, defined by
$$\int_{\mathbb{R}^n} K(u, v)\, \phi_i(v)\, dv = \lambda_i\, \phi_i(u), \qquad \Phi(x) = \left(\sqrt{\lambda_1}\,\phi_1(x),\ \sqrt{\lambda_2}\,\phi_2(x),\ \ldots\right).$$
In effect, Mercer's condition checks that $K(x, x')$ is positive semi-definite, and hence that all $\lambda_i \ge 0$.
Kernel Matrix
Restricting the kernel function to a set of points $\{x_1, \ldots, x_k\}$, the kernel function can be represented by a matrix:
$$K = \begin{pmatrix} K(x_1, x_1) & K(x_1, x_2) & \cdots & K(x_1, x_k) \\ K(x_2, x_1) & & & \vdots \\ \vdots & & \ddots & \\ K(x_k, x_1) & \cdots & & K(x_k, x_k) \end{pmatrix}.$$
A matrix $K$ is a valid kernel matrix if it is positive semi-definite, that is, if all its eigenvalues are greater than or equal to zero. The eigenvectors, multiplied by the square roots of the corresponding eigenvalues, give the restrictions of the $\phi_i$ to the set $\{x_1, \ldots, x_k\}$.
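A short sketch of this slide's claims (the kernel choice and point set are mine): build a Gram matrix, check that it is positive semi-definite, and recover the sampled feature vectors from its eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))                          # 20 sample points in R^2

sq_dists = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / 2.0)                           # Gaussian RBF Gram matrix

eigvals, eigvecs = np.linalg.eigh(K)                  # symmetric eigendecomposition
assert eigvals.min() > -1e-10                         # PSD up to round-off error

# Rows of Phi are the sampled feature vectors: Phi[j, i] = sqrt(lambda_i) * v_i[j].
Phi = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
assert np.allclose(Phi @ Phi.T, K)                    # Phi reproduces the kernel
```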
Polynomial Kernel
2nd-degree polynomial:
$$K(u, v) = (u^T v)^2 = (u_1 v_1 + u_2 v_2)^2 = \begin{pmatrix} u_1^2 \\ \sqrt{2}\,u_1 u_2 \\ u_2^2 \end{pmatrix}^T \begin{pmatrix} v_1^2 \\ \sqrt{2}\,v_1 v_2 \\ v_2^2 \end{pmatrix}.$$
Up to 2nd-degree polynomial:
$$K(u, v) = (u^T v + 1)^2 = \begin{pmatrix} u_1^2 \\ \sqrt{2}\,u_1 u_2 \\ u_2^2 \\ \sqrt{2}\,u_1 \\ \sqrt{2}\,u_2 \\ 1 \end{pmatrix}^T \begin{pmatrix} v_1^2 \\ \sqrt{2}\,v_1 v_2 \\ v_2^2 \\ \sqrt{2}\,v_1 \\ \sqrt{2}\,v_2 \\ 1 \end{pmatrix}.$$
Because the feature map now includes the linear terms and a constant, this kernel can construct any 2nd-order function in the original feature space.
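The same kind of numerical check as before, now for the six-dimensional map (a sketch, assuming NumPy):

```python
import numpy as np

def phi6(x):
    """Six-dimensional map whose dot product equals (u.v + 1)^2."""
    x1, x2 = x
    s2 = np.sqrt(2)
    return np.array([x1 ** 2, s2 * x1 * x2, x2 ** 2, s2 * x1, s2 * x2, 1.0])

rng = np.random.default_rng(1)
u, v = rng.normal(size=2), rng.normal(size=2)
assert np.isclose(phi6(u) @ phi6(v), (u @ v + 1) ** 2)
```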
RBF Kernel
An example: the input space $-5 < u < 5$ is mapped to a curve, visualized using only two dimensions $(\phi_1, \phi_2)$ of $\Phi$. (Figure: the curve $(\phi_1(u), \phi_2(u))$ traced in the $\phi_1$/$\phi_2$ plane as $u$ varies.)
RBF Kernel (cont.)
An example (cont.): consider the Gaussian kernel
$$K(u, v) = \exp\left(-\|u - v\|^2 / 2\right),$$
where $u$ lies in a subset of $\mathbb{R}$, $-5 < u < 5$. The eigenfunctions of $K$ are illustrated, with $\Phi = (\phi_1, \ldots, \phi_{10}, \ldots)$. (Figure: the first eigenfunctions $\phi_i$ of $K$ on $-5 < u < 5$.)
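One way to reproduce such eigenfunction plots numerically (my own illustration, not the lecture's code): eigendecompose the kernel matrix on a fine grid, a simple Nystrom-style discretization of the integral operator.

```python
import numpy as np

u = np.linspace(-5, 5, 500)                        # fine grid on -5 < u < 5
K = np.exp(-0.5 * (u[:, None] - u[None, :]) ** 2)  # K(u, v) = exp(-(u - v)^2 / 2)

eigvals, eigvecs = np.linalg.eigh(K)
order = np.argsort(eigvals)[::-1]                  # sort by decreasing eigenvalue
phis = eigvecs[:, order[:10]]                      # sampled phi_1, ..., phi_10
# Plotting each column of phis against u gives smooth, increasingly
# oscillatory curves, matching the eigenfunction figure on the slide.
```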
RBF Kernel (cont.)
An example (cont.): consider a linear classifier in the new space. The corresponding classifier in the $u$ space is clearly non-linear in the original space: a single linear boundary in the $(\phi_1, \phi_2)$ plane separates intervals of the $u$ axis that alternate between classes $C_1$ and $C_2$. (Figure: a linear decision boundary in the $\phi_1$/$\phi_2$ plane and the resulting labeling $C_1, C_2, C_2, C_2, C_1$ along the $u$ axis.)
RBF Kernel (cont.)
The RBF kernel places a Gaussian around each data point, and a linear discriminant function cuts through the resulting surface in the embedding space. Therefore any arbitrary set of points can be classified by RBF kernels; the training error goes to zero as $\sigma \to 0$.
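One way to see the $\sigma \to 0$ claim numerically (a sketch, with an arbitrary random point set): as $\sigma$ shrinks, the RBF Gram matrix approaches the identity, so each point is similar only to itself and a kernel machine can memorize any labeling of the training set.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))                      # 30 distinct random points

def rbf_gram(X, sigma):
    d2 = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

for sigma in [1.0, 0.3, 0.05]:
    K = rbf_gram(X, sigma)
    # Largest off-diagonal similarity shrinks toward 0 as sigma -> 0,
    # i.e. K -> I, so the Gram matrix stays full-rank on distinct points.
    print(sigma, np.abs(K - np.eye(len(X))).max())
```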
Kernel Construction
How can we build valid kernels from existing kernels? By Mercer's theorem, if $c > 0$, $k_1$ and $k_2$ are valid kernels, and $\psi$ is an arbitrary function, then the following functions are also valid kernels (a numerical spot-check follows the list):
- $K(u, v) = c\,k_1(u, v)$
- $K(u, v) = k_1(u, v) + k_2(u, v)$
- $K(u, v) = k_1(u, v)\,k_2(u, v)$
- $K(u, v) = k_1(\psi(u), \psi(v))$
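A sketch that spot-checks these closure rules on random points (the base kernels and the map psi are my own choices): each combined Gram matrix should remain positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(15, 2))

def gram(kernel):
    return np.array([[kernel(u, v) for v in X] for u in X])

def is_psd(K):
    return np.linalg.eigvalsh(K).min() > -1e-10

k1 = lambda u, v: (u @ v + 1) ** 2                      # polynomial kernel
k2 = lambda u, v: np.exp(-np.sum((u - v) ** 2) / 2.0)   # Gaussian RBF kernel
psi = np.tanh                                           # an arbitrary map

for K in (3.0 * gram(k1),                               # c * k1
          gram(k1) + gram(k2),                          # k1 + k2
          gram(k1) * gram(k2),                          # elementwise: k1 * k2
          gram(lambda u, v: k1(psi(u), psi(v)))):       # k1(psi(u), psi(v))
    assert is_psd(K)
```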
Kernel Construction (cont.)
We can also construct kernels from probabilistic generative models (class-conditional probabilities, HMMs, ...) and then use the kernel in a discriminative model (such as an SVM or a linear discriminant function). $K(x, x') = p(x)\,p(x')$ is clearly a valid kernel, since it is the inner product for the one-dimensional feature map $\Phi(x) = p(x)$; it states that $x$ and $x'$ are similar if they both have high probability. A better kernel can be constructed in the same way:
$$K(u, v) = \sum_{i=1}^{n} p(u \mid c_i)\, p(v \mid c_i)\, p(c_i).$$
That is, $u$ and $v$ are similar if they have high probabilities under the same classes.
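A sketch of this generative kernel with hypothetical 1-D Gaussian class-conditionals (the means, standard deviations, and priors below are made up for illustration):

```python
import numpy as np

# Hypothetical class-conditionals p(x | c_i) and priors p(c_i) for 3 classes.
means = np.array([-2.0, 1.0, 3.0])
stds = np.array([1.0, 0.5, 1.5])
priors = np.array([0.5, 0.3, 0.2])

def gauss_pdf(x, mu, sd):
    return np.exp(-(x - mu) ** 2 / (2 * sd ** 2)) / (sd * np.sqrt(2 * np.pi))

def generative_kernel(u, v):
    """K(u, v) = sum_i p(u | c_i) p(v | c_i) p(c_i)."""
    pu = gauss_pdf(u, means, stds)                # vector of p(u | c_i)
    pv = gauss_pdf(v, means, stds)
    return float(np.sum(pu * pv * priors))

# Validity: with Phi(x) = (sqrt(p(c_i)) * p(x | c_i))_i the kernel is an
# ordinary dot product, hence positive semi-definite.
print(generative_kernel(0.8, 1.1))    # high: both points likely under class 2
print(generative_kernel(-2.0, 3.0))   # low: likely under different classes
```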
Kernel Construction (cont.)
State-of-the-art methods try to learn the kernel from (possibly many) training points. The simplest is multiple kernel learning: consider $\{k_1, \ldots, k_n\}$ as $n$ valid kernels, and find an appropriate kernel
$$K(u, v) = \sum_{i=1}^{n} c_i\, k_i(u, v), \qquad c_i \ge 0,$$
from the training data. Minimize the training loss (MSE) over the $c_i$, and simultaneously minimize the trace of the kernel matrix on the training data to avoid overfitting. Many variations of this algorithm have been developed.
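A crude alternating-minimization sketch of this idea (my own simplification, not the lecture's algorithm; the base kernels, ridge term, penalty weight, and step size are all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 2))
y = np.sign(X[:, 0] * X[:, 1])                          # a nonlinear target

d2 = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
grams = [X @ X.T, (X @ X.T + 1) ** 2, np.exp(-d2 / 2)]  # fixed base Gram matrices

c = np.ones(len(grams)) / len(grams)                    # kernel weights, c_i >= 0
lam, lr = 1e-3, 1e-4                                    # trace penalty, step size
for _ in range(100):
    Kc = sum(ci * G for ci, G in zip(c, grams))
    alpha = np.linalg.solve(Kc + 1e-3 * np.eye(len(X)), y)  # ridge fit, c fixed
    resid = Kc @ alpha - y
    # Gradient of MSE + lam * trace(Kc) with respect to each weight c_i:
    grad = np.array([2 * (G @ alpha) @ resid + lam * np.trace(G) for G in grams])
    c = np.clip(c - lr * grad, 0.0, None)               # projected gradient step
```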
Example 1 (solution)
Example 2 (solution)