CS480/680 Lecture 11 (June 12, 2019): Kernel Methods


  1. CS480/680 Lecture 11: June 12, 2019
     Kernel Methods
     Readings: [D] Chap. 11; [B] Sec. 6.1, 6.2; [M] Sec. 14.1, 14.2; [HTF] Chap. 6
     University of Waterloo, CS480/680 Spring 2019, Pascal Poupart

  2. Non-linear Models Recap
     • Generalized linear models: $y(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$, a linear combination of fixed non-linear basis functions $\boldsymbol{\phi}$
     • Neural networks: the basis functions themselves have adjustable weights that are learned jointly with the output weights

  3. Kernel Methods
     • Idea: use a large (possibly infinite) set of fixed non-linear basis functions
     • Normally, complexity depends on the number of basis functions, but by a "dual trick" it can be made to depend on the amount of data instead
     • Examples:
       – Gaussian Processes (next class)
       – Support Vector Machines (next week)
       – Kernel Perceptron
       – Kernel Principal Component Analysis

  4. Kernel Function
     • Let $\boldsymbol{\phi}(\mathbf{x})$ be a set of basis functions that map inputs $\mathbf{x}$ to a feature space.
     • In many algorithms, this feature space only appears in the dot product $\boldsymbol{\phi}(\mathbf{x})^T \boldsymbol{\phi}(\mathbf{x}')$ of input pairs $\mathbf{x}, \mathbf{x}'$.
     • Define the kernel function $k(\mathbf{x}, \mathbf{x}') = \boldsymbol{\phi}(\mathbf{x})^T \boldsymbol{\phi}(\mathbf{x}')$ to be the dot product of any pair $\mathbf{x}, \mathbf{x}'$ in feature space.
       – We only need to know $k(\mathbf{x}, \mathbf{x}')$, not $\boldsymbol{\phi}(\mathbf{x})$
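A minimal numpy sketch (my addition, not from the slides) of this definition: for the degree-2 polynomial kernel on 2-dimensional inputs, evaluating $k(\mathbf{x}, \mathbf{x}')$ directly gives exactly the same number as first mapping to feature space and taking a dot product.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D inputs: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def k(x, xp):
    """Kernel trick: same value as phi(x) . phi(xp), without building phi."""
    return (x @ xp) ** 2

x  = np.array([1.0, 2.0])
xp = np.array([3.0, 0.5])

print(phi(x) @ phi(xp))  # dot product in feature space: 16.0
print(k(x, xp))          # kernel evaluation: also 16.0
```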

  5. Dual Representations
     • Recall the regularized linear regression objective
       $J(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} (\mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) - y_n)^2 + \frac{\lambda}{2} \mathbf{w}^T \mathbf{w}$
     • Solution: set the gradient to 0
       $\nabla J(\mathbf{w}) = \sum_n (\mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) - y_n)\, \boldsymbol{\phi}(\mathbf{x}_n) + \lambda \mathbf{w} = 0$
       $\mathbf{w} = -\frac{1}{\lambda} \sum_n (\mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) - y_n)\, \boldsymbol{\phi}(\mathbf{x}_n)$
     • Therefore $\mathbf{w}$ is a linear combination of the inputs in feature space $\{\boldsymbol{\phi}(\mathbf{x}_n) \mid 1 \le n \le N\}$
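A quick numerical check of this conclusion (my own sketch, with randomly generated features standing in for $\boldsymbol{\phi}(\mathbf{x}_n)$): solve the regularized objective in the primal and verify that $\mathbf{w}$ is recovered as a linear combination of the feature vectors with coefficients $a_n = -\frac{1}{\lambda}(\mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) - y_n)$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, lam = 5, 3, 0.1          # N data points, M basis functions, regularizer lambda
Phi = rng.normal(size=(M, N))  # columns are feature vectors phi(x_n)
y = rng.normal(size=N)

# Primal solution of J(w): setting the gradient to 0 gives (Phi Phi^T + lam I) w = Phi y
w = np.linalg.solve(Phi @ Phi.T + lam * np.eye(M), Phi @ y)

# Dual coefficients from the gradient condition: a_n = -(w . phi(x_n) - y_n) / lam
a = -(Phi.T @ w - y) / lam

print(np.allclose(w, Phi @ a))  # True: w lies in the span of the phi(x_n)
```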

  6. Dual Representations
     • Substitute $\mathbf{w} = \Phi \mathbf{a}$
     • where $\Phi = [\boldsymbol{\phi}(\mathbf{x}_1)\; \boldsymbol{\phi}(\mathbf{x}_2)\; \ldots\; \boldsymbol{\phi}(\mathbf{x}_N)]$, $\mathbf{a} = (a_1, \ldots, a_N)^T$ and $a_n = -\frac{1}{\lambda}(\mathbf{w}^T \boldsymbol{\phi}(\mathbf{x}_n) - y_n)$
     • Dual objective: minimize $J$ with respect to $\mathbf{a}$
       $J(\mathbf{a}) = \frac{1}{2} \mathbf{a}^T \Phi^T \Phi \Phi^T \Phi \mathbf{a} - \mathbf{a}^T \Phi^T \Phi \mathbf{y} + \frac{1}{2} \mathbf{y}^T \mathbf{y} + \frac{\lambda}{2} \mathbf{a}^T \Phi^T \Phi \mathbf{a}$

  7. Gram Matrix
     • Let $K = \Phi^T \Phi$ be the Gram matrix, i.e., $K_{nm} = \boldsymbol{\phi}(\mathbf{x}_n)^T \boldsymbol{\phi}(\mathbf{x}_m) = k(\mathbf{x}_n, \mathbf{x}_m)$
     • Substitute in the objective:
       $J(\mathbf{a}) = \frac{1}{2} \mathbf{a}^T K K \mathbf{a} - \mathbf{a}^T K \mathbf{y} + \frac{1}{2} \mathbf{y}^T \mathbf{y} + \frac{\lambda}{2} \mathbf{a}^T K \mathbf{a}$
     • Solution: set the gradient to 0
       $\nabla J(\mathbf{a}) = K K \mathbf{a} - K \mathbf{y} + \lambda K \mathbf{a} = 0$
       $K (K + \lambda I)\, \mathbf{a} = K \mathbf{y}$
       $\mathbf{a} = (K + \lambda I)^{-1} \mathbf{y}$
     • Prediction: $y^* = \boldsymbol{\phi}(\mathbf{x}^*)^T \mathbf{w} = \boldsymbol{\phi}(\mathbf{x}^*)^T \Phi \mathbf{a} = k(\mathbf{x}^*, X)\, (K + \lambda I)^{-1} \mathbf{y}$
       where $X, \mathbf{y}$ is the training set, $\mathbf{x}^*, y^*$ is a test instance, and $k(\mathbf{x}^*, X)$ is the row vector of kernel values between $\mathbf{x}^*$ and the training inputs

  8. Dual Linear Regression
     • Prediction: $y^* = \boldsymbol{\phi}(\mathbf{x}^*)^T \mathbf{w} = k(\mathbf{x}^*, X)\, (K + \lambda I)^{-1} \mathbf{y}$
     • Linear regression where we find the dual solution $\mathbf{a}$ instead of the primal solution $\mathbf{w}$
     • Complexity:
       – Primal solution: depends on the number of basis functions
       – Dual solution: depends on the amount of data
     • Advantage: can use a very large number of basis functions
     • Just need to know the kernel $k$
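A compact numpy sketch of dual linear regression (my own toy example, assuming the degree-2 polynomial kernel from earlier): it computes predictions from the kernel alone and checks that they match the primal ridge solution built from the explicit feature map.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))                # 20 training inputs
y = X[:, 0]**2 + X[:, 0] * X[:, 1] + rng.normal(scale=0.1, size=20)
lam = 0.5

def k(A, B):
    """Degree-2 polynomial kernel between rows of A and rows of B."""
    return (A @ B.T) ** 2

# Dual solution: a = (K + lam I)^-1 y  -- cost scales with the amount of data
K = k(X, X)
a = np.linalg.solve(K + lam * np.eye(len(X)), y)

# Prediction at test points: y* = k(x*, X) a
X_test = rng.normal(size=(3, 2))
y_dual = k(X_test, X) @ a

# Same model in the primal with the explicit feature map (x1^2, sqrt(2) x1 x2, x2^2)
def phi(A):
    return np.stack([A[:, 0]**2, np.sqrt(2)*A[:, 0]*A[:, 1], A[:, 1]**2], axis=1)

P = phi(X)
w = np.linalg.solve(P.T @ P + lam * np.eye(3), P.T @ y)
y_primal = phi(X_test) @ w

print(np.allclose(y_dual, y_primal))  # True: dual and primal give identical predictions
```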

  9. Constructing Kernels
     • Two possibilities:
       – Find a mapping $\boldsymbol{\phi}$ to feature space and let $k(\mathbf{x}, \mathbf{x}') = \boldsymbol{\phi}(\mathbf{x})^T \boldsymbol{\phi}(\mathbf{x}')$
       – Directly specify $k$
     • Can any function that takes two arguments serve as a kernel?
     • No, a valid kernel must be positive semi-definite
       – In other words, the Gram matrix $K$ must factor into the product of a transposed matrix with itself (e.g., $K = \Phi^T \Phi$)
       – Or, all eigenvalues of $K$ must be greater than or equal to 0
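A small numpy illustration of the eigenvalue test (my addition): the Gram matrix of a valid kernel has non-negative eigenvalues, while an arbitrary symmetric two-argument function, here $-\|\mathbf{x}-\mathbf{x}'\|$, fails the test.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 3))

K_valid = (X @ X.T) ** 2  # Gram matrix of the degree-2 polynomial kernel
K_bad = -np.linalg.norm(X[:, None] - X[None, :], axis=2)  # -||x - x'||: symmetric, not PSD

print(np.linalg.eigvalsh(K_valid).min() >= -1e-9)  # True: all eigenvalues >= 0
print(np.linalg.eigvalsh(K_bad).min() >= -1e-9)    # False: has negative eigenvalues
```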

  10. Example
     • Let $k(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^T \mathbf{z})^2$. What is the corresponding feature map $\boldsymbol{\phi}$?
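The worked expansion that accompanied this slide is reconstructed below from the standard derivation in [B] Sec. 6.1, for 2-dimensional inputs:

```latex
k(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^T \mathbf{z})^2
  = (x_1 z_1 + x_2 z_2)^2
  = x_1^2 z_1^2 + 2\, x_1 z_1 x_2 z_2 + x_2^2 z_2^2
  = \begin{pmatrix} x_1^2 & \sqrt{2}\, x_1 x_2 & x_2^2 \end{pmatrix}
    \begin{pmatrix} z_1^2 \\ \sqrt{2}\, z_1 z_2 \\ z_2^2 \end{pmatrix}
  = \boldsymbol{\phi}(\mathbf{x})^T \boldsymbol{\phi}(\mathbf{z}),
  \qquad \boldsymbol{\phi}(\mathbf{x}) = (x_1^2,\; \sqrt{2}\, x_1 x_2,\; x_2^2)^T
```

So the kernel implicitly computes a dot product in a 3-dimensional feature space without ever constructing the feature vectors.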

  11. Constructing Kernels
     • Can we construct $k$ directly without knowing $\boldsymbol{\phi}$?
     • Yes, any positive semi-definite $k$ is fine, since there is a corresponding implicit feature space. But positive semi-definiteness is not always easy to verify.
     • Alternatively, construct kernels from other kernels using rules that preserve positive semi-definiteness.

  12. Rules to Construct Kernels
     • Let $k_1(\mathbf{x}, \mathbf{x}')$ and $k_2(\mathbf{x}, \mathbf{x}')$ be valid kernels
     • The following kernels are also valid:
       1. $k(\mathbf{x}, \mathbf{x}') = c\, k_1(\mathbf{x}, \mathbf{x}')$, $\forall c > 0$
       2. $k(\mathbf{x}, \mathbf{x}') = f(\mathbf{x})\, k_1(\mathbf{x}, \mathbf{x}')\, f(\mathbf{x}')$, $\forall f$
       3. $k(\mathbf{x}, \mathbf{x}') = q(k_1(\mathbf{x}, \mathbf{x}'))$, where $q$ is a polynomial with coefficients $\ge 0$
       4. $k(\mathbf{x}, \mathbf{x}') = \exp(k_1(\mathbf{x}, \mathbf{x}'))$
       5. $k(\mathbf{x}, \mathbf{x}') = k_1(\mathbf{x}, \mathbf{x}') + k_2(\mathbf{x}, \mathbf{x}')$
       6. $k(\mathbf{x}, \mathbf{x}') = k_1(\mathbf{x}, \mathbf{x}')\, k_2(\mathbf{x}, \mathbf{x}')$
       7. $k(\mathbf{x}, \mathbf{x}') = k_3(\boldsymbol{\phi}(\mathbf{x}), \boldsymbol{\phi}(\mathbf{x}'))$, where $k_3$ is a valid kernel
       8. $k(\mathbf{x}, \mathbf{x}') = \mathbf{x}^T A \mathbf{x}'$, where $A$ is symmetric positive semi-definite
       9. $k(\mathbf{x}, \mathbf{x}') = k_a(\mathbf{x}_a, \mathbf{x}'_a) + k_b(\mathbf{x}_b, \mathbf{x}'_b)$
       10. $k(\mathbf{x}, \mathbf{x}') = k_a(\mathbf{x}_a, \mathbf{x}'_a)\, k_b(\mathbf{x}_b, \mathbf{x}'_b)$
       where $\mathbf{x} = (\mathbf{x}_a, \mathbf{x}_b)$ and $k_a$, $k_b$ are valid kernels over their respective components
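A numerical spot-check of rules 5 and 6 (my own sketch): starting from two Gram matrices built from valid kernels, their sum and their elementwise (Schur) product still have non-negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 2))

K1 = X @ X.T                # linear kernel (valid)
K2 = (X @ X.T + 1.0) ** 3   # polynomial kernel (valid)

for K in (K1 + K2, K1 * K2):  # rule 5 (sum) and rule 6 (elementwise product)
    print(np.linalg.eigvalsh(K).min() >= -1e-9)  # True for both
```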

  13. Common Kernels
     • Polynomial kernel: $k(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^T \mathbf{x}')^M$
       – $M$ is the degree
       – Feature space: all degree-$M$ products of entries of $\mathbf{x}$
       – Example: let $\mathbf{x}$ and $\mathbf{x}'$ be two images; the feature space would consist of all products of $M$ pixel intensities
     • More general polynomial kernel: $k(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^T \mathbf{x}' + c)^M$ with $c > 0$
       – Feature space: all products of up to $M$ entries of $\mathbf{x}$
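For instance (my own expansion, with $M = 2$ and 2-dimensional inputs), multiplying out $(\mathbf{x}^T\mathbf{x}' + c)^2$ shows where the constant, linear, and quadratic features come from:

```latex
(\mathbf{x}^T \mathbf{x}' + c)^2
  = x_1^2 x_1'^2 + x_2^2 x_2'^2 + 2\, x_1 x_2\, x_1' x_2'
    + 2c\, x_1 x_1' + 2c\, x_2 x_2' + c^2
  = \boldsymbol{\phi}(\mathbf{x})^T \boldsymbol{\phi}(\mathbf{x}'),
  \qquad \boldsymbol{\phi}(\mathbf{x}) = (x_1^2,\; x_2^2,\; \sqrt{2}\, x_1 x_2,\; \sqrt{2c}\, x_1,\; \sqrt{2c}\, x_2,\; c)^T
```

The feature vector contains a constant, the linear terms, and all degree-2 products, i.e., all products of up to 2 entries of $\mathbf{x}$.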

  14. Common Kernels
     • Gaussian kernel: $k(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\sigma^2}\right)$
     • Valid kernel because: expanding $\|\mathbf{x} - \mathbf{x}'\|^2 = \mathbf{x}^T\mathbf{x} - 2\mathbf{x}^T\mathbf{x}' + \mathbf{x}'^T\mathbf{x}'$ gives
       $k(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{\mathbf{x}^T\mathbf{x}}{2\sigma^2}\right) \exp\left(\frac{\mathbf{x}^T\mathbf{x}'}{\sigma^2}\right) \exp\left(-\frac{\mathbf{x}'^T\mathbf{x}'}{2\sigma^2}\right)$,
       which is valid by rules 1, 4, and 2 applied to the linear kernel
     • The implicit feature space is infinite!
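One standard way to see the infinite dimensionality (my addition, shown for scalar inputs): the middle factor above has a Taylor series that never terminates, contributing one feature per power of $x$:

```latex
\exp\!\left(\frac{x x'}{\sigma^2}\right)
  = \sum_{n=0}^{\infty} \frac{1}{n!} \left(\frac{x x'}{\sigma^2}\right)^n
  = \sum_{n=0}^{\infty} \phi_n(x)\, \phi_n(x'),
  \qquad \phi_n(x) = \frac{x^n}{\sigma^n \sqrt{n!}}
```

Absorbing the two outer factors into each $\phi_n$ gives an explicit (infinite) feature map for the Gaussian kernel.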

  15. Non-vectorial Kernels
     • Kernels can be defined with respect to things other than vectors, such as sets, strings, or graphs
     • Example for strings: $k(D_1, D_2)$ = similarity between two documents (a weighted sum of all non-contiguous substrings that appear in both documents $D_1$ and $D_2$)
     • Lodhi, Saunders, Shawe-Taylor, Cristianini, Watkins, "Text Classification Using String Kernels", JMLR 2:419-444, 2002
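A toy sketch in Python (my addition, and a heavy simplification: Lodhi et al. use gap-weighted non-contiguous subsequences computed by dynamic programming, whereas this only counts shared contiguous substrings of a fixed length), just to show a kernel defined directly on strings:

```python
from collections import Counter

def substring_kernel(s, t, p=3):
    """Toy string kernel: dot product of length-p contiguous substring counts.

    Valid kernel because it is an explicit dot product in substring-count space;
    a simplification of the Lodhi et al. (2002) gap-weighted subsequence kernel.
    """
    cs = Counter(s[i:i+p] for i in range(len(s) - p + 1))
    ct = Counter(t[i:i+p] for i in range(len(t) - p + 1))
    return sum(cs[u] * ct[u] for u in cs)

print(substring_kernel("science is organized knowledge",
                       "wisdom is organized life"))
```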
