Learning From Data
Lecture 26: Kernel Machines

Popular Kernels
The Kernel Measures Similarity
Kernels in Different Applications

M. Magdon-Ismail
CSCI 4100/6100
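
recap: The Kernel Allows Us to Bypass Z-space

Data $\mathbf{x}_n \in \mathcal{X}$ → kernel $K(\cdot,\cdot)$ → solve the QP → $\boldsymbol{\alpha}^*$

Solve the QP:
$$\min_{\boldsymbol{\alpha}}\ \tfrac{1}{2}\,\boldsymbol{\alpha}^t G \boldsymbol{\alpha} - \mathbf{1}^t \boldsymbol{\alpha} \qquad \text{subject to: } \mathbf{y}^t \boldsymbol{\alpha} = 0,\quad C \ge \boldsymbol{\alpha} \ge 0$$

From the solution $\boldsymbol{\alpha}^*$, pick an index $s$ with $C > \alpha^*_s > 0$ (the free support vectors).

$$g(\mathbf{x}) = \mathrm{sign}\Big(\sum_{\alpha^*_n > 0} \alpha^*_n y_n K(\mathbf{x}_n, \mathbf{x}) + b^*\Big)$$
$$b^* = y_s - \sum_{\alpha^*_n > 0} \alpha^*_n y_n K(\mathbf{x}_n, \mathbf{x}_s)$$
(One can compute $b^*$ for several SVs and average.)

Overfitting (SVM):
  high $\tilde{d}$ → complicated separator
  few support vectors → low effective complexity
  Can go to high (infinite) $\tilde{d}$

Computation (Pseudo-inverse; inner products with kernel $K(\cdot,\cdot)$):
  high $\tilde{d}$ → expensive or infeasible computation
  kernel → computationally feasible to go to high $\tilde{d}$
  Can go to high (infinite) $\tilde{d}$

Kernel Machines: 2/12. Creator: Malik Magdon-Ismail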
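
The formulas for $g(\mathbf{x})$ and $b^*$ on the recap slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the function names, the linear kernel, and the two-point data set are assumptions, and the values `alpha = [0.5, 0.5]` were worked out by hand for this toy problem rather than produced by a QP solver.

```python
def linear_kernel(x, xp):
    """Plain inner product; any valid kernel K(x, x') could be swapped in."""
    return sum(a * b for a, b in zip(x, xp))


def bias_from_sv(alpha, y, X, kernel, s):
    """b* = y_s - sum over alpha_n > 0 of alpha_n * y_n * K(x_n, x_s)."""
    return y[s] - sum(
        alpha[n] * y[n] * kernel(X[n], X[s])
        for n in range(len(X))
        if alpha[n] > 0
    )


def predict(x, alpha, y, X, kernel, b):
    """g(x) = sign(sum over alpha_n > 0 of alpha_n * y_n * K(x_n, x) + b*)."""
    signal = b + sum(
        alpha[n] * y[n] * kernel(X[n], x)
        for n in range(len(X))
        if alpha[n] > 0
    )
    return 1 if signal >= 0 else -1


# Hypothetical toy problem: one point per class on the real line.
X = [(1.0,), (-1.0,)]
y = [1, -1]
alpha = [0.5, 0.5]  # hand-derived hard-margin solution for this data

b = bias_from_sv(alpha, y, X, linear_kernel, s=0)
label_right = predict((2.0,), alpha, y, X, linear_kernel, b)
label_left = predict((-3.0,), alpha, y, X, linear_kernel, b)
```

Only kernel evaluations between input vectors appear; the Z-space transform is never formed.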

Polynomial Kernel

2nd-Order Polynomial Kernel

$$\Phi(\mathbf{x}) = \big[\,x_1,\ x_2,\ \ldots,\ x_d,\ x_1^2,\ x_2^2,\ \ldots,\ x_d^2,\ \sqrt{2}\,x_1x_2,\ \sqrt{2}\,x_1x_3,\ \ldots,\ \sqrt{2}\,x_1x_d,\ \sqrt{2}\,x_2x_3,\ \ldots,\ \sqrt{2}\,x_{d-1}x_d\,\big]^t$$

$$K(\mathbf{x}, \mathbf{x}') = \Phi(\mathbf{x})^t \Phi(\mathbf{x}') = \sum_{i=1}^{d} x_i x'_i + \sum_{i=1}^{d} x_i^2 {x'_i}^2 + 2\sum_{i<j} x_i x_j x'_i x'_j \qquad \leftarrow O(d^2)$$
$$= \big(\tfrac{1}{2} + \mathbf{x}^t \mathbf{x}'\big)^2 - \tfrac{1}{4} \qquad \leftarrow \text{computed quickly in } \mathcal{X}\text{-space, in } O(d)$$

Q-th order polynomial kernel

$K(\mathbf{x}, \mathbf{x}') = (r + \mathbf{x}^t \mathbf{x}')^Q$ ← inhomogeneous kernel
$K(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^t \mathbf{x}')^Q$ ← homogeneous kernel
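
As a sanity check on the identity above, the following Python sketch compares the explicit $O(d^2)$ inner product in Z-space with the $O(d)$ closed form. The function names and the sample vectors are illustrative assumptions, not part of the lecture.

```python
import math
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi2(x):
    """Explicit 2nd-order transform: linear terms, squares, sqrt(2)-scaled cross terms."""
    feats = list(x) + [xi * xi for xi in x]
    feats += [math.sqrt(2) * x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return feats


def poly2_kernel(x, xp):
    """Closed form (1/2 + x.x')^2 - 1/4, computed in X-space in O(d)."""
    return (0.5 + dot(x, xp)) ** 2 - 0.25


def poly_kernel(x, xp, Q, r=0.0):
    """Q-th order kernel (r + x.x')^Q; r = 0 gives the homogeneous kernel."""
    return (r + dot(x, xp)) ** Q


x = [1.0, 2.0, -1.0]
xp = [0.5, -1.0, 3.0]
explicit = dot(phi2(x), phi2(xp))  # inner product in Z-space
fast = poly2_kernel(x, xp)         # same number, without forming phi2
```

For these vectors $\mathbf{x}^t\mathbf{x}' = -4.5$, so both routes give $-4.5 + 20.25 = 15.75$ up to floating-point rounding.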

RBF-Kernel

One dimensional RBF-Kernel

$$\Phi(x) = e^{-x^2}\Big[\,1,\ \sqrt{\tfrac{2^1}{1!}}\,x,\ \sqrt{\tfrac{2^2}{2!}}\,x^2,\ \sqrt{\tfrac{2^3}{3!}}\,x^3,\ \sqrt{\tfrac{2^4}{4!}}\,x^4,\ \ldots\,\Big]^t$$

$$K(x, x') = \Phi(x)^t \Phi(x') = e^{-x^2} e^{-x'^2} \sum_{i=0}^{\infty} \frac{(2xx')^i}{i!} \qquad \leftarrow \text{not feasible}$$
$$= e^{-x^2} e^{-x'^2} e^{2xx'} = e^{-(x - x')^2} \qquad \leftarrow \text{computed quickly in } \mathcal{X}\text{-space, in } O(d)$$

d-dimensional RBF-Kernel

$$K(\mathbf{x}, \mathbf{x}') = e^{-\gamma \|\mathbf{x} - \mathbf{x}'\|^2} \qquad (\gamma > 0)$$

[Figure panels: Hard Margin (γ = 2000, C = ∞); Soft Margin (γ = 2000, C = 0.25); Soft Margin (γ = 100, C = 0.25)]
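
The infinite-dimensional transform cannot be computed exactly, but truncating it shows numerically that it converges to the closed-form kernel. The sketch below is only an illustration: the function names, the truncation length of 20 terms, and the test points are assumptions.

```python
import math


def phi_rbf_truncated(x, terms):
    """First `terms` coordinates of the 1-D transform: e^{-x^2} * sqrt(2^i / i!) * x^i."""
    return [
        math.exp(-x * x) * math.sqrt(2 ** i / math.factorial(i)) * x ** i
        for i in range(terms)
    ]


def rbf_kernel(x, xp, gamma=1.0):
    """d-dimensional RBF kernel exp(-gamma * ||x - x'||^2) for sequences x, x'."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xp))
    return math.exp(-gamma * sq_dist)


x, xp = 0.7, -0.3
approx = sum(
    a * b for a, b in zip(phi_rbf_truncated(x, 20), phi_rbf_truncated(xp, 20))
)
exact = rbf_kernel((x,), (xp,))  # exp(-(0.7 + 0.3)^2) = exp(-1)
```

The closed form costs one distance computation, whereas the explicit route needs ever more coordinates for a given accuracy.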

Choosing RBF-Kernel Width γ

$$e^{-\gamma \|\mathbf{x} - \mathbf{x}'\|^2}$$

[Figure panels: Small γ; Medium γ; Large γ!]
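
To get a feel for the width parameter, the sketch below evaluates the kernel for one fixed pair of points at several values of γ. The pair of points and the value γ = 1 are assumptions chosen for illustration; γ = 100 and γ = 2000 are the values used in the earlier figure panels.

```python
import math


def rbf_kernel(x, xp, gamma):
    """exp(-gamma * ||x - x'||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xp))
    return math.exp(-gamma * sq_dist)


# Two hypothetical points at Euclidean distance 0.1.
x, xp = (0.0, 0.0), (0.1, 0.0)
similarity = {gamma: rbf_kernel(x, xp, gamma) for gamma in (1, 100, 2000)}
```

With small γ the two points look almost identical to the kernel, while with large γ even nearby points look unrelated, so each support vector influences only a tiny neighborhood.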

RBF-Kernel Simulates k-RBF-Network

RBF-Kernel:
$$g(\mathbf{x}) = \mathrm{sign}\Big(\sum_{\alpha^*_n > 0} \alpha^*_n y_n\, e^{-\|\mathbf{x} - \mathbf{x}_n\|^2} + b^*\Big)$$
  Centers are at support vectors
  Number of centers auto-determined

k-RBF-Network:
$$g(\mathbf{x}) = \mathrm{sign}\Big(\sum_{j=1}^{k} w_j\, e^{-\|\mathbf{x} - \boldsymbol{\mu}_j\|^2} + w_0\Big)$$
  Centers chosen to represent the data
  Number of centers k is an input
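
The correspondence can be made concrete by reading off a network from a kernel solution: centers $\boldsymbol{\mu}_j$ are the support vectors, weights are $w_j = \alpha^*_n y_n$, and $w_0 = b^*$. The sketch below assumes hypothetical values for `alpha`, `y`, `X`, and `b` (they satisfy $\mathbf{y}^t\boldsymbol{\alpha} = 0$ but were not obtained from a QP solver) and only checks that the two signals coincide.

```python
import math


def bump(x, center):
    """Gaussian bump exp(-||x - center||^2)."""
    return math.exp(-sum((a - c) ** 2 for a, c in zip(x, center)))


def svm_signal(x, alpha, y, X, b):
    """Kernel form: sum over alpha_n > 0 of alpha_n * y_n * bump(x, x_n), plus b."""
    return b + sum(a * yn * bump(x, xn) for a, yn, xn in zip(alpha, y, X) if a > 0)


def network_signal(x, centers, weights, w0):
    """k-RBF-network form: sum over j of w_j * bump(x, mu_j), plus w_0."""
    return w0 + sum(w * bump(x, mu) for w, mu in zip(weights, centers))


# Hypothetical dual solution: the middle point is not a support vector.
X = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
y = [1, 1, -1]
alpha = [0.8, 0.0, 0.8]
b = 0.0

# Read off the equivalent network: k is set by the number of support vectors.
centers = [xn for a, xn in zip(alpha, X) if a > 0]
weights = [a * yn for a, yn in zip(alpha, y) if a > 0]

query = (0.5, 0.25)
s_kernel = svm_signal(query, alpha, y, X, b)
s_network = network_signal(query, centers, weights, b)
```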

Neural Network Kernel

$$K(\mathbf{x}, \mathbf{x}') = \tanh(\kappa \cdot \mathbf{x}^t \mathbf{x}' + c)$$

Neural Network Kernel:
$$g(\mathbf{x}) = \mathrm{sign}\Big(\sum_{\alpha^*_n > 0} \alpha^*_n y_n \tanh(\kappa \cdot \mathbf{x}_n^t \mathbf{x} + c) + b^*\Big)$$
  First layer weights are support vectors
  Number of hidden nodes auto-determined

2 Layer Neural Network:
$$g(\mathbf{x}) = \mathrm{sign}\Big(\sum_{j=1}^{m} w_j \tanh(\mathbf{v}_j^t \mathbf{x}) + w_0\Big)$$
  First layer weights arbitrary
  Number of hidden nodes m is an input

The Inner Product Measures Similarity

$$K(\mathbf{x}, \mathbf{x}') = \mathbf{z}^t \mathbf{z}' = \|\mathbf{z}\| \cdot \|\mathbf{z}'\| \cdot \cos(\theta_{\mathbf{z}, \mathbf{z}'}) = \|\mathbf{z}\| \cdot \|\mathbf{z}'\| \cdot \mathrm{CosSim}(\mathbf{z}, \mathbf{z}')$$

Normalizing for size, the kernel measures similarity between input vectors.
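
Since $\|\mathbf{z}\|^2 = K(\mathbf{x}, \mathbf{x})$, the cosine similarity in Z-space can be obtained from kernel evaluations alone: $\mathrm{CosSim}(\mathbf{z}, \mathbf{z}') = K(\mathbf{x}, \mathbf{x}') / \sqrt{K(\mathbf{x}, \mathbf{x})\,K(\mathbf{x}', \mathbf{x}')}$. The sketch below illustrates this with a homogeneous 2nd-order polynomial kernel; the helper names and sample vectors are assumptions, and it presumes $K(\mathbf{x}, \mathbf{x}) > 0$ for both inputs.

```python
import math


def poly_kernel(x, xp, Q=2, r=0.0):
    """Q-th order polynomial kernel (r + x.x')^Q."""
    return (r + sum(a * b for a, b in zip(x, xp))) ** Q


def kernel_cosine(kernel, x, xp):
    """Cosine of the angle between z and z' in Z-space, using only kernel values."""
    return kernel(x, xp) / math.sqrt(kernel(x, x) * kernel(xp, xp))


x = (1.0, 0.0)
xp = (1.0, 1.0)
cos_z = kernel_cosine(poly_kernel, x, xp)  # K = 1, norms squared are 1 and 4
cos_self = kernel_cosine(poly_kernel, xp, xp)
```

For the RBF kernel $K(\mathbf{x}, \mathbf{x}) = 1$, so that kernel is already normalized in this sense.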

Designing Kernels

• Construct a similarity measure for the data
• A linear model should be plausible in that transformed space

String Kernels

Applications: DNA sequences, Text

Text example 1:
  Dear Sir, With reference to your letter dated 26th March, I want to confirm the Order No. 34-09-10 placed on 3rd March, 2010. I would appreciate if you could send me the account details where the payment has to be made. As per the invoice, we are entitled to a cash discount of 2%. Can you please let us know whether it suits you if we make a wire transfer instead of a cheque?

Text example 2:
  Dear Jane, I am terribly sorry to hear the news of your hip fracture. I can only imagine what a terrible time you must be going through. I hope you and the family are coping well. If there is any help you need, don't hesitate to let me know.

DNA examples:
  ACGGTGTCAAACGTGTCAGTGTG
  GTCGGGTCAAAACGTGAT

Similar?
  Yes, if classifying spam versus non-spam
  No, if classifying business versus personal

To design the kernel → measure similarity between strings
  Bag of words (number of occurrences of each atom)
  Co-occurrence of substrings or subsequences
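
Both ideas can be sketched as an inner product of count vectors. In the code below the atoms are whitespace-separated words for text and length-k substrings for DNA; the function names, the choice k = 3, and the tokenization are assumptions for illustration and not a kernel prescribed by the lecture.

```python
from collections import Counter


def word_counts(text):
    """Bag of words: number of occurrences of each lowercase word."""
    return Counter(text.lower().split())


def kmer_counts(sequence, k):
    """Number of occurrences of each length-k substring."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def count_kernel(counts_a, counts_b):
    """Inner product of two count vectors; atoms missing from one side contribute 0."""
    return sum(counts_a[atom] * counts_b[atom] for atom in counts_a)


# The two DNA strings from the slide, compared through shared 3-mers.
dna_a = "ACGGTGTCAAACGTGTCAGTGTG"
dna_b = "GTCGGGTCAAAACGTGAT"
k_dna = count_kernel(kmer_counts(dna_a, 3), kmer_counts(dna_b, 3))

# Tiny text example: only "dear" is shared (twice in the first string, once in the second).
k_text = count_kernel(word_counts("Dear Sir dear"), word_counts("Dear Jane"))
```

Because the kernel is an explicit inner product of feature vectors, it is automatically symmetric and positive semidefinite.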

Graph Kernels

Performing classification on:
  Graph structures (e.g. protein networks for function prediction)
  Graph nodes within a network (e.g. advertise or not to Facebook users)

Similarity between graphs: random walks, degree sequences, connectivity properties, mixing properties.

Measuring similarity between nodes: looking at neighborhoods,
$$K(v, v') = \frac{|N(v) \cap N(v')|}{|N(v) \cup N(v')|}.$$
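
The neighborhood-overlap similarity is straightforward to compute from an adjacency structure. The sketch below uses a small hypothetical undirected graph stored as a dictionary of neighbor sets; returning 0 when both neighborhoods are empty is an assumption made here to avoid division by zero.

```python
def neighborhood_kernel(graph, v, vp):
    """K(v, v') = |N(v) & N(v')| / |N(v) | N(v')| for neighbor sets N."""
    n_v, n_vp = graph[v], graph[vp]
    union = n_v | n_vp
    if not union:
        return 0.0  # assumed convention for two isolated nodes
    return len(n_v & n_vp) / len(union)


# Hypothetical undirected graph with edges a-b, a-c, c-d, d-e.
graph = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

k_ad = neighborhood_kernel(graph, "a", "d")  # share c out of {b, c, e}
k_aa = neighborhood_kernel(graph, "a", "a")
k_be = neighborhood_kernel(graph, "b", "e")  # no common neighbor
```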

Image Kernels

[Figure: the images being compared]

Similar?
  Yes, if trying to recognize pictures with faces.
  No, if trying to distinguish Malik from Christos.