Learning with kernels and SVM

Learning with kernels and SVM, Šámalova chata, May 23, 2006 - PowerPoint PPT Presentation

  1. Learning with kernels and SVM. Petra Kudová. Šámalova chata, May 23, 2006. Sections: Introduction, Binary classification, Learning with Kernels, Support Vector Machines, Demo, Conclusion.

  2. Outline: Introduction, Binary classification, Learning with Kernels, Support Vector Machines, Demo, Conclusion

  3. Learning from data
     - find a general rule that explains data given only as a sample of limited size
     - data may contain measurement errors or noise
     - supervised learning: data are a sample of input-output pairs; find the input-output mapping (prediction, classification, function approximation, etc.)
     - unsupervised learning: data are a sample of objects; find some structure (clustering, etc.)

  4. Learning methods
     - wide range of methods available
     - statistical approaches
     - neural networks: originally biological motivation; multi-layer perceptrons, RBF networks, Kohonen maps
     - kernel methods: modern and popular; SVM

  5. Trends in machine learning. Figure: articles on machine learning found by Google. Source: http://yaroslavvb.blogspot.com/

  6. Trends in machine learning. Figure: articles on neural networks found by Google. Source: http://yaroslavvb.blogspot.com/

  7. Trends in machine learning. Figure: articles on support vector machines found by Google. Source: http://yaroslavvb.blogspot.com/

  8. Binary classification
     - training set $\{(x_i, y_i)\}_{i=1}^{m}$, with $x_i \in X$ and $y_i \in \{-1, 1\}$
     - find a classifier
     - generalization

  9. Simple Classifier
     Suppose: $X \subset \mathbb{R}^n$, classes linearly separable.
     $$c_+ = \frac{1}{m_+} \sum_{\{i \mid y_i = +1\}} x_i, \qquad c_- = \frac{1}{m_-} \sum_{\{i \mid y_i = -1\}} x_i, \qquad c = \frac{1}{2}(c_+ + c_-)$$
     With $w = c_+ - c_-$:
     $$y = \operatorname{sgn}(\langle x - c, w \rangle) = \operatorname{sgn}\big(\langle x - (c_+ + c_-)/2,\; c_+ - c_- \rangle\big) = \operatorname{sgn}(\langle x, c_+ \rangle - \langle x, c_- \rangle + b)$$
     $$b = \frac{1}{2}\left(\|c_-\|^2 - \|c_+\|^2\right)$$
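
The class-mean classifier on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names (`fit_centroid_classifier`, `predict`) and the toy data are hypothetical and not from the presentation, and a score of exactly zero is assigned to class +1 by convention.

```python
def dot(u, v):
    """Standard dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]

def fit_centroid_classifier(X, y):
    """Return (c_plus, c_minus, b) as defined on the slide."""
    c_plus = mean([x for x, t in zip(X, y) if t == 1])
    c_minus = mean([x for x, t in zip(X, y) if t == -1])
    b = 0.5 * (dot(c_minus, c_minus) - dot(c_plus, c_plus))
    return c_plus, c_minus, b

def predict(x, c_plus, c_minus, b):
    """sgn(<x, c+> - <x, c-> + b); a zero score is mapped to +1 here."""
    score = dot(x, c_plus) - dot(x, c_minus) + b
    return 1 if score >= 0 else -1

# Hypothetical toy data: two well-separated groups in the plane.
X = [(2.0, 2.0), (3.0, 1.0), (-2.0, -1.0), (-3.0, -2.0)]
y = [1, 1, -1, -1]
model = fit_centroid_classifier(X, y)
print([predict(x, *model) for x in X])
```

A new point is assigned to the class whose mean is closer, which is why only dot products with $c_+$ and $c_-$ are needed.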

  10. Mapping to the feature space
      - life is not so easy: not all problems are linearly separable
      - what to do if $X$ is not a dot-product space?
      - choose a mapping to some (high-dimensional) dot-product space, the feature space: $\Phi : X \to H$

  11. Mercer's condition and Kernels
      If a symmetric function $K(x, y)$ satisfies
      $$\sum_{i,j=1}^{M} a_i a_j K(x_i, x_j) \ge 0$$
      for all $M \in \mathbb{N}$, $x_i$, and $a_i \in \mathbb{R}$, there exists a mapping function $\Phi$ that maps $x$ into the dot-product feature space such that $K(x, y) = \langle \Phi(x), \Phi(y) \rangle$, and vice versa. The function $K$ is called a kernel.
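
Mercer's condition can be probed numerically, although only in one direction: random coefficients $a_i$ can expose a violation, but passing the check does not prove that a function is a kernel. The sketch below assumes a Gaussian kernel and a deliberately invalid candidate (the negated dot product); all names and sample points are hypothetical.

```python
import math
import random

def rbf(x, y, d=1.0):
    """Gaussian kernel exp(-||x - y||^2 / d^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / d ** 2)

def negated_dot(x, y):
    """Symmetric, but violates Mercer's condition."""
    return -sum(a * b for a, b in zip(x, y))

def quadratic_form(kernel, xs, a):
    """sum_{i,j} a_i a_j K(x_i, x_j)."""
    n = len(xs)
    return sum(a[i] * a[j] * kernel(xs[i], xs[j])
               for i in range(n) for j in range(n))

def passes_random_check(kernel, xs, trials=200, tol=1e-9, seed=0):
    """False as soon as some random a gives a clearly negative value."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = [rng.uniform(-1.0, 1.0) for _ in xs]
        if quadratic_form(kernel, xs, a) < -tol:
            return False
    return True

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(passes_random_check(rbf, points))
print(passes_random_check(negated_dot, points))
```

For the negated dot product the quadratic form equals $-\|\sum_i a_i x_i\|^2$, so almost any choice of coefficients reveals the violation.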

  12. Examples of kernels
      - Linear kernels: $K(x, y) = \langle x, y \rangle$
      - Polynomial kernels: $K(x, y) = (\langle x, y \rangle + 1)^d$
      For $d = 2$ and 2-dimensional inputs:
      $$K(x, y) = 1 + 2 x_1 y_1 + 2 x_2 y_2 + 2 x_1 y_1 x_2 y_2 + x_1^2 y_1^2 + x_2^2 y_2^2 = \langle \Phi(x), \Phi(y) \rangle$$
      $$\Phi(x) = \left(1, \sqrt{2}\, x_1, \sqrt{2}\, x_2, \sqrt{2}\, x_1 x_2, x_1^2, x_2^2\right)^T$$
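
The identity between the degree-2 polynomial kernel and the explicit feature map can be checked numerically. The sketch below evaluates both sides for one arbitrarily chosen pair of points; the function names and the test points are illustrative assumptions.

```python
import math

def poly_kernel(x, y, d=2):
    """Polynomial kernel (<x, y> + 1)^d."""
    return (sum(a * b for a, b in zip(x, y)) + 1.0) ** d

def phi(x):
    """Explicit feature map for d = 2 and 2-dimensional inputs."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (1.0, r2 * x1, r2 * x2, r2 * x1 * x2, x1 ** 2, x2 ** 2)

x, y = (0.5, -1.0), (2.0, 3.0)
kernel_value = poly_kernel(x, y)
feature_value = sum(a * b for a, b in zip(phi(x), phi(y)))
print(kernel_value, feature_value)
```

The kernel needs one dot product in the 2-dimensional input space, while the explicit route needs a dot product in the 6-dimensional feature space; both give the same number up to rounding.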

  13. Examples of kernels
      - RBF kernels: $K(x, y) = \exp\left(-\frac{\|x - y\|^2}{d^2}\right)$
      - Other kernels: kernels on various objects, such as graphs, strings, texts, etc.
        - enable us to use dot-product algorithms
        - measure of similarity
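
The reading of a kernel as a measure of similarity is easy to see for the RBF kernel: identical points score 1, and the score decays with distance at a rate set by the width $d$. A minimal sketch, with a hypothetical function name and arbitrary points:

```python
import math

def rbf_kernel(x, y, d=1.0):
    """RBF kernel exp(-||x - y||^2 / d^2) with width parameter d."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / d ** 2)

origin = (0.0, 0.0)
near, far = (1.0, 0.0), (3.0, 0.0)

print(rbf_kernel(origin, origin))       # identical points
print(rbf_kernel(origin, near))         # nearby point
print(rbf_kernel(origin, far))          # distant point
print(rbf_kernel(origin, near, d=2.0))  # wider kernel, same pair
```

A larger $d$ makes distant points look more similar, so the choice of width controls how local the resulting classifier is.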

  14. Simple Classifier - kernel version
      Suppose: $X \subset \mathbb{R}^n$, classes linearly separable.
      $$c_+ = \frac{1}{m_+} \sum_{\{i \mid y_i = +1\}} x_i, \qquad c_- = \frac{1}{m_-} \sum_{\{i \mid y_i = -1\}} x_i, \qquad c = \frac{1}{2}(c_+ + c_-)$$
      With $w = c_+ - c_-$:
      $$y = \operatorname{sgn}(\langle x - c, w \rangle) = \operatorname{sgn}\big(\langle x - (c_+ + c_-)/2,\; c_+ - c_- \rangle\big) = \operatorname{sgn}(\langle x, c_+ \rangle - \langle x, c_- \rangle + b)$$
      $$b = \frac{1}{2}\left(\|c_-\|^2 - \|c_+\|^2\right)$$

  15. Simple Classifier - kernel version
      Suppose: $X$ is any set, $\Phi : X \to H$ corresponding to kernel $K$.
      $$c_+ = \frac{1}{m_+} \sum_{\{i \mid y_i = +1\}} x_i, \qquad c_- = \frac{1}{m_-} \sum_{\{i \mid y_i = -1\}} x_i, \qquad c = \frac{1}{2}(c_+ + c_-)$$
      With $w = c_+ - c_-$:
      $$y = \operatorname{sgn}(\langle x - c, w \rangle) = \operatorname{sgn}\big(\langle x - (c_+ + c_-)/2,\; c_+ - c_- \rangle\big) = \operatorname{sgn}(\langle x, c_+ \rangle - \langle x, c_- \rangle + b)$$
      $$b = \frac{1}{2}\left(\|c_-\|^2 - \|c_+\|^2\right)$$

  16. Simple Classifier - kernel version
      Suppose: $X$ is any set, $\Phi : X \to H$ corresponding to kernel $K$.
      $$y = \operatorname{sgn}\Big(\frac{1}{m_+} \sum_{\{i \mid y_i = +1\}} K(x, x_i) - \frac{1}{m_-} \sum_{\{i \mid y_i = -1\}} K(x, x_i) + b\Big)$$
      $$b = \frac{1}{2}\Big(\frac{1}{m_-^2} \sum_{\{i,j \mid y_i = y_j = -1\}} K(x_i, x_j) - \frac{1}{m_+^2} \sum_{\{i,j \mid y_i = y_j = +1\}} K(x_i, x_j)\Big)$$
      Statistical approach: Bayes classifier as a special case, for $b = 0$ and $\int_X K(x, y)\, dx = 1$ for all $y \in X$.
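
The kernelized decision rule uses only kernel evaluations, so it can be sketched without ever forming $\Phi$. The example below is an illustration under assumptions: the XOR-style data, the kernel width, and the function names are hypothetical, and a zero score is mapped to +1 by convention.

```python
import math

def rbf(x, y, d=0.5):
    """Gaussian kernel exp(-||x - y||^2 / d^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / d ** 2)

def linear(x, y):
    """Linear kernel <x, y>."""
    return sum(a * b for a, b in zip(x, y))

def fit_kernel_mean_classifier(X, y, kernel):
    """Return a predict(x) function implementing the slide's formula."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == -1]

    def mean_k(A, B):
        # Average of K(a, b) over all pairs; equals (1/m^2) * sum when A is B.
        return sum(kernel(a, b) for a in A for b in B) / (len(A) * len(B))

    b = 0.5 * (mean_k(neg, neg) - mean_k(pos, pos))

    def predict(x):
        score = mean_k([x], pos) - mean_k([x], neg) + b
        return 1 if score >= 0 else -1

    return predict

# XOR-style labels: not linearly separable in the input space.
X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
y = [1, 1, -1, -1]

rbf_predict = fit_kernel_mean_classifier(X, y, rbf)
linear_predict = fit_kernel_mean_classifier(X, y, linear)
print([rbf_predict(x) for x in X])
print([linear_predict(x) for x in X])
```

With the linear kernel both class means coincide, so the rule cannot separate these points; with the RBF kernel the same rule separates them, because the class means differ in the feature space.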

  17. Simple Classifier - kernel version
      Suppose: $X$ is any set, $\Phi : X \to H$ corresponding to kernel $K$.
      $$y = \operatorname{sgn}\Big(\underbrace{\frac{1}{m_+} \sum_{\{i \mid y_i = +1\}} K(x, x_i)}_{=\, p_+(x)} - \underbrace{\frac{1}{m_-} \sum_{\{i \mid y_i = -1\}} K(x, x_i)}_{=\, p_-(x)} + 0\Big)$$
      (Parzen windows)
      Statistical approach: Bayes classifier as a special case, for $b = 0$ and $\int_X K(x, y)\, dx = 1$ for all $y \in X$.
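
When the kernel integrates to one, the two averages are Parzen window density estimates and the rule compares $p_+(x)$ with $p_-(x)$. The sketch below assumes a one-dimensional Gaussian window; the window width, the sample values, and the function names are hypothetical choices for illustration.

```python
import math

def gaussian_window(x, xi, sigma=0.5):
    """1-D Gaussian density centred at xi; integrates to 1 over x."""
    z = (x - xi) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def parzen_density(x, sample, sigma=0.5):
    """Parzen window estimate: average of the windows placed on the sample."""
    return sum(gaussian_window(x, xi, sigma) for xi in sample) / len(sample)

def parzen_classify(x, pos, neg, sigma=0.5):
    """sgn(p_plus(x) - p_minus(x)), with ties mapped to +1."""
    p_plus = parzen_density(x, pos, sigma)
    p_minus = parzen_density(x, neg, sigma)
    return 1 if p_plus >= p_minus else -1

pos = [1.0, 1.5, 2.0]
neg = [-1.0, -2.0]

# Riemann-sum check that the window is normalised on [-10, 10].
step = 0.01
area = sum(gaussian_window(-10.0 + k * step, 0.0) for k in range(2001)) * step

print(parzen_classify(1.2, pos, neg), parzen_classify(-1.5, pos, neg))
```

Note that this rule implicitly treats both classes as equally likely a priori, since each density estimate is normalised by its own class size.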

  18. Separating hyperplane
      - classifier in the form $y(x) = \operatorname{sgn}(\langle w, x \rangle + b)$, where
        $$\langle w, x_i \rangle + b \; \begin{cases} > 0 & \text{for } y_i = 1 \\ < 0 & \text{for } y_i = -1 \end{cases}$$
      - each hyperplane $D(x) = \langle w, x \rangle + b = c$, $-1 < c < 1$, is separating
      - optimal separating hyperplane: the one with the maximal margin

  19. Separating hyperplane
      - classifier in the form $y(x) = \operatorname{sgn}(\langle w, x \rangle + b)$, where
        $$\langle w, x_i \rangle + b \; \begin{cases} \ge 1 & \text{for } y_i = 1 \\ \le -1 & \text{for } y_i = -1 \end{cases}$$
      - each hyperplane $D(x) = \langle w, x \rangle + b = c$, $-1 < c < 1$, is separating
      - optimal separating hyperplane: the one with the maximal margin

  21. Classifier with maximal margin

  22. Classifier with maximal margin
      $y(x) = \operatorname{sgn}(\langle w, x \rangle + b)$, where $w$ and $b$ are the solution of
      $$\min Q(w), \qquad Q(w) = \frac{1}{2} \|w\|^2$$
      with respect to the constraints
      $$y_i(\langle w, x_i \rangle + b) \ge 1, \quad \text{for } i = 1, \ldots, m$$
      - quadratic programming problem
      - linear separability → solution exists
      - no local minima
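
The roles of the objective and the constraints can be illustrated by comparing a few hand-picked candidates $(w, b)$ rather than running a quadratic programming solver. In this sketch the data and the candidates are hypothetical, and the margin width $2 / \|w\|$ is the standard distance between the hyperplanes $\langle w, x \rangle + b = \pm 1$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def is_feasible(w, b, X, y, tol=1e-12):
    """Check y_i (<w, x_i> + b) >= 1 for every training point."""
    return all(t * (dot(w, x) + b) >= 1.0 - tol for x, t in zip(X, y))

def objective(w):
    """Q(w) = 1/2 ||w||^2."""
    return 0.5 * dot(w, w)

def margin_width(w):
    """Distance between the hyperplanes <w, x> + b = +1 and = -1."""
    return 2.0 / math.sqrt(dot(w, w))

X = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (-2.0, -2.0)]
y = [1, 1, -1, -1]

# Hand-picked candidates (w, b); not the output of a solver.
candidates = {
    "wide": ((0.5, 0.5), 0.0),
    "narrow": ((1.0, 1.0), 0.0),
    "too_wide": ((0.4, 0.4), 0.0),
}

feasible = {name: wb for name, wb in candidates.items()
            if is_feasible(wb[0], wb[1], X, y)}
best = min(feasible, key=lambda name: objective(feasible[name][0]))
print(best, margin_width(feasible[best][0]))
```

Among the feasible candidates, the one with the smallest $Q(w)$ has the widest margin; shrinking $w$ further violates a constraint, which is what makes the problem a constrained one.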

  23. Classifier with maximal margin
      - constrained optimization problem
        $$\min_w \frac{1}{2} \|w\|^2 \quad \text{subject to} \quad y_i(\langle w, x_i \rangle + b) \ge 1$$
      - can be handled by introducing Lagrange multipliers $\alpha_i \ge 0$
        $$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{m} \alpha_i \big(y_i(\langle w, x_i \rangle + b) - 1\big)$$
      - minimize with respect to $w$ and $b$
      - maximize with respect to $\alpha_i$

  24. Classifier with maximal margin
      $$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{m} \alpha_i \big(y_i(\langle w, x_i \rangle + b) - 1\big)$$
      Minimize with respect to $w$, $b$; maximize with respect to $\alpha$.
      Karush-Kuhn-Tucker (KKT) conditions:
      $$\frac{\partial L(w, b, \alpha)}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{m} \alpha_i y_i = 0, \qquad \frac{\partial L(w, b, \alpha)}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{m} \alpha_i y_i x_i$$
      We get:
      - $y_i(\langle w, x_i \rangle + b) > 1 \;\rightarrow\; \alpha_i = 0$: $x_i$ is irrelevant
      - $y_i(\langle w, x_i \rangle + b) = 1 \;\rightarrow\; \alpha_i \ne 0$: $x_i$ is a support vector
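
These conditions can be verified by hand on a tiny example. The sketch below uses three hypothetical points and hand-derived values of $w$, $b$, and $\alpha$ (assumed for illustration, not computed by a solver) and checks the stationarity conditions together with the rule that only points on the margin carry a nonzero multiplier.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical data: two points on the margin and one well inside class +1.
X = [(1.0, 0.0), (-1.0, 0.0), (3.0, 1.0)]
y = [1, -1, 1]

# Hand-derived candidate solution (an assumption of this example).
w, b = (1.0, 0.0), 0.0
alpha = [0.5, 0.5, 0.0]

# Stationarity in w: w = sum_i alpha_i y_i x_i
w_from_alpha = tuple(sum(a * t * x[k] for a, t, x in zip(alpha, y, X))
                     for k in range(len(w)))

# Stationarity in b: sum_i alpha_i y_i = 0
balance = sum(a * t for a, t in zip(alpha, y))

# Constraint slack y_i (<w, x_i> + b) - 1, which must be non-negative.
slack = [t * (dot(w, x) + b) - 1.0 for x, t in zip(X, y)]

# A point with positive slack must have alpha_i = 0.
complementary = all(abs(a * s) < 1e-12 for a, s in zip(alpha, slack))

support_vectors = [i for i, a in enumerate(alpha) if a > 0]
print(w_from_alpha, balance, slack, support_vectors)
```

The third point lies strictly outside the margin, so its multiplier is zero and removing it would leave $w$ and $b$ unchanged; the solution is determined by the two support vectors alone.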
