  1. Efficient Multiple Kernel Learning Lei Tang

  2. Outline • What is Kernel Learning? • What’s the problem with the existing formulation? • Two new formulations for large-scale kernel selection – SIL formulation (Cutting Planes) – More efficient MKL (Steepest Descent)

  3. Linear algorithm: binary classification • Data: $\{(x_i, y_i)\}_{i=1 \ldots n}$ – $x \in R^d$ = feature vector (e.g., HEART, URINE, DNA, BLOOD, SCAN measurements) – $y \in \{-1, +1\}$ = label • Question: design a classification rule $y = f(x)$ such that, given a new $x$, this predicts $y$ with minimal probability of error

  4. Linear algorithm: binary classification • Find a good hyperplane $(w, b) \in R^{d+1}$ that classifies this and future data points as well as possible • Classification rule: $f(x) = \mathrm{sign}(w \cdot x + b)$
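The classification rule above can be sketched in a few lines of Python; the hyperplane $(w, b)$ here is a made-up example, not one learned from data:

```python
import numpy as np

def linear_classify(w, b, x):
    """Classify x with hyperplane (w, b): f(x) = sign(w . x + b)."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Hypothetical hyperplane x1 + x2 - 1 = 0
w = np.array([1.0, 1.0])
b = -1.0
print(linear_classify(w, b, np.array([2.0, 2.0])))  # prints 1
print(linear_classify(w, b, np.array([0.0, 0.0])))  # prints -1
```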

  5. Linear algorithm: binary classification • Intuition (Vapnik, 1965), if linearly separable: – Separate the data – Place the hyperplane “far” from the data: large margin

  6. Linear algorithm: binary classification • Intuition (Vapnik, 1965), if linearly separable: – Separate the data – Place the hyperplane “far” from the data: large margin • Maximal Margin Classifier

  7. Linear algorithm: binary classification • If not linearly separable: – Allow some errors – Still, try to place the hyperplane “far” from each class

  8. SVM: Primal & Dual • $f(x) = w \cdot x + b$ • Primal: $\min_{w,b} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i$ subject to $y_i (w \cdot x_i + b) \ge 1 - \xi_i \;\; \forall i$, $\xi_i \ge 0$ • Dual: $\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^T x_j$ subject to $\sum_i \alpha_i y_i = 0$, $C \ge \alpha_i \ge 0$
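The primal above can also be attacked directly. As an illustration, here is a minimal sub-gradient descent sketch on the equivalent unconstrained hinge-loss form (this is a stand-in technique, not the QP solver the slides assume; the toy data are made up):

```python
import numpy as np

# Sub-gradient descent on the unconstrained primal
#   min_{w,b}  (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b)),
# which is equivalent to the constrained primal with slacks xi_i.
def train_svm_primal(X, y, C=1.0, lr=0.01, epochs=2000):
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                     # points with nonzero hinge loss
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy linearly separable data: class -1 on the left, class +1 on the right
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
w, b = train_svm_primal(X, y)
print(np.sign(X @ w + b))   # should recover the training labels
```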

  9. Linear algorithm: binary classification • Training = convex optimization problem (QP): $\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^T x_j$ subject to $\sum_i \alpha_i y_i = 0$, $\alpha_i \ge 0$ • The data enter only through inner products $x_i^T x_j$: an implicit embedding via the kernel matrix $K_{ij} = x_i^T x_j$ • Matrix form: $\max_{\alpha} \; e^T \alpha - \frac{1}{2} \alpha^T D_y K D_y \alpha$ subject to $y^T \alpha = 0$, $\alpha \ge 0$, where $D_y = \mathrm{diag}(y)$
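The summation and matrix forms of the dual objective are the same quantity; a quick numerical check on random made-up data (with $D_y = \mathrm{diag}(y)$ assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
y = rng.choice([-1.0, 1.0], size=5)
alpha = rng.uniform(0, 1, size=5)

K = X @ X.T                  # Gram matrix of inner products x_i^T x_j
D = np.diag(y)
e = np.ones(5)

# Matrix form of the dual objective
obj_matrix = e @ alpha - 0.5 * alpha @ D @ K @ D @ alpha
# Summation form: sum_i alpha_i - 1/2 sum_{ij} alpha_i alpha_j y_i y_j x_i.x_j
obj_sum = alpha.sum() - 0.5 * sum(
    alpha[i] * alpha[j] * y[i] * y[j] * (X[i] @ X[j])
    for i in range(5) for j in range(5)
)
print(abs(obj_matrix - obj_sum) < 1e-9)  # prints True: the two forms agree
```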

  10. Kernel algorithm: Support Vector Machine (SVM) • Training = convex optimization problem (QP): $\max_{\alpha} \; e^T \alpha - \frac{1}{2} \alpha^T D_y K D_y \alpha$ subject to $y^T \alpha = 0$, $C \ge \alpha \ge 0$ • Classification rule, for a new data point $x$: $f(x) = \mathrm{sign}(w^T x + b) = \mathrm{sign}\big( \sum_{i=1}^{n_{SV}} \alpha_i y_i \, x_i^T x + b \big)$ • Kernel algorithm!
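The kernelized version of the classification rule replaces $x_i^T x$ with $K(x_i, x)$; a sketch with a Gaussian kernel and a hypothetical, hand-picked (not trained) support set:

```python
import numpy as np

def rbf(x, z, sigma=1.0):
    """Gaussian kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def kernel_classify(alphas, ys, svs, b, x, kernel=rbf):
    """f(x) = sign(sum_i alpha_i y_i K(x_i, x) + b) over the support vectors."""
    s = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alphas, ys, svs)) + b
    return 1 if s >= 0 else -1

# Hypothetical support set: one support vector per class, equal weights
svs = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
ys = [-1, 1]
alphas = [1.0, 1.0]
b = 0.0

print(kernel_classify(alphas, ys, svs, b, np.array([1.8, 1.9])))  # prints 1
```

The test point is classified +1 because it lies much closer to the +1 support vector, so that kernel value dominates the sum.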

  11. Support Vector Machines (SVM) • Hand-writing recognition (e.g., USPS) • Computational biology (e.g., micro-array data) • Text classification • Face detection • Face expression recognition • Time series prediction (regression) • Drug discovery (novelty detection)

  12. Different Kernels • Various kinds of kernel: – Linear kernel – Gaussian kernel: $K(X, Y) = \exp\left( -\frac{\|X - Y\|^2}{2 \sigma^2} \right)$ – Diffusion kernel – String kernel – ……
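Evaluating the Gaussian kernel above on all pairs of points yields the Gram matrix $K$ that the dual QP consumes; a vectorized sketch:

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) for rows x_i of X."""
    sq = np.sum(X ** 2, axis=1)
    # Pairwise squared distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b,
    # clipped at 0 to guard against tiny negative rounding errors.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    return np.exp(-d2 / (2 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_gram(X)
print(np.round(K, 3))  # diagonal is all ones, since K(x, x) = 1
```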

  13. Learning with Multiple Kernels • [Figure: several candidate kernels; which single kernel matrix $K$ should the SVM use?]

  14. Learning the optimal Kernel • Overview of SVM with a single kernel: $G(K)$

  15. Learning the optimal Kernel • Learn a linear mix of kernels • Upper bound $G(K)$: the smaller, the better the guaranteed performance
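A common MKL parameterization of the linear mix (assumed here for illustration; the slides' exact formulation comes later) learns nonnegative weights $\mu_m$ summing to one and feeds $K = \sum_m \mu_m K_m$ to the single-kernel SVM:

```python
import numpy as np

def combine_kernels(kernels, mu):
    """Convex combination K = sum_m mu_m K_m of base Gram matrices."""
    mu = np.asarray(mu, dtype=float)
    assert np.all(mu >= 0) and abs(mu.sum() - 1.0) < 1e-9, \
        "weights must form a convex combination"
    return sum(m * K for m, K in zip(mu, kernels))

K1 = np.eye(3)          # stand-in for one base kernel's Gram matrix
K2 = np.ones((3, 3))    # stand-in for another
K = combine_kernels([K1, K2], [0.3, 0.7])
print(K)
```

A convex combination of positive semidefinite Gram matrices is itself positive semidefinite, so the mix is a valid kernel.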

  16. To be Continued
