Kernel Methods for Fusing Heterogeneous Data
Gunnar Rätsch


1. Kernel Methods for Fusing Heterogeneous Data
Gunnar Rätsch
Friedrich Miescher Laboratory, Max Planck Society, Tübingen, Germany
Pre-conference Course, Bio-IT World Europe, Hannover, Germany, October 4, 2010
Outline: SVMs · Kernels & the "Trick" · Non-vectorial Data · Data Integration · Software · References

2. Roadmap
Support Vector Machines (SVMs)
Kernels and the "trick"
Kernels for non-vectorial data
Heterogeneous data integration
Examples
Software
[Bar chart: number of SVM-related publications in PubMed per year, 2000–2010]
Slides and additional material available at:
http://tinyurl.com/dfbi2010
http://fml.mpg.de/raetsch/lectures/datafusion-bio-it-2010

6. Margin Maximization: Recognition of Splice Sites
Given: potential acceptor splice sites (intron/exon boundary)
Goal: a rule that distinguishes true sites from false ones
Approach: linear classifiers with large margin

7. Margin Maximization: SVMs Maximize the Margin. Why?
Intuitively, it feels safest: for a small error in the separating hyperplane, we do not suffer too many mistakes.
Empirically, it works well.
Learning theory indicates that it is the right thing to do.
[Scatter plot: true and false splice sites plotted by GC content before vs. after the 'AG', separated by a maximum-margin hyperplane w]

8. Support Vector Machines for Binary Classification: How to Maximize the Margin?
Maximize
    ρ − C Σ_{i=1}^{n} ξ_i
subject to
    y_i ⟨w, x_i⟩ ≥ ρ − ξ_i  and  ξ_i ≥ 0  for all i = 1, …, n,  with ‖w‖ = 1,
where ρ is the margin and the ξ_i are slack variables for examples that violate it.
Examples on the margin are called support vectors [Vapnik, 1995].
This is the soft-margin SVM [Cortes and Vapnik, 1995].
The hyperplane depends only on distances between examples, and hence only on scalar products:
    d(x, x′)² = ‖x − x′‖² = ⟨x, x⟩ − 2⟨x, x′⟩ + ⟨x′, x′⟩
[Scatter plot: splice-site examples by GC content before vs. after the 'AG', showing the margin ρ and a slack ξ for a violating example]

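The optimization above is usually solved as a quadratic program; as a rough sketch (not the deck's implementation), the equivalent regularized hinge-loss form can be minimized by subgradient descent. The toy data and all parameter values below are illustrative stand-ins for the GC-content features:

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Soft-margin linear SVM via subgradient descent on the regularized
    hinge loss 0.5*||w||^2 + C * sum_i max(0, 1 - y_i * (<w, x_i> + b)).
    This is an equivalent reformulation, not the exact (rho, ||w|| = 1)
    parameterization shown on the slide."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in range(n):
            if y[i] * (X[i] @ w + b) < 1:   # margin violated: slack is active
                w -= lr * (w - C * y[i] * X[i])
                b += lr * C * y[i]
            else:                           # margin satisfied: decay w only
                w -= lr * w
    return w, b

# Toy stand-in for splice-site features (GC content before/after the 'AG')
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)
w, b = train_linear_svm(X, y)
accuracy = np.mean(np.sign(X @ w + b) == y)
```

On well-separated data like this the learned hyperplane classifies essentially all training points correctly; in practice one would use a dedicated QP or SMO solver rather than this sketch.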

12. Inflating the Feature Space: Recognition of Splice Sites
Given: potential acceptor splice sites (intron/exon boundary)
Goal: a rule that distinguishes true sites from false ones
So far: linear classifiers with large margin

13. Inflating the Feature Space: A More Realistic Problem?
The data are not linearly separable!
Do we need a nonlinear separation?
Do we need more features?
[Scatter plot: splice-site examples by GC content before vs. after the 'AG'; no separating line exists]

15. Inflating the Feature Space: Nonlinear Separations
A linear separation might not be sufficient!
⇒ Map the data into a higher-dimensional feature space.
Example [Schölkopf and Smola, 2002]:
    Φ: R² → R³,  (x₁, x₂) ↦ (z₁, z₂, z₃) := (x₁², √2·x₁x₂, x₂²)
[Figure: two classes arranged in concentric rings in R², linearly separable after mapping to (z₁, z₂, z₃)]
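As a small check with toy points of my own choosing (not the slides' data): under this map, z₁ + z₃ = x₁² + x₂², so a circular boundary in R² becomes a plane in R³:

```python
import numpy as np

def phi(x):
    """Explicit feature map from the slide:
    R^2 -> R^3, (x1, x2) |-> (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

# Points inside vs. outside the unit circle are not linearly separable
# in R^2, but in feature space z1 + z3 = x1^2 + x2^2, so the plane
# z1 + z3 = 1 separates the two classes.
inner = [np.array([0.3, 0.2]), np.array([-0.4, 0.1])]
outer = [np.array([1.5, 0.0]), np.array([-1.0, 1.0])]
for x in inner:
    assert phi(x)[0] + phi(x)[2] < 1.0
for x in outer:
    assert phi(x)[0] + phi(x)[2] > 1.0
```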

16. Kernel "Trick"
Example: x ∈ R² and Φ(x) := (x₁², √2·x₁x₂, x₂²) [Boser et al., 1992]
    ⟨Φ(x), Φ(x̂)⟩ = ⟨(x₁², √2·x₁x₂, x₂²), (x̂₁², √2·x̂₁x̂₂, x̂₂²)⟩
                  = ⟨(x₁, x₂), (x̂₁, x̂₂)⟩²
                  = ⟨x, x̂⟩²  =: k(x, x̂)
The scalar product in feature space (here R³) can be computed in input space (here R²)!
This also works for higher orders and dimensions:
relatively low-dimensional input spaces ⇒ very high-dimensional feature spaces.
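The identity on this slide can be verified numerically; a minimal sketch with random vectors (the helper name `phi` is mine):

```python
import numpy as np

def phi(v):
    # Feature map from the slide: (v1, v2) -> (v1^2, sqrt(2)*v1*v2, v2^2)
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

rng = np.random.default_rng(42)
x, x_hat = rng.normal(size=2), rng.normal(size=2)

lhs = phi(x) @ phi(x_hat)   # scalar product computed in feature space R^3
rhs = (x @ x_hat) ** 2      # kernel k(x, x_hat) computed in input space R^2
assert np.isclose(lhs, rhs)
```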

17. Kernel "Trick": Common Kernels
Polynomial:           k(x, x̂) = (⟨x, x̂⟩ + c)^d
Sigmoid:              k(x, x̂) = tanh(κ⟨x, x̂⟩ + θ)
RBF:                  k(x, x̂) = exp(−‖x − x̂‖² / (2σ²))
Convex combinations:  k(x, x̂) = β₁k₁(x, x̂) + β₂k₂(x, x̂)
Notes: these kernels are good for real-valued examples; kernels may be combined in the case of heterogeneous data.
[Vapnik, 1995, Müller et al., 2001, Schölkopf and Smola, 2002]
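The four kernels listed translate directly into code; a hedged sketch (the function names and parameter defaults are mine, not from the slides):

```python
import numpy as np

def poly_kernel(x, x_hat, c=1.0, d=2):
    # Polynomial kernel: (<x, x_hat> + c)^d
    return (x @ x_hat + c) ** d

def sigmoid_kernel(x, x_hat, kappa=0.5, theta=0.0):
    # Sigmoid kernel: tanh(kappa * <x, x_hat> + theta)
    return np.tanh(kappa * (x @ x_hat) + theta)

def rbf_kernel(x, x_hat, sigma=1.0):
    # RBF kernel: exp(-||x - x_hat||^2 / (2 * sigma^2))
    return np.exp(-np.sum((x - x_hat) ** 2) / (2 * sigma ** 2))

def combined_kernel(x, x_hat, beta1=0.7, beta2=0.3):
    # Convex combination of two kernels -- one simple way to fuse
    # heterogeneous representations into a single kernel.
    return beta1 * rbf_kernel(x, x_hat) + beta2 * poly_kernel(x, x_hat)

x = np.array([1.0, 0.0])
assert np.isclose(rbf_kernel(x, x), 1.0)  # RBF of a point with itself is 1
```

Because a convex combination of valid (positive semi-definite) kernels is again a valid kernel, this combination can be plugged into any SVM solver unchanged.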
