SVMs · Kernels & the “Trick” · Non-vectorial Data · Data Integration · Software · References

Kernel Methods for Fusing Heterogeneous Data

Gunnar Rätsch
Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany

Pre-conference Course, Bio-IT World Europe, Hannover, Germany
October 4, 2010
Roadmap

- Support Vector Machines (SVMs)
- Kernels and the “trick”
- Kernels for non-vectorial data
- Heterogeneous data integration
- Examples
- Software

[Figure: SVM-related publications in PubMed per year, 2000–2010*]

Slides and additional material available at:
http://tinyurl.com/dfbi2010
http://fml.mpg.de/raetsch/lectures/datafusion-bio-it-2010
Margin Maximization
Example: Recognition of Splice Sites

Given: Potential acceptor splice sites (intron/exon boundary)
Goal: A rule that distinguishes true from false sites
Approach: Linear classifiers with large margin
Margin Maximization
SVMs: Maximize the Margin! Why?

- Intuitively, it feels the safest: for a small error in the separating hyperplane, we do not suffer too many mistakes.
- Empirically, it works well.
- Learning theory indicates that it is the right thing to do.

[Figure: AG sites plotted by GC content before and after 'AG', with separating hyperplane w]
Support Vector Machines for Binary Classification
How to Maximize the Margin?

Maximize
  \rho - C \sum_{i=1}^{n} \xi_i        (\rho = margin)
subject to
  y_i \langle w, x_i \rangle \ge \rho - \xi_i,   \xi_i \ge 0   for all i = 1, \ldots, n,   \|w\| = 1.

- Examples on the margin are called support vectors [Vapnik, 1995]
- Soft margin SVMs [Cortes and Vapnik, 1995]
- The hyperplane only depends on distances between examples:
  d(x, x')^2 = \|x - x'\|^2 = \langle x, x \rangle - 2\langle x, x' \rangle + \langle x', x' \rangle,
  i.e. only on scalar products.

[Figure: AG sites by GC content before/after 'AG', with margin \rho and slack \xi illustrated]
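The soft-margin problem above can be sketched numerically. The snippet below is a minimal illustration, not the solver used in the talk: it trains a linear SVM by subgradient descent on the equivalent unnormalized primal min_w 0.5·‖w‖² + C·Σ max(0, 1 − y_i⟨w, x_i⟩), rather than the normalized (ρ, ‖w‖ = 1) form on the slide. The toy data, hyperparameters, and the omission of a bias term (the data is symmetric about the origin) are all illustrative assumptions.

```python
import numpy as np

# Toy 2-D data standing in for "GC content before/after 'AG'" features.
rng = np.random.default_rng(0)
X_pos = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(20, 2))    # true sites
X_neg = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(20, 2))  # false sites
X = np.vstack([X_pos, X_neg])
y = np.array([1.0] * 20 + [-1.0] * 20)

C, lr, epochs = 1.0, 0.01, 500  # illustrative hyperparameters
w = np.zeros(2)
for _ in range(epochs):
    margins = y * (X @ w)
    viol = margins < 1.0  # examples inside the margin (xi_i > 0)
    # subgradient of 0.5*||w||^2 + C * sum of hinge losses
    grad = w - C * (y[viol, None] * X[viol]).sum(axis=0)
    w -= lr * grad

print("training accuracy:", (np.sign(X @ w) == y).mean())
```

On well-separated data like this, all slack variables end up at zero and only the points nearest the hyperplane (the support vectors) determine w.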
Inflating the Feature Space
Recognition of Splice Sites

Given: Potential acceptor splice sites (intron/exon boundary)
Goal: A rule that distinguishes true from false sites

A more realistic problem:
- Not linearly separable!
- Need nonlinear separation?
- Need more features?

[Figure: AG sites by GC content before/after 'AG'; the two classes are not linearly separable]
Inflating the Feature Space
Nonlinear Separations

Linear separation might not be sufficient!
⇒ Map into a higher-dimensional feature space

Example [Schölkopf and Smola, 2002]:
  \Phi : \mathbb{R}^2 \to \mathbb{R}^3,   (x_1, x_2) \mapsto (z_1, z_2, z_3) := (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)

[Figure: data not linearly separable in (x_1, x_2) becomes linearly separable in (z_1, z_2, z_3)]
Kernel “Trick”

Example: x \in \mathbb{R}^2 and \Phi(x) := (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2) [Boser et al., 1992]

  \langle \Phi(x), \Phi(\hat{x}) \rangle
    = \langle (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2), (\hat{x}_1^2, \sqrt{2}\, \hat{x}_1 \hat{x}_2, \hat{x}_2^2) \rangle
    = \langle (x_1, x_2), (\hat{x}_1, \hat{x}_2) \rangle^2
    = \langle x, \hat{x} \rangle^2
    =: k(x, \hat{x})

The scalar product in feature space (here \mathbb{R}^3) can be computed in input space (here \mathbb{R}^2)!
This also works for higher orders and dimensions:
⇒ relatively low-dimensional input spaces
⇒ very high-dimensional feature spaces
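The identity above is easy to check numerically. A small sketch (the test points are arbitrary):

```python
import numpy as np

# Verify <Phi(x), Phi(xhat)> = <x, xhat>^2
# for Phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2).

def phi(x):
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

def k(x, xhat):
    return np.dot(x, xhat) ** 2  # computed entirely in the 2-D input space

x = np.array([1.0, 2.0])
xhat = np.array([3.0, -1.0])

lhs = np.dot(phi(x), phi(xhat))  # scalar product in the 3-D feature space
rhs = k(x, xhat)
print(lhs, rhs)  # both equal (1*3 + 2*(-1))^2 = 1
```

The kernel evaluation touches only the two input coordinates, while the explicit map would need all three (and, for degree-d polynomials on high-dimensional inputs, combinatorially many) feature coordinates.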
Kernel “Trick”
Common Kernels

Polynomial:           k(x, \hat{x}) = (\langle x, \hat{x} \rangle + c)^d
Sigmoid:              k(x, \hat{x}) = \tanh(\kappa \langle x, \hat{x} \rangle + \theta)
RBF:                  k(x, \hat{x}) = \exp(-\|x - \hat{x}\|^2 / (2\sigma^2))
Convex combinations:  k(x, \hat{x}) = \beta_1 k_1(x, \hat{x}) + \beta_2 k_2(x, \hat{x})

Notes:
- These kernels are good for real-valued examples.
- Kernels may be combined in the case of heterogeneous data.

[Vapnik, 1995, Müller et al., 2001, Schölkopf and Smola, 2002]
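These four kernels can be written down directly; the sketch below is a minimal NumPy version, with parameter defaults (c, d, κ, θ, σ, β) chosen purely for illustration:

```python
import numpy as np

def poly_kernel(x, xhat, c=1.0, d=2):
    # polynomial kernel (<x, xhat> + c)^d
    return (np.dot(x, xhat) + c) ** d

def sigmoid_kernel(x, xhat, kappa=0.1, theta=0.0):
    # sigmoid kernel tanh(kappa*<x, xhat> + theta)
    return np.tanh(kappa * np.dot(x, xhat) + theta)

def rbf_kernel(x, xhat, sigma=1.0):
    # RBF kernel exp(-||x - xhat||^2 / (2*sigma^2))
    return np.exp(-np.sum((x - xhat) ** 2) / (2.0 * sigma**2))

def combined_kernel(x, xhat, beta1=0.5, beta2=0.5):
    # convex combination (beta1, beta2 >= 0): still a valid kernel,
    # useful when x bundles heterogeneous feature sets
    return beta1 * poly_kernel(x, xhat) + beta2 * rbf_kernel(x, xhat)

x = np.array([1.0, 0.0])
print(rbf_kernel(x, x))       # 1.0: RBF similarity of a point with itself
print(combined_kernel(x, x))  # 0.5 * (1 + 1)^2 + 0.5 * 1.0 = 2.5
```

In the data-integration setting discussed later, the convex-combination pattern generalizes to one kernel per data source, with the weights β either fixed by hand or learned.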