Molecular diagnosis, part II
Florian Markowetz
florian.markowetz@molgen.mpg.de
Max Planck Institute for Molecular Genetics
Computational Diagnostics Group
Berlin, Germany
IPM workshop, Tehran, April 2005
Supervised learning
In the first part, I introduced molecular diagnosis as a problem of classification in high dimensions. From given patient expression profiles and labels, we derive a classifier to predict future patients. The labels give us a structure in the data. Our task: extract and generalize this structure. This is a problem of supervised learning. It is different from unsupervised learning, where we have to find structure in the data by ourselves: clustering, class discovery.
What's to come
This part will deal with
1. Support vector machines → maximal margin hyperplanes, non-linear similarity measures
2. Model selection and assessment → traps and pitfalls, or: how to cheat
3. Interpretation of results → what do classifiers teach us about biology?
Support Vector Machines
Which hyperplane is the best?
(Figure: four candidate separating hyperplanes, labelled A, B, C and D.)
No sharp knife, but a fat plane
(Figure: a thick separating slab, the "fat plane", between the positively and negatively labelled samples.)
Separate the training set with maximal margin
A hyperplane is the set of points $x$ satisfying
$$\langle w, x \rangle + b = 0,$$
corresponding to the decision function
$$c(x) = \operatorname{sign}(\langle w, x \rangle + b).$$
There exists a unique maximal margin hyperplane, the solution of
$$\max_{w, b}\; \min\{\|x - x^{(i)}\| : x \in \mathbb{R}^p,\ \langle w, x \rangle + b = 0,\ i = 1, \dots, N\}.$$
(Figure: separating hyperplane with margin between positively and negatively labelled samples.)
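To make the decision function concrete, here is a minimal sketch in Python/NumPy; the weight vector w and offset b are hypothetical values for illustration, not fitted from data.

```python
import numpy as np

def decision(x, w, b):
    """Linear decision function c(x) = sign(<w, x> + b)."""
    return np.sign(np.dot(w, x) + b)

# Hypothetical hyperplane parameters, for illustration only.
w = np.array([1.0, -2.0])
b = 0.5
print(decision(np.array([3.0, 1.0]), w, b))   # ->  1.0, positive side
print(decision(np.array([-1.0, 2.0]), w, b))  # -> -1.0, negative side
```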
Hard margin SVM
First we scale $(w, b)$ with respect to $x^{(1)}, \dots, x^{(N)}$ such that
$$\min_i |\langle w, x^{(i)} \rangle + b| = 1.$$
The points closest to the hyperplane now have a distance of $1/\|w\|$.
Then the maximal margin hyperplane is the solution of the primal optimization problem
$$\min_{w, b}\; \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i(\langle x^{(i)}, w \rangle + b) \ge 1 \quad \text{for all } i = 1, \dots, N.$$
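As a sketch of what this optimization produces in practice, the following assumes scikit-learn is available. SVC has no explicit hard-margin mode; a very large error cost C makes the soft-margin solution coincide with the hard-margin one on separable data.

```python
import numpy as np
from sklearn.svm import SVC

# A tiny linearly separable toy set (two points per class).
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # huge C ~ hard margin
print(clf.coef_, clf.intercept_)              # w and b of the hyperplane
print(2.0 / np.linalg.norm(clf.coef_))        # margin width 2/||w||
```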
The Lagrangian
To solve the problem, introduce the Lagrangian
$$L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 - \sum_{i=1}^N \alpha_i \bigl( y_i(\langle x^{(i)}, w \rangle + b) - 1 \bigr).$$
It must be maximized w.r.t. $\alpha$ and minimized w.r.t. $w$ and $b$, i.e. a saddle point has to be found.
KKT conditions: for all $i$,
$$\alpha_i \bigl( y_i(\langle x^{(i)}, w \rangle + b) - 1 \bigr) = 0.$$
The Lagrangian cont'd
Derivatives w.r.t. the primal variables must vanish:
$$\frac{\partial}{\partial b} L(w, b, \alpha) = 0 \quad \text{and} \quad \frac{\partial}{\partial w} L(w, b, \alpha) = 0,$$
which leads to
$$\sum_i \alpha_i y_i = 0 \quad \text{and} \quad w = \sum_i \alpha_i y_i x^{(i)}.$$
The dual optimization problem
Substituting the conditions for the extremum into the Lagrangian, we arrive at the dual optimization problem:
$$\max_{\alpha}\; \sum_{i=1}^N \alpha_i - \frac{1}{2} \sum_{i,j=1}^N \alpha_i \alpha_j y_i y_j \langle x^{(i)}, x^{(j)} \rangle,$$
$$\text{subject to} \quad \alpha_i \ge 0 \quad \text{and} \quad \sum_{i=1}^N \alpha_i y_i = 0.$$
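The dual is a quadratic program in α and can be handed to any QP solver. The sketch below assumes the cvxopt package; its solver minimizes (1/2) αᵀPα + qᵀα subject to Gα ≤ h and Aα = b, so the maximization above is turned into a minimization with q = −1.

```python
import numpy as np
from cvxopt import matrix, solvers

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
N = len(y)

P = matrix(np.outer(y, y) * (X @ X.T))   # P_ij = y_i y_j <x^(i), x^(j)>
q = matrix(-np.ones(N))                  # minimize (1/2) a'Pa - sum_i alpha_i
G = matrix(-np.eye(N))                   # -alpha_i <= 0, i.e. alpha_i >= 0
h = matrix(np.zeros(N))
A = matrix(y.reshape(1, -1))             # sum_i alpha_i y_i = 0
b = matrix(0.0)

alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])
w = (alpha * y) @ X                      # w = sum_i alpha_i y_i x^(i)
print(alpha.round(4), w)
```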
What are Support Vectors?
By the KKT conditions, the points with $\alpha_i > 0$ satisfy
$$y_i(\langle x^{(i)}, w \rangle + b) = 1.$$
These points nearest to the separating hyperplane are called Support Vectors. The expansion of $w$ depends only on them.
(Figure: separating hyperplane with the support vectors lying on the margin.)
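Continuing the scikit-learn sketch from the hard-margin example above (same fitted clf), the support vectors and the expansion of w can be read off the model; the attribute names are scikit-learn's, and dual_coef_ stores the products αᵢyᵢ.

```python
sv = clf.support_vectors_          # the x^(i) with alpha_i > 0
coef = clf.dual_coef_              # the corresponding products alpha_i * y_i

# The weight vector only depends on the support vectors:
w = coef @ sv
print(sv)
print(np.allclose(w, clf.coef_))   # True: same hyperplane either way
```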
Maximal margin hyperplanes
Capacity decreases with increasing margin! Consider hyperplanes $\langle w, x \rangle = 0$, where $w$ is normalized such that $\min_i |\langle w, x_i \rangle| = 1$ for $X = \{x_1, \dots, x_N\}$. The set of decision functions $f_w = \operatorname{sign}(\langle w, x \rangle)$ defined on $X$ and satisfying $\|w\| \le \Lambda$ has a VC dimension $h$ satisfying
$$h \le R^2 \Lambda^2.$$
Here, $R$ is the radius of the smallest sphere centered at the origin and containing the training data [8].
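Purely as an illustration of the bound (and ignoring the offset b, which the bound as stated does not cover), one can take R from the toy training points and let the fitted ‖w‖ play the role of Λ:

```python
R = np.max(np.linalg.norm(X, axis=1))   # radius of the origin-centered sphere containing X
Lam = np.linalg.norm(clf.coef_)         # ||w|| of the fitted hyperplane, standing in for Lambda
print("capacity bound R^2 * Lambda^2 =", R**2 * Lam**2)
```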
Maximal margin hyperplanes
With margin $\gamma_1$ we can separate 3 points, with margin $\gamma_2$ only two.
(Figure: the same point set separated with two different margins.)
Non-separable training sets
Use linear separation, but admit training errors and margin violations. Penalty of an error: its distance to the hyperplane multiplied by the error cost $C$.
(Figure: separating hyperplane with misclassified points and margin violations.)
Soft margin primal problem
We relax the separation constraints to
$$y_i(\langle x^{(i)}, w \rangle + b) \ge 1 - \xi_i$$
and minimize over $w$ and $b$ the objective function
$$\frac{1}{2}\|w\|^2 + C \sum_{i=1}^N \xi_i.$$
Writing down the Lagrangian, computing derivatives w.r.t. the primal variables, substituting them back into the objective function ...
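A minimal soft-margin sketch, again assuming scikit-learn: the same linear SVC, but with a finite error cost C. Smaller C tolerates more margin violations and gives a wider margin; larger C penalizes violations more heavily.

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping Gaussian classes, so the training set is not separable.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_)
    print(f"C={C:g}: margin width {margin:.2f}, {len(clf.support_)} support vectors")
```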