
Molecular diagnosis, part II. Florian Markowetz (florian.markowetz@molgen.mpg.de), Max Planck Institute for Molecular Genetics


  1. Molecular diagnosis, part II. Florian Markowetz, florian.markowetz@molgen.mpg.de, Max Planck Institute for Molecular Genetics, Computational Diagnostics Group, Berlin, Germany. IPM workshop, Tehran, April 2005.

  2. Supervised learning. In the first part, I introduced molecular diagnosis as a problem of classification in high dimensions. From given patient expression profiles and labels, we derive a classifier to predict future patients. The labels give us a structure in the data; our task is to extract and generalize that structure. This is a problem of supervised learning. It is different from unsupervised learning, where we have to find a structure in the data by ourselves: clustering, class discovery.
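
To make the supervised setting concrete, here is a minimal sketch in Python with scikit-learn (not part of the original slides); the expression matrix, labels, and dimensions are invented toy values:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 100))       # 20 patients x 100 genes (toy values)
y_train = np.array([+1] * 10 + [-1] * 10)  # known class labels for each patient

clf = SVC(kernel="linear")                 # learn the labeled structure
clf.fit(X_train, y_train)

x_new = rng.normal(size=(1, 100))          # expression profile of a future patient
print(clf.predict(x_new))                  # predicted class label
```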

  3. What's to come. This part will deal with: 1. Support vector machines → maximal margin hyperplanes, non-linear similarity measures. 2. Model selection and assessment → traps and pitfalls, or: how to cheat. 3. Interpretation of results → what do classifiers teach us about biology?

  4. Support Vector Machines

  5. Which hyperplane is the best? [Figure: four candidate separating hyperplanes, labeled A, B, C, D]

  6. No sharp knife, but a fat plane. [Figure: a thick separating band, the "fat plane", between samples with positive and negative labels]

  7. Separate the training set with maximal margin. A hyperplane is the set of points $x$ satisfying $\langle w, x \rangle + b = 0$, corresponding to a decision function $c(x) = \mathrm{sign}(\langle w, x \rangle + b)$. There exists a unique maximal margin hyperplane solving $\max_{w,b} \; \min \{ \|x - x^{(i)}\| : x \in \mathbb{R}^p,\; \langle w, x \rangle + b = 0,\; i = 1, \dots, N \}$. [Figure: separating hyperplane with margin, samples with positive and negative labels on either side]
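
As a small illustration, the decision function takes only a few lines of Python; the weight vector and offset below are made-up numbers, not parameters fitted to any data:

```python
import numpy as np

def decision(x, w, b):
    """Decision function c(x) = sign(<w, x> + b) of a hyperplane (w, b)."""
    return np.sign(np.dot(w, x) + b)

w = np.array([1.0, -2.0])                      # illustrative, not fitted
b = 0.5
print(decision(np.array([3.0, 1.0]), w, b))    # 1.0: the positive side
```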

  8. Hard margin SVM. First we scale $(w, b)$ with respect to $x^{(1)}, \dots, x^{(N)}$ such that $\min_i |\langle w, x^{(i)} \rangle + b| = 1$. The points closest to the hyperplane now have a distance of $1 / \|w\|$.

  9. Hard margin SVM (cont'd). Then the maximal margin hyperplane is the solution of the primal optimization problem: minimize over $w, b$ the objective $\frac{1}{2} \|w\|^2$, subject to $y_i (\langle x^{(i)}, w \rangle + b) \ge 1$ for all $i = 1, \dots, N$.
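
In practice the hard margin solution can be approximated with an off-the-shelf solver by making the error cost very large. This sketch uses scikit-learn on an assumed separable toy set, not the slides' data:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 2.5], [0.0, -1.0], [-1.0, -2.0]])  # toy, separable
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)    # very large C ~ hard margin
w = clf.coef_[0]
margin = 1.0 / np.linalg.norm(w)               # closest points lie at 1/||w||
print(w, clf.intercept_[0], margin)
```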

  10. The Lagrangian. To solve the problem, introduce the Lagrangian $L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^N \alpha_i \, (y_i (\langle x^{(i)}, w \rangle + b) - 1)$. It must be maximized w.r.t. $\alpha$ and minimized w.r.t. $w$ and $b$, i.e. a saddle point has to be found.

  11. The Lagrangian (cont'd). KKT conditions: $\alpha_i \, (y_i (\langle x^{(i)}, w \rangle + b) - 1) = 0$ for all $i$.

  12. The Lagrangian cont'd. Derivatives w.r.t. the primal variables must vanish: $\frac{\partial}{\partial b} L(w, b, \alpha) = 0$ and $\frac{\partial}{\partial w} L(w, b, \alpha) = 0$, which leads to $\sum_i \alpha_i y_i = 0$ and $w = \sum_i \alpha_i y_i x^{(i)}$.

  13. The dual optimization problem. Substituting the conditions for the extremum into the Lagrangian, we arrive at the dual optimization problem: maximize over $\alpha$ the objective $\sum_{i=1}^N \alpha_i - \frac{1}{2} \sum_{i,j=1}^N \alpha_i \alpha_j y_i y_j \langle x^{(i)}, x^{(j)} \rangle$, subject to $\alpha_i \ge 0$ and $\sum_{i=1}^N \alpha_i y_i = 0$.
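
For illustration, a problem this small can be handed to a generic constrained optimizer. The sketch below uses scipy on the same assumed toy data as before; real SVM packages use specialized QP solvers instead:

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0], [2.0, 2.5], [0.0, -1.0], [-1.0, -2.0]])  # toy data
y = np.array([1.0, 1.0, -1.0, -1.0])
N = len(y)
K = X @ X.T                        # Gram matrix of inner products <x_i, x_j>

def neg_dual(alpha):               # minimize the negative dual objective
    return 0.5 * (alpha * y) @ K @ (alpha * y) - alpha.sum()

res = minimize(neg_dual, np.zeros(N), method="SLSQP",
               bounds=[(0, None)] * N,                # alpha_i >= 0
               constraints={"type": "eq",
                            "fun": lambda a: a @ y})  # sum_i alpha_i y_i = 0
print(np.round(res.x, 4))          # optimal dual variables alpha
```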

  14. What are Support Vectors? By the KKT conditions, the points with $\alpha_i > 0$ satisfy $y_i (\langle x^{(i)}, w \rangle + b) = 1$. These points, nearest to the separating hyperplane, are called Support Vectors. The expansion of $w$ depends only on them. [Figure: separating hyperplane with margin; the support vectors lie on the margin boundaries]
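
A sketch of the support vector expansion with scikit-learn, again on the assumed toy set; after fitting, only the points with $\alpha_i > 0$ are retained:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 2.5], [0.0, -1.0], [-1.0, -2.0]])
y = np.array([1, 1, -1, -1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print(clf.support_)                           # indices of the support vectors
print(clf.dual_coef_)                         # products alpha_i * y_i for them
w = clf.dual_coef_ @ clf.support_vectors_     # w = sum_i alpha_i y_i x^(i)
print(w, clf.coef_)                           # the two agree: w depends only on SVs
```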

  15. Maximal margin hyperplanes. Capacity decreases with increasing margin! Consider hyperplanes $\langle w, x \rangle = 0$, where $w$ is normalized such that $\min_i |\langle w, x_i \rangle| = 1$ for $X = \{x_1, \dots, x_N\}$. The set of decision functions $f_w = \mathrm{sign}(\langle w, x \rangle)$ defined on $X$ and satisfying $\|w\| \le \Lambda$ has a VC dimension $h$ satisfying $h \le R^2 \Lambda^2$. Here, $R$ is the radius of the smallest sphere centered at the origin and containing the training data [8].
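
A numerical illustration of the bound $h \le R^2 \Lambda^2$; both the toy data and the norm bound $\Lambda$ are assumptions made for this sketch:

```python
import numpy as np

X = np.array([[1.0, 1.0], [2.0, 2.5], [0.0, -1.0], [-1.0, -2.0]])
R = np.linalg.norm(X, axis=1).max()   # radius of the smallest origin-centered
                                      # sphere containing the training data
Lam = 2.0                             # assumed bound: ||w|| <= Lambda
print(R**2 * Lam**2)                  # upper bound on the VC dimension h
```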

  16. Maximal margin hyperplanes. With margin $\gamma_1$ we can separate three points, with margin $\gamma_2$ only two. [Figure: two margins of different width around the same points]

  17. Non-separable training sets. Use linear separation, but admit training errors and margin violations. Penalty of error: distance to the hyperplane multiplied by error cost $C$. [Figure: separating hyperplane with misclassified points inside the margin]

  18. Soft margin primal problem. We relax the separation constraints to $y_i (\langle x^{(i)}, w \rangle + b) \ge 1 - \xi_i$ and minimize over $w$ and $b$ the objective function $\frac{1}{2} \|w\|^2 + C \sum_{i=1}^N \xi_i$. Writing down the Lagrangian, computing derivatives w.r.t. the primal variables, substituting them back into the objective function ...
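
A sketch of how the error cost $C$ trades margin width against violations, using scikit-learn on assumed overlapping toy data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(1.5, 1, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)     # overlapping classes: not separable

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 1.0 / np.linalg.norm(clf.coef_[0])
    print(f"C={C:>6}: margin={margin:.3f}, "
          f"support vectors={len(clf.support_)}")
```

Smaller $C$ tolerates more margin violations $\xi_i$ and yields a wider margin; larger $C$ penalizes violations harder and shrinks it.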
