Data Mining in Bioinformatics
Day 5: Classification in Bioinformatics
Karsten Borgwardt
February 6 to February 17, 2012
Machine Learning & Computational Biology Research Group, MPIs Tübingen
Protein function prediction via graph kernels
Karsten M. Borgwardt, ISMB 2005
Joint work with Cheng Soon Ong, S.V.N. Vishwanathan, Stefan Schönauer, Hans-Peter Kriegel and Alex Smola
Ludwig-Maximilians-Universität Munich, Germany and National ICT Australia, Canberra
Content
Introduction
• The problem: protein function prediction
• The method: Support Vector Machines (SVM)
Our approach to function prediction
• Protein graph model
• Protein graph kernel
• Experimental evaluation
Technique to analyze our graph model
• Hyperkernels
Discussion
Current approaches to protein function prediction
[Figure: similar function is inferred from similar sequences, similar structures, similar motifs, similar phylogenetic profiles, similar chemical properties, similar interaction partners and similar surface clefts]
Support Vector Machines
Are new data points (x) red or black? The blue decision boundary makes it possible to predict the class membership of new data points.
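As a minimal illustration (not part of the original slides), the sketch below fits an SVM on two labelled point clouds and classifies new points; the toy data, class means and parameters are invented for the example.

```python
# Toy illustration: fit an SVM on labelled points and classify new ones.
# The data is synthetic; it only mirrors the red/black picture on the slide.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_red = rng.normal(loc=[-2.0, 0.0], scale=0.7, size=(20, 2))
X_black = rng.normal(loc=[2.0, 0.0], scale=0.7, size=(20, 2))
X = np.vstack([X_red, X_black])
y = np.array([0] * 20 + [1] * 20)            # 0 = red, 1 = black

clf = SVC(kernel="linear", C=1.0)            # linear decision boundary
clf.fit(X, y)

X_new = np.array([[0.5, 0.2], [-1.5, -0.3]]) # the "x" points from the slide
print(clf.predict(X_new))                    # predicted class labels
```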
Kernel trick
[Figure: a mapping Φ from input space to feature space, induced by a kernel function]
The kernel trick makes it possible to find a separating hyperplane in feature space without ever computing the mapping Φ explicitly.
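A hedged sketch of the kernel trick in code: the classifier only ever sees pairwise similarities k(x, x'), i.e. a precomputed kernel matrix, never the explicit feature map Φ. The RBF kernel, gamma value and synthetic data are illustrative choices, not taken from the talk.

```python
# Kernel trick: train an SVM from a precomputed kernel matrix only.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)
X_train = rng.normal(size=(40, 5))
y_train = (X_train[:, 0] * X_train[:, 1] > 0).astype(int)  # non-linear labels
X_test = rng.normal(size=(10, 5))

K_train = rbf_kernel(X_train, X_train, gamma=0.5)  # k(x_i, x_j)
K_test = rbf_kernel(X_test, X_train, gamma=0.5)    # k(x_new, x_i)

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(K_train, y_train)
print(clf.predict(K_test))  # hyperplane in feature space, evaluated via k only
```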
Feature vectors for function prediction
Derived from protein structure and/or protein sequence, e.g. Cai et al. (2004), Dobson and Doig (2003):
• hydrophobicity
• polarity
• polarizability
• van der Waals volume
• fraction of amino acid types
• fraction of surface area
• disulphide bonds
• size of largest surface pocket
Our approach
Sequence + Structure + Chemical properties ⇒ Graph model
SVMs + Graph models ⇒ Protein function
Protein graph model
[Figure: from protein sequence and secondary structure to the protein structure graph]
Protein graph model
Node attributes:
• hydrophobicity
• polarity
• polarizability
• van der Waals volume
• length
• type (helix, sheet, loop)
Edge attributes:
• type (sequence, structure)
• length
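A hedged sketch of how such an attributed protein graph could be represented, here with networkx; the attribute names, example values and node labels are illustrative and not the authors' actual data structures.

```python
# Illustrative protein graph: nodes are secondary structure elements (SSEs),
# edges connect SSEs that are sequence neighbours or spatially close in 3D.
import networkx as nx

G = nx.Graph()
# Node attributes: chemical properties, length and SSE type (helix/sheet/loop).
G.add_node("H1", hydrophobicity=0.42, polarity=0.31, polarizability=0.18,
           vdw_volume=0.55, length=12, sse_type="helix")
G.add_node("S1", hydrophobicity=0.61, polarity=0.22, polarizability=0.25,
           vdw_volume=0.48, length=5, sse_type="sheet")
G.add_node("S2", hydrophobicity=0.58, polarity=0.27, polarizability=0.23,
           vdw_volume=0.50, length=6, sse_type="sheet")

# Edge attributes: relation type (sequence vs. structure) and length/distance.
G.add_edge("H1", "S1", type="sequence", length=3)    # neighbours along the chain
G.add_edge("S1", "S2", type="sequence", length=1)
G.add_edge("H1", "S2", type="structure", length=8.5) # spatially close in 3D

print(G.nodes(data=True))
print(G.edges(data=True))
```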
Protein graph kernel (Kashima et al. (2003) and Gärtner et al. (2003))
Compares walks of identical length l:

k_{\mathrm{walk}}\big((v_1,\dots,v_l),(w_1,\dots,w_l)\big) = \sum_{i=1}^{l-1} k_{\mathrm{step}}\big((v_i, v_{i+1}), (w_i, w_{i+1})\big)

Walks are similar if along both walks
• types of secondary structure elements (SSEs) are the same
• distances between SSEs are similar
• chemical properties of SSEs are similar
Example: Protein kernel
[Figure: walks through protein A and protein B]
Similar walks: (H,10,S,1,S,3,H) in protein A and (H,9,S,1,S,3,H) in protein B
Example: Protein kernel
[Figure: walks through protein A and protein B]
Dissimilar walks: (H,10,S,1,S) and (S,3,H,5,S)
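A hedged sketch of the walk comparison: k_walk sums a step kernel over consecutive SSE-edge steps of two equal-length walks. The step kernel used here (SSE-type match times a Gaussian on the edge lengths) is a simplified stand-in for the authors' actual step kernel, and the walk encoding follows the slide examples.

```python
# Walks are encoded as alternating SSE types and edge distances, e.g.
# ("H", 10, "S", 1, "S", 3, "H") as in the slide example.
import math

def k_step(type_a, dist_a, type_b, dist_b, sigma=2.0):
    """Similarity of one walk step: SSE types must match, distances are
    compared with a Gaussian; a simplified stand-in for the real step kernel."""
    if type_a != type_b:
        return 0.0
    return math.exp(-(dist_a - dist_b) ** 2 / (2.0 * sigma ** 2))

def k_walk(walk_a, walk_b):
    """Sum the step kernel over consecutive (SSE, edge) steps of two
    equal-length walks, as in the formula on the previous slide."""
    assert len(walk_a) == len(walk_b)
    total = 0.0
    for i in range(0, len(walk_a) - 2, 2):   # SSEs at even, distances at odd indices
        total += k_step(walk_a[i], walk_a[i + 1], walk_b[i], walk_b[i + 1])
    return total

similar = k_walk(("H", 10, "S", 1, "S", 3, "H"), ("H", 9, "S", 1, "S", 3, "H"))
dissimilar = k_walk(("H", 10, "S", 1, "S"), ("S", 3, "H", 5, "S"))
print(similar, dissimilar)   # high vs. zero similarity
```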
Evaluation: enzymes vs. non-enzymes
10-fold cross-validation on 1128 proteins from the dataset by Dobson and Doig (2003); 59% are enzymes.

Kernel type                      Accuracy (%)   SD
Vector kernel                    76.86          1.23
Optimized vector kernel          80.17          1.24
Graph kernel                     77.30          1.20
Graph kernel without structure   72.33          5.32
Graph kernel with global info    84.04          3.33
DALI classifier                  75.07          4.58
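A hedged sketch of the 10-fold cross-validation protocol with a precomputed (graph) kernel matrix. The kernel matrix and labels below are random placeholders; the part that matters is the row/column slicing of the precomputed kernel for each fold.

```python
# 10-fold cross-validation with a precomputed kernel matrix K (n x n) and
# binary enzyme/non-enzyme labels y. K and y here are placeholders.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

rng = np.random.default_rng(2)
n = 100
A = rng.normal(size=(n, 20))
K = A @ A.T                          # placeholder positive semidefinite kernel
y = np.repeat([0, 1], n // 2)        # placeholder labels

accs = []
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for train, test in cv.split(np.zeros((n, 1)), y):
    clf = SVC(kernel="precomputed", C=1.0)
    clf.fit(K[np.ix_(train, train)], y[train])        # train-vs-train similarities
    accs.append(clf.score(K[np.ix_(test, train)], y[test]))  # test-vs-train

print(f"accuracy: {np.mean(accs):.2%} +/- {np.std(accs):.2%}")
```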
Attribute selection
Which structural or chemical attribute is most important for correct classification? For this purpose, we employ hyperkernels (Ong et al., 2003). Hyperkernels find an optimal linear combination of input kernel matrices,

\sum_{i=1}^{m} \beta_i K_i,

minimizing training error and fulfilling regularization constraints.
Attribute selection
Our approach (see the sketch below):
• Calculate the kernel matrix for 600 proteins on a graph model with only a single attribute
• Repeat this for all attributes
• Normalize these kernel matrices
• Determine the hyperkernel combination
• The weights then reflect the contribution of individual attributes to correct classification
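A hedged sketch of the kernel-combination step: each per-attribute kernel matrix is normalized and then mixed as Σ β_i K_i. The weights β are fixed by hand here; the actual hyperkernel optimization (Ong et al.) learns them jointly under regularization constraints, which is beyond this sketch, and the kernel matrices are random placeholders.

```python
# Combine per-attribute kernel matrices K_i into sum_i beta_i * K_i.
# Normalization maps each K to unit diagonal so attributes are comparable.
import numpy as np

def normalize_kernel(K):
    """Cosine-style normalization: K_ij / sqrt(K_ii * K_jj)."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

rng = np.random.default_rng(3)
n, n_attributes = 50, 10
kernels = []
for _ in range(n_attributes):                 # one kernel per single-attribute graph model
    A = rng.normal(size=(n, 5))
    kernels.append(normalize_kernel(A @ A.T)) # placeholder PSD matrices

beta = np.full(n_attributes, 1.0 / n_attributes)  # hand-set weights; the hyperkernel
                                                  # approach would learn these instead
K_combined = sum(b * K for b, K in zip(beta, kernels))
print(K_combined.shape)
```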
Attribute selection

Attribute              EC 1   EC 2   EC 3   EC 4   EC 5   EC 6
Amino acid length      1.00   0.31   1.00   1.00   0.73   0.00
3-bin van der Waals    0.00   0.00   0.00   0.00   0.00   0.00
3-bin Hydrophobicity   0.00   0.00   0.00   0.00   0.00   0.00
3-bin Polarity         0.00   0.01   0.00   0.00   0.00   1.00
3-bin Polarizability   0.00   0.00   0.00   0.00   0.12   0.00
3D length              0.00   0.40   0.00   0.00   0.00   0.00
Total van der Waals    0.00   0.00   0.00   0.00   0.00   0.00
Total Hydrophobicity   0.00   0.13   0.00   0.00   0.01   0.00
Total Polarity         0.00   0.14   0.00   0.00   0.01   0.00
Total Polarizability   0.00   0.01   0.00   0.00   0.13   0.00
Discussion
• Novel combined approach to protein function prediction integrating sequence, structure and chemical information
• Reaches state-of-the-art classification accuracy with less information, and higher accuracy with the same amount of information
• Hyperkernels for finding the most interesting protein characteristics
Discussion
• More detailed graph models (amino acids, atoms) might be more interesting, yet raise computational difficulties (graphs too large!)
Two directions of future research:
• Efficient, yet expressive graph kernels for structure
• Integrating more proteomic information, e.g. surface pockets, into our graph model
The End. Thank you! Questions?
ARTS: Accurate Recognition of Transcription Starts in human
Sören Sonnenburg†, Alexander Zien∗,♮, Gunnar Rätsch♮
† Fraunhofer FIRST.IDA, Kekuléstr. 7, 12489 Berlin, Germany
♮ Friedrich Miescher Laboratory of the Max Planck Society
∗ Max Planck Institute for Biological Cybernetics, Spemannstr. 37-39, 72076 Tübingen, Germany
Soeren.Sonnenburg@first.fraunhofer.de, {Alexander.Zien,Gunnar.Raetsch}@tuebingen.mpg.de
Promoter Detection
Overview:
• Transcription Start Site (TSS)
• Features to describe the TSS
• Our approach
• Evaluation with current methods
• Example: Protocadherin-α
• Summary
Promoter Detection
Transcription Start Site: properties
• POL II binds to a rather vague region of ≈ [−20, +20] bp
• Upstream of the TSS: promoter containing transcription factor binding sites
• Downstream of the TSS: 5' UTR, and further downstream coding regions and introns (different statistics)
• The 3D structure of the promoter must allow the transcription factors to bind
⇒ Promoter prediction is non-trivial
Promoter Detection
Features to describe the TSS:
• TFBS in the promoter region
• Condition: the DNA should not be too twisted
• CpG islands (often over the TSS/first exon; in most, but not all, promoters)
• TSS with TATA box (≈ 30 bp upstream)
• Exon content in the 5' UTR region
• Distance to the first donor splice site
Idea: Combine weak features to build a strong promoter predictor (one of these features is sketched in code below)
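As a hedged illustration of one such feature, the sketch below computes the GC content and the CpG observed/expected ratio for a window around a candidate TSS. The window, sequence and the rule-of-thumb thresholds are illustrative; this is not ARTS's actual feature extraction.

```python
# Illustrative CpG-island style statistics for a window around a candidate TSS.
def cpg_stats(seq):
    """Return GC content and CpG observed/expected ratio of a DNA window."""
    seq = seq.upper()
    n = len(seq)
    g, c = seq.count("G"), seq.count("C")
    cpg = seq.count("CG")
    gc_content = (g + c) / n
    expected = (c * g) / n if c and g else 0.0   # expected CpG count under independence
    obs_exp = cpg / expected if expected else 0.0
    return gc_content, obs_exp

window = "CGCGGGCTAACGTAGCGCGCCGGGTATAAAAGCGCGCG"   # made-up sequence
gc, oe = cpg_stats(window)
# A common (illustrative) CpG-island rule of thumb: GC > 0.5 and obs/exp > 0.6
print(f"GC={gc:.2f}, CpG obs/exp={oe:.2f}")
```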
Promoter Detection
The ARTS Approach: use an SVM classifier

f(x) = \mathrm{sign}\Big( \sum_{i=1}^{N_s} y_i \alpha_i \, k(x, x_i) + b \Big)

• The key ingredient is the kernel k(x, x'), the similarity of two sequences
• Use 5 sub-kernels suited to model the aforementioned features:

k(x, x') = k_{\mathrm{TSS}}(x, x') + k_{\mathrm{CpG}}(x, x') + k_{\mathrm{coding}}(x, x') + k_{\mathrm{energy}}(x, x') + k_{\mathrm{twist}}(x, x')
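A hedged sketch of this construction: the five sub-kernels enter as a plain sum of precomputed kernel matrices, and the SVM of the decision function above is trained on that sum. The sub-kernel matrices and labels here are random placeholders, not real sequence kernels.

```python
# f(x) = sign( sum_i y_i alpha_i k(x, x_i) + b ) with k = sum of sub-kernels.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
n = 80

def placeholder_kernel():
    A = rng.normal(size=(n, 10))
    return A @ A.T                       # stand-in for k_TSS, k_CpG, k_coding, ...

sub_kernels = [placeholder_kernel() for _ in range(5)]
K = sum(sub_kernels)                     # k(x, x') = sum of the 5 sub-kernels
y = np.repeat([0, 1], n // 2)            # placeholder TSS vs. non-TSS labels

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(K, y)
# clf.dual_coef_ holds y_i * alpha_i for the support vectors, clf.intercept_ is b
print(clf.dual_coef_.shape, clf.intercept_)
```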
Promoter Detection
The 5 sub-kernels:
1. TSS signal (including parts of the core promoter with TATA box): use the Weighted Degree Shift kernel
2. CpG islands, distant enhancers and TFBS upstream of the TSS: use the Spectrum kernel (large window upstream of the TSS)
3. Model coding sequence and TFBS downstream of the TSS: use another Spectrum kernel (small window downstream of the TSS)
4. Stacking energy of the DNA: use the btwist energy of dinucleotides with a Linear kernel
5. Twistedness of the DNA: use the btwist angle of dinucleotides with a Linear kernel
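A hedged sketch of a spectrum kernel like the ones used for sub-kernels 2 and 3: count the k-mers in each window and take the dot product of the count vectors. The value of k, the window handling and the example sequences are illustrative.

```python
# Minimal k-mer spectrum kernel: k(x, x') = <kmer_counts(x), kmer_counts(x')>.
from collections import Counter

def kmer_counts(seq, k=4):
    """Count all overlapping k-mers of a DNA window."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(seq_a, seq_b, k=4):
    """Dot product of the two k-mer count vectors."""
    ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return sum(ca[m] * cb[m] for m in ca if m in cb)

upstream_a = "GGGCGGGGCTATAAAAGGGCGGGG"   # made-up promoter-like windows
upstream_b = "GGGCGGAGCTATATAAGGGCGGCG"
print(spectrum_kernel(upstream_a, upstream_b, k=4))
```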
Promoter Detection
Weighted Degree Shift Kernel
[Figure: matching substrings between x_1 and x_2 contribute k(x_1, x_2) = w_{6,3} + w_{6,-3} + w_{3,4}]
• Count matching substrings of length 1 ... d
• Weight according to the length of the match: β_1 ... β_d
• Position dependent, but tolerates "shifts" of up to S

k(x, x') = \sum_{k=1}^{d} \beta_k \sum_{l=1}^{L-k+1} \sum_{\substack{s=0 \\ s+l \le L}}^{S} \delta_s \big( I(x[k:l+s] = x'[k:l]) + I(x[k:l] = x'[k:l+s]) \big)

x[k:l] := subsequence of x of length k starting at position l
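A hedged, direct transcription of the formula above into code. The slide does not fix δ_s or β_k, so δ_s = 1/(2(s+1)) (a common shift weighting) and uniform β_k are assumptions; positions are 0-based in the code.

```python
# Weighted Degree Shift kernel: count matching substrings of lengths 1..d,
# weighted by beta_k, at positions shifted against each other by up to S.
def wd_shift_kernel(x, xp, d=6, S=3):
    """WD shift kernel between two equal-length strings x and xp."""
    assert len(x) == len(xp)
    L = len(x)
    beta = [1.0 / d] * d                     # assumed uniform length weights beta_k

    def delta(s):                            # assumed shift weighting delta_s
        return 1.0 / (2 * (s + 1))

    total = 0.0
    for k in range(1, d + 1):                # substring length
        for l in range(L - k + 1):           # 0-based start position
            for s in range(S + 1):
                if l + s + k > L:            # shifted substring must fit
                    break
                match_fwd = x[l + s:l + s + k] == xp[l:l + k]
                match_bwd = x[l:l + k] == xp[l + s:l + s + k]
                total += beta[k - 1] * delta(s) * (match_fwd + match_bwd)
    return total

print(wd_shift_kernel("ACGTACGTAC", "ACGAACGTAC"))
```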