1 Support vector machine prediction of signal peptide cleavage site using a new class of kernels for strings Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan
2 Outline 1. SVM and kernel methods 2. New kernels for bioinformatics 3. Example: signal peptide cleavage site prediction
3 Part 1 SVM and kernel methods
4 Support vector machines
• Objects x to be classified are mapped to a feature space by Φ
• Largest-margin separating hyperplane in the feature space
5 The kernel trick
• Implicit definition of x → Φ(x) through the kernel: K(x, y) := ⟨Φ(x), Φ(y)⟩
• Simple kernels can represent complex Φ
• For a given kernel, not only SVM but also clustering, PCA, ICA... possible in the feature space = kernel methods
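To make the trick concrete, here is a minimal sketch (Python with NumPy; the function names are illustrative, not from the talk) for the homogeneous degree-2 polynomial kernel on R^2: the kernel value equals an inner product in a 3-dimensional feature space, but is computed without ever building Φ(x).

```python
import numpy as np

def phi(x):
    """Explicit feature map for the homogeneous degree-2 polynomial kernel on R^2."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2) * x1 * x2, x2 * x2])

def k_poly(x, y):
    """Kernel trick: the same inner product, computed without building phi."""
    return np.dot(x, y) ** 2

x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])
assert np.isclose(np.dot(phi(x), phi(y)), k_poly(x, y))  # <phi(x), phi(y)> == (x·y)^2
```

The same identity is what lets kernel methods operate implicitly in feature spaces that are huge or infinite-dimensional.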
6 Kernel examples
• “Classical” kernels: polynomial, Gaussian, sigmoid... but the objects x must be vectors
• “Exotic” kernels for strings:
⋆ Fisher kernel (Jaakkola and Haussler 98)
⋆ Convolution kernels (Haussler 99, Watkins 99)
⋆ Kernel for translation initiation site (Zien et al. 00)
⋆ String kernel (Lodhi et al. 00)
7 Kernel engineering
Use prior knowledge to build the geometry of the feature space through K(·, ·)
8 Part 2 New kernels for bioinformatics
9 The problem
• X a set of objects
• p(x) a probability distribution on X
• How to build K(x, y) from p(x)?
10 Product kernel
K_prod(x, y) = p(x) p(y)
SVM = Bayesian classifier
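One way to read "SVM = Bayesian classifier" (a short derivation, not spelled out on the slide): with K_prod the feature map is one-dimensional, Φ(x) = p(x), so the SVM decision function collapses to an affine function of p(x):

$$ f(x) = \sum_i \alpha_i y_i\, K_{\mathrm{prod}}(x_i, x) + b = \Big( \sum_i \alpha_i y_i\, p(x_i) \Big)\, p(x) + b, $$

so the learned classifier can only threshold the value p(x), like a plug-in Bayesian decision rule.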
11 Diagonal kernel
K_diag(x, y) = p(x) δ(x, y)
No learning
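A minimal sketch of the two building blocks (Python; p is assumed to be a callable returning the probability of an object, and the function names are illustrative):

```python
def k_prod(x, y, p):
    """Product kernel: K_prod(x, y) = p(x) * p(y)."""
    return p(x) * p(y)

def k_diag(x, y, p):
    """Diagonal kernel: K_diag(x, y) = p(x) if x == y, else 0."""
    return p(x) if x == y else 0.0

# "No learning": for a test point x outside the training set, K_diag(x_i, x) = 0
# for every training point x_i, so the SVM decision function is constant there.
```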
12 Interpolated kernel
If objects are composite, x = (x_1, x_2):
K(x, y) = K_diag(x_1, y_1) K_prod(x_2, y_2) = p(x_1) δ(x_1, y_1) × p(x_2 | x_1) p(y_2 | y_1)
[Figure: example over two-letter strings AA, AB, BA, BB grouped by first letter as A*, B*]
13 General interpolated kernel
• Composite objects x = (x_1, ..., x_n)
• A list of index subsets: V = {I_1, ..., I_v}, where I_i ⊂ {1, ..., n}
• Interpolated kernel: $K_V(x, y) = \frac{1}{|V|} \sum_{I \in V} K_{\mathrm{diag}}(x_I, y_I)\, K_{\mathrm{prod}}(x_{I^c}, y_{I^c})$
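A direct O(|V|) sketch of this definition (Python; p_subset is a hypothetical helper returning the marginal probability of the sub-tuple indexed by I, following the literal reading of K_prod on the complement):

```python
def k_interp(x, y, V, p_subset):
    """Interpolated kernel: average over index subsets I in V of
    K_diag(x_I, y_I) * K_prod(x_{I^c}, y_{I^c})."""
    n = len(x)
    total = 0.0
    for I in V:
        Ic = [i for i in range(n) if i not in I]
        x_I, y_I = tuple(x[i] for i in I), tuple(y[i] for i in I)
        kd = p_subset(x, I) if x_I == y_I else 0.0      # K_diag(x_I, y_I)
        kp = p_subset(x, Ic) * p_subset(y, Ic)          # K_prod on the complement
        total += kd * kp
    return total / len(V)
```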
14 Rare common subparts
For a given p(x) and p(y), we have:
$K_V(x, y) = K_{\mathrm{prod}}(x, y) \times \frac{1}{|V|} \sum_{I \in V} \frac{\delta(x_I, y_I)}{p(x_I)}$
x and y get closer in the feature space when they share rare common subparts
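Why the rewriting holds (a short derivation, not on the slide, using the convention of slide 12 that the non-fixed part enters through its conditional probability given the fixed part):

$$ K_{\mathrm{diag}}(x_I, y_I)\, p(x_{I^c} \mid x_I)\, p(y_{I^c} \mid y_I) = p(x_I)\, \delta(x_I, y_I)\, \frac{p(x)}{p(x_I)}\, \frac{p(y)}{p(y_I)} = K_{\mathrm{prod}}(x, y)\, \frac{\delta(x_I, y_I)}{p(x_I)}, $$

because δ(x_I, y_I) = 1 forces p(y_I) = p(x_I); averaging over I ∈ V gives the formula above.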
15 Implementation
• Factorization for particular choices of p(·) and V
• Example:
⋆ V = P({1, ..., n}), the set of all subsets: |V| = 2^n
⋆ product distribution $p(x) = \prod_{j=1}^{n} p_j(x_j)$
⋆ implementation in O(n) because $\sum_{I \in V} (\cdots) = \prod_{i=1}^{n} (\cdots)$
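A sketch of the O(n) computation (Python; p_j(j, a) is an assumed per-position model giving the probability of letter a at position j). The sum over all 2^n subsets factorizes into a product over positions; the brute-force version is included only to check the identity on a toy example.

```python
import numpy as np
from itertools import chain, combinations

def k_all_subsets_fast(x, y, p_j):
    """O(n): K(x, y) = p(x) p(y) * (1/2^n) * prod_j (1 + delta(x_j, y_j) / p_j(x_j))."""
    n = len(x)
    px = np.prod([p_j(j, x[j]) for j in range(n)])
    py = np.prod([p_j(j, y[j]) for j in range(n)])
    fac = np.prod([1.0 + (1.0 / p_j(j, x[j]) if x[j] == y[j] else 0.0) for j in range(n)])
    return px * py * fac / 2 ** n

def k_all_subsets_naive(x, y, p_j):
    """Same kernel by brute force over all 2^n subsets (exponential, for checking only)."""
    n = len(x)
    total = 0.0
    for I in chain.from_iterable(combinations(range(n), r) for r in range(n + 1)):
        Ic = [j for j in range(n) if j not in I]
        same = all(x[j] == y[j] for j in I)
        p_xI = np.prod([p_j(j, x[j]) for j in I])      # np.prod([]) == 1.0
        p_xIc = np.prod([p_j(j, x[j]) for j in Ic])
        p_yIc = np.prod([p_j(j, y[j]) for j in Ic])
        total += (p_xI if same else 0.0) * p_xIc * p_yIc
    return total / 2 ** n

p_uniform = lambda j, a: 0.25                # toy per-position distribution over 4 letters
x, y = ("A", "C", "G"), ("A", "T", "G")
assert np.isclose(k_all_subsets_fast(x, y, p_uniform), k_all_subsets_naive(x, y, p_uniform))
```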
16 Part 3 Application: SVM prediction of signal peptide cleavage site
17 Secretory pathway
[Diagram: mRNA → nascent protein with signal peptide → ER → Golgi; labeled compartments and destinations: nucleus, chloroplast, mitochondrion, cell surface (secreted), peroxisome, lysosome, cytosol, plasma membrane]
18 Signal peptides
Cleavage site between positions -1 and +1:
(1) MKANAKTIIAGMIALAISHTAMA EE...
(2) MKQSTIALALLPLLFTPVTKA RT...
(3) MKATKLVLGAVILGSTLLAG CS...
(1): Leucine-binding protein, (2): Pre-alkaline phosphatase, (3): Pre-lipoprotein
• 6-12 hydrophobic residues (in yellow on the slide)
• (-3, -1): small uncharged residues
19 Experiment
• Challenge: classification of amino-acid windows, positive if cleavage occurs between -1 and +1: [x_{-8}, x_{-7}, ..., x_{-1}, x_1, x_2]
• 1,418 positive examples, 65,216 negative examples
• Computation of a weight matrix: SVM + K_prod (naive Bayes) vs SVM + K_interpolated
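A hedged sketch of the setup (Python with scikit-learn; the toy sequences, the window-extraction convention and the uniform per-position model are placeholders, not the data or code of the talk): extract [-8, +2] windows, label them by whether cleavage occurs between -1 and +1, and train an SVM on a precomputed Gram matrix built from the interpolated kernel.

```python
import numpy as np
from sklearn.svm import SVC

def windows(seq, cleavage_pos, left=8, right=2):
    """Yield (window, label): the 10-residue window [-8..+2] around each candidate
    position, labelled +1 only at the annotated cleavage site."""
    for i in range(left, len(seq) - right + 1):
        yield seq[i - left:i + right], 1 if i == cleavage_pos else -1

def k_interp(x, y, p_j):
    """All-subsets interpolated kernel, O(n) factorized form (see previous sketch)."""
    n = len(x)
    px = np.prod([p_j(j, x[j]) for j in range(n)])
    py = np.prod([p_j(j, y[j]) for j in range(n)])
    fac = np.prod([1 + (1 / p_j(j, x[j]) if x[j] == y[j] else 0) for j in range(n)])
    return px * py * fac / 2 ** n

# Toy data standing in for the 1,418 positive / 65,216 negative windows.
data = [("MKANAKTIIAGMIALAISHTAMAEE", 23), ("MKQSTIALALLPLLFTPVTKART", 21)]
X, labels = zip(*[wl for seq, pos in data for wl in windows(seq, pos)])

p_uniform = lambda j, a: 1.0 / 20            # placeholder for the learned weight matrix
gram = np.array([[k_interp(a, b, p_uniform) for b in X] for a in X])
clf = SVC(kernel="precomputed").fit(gram, labels)
```

On the slide, the per-position probabilities p_j would come from the computed weight matrix rather than the uniform placeholder used here, and the same pipeline with K_prod gives the naive-Bayes baseline.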
20 Result: ROC curves
[Figure: ROC curves comparing the Interpolated Kernel and the Product Kernel (Bayes); y-axis: false negatives (%), x-axis: false positives (%)]
21 Conclusion
22 Conclusion
• Another way to derive a kernel from a probability distribution
• Useful when objects can be compared by comparing subparts
• Encouraging results on a real-world application: “how to improve a weight-matrix-based classifier”
• Future work: more application-specific kernels
23 Acknowledgement • Minoru Kanehisa • Applied Biosystems for the travel grant