Kernel methods for Network Analysis: An Introduction


  1. Kernel methods for Network Analysis: An Introduction. Chiranjib Bhattacharyya, Machine Learning Lab, Dept of CSA, IISc. chiru@csa.iisc.ernet.in, http://drona.csa.iisc.ernet.in/~chiru. 13th Jan, 2013.

  2. Computational Biology: Which super-family does this protein structure belong to?

  3. Multimedia: Who are the actors?

  4. Social Networks: How can one run a successful ad campaign on this network?

  5. Data Representation as a vector

  6. Data Representation as a vector

  7. Data Representation as a vector

  8. Data Representation as a vector

  9. Data Representation as a vector (feature map)

  10. When we have feature maps, we can apply linear classifiers and Principal Component Analysis.

  11. Problem: Feature maps are not readily available; similarity may be readily available.

  12. Kernel functions: a formal notion of similarity functions. Kernel functions are essentially similarity functions. One can easily generalize many existing algorithms using kernel functions (this is sometimes called the kernel trick). Kernels can help integrate different sources of data.

  13. Agenda: (1) Kernel Trick: SVMs and Non-linear Classification; Principal Component Analysis; What can we compute with the dot product in feature spaces? (2) Mathematical Foundations: RKHS, Representer theorem. (3) Kernels on Graphs aka Networks: Kernels on vertices of a Graph; Kernels on graphs. (4) Advanced Topics: Multiple Kernel Learning.

  14. (1) Kernel Trick: SVMs and Non-linear Classification; Principal Component Analysis; What can we compute with the dot product in feature spaces? (2) Mathematical Foundations: RKHS, Representer theorem. (3) Kernels on Graphs aka Networks: Kernels on vertices of a Graph; Kernels on graphs. (4) Advanced Topics: Multiple Kernel Learning.

  15. PART 1: KERNEL TRICK

  16. (1) Kernel Trick: SVMs and Non-linear Classification; Principal Component Analysis; What can we compute with the dot product in feature spaces? (2) Mathematical Foundations: RKHS, Representer theorem. (3) Kernels on Graphs aka Networks: Kernels on vertices of a Graph; Kernels on graphs. (4) Advanced Topics: Multiple Kernel Learning.

  17. The problem of classification. Given: training data D = {(x_i, y_i) | i = 1, ..., m}, with observations x_i and class labels y_i ∈ {−1, 1}. Find: a classifier f : X → {−1, 1}, f(x) = sign(w⊤x + b).

  18. Regularized risk: min_{w,b} (1/2)‖w‖² + C ∑_{i=1}^m max(1 − y_i(w⊤x_i + b), 0), where the first term is the regularization and the second term is the risk.

  19. Regularized risk: min_{w,b} (1/2)‖w‖² + C ∑_{i=1}^m max(1 − y_i(w⊤x_i + b), 0) (regularization + risk). The SVM formulation: min_{w,b,ξ} (1/2)‖w‖² + C ∑_{i=1}^m ξ_i, subject to y_i(w⊤x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0 for all i ∈ [m].
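
To make the objective above concrete, here is a minimal numpy sketch of the regularized risk (hinge loss plus the ½‖w‖² penalty); the function name, the toy data, and C = 1 are illustrative assumptions, not part of the slides.

```python
import numpy as np

def svm_objective(w, b, X, y, C):
    """Regularized risk: 0.5 * ||w||^2 + C * sum of hinge losses."""
    margins = y * (X @ w + b)               # y_i (w^T x_i + b)
    hinge = np.maximum(1.0 - margins, 0.0)  # max(1 - y_i (w^T x_i + b), 0)
    return 0.5 * np.dot(w, w) + C * hinge.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                           # toy data: m = 20 points in R^3
y = np.where(rng.normal(size=20) > 0, 1.0, -1.0)       # labels in {-1, +1}
print(svm_objective(np.zeros(3), 0.0, X, y, C=1.0))    # 20.0: every hinge term equals 1 at w = 0
```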

  20. SVM formulation: maximize_α ∑_{i=1}^m α_i − (1/2) ∑_{ij} α_i α_j y_i y_j x_i⊤x_j, subject to 0 ≤ α_i and ∑_{i=1}^m α_i y_i = 0.

  21. SVM formulation: maximize_α ∑_{i=1}^m α_i − (1/2) ∑_{ij} α_i α_j y_i y_j x_i⊤x_j, subject to 0 ≤ α_i and ∑_{i=1}^m α_i y_i = 0. Then w = ∑_{i=1}^m α_i y_i x_i and f(x) = sign(∑_{i=1}^m α_i y_i x_i⊤x + b).

  22. C-SVM in feature spaces. Let us work with a feature map Φ(x): maximize_α ∑_{i=1}^m α_i − (1/2) ∑_{ij} α_i α_j y_i y_j Φ(x_i)⊤Φ(x_j), subject to 0 ≤ α_i and ∑_i α_i y_i = 0; f(x) = sign(∑_{i=1}^m α_i y_i Φ(x_i)⊤Φ(x) + b). The dot product between any pair of examples computed in the feature space is denoted K(x, z) = Φ(x)⊤Φ(z).

  23. C-SVM in feature spaces. Let us work with a feature map Φ(x): maximize_α ∑_{i=1}^m α_i − (1/2) ∑_{ij} α_i α_j y_i y_j K(x_i, x_j), subject to 0 ≤ α_i and ∑_i α_i y_i = 0; f(x) = sign(∑_{i=1}^m α_i y_i K(x_i, x) + b). The dot product between any pair of examples computed in the feature space is denoted K(x, z) = Φ(x)⊤Φ(z).
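
A hedged sketch of how the kernelized C-SVM above can be used in practice, assuming scikit-learn is available: the Gram matrix K(x_i, x_j) is computed explicitly and passed to SVC with kernel="precomputed". The RBF kernel, γ = 0.5, and the toy data are illustrative choices, not from the slides.

```python
import numpy as np
from sklearn.svm import SVC

def rbf_gram(A, B, gamma=0.5):
    """Gram matrix K(a, b) = exp(-gamma * ||a - b||^2) for all row pairs of A and B."""
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 2))
y_train = np.where(X_train[:, 0] * X_train[:, 1] > 0, 1, -1)   # XOR-like toy labels

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(rbf_gram(X_train, X_train), y_train)       # pass K(x_i, x_j) instead of features

X_test = rng.normal(size=(5, 2))
print(clf.predict(rbf_gram(X_test, X_train)))      # rows: test points, columns: training points
```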

  24. (1) Kernel Trick: SVMs and Non-linear Classification; Principal Component Analysis; What can we compute with the dot product in feature spaces? (2) Mathematical Foundations: RKHS, Representer theorem. (3) Kernels on Graphs aka Networks: Kernels on vertices of a Graph; Kernels on graphs. (4) Advanced Topics: Multiple Kernel Learning.

  25. Principal Component Analysis (PCA). Principal directions: given X = [x_1, ..., x_m], find directions of maximum variance (Jolliffe 2002). The direction of maximum variance v is given by (1/m) X X⊤ v = λ v (assuming Xe = 0). Define v = X α; then (1/m) X X⊤ X α = λ X α, leading to the eigenvalue problem (1/m) K α = λ α, where (K)_{ij} = (X⊤X)_{ij} = x_i⊤x_j.

  26. Nonlinear component analysis (Schölkopf et al. 1996). Compute PCA in feature spaces: replace x_i⊤x_j by Φ(x_i)⊤Φ(x_j). Principal component of x: in input space, v⊤x; in feature space, ∑_{i=1}^m α_i K(x_i, x).
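
A minimal numpy sketch of the recipe above: build K, solve the eigenvalue problem (1/m) K α = λ α, and project a new point via ∑_i α_i K(x_i, x). The RBF kernel and γ are illustrative assumptions; feature-space centring (folded into the assumption Xe = 0 on the previous slide) and the usual normalization of α are omitted for brevity.

```python
import numpy as np

def rbf_gram(A, B, gamma=0.5):
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
m = X.shape[0]

K = rbf_gram(X, X)                       # (K)_ij = K(x_i, x_j)
lam, alpha = np.linalg.eigh(K / m)       # solves (1/m) K alpha = lambda alpha
lam, alpha = lam[::-1], alpha[:, ::-1]   # sort: largest eigenvalue first

x_new = rng.normal(size=(1, 3))
k_new = rbf_gram(X, x_new).ravel()       # vector of K(x_i, x_new)
print(alpha[:, 0] @ k_new)               # first kernel principal component of x_new
```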

  27. We just need the dot product. Let x ∈ ℝ² and Φ(x) = [x_1², x_2², √2 x_1 x_2]⊤. Then K(x, z) = Φ(x)⊤Φ(z) = x_1² z_1² + 2 x_1 x_2 z_1 z_2 + x_2² z_2² = (x⊤z)². More generally, K(x, z) = (x⊤z)^r is a dot product in a feature space of dimension (d + r − 1 choose r) for x, z ∈ ℝ^d. If d = 256 and r = 4, the feature space size is 635,376. However, if we know K, one can still solve the SVM formulation without explicitly evaluating Φ.
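
A tiny numpy check of the identity above, comparing the explicit degree-2 feature map on ℝ² against the kernel (x⊤z)²; the specific vectors are arbitrary.

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on R^2."""
    return np.array([x[0]**2, x[1]**2, np.sqrt(2.0) * x[0] * x[1]])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(phi(x) @ phi(z))   # 1.0, via the explicit feature map
print((x @ z)**2)        # 1.0, the same value without ever building phi
```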

  28. 1 Kernel Trick SVMs and Non-linear Classification Principal Component Analysis What can we compute with the dot product in feature spaces? 2 Mathematical Foundations RKHS, Representer theorem 3 Kernels on Graphs aka Networks Kernels on vertices of a Graph Kernels on graphs 4 Advanced Topics: Multiple Kernel Learning

  29. Norms, distances. ‖Φ(x)‖ = √⟨Φ(x), Φ(x)⟩ = √K(x, x). Normalized features: Φ̂(x) = Φ(x)/‖Φ(x)‖, so K̂(x, z) = Φ̂(x)⊤Φ̂(z) = K(x, z)/√(K(x, x) K(z, z)). Distances: ‖Φ(x) − Φ(z)‖² = (Φ(x) − Φ(z))⊤(Φ(x) − Φ(z)) = K(x, x) + K(z, z) − 2 K(x, z). If Φ is normalized, K(x, x) = 1, then ‖Φ(x) − Φ(z)‖² = 2 − 2 K(x, z).
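
A short sketch of these identities computed purely from kernel evaluations: the kernel-induced norm, the normalized kernel, and the squared feature-space distance. The inhomogeneous polynomial kernel used here is an illustrative assumption.

```python
import numpy as np

def K(x, z):
    return (x @ z + 1.0)**2                       # illustrative inhomogeneous polynomial kernel

def K_hat(x, z):
    """Normalized kernel K(x, z) / sqrt(K(x, x) K(z, z))."""
    return K(x, z) / np.sqrt(K(x, x) * K(z, z))

def dist_sq(x, z):
    """||Phi(x) - Phi(z)||^2 from kernel evaluations only."""
    return K(x, x) + K(z, z) - 2.0 * K(x, z)

x, z = np.array([1.0, 0.5]), np.array([-0.2, 1.0])
print(np.sqrt(K(x, x)))   # ||Phi(x)||
print(K_hat(x, z))        # normalized similarity, at most 1 in absolute value
print(dist_sq(x, z))      # non-negative squared distance in feature space
```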

  30. In the sequel: we will formalize these notions, discuss the conditions on K, and construct K for graphs.

  31. (1) Kernel Trick: SVMs and Non-linear Classification; Principal Component Analysis; What can we compute with the dot product in feature spaces? (2) Mathematical Foundations: RKHS, Representer theorem. (3) Kernels on Graphs aka Networks: Kernels on vertices of a Graph; Kernels on graphs. (4) Advanced Topics: Multiple Kernel Learning.

  32. Definition of Kernel functions

  33. Kernel function. K : X × X → ℝ is a kernel function if K(x, z) = K(z, x) (symmetric) and K is positive semidefinite, i.e. for all n and x_1, ..., x_n ∈ X, the matrix K_ij = K(x_i, x_j) is psd. Recall that a matrix K ∈ ℝ^{d×d} is psd if u⊤K u ≥ 0 for all u ∈ ℝ^d.
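
An empirical sanity check of this definition (a check on one sample, not a proof): build the Gram matrix for a few random points under an RBF kernel and verify symmetry and non-negative eigenvalues. The kernel choice and the sample are illustrative.

```python
import numpy as np

def rbf(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z)**2))

rng = np.random.default_rng(0)
pts = rng.normal(size=(15, 4))
G = np.array([[rbf(a, b) for b in pts] for a in pts])   # Gram matrix K_ij = K(x_i, x_j)

print(np.allclose(G, G.T))                       # symmetry
print(np.linalg.eigvalsh(G).min() >= -1e-10)     # eigenvalues (numerically) non-negative => psd
```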

  34. Examples of kernel functions. K(x, z) = φ(x)⊤φ(z), where φ : X → ℝ^d. K is symmetric, i.e. K(x, z) = K(z, x). Positive semidefinite: let D = {x_1, x_2, ..., x_n} be a set of n arbitrarily chosen elements of X and define K_ij = φ(x_i)⊤φ(x_j). For any u ∈ ℝ^n it is straightforward to see that u⊤K u = ‖∑_{i=1}^n u_i φ(x_i)‖² ≥ 0.

  35. Examples of kernel functions. K(x, z) = x⊤z, with Φ(x) = x. K(x, z) = (x⊤z)^r, with Φ_{t_1 t_2 ... t_d}(x) = √(r!/(t_1! t_2! ... t_d!)) x_1^{t_1} x_2^{t_2} ... x_d^{t_d}, where ∑_{i=1}^d t_i = r. K(x, z) = e^{−γ‖x−z‖²}.
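
The three kernels above written as plain numpy functions; r and γ are illustrative parameter values.

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z                                  # K(x, z) = x^T z

def poly_kernel(x, z, r=3):
    return (x @ z)**r                             # K(x, z) = (x^T z)^r

def rbf_kernel(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z)**2))    # K(x, z) = exp(-gamma ||x - z||^2)

x, z = np.array([1.0, 2.0, 0.5]), np.array([0.0, -1.0, 2.0])
print(linear_kernel(x, z), poly_kernel(x, z), rbf_kernel(x, z))
```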

  36. Kernel construction. Let K_1 and K_2 be two valid kernels. Then the following are also kernels: K(x, y) = φ(x)⊤φ(y); K(u, v) = K_1(u, v) K_2(u, v); K = α K_1 + β K_2 for α, β ≥ 0; K̂(x, y) = K(x, y)/√(K(x, x) K(y, y)).

  37. Kernel construction. Let K_1 and K_2 be two valid kernels; K(x, y) = φ(x)⊤φ(y), K(x, y) = x⊤y, and K(x, y) = (x⊤y)^i are kernels, as are products K(u, v) = K_1(u, v) K_2(u, v) and non-negative combinations K = α K_1 + β K_2, α, β ≥ 0. Hence K(x, y) = lim_{N→∞} ∑_{i=0}^N (x⊤y)^i / i! = e^{x⊤y} is a kernel, and normalizing, K̂(x, y) = K(x, y)/√(K(x, x) K(y, y)) = e^{−(1/2)‖x−y‖²}.
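
A small numerical check of this construction: normalizing K(x, y) = e^{x⊤y} indeed gives e^{−½‖x−y‖²}; the vectors are arbitrary.

```python
import numpy as np

def exp_kernel(x, y):
    return np.exp(x @ y)          # limit of sum_i (x^T y)^i / i!

def normalized(x, y):
    return exp_kernel(x, y) / np.sqrt(exp_kernel(x, x) * exp_kernel(y, y))

x, y = np.array([0.3, -1.2]), np.array([1.0, 0.4])
print(normalized(x, y))                       # normalized exponential kernel
print(np.exp(-0.5 * np.sum((x - y)**2)))      # exp(-||x - y||^2 / 2): identical value
```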

  38. Kernel function and feature map. A theorem due to Mercer guarantees a feature map for symmetric, psd kernel functions. Loosely stated: for a symmetric kernel K : X × X → ℝ, there exists an expansion K(x, z) = Φ(x)⊤Φ(z) iff ∫_X ∫_X g(x) g(z) K(x, z) dx dz ≥ 0 for all (square-integrable) g.

  39. What is a dot product (aka inner product)? Let X be a vector space. A dot product satisfies: Symmetry: ⟨u, v⟩ = ⟨v, u⟩ for u, v ∈ X. Bilinearity: ⟨αu + βv, w⟩ = α⟨u, w⟩ + β⟨v, w⟩ for u, v, w ∈ X. Positive semidefiniteness: ⟨u, u⟩ ≥ 0 for u ∈ X, with ⟨u, u⟩ = 0 iff u = 0. Norm: ‖x‖ = √⟨x, x⟩, and ‖x‖ = 0 ⟹ x = 0.

  40. Examples of dot products. X = ℝ^n, ⟨u, v⟩ = u⊤v. X = ℝ^n, ⟨u, v⟩ = ∑_{i=1}^n λ_i u_i v_i with λ_i ≥ 0. X = L_2(X) = {f : ∫_X f(x)² dx < ∞}, with ⟨f, g⟩ = ∫_X f(x) g(x) dx for f, g ∈ X.
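
A tiny sketch of the second example: the weighted dot product ⟨u, v⟩ = ∑_i λ_i u_i v_i with λ_i ≥ 0, checked for symmetry and non-negativity on the diagonal; the weights and vectors are illustrative.

```python
import numpy as np

lam = np.array([2.0, 0.5, 1.0])      # non-negative weights lambda_i (illustrative)

def dot(u, v):
    return np.sum(lam * u * v)       # <u, v> = sum_i lambda_i u_i v_i

u, v = np.array([1.0, -2.0, 0.5]), np.array([0.0, 1.0, 3.0])
print(dot(u, v) == dot(v, u))        # symmetry
print(dot(u, u) >= 0)                # <u, u> >= 0
```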
