Kernel machines and sparsity
ENBIS'09, Saint-Étienne — July 2, 2009
Stéphane Canu & Alain Rakotomamonjy
stephane.canu@litislab.eu
Roadmap
1. Introduction: a typical learning problem; kernel machines: a definition
2. Tools: the functional framework: in the beginning was the kernel; kernel and hypothesis set
3. Kernel machines and regularization path: non-sparse kernel machines; regularization path; piecewise-linear regularization path; sparse kernel machines: SVR
4. Tuning the kernel: MKL: the multiple kernel problem; simpleMKL: the multiple kernel solution
5. Conclusion
Optical character recognition

Example (the MNIST database¹): data = "image-label" pairs
- n = 60,000; d = 700; 10 classes
- kernel error rate = 0.56%
- best error rate = 0.4%

[figure: sample MNIST digit images]
¹ http://yann.lecun.com/exdb/mnist/index.html
Learning challenges: the size effect

[figure: number of variables (10²–10⁵) vs. sample size (10⁴–10⁷) for typical learning problems — Geostatistics, Census, Speech, MNIST, Object recognition, Scene analysis, Text (RCV1), Bio, translation, Lucy — L. Bottou, 2006]

3 key issues:
1. learn any problem: functional universality
2. from data: statistical consistency
3. with large data sets: computational efficiency

Kernel machines address these three issues (up to a certain point regarding efficiency).
Kernel machines

Definition (Kernel machine)
A_{(x_i, y_i)_{i=1,n}}(x) = ψ( ∑_{i=1}^n α_i k(x, x_i) + ∑_{j=1}^p β_j q_j(x) )
α and β: parameters to be estimated.

Examples
splines:             A(x) = ∑_{i=1}^n α_i (x − x_i)_+^3 + β₀ + β₁ x
SVM:                 A(x) = sign( ∑_{i∈I} α_i exp(−‖x − x_i‖²/b) + β₀ )
exponential family:  P(y|x) = (1/Z) exp( ∑_{i∈I} α_i 1I_{{y=y_i}} (x⊤x_i + b)² )
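To make the definition concrete, here is a minimal Python sketch of the SVM-style example above: the decision rule is a weighted sum of kernel evaluations at the training points plus an offset. The function names and the default bandwidth b are illustrative, not part of the original slides.

```python
import numpy as np

def gaussian_kernel(x, xi, b=1.0):
    # k(x, x_i) = exp(-||x - x_i||^2 / b), with bandwidth b
    return np.exp(-np.sum((x - xi) ** 2) / b)

def svm_decision(x, X, alpha, beta0, b=1.0):
    # A(x) = sign( sum_i alpha_i k(x, x_i) + beta_0 )
    score = sum(a * gaussian_kernel(x, xi, b) for a, xi in zip(alpha, X))
    return np.sign(score + beta0)
```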
In the beginning was the kernel...

Definition (Kernel)
A function of two variables k from X × X to ℝ.

Definition (Positive kernel)
A kernel k(s,t) on X is said to be positive
- if it is symmetric: k(s,t) = k(t,s)
- and if for any finite positive integer n:
  ∀{α_i}_{i=1,n} ∈ ℝ, ∀{x_i}_{i=1,n} ∈ X,  ∑_{i=1}^n ∑_{j=1}^n α_i α_j k(x_i, x_j) ≥ 0
It is strictly positive if, for {α_i} not all zero,
  ∑_{i=1}^n ∑_{j=1}^n α_i α_j k(x_i, x_j) > 0
Examples of positive kernels

The linear kernel: s, t ∈ ℝ^d, k(s,t) = s⊤t
- symmetric: s⊤t = t⊤s
- positive:
  ∑_{i=1}^n ∑_{j=1}^n α_i α_j k(x_i, x_j) = ∑_{i=1}^n ∑_{j=1}^n α_i α_j x_i⊤x_j
  = ( ∑_{i=1}^n α_i x_i )⊤( ∑_{j=1}^n α_j x_j ) = ‖ ∑_{i=1}^n α_i x_i ‖² ≥ 0

The product kernel: k(s,t) = g(s) g(t) for some g: ℝ^d → ℝ
- symmetric by construction
- positive:
  ∑_{i=1}^n ∑_{j=1}^n α_i α_j k(x_i, x_j) = ∑_{i=1}^n ∑_{j=1}^n α_i α_j g(x_i) g(x_j)
  = ( ∑_{i=1}^n α_i g(x_i) )( ∑_{j=1}^n α_j g(x_j) ) = ( ∑_{i=1}^n α_i g(x_i) )² ≥ 0

k is positive ⇔ its square root exists ⇔ k(s,t) = ⟨φ_s, φ_t⟩
J.P. Vert, 2006
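The positivity condition can be illustrated numerically: for any finite sample, the Gram matrix K with K[i,j] = k(x_i, x_j) of a positive kernel is symmetric positive semi-definite, so its eigenvalues are non-negative up to floating-point error. A sketch, with illustrative names:

```python
import numpy as np

def gram_matrix(X, kernel):
    # K[i, j] = k(x_i, x_j) on the sample X (n rows, d columns)
    return np.array([[kernel(xi, xj) for xj in X] for xi in X])

X = np.random.randn(20, 5)
K_lin = gram_matrix(X, lambda s, t: s @ t)  # linear kernel s^T t
K_prod = gram_matrix(X, lambda s, t: np.tanh(s[0]) * np.tanh(t[0]))  # product kernel g(s)g(t)

for K in (K_lin, K_prod):
    print(np.all(np.linalg.eigvalsh(K) >= -1e-10))  # True: all eigenvalues >= 0
```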
Positive definite kernel (PDK) algebra (closure)

If k₁(s,t) and k₂(s,t) are two positive kernels:
- PDKs form a convex cone: ∀ a₁ ∈ ℝ⁺, a₁ k₁(s,t) + k₂(s,t) is a PDK
- the product k₁(s,t) k₂(s,t) is a PDK

Proofs
- the cone property, by linearity:
  ∑_{i,j} α_i α_j ( a₁ k₁(x_i,x_j) + k₂(x_i,x_j) ) = a₁ ∑_{i,j} α_i α_j k₁(x_i,x_j) + ∑_{i,j} α_i α_j k₂(x_i,x_j) ≥ 0
- the factorization step used below:
  ∑_{i,j} α_i α_j ψ(x_i) ψ(x_j) = ( ∑_i α_i ψ(x_i) )( ∑_j α_j ψ(x_j) )
- the product, assuming ∃ ψ_ℓ s.t. k₁(s,t) = ∑_ℓ ψ_ℓ(s) ψ_ℓ(t):
  ∑_{i,j} α_i α_j k₁(x_i,x_j) k₂(x_i,x_j) = ∑_{i,j} α_i α_j ∑_ℓ ψ_ℓ(x_i) ψ_ℓ(x_j) k₂(x_i,x_j)
  = ∑_ℓ ∑_{i,j} ( α_i ψ_ℓ(x_i) )( α_j ψ_ℓ(x_j) ) k₂(x_i,x_j) ≥ 0,
  each inner double sum being non-negative since k₂ is positive.

N. Cristianini and J. Shawe-Taylor, Kernel Methods for Pattern Analysis, 2004
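Both closure rules can also be checked numerically on Gram matrices: a conic combination a₁K₁ + K₂ and the entrywise (Schur) product K₁ ∘ K₂ of two PSD Gram matrices are again PSD. A sketch under the same illustrative conventions as above:

```python
import numpy as np

X = np.random.randn(30, 4)
K1 = X @ X.T                 # Gram matrix of the linear kernel
g = np.tanh(X[:, 0])
K2 = np.outer(g, g)          # Gram matrix of a product kernel g(s)g(t)

K_cone = 2.0 * K1 + K2       # a1*k1 + k2 with a1 = 2 >= 0
K_schur = K1 * K2            # entrywise product <-> product kernel k1*k2
for K in (K_cone, K_schur):
    print(np.all(np.linalg.eigvalsh(K) >= -1e-10))  # True
```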
Kernel engineering: building PDKs
- for any polynomial φ with positive coefficients from ℝ to ℝ: φ(k(s,t)) is a PDK
- if Ψ is a function from ℝ^d to ℝ^d: k(Ψ(s), Ψ(t)) is a PDK
- if φ from ℝ^d to ℝ⁺ has its minimum at 0: k(s,t) = φ(s+t) − φ(s−t) is a PDK
- the convolution of two positive kernels is a positive kernel: K₁ ⋆ K₂

The Gaussian kernel is a PDK:
exp(−‖s − t‖²) = exp(−‖s‖² − ‖t‖² + 2 s⊤t) = exp(−‖s‖²) exp(−‖t‖²) exp(2 s⊤t)
- s⊤t is a PDK, and exp is the limit of a series expansion with positive coefficients, so exp(2 s⊤t) is a PDK
- exp(−‖s‖²) exp(−‖t‖²) is a PDK as a product kernel
- the product of two PDKs is a PDK

O. Catoni, master lecture, 2005
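The factorization in the proof rests on the identity ‖s − t‖² = ‖s‖² + ‖t‖² − 2 s⊤t; a quick numeric sanity check (illustrative only):

```python
import numpy as np

s, t = np.random.randn(3), np.random.randn(3)
lhs = np.exp(-np.sum((s - t) ** 2))
rhs = np.exp(-s @ s) * np.exp(-t @ t) * np.exp(2 * s @ t)
print(np.isclose(lhs, rhs))  # True: the two factorizations agree
```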
Some examples of PD kernels

type       | name        | k(s,t)
radial     | Gaussian    | exp(−r²/b), r = ‖s − t‖
radial     | Laplacian   | exp(−r/b)
radial     | rational    | 1 − r²/(r² + b)
radial     | loc. Gauss. | max(0, 1 − r/(3b))^d exp(−r²/b)
non stat.  | χ²          | exp(−r/b), r = ∑_k (s_k − t_k)²/(s_k + t_k)
projective | polynomial  | (s⊤t)^p
projective | affine      | (s⊤t + b)^p
projective | cosine      | s⊤t / (‖s‖‖t‖)
projective | correlation | exp( s⊤t/(‖s‖‖t‖) − b )

Most of these kernels depend on a quantity b called the bandwidth.
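As a sketch, four rows of the table transcribed directly into Python; the bandwidth b and degree p are the table's parameters, with arbitrary defaults added here for illustration.

```python
import numpy as np

def gaussian(s, t, b=1.0):      # radial: exp(-r^2 / b), r = ||s - t||
    return np.exp(-np.sum((s - t) ** 2) / b)

def laplacian(s, t, b=1.0):     # radial: exp(-r / b)
    return np.exp(-np.linalg.norm(s - t) / b)

def affine(s, t, b=1.0, p=2):   # projective: (s^T t + b)^p
    return (s @ t + b) ** p

def cosine(s, t):               # projective: s^T t / (||s|| ||t||)
    return (s @ t) / (np.linalg.norm(s) * np.linalg.norm(t))
```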
Kernels for objects and structures

Kernels on histograms and probability distributions:
k(p, q) = ∫ k_i( p(x), q(x) ) dP(x)

Kernels on strings:
- spectral string kernel, using subsequences: k(s,t) = ∑_u φ_u(s) φ_u(t)
- similarities by alignments: k(s,t) = ∑_π exp(β(s,t,π))

Kernels on graphs:
- the pseudo-inverse of the (regularized) graph Laplacian L = D − A, with A the adjacency matrix and D the degree matrix
- diffusion kernels: (1/Z(b)) exp(bL)
- subgraph kernels by convolution (using random walks)

...and kernels on heterogeneous data (images), HMMs, automata...
Shawe-Taylor & Cristianini's book, 2004; J.P. Vert, 2006
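A minimal sketch of the spectral string kernel named above, counting shared length-p substrings; the substring length p and the function names are illustrative choices.

```python
from collections import Counter

def spectrum(s, p):
    # phi_u(s): number of occurrences of each length-p substring u of s
    return Counter(s[i:i + p] for i in range(len(s) - p + 1))

def spectral_kernel(s, t, p=3):
    # k(s, t) = sum_u phi_u(s) * phi_u(t)
    phi_s, phi_t = spectrum(s, p), spectrum(t, p)
    return sum(c * phi_t[u] for u, c in phi_s.items())

print(spectral_kernel("kernel machines", "kernel methods"))  # shared 3-grams
```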
From kernel to functions

Let H₀ = { f : f(x) = ∑_{j=1}^{m_f} f_j k(x, t_j) ; m_f < ∞, f_j ∈ ℝ, t_j ∈ X }
and define the bilinear form (with g(x) = ∑_{i=1}^{m_g} g_i k(x, s_i)):
∀ f, g ∈ H₀,  ⟨f, g⟩_{H₀} = ∑_{j=1}^{m_f} ∑_{i=1}^{m_g} f_j g_i k(t_j, s_i)

Evaluation functional: ∀ x ∈ X, f(x) = ⟨f(·), k(x, ·)⟩_{H₀}

From k to H: from any positive kernel, a hypothesis set H = H̄₀ (the completion of H₀) can be constructed, together with its metric.
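A hedged numeric check of the evaluation functional: for f = ∑_j f_j k(·, t_j), the inner product ⟨f, k(x,·)⟩_{H₀} defined above (k(x,·) having a single center x with coefficient 1) equals the pointwise value f(x). All names below are illustrative.

```python
import numpy as np

def k(s, t, b=1.0):
    # Gaussian kernel as a concrete positive kernel
    return np.exp(-np.sum((s - t) ** 2) / b)

T = np.random.randn(5, 2)      # centers t_j
f_coef = np.random.randn(5)    # coefficients f_j
x = np.random.randn(2)

# <f, k(x, .)>_{H0} = sum_j f_j * 1 * k(t_j, x)
inner = sum(fj * k(tj, x) for fj, tj in zip(f_coef, T))
fx = sum(fj * k(x, tj) for fj, tj in zip(f_coef, T))  # direct evaluation f(x)
print(np.isclose(inner, fx))   # True: the reproducing property
```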