Scalable Learning in Reproducing Kernel Kreĭn Spaces
Dino Oglic¹  Thomas Gärtner²
¹ Department of Informatics, King's College London
² School of Computer Science, University of Nottingham
In Proceedings of the 36th International Conference on Machine Learning (ICML 2019)
Learning in Reproducing Kernel Kreĭn Spaces: Motivation

In learning problems with structured data (e.g., time series, strings, graphs), it is relatively easy to devise a pairwise (dis)similarity function based on the intuition of a domain expert. To find an optimal hypothesis with standard kernel methods, however, positive definiteness of the kernel/similarity function needs to be established, and a large number of pairwise (dis)similarity functions devised by experts are indefinite (e.g., edit distances for strings and graphs, dynamic time warping, Wasserstein and Hausdorff distances).

Goal: Scalable kernel methods for learning with any notion of (dis)similarity between instances.

Kreĭn Space (Bognár, 1974; Azizov & Iokhvidov, 1981). The vector space $\mathcal{K}$ with a bilinear form $\langle \cdot, \cdot \rangle_{\mathcal{K}}$ is called a Kreĭn space if it admits a decomposition into a direct sum $\mathcal{K} = \mathcal{H}_+ \oplus \mathcal{H}_-$ of $\langle \cdot, \cdot \rangle_{\mathcal{K}}$-orthogonal Hilbert spaces $\mathcal{H}_\pm$ such that $\langle \cdot, \cdot \rangle_{\mathcal{K}}$ can be written as
$$\langle f, g \rangle_{\mathcal{K}} = \langle f_+, g_+ \rangle_{\mathcal{H}_+} - \langle f_-, g_- \rangle_{\mathcal{H}_-},$$
where $\mathcal{H}_\pm$ are endowed with inner products $\langle \cdot, \cdot \rangle_{\mathcal{H}_\pm}$, $f = f_+ \oplus f_-$, $g = g_+ \oplus g_-$, and $f_\pm, g_\pm \in \mathcal{H}_\pm$.
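To make the fundamental decomposition concrete, here is a minimal numpy sketch (our own illustration, not code from the paper): it splits an indefinite Gram matrix spectrally into positive and negative parts and verifies the Kreĭn bilinear form. The tanh similarity used here is an assumption, chosen only because it is a well-known indefinite kernel.

```python
import numpy as np

# A minimal sketch (not from the paper): the tanh similarity is a
# well-known indefinite kernel, so its Gram matrix generally has
# eigenvalues of both signs and lives naturally in a Krein space.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
K = np.tanh(X @ X.T)                      # indefinite Gram matrix

# The spectral split of K yields one possible decomposition into
# H_+ and H_- parts: K = K_plus - K_minus, both positive semi-definite.
w, V = np.linalg.eigh(K)
K_plus = (V[:, w > 0] * w[w > 0]) @ V[:, w > 0].T
K_minus = -(V[:, w < 0] * w[w < 0]) @ V[:, w < 0].T

# Bilinear form <f, g>_K = <f_+, g_+>_{H_+} - <f_-, g_->_{H_-} for
# f = sum_i a_i k(x_i, .) and g = sum_i b_i k(x_i, .):
a, b = rng.normal(size=50), rng.normal(size=50)
assert np.isclose(a @ K_plus @ b - a @ K_minus @ b, a @ K @ b)
```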
Learning in Reproducing Kernel Kreĭn Spaces: Overview

Associated Hilbert Space. For a decomposition $\mathcal{K} = \mathcal{H}_+ \oplus \mathcal{H}_-$, the Hilbert space $\mathcal{H}_{\mathcal{K}} = \mathcal{H}_+ \oplus \mathcal{H}_-$ endowed with the inner product $\langle f, g \rangle_{\mathcal{H}_{\mathcal{K}}} = \langle f_+, g_+ \rangle_{\mathcal{H}_+} + \langle f_-, g_- \rangle_{\mathcal{H}_-}$ ($f_\pm, g_\pm \in \mathcal{H}_\pm$) can be associated with $\mathcal{K}$. All the norms $\|\cdot\|_{\mathcal{H}_{\mathcal{K}}}$ generated by different decompositions of $\mathcal{K}$ into direct sums of Hilbert spaces are topologically equivalent (Langer, 1962). The topology on $\mathcal{K}$ defined by the norm of an associated Hilbert space is called the strong topology on $\mathcal{K}$.

Since $\langle f, f \rangle_{\mathcal{K}} = \|f_+\|^2_{\mathcal{H}_+} - \|f_-\|^2_{\mathcal{H}_-}$ and $\exists f \in \mathcal{K} \colon \langle f, f \rangle_{\mathcal{K}} < 0$, the bilinear form does not induce a norm on a reproducing kernel Kreĭn space $\mathcal{K}$. The complexity of hypotheses can instead be penalized via the decomposition components $\mathcal{H}_\pm$ and the strong topology, as sketched below.

Scalability! Computational and space complexities are often quadratic in the number of instances, and in several approaches the computational complexity is cubic.
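A minimal sketch of the strong-topology penalty, assuming (as an illustration) the spectral decomposition of the Gram matrix as the fundamental decomposition: flipping the negative eigenvalues of $K$ gives $|K| = K_+ + K_-$, and the associated Hilbert norm is a genuine (non-negative) regularizer even though the Kreĭn form is not.

```python
import numpy as np

# A minimal sketch (our own illustration) of the associated Hilbert norm:
# for f = sum_i a_i k(x_i, .) the norm ||f||^2_{H_K} = a^T |K| a >= 0,
# while the Krein form a^T K a may be negative.
def associated_hilbert_norm_sq(K, a):
    w, V = np.linalg.eigh(K)
    K_abs = (V * np.abs(w)) @ V.T        # |K| from the spectral split
    return a @ K_abs @ a

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
K = np.tanh(X @ X.T)                     # indefinite Gram matrix
a = rng.normal(size=50)
print(associated_hilbert_norm_sq(K, a))  # non-negative, usable as a penalty
print(a @ K @ a)                         # Krein form, possibly negative
```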
Nyström Method for Indefinite Kernels: Overview

$\mathcal{X}$ is an instance space. $X = \{x_1, \ldots, x_n\}$ is an independent sample from a probability measure defined on $\mathcal{X}$. $k \colon \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a reproducing Kreĭn kernel with $k(x, x') = \langle k(x, \cdot), k(x', \cdot) \rangle_{\mathcal{K}}$.

[Figure: evaluation functionals $k(x_p, \cdot), k(x_q, \cdot), k(x_r, \cdot), k(x_s, \cdot), k(x_u, \cdot), k(x_v, \cdot), k(x_w, \cdot)$ of the sample, depicted as points in the Kreĭn space.]
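As a concrete stand-in for an expert-devised (dis)similarity, the following sketch (our choice of kernel, not the paper's) checks the indefiniteness of a negated Manhattan distance on a random sample:

```python
import numpy as np
from scipy.spatial.distance import cdist

# A minimal sketch: an expert (dis)similarity such as the negated
# Manhattan distance k(x, x') = -||x - x'||_1 is typically indefinite,
# so its Gram matrix on the sample X has eigenvalues of both signs.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # independent sample from P on X
K = -cdist(X, X, metric="cityblock")     # indefinite Gram matrix
w = np.linalg.eigvalsh(K)
print(w.min(), w.max())                  # negative and positive eigenvalues
```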
Nyström Method for Indefinite Kernels: Landmarks

$Z = \{z_1, \ldots, z_m\}$ is a set of landmarks (not necessarily a subset of $X$); a simple baseline for choosing them is sketched below.

[Figure: the sample points together with landmark functionals $k(z_1, \cdot), k(z_2, \cdot), k(z_3, \cdot)$.]
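For illustration only, here is the simplest landmark selection baseline, uniform sub-sampling. The paper's two selection strategies are more effective; this sketch is just a placeholder, and the helper name `uniform_landmarks` is ours.

```python
import numpy as np

# A minimal baseline sketch: landmarks via uniform sub-sampling of X.
# Landmarks need not come from X (e.g., k-means centroids also work).
def uniform_landmarks(X, m, seed=0):
    rng = np.random.default_rng(seed)
    return X[rng.choice(len(X), size=m, replace=False)]

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
Z = uniform_landmarks(X, m=50)           # m << n landmarks
```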
Nyström Method for Indefinite Kernels: Projections onto $\mathcal{L}_Z = \operatorname{span}(\{k(z_1, \cdot), \ldots, k(z_m, \cdot)\})$

For a given set of landmarks $Z$, the Nyström method approximates the kernel matrix $K$ with a low-rank matrix $\tilde{K}$ given by $\tilde{K}_{ij} = \tilde{k}(x_i, x_j) = \langle \tilde{k}(x_i, \cdot), \tilde{k}(x_j, \cdot) \rangle_{\mathcal{K}}$, where $k(x, \cdot) = \tilde{k}(x, \cdot) + k^{\perp}(x, \cdot)$ with $\tilde{k}(x, \cdot) = \sum_{i=1}^{m} \alpha_{i,x}\, k(z_i, \cdot)$ and $\langle k^{\perp}(x, \cdot), \mathcal{L}_Z \rangle_{\mathcal{K}} = 0$.

[Figure: the projections $\tilde{k}(x_p, \cdot), \tilde{k}(x_q, \cdot), \tilde{k}(x_r, \cdot), \tilde{k}(x_s, \cdot)$ of the sample points onto the span of the landmarks $k(z_1, \cdot), k(z_2, \cdot), k(z_3, \cdot)$.]

$$\tilde{K} = K_{n,m} K_{m,m}^{-1} K_{m,n} = \tilde{U}_m \Sigma_m \tilde{U}_m^{\top} \quad \text{with} \quad \tilde{U}_m = K_{n,m} U_m \Sigma_m^{-1},$$
where $K_{m,m} = U_m \Sigma_m U_m^{\top}$ is the eigendecomposition of the landmark kernel matrix.
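A minimal numpy sketch of this factorization, reusing the negated Manhattan kernel from the earlier sketch (the function name `nystrom_indefinite` and the tolerance handling are ours). Since $K_{m,m}$ can be indefinite, it is inverted through its eigendecomposition; note that $\tilde{U}_m$ is not orthonormal, so recovering a proper low-rank eigendecomposition of $\tilde{K}$ needs an extra step derived in the paper.

```python
import numpy as np

# A minimal sketch of the Nystrom approximation for an indefinite kernel.
# K_mm is inverted via its eigendecomposition (a pseudo-inverse after
# dropping numerically null directions), giving the factorization
# K_tilde = U_tilde diag(sigma) U_tilde^T.
def nystrom_indefinite(kernel, X, Z, tol=1e-10):
    K_nm = kernel(X, Z)                   # n x m cross-kernel matrix
    K_mm = kernel(Z, Z)                   # m x m landmark kernel matrix
    sigma, U = np.linalg.eigh(K_mm)       # indefinite: sigma has both signs
    keep = np.abs(sigma) > tol
    sigma, U = sigma[keep], U[:, keep]
    U_tilde = (K_nm @ U) / sigma          # U_tilde = K_nm U_m Sigma_m^{-1}
    return U_tilde, sigma

# Usage with the negated Manhattan kernel from the earlier sketch:
kernel = lambda A, B: -np.abs(A[:, None, :] - B[None, :, :]).sum(-1)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
Z = X[rng.choice(len(X), size=40, replace=False)]
U_t, sig = nystrom_indefinite(kernel, X, Z)
K_tilde = (U_t * sig) @ U_t.T             # rank-m approximation of K
```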
Scalable Learning in Reproducing Kernel Kreĭn Spaces: Contributions

- First mathematically complete derivation of the Nyström method for indefinite kernels.
- An approach for efficient low-rank eigendecomposition of indefinite kernel matrices.
- Two effective landmark selection strategies for the Nyström method with indefinite kernels.
- Nyström-based scalable least squares methods for learning in reproducing kernel Kreĭn spaces (a sketch follows below).
- Nyström-based scalable support vector machine for learning in reproducing kernel Kreĭn spaces.
- Effective regularization via decomposition components $\mathcal{H}_\pm$ and the strong topology.
- Python package for learning in reproducing kernel Kreĭn spaces (in preparation; early version available upon request).
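To illustrate how the Nyström approximation combines with strong-topology regularization in the least squares setting, here is a hedged sketch: a simplified estimator of our own construction, not the paper's exact method. The hypothesis $f = \sum_j c_j\, k(z_j, \cdot)$ is restricted to the landmark span, the loss is squared error on the $n$ training points, and the penalty is the associated Hilbert norm $c^\top |K_{m,m}|\, c$.

```python
import numpy as np

# A minimal sketch of Nystrom-based regularized least squares in a Krein
# space (an illustration, not the paper's exact estimator). The penalty
# uses |K_mm| from the H_+/H_- spectral split, i.e., the strong topology.
def krein_ridge_fit(K_nm, K_mm, y, lam=1e-2):
    w, V = np.linalg.eigh(K_mm)
    K_abs = (V * np.abs(w)) @ V.T                 # |K_mm|
    return np.linalg.solve(K_nm.T @ K_nm + lam * K_abs, K_nm.T @ y)

# Usage: fit on n = 500 points with m = 40 landmarks; predictions on new
# points X_new are kernel(X_new, Z) @ c, so the cost stays linear in n.
kernel = lambda A, B: -np.abs(A[:, None, :] - B[None, :, :]).sum(-1)
rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 3)), rng.normal(size=500)
Z = X[rng.choice(len(X), size=40, replace=False)]
c = krein_ridge_fit(kernel(X, Z), kernel(Z, Z), y)
```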