  1. Quadrature-based Features for Kernel Approximation
     Marina Munkhoeva, Yermek Kapushev, Evgeny Burnaev, Ivan Oseledets
     Skoltech (Skolkovo Institute of Science and Technology)

  2. Kernel Methods Refresher
     • Kernel trick: compute $\langle \psi(x), \psi(z) \rangle$ via a kernel function $k(x, z)$
     • Inner product in an implicit feature space, computed using only the input features
     • Naively, kernel methods scale poorly with the number of samples
     [Figure: map $\psi$ from input space to feature space]
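A minimal illustration of the kernel trick, using the homogeneous quadratic kernel $k(x, z) = (x^\top z)^2$ as our own example (it is not from the slides): its implicit features are $\psi(x) = \mathrm{vec}(x x^\top)$, so the kernel evaluates a $d^2$-dimensional inner product in $O(d)$ time. The Gram matrix, however, is still $n \times n$, which is where the poor scaling comes from.

```python
import numpy as np

def quad_kernel(x, z):
    # Kernel trick: O(d) work, the feature space is never formed.
    return float(x @ z) ** 2

def quad_features(x):
    # Explicit (implicit-space) feature map psi(x) = vec(x x^T), dimension d^2.
    return np.outer(x, x).ravel()

rng = np.random.default_rng(0)
x, z = rng.standard_normal(5), rng.standard_normal(5)
lhs = quad_kernel(x, z)
rhs = quad_features(x) @ quad_features(z)
print(np.isclose(lhs, rhs))  # True

# The Gram matrix over n samples still needs n^2 kernel values and n^2 memory.
X = rng.standard_normal((200, 5))
K = (X @ X.T) ** 2
print(K.shape)  # (200, 200)
```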

  3. Scalable Kernel Methods
     • Revert the trick: $k(x, z) \approx \phi(x)^\top \phi(z)$
     • Use linear methods with mapped objects $x \to \phi(x)$
     • How to generate the approximate mapping $\phi(\cdot)$?
     [Figure: $\psi$ maps input space to feature space; $k(x, y) = \langle \psi(x), \psi(y) \rangle \approx \phi(x)^\top \phi(y)$]
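A sketch of reverting the trick, assuming a kernel whose exact finite feature map is known (again the quadratic kernel, chosen only for illustration): kernel ridge regression on the $n \times n$ Gram matrix and linear ridge regression on the mapped objects $\phi(x)$ give identical predictions, but the linear route solves a $D \times D$ system with $D = d^2$ instead. Random features aim for the same trade-off for kernels with no finite exact map. The regularization $\lambda = 0.1$ and the random data are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 50, 3, 0.1
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
X_test = rng.standard_normal((4, d))

def phi(A):
    # Exact feature map for k(x, z) = (x^T z)^2: each row becomes vec(x x^T).
    return np.einsum("ni,nj->nij", A, A).reshape(A.shape[0], -1)

# Kernel ridge regression: an n x n linear system, O(n^3) in general.
K = (X @ X.T) ** 2
alpha = np.linalg.solve(K + lam * np.eye(n), y)
pred_kernel = ((X_test @ X.T) ** 2) @ alpha

# Linear ridge regression on mapped objects: a D x D system, D = d^2.
P = phi(X)
D = P.shape[1]
w = np.linalg.solve(P.T @ P + lam * np.eye(D), P.T @ y)
pred_linear = phi(X_test) @ w

print(np.allclose(pred_kernel, pred_linear))  # True
```

The two predictions agree because of the identity $\Phi^\top(\Phi\Phi^\top + \lambda I)^{-1} = (\Phi^\top\Phi + \lambda I)^{-1}\Phi^\top$, with $\Phi\Phi^\top = K$.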

  4. Kernel Function Approximation
     Consider kernels that allow an integral representation:
     $$k(x, y) = \mathbb{E}_{p(w)} f_{xy}(w) = \int_{\mathbb{R}^d} f_{xy}(w)\, p(w)\, dw = I(f), \qquad p(w) = (2\pi)^{-d/2} e^{-\frac{\|w\|^2}{2}}, \qquad f_{xy}(w) = \phi(w^\top x)\, \phi(w^\top y) = f(w)$$
     • Shift-invariant kernels (e.g. the radial basis function (RBF) kernel)
     • Pointwise nonlinear Gaussian kernels (e.g. arc-cosine kernels)
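Both kernel families above can be written as $\mathbb{E}_{p(w)} f_{xy}(w)$. The sketch below estimates $I(f)$ by plain Monte Carlo and compares it with closed forms, under our own assumptions: an RBF kernel with unit bandwidth, and the arc-cosine kernel of order 0 with $\phi(t) = \sqrt{2}\,\mathrm{step}(t)$, whose closed form is $1 - \theta/\pi$ with $\theta$ the angle between $x$ and $y$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 200_000
x, y = rng.standard_normal(d), rng.standard_normal(d)
W = rng.standard_normal((n, d))          # rows w ~ p(w) = N(0, I)
u, v = W @ x, W @ y

# Shift-invariant: Gaussian (RBF) kernel with unit bandwidth,
# f_xy(w) = cos(w^T x) cos(w^T y) + sin(w^T x) sin(w^T y) = cos(w^T (x - y)).
f_rbf = np.cos(u) * np.cos(v) + np.sin(u) * np.sin(v)
k_rbf = np.exp(-np.sum((x - y) ** 2) / 2)

# Pointwise nonlinear Gaussian: arc-cosine kernel of order 0,
# phi(t) = sqrt(2) * step(t), so f_xy(w) = 2 * step(w^T x) * step(w^T y).
f_arc = 2.0 * (u > 0) * (v > 0)
cos_xy = x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
k_arc = 1.0 - np.arccos(np.clip(cos_xy, -1.0, 1.0)) / np.pi

print(f_rbf.mean(), k_rbf)
print(f_arc.mean(), k_arc)
```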

  7. Random Fourier Features (RFF) [Rahimi and Recht, 2008]
     RFF mapping $\phi(\cdot)$:
     $$k(x, z) = \mathbb{E}\left[\phi_w(x)^\top \phi_w(z)\right], \qquad \phi_w(x) = \left[\cos(w^\top x), \sin(w^\top x)\right], \quad w \sim p(w)$$
     RFF is a Monte Carlo approximation of $I(f)$. Choice of the samples $w$:
     • Orthogonal $w$: more accurate
     • Structured $w$: faster
     • Orthogonal + structured $w$: more accurate and faster
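A minimal RFF sketch for the Gaussian kernel with unit bandwidth (our assumption). The $1/\sqrt{n}$ scaling turns $\phi(x)^\top \phi(z)$ into the Monte Carlo average of $\cos(w^\top(x - z))$ over $n$ samples. It reports the relative Frobenius error $\|\hat{K} - K\|_F / \|K\|_F$ of the approximate Gram matrix on random, hypothetical data; being Monte Carlo, the error should shrink roughly like $n^{-1/2}$.

```python
import numpy as np

def rff_features(X, W):
    """Random Fourier features: each row w of W contributes
    [cos(w^T x), sin(w^T x)] / sqrt(n)."""
    Z = X @ W.T
    return np.hstack([np.cos(Z), np.sin(Z)]) / np.sqrt(W.shape[0])

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8)) / np.sqrt(8)   # hypothetical data
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / 2)                              # exact Gaussian Gram matrix

errs = []
for n in (100, 1000, 10000):
    W = rng.standard_normal((n, 8))              # w ~ N(0, I)
    Phi = rff_features(X, W)
    errs.append(np.linalg.norm(Phi @ Phi.T - K) / np.linalg.norm(K))
print(errs)
```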

  8. Our method uses polar form of the integral
     Change to polar coordinates, $w = r z$ with $\|z\|_2 = 1$:
     $$I(f) = (2\pi)^{-d/2} \int_{\mathbb{R}^d} e^{-\frac{\|w\|^2}{2}} f(w)\, dw = \frac{(2\pi)^{-d/2}}{2} \int_{U_d} \int_{-\infty}^{\infty} e^{-\frac{r^2}{2}} |r|^{d-1} f(r z)\, dr\, dz$$
     • Integration over radius: $\int_{-\infty}^{\infty} e^{-\frac{r^2}{2}} |r|^{d-1} h(r)\, dr$; use radial rules $R(h) = \sum_{i=0}^{l} \hat{w}_i \frac{h(\rho_i) + h(-\rho_i)}{2}$
     • Integration over the unit $d$-sphere: $\int_{U_d} s(z)\, dz$; use spherical rules $S_Q(s) = \sum_{j=1}^{p} \tilde{w}_j\, s(Q z_j)$
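A numerical check of the polar form in $d = 2$, where $U_2$ is the unit circle and both integrals can be computed on dense grids in place of radial and spherical rules. The test integrand $f(w) = \cos(w^\top a)$, the vector $a$ and the grid sizes are our own choices; for this $f$, $I(f) = e^{-\|a\|^2/2}$ exactly. The factor $1/2$ compensates for running $r$ over the whole real line, which covers every direction twice.

```python
import numpy as np

d = 2
a = np.array([0.7, -1.2])                    # f(w) = cos(w^T a), our test integrand
exact = np.exp(-a @ a / 2)                   # E_{w ~ N(0, I)} cos(w^T a)

r = np.linspace(-10.0, 10.0, 8001)           # radial grid over (-inf, inf), includes r = 0
angles = 2 * np.pi * np.arange(128) / 128    # uniform grid on the circle U_2
Z = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # unit vectors z

# Inner radial integral  int e^{-r^2/2} |r|^{d-1} f(r z) dr  by the trapezoid rule.
weight = np.exp(-r**2 / 2) * np.abs(r) ** (d - 1)
vals = weight[None, :] * np.cos(np.outer(Z @ a, r))       # shape (128, 8001)
dr = r[1] - r[0]
inner = dr * (vals.sum(axis=1) - 0.5 * (vals[:, 0] + vals[:, -1]))

# Outer integral over U_2 (total length 2*pi) by the periodic rectangle rule.
outer = 2 * np.pi * inner.mean()

I_polar = (2 * np.pi) ** (-d / 2) / 2 * outer
print(I_polar, exact)
```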

  12. Quadrature-based Features
     [Genz and Monahan, 1998] introduced Spherical-Radial (SR) rules:
     $$SR^{3,3}_{Q,\rho}(f_{xy}) = \left(1 - \frac{d}{\rho^2}\right) f_{xy}(0) + \frac{d}{(d+1)\rho^2} \sum_{j=1}^{d+1} \frac{f_{xy}(-\rho Q v_j) + f_{xy}(\rho Q v_j)}{2}$$
     We propose to estimate the integral by SR rules:
     $$I(f_{xy}) = \mathbb{E}_{Q,\rho}\left[SR^{3,3}_{Q,\rho}(f_{xy})\right] \approx \hat{I}(f_{xy}) = \frac{1}{n} \sum_{i=1}^{n} SR^{3,3}_{Q_i,\rho_i}(f_{xy})$$
     Sample complexity $\mathcal{O}(\varepsilon^{-2})$, with a smaller constant than RFF
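A sketch of the stochastic $SR^{3,3}_{Q,\rho}$ estimator for the Gaussian kernel with unit bandwidth, evaluating the kernel value directly rather than building a feature map. The slide does not say how $v_j$ and $\rho$ are chosen, so here we assume that the $v_j$ are the $d + 1$ vertices of a regular simplex on the unit sphere, that $Q$ is a Haar-distributed orthogonal matrix, and that $\rho \sim \chi(d + 2)$; with these choices each draw is an unbiased estimate of $I(f_{xy})$. The helper names `simplex_vertices`, `haar_orthogonal` and `sr33` are hypothetical.

```python
import numpy as np

def simplex_vertices(d):
    """d + 1 unit vectors in R^d forming a regular simplex (pairwise dot -1/d)."""
    C = np.eye(d + 1) - 1.0 / (d + 1)            # centred basis of R^{d+1}
    _, _, Vt = np.linalg.svd(C)
    V = C @ Vt[:d].T                              # coordinates in a d-dim basis
    return V / np.linalg.norm(V, axis=1, keepdims=True)

def haar_orthogonal(d, rng):
    """Uniformly (Haar) distributed random orthogonal d x d matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((d, d)))
    return Q * np.sign(np.diag(R))                # fix column signs

def sr33(f, d, rng, V):
    """One draw of SR^{3,3}_{Q,rho}(f); rho ~ chi(d + 2) is our assumption."""
    rho = np.sqrt(rng.chisquare(d + 2))
    Q = haar_orthogonal(d, rng)
    nodes = rho * V @ Q.T                         # rows: rho * Q v_j
    w0 = 1.0 - d / rho**2                         # weight of the centre node
    return w0 * f(np.zeros(d)) + d / ((d + 1) * rho**2) * np.sum(
        (f(-nodes) + f(nodes)) / 2)

rng = np.random.default_rng(0)
d = 3
x, y = rng.standard_normal(d), rng.standard_normal(d)
f_xy = lambda W: np.cos(W @ (x - y))              # unit-bandwidth RBF integrand
V = simplex_vertices(d)
estimate = np.mean([sr33(f_xy, d, rng, V) for _ in range(20000)])
exact = np.exp(-np.sum((x - y) ** 2) / 2)
print(estimate, exact)
```

For any fixed $Q$ and $\rho$ the rule integrates every polynomial of degree at most 3 exactly; for example, $f(w) = (w^\top a)^2$ returns $\|a\|^2$ on every draw.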

  13. Our method generalizes RFF and ORF
     RFF are SR rules of degree (1, 1):
     $$SR^{(1,1)}_{Q,\rho}(f) = \frac{f(\rho Q z) + f(-\rho Q z)}{2}, \qquad \rho \sim \chi(d), \quad \rho Q z \sim \mathcal{N}(0, I) \implies w \sim \mathcal{N}(0, I), \quad SR^{(1,1)}_{Q,\rho}(f) = f(w) \text{ for even } f$$
     Orthogonal Random Features (ORF) are SR rules of degree (1, 3):
     $$SR^{(1,3)}_{Q,\rho}(f) = \sum_{i=1}^{d} \frac{f(\rho Q e_i) + f(-\rho Q e_i)}{2d}, \qquad \rho \sim \chi(d)$$
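A sketch of ORF viewed as an $SR^{(1,3)}$ rule: one radius $\rho \sim \chi(d)$ and the $2d$ nodes $\pm\rho Q e_i$ (the signed columns of a Haar orthogonal $Q$), averaged with weight $1/(2d)$ so that each draw is an unbiased estimate of $I(f)$. It also checks numerically that a single node $\rho Q e_1$ has identity second moment, consistent with $w \sim \mathcal{N}(0, I)$. The unit-bandwidth RBF integrand and the names `haar_orthogonal` and `sr13` are our own assumptions.

```python
import numpy as np

def haar_orthogonal(d, rng):
    """Uniformly (Haar) distributed random orthogonal d x d matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((d, d)))
    return Q * np.sign(np.diag(R))

def sr13(f, d, rng):
    """One draw of SR^{(1,3)}: nodes +-rho * Q e_i, i = 1..d, rho ~ chi(d)."""
    rho = np.sqrt(rng.chisquare(d))
    nodes = rho * haar_orthogonal(d, rng).T       # rows: rho * Q e_i
    return np.sum(f(nodes) + f(-nodes)) / (2 * d)

rng = np.random.default_rng(0)
d = 3
x, y = rng.standard_normal(d), rng.standard_normal(d)
f_xy = lambda W: np.cos(W @ (x - y))
estimate = np.mean([sr13(f_xy, d, rng) for _ in range(20000)])
exact = np.exp(-np.sum((x - y) ** 2) / 2)

# Each node rho * Q e_i is marginally N(0, I): its second moment is the identity.
W = np.vstack([np.sqrt(rng.chisquare(d)) * haar_orthogonal(d, rng)[:, 0]
               for _ in range(20000)])
print(estimate, exact)
print(np.round(W.T @ W / len(W), 2))
```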

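  15. Faster mapping with orthogonal Q
     Use orthogonal butterfly matrices with structured factors ($c_i = \cos\theta_i$, $s_i = \sin\theta_i$):
     $$B^{(4)} = \begin{bmatrix} c_1 & -s_1 & 0 & 0 \\ s_1 & c_1 & 0 & 0 \\ 0 & 0 & c_3 & -s_3 \\ 0 & 0 & s_3 & c_3 \end{bmatrix} \begin{bmatrix} c_2 & 0 & -s_2 & 0 \\ 0 & c_2 & 0 & -s_2 \\ s_2 & 0 & c_2 & 0 \\ 0 & s_2 & 0 & c_2 \end{bmatrix} = \begin{bmatrix} c_1 c_2 & -s_1 c_2 & -c_1 s_2 & s_1 s_2 \\ s_1 c_2 & c_1 c_2 & -s_1 s_2 & -c_1 s_2 \\ c_3 s_2 & -s_3 s_2 & c_3 c_2 & -s_3 c_2 \\ s_3 s_2 & c_3 s_2 & s_3 c_2 & c_3 c_2 \end{bmatrix}$$
     Allow fast matrix-vector multiplication: $\mathcal{O}(n \log n)$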
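A sketch of the fast product, assuming the natural generalization of $B^{(4)}$ to $n = 2^k$: at stride $h = n/2, n/4, \dots, 1$, entries $i$ and $i + h$ inside each block of size $2h$ are rotated by an angle shared by that block ($n - 1$ angles in total). The `thetas` layout and the function names are our assumptions, not taken from the slides. Each level costs $O(n)$ and there are $\log_2 n$ levels; the check rebuilds $B^{(4)}$ column by column and compares it with the product on the slide.

```python
import numpy as np

def butterfly_matvec(thetas, x):
    """Compute B x for an orthogonal butterfly matrix B in O(n log n).

    thetas[h] holds n // (2h) angles for stride h (assumed layout). The largest
    stride is applied first, matching B(4) = (stride-1 factor) @ (stride-2 factor).
    """
    y = np.asarray(x, dtype=float)
    n = y.shape[0]
    h = n // 2
    while h >= 1:
        blocks = y.reshape(n // (2 * h), 2, h)     # pairs (i, i + h) per block
        a, b = blocks[:, 0, :], blocks[:, 1, :]
        c = np.cos(thetas[h])[:, None]
        s = np.sin(thetas[h])[:, None]
        y = np.stack([c * a - s * b, s * a + c * b], axis=1).reshape(n)
        h //= 2
    return y

def dense(thetas, n):
    # Column j of B is B e_j.
    return np.column_stack([butterfly_matvec(thetas, e) for e in np.eye(n)])

t1, t2, t3 = 0.3, -1.1, 2.0
c1, s1, c2, s2, c3, s3 = (np.cos(t1), np.sin(t1), np.cos(t2), np.sin(t2),
                          np.cos(t3), np.sin(t3))
B4 = np.array([[c1 * c2, -s1 * c2, -c1 * s2, s1 * s2],
               [s1 * c2, c1 * c2, -s1 * s2, -c1 * s2],
               [c3 * s2, -s3 * s2, c3 * c2, -s3 * c2],
               [s3 * s2, c3 * s2, s3 * c2, c3 * c2]])
B_fast = dense({2: np.array([t2]), 1: np.array([t1, t3])}, 4)
print(np.allclose(B_fast, B4))  # True
```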

  16. Kernel Approximation Accuracy (ours: B)
     [Figure: relative kernel approximation error $\|\hat{K} - K\| / \|K\|$ versus $n$ on Powerplant, LETTER, USPS, MNIST, CIFAR100 and LEUKEMIA, for the Arc-cosine 0, Arc-cosine 1 and Gaussian kernels; methods compared: G, Gort, ROM, QMC, GQ, B]

  17. Summary
     Our method, quadrature-based features:
     • is applicable to a wide range of kernels
     • achieves higher kernel approximation accuracy
     • uses structured matrices
     • generalizes previous work
     Poster #130
