Large-Scale Sparse Kernel Canonical Correlation Analysis

  1. Large-Scale Sparse Kernel Canonical Correlation Analysis
     Viivi Uurtio¹, Sahely Bhadra², and Juho Rousu¹
     ¹ Department of Computer Science, Aalto University, and Helsinki Institute for Information Technology HIIT
     ² Indian Institute of Technology (IIT), Palakkad
     ICML 2019, June 11, 2019

  2. From large two-view datasets, it is not straightforward to identify which of the variables are related.

     $\rho(u, v) = \dfrac{\langle Xu, Yv \rangle}{\|Xu\|_2 \, \|Yv\|_2}$ (sketched in code below)

     → In standard CCA, we identify the related variables from the weight vectors u and v.
     → In the non-linear and/or large-scale variants, we cannot access u and v.

                   Scalability   Access to u and v
     Kernel CCA        ✗                ✗
     RF KCCA           ✓                ✗
     KNOI              ✓                ✗
     Deep CCA          ✓                ✗
     SCCA-HSIC         ✗                ✓
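For concreteness, here is a minimal NumPy sketch of the CCA objective on the slide above. The toy data, variable names, and the centering assumption are illustrative additions, not part of the talk.

```python
import numpy as np

def cca_correlation(X, Y, u, v):
    """Standard CCA objective: correlation between the projections Xu and Yv.
    Assumes the columns of X and Y are (approximately) centered."""
    xu, yv = X @ u, Y @ v
    return (xu @ yv) / (np.linalg.norm(xu) * np.linalg.norm(yv))

# Toy two-view data: variable 0 of X and variable 0 of Y share a latent
# signal z, so weight vectors concentrated on those variables score highly.
rng = np.random.default_rng(0)
z = rng.standard_normal(500)
X = rng.standard_normal((500, 5)); X[:, 0] += z
Y = rng.standard_normal((500, 5)); Y[:, 0] += z

u = np.array([1.0, 0, 0, 0, 0])
v = np.array([1.0, 0, 0, 0, 0])
print(cca_correlation(X, Y, u, v))              # ~0.5: related variables found
print(cca_correlation(X, Y, np.roll(u, 1), v))  # ~0.0: unrelated variables
```

gradKCCA (next slide) keeps this idea but replaces the projections Xu and Yv with kernel evaluations, so non-linear relations can be captured while u and v stay accessible.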

  3. gradKCCA is a kernel-matrix-free method that efficiently optimizes u and v.

     Let $k_x(u) = (k_x(x_i, u))_{i=1}^n$ and $k_y(v) = (k_y(y_i, v))_{i=1}^n$.

     $\max_{u, v} \; \rho_{\mathrm{gradKCCA}}(u, v) = \dfrac{k_x(u)^\top k_y(v)}{\|k_x(u)\|_2 \, \|k_y(v)\|_2}$
     $\text{s.t. } \|u\|_{P_x} \le s_u \text{ and } \|v\|_{P_y} \le s_v$

     The maximum is found through alternating projected gradient ascent. Optimization steps for u (v is updated symmetrically):
     → Compute the gradient $\nabla \rho_u = \partial \rho(u, v) / \partial u$
     → Find the step size by line search: $\max_{\gamma} \rho(u + \gamma \nabla \rho_u)$
     → Take a gradient step towards the maximum: $u_{\mathrm{grad}} = u + \gamma^{*} \nabla \rho_u$
     → Project onto the $\ell_P$ ball: $u = \Pi_{\|\cdot\|_{P_x} \le s_u}(u_{\mathrm{grad}})$
     (these steps are sketched in code below)
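Below is a minimal sketch of these optimization steps, assuming a homogeneous polynomial kernel k(x, w) = (xᵀw)^d and ℓ₁ constraints (P = 1). The helper names (kvec, project_l1, update_u) and the backtracking line search are my own simplifications, not the authors' implementation; note that no n × n kernel matrix is ever formed, only matrix-vector products.

```python
import numpy as np

def kvec(X, w, d=2):
    # k(w) = (k(x_i, w))_{i=1..n}; homogeneous polynomial kernel assumed
    return (X @ w) ** d

def rho(X, Y, u, v, d=2):
    # gradKCCA objective: cosine similarity of the two kernel vectors
    kx, ky = kvec(X, u, d), kvec(Y, v, d)
    return (kx @ ky) / (np.linalg.norm(kx) * np.linalg.norm(ky))

def grad_rho_u(X, Y, u, v, d=2):
    # Analytic gradient of rho w.r.t. u, via the chain rule through
    # kx = (Xu)^d; costs only a few matrix-vector products
    kx, ky = kvec(X, u, d), kvec(Y, v, d)
    nx, ny = np.linalg.norm(kx), np.linalg.norm(ky)
    r = ky / (nx * ny) - (kx @ ky) * kx / (nx ** 3 * ny)  # d rho / d kx
    return X.T @ (d * (X @ u) ** (d - 1) * r)

def project_l1(w, s):
    # Euclidean projection onto the l1 ball of radius s (Duchi et al., 2008)
    if np.abs(w).sum() <= s:
        return w
    a = np.sort(np.abs(w))[::-1]
    cs = np.cumsum(a) - s
    idx = np.arange(1, len(w) + 1)
    k = idx[a - cs / idx > 0][-1]
    theta = cs[k - 1] / k
    return np.sign(w) * np.maximum(np.abs(w) - theta, 0.0)

def update_u(X, Y, u, v, s_u, d=2, gamma0=1.0):
    # One projected gradient ascent step for u
    g = grad_rho_u(X, Y, u, v, d)
    base, gamma = rho(X, Y, u, v, d), gamma0
    while gamma > 1e-10:  # backtracking stand-in for the exact line search
        u_new = project_l1(u + gamma * g, s_u)
        if rho(X, Y, u_new, v, d) > base:
            return u_new
        gamma *= 0.5
    return u  # no improving step found: u has converged

def gradkcca(X, Y, s_u, s_v, d=2, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    u = project_l1(rng.standard_normal(X.shape[1]), s_u)
    v = project_l1(rng.standard_normal(Y.shape[1]), s_v)
    for _ in range(iters):  # alternate between the two views
        u = update_u(X, Y, u, v, s_u, d)
        v = update_u(Y, X, v, u, s_v, d)  # objective is symmetric in the views
    return u, v
```

With d = 1 the objective reduces to the sparse linear CCA correlation, so the toy data from the earlier sketch gives a quick sanity check for gradkcca.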

  4. Experiments demonstrate the noise tolerance, scalability, and superior speed of gradKCCA.

     [Figure: F1 score and AUC on train and test data as a function of the proportion of noise variables (0.6 to 0.98); methods: gradKCCA, KCCA, KCCA preimage, DCCA, KNOI, SCCA-HSIC.]

     [Figure: F1 score and running time (1 s to 10 h) on train and test data as a function of sample size (10³ to 10⁶); methods: gradKCCA, DCCA, RCCA, KNOI, SCCA-HSIC.]

     MediaMill     ρ train         ρ test          Time (s)
     gradKCCA      0.666 ± 0.004   0.657 ± 0.007      8 ± 4
     Deep CCA      0.643 ± 0.005   0.633 ± 0.003   1280 ± 112
     RF KCCA       0.633 ± 0.001   0.626 ± 0.005     23 ± 9
     KNOI          0.652 ± 0.001   0.645 ± 0.003    218 ± 73
     SCCA-HSIC     0.627 ± 0.004   0.625 ± 0.002   1804 ± 143
