Incremental Randomized Sketching for Online Kernel Learning
Xiao Zhang, Shizhong Liao∗
College of Intelligence and Computing, Tianjin University (TJU)
szliao@tju.edu.cn
ICML 2019, June 13, 2019
Outline
1 Introduction
2 Main Results
3 Conclusion
Introduction: New Challenges of Online Kernel Learning
(1) High computational complexities
- Per-round time complexity depending on T [Calandriello et al., 2017b]
- Linear space complexity [Calandriello et al., 2017a]
(2) Lack of theoretical guarantees
- Lack of sublinear regrets for randomized sketching [Wang et al., 2016]
- Lack of constant lower bounds on budget/sketch size [Lu et al., 2016]
Introduction: Main Contribution
Table 1: Comparison with existing online kernel learning approaches (1st order: existing first-order approaches; 2nd order: existing second-order approaches). Time and space are computational complexities; budget/sketch size and regret are theoretical guarantees.

| Approach | Time (per round) | Space | Budget/Sketch size | Regret |
|---|---|---|---|---|
| 1st order | Constant | Constant | Linear | Sublinear |
| 2nd order | Sublinear | Linear | Logarithmic | Sublinear |
| Proposed | Constant | Constant | Constant | Sublinear |
Main Results: Incremental Randomized Sketching Approach
[Figure 1: Novel incremental randomized sketching scheme for online kernel learning. Diagram labels: Sequence of Instances, Sketching, Matrix Sketch Updating, Incremental Randomized Sketching, Explicit Mapping, Gradient Descent, Hypothesis Updating, Online Prediction.]
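The last stages of the scheme in Figure 1 (explicit mapping, gradient descent, hypothesis updating, online prediction) can be illustrated with a minimal sketch. This is not the authors' algorithm: `feature_map` is a hypothetical placeholder for whatever explicit mapping the sketching step provides, the hinge loss and step size `eta` are assumptions, and the toy stream is made up only to exercise the loop.

```python
import numpy as np

def online_gradient_descent(stream, feature_map, dim, eta=0.1):
    """Run hinge-loss online gradient descent on explicitly mapped instances.

    `feature_map` is a hypothetical placeholder for the explicit mapping that
    the sketching step would provide; here it is any callable x -> R^dim.
    """
    w = np.zeros(dim)          # current hypothesis
    mistakes = 0
    for x, y in stream:        # y is +1 or -1
        z = feature_map(x)     # explicit mapping of the new instance
        score = float(w @ z)   # online prediction before the label is seen
        if np.sign(score) != y:
            mistakes += 1      # a zero score also counts as a mistake
        if y * score < 1.0:    # hinge loss is active: take a subgradient step
            w = w + eta * y * z
    return w, mistakes

def make_toy_stream(n=200, seed=0):
    """Toy separable stream: the label is the sign of the first coordinate."""
    rng = np.random.default_rng(seed)
    stream = []
    for t in range(n):
        y = 1 if t % 2 == 0 else -1
        x = y * np.array([1.0, 0.0]) + 0.1 * rng.uniform(-1.0, 1.0, size=2)
        stream.append((x, y))
    return stream

# Identity mapping stands in for the sketched feature map (assumption).
w, mistakes = online_gradient_descent(make_toy_stream(), lambda x: x, dim=2)
print(mistakes, w)
```

Keeping the mapping behind a callable separates the per-round learning step from the sketch maintenance, which is the part the talk is actually about.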
Main Results: Incremental Randomized Sketching Approach
[Figure 2: The proposed incremental randomized sketching for kernel matrix approximation at round t + 1. Recoverable labels: kernel matrix $K^{(t+1)}$, sketch matrices $S_p^{(t+1)}$ and $S_m^{(t+1)}$, and the additive sketch updates $\Phi_{pm}^{(t+1)} = \Phi_{pm}^{(t)} + \Delta_{pm}^{(t+1)}$ and $\Phi_{pp}^{(t+1)} = \Phi_{pp}^{(t)} + \Delta_{pp}^{(t+1)}$.]
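The additive updates of $\Phi_{pm}$ and $\Phi_{pp}$ in Figure 2 can be checked numerically. The sketch below assumes $\Phi_{pm} = S_p^\top K S_m$ and $\Phi_{pp} = S_p^\top K S_p$ (a reading of the figure, not stated on the slide). It uses dense random sketch matrices and a Gaussian kernel purely for illustration. It verifies only the algebraic identity: when the kernel matrix gains one row and column, each sketch changes by three rank-one terms. It does not reproduce the constant per-round cost claimed in the talk, since this version touches all past instances.

```python
import numpy as np

def gaussian_kernel(A, b, gamma=0.5):
    """Gaussian kernel values between every row of A and the vector b."""
    return np.exp(-gamma * np.sum((A - b) ** 2, axis=1))

def update_sketch(phi, S_left, S_right, psi, kappa, s_left, s_right):
    """Additive update of phi = S_left^T K S_right when K gains one row/column.

    psi holds kernel values between past instances and the new one, kappa is
    the new diagonal entry, and s_left / s_right are the new sketch rows.
    """
    delta = (np.outer(S_left.T @ psi, s_right)
             + np.outer(s_left, psi @ S_right)
             + kappa * np.outer(s_left, s_right))
    return phi + delta

def max_update_error(n=30, d=3, p=6, m=4, gamma=0.5, seed=0):
    """Compare incrementally maintained sketches with batch recomputation."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    # Assumption: dense random sketch matrices, chosen only for illustration.
    S_p = rng.choice([-1.0, 1.0], size=(n, p)) / np.sqrt(p)
    S_m = rng.normal(size=(n, m))
    # Round 1: K is the 1x1 matrix [k(x_1, x_1)] = [1] for a Gaussian kernel.
    phi_pm = np.outer(S_p[0], S_m[0])
    phi_pp = np.outer(S_p[0], S_p[0])
    for t in range(1, n):
        psi = gaussian_kernel(X[:t], X[t], gamma)
        phi_pm = update_sketch(phi_pm, S_p[:t], S_m[:t], psi, 1.0, S_p[t], S_m[t])
        phi_pp = update_sketch(phi_pp, S_p[:t], S_p[:t], psi, 1.0, S_p[t], S_p[t])
    # Batch reference: build the full kernel matrix, then sketch it.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq_dists)
    err_pm = np.max(np.abs(phi_pm - S_p.T @ K @ S_m))
    err_pp = np.max(np.abs(phi_pp - S_p.T @ K @ S_p))
    return max(err_pm, err_pp)

print(max_update_error())
```

The printed value is the largest entrywise gap between the incremental and batch sketches and should be at the level of floating-point round-off.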
Main Results: Incremental Randomized Sketching Theory
- Product preserving property: statistically unbiased.
- Approximation property: $(1 + \epsilon)$-relative error bound.
- Regret bound: $O(\sqrt{T})$ regret bound, with constant lower bounds on the sketch sizes.
[Figure 3: The dependence structure of our theoretical results. Nodes: Inner Product Preserving Property, Matrix Product Preserving Property, Low-Rank Approximation Property, Regret Bound.]
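The "statistically unbiased" inner product preserving property can be illustrated with a small Monte Carlo check. The sketch matrix here is an assumption: a dense random sign matrix $S \in \mathbb{R}^{n \times p}$ scaled by $1/\sqrt{p}$, for which $\mathbb{E}[S S^\top] = I$ and therefore $\mathbb{E}[(S^\top a)^\top (S^\top b)] = a^\top b$. The sketch matrices used in the talk may differ.

```python
import numpy as np

def sketched_inner_products(a, b, p, trials, rng):
    """Return `trials` independent estimates (S^T a)^T (S^T b) of a^T b."""
    n = a.shape[0]
    # Assumption: dense random sign sketch scaled by 1/sqrt(p), so E[S S^T] = I.
    S = rng.choice([-1.0, 1.0], size=(trials, n, p)) / np.sqrt(p)
    Sa = np.einsum("tnp,n->tp", S, a)   # S^T a for every trial
    Sb = np.einsum("tnp,n->tp", S, b)   # S^T b for every trial
    return np.sum(Sa * Sb, axis=1)

rng = np.random.default_rng(0)
a = rng.normal(size=20)
a /= np.linalg.norm(a)
b = rng.normal(size=20)
b /= np.linalg.norm(b)

estimates = sketched_inner_products(a, b, p=8, trials=20000, rng=rng)
exact = float(a @ b)
print(exact, estimates.mean(), estimates.std())
```

A single estimate can be far from the exact value when `p` is small. Only the average over many independent sketches concentrates around `exact`, which is what unbiasedness promises.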
Main Results: Experimental Results
Table 2: Comparison of online kernel learning algorithms in adversarial environments

| Algorithm | german-1 mistake rate | german-1 time | german-2 mistake rate | german-2 time |
|---|---|---|---|---|
| FOGD | 37.493 ± 0.724 | 0.140 | 32.433 ± 0.196 | 0.265 |
| NOGD | 30.918 ± 0.003 | 0.405 | 26.737 ± 0.002 | 0.778 |
| PROS-N-KONS | 27.633 ± 0.416 | 33.984 | 17.737 ± 0.900 | 98.873 |
| SkeGD (θ = 0.1) | 17.320 ± 0.136 | 0.329 | 7.865 ± 0.059 | 0.597 |
| SkeGD (θ = 0.01) | 17.272 ± 0.112 | 0.402 | 7.407 ± 0.086 | 0.633 |
| SkeGD (θ = 0.005) | 16.578 ± 0.360 | 0.484 | 7.266 ± 0.065 | 0.672 |
| SkeGD (θ = 0.001) | 16.687 ± 0.155 | 1.183 | 6.835 ± 0.136 | 1.856 |

Even in adversarial environments, our incremental randomized sketching (SkeGD) attains the lowest mistake rates on both datasets for every tested θ, while running far faster than PROS-N-KONS.
Conclusion
- Novel incremental randomized sketching for online kernel learning.
- Meets the new challenges of online kernel learning:
(1) $(1 + \epsilon)$-relative error bound.
(2) Sublinear regret bound under constant lower bounds of the sketch size.
(3) Constant per-round computational complexities.
- A sketch scheme for both online and offline large-scale kernel learning.
Main References
[Calandriello et al., 2017a] Calandriello, D., Lazaric, A., and Valko, M. (2017a). Efficient second-order online kernel learning with adaptive embedding. In Advances in Neural Information Processing Systems 30, pages 6140–6150.
[Calandriello et al., 2017b] Calandriello, D., Lazaric, A., and Valko, M. (2017b). Second-order kernel online convex optimization with adaptive sketching. In Proceedings of the 34th International Conference on Machine Learning, pages 645–653.
[Lu et al., 2016] Lu, J., Hoi, S. C., Wang, J., Zhao, P., and Liu, Z. (2016). Large scale online kernel learning. Journal of Machine Learning Research, 17:1613–1655.
[Wang et al., 2016] Wang, S., Zhang, Z., and Zhang, T. (2016). Towards more efficient SPSD matrix approximation and CUR matrix decomposition. Journal of Machine Learning Research, 17:1–49.
Thank you!