ParSecureML: An Efficient Parallel Secure Machine Learning Framework on GPUs


  1. 49th International Conference on Parallel Processing (ICPP). ParSecureML: An Efficient Parallel Secure Machine Learning Framework on GPUs. Zheng Chen, Feng Zhang, Amelie Chi Zhou★, Jidong Zhai+, Chenyang Zhang, Xiaoyong Du. Renmin University of China, ★Shenzhen University, +Tsinghua University

  2. Outline: 1. Background 2. Motivation 3. Basic Idea 4. Challenges 5. ParSecureML 6. Evaluation 7. Source Code on GitHub 8. Conclusion

  3-7. 1. Background • Secure Machine Learning. [Figure, built up over five animation slides: (a) a typical machine learning process; (b) a machine learning process with two-party computation, with steps 1-4 highlighted in sequence.]

  8. 1. Background • GPU Acceleration. https://developer.nvidia.com/deep-learning

  9. 2. Motivation • Performance Degradation. [Chart: performance of the original (non-secure) implementation versus SecureML, normalized, for linear regression, logistic regression, MLP, and a convolutional neural network; the secure version is markedly slower.]

  10-16. 2. Motivation • Time Breakdown for Two-Party Computation. [Figure, built up over seven animation slides: timeline of the client, server1, and server2 across the online and offline phases. Each server receives encrypted data, then alternates compute and communicate steps; the client-side input distribution and result collection steps take only 0.05s-0.24s each, while the compute steps dominate at 95.52s and 62.68s.]

  17. 3. Basic Idea • A GPU-based two-party computation that considers both GPU characteristics and the features of two-party computation should achieve better acceleration. • Challenges • Challenge 1: Complex triplet-multiplication-based computation patterns • Challenge 2: Frequent intra-node data transmission between CPU and GPU • Challenge 3: Complicated inter-node data dependence

  18. 4. Challenges • Challenge 1: Complex triplet-multiplication-based computation patterns. [Figure: the dataflow of triplet-based secure multiplication.]

  19. 4. Challenges • Challenge 2: Frequent intra-node data transmission between CPU and GPU. [Figure: on each server, every step n moves data between CPU and GPU before step n+1 can proceed.]

  20. 4. Challenges • Challenge 3: Complicated inter-node data dependence. [Figure: the same per-step CPU-GPU dataflow on server1 and server2, with data exchanged between the two servers at each step.]

  21. 5. ParSecureML • Overview. ParSecureML consists of three major components: • Profiling-guided adaptive GPU utilization • Intra-node double pipeline • Inter-node compressed transmission communication. [Figure: offline and online phases. The client distributes encrypted data to server1 and server2; on each server, reconstruct and GPU operations for the forward and backward passes are pipelined among different layers, the servers exchange data through GPU-based compressed communication, and the client receives the final result.]

  22. 5. ParSecureML • Profiling-Guided Adaptive GPU Utilization (online and offline acceleration designs). The reconstruction runs on the CPU and the triplet multiplication on the GPU: CPU: E = E_0 + E_1, F = F_0 + F_1. GPU: C_i = (-i) · E · F + A_i · F + E · B_i + Z_i.
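
To make the triplet multiplication concrete, here is a minimal single-process sketch of Beaver-triplet matrix multiplication in NumPy, using the slide's notation (E and F reconstructed from masked shares, then C_i computed per party). The share() helper, the integer ranges, and the trusted-dealer setup are illustrative assumptions; the real system works over a ring and runs the C_i formula on GPUs across two servers.

```python
# Minimal sketch of Beaver-triplet secure matrix multiplication (assumed,
# illustrative setup; not the ParSecureML API). Both parties run in one
# process for clarity; a real deployment splits them across two servers.
import numpy as np

rng = np.random.default_rng(0)
n = 4

def share(M):
    """Split matrix M into two additive shares."""
    s = rng.integers(-50, 50, M.shape)
    return s, M - s

# Secret inputs A and B, additively shared between party 0 and party 1.
A = rng.integers(0, 10, (n, n)); A0, A1 = share(A)
B = rng.integers(0, 10, (n, n)); B0, B1 = share(B)

# Offline phase: a dealer generates a matrix triplet (U, V, Z) with Z = U @ V.
U = rng.integers(0, 10, (n, n)); U0, U1 = share(U)
V = rng.integers(0, 10, (n, n)); V0, V1 = share(V)
Z0, Z1 = share(U @ V)

# Online phase: each party publishes masked shares E_i = A_i - U_i, F_i = B_i - V_i.
E0, F0 = A0 - U0, B0 - V0
E1, F1 = A1 - U1, B1 - V1

# CPU step from the slide: reconstruct E = E0 + E1 and F = F0 + F1.
E, F = E0 + E1, F0 + F1

# GPU step from the slide: C_i = (-i)·E·F + A_i·F + E·B_i + Z_i.
C0 = A0 @ F + E @ B0 + Z0             # i = 0: the (-i)·E·F term vanishes
C1 = -(E @ F) + A1 @ F + E @ B1 + Z1  # i = 1

assert np.array_equal(C0 + C1, A @ B)  # the shares sum to the true product
```

The assert verifies the protocol identity: since E = A - U and F = B - V, summing the two C_i shares cancels the masks and leaves exactly A · B.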

  23. 5. ParSecureML • Double Pipeline for Intra-Node CPU-GPU Fine-Grained Cooperation • Pipeline 1 overlaps PCIe data transmission with GPU computation. [Timeline: the GPU computes terms of (-i) · E · F + A_i · F + ... while E, A_i, F, B_i for the next step are transferred over PCIe.]

  24. 5. ParSecureML • Double Pipeline for Intra-Node CPU-GPU Fine-Grained Cooperation • Pipeline 1 overlaps PCIe data transmission with GPU computation (same timeline as the previous slide). • Pipeline 2 overlaps operations across layers. [Timeline: for layers 1 through n, the reconstruct and GPU-operation steps of the forward and backward passes are staggered so that different layers execute concurrently.]
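
The pipeline idea itself can be sketched without a GPU. Below is a two-stage software pipeline in plain Python, where pcie_transfer and gpu_compute are hypothetical stand-ins for a host-to-device copy and a triplet kernel; ParSecureML itself would realize this overlap with CUDA streams rather than threads.

```python
# Two-stage software pipeline mirroring Pipeline 1: stage 1 "transfers"
# chunk k+1 while stage 2 "computes" on chunk k. pcie_transfer and
# gpu_compute are illustrative stand-ins, not real framework calls.
import queue
import threading

def pcie_transfer(chunk):
    return chunk           # placeholder for a host-to-device PCIe copy

def gpu_compute(chunk):
    return sum(chunk)      # placeholder for the triplet-multiplication kernel

def transfer_stage(chunks, q):
    for c in chunks:
        q.put(pcie_transfer(c))  # hands chunk k+1 over while k is computed
    q.put(None)                  # end-of-stream marker

def compute_stage(q, results):
    while (c := q.get()) is not None:
        results.append(gpu_compute(c))

chunks = [[i, i + 1] for i in range(8)]
q, results = queue.Queue(maxsize=2), []  # bounded queue keeps the stages in step
t1 = threading.Thread(target=transfer_stage, args=(chunks, q))
t2 = threading.Thread(target=compute_stage, args=(q, results))
t1.start(); t2.start(); t1.join(); t2.join()
print(results)  # computed in order, with transfer and compute overlapped
```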

  25. 5. ParSecureML • Compressed Transmission for Inter-Node Communication. Between iterations each server sends only the update of a shared matrix, A_{j,k+1} = A_{j,k} + ΔA_{j,k} and B_{j,k+1} = B_{j,k} + ΔB_{j,k}. [Figure: on server1 and server2, each ΔA and ΔB is tested for sparsity; sparse deltas are sent in CSR format, dense ones are sent as-is.]
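
A minimal sketch of the sparsity-gated transmission follows, using SciPy's CSR type as a stand-in for the paper's CSR encoding. The pack/unpack helpers and the 0.5 density threshold are assumptions for illustration, not values or APIs from the paper.

```python
# Sketch of compressed inter-node transmission: send a delta matrix in CSR
# form when it is sparse, otherwise send it dense. The helpers and the 0.5
# density threshold are illustrative assumptions.
import numpy as np
from scipy.sparse import csr_matrix

DENSITY_THRESHOLD = 0.5  # assumed knob: below this, CSR beats dense transfer

def pack_delta(delta):
    """Build a (tag, payload) message for the other server."""
    density = np.count_nonzero(delta) / delta.size
    if density < DENSITY_THRESHOLD:
        m = csr_matrix(delta)
        return "csr", (m.data, m.indices, m.indptr, m.shape)
    return "dense", delta

def unpack_delta(tag, payload):
    """Rebuild the dense delta on the receiving server."""
    if tag == "csr":
        data, indices, indptr, shape = payload
        return csr_matrix((data, indices, indptr), shape=shape).toarray()
    return payload

# The receiver then applies the slide's update: A_{k+1} = A_k + ΔA_k.
A = np.zeros((4, 4))
delta = np.zeros((4, 4)); delta[0, 1] = 3.0  # a mostly-zero (sparse) update
A = A + unpack_delta(*pack_delta(delta))
```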

  26. 6. Evaluation • Baseline: SecureML [1] • Benchmarks: • Convolutional neural network (CNN) • Multilayer perceptron (MLP) • Recurrent neural network (RNN) • Linear regression • Logistic regression • Datasets: VGGFace2 / NIST / SYNTHETIC / MNIST • HPC cluster: • Intel Xeon CPU E5-2670 v3 • NVIDIA Tesla V100. [1] P. Mohassel and Y. Zhang. SecureML: A system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy (SP), pages 19-38. IEEE, 2017.

  27. 6. Evaluation • Overall speedups: ParSecureML achieves an average speedup of 32.2x over SecureML. [Chart: overall speedup (log scale, 1-100x) on VGGFace2, NIST, SYNTHETIC, and MNIST for CNN, MLP, linear regression, logistic regression, and RNN.]

  28. 6. Evaluation • Online speedups: the average online speedup is 61.4x, even higher than the overall speedup. [Chart: online speedup (log scale, 1-100x) across the same datasets and models.]

  29. 6. Evaluation • Offline speedups: applying GPUs in the offline phase brings a 1.2x performance benefit. [Chart: offline speedup (0-3x) across the same datasets and models.]

  30. 6. Evaluation • Communication benefits: on average, ParSecureML reduces communication overhead by 23.7%. [Chart: communication improvement (0-60%) across the same datasets and models.]

  31. 6. Evaluation • Influence of workload size. [Chart: execution time (seconds) of SecureML versus ParSecureML as workload size (MB) increases.]

  32. 7. Source Code on GitHub • https://github.com/ZhengChenCS/ParSecureML

  33. 8. Conclusion • We present our observations and insights on accelerating SecureML. • We develop ParSecureML, the first parallel secure machine learning framework on GPUs. • We demonstrate the benefits of ParSecureML over the state-of-the-art secure machine learning framework.

  34. Thank you! • Any questions? ParSecureML: An Efficient Parallel Secure Machine Learning Framework on GPUs. Zheng Chen, Feng Zhang, Amelie Chi Zhou★, Jidong Zhai+, Chenyang Zhang, Xiaoyong Du. Renmin University of China, ★Shenzhen University, +Tsinghua University. chenzheng123@ruc.edu.cn, fengzhang@ruc.edu.cn, chi.zhou@szu.edu.cn, zhaijidong@tsinghua.edu.cn, chenyangzhang@ruc.edu.cn, duyong@ruc.edu.cn. https://github.com/ZhengChenCS/ParSecureML
