49th International Conference on Parallel Processing (ICPP) ParSecureML: An Efficient Parallel Secure Machine Learning Framework on GPUs Zheng Chen, Feng Zhang, Amelie Chi Zhou, Jidong Zhai, Chenyang Zhang, Xiaoyong Du Renmin University of China / Shenzhen University / Tsinghua University 1
Outline 1. Background 2. Motivation 3. Basic Idea 4. Challenges 5. ParSecureML 6. Evaluation 7. Source Code on GitHub 8. Conclusion 2/33
1. Background • Secure Machine Learning [Figure, built up over four animation steps (1-4): (a) typical machine learning process; (b) machine learning process with two-party computation, where the client distributes encrypted data between two servers that jointly compute the model.] 3/33
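To ground step (b): in SecureML-style two-party training, the client never reveals its data to either server; it sends each server one additive share. Below is a minimal sketch, assuming 64-bit additive sharing over Z_{2^64} as in SecureML; `secretShare` and all surrounding names are hypothetical illustrations, not ParSecureML's API.

```cuda
#include <cstdint>
#include <random>
#include <vector>

// Client-side secret sharing: split each value x into random shares with
// x0 + x1 = x (mod 2^64). Server 1 receives share0, server 2 receives
// share1; neither share alone reveals anything about x.
void secretShare(const std::vector<uint64_t>& x,
                 std::vector<uint64_t>& share0,
                 std::vector<uint64_t>& share1) {
    std::mt19937_64 rng(std::random_device{}());
    share0.resize(x.size());
    share1.resize(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        share0[i] = rng();             // uniformly random mask
        share1[i] = x[i] - share0[i];  // unsigned arithmetic wraps mod 2^64
    }
}
```

Reconstruction is simply share0[i] + share1[i] (mod 2^64), which the servers only ever perform on masked intermediate values, never on the raw inputs.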
1. Background • GPU Acceleration https://developer.nvidia.com/deep-learning 8/33
2. Motivation • Performance Degradation [Chart: normalized performance of the original plaintext training ("origin") vs. SecureML for linear regression, logistic regression, MLP, and a convolutional neural network; the secure version degrades performance across all four workloads.] 9/33
2. Motivation • Time Breakdown for two-party computation [Diagram, online vs. offline phases: the client distributes encrypted input data to server1 and server2 and receives the final result; each server runs compute1 → communicate → compute2. The compute phases dominate the time (95.52s and 62.68s), while input distribution and the communication steps are small (0.21s, 0.19s, 0.24s, 0.11s, 0.05s).] 10/33
3. Basic Idea • A GPU-based two-party computation framework that accounts for both GPU characteristics and the structure of two-party computation should achieve much better acceleration. • Challenges • Challenge 1: Complex triplet-multiplication-based computation patterns • Challenge 2: Frequent intra-node data transmission between CPU and GPU • Challenge 3: Complicated inter-node data dependence 17/33
4. Challenges • Challenge 1: Complex triplet-multiplication-based computation patterns [Figure: the triplet-based secure multiplication pattern, which replaces one plaintext multiplication with several masked operations and share reconstructions.] 18/33
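To make the pattern concrete: SecureML-style protocols precompute multiplication (Beaver) triplets offline, i.e., random values A and B with Z = A × B, secret-shared between the two servers. The following is a minimal, hypothetical sketch of that offline step; `TripletShares`, `generateTriplets`, and all names are illustrative, not ParSecureML's actual code.

```cuda
#include <cstdint>
#include <random>
#include <vector>

struct TripletShares {               // one server's shares of (A, B, Z)
    std::vector<uint64_t> A, B, Z;
};

// Sample n triplets (A, B, Z = A*B) over Z_{2^64} and additively share
// each component between server 0 (s0) and server 1 (s1).
void generateTriplets(size_t n, TripletShares& s0, TripletShares& s1) {
    std::mt19937_64 rng(std::random_device{}());
    auto push = [&](std::vector<uint64_t>& v0, std::vector<uint64_t>& v1,
                    uint64_t x) {
        uint64_t r = rng();          // random share for server 0
        v0.push_back(r);
        v1.push_back(x - r);         // complementary share, wraps mod 2^64
    };
    for (size_t i = 0; i < n; ++i) {
        uint64_t a = rng(), b = rng();
        push(s0.A, s1.A, a);
        push(s0.B, s1.B, b);
        push(s0.Z, s1.Z, a * b);     // Z = A * B is the triplet invariant
    }
}
```

Each secure multiplication in the online phase consumes one triplet, which is why the computation pattern is more complex than a single plaintext multiplication.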
4. Challenges • Challenge 2: Frequent intra-node data transmission between CPU and GPU [Diagram: within each server, every step n moves data from CPU to GPU and back before step n+1, so PCIe transfers occur at every step.] 19/33
4. Challenges • Challenge 3: Complicated inter-node data dependence [Diagram: the per-step data on server1 and server2 depend on each other, so step n+1 on one server cannot proceed until data produced in step n has been exchanged with the other server.] 20/33
5. ParSecureML • Overview - ParSecureML consists of three major components: • Profiling-guided adaptive GPU utilization • Intra-node double pipeline • Inter-node compressed transmission [Architecture diagram: offline, the client distributes encrypted data to server1 and server2; online, each server processes layer 1, layer 2, ... with reconstruct + GPU operation stages for both forward and backward passes, pipelined across layers, and the two servers exchange data through GPU-based compressed communication to produce the final result.] 21/33
5. ParSecureML • Profiling-Guided Adaptive GPU Utilization • Online acceleration design / Offline acceleration design • CPU: E = E0 + E1, F = F0 + F1 • GPU: Ci = (−i) × E × F + Ai × F + E × Bi + Zi 22/33
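As a concrete illustration of the formula above, here is a hedged CUDA sketch of the online GPU step for party i, computing its share Ci = (−i)·E·F + Ai·F + E·Bi + Zi elementwise over additively shared 64-bit values. The kernel name, buffer names, and launch configuration are assumptions for illustration, not ParSecureML's implementation.

```cuda
#include <cstdint>

// Party i's share of the product: Ci = (-i)*E*F + Ai*F + E*Bi + Zi.
// E and F are the publicly reconstructed masked values; Ai and Bi are this
// party's input shares; Zi is its share of the triplet product. All
// arithmetic wraps mod 2^64, matching additive sharing over Z_{2^64}.
__global__ void onlineTripletMul(const uint64_t* E, const uint64_t* F,
                                 const uint64_t* Ai, const uint64_t* Bi,
                                 const uint64_t* Zi, uint64_t* Ci,
                                 int party, size_t n) {
    size_t idx = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (idx >= n) return;
    uint64_t c = Ai[idx] * F[idx] + E[idx] * Bi[idx] + Zi[idx];
    if (party == 1) c -= E[idx] * F[idx];  // the (-i)*E*F term is zero for i = 0
    Ci[idx] = c;
}

// Example launch (n elements, 256 threads per block):
// onlineTripletMul<<<(n + 255) / 256, 256>>>(dE, dF, dAi, dBi, dZi, dCi, party, n);
```

Summing the two parties' shares C0 + C1 cancels the masks and recovers the true product; only the masked values E and F are ever revealed between servers.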
5. ParSecureML • Double Pipeline for Intra-Node CPU-GPU Fine-Grained Cooperation • Pipeline 1 to overlap PCIe data transmission and GPU computation. [Timeline: while the GPU computes on already-resident operands, the transfers of E, Ai, F, and Bi for the next piece of work are in flight, so transmission time hides behind computation.] 23/33
5. ParSecureML • Double Pipeline for Intra-Node CPU-GPU Fine-Grained Cooperation • Pipeline 1 to overlap PCIe data transmission and GPU computation (timeline as on the previous slide); a stream-based sketch follows this slide. • Pipeline 2 to overlap operations across layers. [Timeline: for layer 1, layer 2, ..., layer n, the reconstruct and GPU-operation stages of the forward and backward passes are staggered, so one layer's reconstruction overlaps another layer's GPU operation.] 24/33
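A minimal sketch of the Pipeline 1 idea using two CUDA streams: chunk k+1's host-to-device copy proceeds while chunk k's kernel runs, hiding PCIe transfer time behind computation. `processChunk`, the buffer layout, and the chunking are illustrative assumptions; the host buffer should be pinned (allocated with `cudaMallocHost`) for the copies to be truly asynchronous.

```cuda
#include <cuda_runtime.h>
#include <cstdint>

// Stand-in for the real secure-computation kernel applied to each chunk.
__global__ void processChunk(uint64_t* data, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 3;
}

// Double-buffered copy/compute overlap: even chunks use stream 0 and device
// buffer 0, odd chunks use stream 1 and buffer 1, so the copy for chunk k+1
// runs concurrently with the kernel for chunk k.
void pipelinedTransferCompute(const uint64_t* hostData /* pinned */,
                              uint64_t* devBuf[2],
                              size_t numChunks, size_t chunkElems) {
    cudaStream_t stream[2];
    cudaStreamCreate(&stream[0]);
    cudaStreamCreate(&stream[1]);
    for (size_t k = 0; k < numChunks; ++k) {
        int b = k & 1;  // alternate buffer/stream
        cudaMemcpyAsync(devBuf[b], hostData + k * chunkElems,
                        chunkElems * sizeof(uint64_t),
                        cudaMemcpyHostToDevice, stream[b]);
        // Stream ordering guarantees the kernel sees the completed copy.
        processChunk<<<(unsigned)((chunkElems + 255) / 256), 256, 0,
                       stream[b]>>>(devBuf[b], chunkElems);
    }
    cudaDeviceSynchronize();
    cudaStreamDestroy(stream[0]);
    cudaStreamDestroy(stream[1]);
}
```

Because operations within a stream execute in order, reusing buffer b two chunks later is safe without extra synchronization; only the cross-stream overlap buys the speedup.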
5. ParSecureML • Compressed Transmission for Inter-Node Communication [Diagram: the servers hold shares E0/E1 and F0/F1 and update their matrices incrementally, A(i,k+1) = A(i,k) + ΔA and B(i,k+1) = B(i,k) + ΔB, so only the deltas ΔA and ΔB cross the network. Before each transfer, a server checks whether the delta is sparse: if yes, it sends it in CSR format; if not, it sends it dense.] 25/33
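A hedged sketch of the "compress if sparse" decision on the sending side: build a CSR representation of the delta only when its density is below a threshold, and ship it dense otherwise. The struct, function name, and the 30% threshold are illustrative assumptions, not values from the paper.

```cuda
#include <cstdint>
#include <vector>

// CSR form of a delta matrix: row offsets, column indices, nonzero values.
struct CsrDelta {
    std::vector<int> rowPtr, colIdx;
    std::vector<uint64_t> vals;
};

// Convert the dense delta to CSR only if it is sparse enough to be worth it.
// Returns true if `out` holds the CSR form; false means "send it dense".
bool maybeCompress(const std::vector<uint64_t>& dense, int rows, int cols,
                   CsrDelta& out, double densityThreshold = 0.3) {
    size_t nnz = 0;
    for (uint64_t v : dense) nnz += (v != 0);
    if ((double)nnz / dense.size() > densityThreshold) return false;
    out.rowPtr.assign(rows + 1, 0);
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            uint64_t v = dense[(size_t)r * cols + c];
            if (v != 0) {
                out.colIdx.push_back(c);
                out.vals.push_back(v);
            }
        }
        out.rowPtr[r + 1] = (int)out.vals.size();  // running nonzero count
    }
    return true;
}
```

For a delta with density d, the CSR payload scales roughly with d rather than with the full matrix size, which is consistent with the communication reduction reported in the evaluation.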
6. Evaluation • Baseline: SecureML [1] • Benchmarks • Convolutional neural network (CNN) • Multilayer perceptron (MLP) • Recurrent neural network (RNN) • Linear regression • Logistic regression • Datasets - VGGFace2 / NIST / SYNTHETIC / MNIST • HPC Cluster • Intel Xeon CPU E5-2670 v3 • NVIDIA Tesla V100 [1] P. Mohassel and Y. Zhang. SecureML: A system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy (SP), pages 19-38. IEEE, 2017. 26/33
6. Evaluation • Overall speedups. On average, ParSecureML achieves a 32.2x speedup over SecureML. [Chart: overall speedup (log scale, 1-100x) for CNN, MLP, linear regression, logistic regression, and RNN on VGGFace2, NIST, SYNTHETIC, and MNIST.] 27/33
6. Evaluation • Online speedups. The average online speedup is 61.4x, even higher than the overall speedup. [Chart: online speedup (log scale, 1-100x) across the same benchmarks and datasets.] 28/33
6. Evaluation • Offline speedups. Applying GPUs in the offline phase brings a 1.2x performance benefit. [Chart: offline speedup (0-3x) across benchmarks and datasets.] 29/33
6. Evaluation • Communication benefits - On average, ParSecureML reduces communication overhead by 23.7%. [Chart: communication improvement (%, 0-60) across benchmarks and datasets.] 30/33
6. Evaluation • Influence of workload size [Chart: time (seconds, 0-2500) as a function of workload size (MB) for SecureML vs. ParSecureML.] 31/33
7. Source Code on GitHub • https://github.com/ZhengChenCS/ParSecureML 32/33
8. Conclusion • We present our observations and insights on accelerating SecureML. • We develop ParSecureML, the first parallel secure machine learning framework on GPUs. • We demonstrate the benefits of ParSecureML over the state-of-the-art secure machine learning framework. 33/34
Thank you! • Any questions? ParSecureML: An Efficient Parallel Secure Machine Learning Framework on GPUs Zheng Chen, Feng Zhang, Amelie Chi Zhou, Jidong Zhai, Chenyang Zhang, Xiaoyong Du Renmin University of China / Shenzhen University / Tsinghua University chenzheng123@ruc.edu.cn, fengzhang@ruc.edu.cn, chi.zhou@szu.edu.cn, zhaijidong@tsinghua.edu.cn, chenyangzhang@ruc.edu.cn, duyong@ruc.edu.cn https://github.com/ZhengChenCS/ParSecureML 34/34