

  1. BatchCrypt: Efficient Homomorphic Encryption for Cross-Silo Federated Learning
Chengliang Zhang†, Suyi Li†, Junzhe Xia†, Wei Wang†, Feng Yan‡, Yang Liu*
†Hong Kong University of Science and Technology, ‡University of Nevada, Reno, *WeBank

  2. Federated Learning
Data silos arise from:
• Privacy concerns (data breaches)
• Government regulations (GDPR, CCPA)
Emerging challenge: small & fragmented data.
Solution: Federated Learning, i.e., collaborative machine learning without centralized training data [1].
[1] Bonawitz, Keith, et al. "Towards federated learning at scale: System design." arXiv preprint arXiv:1902.01046 (2019).

  3. Target Scenario: Cross-Silo Horizontal FL
Cross-silo: FL among organizations / institutions (e.g., banks, hospitals)
• Reliable communication and computation
• Strong privacy requirements
• As opposed to cross-device FL, which runs on edge devices
(Figure: Hospital A, Hospital B, and Hospital C as example silos.)

  4. Target Scenario: Cross-Silo Horizontal FL
Horizontal: the clients' datasets share the same feature space [2].
Objective: train a model together without revealing private data to a third party (the aggregator) or to each other.
[2] Yang, Qiang, et al. "Federated machine learning: Concept and applications." ACM Transactions on Intelligent Systems and Technology (TIST) 10.2 (2019): 1-19.

  5. Repurpose datacenter distributed training?
Gradients are not safe to share in plaintext [3].
[3] Aono, Yoshinori, et al. "Privacy-preserving deep learning via additively homomorphic encryption." IEEE Transactions on Information Forensics and Security 13.5 (2017): 1333-1345.

  6. Federated Learning Approaches
Candidate methods: Differential Privacy [4][5], Secure Multi-Party Computation [6], Secure Aggregation [7], and Homomorphic Encryption. The slide compares them on efficiency, strength of privacy, and accuracy loss; BatchCrypt builds on Homomorphic Encryption, which offers strong privacy with no accuracy loss.
[4] Gehrke, Johannes, Edward Lui, and Rafael Pass. "Towards privacy for social networks: A zero-knowledge based definition of privacy." TCC 2011.
[5] Bagdasaryan, Eugene, Omid Poursaeed, and Vitaly Shmatikov. "Differential privacy has disparate impact on model accuracy." NIPS 2019.
[6] Du, Wenliang, Yunghsiang S. Han, and Shigang Chen. "Privacy-preserving multivariate statistical analysis: Linear regression and classification." SDM 2004.
[7] Bonawitz, Keith, et al. "Practical secure aggregation for privacy-preserving machine learning." CCS 2017.

  7. Additively Homomorphic Encryption for FL
• Allows computation over ciphertexts: decrypt(encrypt(a) + encrypt(b)) = a + b
• Enables oblivious aggregation [8]: the aggregator holds only the HE public key, while the clients hold the HE private key
  1. Clients compute gradients locally
  2. Clients encrypt their gradients and upload the ciphertexts to the aggregator
  3. The aggregator sums all gradient ciphertexts
  4. Clients receive the aggregated gradient ciphertext
  5. Clients decrypt it and apply the model update
[8] Aono, Yoshinori, et al. "Privacy-preserving deep learning via additively homomorphic encryption." IEEE Transactions on Information Forensics and Security 13.5 (2017): 1333-1345.
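To make the protocol concrete, here is a minimal sketch of oblivious aggregation under additively homomorphic (Paillier) encryption. It uses the third-party python-paillier ("phe") package; the key handling and the toy gradient values are illustrative assumptions, not BatchCrypt's actual code.

```python
# Minimal sketch of oblivious gradient aggregation with Paillier HE
# (illustrative only; uses the python-paillier "phe" package).
from functools import reduce
import operator

from phe import paillier

# Clients share this key pair; the aggregator only ever sees the public key.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# 1. Each client computes its local gradients (toy values here).
client_grads = [
    [0.12, -0.30, 0.05],   # client A
    [-0.07, 0.22, -0.11],  # client B
]

# 2. Clients encrypt element-wise and upload the ciphertexts.
uploads = [[public_key.encrypt(g) for g in grads] for grads in client_grads]

# 3. The aggregator adds ciphertexts without being able to decrypt anything.
aggregated = [reduce(operator.add, column) for column in zip(*uploads)]

# 4./5. Clients download, decrypt, and apply the summed gradients to the model.
summed = [private_key.decrypt(ct) for ct in aggregated]
print(summed)   # ≈ [0.05, -0.08, -0.06]
```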

  8. Characterization: FL with HE
Why HE is expensive: computation and communication. A 32-bit plaintext value inflates to a 2000+ bit ciphertext.
Paillier HE cost, measured on FATE (models: FMNIST, CIFAR10, LSTM):
  Key size   Plaintext   Ciphertext   Encryption   Decryption
  1024       6.87 MB     287.64 MB    216.87 s     68.63 s
  2048       6.87 MB     527.17 MB    1152.98 s    357.17 s
  3072       6.87 MB     754.62 MB    3111.14 s    993.80 s
(Figure: time breakdown of one training iteration.)

  9. Potential Solutions
Challenge: maintain HE's additive property.
• Accelerate HE operations
  o Limited parallelism: only about 3X speedup with an FPGA [9]
  o Communication cost stays the same
• Reduce the number of encryption operations
  o One operation covers multiple data values: "batching" of gradient values
  o Compact plaintext, less inflation: a 2000-bit packed plaintext still yields a 2000-bit ciphertext
Key requirement: decrypting the sum of two batched ciphertexts must equal adding the value pairs separately, e.g.
  [-0.3, 0, 2.6, -1.1] + [1.2, 0.33, -4.2, -0.2] = [0.9, 0.33, -1.6, -1.3]
[9] San, Ismail, et al. "Efficient Paillier cryptoprocessor for privacy-preserving data mining." Security and Communication Networks 9.11 (2016): 1535-1546.
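To illustrate the batching idea, here is a small sketch that packs several quantized, non-negative integers into one long integer so that a single HE encryption covers all of them; the 16-bit slot width and the helper names are assumptions for illustration.

```python
# Illustrative sketch of batching: pack quantized values into one big integer,
# so one (expensive) encryption covers many gradients at once.
SLOT_BITS = 16   # bits reserved per gradient value (an assumed width)

def pack(values):
    """Concatenate non-negative SLOT_BITS-wide integers into one integer."""
    packed = 0
    for v in values:
        assert 0 <= v < (1 << SLOT_BITS)
        packed = (packed << SLOT_BITS) | v
    return packed

def unpack(packed, n):
    """Split a packed integer back into n SLOT_BITS-wide values."""
    mask = (1 << SLOT_BITS) - 1
    return [(packed >> (SLOT_BITS * i)) & mask for i in reversed(range(n))]

# Integer addition of two packed plaintexts adds every slot in one shot, which is
# exactly the operation an additively homomorphic scheme applies to plaintexts.
a = pack([3, 7, 250])
b = pack([10, 5, 4])
print(unpack(a + b, 3))   # [13, 12, 254], as long as no slot overflows its width
```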

  10. Gradient Batching is non-trivial
The aggregator sees only ciphertexts: no differentiation, no permutation, no shifting. The only available operation is bit-wise addition on the underlying plaintexts.
Gradients are floating-point numbers, and floating-point addition requires exponent alignment [9], so two IEEE-754 encodings are not addable as raw bits:
  sign  exponent  mantissa
  1     01111111  00011001100110011001101
  1     01111100  10011001100110011001101
[9] San, Ismail, et al. "Efficient Paillier cryptoprocessor for privacy-preserving data mining." Security and Communication Networks 9.11 (2016): 1535-1546.
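A quick way to see the problem (an illustrative check, not part of BatchCrypt): the two bit patterns above encode roughly -1.1 and -0.2, and adding their raw IEEE-754 encodings as integers, which is all an additively homomorphic scheme can do to its plaintext, gives nothing like the true floating-point sum.

```python
# Adding raw IEEE-754 bit patterns as integers is not floating-point addition.
import struct

def bits(x: float) -> int:
    """Unsigned 32-bit IEEE-754 encoding of a float."""
    return struct.unpack('<I', struct.pack('<f', x))[0]

def from_bits(b: int) -> float:
    """Interpret an unsigned 32-bit integer as an IEEE-754 float."""
    return struct.unpack('<f', struct.pack('<I', b & 0xFFFFFFFF))[0]

a, b = -1.1, -0.2
naive = from_bits(bits(a) + bits(b))   # what plain integer addition would produce
print(naive, a + b)                    # ≈ 3e+37 vs. -1.3: completely wrong
```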

  11. Quantization for Batching
Floating-point gradient values cannot be batched directly, so they are quantized first.
A generic quantization method maps [-1, 1] to [0, 255]:
  Quantization:   q = 255 * (x - (-1)) / (1 - (-1)),  e.g. -0.0079 -> 126, 0.0079 -> 129, -0.9921 -> 1, -0.0551 -> 120
  Dequantization (sum of 2 clients):   x = q_sum * (1 - (-1)) / 255 + 2 * (-1),  e.g. q_sum = 127 -> -1
Example: the batched plaintexts [126, 129] and [1, 120] sum slot-wise to [127, 249], which dequantize to -1 and about -0.047, matching -0.0079 + (-0.9921) and 0.0079 + (-0.0551).
Limitations:
• Restrictive: the number of clients must be known for dequantization
• Overflows easily: every value maps to a positive integer, so slot sums only grow
• No differentiation between positive and negative overflows
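The generic scheme above can be sketched in a few lines (illustrative helper names; the [-1, 1] to [0, 255] mapping and the two-client example are taken from the slide):

```python
# Generic quantization from the slide: map [-1, 1] to unsigned 8-bit codes.
def quantize(x):
    return round(255 * (x - (-1)) / (1 - (-1)))

def dequantize_sum(q_sum, n_clients):
    """Recover the sum of n_clients values from the sum of their codes."""
    return q_sum * (1 - (-1)) / 255 + n_clients * (-1)

qa, qb = quantize(-0.0079), quantize(-0.9921)      # 126 and 1
print(dequantize_sum(qa + qb, n_clients=2))         # ≈ -1.0

# Drawbacks visible here: n_clients must be known, all codes are positive so
# batched slot sums only grow, and an overflowed slot cannot reveal its sign.
```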

  12. Our Quantization & Batching Solution
Desired properties of a quantization scheme for aggregation:
• Flexible: the aggregated result can be un-batched and dequantized from the decrypted ciphertext alone, without extra information such as the client count
• Overflow-aware: if an overflow happens, we can tell its sign

  13. Our Quantization & Batching Solution
Customized quantization for aggregation:
• Distinguish overflows: signed integers with explicit sign bits
• Positive and negative values cancel each other out: symmetric range, uniform quantization; [-1, 1] is mapped to [-127, 127]
Each batched slot carries sign bits, an r-bit value, and z padding bits (layout detailed on the next slide).
Example (BatchCrypt slots): -0.0079 -> -1, 0.0079 -> +1, -0.9921 -> -126, -0.0551 -> -7; adding the batched plaintexts [-1, +1] and [-126, -7] slot-wise gives [-127, -6], which dequantize to -1 and about -0.0475 without needing the client count.

  14. Our Quantization & Batching Solution
Customized quantization for aggregation: signed integers, symmetric range, uniform quantization.
Slot layout: z zero padding bits, two sign bits, and an r-bit value per gradient.
Challenges and how BatchCrypt handles them:
1. Differentiating positive from negative overflows: two sign bits per slot
2. Distinguishing sign bits from value bits under addition: two's complement coding
3. Tolerating overflow without corrupting neighbouring slots: zero padding between slots
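Below is an illustrative sketch of such an encoding, not BatchCrypt's exact implementation: symmetric quantization to [-127, 127], two's-complement slots with two extra sign bits, and zero padding between slots so carries stay inside their own slot. Bit widths, helper names, and the clipping value alpha are assumptions.

```python
# Illustrative BatchCrypt-style signed batching (simplified; widths are assumed).
R_BITS = 8                     # quantized values lie in [-127, 127]
SIGN_BITS = 2                  # two extra sign bits per slot
PAD_BITS = 3                   # zero padding to absorb carries between slots
TC_BITS = R_BITS + SIGN_BITS   # two's-complement region of a slot
SLOT_BITS = TC_BITS + PAD_BITS
Q_MAX = 2 ** (R_BITS - 1) - 1  # 127

def quantize(x, alpha):
    """Symmetric uniform quantization of x clipped to [-alpha, alpha]."""
    x = max(-alpha, min(alpha, x))
    return round(x / alpha * Q_MAX)

def batch(qs):
    """Pack signed integers into one big integer, one two's-complement slot each."""
    packed = 0
    for i, q in enumerate(qs):
        packed += (q % (1 << TC_BITS)) << (i * SLOT_BITS)
    return packed

def unbatch(packed, n):
    """Decode n slots; padding is dropped, sign bits flag overflow and its sign."""
    out = []
    for i in range(n):
        slot = (packed >> (i * SLOT_BITS)) % (1 << TC_BITS)
        if slot >= 1 << (TC_BITS - 1):      # negative in two's complement
            slot -= 1 << TC_BITS
        out.append(slot)
    return out

# Plain integer addition below stands in for Paillier ciphertext addition.
alpha = 1.0
a = batch([quantize(v, alpha) for v in (-0.0079, 0.0079)])
b = batch([quantize(v, alpha) for v in (-0.9921, -0.0551)])
sums = unbatch(a + b, n=2)
print([q * alpha / Q_MAX for q in sums])    # ≈ [-1.0, -0.047]
```

With the symmetric range, the decrypted slot sums dequantize directly (no client count needed), and a sum that lands outside [-127, 127] still reveals its sign, which is exactly the flexibility and overflow-awareness asked for on slide 12.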

  15. Gradient Clipping
Gradients are unbounded, but the quantization range is bounded, so clipping to [-ɑ, ɑ] is required.
Tradeoff: a smaller ɑ gives higher resolution within |ɑ|, but discards more of the diminished range information outside it.

  16. Gradient Clipping
Gradients are unbounded but the quantization range is bounded, so clipping is required. Two ways to pick the clipping threshold:
• Profiling quantization loss with a sample dataset [10]
  o FL has non-IID data, so a sample dataset is not representative
  o Gradient ranges diminish during training, so the optimal threshold shifts
• Analytical clipping with an online model (BatchCrypt's choice)
  o Model the quantization noise with distribution fitting
  o Flexible and adaptable
[10] http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf

  17. dACIQ: Analytical Gradient Clipping
• Gradient distributions are bell-shaped: Gaussian-like
• Conventional Gaussian fitting (MLE, Bayesian inference) requires a lot of information and is computationally intensive
• dACIQ proposes a Gaussian fitting method for distributed datasets
  o Only requires each client's max, min, and size
  o Computationally efficient: runs online
  o Stochastic rounding [11]
  o Layer-wise quantization
[11] Banner, Ron, Yury Nahshan, and Daniel Soudry. "Post training 4-bit quantization of convolutional networks for rapid deployment." Advances in Neural Information Processing Systems. 2019.
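One way to sketch this kind of analytical clipping (a rough illustration, not dACIQ's exact derivation): assume the merged gradients are roughly zero-mean Gaussian, estimate sigma from each client's reported (min, max, size) via the expected range of n Gaussian samples, and clip at a multiple of sigma. The fixed multiplier below is an arbitrary stand-in for the bit-width-dependent optimum.

```python
# Rough sketch of analytical clipping from per-client (min, max, size) summaries.
import math

def estimate_sigma(stats):
    """stats: list of (g_min, g_max, n) per client; assumes ~zero-mean Gaussian."""
    total_n = sum(n for _, _, n in stats)
    g_min = min(s[0] for s in stats)
    g_max = max(s[1] for s in stats)
    # For n samples from N(0, sigma^2), E[max] ~ sigma * sqrt(2 ln n), so the
    # observed range (max - min) ~ 2 * sigma * sqrt(2 ln n).
    return (g_max - g_min) / (2.0 * math.sqrt(2.0 * math.log(total_n)))

def clipping_threshold(stats, multiplier=4.0):
    """Clip to [-alpha, alpha]; a real scheme ties the multiplier to the bit width."""
    return multiplier * estimate_sigma(stats)

# Each client reports only three numbers per layer, so this runs cheaply online.
client_stats = [(-0.031, 0.029, 101_770), (-0.027, 0.033, 101_770)]
print(clipping_threshold(client_stats))   # the alpha used before quantization
```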

  18. Introducing BatchCrypt
Implementation:
• Built atop FATE v1.1
• Supports TensorFlow and MXNet, and is extendable to other frameworks
• Implemented in Python
• Utilizes Joblib and Numba for maximum parallelism
Client worker components (on top of the FATE ML backend):
• dACIQ: distribution fitting, advance scaler, clipping
• Quantizer: quantize / dequantize
• 2's Comp. Codec: encode / decode
• Batch Mgr.: batch / unbatch
• HE Mgr.: initializer, encrypt, ...
• Comm. Mgr.: remote, get, ...
(Figure: BatchCrypt client worker architecture.)

  19. Evaluation Setup
Test models:
  Model      Type                  Network      Weights
  FMNIST     Image classification  3-layer FC   101.77K
  CIFAR      Image classification  AlexNet      1.25M
  LSTM-ptb   Text generation       LSTM         4.02M
Test bed:
• AWS cluster of 10 machines, spanning 5 locations
• c5.4xlarge instances (16 vCPUs, 32 GB memory)
Bandwidth from clients to aggregator:
  Region        US W.   Tokyo   US E.   London   HK
  Up (Mbps)     9841    116     165     97       81
  Down (Mbps)   9842    122     151     84       84

  20. BatchCrypt's Quantization Quality
• Negligible accuracy loss
• Quantization sometimes even outperforms plain training: the added randomness acts as regularization
(Figure: FMNIST test accuracy, CIFAR test accuracy, and LSTM loss curves.)

  21. BatchCrypt's Effectiveness: Computation
Iteration time compared with stock FATE (batch size set to 100, 16-bit quantization):
• 23.3X speedup for FMNIST
• 70.8X for CIFAR
• 92.8X for LSTM
The larger the model, the better the results.
(Figure: client and aggregator iteration time breakdown for LSTM.)
