Machine Learning & Data Confidentiality
Melek Önen
Joint work with Alberto Ibarrondo, Beyza Bozdemir, Mohamad Mansouri, Gamze Tillem, Orhan Ermis
Machine Learning as a Service
• Parties: client and server
• Performance
• No need for ML knowledge
• Cost reduction
Sensitive and confidential data (client and server)
• Sensitive personal data
• Corporate data
• Intellectual property (IP)
• Legal restrictions
Data breaches in 2019
• Average cost: $3.92M globally, $150 per record
• Top 3 sectors: health, financial, services
• Factors increasing cost: extensive migration to cloud, third-party involvement, compliance failures
• Factors decreasing cost: extensive use of encryption, use of security analytics
GDPR Effect
• GDPR effective since May 2018
• Fines: up to 20 million euros or 4% of annual turnover
• GDPR fines in year one: 56M€ globally
• Leading watchdog: CNIL (France)
PAPAYA
• Single-source setting (Arrhythmia Detection, Mobility Analytics)
• Multiple-sources setting (Stress Management, Mobile Usage Analytics, Threat Detection)
• Third-party querier (Mobile Usage Analytics)
• Data analytics: basic statistics, clustering, NN classification, NN training
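Homomorphic encryption
$\mathrm{Encrypt}(m_1)\ \mathrm{op}_1\ \mathrm{Encrypt}(m_2) = \mathrm{Encrypt}(m_1\ \mathrm{op}_2\ m_2)$
• Partially HE: supports one operation only
• Somewhat HE: supports an arbitrary number of additions and a limited number of multiplications
• Fully HE: supports any function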
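The homomorphic property, Encrypt(m1) op1 Encrypt(m2) = Encrypt(m1 op2 m2), can be illustrated with a partially homomorphic scheme. The sketch below is a toy version of Paillier, where op1 is ciphertext multiplication and op2 is plaintext addition. The tiny primes, the function names, and the choice g = n + 1 are assumptions made for readability; this is not a secure or production implementation. It needs Python 3.8+ for the modular inverse via `pow`.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    """Toy Paillier key generation. Tiny primes, for illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid shortcut because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer m with 0 <= m < n under public key n."""
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, sk, c):
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

def scale_encrypted(n, c, k):
    """Raising a ciphertext to a public constant k multiplies the plaintext by k."""
    return pow(c, k, n * n)

n, sk = keygen()
c1, c2 = encrypt(n, 20), encrypt(n, 22)
print(decrypt(n, sk, add_encrypted(n, c1, c2)))   # 42
print(decrypt(n, sk, scale_encrypted(n, c1, 3)))  # 60
```

Only addition and multiplication by a public constant are available here, which is why such a scheme falls in the "partially HE" category.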
Secure Two-party Computation (2PC)
• Two parties hold private inputs x and y
• Compute f(x, y) while leaking no information beyond what the ideal model leaks
• Building blocks: Yao's garbled circuits (GC), arithmetic sharing, Boolean sharing
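As a minimal sketch of arithmetic sharing, the example below splits each private input into two additive shares modulo 2^32, so that neither share alone reveals the input. The modulus, the function names, and the choice f(x, y) = x + y are illustrative assumptions; multiplication of shared values needs extra machinery (for example, precomputed multiplication triples) that is not shown.

```python
import secrets

MODULUS = 2 ** 32  # ring Z_{2^32}; an illustrative choice

def share(value):
    """Split value into two additive shares that sum to value mod MODULUS."""
    s0 = secrets.randbelow(MODULUS)
    s1 = (value - s0) % MODULUS
    return s0, s1

def reconstruct(s0, s1):
    return (s0 + s1) % MODULUS

def add_shares(a, b):
    """Each party adds the shares it holds locally; no communication is needed."""
    return (a + b) % MODULUS

# The client holds x, the server holds y; they want f(x, y) = x + y.
x, y = 1234, 5678
x0, x1 = share(x)  # the client keeps x0 and sends x1 to the server
y0, y1 = share(y)  # the server keeps y1 and sends y0 to the client
z0 = add_shares(x0, y0)  # computed by the client
z1 = add_shares(x1, y1)  # computed by the server
print(reconstruct(z0, z1))  # 6912
```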
HE vs. 2PC
HE:
• Non-interactive
• Only linear operations
• Expensive in computation cost
• No communication cost
2PC:
• Interactive (the client is involved)
• Linear and nonlinear operations
• Efficient in computation cost
• Expensive in communication cost
Artificial Neural Networks
• Supervised machine learning technique
• Two phases: training, classification
• NN layers: activation layer, pooling layer, fully-connected layer, convolution layer (optional)
Neural Networks - Architecture
Privacy-preserving NN Classification
• Use advanced cryptographic techniques: homomorphic encryption, secure 2PC
• Challenge: privacy vs. performance
  • Additional overhead (computation, memory and bandwidth)
  • Complex operations (sigmoid, tanh, etc.)
  • Real numbers (vs. integers with PETs)
• Goal: reduce NN complexity
  • Approximate complex operations: use low-degree polynomials
  • Approximate real numbers: use integers
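To make "approximate complex operations with low-degree polynomials" concrete, the sketch below compares the sigmoid with its degree-3 Taylor polynomial around 0, which uses only additions and multiplications. The choice of a Taylor expansion and the function names are assumptions for illustration; other fitting methods (for example, least squares over a target interval) are equally possible.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_poly3(x):
    """Degree-3 Taylor polynomial of the sigmoid around 0."""
    return 0.5 + x / 4.0 - x ** 3 / 48.0

# Print the exact value next to the polynomial approximation.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, round(sigmoid(x), 4), round(sigmoid_poly3(x), 4))
```

The approximation is close near 0 and degrades as |x| grows, so inputs to such an activation usually need to stay in a small range.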
Approximation of NN layers
• Convolution layer: matrix multiplications, no need for approximation
• Activation layer: most common approach is $x^2$ and ReLU
• Pooling layer: sum or average
• Fully connected layer: matrix multiplications, no need for approximation
• Real numbers: most common approach is multiplying by $10^n$
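The two approximations above can be combined in a small integer-only pipeline: real values are multiplied by 10^n and rounded, a fully connected layer is evaluated on integers, and the activation is the square function. The precision n = 3, the helper names, and the example weights are assumptions for illustration. Note how the scale grows with each multiplication and has to be tracked to decode the result.

```python
SCALE_EXP = 3            # n in 10^n; an illustrative precision
SCALE = 10 ** SCALE_EXP

def encode(value):
    """Map a real number to an integer by multiplying with 10^n and rounding."""
    return round(value * SCALE)

def fc_int(weights, bias, inputs):
    """Integer fully connected layer. Products carry scale 10^(2n),
    so the bias (scale 10^n) is lifted by one more factor of 10^n."""
    return [sum(w * x for w, x in zip(row, inputs)) + b * SCALE
            for row, b in zip(weights, bias)]

def square_activation(values):
    """x^2 activation; the scale of the output is the input scale squared."""
    return [v * v for v in values]

def decode(value, scale_power):
    """Undo a scale of 10^(n * scale_power)."""
    return value / SCALE ** scale_power

weights = [[0.5, -1.25], [2.0, 0.75]]
bias = [0.1, -0.2]
inputs = [1.5, -0.4]

w_int = [[encode(w) for w in row] for row in weights]
b_int = [encode(b) for b in bias]
x_int = [encode(x) for x in inputs]

hidden = fc_int(w_int, b_int, x_int)   # scale 10^(2n)
activated = square_activation(hidden)  # scale 10^(4n)
print([decode(v, 4) for v in activated])
```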
Privacy-preserving NN classification
• Hybrid solutions: MiniONN, Gazelle, Chameleon, SwaNN, …
• FHE-based solutions: CryptoNets, Chabanne et al., Ibarrondo et al., Bourse et al., CryptoDL, …
• MPC-based solutions: SecureML, DeepSecure, ABY3, EzPC, SecureNN, PAC, …
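FHE-Based Batch Normalization [DPM 2018]
• LHE-based privacy-preserving (pp) NN
• Investigate Batch Normalization (BN)
• BN transformation: $\mathrm{BN}(x) = \gamma \frac{x - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$, with trained rescaling $\gamma$, mean $\mu$, variance $\sigma^2$, small constant $\epsilon$, and trained shifting $\beta$
• Simplify operations (with equivalence)
• Absorb BN in previous FC or Conv layers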
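Absorbing BN into the previous layer works because BN is an affine map at inference time. For a fully connected layer z = Wx + b followed by BN, setting s = γ / sqrt(σ² + ε) gives the equivalent layer W' = s·W (row-wise) and b' = s·(b − μ) + β. The sketch below checks this equivalence numerically; the function names and example values are assumptions, and the code illustrates the standard folding identity rather than the exact procedure of the cited work.

```python
import math

def fc(weights, bias, x):
    """Plain fully connected layer: z = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def batch_norm(z, gamma, beta, mean, var, eps=1e-5):
    """Inference-time BN applied per output unit."""
    return [g * (zi - m) / math.sqrt(v + eps) + bt
            for zi, g, bt, m, v in zip(z, gamma, beta, mean, var)]

def absorb_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BN parameters into the weights and bias of the preceding FC layer."""
    new_w, new_b = [], []
    for row, b, g, bt, m, v in zip(weights, bias, gamma, beta, mean, var):
        s = g / math.sqrt(v + eps)
        new_w.append([s * w for w in row])
        new_b.append(s * (b - m) + bt)
    return new_w, new_b

W = [[1.0, 2.0], [0.5, -1.0]]
b = [0.1, -0.3]
gamma, beta = [1.5, 0.8], [0.2, -0.1]
mean, var = [0.4, 0.0], [2.0, 0.5]
x = [0.7, -1.2]

reference = batch_norm(fc(W, b, x), gamma, beta, mean, var)
W_f, b_f = absorb_bn(W, b, gamma, beta, mean, var)
folded = fc(W_f, b_f, x)
print(reference, folded)  # the two lists agree up to floating-point rounding
```

After folding, the encrypted evaluation only has to run one linear layer instead of a linear layer plus a normalization step.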
PAC: Privacy-preserving (pp) Arrhythmia Classification [FPS 2019]
• NN-based ECG analysis
• 2PC-based NN classifier
  • Low-degree polynomials for activation functions
  • Approximation of real numbers
  • PCA for size reduction
• Performance results with PhysioBank
  • 96.34% accuracy
  • 1 sec prediction time in a real environment
• PAC in batches: efficient solution for real scenarios
SwaNN: Privacy-preserving (pp) Classification based on PHE+2PC [PUT 2019]
• Switches between PHE and 2PC
• Paillier for linear operations
• Interactive Paillier for $x^2$
• Two settings
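Paillier cannot multiply two encrypted values, so squaring needs one round of interaction with the key holder. The sketch below shows one common blinding approach, which is an assumption and not necessarily the exact SwaNN protocol: the server adds a random mask r to Enc(x), the client decrypts and squares x + r, and the server removes the mask using (x + r)² − 2rx − r² = x². The toy primes and all function names are illustrative only.

```python
import math
import secrets

# Toy Paillier parameters (tiny primes, generator g = N + 1); not secure.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # secret-key material, held by the client

def encrypt(m):
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return (pow(N + 1, m % N, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    return ((pow(c, LAM, N2) - 1) // N) * MU % N

def server_blind(c_x):
    """Server: mask Enc(x) with a random r, giving Enc(x + r)."""
    r = secrets.randbelow(N)
    return (c_x * encrypt(r)) % N2, r

def client_square(c_blinded):
    """Client: decrypt the masked value, square it, and re-encrypt."""
    v = decrypt(c_blinded)
    return encrypt(v * v)

def server_unblind(c_sq, c_x, r):
    """Server: Enc((x+r)^2) * Enc(x)^(-2r) * Enc(-r^2) = Enc(x^2)."""
    minus_2rx = pow(c_x, (-2 * r) % N, N2)
    minus_r2 = encrypt(-r * r)
    return (c_sq * minus_2rx * minus_r2) % N2

c_x = encrypt(12)                 # client input, sent encrypted
c_blinded, r = server_blind(c_x)  # server -> client
c_sq = client_square(c_blinded)   # client -> server
c_result = server_unblind(c_sq, c_x, r)
print(decrypt(c_result))  # 144
```

The client only ever sees x + r, which is uniformly distributed modulo N, and the server only ever sees ciphertexts.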
Open questions
• Multi-user privacy-preserving NN classification (multi-source, multi-querier, etc.)
• Privacy-preserving NN training
• Privacy-preserving clustering
Thank you! melek.onen@eurecom.fr