Protecting the intellectual property of machine learning models: Prediction APIs, model extraction attacks and watermarking
Samuel Marchal, Aalto University & F-Secure Corporation, samuel.marchal@f-secure.fi
(joint work with N. Asokan, Buse Atli, Mika Juuti, Sebastian Szyller)
Supervised machine learning
A machine learning model is a function f():
• f(x) = y, e.g. f(picture of a car) = "car"
• x: pictures → y: word label
Large number of possible inputs x, finite number of classes y
• Cat vs dog
• Car / truck / plane / boat
1. Model training: Learn function f()
[Figure: an analyst feeds a labelled dataset (pictures tagged "boat" / "car") into an ML trainer, which produces the model, i.e. the function f()]
2. Prediction: Use function f()
[Figure: a client sends a picture to the ML model (function f()) and receives the prediction "car"]
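To make the two phases above concrete, here is a minimal sketch in Python; scikit-learn's built-in digits dataset stands in for the car/boat pictures, and the model choice is illustrative only, not the setup used in the talk.

```python
# Minimal sketch: train f() from labelled data, then use it for prediction.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Model training: the analyst learns the function f() from a labelled dataset.
f = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
f.fit(X_train, y_train)

# 2. Prediction: a client submits an input x and receives the label y = f(x).
print(f.predict(X_test[:1]))
```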
Outline
Is intellectual property of ML models important?
Can intellectual property of ML models be compromised?
• Model extraction attacks
What can be done to counter model extraction?
• Prevent model extraction
• Detect model extraction
• Deter model extraction (watermarking)
Is intellectual property of ML models important?
Machine learning models: business advantage to the model owner
Cost of
• gathering relevant data
• labeling data
• expertise required to choose the right model + training method (e.g., hyperparameters)
• resources expended in training
[Figure: as before, an analyst trains the ML model from a labelled dataset]
How to prevent model theft?
White-box model theft can be countered by keeping the model confidential
• Computation with encrypted models
• Protecting models using secure hardware
• Hosting models behind a firewalled cloud service
Basic idea: hide the model itself, expose model functionality only via a prediction API
Is that enough to prevent model theft?
Model extraction attacks
Extracting models via their prediction APIs
Prediction APIs are oracles that leak information
Adversary
• Malicious client
• Goal: rebuild a surrogate model for a victim model
• Capability: access to prediction API or model outputs
Extraction attack success
• Surrogate has similar performance as the victim model
[Figure: a malicious client queries the victim model's prediction API and uses the responses to train a surrogate model]
Generic model extraction attack
1. Initial data collection -- unlabeled seed training samples
2. Select model hyperparameters -- model, hyperparameters, layers, ...
3. Query API for predictions -- get labels / classification probabilities
4. Train surrogate model -- update model
[Figure: the client iterates these steps against the victim model's prediction API to build the surrogate model]
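A hedged sketch of how these four steps could be wired together; query_api and augment are hypothetical callables standing in for the victim's prediction API and the attacker's query-crafting strategy, and the surrogate architecture is illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def extract_model(query_api, seed_inputs, augment=None, n_rounds=3):
    """Generic extraction loop. query_api(x) -> label is the victim's prediction
    API; augment(X, surrogate) -> new samples is an optional query-crafting step."""
    # 2. Select model hyperparameters (chosen by the attacker).
    surrogate = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300)
    # 1. Initial data collection: unlabeled seed samples.
    X = np.asarray(seed_inputs, dtype=float)
    for _ in range(n_rounds):
        # 3. Query the prediction API to label the current query set.
        y = np.array([query_api(x) for x in X])
        # 4. Train / update the surrogate model on the returned labels.
        surrogate.fit(X, y)
        if augment is not None:
            # Craft additional queries (e.g. synthetic samples) for the next round.
            X = np.vstack([X, augment(X, surrogate)])
    return surrogate
```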
1. Initial data collection (unlabeled)
Synthetic (semi-random) data only [1]
• Steal logistic regression models, decision trees
Limited natural data + synthetic data [2,3]
• Steal simple CNN models
Unlimited natural data [4]
• Steal complex (deep) CNN models
[1] Tramèr et al. - Stealing Machine Learning Models via Prediction APIs. USENIX'16 (https://arxiv.org/abs/1609.02943)
[2] Papernot et al. - Practical Black-Box Attacks against Machine Learning. ASIA CCS'17 (https://arxiv.org/abs/1602.02697)
[3] Juuti et al. - PRADA: Protecting against DNN Model Stealing Attacks. EuroS&P'19 (https://arxiv.org/abs/1805.02628)
[4] Orekondy et al. - Knockoff Nets: Stealing Functionality of Black-Box Models. CVPR'19 (https://arxiv.org/abs/1812.02766)
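The synthetic-data attacks [2,3] craft new queries close to already-labelled samples (e.g., Jacobian-based augmentation). As an illustration only, a much-simplified random-perturbation variant that could be plugged in as the augment callable in the loop sketched above:

```python
import numpy as np

def random_augment(X, surrogate=None, step=0.1, rng=np.random.default_rng(0)):
    """Simplified stand-in for synthetic query crafting: perturb each
    already-queried sample so new queries probe the victim's decision
    boundaries near known points. Inputs assumed normalised to [0, 1].
    surrogate is unused here; Jacobian-based augmentation [2] would use
    its gradients to choose the perturbation direction instead of noise."""
    noise = rng.choice([-step, step], size=X.shape)
    return np.clip(X + noise, 0.0, 1.0)
```

In the actual attacks [2,3], the perturbation direction is derived from the surrogate model's Jacobian rather than chosen at random.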
2. Select model hyperparameters
Use pre-defined model hyperparameters
• Fixed irrespective of victim model [4]
• Same as victim model [2]
Optimize training hyperparameters [3]
• According to initial data and victim model to steal
• Cross-validation search using initial data labelled by victim model
Hyperparameters optimized for model extraction attacks increase attack effectiveness (surrogate model accuracy)
[2] Papernot et al. - Practical Black-Box Attacks against Machine Learning. ASIA CCS'17 (https://arxiv.org/abs/1602.02697)
[3] Juuti et al. - PRADA: Protecting against DNN Model Stealing Attacks. EuroS&P'19 (https://arxiv.org/abs/1805.02628)
[4] Orekondy et al. - Knockoff Nets: Stealing Functionality of Black-Box Models. CVPR'19 (https://arxiv.org/abs/1812.02766)
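One way the cross-validation search from [3] could be realised, assuming the initial data has already been labelled by the victim's API; the grid contents and model family are illustrative.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

def tune_surrogate_hyperparameters(X_seed, y_victim):
    """Cross-validation search over surrogate hyperparameters, scored on the
    initial data after it has been labelled by the victim model (y_victim)."""
    grid = {
        "hidden_layer_sizes": [(64,), (128,), (128, 64)],
        "learning_rate_init": [1e-3, 1e-2],
    }
    search = GridSearchCV(MLPClassifier(max_iter=300), grid, cv=3)
    search.fit(X_seed, y_victim)
    return search.best_params_
```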
Effectiveness of extraction attacks
Test accuracy % (performance recovery)

Victim model (dataset)              | Victim model | PRADA attack [3] surrogate | Knockoff Nets attack [4] surrogate [5]
MNIST (10 digits)                   | 98.0         | 97.9 (0.99x)               | -
GTSRB (43 road signs)               | 98.1         | 62.5 (0.64x)               | 94.8 (0.96x)
Caltech (256 generic objects)       | 74.1         | -                          | 72.2 (0.97x)
CUBS (200 bird species)             | 77.2         | -                          | 70.9 (0.91x)
Diabetic (5 stages of retinopathy)  | 71.1         | -                          | 53.5 (0.75x)

Complex models can be effectively stolen using ~100,000 queries
• Using initial data structured and related to the victim model task
• Using training hyperparameters specific to the victim model
[3] Juuti et al. - PRADA: Protecting against DNN Model Stealing Attacks. EuroS&P'19 (https://arxiv.org/abs/1805.02628)
[4] Orekondy et al. - Knockoff Nets: Stealing Functionality of Black-Box Models. CVPR'19 (https://arxiv.org/abs/1812.02766)
[5] Atli et al. - Extraction of Complex DNN Models: Real Threat or Boogeyman? AAAI-EDSMLS'20 (https://arxiv.org/pdf/1910.05429.pdf)
Outline: recap
Is intellectual property of ML models important? Yes
Can models be extracted via their prediction APIs? Yes
What can be done to counter model extraction?
• Prevent model extraction
• Detect model extraction
• Deter model extraction (watermarking)
Counter model extraction attacks
1. Initial data collection
2. Select model hyperparameters
3. Query API for predictions → only step where a defense can be applied
4. Train surrogate model
[Figure: the defense sits at the prediction API, between the client and the victim model]
Prevent model extraction
Prevent model extraction
Degrade surrogate model accuracy by acting on the API responses used in steps 3 and 4 of the attack
[Figure: the same prediction can be returned at different granularities:
• Top-1: "car"
• Top-n + probability: (car, 0.85), (boat, 0.1), (cat, 0.05)
• Full prediction vector: (car, 0.75), (boat, 0.1), (cat, 0.05), (plane, 0.04), (dog, 0.01), (fish, 0.01), ...]
Modify victim model predictions
Reduce prediction granularity [1]
• Full vector → top-1 / top-n
• Model extraction still effective with top-1 [3]
Alter predictions
• Modify probabilities in prediction vector [6]
• Model extraction still effective with top-1 [3]
• Large rate of wrong predictions is effective [5]
→ Degrades model utility for legitimate clients
[1] Tramèr et al. - Stealing Machine Learning Models via Prediction APIs. USENIX'16 (https://arxiv.org/abs/1609.02943)
[3] Juuti et al. - PRADA: Protecting against DNN Model Stealing Attacks. EuroS&P'19 (https://arxiv.org/abs/1805.02628)
[5] Atli et al. - Extraction of Complex DNN Models: Real Threat or Boogeyman? AAAI-EDSMLS'20 (https://arxiv.org/pdf/1910.05429.pdf)
[6] Lee et al. - Defending Against NN Model Stealing Attacks Using Deceptive Perturbations. S&PW'19 (https://arxiv.org/abs/1806.00054)
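A small sketch of the two kinds of prediction modification above, assuming the victim model exposes a full probability vector; function and parameter names are illustrative, not from the cited papers.

```python
import numpy as np

def truncate_prediction(probs, class_names, top_n=1):
    """Reduce prediction granularity: return only the top-n classes
    (top_n=1 gives just the label) instead of the full vector [1]."""
    order = np.argsort(probs)[::-1][:top_n]
    return [(class_names[i], float(probs[i])) for i in order]

def perturb_prediction(probs, noise_scale=0.05, rng=np.random.default_rng(0)):
    """Alter predictions: add noise to the probability vector and renormalise,
    trading utility for legitimate clients against extraction accuracy [6]."""
    noisy = np.clip(probs + rng.normal(0.0, noise_scale, probs.shape), 1e-6, None)
    return noisy / noisy.sum()
```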
Detect model extraction
PRADA: PRotecting Against DNN Model Stealing Attacks
[3] Juuti et al. - PRADA: Protecting against DNN Model Stealing Attacks. EuroS&P'19 (https://arxiv.org/abs/1805.02628)
Extraction attacks using synthetic samples [1,2]
Common characteristic: a specific query pattern in attacks:
1. Natural/random samples
• Establish initial decision boundaries
2. Synthetic samples, similar to existing samples
• Refine the boundaries
→ Study the distribution of queries to detect model extraction attacks
[1] Tramèr et al. - Stealing Machine Learning Models via Prediction APIs. USENIX'16 (https://arxiv.org/abs/1609.02943)
[2] Papernot et al. - Practical Black-Box Attacks against Machine Learning. ASIA CCS'17 (https://arxiv.org/abs/1602.02697)
Intuition for a defense
Preliminary: distances between random points in a space fit a normal (Gaussian) distribution
Assumptions
• Benign queries are consistently distributed → distances fit a normal distribution
• Adversarial queries are focused on a few areas → distances deviate from a normal distribution
[Figure: histograms of query distances for benign vs. attack traffic on MNIST (10 digits) and GTSRB (43 road signs)]
PRADA: PRotecting Against DNN Model Stealing Attacks
Stateful defense
• Keeps track of queries submitted by a given client
• Detects deviation from a normal distribution
Shapiro-Wilk test
• Quantifies how well a set of samples D fits a normal distribution
• Test statistic: W(D) < ε → attack detected
• ε: detection threshold, a parameter to be defined
[3] Juuti et al. - PRADA: Protecting against DNN Model Stealing Attacks. EuroS&P'19 (https://arxiv.org/abs/1805.02628)
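A simplified, per-client sketch of such a stateful test using scipy's Shapiro-Wilk implementation; the distance bookkeeping and the choice of ε here are simplified relative to the actual PRADA algorithm [3].

```python
import numpy as np
from scipy.stats import shapiro

class ExtractionDetector:
    """Simplified PRADA-style detector [3]: per client, record the minimum
    distance of each new query to previously seen queries and flag an attack
    when those distances no longer look normally distributed."""

    def __init__(self, epsilon=0.9, min_queries=20):
        self.epsilon = epsilon          # threshold on the Shapiro-Wilk statistic W(D)
        self.min_queries = min_queries  # collect enough distances before testing
        self.queries = []
        self.distances = []

    def observe(self, x):
        """Record one query from this client; return True if an attack is suspected."""
        x = np.asarray(x, dtype=float).ravel()
        if self.queries:
            self.distances.append(min(np.linalg.norm(x - q) for q in self.queries))
        self.queries.append(x)
        if len(self.distances) < self.min_queries:
            return False
        w_statistic, _ = shapiro(self.distances)
        return w_statistic < self.epsilon   # W(D) < epsilon -> deviation from normality
```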