SLIDE 1

Protecting the intellectual property of machine learning models:

Prediction APIs, model extraction attacks and watermarking

Samuel Marchal, Aalto University & F-Secure Corporation, samuel.marchal@f-secure.fi (joint work with N. Asokan, Buse Atli, Mika Juuti, Sebastian Szyller)

SLIDE 2

Supervised machine learning

Machine learning model

  • a function: f(x) = y
  • x: pictures → y: word label

Large number of possible inputs x; finite number of classes y

  • Cat vs dog
  • Car / truck / plane / boat

[Figure: f( picture of a car ) = car]

SLIDE 3

  • 1. Model training: learn function f()

[Figure: an analyst (trainer) uses a labeled dataset of images (car, boat) to train the ML model, producing the function f()]

SLIDE 4

  • 2. Prediction: use function f()

[Figure: a client sends an image to the ML model; f() returns the prediction "car"]
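To make the two phases concrete, here is a minimal sketch using scikit-learn (an illustration added for this write-up, not from the original slides); the dataset and classifier are arbitrary choices.

```python
# Minimal, hypothetical sketch of the two phases: training learns f(), prediction applies it.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)                   # inputs x, class labels y
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Model training: learn the function f()
f = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 2. Prediction: use f() on new, unseen inputs
predicted_labels = f.predict(X_test[:5])
```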

SLIDE 5

Protecting the intellectual property of machine learning models:

Prediction APIs, model extraction attacks and watermarking

Samuel Marchal, Aalto University & F-Secure Corporation, samuel.marchal@f-secure.fi (joint work with Buse Atli, Sebastian Szyller, Mika Juuti, N. Asokan)

SLIDE 6

Outline

Is the intellectual property of ML models important?
Can the intellectual property of ML models be compromised?

  • Model extraction attacks

What can be done to counter model extraction?

  • Prevent model extraction
  • Detect model extraction
  • Deter model extraction (watermarking)
SLIDE 7

Is the intellectual property of ML models important?

Machine learning models give a business advantage to the model owner. Cost of:

  • gathering relevant data
  • labeling data
  • expertise required to choose the right model + training method (e.g., hyperparameters)
  • resources expended in training


SLIDE 8

How to prevent model theft?

White-box model theft can be countered by keeping the model confidential

  • Computation with encrypted models
  • Protecting models using secure hardware
  • Hosting models behind a firewalled cloud service

Basic idea: hide the model itself, expose model functionality only via a prediction API.
Is that enough to prevent model theft?

SLIDE 9

Model extraction attacks

SLIDE 10

Extracting models via their prediction APIs

Prediction APIs are oracles that leak information.

Adversary

  • Malicious client
  • Goal: rebuild a surrogate model for a victim model
  • Capability: access to prediction API or model outputs

Extraction attack success

  • Surrogate has similar performance as victim model

[Figure: a malicious client queries the victim model's prediction API and trains a surrogate model from the responses]

SLIDE 11

Generic model extraction attack

  • 1. Initial data collection: unlabeled seed training samples
  • 2. Select model hyperparameters: model, hyperparameters, layers, ...
  • 3. Query API for predictions: get labels / classification probabilities
  • 4. Train surrogate model: update the surrogate with the queried labels

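A hedged sketch of this four-step loop, assuming a hypothetical `query_api(batch)` that stands in for the victim's prediction API and returns predicted labels; the surrogate architecture and round count are arbitrary choices.

```python
# Sketch of the generic extraction loop (steps 1-4). `query_api` is hypothetical.
import numpy as np
from sklearn.neural_network import MLPClassifier

def extract_surrogate(query_api, seed_inputs, n_rounds=5):
    X = np.asarray(seed_inputs)                                   # 1. initial unlabeled data
    surrogate = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300)  # 2. chosen hyperparameters
    for _ in range(n_rounds):
        y = query_api(X)                                          # 3. query the API for predictions
        surrogate.fit(X, y)                                       # 4. (re)train the surrogate on queried labels
        # an attacker could extend X here with new natural or synthetic samples
    return surrogate
```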

SLIDE 12

  • 1. Initial data collection (unlabeled)

Synthetic (semi-random) data only[1]

  • Steal logistic regression models, decision trees

Limited natural data + synthetic data[2,3]

  • Steal simple CNN models

Unlimited natural data[4]

  • Steal complex (deep) CNN models


[1] Tramèr et al. - Stealing Machine Learning Models via Prediction APIs. USENIX'16 (https://arxiv.org/abs/1609.02943)
[2] Papernot et al. - Practical Black-Box Attacks against Machine Learning. ASIA CCS'17 (https://arxiv.org/abs/1602.02697)
[3] Juuti et al. - PRADA: Protecting against DNN Model Stealing Attacks. EuroS&P'19 (https://arxiv.org/abs/1805.02628)
[4] Orekondy et al. - Knockoff Nets: Stealing Functionality of Black-Box Models. CVPR'19 (https://arxiv.org/abs/1812.02766)

SLIDE 13

  • 2. Select model hyperparameters

Use pre-defined model hyperparameters

  • Fixed irrespective of victim model[4]
  • Same as victim model[2]

Optimize training hyperparameters[3]

  • According to initial data and victim model to steal
  • Cross-validation search using initial data labelled by victim model

Hyperparameters optimized for the model extraction attack increase attack effectiveness, i.e., surrogate model accuracy
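A sketch of such a cross-validation search, again assuming a hypothetical `query_api` to obtain victim-labelled seed data; the parameter grid is an illustrative assumption, not from the cited papers.

```python
# Sketch: cross-validation search over surrogate hyperparameters, using seed data
# labelled by the victim model (hypothetical `query_api`). Grid values are arbitrary.
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

def tune_surrogate(query_api, X_seed):
    y_victim = query_api(X_seed)                      # labels produced by the victim model
    grid = {
        "hidden_layer_sizes": [(64,), (128,), (128, 64)],
        "learning_rate_init": [1e-3, 1e-2],
    }
    search = GridSearchCV(MLPClassifier(max_iter=300), grid, cv=3)
    search.fit(X_seed, y_victim)                      # picks the best surrogate hyperparameters
    return search.best_estimator_, search.best_params_
```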


[2] Papernot et al. - Practical Black-Box Attacks against Machine Learning. ASIA CCS'17 (https://arxiv.org/abs/1602.02697)
[3] Juuti et al. - PRADA: Protecting against DNN Model Stealing Attacks. EuroS&P'19 (https://arxiv.org/abs/1805.02628)
[4] Orekondy et al. - Knockoff Nets: Stealing Functionality of Black-Box Models. CVPR'19 (https://arxiv.org/abs/1812.02766)

SLIDE 14

Effectiveness of extraction attacks

Test accuracy % (performance recovery relative to the victim model):

| Victim model (dataset) | Victim model | PRADA attack[3] surrogate model | Knockoff nets attack[4] surrogate model[5] |
|---|---|---|---|
| MNIST (10 digits) | 98.0 | 97.9 (0.99x) | - |
| GTSRB (43 road signs) | 98.1 | 62.5 (0.64x) | 94.8 (0.96x) |
| Caltech (256 generic objects) | 74.1 | - | 72.2 (0.97x) |
| CUBS (200 bird species) | 77.2 | - | 70.9 (0.91x) |
| Diabetic (5 stages of retinopathy) | 71.1 | - | 53.5 (0.75x) |

[3] Juuti et al. - PRADA: Protecting against DNN Model Stealing Attacks. EuroS&P'19 (https://arxiv.org/abs/1805.02628)
[4] Orekondy et al. - Knockoff Nets: Stealing Functionality of Black-Box Models. CVPR'19 (https://arxiv.org/abs/1812.02766)
[5] Atli et al. - Extraction of Complex DNN Models: Real Threat or Boogeyman? AAAI-EDSMLS'20 (https://arxiv.org/pdf/1910.05429.pdf)

Complex models can be effectively stolen using ~100,000 queries

  • Using initial data structured and related to victim model task
  • Using training hyperparameters specific to victim model
SLIDE 15

Outline: recap

Is the intellectual property of ML models important? Yes
Can models be extracted via their prediction APIs? Yes
What can be done to counter model extraction?

  • Prevent model extraction
  • Detect model extraction
  • Deter model extraction (watermarking)
SLIDE 16

Counter model extraction attacks

  • 1. Initial data collection
  • 2. Select model hyperparameters
  • 3. Query API for predictions → the only step where a defense can be applied (the only step visible to the model owner)
  • 4. Train surrogate model


SLIDE 17

Prevent model extraction

SLIDE 18

Prevent model extraction

Degrade surrogate model accuracy

(Attack steps targeted: 3. Query API for predictions → 4. Train surrogate model)


[Figure: possible API output granularities for one query. Top-1: car. Top-n + probability: (car, 0.85), (boat, 0.1), (cat, 0.05). Full prediction vector: (car, 0.75), (boat, 0.1), (cat, 0.05), (plane, 0.04), (dog, 0.01), (fish, 0.01), ...]

SLIDE 19

Modify victim model prediction

Reduce prediction granularity[1]

  • Full vector → top-1 / top-N
  • Model extraction still effective with top-1[3]

Alter predictions

  • Modify probabilities in prediction vector[6]
  • Model extraction still effective with top-1[3]
  • Large rate of wrong predictions is effective[5]

→ Both approaches degrade model utility for legitimate clients
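A minimal sketch of the two ideas on this slide, reducing granularity to top-k and perturbing the returned probabilities; the function names, parameters, and noise model are assumptions, not a specific published defense.

```python
# Sketch of two API-side defenses: reduce prediction granularity (top-k) and
# perturb the returned probabilities before answering a query.
import numpy as np

def top_k(probs, k=1):
    """Return only the k most likely classes and their probabilities."""
    idx = np.argsort(probs)[::-1][:k]
    return [(int(i), float(probs[i])) for i in idx]

def perturb(probs, noise=0.05, seed=None):
    """Add random noise to the prediction vector and renormalize it,
    degrading its value for surrogate training."""
    rng = np.random.default_rng(seed)
    noisy = np.clip(np.asarray(probs) + rng.normal(0.0, noise, size=len(probs)), 1e-6, None)
    return noisy / noisy.sum()
```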

[1] Tramèr et al. - Stealing Machine Learning Models via Prediction APIs. USENIX'16 (https://arxiv.org/abs/1609.02943)
[3] Juuti et al. - PRADA: Protecting against DNN Model Stealing Attacks. EuroS&P'19 (https://arxiv.org/abs/1805.02628)
[5] Atli et al. - Extraction of Complex DNN Models: Real Threat or Boogeyman? AAAI-EDSMLS'20 (https://arxiv.org/pdf/1910.05429.pdf)
[6] Lee et al. - Defending Against NN Model Stealing Attacks Using Deceptive Perturbations. S&PW'19 (https://arxiv.org/abs/1806.00054)

SLIDE 20

Detect model extraction

PRADA: PRotecting Against DNN Model Stealing Attacks

[3] Juuti et al. - PRADA: Protecting against DNN Model Stealing Attacks. EuroS&P’19 (https://arxiv.org/abs/1805.02628)

SLIDE 21

Extraction attacks using synthetic samples[1,2]

Common characteristic: a specific query pattern in these attacks

  • 1. Natural/random samples: establish initial decision boundaries
  • 2. Synthetic samples (similar to existing samples): refine the boundaries

Study distribution of queries to detect model extraction attacks

[1] Tramèr et al. - Stealing Machine Learning Models via Prediction APIs. USENIX'16 (https://arxiv.org/abs/1609.02943)
[2] Papernot et al. - Practical Black-Box Attacks against Machine Learning. ASIA CCS'17 (https://arxiv.org/abs/1602.02697)

SLIDE 22

Intuition for a defense

Preliminary: distances between random points in a space fit a normal (Gaussian) distribution.

Assumptions

  • Benign queries consistently distributed → distances fit a normal distribution
  • Adversarial queries focused on a few areas → distances deviate from a normal distribution

[Figure: distributions of distances between queries, benign vs. attack, on MNIST (10 digits) and GTSRB (43 road signs)]

SLIDE 23

PRADA: PRotecting Against DNN Model Stealing Attacks

Stateful defense

  • Keeps track of queries submitted by a given client
  • Detects deviation from a normal distribution

Shapiro-Wilk test

  • Quantify how well a set of samples D fits a normal distribution
  • Test statistic: W(D) < 𝜀 → attack detected
  • 𝜀: parameter to be defined
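A simplified sketch of this idea (not the full PRADA algorithm from [3]): track, per client, the distance of each new query to its nearest previous query, and flag the client once those distances stop fitting a normal distribution. Threshold and window parameters are assumptions.

```python
# Simplified sketch of PRADA-style detection (not the exact algorithm from [3]).
import numpy as np
from scipy.stats import shapiro

class ExtractionDetector:
    def __init__(self, threshold=0.9, min_samples=20):
        self.threshold = threshold        # epsilon on the Shapiro-Wilk statistic W(D)
        self.min_samples = min_samples
        self.queries, self.distances = [], []

    def observe(self, x):
        """Record one query from a client; return True if an attack is suspected."""
        x = np.asarray(x, dtype=float).ravel()
        if self.queries:
            self.distances.append(min(np.linalg.norm(x - q) for q in self.queries))
        self.queries.append(x)
        if len(self.distances) < self.min_samples:
            return False                  # not enough evidence yet
        W, _ = shapiro(self.distances)    # W close to 1 => approximately normal
        return W < self.threshold         # deviation from normality => attack detected
```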

[3] Juuti et al. - PRADA: Protecting against DNN Model Stealing Attacks. EuroS&P’19 (https://arxiv.org/abs/1805.02628)

SLIDE 24

PRADA detection efficiency

  • All model extraction attacks (based on synthetic queries) detected
  • Detection triggered when synthetic samples queried

[1] Tramèr et al. - Stealing Machine Learning Models via Prediction APIs. USENIX'16 (https://arxiv.org/abs/1609.02943)
[2] Papernot et al. - Practical Black-Box Attacks against Machine Learning. ASIA CCS'17 (https://arxiv.org/abs/1602.02697)

SLIDE 25

Counter model stealing attacks

Modify predictions to degrade surrogate model accuracy

  • Only slows down the attack; does not prevent it
  • Degrades utility for legitimate clients

Detect abnormal query distribution (e.g. PRADA)

  • Effective against attacks using synthetic queries
  • Can be circumvented using additional queries
  • Ineffective against attacks using natural queries (e.g., Knockoff attack)


Prevention and detection of model stealing attacks are ineffective.
Can we deter extraction attacks?

SLIDE 26

Deter model extraction

DAWN: Dynamic Adversarial Watermarking of Neural Networks

[7] Szyller et al. - DAWN: Dynamic Adversarial Watermarking of Neural Networks. Under submission (https://arxiv.org/abs/1906.00830)

SLIDE 27

Deter model extraction

We accept that models may get stolen… … and have a means to catch the thief: demonstrate ownership of a surrogate model


[Figure: an adversary queries the victim model's prediction API to build a surrogate model; the victim (owner) then queries the surrogate's own prediction API to demonstrate ownership]

SLIDE 28

Watermarking DNN models

Goal: demonstrate ownership of a model only by making queries to it.
Existing solution: black-box watermarking[8,9]

  • Exploit overcapacity and overfitting of DNN models
  • Select set of inputs (trigger) for which the model will provide selected incorrect predictions
  • Train the model with normal data + watermark (trigger set with incorrect labels)
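A hedged sketch of this embedding step: mix the normal training data with a small trigger set carrying deliberately incorrect labels and train as usual; the classifier choice and helper names are assumptions.

```python
# Sketch of black-box watermark embedding [8,9]: train on normal data plus a small
# trigger set with deliberately incorrect labels. Classifier choice is arbitrary.
import numpy as np
from sklearn.neural_network import MLPClassifier

def embed_watermark(X_train, y_train, X_trigger, y_wrong):
    X = np.concatenate([X_train, X_trigger])
    y = np.concatenate([y_train, y_wrong])            # y_wrong = chosen incorrect labels
    model = MLPClassifier(hidden_layer_sizes=(256,), max_iter=300)
    model.fit(X, y)                                    # overcapacity lets it memorize the trigger set
    return model
```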

[Figure: the victim (owner) queries the suspect model's prediction API with trigger inputs and checks for the chosen incorrect labels, e.g., panda, car, tree]

[8] Adi et al. - Watermarking Deep Neural Networks by Backdooring. USENIX'18 (https://www.usenix.org/node/217594)
[9] Le Merrer et al. - Adversarial Frontier Stitching for Remote Neural Network Watermarking. Neural Computing and Applications'19 (https://arxiv.org/abs/1711.01894)

SLIDE 29

Demonstrate ownership of DNN model

Query the purported surrogate model with the trigger set.
If enough predictions match our incorrect labels → the victim is the owner of the model.
Any original model (= not stolen) should return:

  • Correct predictions most of the time = model accuracy
  • Selected incorrect prediction rarely ~ 1/n (n = number of classes)

[Figure: example trigger-set check: car = car, panda = panda, car ≠ tree]
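A minimal sketch of this check, assuming the owner kept the trigger inputs and their chosen incorrect labels; the 50% threshold mirrors the criterion used later for DAWN.

```python
# Sketch of ownership verification: query the suspect model with the trigger set and
# measure how often its predictions match the recorded incorrect labels.
import numpy as np

def verify_ownership(suspect_model, X_trigger, y_wrong, threshold=0.5):
    acc_wm = float(np.mean(suspect_model.predict(X_trigger) == y_wrong))
    # a non-stolen model matches only ~1/n of the incorrect labels by chance
    return acc_wm, acc_wm > threshold
```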

SLIDE 30

Watermarking and model extraction attack

Existing watermarking techniques

  • Protect only against physical theft of the model
  • Model extraction attacks steal the model without the watermark

How to watermark surrogate models?

Adversary

  • Selects inputs to query to prediction API
  • Controls training of the model
  • … but has no prior expectation of the predictions the API will return


SLIDE 31

Dynamic Adversarial Watermarking of DNNs

DAWN approach

  • Return incorrect predictions for a few queries (e.g., 0.1%)
  • Queries + incorrect predictions are used to train surrogate model
  • Queries + incorrect predictions = watermark embedded in surrogate model

→ This works because of DNN overcapacity and overfitting

[Figure: DAWN sits between the prediction API and the victim model; the adversary's surrogate model is trained on the API's responses]

  • 1. Client queries sample x
  • 2. DAWN decides if x must be watermarked
  • 3a. NO → return the correct prediction
  • 3b. YES → return an incorrect prediction

SLIDE 32

Watermark decision

Queries to watermark determined by watermark function WV(x)

  • Compute hash of input x (HMAC)
  • Return incorrect prediction if hash lower than threshold
  • rw determines the ratio of queries having incorrect prediction

Watermarked inputs properties

  • Unpredictable → conditioned by keyed hash (Kw secret)
  • Indistinguishable → hash is deterministic and input-dependent
  • Client-specific → different queries result in different watermarks
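A minimal sketch of such a keyed decision, assuming the input is serialized to bytes; the key, rate, and serialization details are assumptions, and the mapping MV from the next slide is omitted here.

```python
# Minimal sketch of the watermark decision WV(x): HMAC the input with the secret key
# Kw and watermark the query if the hash falls below a threshold set by the rate rw.
import hmac, hashlib
import numpy as np

K_W = b"secret-watermarking-key"                       # assumed key material
R_W = 0.005                                            # e.g. ~0.5% of queries watermarked

def should_watermark(x, key=K_W, rate=R_W):
    digest = hmac.new(key, np.asarray(x).tobytes(), hashlib.sha256).digest()
    value = int.from_bytes(digest[:8], "big") / 2**64  # map the hash to [0, 1)
    return value < rate                                # deterministic, keyed, input-dependent
```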


SLIDE 33

Improve indistinguishability

Issue with watermark function WV(x) (based on hash)

  • WV(x) ≠ WV(x + 𝜁) for 𝜁 small
  • Adversary can discover watermarked inputs by querying small variations of the same input

Solution: define a mapping function

  • MV(x) = MV(x + 𝜁) for 𝜁 small
  • Compute WV( MV(x) ) instead of WV( x )

Mapping function MV(x): ℝ^n → ℝ^p where p < n

  • Autoencoder
  • Binning and masking function
  • Embedding from victim model
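A toy sketch of the binning/masking option above: quantizing the input makes x and x + 𝜁 fall into the same bin, so both receive the same watermarking decision; the bin width is an assumed parameter.

```python
# Toy binning mapping MV(x): quantize the input so small perturbations map to the
# same value and thus get the same watermark decision. Bin width is an assumption.
import numpy as np

def mapping(x, bin_width=0.1):
    return np.round(np.asarray(x, dtype=float) / bin_width).astype(np.int64)

# the watermark decision is then computed on mapping(x) rather than on x itself
```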


SLIDE 34

Watermarked predictions (incorrect)

Predictions determined by backdoor function BV(x)

  • Keyed permutations of real predictions from victim model

Fv(x) = (car: 0.67, boat: 0.14, panda: 0.01, tree: 0.05, cat: 0.13)
Bv(x) = π(Kπ, Fv(x)) = (car: 0.05, boat: 0.01, panda: 0.13, tree: 0.67, cat: 0.14)
where Kπ = HMAC(Kw, x)

Watermarked predictions properties

  • Unpredictable → permutations determined by Kπ (secret)
  • Indistinguishable → keyed permutations are deterministic (Kπ derived from hash of input)
  • Believable → same probability distribution as real predictions
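A hedged sketch of this backdoor function: derive a per-input key Kπ = HMAC(Kw, x) and use it to seed a deterministic permutation of the victim's prediction vector; serialization and key handling are simplified assumptions.

```python
# Sketch of the backdoor function BV(x): permute the victim's prediction vector with a
# permutation seeded by K_pi = HMAC(Kw, x), so the output is deterministic per input.
import hmac, hashlib
import numpy as np

def backdoor_prediction(probs, x, key=b"secret-watermarking-key"):
    k_pi = hmac.new(key, np.asarray(x).tobytes(), hashlib.sha256).digest()
    rng = np.random.default_rng(int.from_bytes(k_pi[:8], "big"))   # seeded by K_pi
    perm = rng.permutation(len(probs))
    return np.asarray(probs)[perm]        # same probability values, wrong class assignment
```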


SLIDE 35

DAWN approach summary

Watermark function WV(x)

  • Determines if x should be watermarked
  • A rate of rw predictions is watermarked with incorrect predictions

Backdoor function BV(x)

  • Determines the (backdoored) watermark response to x

Model owner

  • Record every input x that has been watermarked…
  • …Along with its backdoor prediction BV(x)
  • The set of all (x, BV(x)) is the watermark for a given API client


SLIDE 36

DAWN effectiveness

Assessed against 2 model extraction attacks and 6 victim models

  • Acctest : accuracy on test data → Adversary aims to maximize
  • Accwm : accuracy on watermark → Owner aims to maximize

→ Demonstration of ownership successful if Accwm > 50%


[3] Juuti et al. - PRADA: Protecting against DNN Model Stealing Attacks. EuroS&P'19 (https://arxiv.org/abs/1805.02628)
[4] Orekondy et al. - Knockoff Nets: Stealing Functionality of Black-Box Models. CVPR'19 (https://arxiv.org/abs/1812.02766)

SLIDE 37

Preserving model utility


Protection against all tested attacks and models:

  • Requires returning incorrect predictions for at most ~100 queries out of tens of thousands
  • Decreases model accuracy only negligibly, by 0.03-0.5% (the watermarking rate rw)
SLIDE 38

Takeaways

Is the intellectual property of ML models important? Yes: models constitute a business advantage to model owners.
Can models be extracted via their prediction APIs? Yes: protecting model data via cryptography or hardware security is insufficient.
What can be done to counter model extraction? Watermarking as a deterrent:

  • Watermarking at the prediction API is feasible
  • Deserves to be considered as a deterrent against model stealing

More on our security + ML research at https://ssg.aalto.fi/research/projects/mlsec/