CloudLeak: DNN Model Extractions from Commercial MLaaS Platforms
Yun-Yun Tsai & Tsung-Yi Ho, National Tsing Hua University
#BHUSA @BLACKHATEVENTS
Who We Are
Yun-Yun Tsai, Research Assistant, Department of Computer Science, National Tsing Hua University
Education: National Tsing Hua University, M.S. in Computer Science; National Tsing Hua University, B.S. in Computer Science
Research Interests: Adversarial Machine Learning, Trustworthy AI
Who We Are
Tsung-Yi Ho, PhD, Professor, Department of Computer Science, National Tsing Hua University; Program Director, AI Innovation Program, Ministry of Science and Technology, Taiwan
Research Interests: Hardware and Circuit Security, Trustworthy AI, Design Automation for Emerging Technologies
A Preliminary Paper Published at NDSS 2020
• CloudLeak: Large-Scale Deep Learning Models Stealing Through Adversarial Examples (paper link)
Outline
• Background
  • MLaaS in Cloud
  • Overview of Adversarial Attack
• Adversarial Examples based Model Stealing
  • Adversarial Active Learning
  • FeatureFool
  • MLaaS Model Stealing Attacks
• Case Study & Experimental Results
  • Commercial APIs hosted by Microsoft, Face++, IBM, Google and Clarifai
• Defenses
• Conclusions
Background & Motivation
Success of DNN
• Revolution of DNN structure: "Perceptron" → "Multi-Layer Perceptron" → "Deep Convolutional Neural Network"
• DNN based systems are widely used in various applications.
[Chart: model size and depth. AlexNet: ~61M parameters, 8 layers; VGG-16: ~138M parameters, 16 layers; GoogLeNet: ~7M parameters, 22 layers; ResNet-152: ~60M parameters, 152 layers.]
Machine Learning on Cloud
• Machine learning as a service (MLaaS) provided by cloud providers is gradually being accepted as a reliable solution for various applications.
[Workflow: Prepare your data → Experiment (prepare your model, code, train & test) → Deploy as a web service.]
Machine Learning as a Service
[Figure: overview of the MLaaS working flow. The supplier trains a model on a sensitive dataset through the Training API and exposes it as a black box behind the Prediction API; users send inputs and receive outputs at $$$ per query. Supplier goals: Goal 1: a rich Prediction API; Goal 2: model confidentiality.]
Property of MLaaS

Services  | Products and Solutions    | Function                     | Model Type | Black-box | Confidence Scores | Monetize | Customization
Microsoft | Custom Vision             | Traffic Recognition          | NN         | √         | √                 | √        | √
Microsoft | Custom Vision             | Flower Recognition           | NN         | √         | √                 | √        | √
Face++    | Emotion Recognition API   | Face Emotion Verification    | NN         | √         | √                 | √        | ×
IBM       | Watson Visual Recognition | Face Recognition             | NN         | √         | √                 | √        | √
Google    | AutoML Vision             | Flower Recognition           | NN         | √         | √                 | √        | √
Clarifai  | Not Safe for Work (NSFW)  | Offensive Content Moderation | NN         | √         | √                 | √        | ×
Motivation: Security on MLaaS
[Figure: the MLaaS working flow again: sensitive dataset, Training API, black-box Prediction API, inputs/outputs, $$$ per query. The supplier's goals of a rich Prediction API (Goal 1) and model confidentiality (Goal 2) meet at the paid, black-box query interface that an adversary can access.]
Adversarial Examples in DNN
• Adversarial examples are model inputs generated by an adversary to fool deep learning models.
[Figure: source example + adversarial perturbation = adversarial example; the AI/ML system then predicts the wrong label, e.g. an image of Chris Evans is classified as Tony Stark.]
Adversarial Examples in DNN
• Non-feature-based
  • Projected Gradient Descent (PGD) attack
  • C&W attack
  • Zeroth Order Optimization (ZOO) attack
• Feature-based
  • Feature Adversary (FA) attack
  • FeatureFool
[Figure: source/adversarial image pairs for the non-feature-based attacks (Carlini et al., 2017) and source/guide/adversarial triples with their perturbations for the feature-based attacks (Sabour et al., 2016).]
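For a concrete reference point, below is a minimal PGD sketch in PyTorch (our choice of framework, not one prescribed in the talk); `model`, `x`, and `y` are placeholders for a differentiable classifier, an input batch in [0, 1], and its labels.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Projected Gradient Descent: repeatedly take a signed gradient step
    that increases the loss, then project the result back into an
    L-infinity ball of radius eps around the original input."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project into eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                          # keep a valid image
    return x_adv
```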
Holistic View of Adversarial Attack
[Figure: training phase (training data → model) and testing phase (inference data → AI/ML system → inference).]
*No access to model internal information in the black-box setting.
Our Goal
• We aim to accurately retrain an equivalent local model of the target model by querying the black-box Prediction API for the labels and confidence scores of our inputs.
[Figure: the supplier trains the target model on private data via the Training API and exposes it only as a black-box Prediction API.]
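A minimal sketch of the query step, assuming a hypothetical REST prediction endpoint; the URL, key, and response format below are illustrative, not any specific vendor's API.

```python
import base64
import requests

API_URL = "https://example-mlaas.invalid/v1/predict"   # hypothetical endpoint
API_KEY = "YOUR_KEY"                                    # placeholder credential

def query_prediction_api(image_path):
    """Send one image to the black-box Prediction API and return the
    labels and confidence scores it exposes."""
    with open(image_path, "rb") as f:
        payload = {"image": base64.b64encode(f.read()).decode()}
    resp = requests.post(API_URL, json=payload,
                         headers={"Authorization": f"Bearer {API_KEY}"})
    resp.raise_for_status()
    return resp.json()   # e.g. [{"label": "stop_sign", "score": 0.97}, ...]

# Each (input image, returned label/score) pair becomes a training point
# for the local substitute model.
```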
Adversarial Example based Model Stealing Attack
Model Stealing Attacks
• Various model stealing attacks have been developed.
• None of them achieves a good tradeoff among query count, accuracy, cost, etc.

Proposed Attacks         | Parameter Size | Queries | Accuracy | Black-box? | Stealing Cost
F. Tramer (USENIX'16)    | ~45k           | ~102k   | High     | √          | Low
Juuti (EuroS&P'19)       | ~10M           | ~111k   | High     | √          | -
Correia-Silva (IJCNN'18) | ~200M          | ~66k    | High     | √          | High
Papernot (AsiaCCS'17)    | ~100M          | ~7k     | Low      | √          | -
Adversarial Active Learning
[Figure: a high-level illustration of adversarial example generation; the curve f(x) = 0 is the model's decision boundary, with a source example lying on one side of it.]
Adversarial Active Learning
• We gather a set of "useful examples" to train a substitute model whose performance is similar to that of the black-box model.
[Figure: around the decision boundary f(x) = 0, examples derived from a source example are ranked by confidence: medium- and minimum-confidence benign examples, and minimum-, medium-, and maximum-confidence adversarial examples.]
Adversarial Active Learning (cont.)
• We gather a set of "useful examples" to train a substitute model whose performance is similar to that of the black-box model.
[Figure: illustration of the margin-based uncertainty sampling strategy; the "useful examples" are the benign and adversarial examples lying close to the decision boundary f(x) = 0.]
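A minimal sketch of margin-based uncertainty sampling, assuming `probs` is an (N, C) NumPy array of confidence scores returned by the victim API for N candidate queries; the names and the exact selection rule are our illustration.

```python
import numpy as np

def margin_sample(probs, budget):
    """Margin-based uncertainty sampling: keep the `budget` examples whose
    top-1 / top-2 confidence gap is smallest, i.e. those closest to the
    victim model's decision boundary."""
    sorted_probs = np.sort(probs, axis=1)          # ascending per row
    margin = sorted_probs[:, -1] - sorted_probs[:, -2]
    return np.argsort(margin)[:budget]             # indices of "useful examples"

# Example: probs has shape (num_queries, num_classes); the returned indices
# point at the queries worth keeping for substitute-model training.
```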
FeatureFool: Margin-based Adv. Example
• To reduce the scale of the perturbation, we further propose a feature-based attack that generates more robust adversarial examples.
• Attack goal: a low confidence score for the true class (the margin constant K controls the confidence score).
• Formally, we search for an adversarial example x'_s such that

  minimize  d(x_s, x'_s) + β · loss_{f,L}(x'_s)   subject to  x'_s ∈ [0, 1]^n

  where the triplet loss is defined as

  loss_{f,L}(x'_s) = max( D(φ_L(x'_s), φ_L(x_t)) − D(φ_L(x'_s), φ_L(x_s)) + K, 0 )

  with x_s the source example, x_t the guide example, φ_L(·) the feature mapping at layer L, and D(·, ·) a distance metric in feature space.
• To solve this reformulated optimization problem, we apply box-constrained L-BFGS to find a minimum of the loss function.
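A minimal sketch of this objective, assuming a PyTorch feature extractor `phi` (e.g. a CNN truncated at layer L) and SciPy's L-BFGS-B as the box-constrained solver; the names, the L2 distance choice, and the solver wiring are our assumptions, not the authors' exact implementation.

```python
import numpy as np
import torch
from scipy.optimize import minimize

def featurefool(phi, x_s, x_t, beta=1.0, K=0.1):
    """Find x' in [0,1]^n that stays close to the source x_s while its
    layer-L features move toward the guide x_t and away from x_s
    (triplet loss with margin K), solved with box-constrained L-BFGS."""
    shape = x_s.shape
    with torch.no_grad():
        f_s, f_t = phi(x_s), phi(x_t)                # fixed feature anchors

    def objective(z):
        x = torch.tensor(z.reshape(shape), dtype=torch.float32, requires_grad=True)
        f_x = phi(x)
        dist = torch.norm(x - x_s)                               # d(x_s, x')
        triplet = torch.clamp(torch.norm(f_x - f_t)              # pull toward guide
                              - torch.norm(f_x - f_s) + K,       # push from source
                              min=0.0)
        loss = dist + beta * triplet
        loss.backward()
        return loss.item(), x.grad.numpy().ravel().astype(np.float64)

    x0 = x_s.detach().numpy().ravel().astype(np.float64)
    res = minimize(objective, x0, jac=True, method="L-BFGS-B",
                   bounds=[(0.0, 1.0)] * x_s.numel())            # x' in [0,1]^n
    return torch.tensor(res.x.reshape(shape), dtype=torch.float32)
```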
Overview of FeatureFool Attack
[Figure: (a) source image x_s, (b) adversarial perturbation, (c) guide image x_t, (d) feature extractor, (e) salient features, (f) box-constrained L-BFGS; the perturbed source's features are pushed toward the guide's features.]
(1) Input an image and extract the corresponding n-th layer feature mapping using the feature extractor (a)-(d);
(2) Compute the class saliency map to decide which points of the feature mapping should be modified (e);
(3) Search for the minimum perturbation that satisfies the optimization formula (f).
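A minimal sketch of step (2), computing a saliency map over the extracted feature mapping; `phi` is the feature extractor and `loss_fn` a placeholder for the attack loss (e.g. the triplet loss above), both illustrative names of our own.

```python
import torch

def feature_saliency(phi, x, loss_fn):
    """Gradient-magnitude saliency over the layer-L feature map: points with
    large |d loss / d feature| are the ones worth perturbing."""
    x = x.clone().detach().requires_grad_(True)
    feats = phi(x)
    feats.retain_grad()          # keep the gradient of this intermediate tensor
    loss_fn(feats).backward()
    return feats.grad.abs()      # high magnitude = salient feature point
```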
FeatureFool: A New Adversarial Attack
[Figure: source, guide, and adversarial traffic-sign triples with the classifier's outputs, e.g. "Stop: 0.99 √", "Limited Speed: 0.98 √", "Limited Speed: 0.01 ×".]
MLaaS Model Stealing Attacks
• Our attack approach
  • Use adversarial examples to generate the malicious inputs;
  • Obtain input-output pairs by querying the black-box APIs with the malicious inputs;
  • Retrain a substitute model, generally chosen from a candidate Model Zoo.
[Figure: illustration of the proposed MLaaS model stealing attacks. The adversary crafts malicious examples (PGD, C&W, FeatureFool), sends them as inputs to the MLaaS API, collects the outputs, and searches a candidate library / Model Zoo (AlexNet, VGGNet, ResNet) for the substitute architecture.]
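A minimal sketch of the query-and-collect loop, reusing `pgd_attack` from the earlier sketch as the stand-in attack and a `victim_labels_fn` placeholder that wraps the MLaaS Prediction API and returns an integer class label; all names are ours.

```python
import torch

def build_synthetic_dataset(seed_images, local_model, victim_labels_fn):
    """For each seed image: craft a malicious example with a local attack
    (PGD/C&W/FeatureFool; pgd_attack stands in here), query the black-box
    victim with it, and keep the stolen label as supervision for the
    substitute model."""
    inputs, labels = [], []
    for x in seed_images:                              # x: (1, C, H, W) in [0, 1]
        y_guess = local_model(x).argmax(dim=1)         # current local prediction
        x_adv = pgd_attack(local_model, x, y_guess)    # malicious example
        labels.append(victim_labels_fn(x_adv))         # label stolen from the API
        inputs.append(x_adv)
    return torch.cat(inputs), torch.tensor(labels)
```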
MLaaS Model Stealing Attacks
• Overview of the transfer framework for the model theft attack
[Figure: (a) unlabeled synthetic dataset → (b) MLaaS query → (c) synthetic dataset with stolen labels → (d) feature transfer: reused layers are copied from the teacher, the top layers are retrained by the student (adversary) → (e) prediction. The learned decision boundary separates the genuine and malicious domains.]
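A minimal sketch of the feature-transfer step, assuming a torchvision VGG-16 as the "teacher" backbone; the backbone choice and how many classifier layers to retrain are our assumptions.

```python
import torch.nn as nn
import torchvision.models as models

def make_substitute(num_classes, retrain_last_n=2):
    """Copy a pretrained backbone (layers reused from the 'teacher'), freeze it,
    and leave only the last few classifier layers trainable so the labels stolen
    from the victim API can reshape the decision boundary."""
    net = models.vgg16(weights="IMAGENET1K_V1")       # layers copied from teacher
    for p in net.parameters():
        p.requires_grad = False                       # reused (frozen) layers

    net.classifier[6] = nn.Linear(4096, num_classes)  # new task head (trainable)
    for layer in list(net.classifier.children())[-retrain_last_n:]:
        for p in layer.parameters():
            p.requires_grad = True                    # retrained layers
    return net

# Only the trainable parameters go to the optimizer, e.g.:
# params = [p for p in make_substitute(10).parameters() if p.requires_grad]
# opt = torch.optim.SGD(params, lr=1e-3, momentum=0.9)
```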