Adversarial Attacks and Defenses in Deep Learning Hang Su - PowerPoint PPT Presentation


  1. Adversarial Attacks and Defenses in Deep Learning Hang Su suhangss@tsinghua.edu.cn Institute for Artificial Intelligence Dept. of Computer Science & Technology Tsinghua University 1

  2. Background ⚫ Artificial intelligence (AI) is a transformative technology that promises tremendous societal and economic benefit and has achieved dramatic success across a wide range of applications ⚫ AI has the potential to revolutionize how we live, work, learn, discover, and communicate. 2

  3. AI is NOT Trustworthy ⚫ "AI — The Revolution Hasn't Happened Yet." (Michael Jordan) ⚫ The effectiveness of AI algorithms will be limited by the machine's inability to explain its decisions and actions to human users. ⚫ Several machine learning models, including neural networks, consistently misclassify adversarial examples [Figure: adversarial images confidently misclassified as Alps (94.39%), Dog (99.99%), Puffer (97.99%), Crab (100.00%)] 3

  4. Content ⚫ Understandable: traceability, explainability and communication ⚫ Adversarially robust: resilience to attack and security [Diagram: Trustworthy AI = Understandable + (Adversarially) Robust] 4

  5. Robustness ⚫ A crucial component of achieving Trustworthy AI is technical robustness ⚫ Technical robustness requires that AI systems be developed with a preventative approach to risks [Figure: adversarial images confidently misclassified as Alps (94.39%), Dog (99.99%), Puffer (97.99%), Crab (100.00%)] • Y. Dong et al., Boosting Adversarial Attacks with Momentum, In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, 2018. • F. Liao et al., Defense Against Adversarial Attacks Using High-Level Representation Guided Denoiser, In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, 2018. 5

  6. The world can be adversarial ⚫ We need to demystify black-box models and develop more transparent and interpretable models to make them more trustworthy and robust ➢ DNNs can be easily duped by adversarial examples crafted by adding small, human-imperceptible noise ➢ This may pose severe risks for numerous applications [Figures: adversarial attack on social networks (Dai et al., ICML 2018); physical attack on face recognition (Sharif et al., 2016)] 6

  7. Is ML inherently not reliable? ⚫ No: but we need to re-think how we do ML ⚫ Adversarial aspects = stress-testing our solutions ⚫ Towards adversarially robust models [Figure: "pig" (91%) + 0.005 x noise = "airliner" (99%)] 7

  8. A Limitation of the ML Framework ⚫ Measure of performance: fraction of mistakes during testing ⚫ But: in reality, the distributions we deploy ML on are NOT the ones we train it on [Figure: training vs. inference distributions, matched in the standard setting and mismatched in the adversarial setting] 8
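One standard way to formalize this mismatch (not on the slide, but common in the robustness literature) is to replace the standard risk with a worst-case risk over norm-bounded perturbations:

$$\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[\max_{\|\delta\|_{\infty}\le \epsilon} \mathcal{L}\big(f_{\theta}(x+\delta),\, y\big)\Big]$$

The inner maximization is the attacker's problem studied in the following slides; the outer minimization is the defender's.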

  9. Adversary-aware Machine Learning ⚫ Machine learning systems should be aware of the arms race with the adversary ➢ Know your adversary ➢ Be proactive ➢ Protect your classifier [Diagram: the system designer models the adversary, simulates the attack, evaluates the attack's impact, and develops a countermeasure, in a loop] 9

  10. Adversarial Attack Scenarios ⚫ White-box attack (WBA): the attacker has access to all information about the target classifier, including its predictions, gradients, etc. ⚫ Practical black-box attack (PBA): only the prediction of the target classifier is available. When the prediction confidence is accessible: PBA-C; when only the discrete label is available: PBA-D. ⚫ Restricted black-box attack (RBA): black-box queries are allowed only on some samples, and the attacker must craft adversarial perturbations for other samples. Knowledge available to the attacker: WBA > PBA-C > PBA-D > RBA 10

  11. White-box attacks ⚫ Consider a binary classifier $f(x) = \mathrm{sign}(g(x))$, where $f(x) = +1$ denotes malicious and $f(x) = -1$ denotes legitimate ⚫ A white-box evasion attack perturbs a sample $x$ into $x'$ that minimizes the discriminant function while staying close to $x$:
$$\min_{x'} \; g(x') \quad \text{s.t.} \quad \|x' - x\| \le d_{\max}$$
11
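As a concrete sketch of how a white-box attacker uses this access on a neural network (not from the slide; the PyTorch model interface, the [0, 1] image range, and the hyperparameters are assumptions), an iterative L-infinity attack ascends the classification loss using the model's own gradients:

    import torch
    import torch.nn.functional as F

    def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
        """Minimal white-box L_inf attack sketch: repeatedly step along the
        sign of the input gradient and project back into the eps-ball."""
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            with torch.no_grad():
                x_adv = x_adv + alpha * grad.sign()        # ascend the loss
                x_adv = x + (x_adv - x).clamp(-eps, eps)   # project to the eps-ball
                x_adv = x_adv.clamp(0, 1)                  # keep a valid image
        return x_adv.detach()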

  12. Black-box attacks (transferability) ⚫ Cross-model transferability (Liu et al., 2017) ⚫ Cross-data transferability (Moosavi-Dezfooli et al., 2017) 12

  13. Limitations of black-box attacks ⚫ The trade-off between transferability and attack ability makes black-box attacks less effective.
[Figure: success rate (%) of I-FGSM over 1-10 iterations, evaluated against Inc-v3, Inc-v4, IncRes-v2 and Res-152]
• Attack Inception V3;
• Evaluate the success rates of the attacks on Inception V3, Inception V4, Inception ResNet V2, ResNet v2-152;
• ϵ = 16;
• 1000 images from ImageNet. 13

  14. Momentum iterative FGSM (MI-FGSM) [CVPR18] Dong et al., Boosting Adversarial Attacks with Momentum, CVPR 2018 * winning solution in the NIPS 2017 adversarial attack competition 14
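The update rule from the cited paper accumulates a momentum of L1-normalized gradients and then takes a sign step. A minimal PyTorch sketch (the model interface, the [0, 1] image range, and the NCHW batch layout are assumptions):

    import torch
    import torch.nn.functional as F

    def mi_fgsm(model, x, y, eps=16/255, steps=10, mu=1.0):
        """Momentum Iterative FGSM (Dong et al., CVPR 2018), L_inf version.
        Setting mu=0 recovers plain I-FGSM."""
        alpha = eps / steps               # step size so that steps * alpha = eps
        g = torch.zeros_like(x)           # accumulated momentum
        x_adv = x.clone().detach()
        for _ in range(steps):
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            with torch.no_grad():
                # normalize the current gradient by its L1 norm, then accumulate
                g = mu * g + grad / grad.abs().sum(dim=(1, 2, 3), keepdim=True)
                x_adv = x_adv + alpha * g.sign()
                x_adv = x + (x_adv - x).clamp(-eps, eps)
                x_adv = x_adv.clamp(0, 1)
        return x_adv.detach()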

  15. Experimental Results ⚫ ε = 16, μ = 1.0, 10 iterations ➢ MI-FGSM can attack a white-box model with near-100% success rates ➢ It fools black-box models with much higher success rates 15

  16. Query-based Black-box Attacks ⚫ Transfer-based ❖ Generate adversarial examples against white-box models, and leverage transferability for attacks; ❖ Require no knowledge of the target model, no queries; ❖ Need white-box models (datasets); ⚫ Score-based ❖ The target model provides output probability distribution; ❖ Black-box optimization by gradient estimation methods; ❖ Impractical in some real-world applications; ⚫ Decision-based ❖ The target model only provides hard-label predictions; ❖ Practical in real-world applications; ❖ Need a large number of queries

  17. Score-based Attacks ⚫ The attacker can query the loss $f(x)$ of the target model at any input $x$ ⚫ Goal: maximize $f(x)$ until the attack succeeds ⚫ Estimate the gradient $\nabla f(x)$ from queries, and apply first-order optimization methods:
$$\hat{g} = \frac{1}{q}\sum_{i=1}^{q} \hat{g}_i, \qquad \hat{g}_i = \frac{f(x + \sigma u_i) - f(x)}{\sigma}\, u_i$$
➢ In the ordinary random gradient-free (RGF) method, each $u_i$ is sampled uniformly from the $D$-dimensional unit hypersphere. 17
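A minimal sketch of this estimator (the loss_fn interface returning a scalar loss per query, the query count q, and the smoothing parameter sigma are assumptions):

    import torch

    def rgf_gradient(loss_fn, x, q=50, sigma=1e-4):
        """Random gradient-free (RGF) estimate of the gradient of loss_fn at x,
        averaging q finite-difference estimates along random unit directions."""
        f_x = loss_fn(x)
        g_hat = torch.zeros_like(x)
        for _ in range(q):
            u = torch.randn_like(x)
            u = u / u.norm()                    # uniform direction on the unit sphere
            g_hat += (loss_fn(x + sigma * u) - f_x) / sigma * u
        return g_hat / q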

  18. Gradient estimation framework ⚫ The quality of an estimator $\hat{g}$ is measured by the loss
$$L(\hat{g}) = \min_{b \ge 0}\; \mathbb{E}\,\big\|\nabla f(x) - b\,\hat{g}\big\|_2^2$$
⚫ i.e., the mean squared error minimized w.r.t. the scale coefficient $b$ ➢ Usually the normalized gradient is used in the update, hence the norm (scale) of $\hat{g}$ does not matter 18
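For completeness (a one-line derivation, not spelled out on the slide), expanding the quadratic and minimizing over the scalar $b$ (assuming $\nabla f(x)^{\top}\mathbb{E}[\hat{g}] \ge 0$) gives

$$b^{*} = \frac{\nabla f(x)^{\top}\,\mathbb{E}[\hat{g}]}{\mathbb{E}\,\|\hat{g}\|_2^2},
\qquad
L(\hat{g}) = \|\nabla f(x)\|_2^2 - \frac{\big(\nabla f(x)^{\top}\,\mathbb{E}[\hat{g}]\big)^{2}}{\mathbb{E}\,\|\hat{g}\|_2^2}$$

so the estimator is good when its mean aligns with the true gradient and its second moment is small; the result on the next slide evaluates this expression for RGF sampling with covariance $C$.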

  19. Prior-guided RGF (P-RGF) method ⚫ Use the normalized ($\|v\|_2 = 1$) transfer gradient $v$ of a surrogate model as a prior ⚫ For sampling directions with covariance $C = \mathbb{E}[u_i u_i^{\top}]$, the loss of the RGF estimator satisfies
$$\lim_{\sigma \to 0} L(\hat{g}) = \|\nabla f(x)\|_2^2 - \frac{\big(\nabla f(x)^{\top} C\, \nabla f(x)\big)^{2}}{\frac{1}{q}\,\nabla f(x)^{\top} C\, \nabla f(x) + \big(1 - \frac{1}{q}\big)\,\nabla f(x)^{\top} C^{2}\, \nabla f(x)}$$
⚫ The gradient estimator can be implemented by sampling
$$u_i = \sqrt{\lambda}\, v + \sqrt{1-\lambda}\,\big(\mathbf{I} - v v^{\top}\big)\,\xi_i$$
where $\xi_i$ is uniform on the unit hypersphere and $\lambda \in [0,1]$ controls how much the prior is trusted ⚫ Incorporate the data prior to further accelerate the gradient estimation 19
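A minimal sketch of this biased sampling step (the flattened 1-D tensor shapes, the fixed lambda, and the final renormalization are assumptions; the paper additionally chooses lambda adaptively and exploits a data prior, which is not shown here):

    import torch

    def prgf_direction(v, lam=0.5):
        """Sample one P-RGF search direction biased toward the normalized
        transfer gradient v (a flattened 1-D tensor with ||v|| = 1)."""
        xi = torch.randn_like(v)
        xi = xi / xi.norm()                 # random unit direction
        xi_orth = xi - (v @ xi) * v         # project out the v component: (I - v v^T) xi
        u = lam**0.5 * v + (1 - lam)**0.5 * xi_orth
        return u / u.norm()                 # renormalize (the paper's normalization may differ slightly)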

  20. Performance of gradient estimation ⚫ Cosine similarity (averaged over all images) between the gradient estimate and the true gradient w.r.t. attack iterations [Figure: cosine similarity curves over attack iterations] ⚫ The transfer gradient is more useful at the beginning of the attack and less useful later ➢ This shows the advantage of using an adaptive $\lambda^{*}$ 20

  21. Results on defensive models ⚫ ASR: attack success rate (with the number of queries capped at 10,000); AVG. Q: average number of queries over successful attacks. ⚫ Methods with the subscript "D" refer to the data-dependent version of the P-RGF method. [Results table not included in this transcript] 21

  22. Query-based Black-box Attacks (recap) ⚫ Transfer-based and score-based attacks were covered above; we now turn to decision-based attacks ❖ The target model only provides hard-label predictions; ❖ Practical in real-world applications; ❖ Need a large number of queries 22

  23. Query-based Adversarial Attack ⚫ We search for an adversarial example by modeling the local geometry of the search directions and reducing the dimension of the search space [Figure: original image and adversarial images generated against a black-box model after 1,000, 10,000 and 100,000 queries] 23

  24. Objective Function ⚫ Constrained optimization problem
$$\min_{x^{*}} \; D(x^{*}, x) \quad \text{s.t.} \quad C\big(f(x^{*})\big) = 1$$
❖ $D(\cdot,\cdot)$ is a distance metric; $C(\cdot)$ is an adversarial criterion, with $C(f(x^{*})) = 1$ when $x^{*}$ is adversarial and $C(f(x)) = 0$ for the original $x$ ⚫ A reformulation as an unconstrained problem
$$\min_{x^{*}} \; L(x^{*}) = D(x^{*}, x) + \delta\big(C(f(x^{*})) = 1\big)$$
where $\delta(a) = 0$ if $a$ is true and $+\infty$ otherwise, so $L$ is finite only outside the non-adversarial region [Diagram: non-adversarial region around $x$] ⚫ A black-box gradient estimation is implemented via query-based local search 24

  25. Evolutionary Attack ⚫ (1+1) covariance matrix adaptation evolution strategy (CMA-ES), sketched in the code below
❖ Initialize $\tilde{x}^{*} \in \mathbb{R}^{n}$ (already adversarial)
❖ For $t = 1, 2, \ldots, T$:
➢ Sample $z \sim \mathcal{N}(0, \sigma^{2} C)$
➢ If $L(\tilde{x}^{*} + z) < L(\tilde{x}^{*})$: set $\tilde{x}^{*} = \tilde{x}^{*} + z$
➢ Update $(\sigma, C)$
❖ Return $\tilde{x}^{*}$
⚫ Model the local geometry of the search directions ⚫ Reduce the dimension of the search space 25
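A minimal sketch of this loop (the loss function L from slide 24 is passed in as loss_fn; the query budget, the hyperparameters, and keeping sigma fixed instead of adapting it by a success-rate rule are simplifying assumptions; update_cma is the diagonal covariance update given on the next slide):

    import torch

    def evolutionary_attack(loss_fn, x_adv, T=10000, sigma=0.01):
        """(1+1)-ES with a diagonal covariance: start from a candidate that is
        already adversarial and accept a Gaussian mutation whenever it lowers
        the loss L(x*) = D(x*, x) + delta(C(f(x*)) = 1)."""
        c_diag = torch.ones_like(x_adv)    # diagonal of the covariance matrix C
        p_c = torch.zeros_like(x_adv)      # evolution path
        best = loss_fn(x_adv)
        for _ in range(T):
            z = sigma * c_diag.sqrt() * torch.randn_like(x_adv)   # z ~ N(0, sigma^2 C)
            cand = loss_fn(x_adv + z)
            if cand < best:                 # accept the mutation
                x_adv, best = x_adv + z, cand
                p_c, c_diag = update_cma(p_c, c_diag, z, sigma)   # defined after slide 26
        return x_adv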

  26. Covariance Matrix Adaptation ⚫ The storage and computational complexity of a full covariance matrix is at least $O(n^{2})$ ⚫ We therefore use a diagonal covariance matrix ⚫ Update rule (after each successful mutation $z$):
$$p_c = (1 - c_c)\, p_c + \sqrt{c_c\,(2 - c_c)}\;\frac{z}{\sigma}, \qquad c_{ii} = (1 - c_{\mathrm{cov}})\, c_{ii} + c_{\mathrm{cov}}\, \big(p_c\big)_i^{2}$$
26
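A sketch of this diagonal update, matching the update_cma helper used in the loop above (the learning rates c_c and c_cov are assumed default values, not taken from the slide):

    def update_cma(p_c, c_diag, z, sigma, c_c=0.01, c_cov=0.001):
        """Diagonal covariance matrix adaptation after a successful mutation z:
        refresh the evolution path p_c, then pull each diagonal entry of C
        toward the square of the path."""
        p_c = (1 - c_c) * p_c + (c_c * (2 - c_c)) ** 0.5 * (z / sigma)
        c_diag = (1 - c_cov) * c_diag + c_cov * p_c ** 2
        return p_c, c_diag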
