Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples

  1. Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples
     Guanhong Tao, Shiqing Ma, Yingqi Liu, Xiangyu Zhang

  8. Understanding Adversarial Samples
     • Figure (labels only): legitimate input; model; human; C&W₂ attack; pixel-wise differences (× 50 times); identities Isla Fisher and A.J. Buckley
     • Idea: is the classification result of a model mainly based on human-perceptible attributes?
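
The × 50 amplification of pixel-wise differences named on this slide can be illustrated with a minimal sketch. The function name `amplified_difference`, the [0, 1] pixel range, the clipping, and the toy 2x2 images are all assumptions made for illustration; the slides do not say how the figure was rendered.

```python
import numpy as np

def amplified_difference(legit, adversarial, factor=50.0):
    """Absolute per-pixel difference, scaled by `factor` and clipped to [0, 1].

    Assumes both images are arrays of the same shape with values in [0, 1].
    """
    diff = np.abs(adversarial.astype(np.float64) - legit.astype(np.float64))
    return np.clip(diff * factor, 0.0, 1.0)

# Hypothetical 2x2 grayscale images (not data from the talk).
legit = np.array([[0.50, 0.50], [0.50, 0.50]])
adversarial = np.array([[0.50, 0.51], [0.49, 0.53]])

# Differences of 0.01 become 0.5 after scaling; 0.03 saturates at 1.0.
amplified = amplified_difference(legit, adversarial)
```

Without the amplification factor, perturbations of this size would be close to invisible when displayed, which is the point the slide appears to make.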

  16. Architecture of AmI
      • Input
      • (1) Landmark generation
      • (2) Attribute annotation: ✓ Left eye, ✓ Right eye, ✓ Nose, ✓ Mouth, ✓ …
      • (3) Attribute witness extraction
      • (4) Attribute-steered model
      • (5) Consistency observer (⊖)
      • Original model
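
The consistency observer in step 5 can be sketched as a comparison of the labels predicted by the original model and the attribute-steered model. This is a minimal sketch under assumptions: the slide only names the component, so the disagreement rule, the function name `consistency_observer`, and the two toy "models" below are hypothetical stand-ins rather than the authors' implementation.

```python
import numpy as np

def consistency_observer(x, original_model, steered_model):
    """Return True (flag x as adversarial) when the two models disagree."""
    original_label = int(np.argmax(original_model(x)))
    steered_label = int(np.argmax(steered_model(x)))
    return original_label != steered_label

# Hypothetical stand-ins: each "model" maps a feature vector to class scores.
def original_model(x):
    return x

def steered_model(x):
    # Toy steering: strengthen the first score and weaken the second.
    return x * np.array([2.0, 0.5])

# Both models pick class 0, so the input is not flagged.
benign_flag = consistency_observer(np.array([0.6, 0.4]), original_model, steered_model)

# The original model picks class 1, the steered model picks class 0: flagged.
suspicious_flag = consistency_observer(np.array([0.45, 0.55]), original_model, steered_model)
```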

  22. Challenges
      • Are there correspondences between attributes and neurons?
      • If yes, how to extract corresponding neurons?
      • Propose: Bi-directional reasoning
        ‣ Forward: attribute changes → neuron activation changes
        ‣ Backward: neuron activation changes → attribute changes
        ‣ Backward: no attribute changes → no neuron activation changes

  27. Attribute Witness Extraction
      • Figure (labels only): Input; Model; ⊖ comparison in each branch
        ‣ A: Attribute substitution, C: Feature variants
        ‣ B: Attribute preservation, D: Feature invariants
        ‣ E: Attribute witnesses
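
One way to read the diagram is that witnesses are the neurons that change under attribute substitution and stay unchanged under attribute preservation. The sketch below follows that reading. It is an assumption-laden illustration, not the authors' code: the intersection rule, the tolerance `tol`, the function name `extract_witnesses`, and the toy activation vectors are all hypothetical.

```python
import numpy as np

def extract_witnesses(act_original, act_substituted, act_preserved, tol=1e-3):
    """Return indices of neurons treated as witnesses for one attribute.

    act_original:    activations for the unmodified input
    act_substituted: activations after the attribute is replaced
    act_preserved:   activations after everything except the attribute is replaced
    """
    # Feature variants: neurons whose activation changes under substitution.
    variants = np.abs(act_substituted - act_original) > tol
    # Feature invariants: neurons whose activation is stable under preservation.
    invariants = np.abs(act_preserved - act_original) <= tol
    # Assumed rule: a witness satisfies both conditions.
    return np.flatnonzero(variants & invariants)

# Hypothetical activations of a 4-neuron layer (not data from the talk).
act_original = np.array([1.0, 2.0, 3.0, 4.0])
act_substituted = np.array([1.0, 2.5, 3.7, 4.0])  # neurons 1 and 2 change
act_preserved = np.array([1.4, 2.0, 3.9, 4.0])    # neurons 1 and 3 stay the same

witnesses = extract_witnesses(act_original, act_substituted, act_preserved)
```

In this toy layer only neuron 1 both reacts to the attribute change and ignores changes elsewhere, which matches the forward and backward conditions of bi-directional reasoning.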

  33. Experimental Results
      • Attribute witnesses
        ‣ The number of witnesses extracted is smaller than 20, although there are 64-4096 neurons in each layer
      • Adversary detection
        ‣ Achieve 94% detection accuracy for 7 different kinds of attacks with 9.91% false positives on benign inputs
        ‣ A state-of-the-art technique, Feature Squeezing (NDSS '18), can only achieve 55% accuracy with 23.3% false positives for face recognition systems
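
For readers who want to relate the two reported numbers, the sketch below shows one plausible way such metrics are computed. It assumes that detection accuracy is the fraction of adversarial inputs that are flagged and that the false-positive rate is the fraction of benign inputs that are flagged; the slides do not define either term. The function name `detection_metrics` and the flag lists are hypothetical and do not reproduce the reported results.

```python
def detection_metrics(flags_on_adversarial, flags_on_benign):
    """Compute (detection accuracy, false-positive rate) from detector flags.

    Each argument is a list of booleans, True meaning "flagged as adversarial".
    """
    detection_accuracy = sum(flags_on_adversarial) / len(flags_on_adversarial)
    false_positive_rate = sum(flags_on_benign) / len(flags_on_benign)
    return detection_accuracy, false_positive_rate

# Hypothetical detector output on 10 adversarial and 10 benign inputs.
flags_on_adversarial = [True] * 9 + [False]
flags_on_benign = [True] + [False] * 9

accuracy, fpr = detection_metrics(flags_on_adversarial, flags_on_benign)
```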

  34. Thank you! Please visit our poster #99 05:00-07:00 PM @ Room 210 & 230 AB
