Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples
Guanhong Tao, Shiqing Ma, Yingqi Liu, Xiangyu Zhang
Understanding Adversarial Samples
[Figure: a legitimate input and its C&W₂ attack counterpart, with pixel-wise differences (×50). Figure labels: Legitimate input, Isla Fisher, Model, Human, A.J. Buckley.]
• Idea: is the classification result of a model mainly based on human-perceptible attributes?
Architecture of AmI
[Figure: pipeline starting from Input, with the Original model shown alongside the numbered components below.]
1. Landmark generation
2. Attribute annotation (✓ Left eye, ✓ Right eye, ✓ Nose, ✓ Mouth, ✓ …)
3. Attribute witness extraction
4. Attribute-steered model
5. Consistency observer (⊖)
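Note: the role of the consistency observer (step 5) can be illustrated with a minimal sketch. It assumes the observer simply compares the label predicted by the original model with the label predicted by the attribute-steered model and flags a disagreement; the slides do not spell out the comparison rule. The function names and the two toy models are hypothetical stand-ins, not the authors' implementation.

```python
from typing import Callable, Sequence

# Assumption: a prediction is an integer identity label.
Model = Callable[[Sequence[float]], int]


def is_adversarial(x: Sequence[float], original_model: Model, steered_model: Model) -> bool:
    """Flag the input when the two models disagree on the predicted label."""
    return original_model(x) != steered_model(x)


# Toy stand-ins for the two models (purely illustrative, not real face recognizers).
def toy_original(x: Sequence[float]) -> int:
    # Decides from all features together.
    return 0 if sum(x) >= 0 else 1


def toy_steered(x: Sequence[float]) -> int:
    # Decides from a single "attribute" feature only.
    return 0 if x[0] >= 0 else 1


benign = [1.0, 0.5, 0.2]
suspicious = [-0.1, 0.5, 0.2]

print(is_adversarial(benign, toy_original, toy_steered))      # models agree
print(is_adversarial(suspicious, toy_original, toy_steered))  # models disagree
```

In this toy setup the second input is flagged because the model that relies only on the attribute feature changes its answer while the original model does not.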
Challenges
• Are there correspondences between attributes and neurons?
• If yes, how to extract corresponding neurons?
• Propose: Bi-directional reasoning
  ‣ Forward: attribute changes → neuron activation changes
  ‣ Backward: neuron activation changes → attribute changes
  ‣ Backward: no attribute changes → no neuron activation changes
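Attribute Witness Extraction
[Figure: the Input goes through two paths, each compared (⊖) against the Model's activations on the Input. Attribute substitution yields feature variants; attribute preservation yields feature invariants; the two are combined into attribute witnesses. Panel labels: A, B, C, D, E.]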
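Note: one way to read the diagram is as a set intersection, sketched below. It assumes that "feature variants" are the neurons whose activation changes when the attribute is substituted, that "feature invariants" are the neurons whose activation stays put when the attribute is preserved and the rest of the image changes, and that witnesses are the neurons in both sets. The fixed threshold, the function names, and the activation values are hypothetical; the slides do not give the actual selection criterion.

```python
from typing import List, Set


def changed_neurons(base: List[float], other: List[float], threshold: float) -> Set[int]:
    """Indices of neurons whose activation differs from `base` by more than `threshold`."""
    return {i for i, (a, b) in enumerate(zip(base, other)) if abs(a - b) > threshold}


def attribute_witnesses(
    base: List[float],
    substituted: List[float],
    preserved: List[float],
    threshold: float = 0.5,  # assumption: a fixed absolute threshold
) -> Set[int]:
    # Forward direction: attribute changes -> activation changes.
    variants = changed_neurons(base, substituted, threshold)
    # Backward direction: no attribute change -> no activation change.
    invariants = set(range(len(base))) - changed_neurons(base, preserved, threshold)
    return variants & invariants


# Toy activations of one four-neuron layer (made-up numbers).
base = [0.1, 0.9, 0.4, 0.7]          # original input
substituted = [0.1, 0.1, 1.2, 0.0]   # attribute replaced, rest kept
preserved = [0.8, 0.85, 0.45, 0.0]   # attribute kept, rest replaced

print(sorted(attribute_witnesses(base, substituted, preserved)))  # prints [1, 2]
```

Neuron 3 reacts to the substitution but also reacts when the attribute is preserved, so it is excluded; neuron 0 never reacts to the substitution. Only neurons 1 and 2 satisfy both directions.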
Experimental Results
• Attribute witnesses
  ‣ The number of witnesses extracted is smaller than 20, although there are 64-4096 neurons in each layer
• Adversary detection
  ‣ Achieves 94% detection accuracy for 7 different kinds of attacks, with 9.91% false positives on benign inputs
  ‣ A state-of-the-art technique, Feature Squeezing (NDSS '18), achieves only 55% accuracy with 23.3% false positives for face recognition systems
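Note: the two reported quantities can be computed from per-input detector flags as sketched below. This assumes that detection accuracy is the fraction of adversarial inputs that are flagged and that the false positive rate is the fraction of benign inputs that are flagged; the slides do not define the metrics explicitly. The flag lists are toy values, not the reported results.

```python
from typing import Sequence, Tuple


def detection_metrics(
    flags_adversarial: Sequence[bool],
    flags_benign: Sequence[bool],
) -> Tuple[float, float]:
    """Return (detection accuracy, false positive rate) from detector flags."""
    accuracy = sum(flags_adversarial) / len(flags_adversarial)
    false_positive_rate = sum(flags_benign) / len(flags_benign)
    return accuracy, false_positive_rate


# Toy detector outputs (True means "flagged as adversarial").
toy_adv_flags = [True, True, True, False]             # 3 of 4 adversarial inputs caught
toy_benign_flags = [False, False, True, False, False] # 1 of 5 benign inputs flagged

acc, fpr = detection_metrics(toy_adv_flags, toy_benign_flags)
print(acc, fpr)  # prints 0.75 0.2
```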
Thank you! Please visit our poster #99 05:00-07:00 PM @ Room 210 & 230 AB