Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks

Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks. David Stutz, Matthias Hein, Bernt Schiele. 2-Minute Overview. Problem: robustness to various adversarial examples. Adversarial training on L∞ adversarial examples:


  1. Confidence-Calibrated Adversarial Training: Generalizing to Unseen Attacks. David Stutz, Matthias Hein, Bernt Schiele

  2. 2-Minute Overview. Problem: robustness to various adversarial examples. Adversarial training on L∞ adversarial examples (training ε = 0.03): [Figure, SVHN: confidence vs. L∞ perturbation in adversarial direction, 0 to 0.05; legend: correct, adversarial; annotation: robust for ≤ ε (seen).] Confidence-Calibrated Adversarial Training – David Stutz

  3. 2-Minute Overview. Problem: robustness to various adversarial examples. Adversarial training on L∞ adversarial examples (training ε = 0.03): [Figure, SVHN: confidence vs. L∞ perturbation in adversarial direction, 0 to 0.05; legend: correct, adversarial; annotations: robust for ≤ ε (seen), not robust for > ε (unseen).]

  4. 2-Minute Overview. Problem: robustness to various adversarial examples. Adversarial training on L∞ adversarial examples: [Figure, SVHN: confidence vs. L2 perturbation in adversarial direction, 0 to 2; legend: correct, adversarial; annotation: not robust against the L2 attack (unseen).]

  5. 2-Minute Overview. Summary of adversarial training (training ε = 0.03): [Figures: confidence vs. L∞ perturbation in adversarial direction, 0 to 0.05 (robust for ≤ ε, seen; not robust for > ε, unseen); confidence vs. L2 perturbation in adversarial direction, 0 to 2 (not robust against the unseen L2 attack).]
     - High confidence on adversarial examples (≤ ε).
     - No generalization to larger or other Lp perturbations.
     - Behavior is not meaningful for arbitrarily large ε.

  6. 2-Minute Overview. Confidence-calibrated adversarial training (L∞ only, training ε = 0.03): [Figure, SVHN: confidence vs. L∞ perturbation in adversarial direction, 0 to 0.05; legend: correct, adversarial; annotation: ≤ ε (seen).]

  7. 2-Minute Overview. Confidence-calibrated adversarial training (L∞ only, training ε = 0.03): [Figure, SVHN: confidence vs. L∞ perturbation in adversarial direction, 0 to 0.05; legend: correct, adversarial; annotations: ≤ ε (seen), > ε (unseen), confidence threshold, robust by rejecting.]

  8. 2-Minute Overview. Confidence-calibrated adversarial training (L∞ only): [Figure, SVHN: confidence vs. L2 perturbation in adversarial direction, 0 to 2; legend: correct, adversarial; annotations: unseen L2 attack, confidence threshold, robust by rejecting.]

  9. 2-Minute Overview.
     Adversarial training (training ε = 0.03):
     - High confidence on adversarial examples.
     - No robustness to unseen perturbations.
     [Figure: confidence vs. L∞ perturbation, 0 to 0.05; robust for ≤ ε (seen), not robust for > ε (unseen).]
     Confidence-calibrated adversarial training (training ε = 0.03):
     - Low confidence on adversarial examples.
     - Robustness to unseen perturbations by confidence thresholding.
     [Figure: confidence vs. L∞ perturbation, 0 to 0.05; ≤ ε (seen), > ε (unseen); confidence threshold; robust by rejecting.]

  10. Interested? More details: paper and code at davidstutz.de/ccat; contact: david.stutz@mpi-inf.mpg.de

  11. Outline:
      1. Problems of adversarial training
      2. Confidence-calibrated adversarial training
      3. Confidence-thresholded robust test error
      4. Results on SVHN and CIFAR10

  12. Problems of Adversarial Training. Min-max formulation for the classifier $f$ with weights $w$:
      $$\min_w \mathbb{E}_{p(x,y)}\left[\max_{\|\delta\|_\infty \le \epsilon} \mathcal{L}(f(x + \delta; w), y)\right]$$
      Minimizing the cross-entropy loss $\mathcal{L}$ on adversarial examples yields high-confidence predictions.

  13. Problems of Adversarial Training. Min-max formulation for the classifier $f$ with weights $w$:
      $$\min_w \mathbb{E}_{p(x,y)}\left[\max_{\|\delta\|_\infty \le \epsilon} \mathcal{L}(f(x + \delta; w), y)\right]$$
      Minimizing the cross-entropy loss $\mathcal{L}$ on adversarial examples yields high-confidence predictions.
      [Figures (training ε = 0.03): confidence vs. L∞ perturbation in adversarial direction, 0 to 0.05 (robust for ≤ ε, seen; not robust for > ε, unseen); confidence vs. L2 perturbation in adversarial direction, 0 to 2 (not robust against the unseen L2 attack).]
      - Robustness does not generalize to unseen attacks.
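
The inner maximization of the min-max formulation is commonly approximated with a few steps of projected gradient ascent. The following is a minimal sketch under stated assumptions, not the authors' implementation: a linear softmax classifier stands in for f, the names `linf_attack`, `steps`, and `step_size` are hypothetical, and clipping x + δ to a valid image range is omitted.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(W, x, y):
    """Loss L(f(x; W), y) of a linear softmax classifier f(x; W) = softmax(W x)."""
    return -np.log(softmax(W @ x)[y])

def linf_attack(W, x, y, eps, steps=10, step_size=None):
    """Approximate the max over ||delta||_inf <= eps of L(f(x + delta; W), y)."""
    step_size = eps / 4 if step_size is None else step_size
    delta = np.zeros_like(x)
    for _ in range(steps):
        grad_logits = softmax(W @ (x + delta))
        grad_logits[y] -= 1.0          # dL/dlogits = softmax - one_hot(y)
        grad_x = W.T @ grad_logits     # chain rule through logits = W x
        # Signed ascent step, then projection onto the L_inf ball of radius eps.
        delta = np.clip(delta + step_size * np.sign(grad_x), -eps, eps)
    return delta

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))            # hypothetical weights: 3 classes, 5 features
x = rng.normal(size=5)
y = 0
delta = linf_attack(W, x, y, eps=0.03)
clean_loss = cross_entropy(W, x, y)
adv_loss = cross_entropy(W, x + delta, y)
```

Adversarial training would then update w on the loss at x + delta. With a one-hot label, that update pushes the classifier towards high confidence on adversarial examples inside the ε-ball, which is the behavior the slide points out.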

  14. Confidence-Calibrated Adversarial Training. (1) Transition to a uniform distribution on adversarial examples within the ε-ball (training ε = 0.03). [Figure: confidence vs. L∞ perturbation in (adversarial) direction, −0.04 to 0.04.]
      - Low confidence is extrapolated beyond the ε-ball.

  15. Confidence-Calibrated Adversarial Training. (1) Transition to low confidence on adversarial examples within the ε-ball. (2) Reject low-confidence (adversarial) examples via confidence thresholding (training ε = 0.03). [Figures: confidence vs. L∞ perturbation, 0 to 0.04, with the confidence threshold below which examples are rejected; histogram for CCAT of confidence on adversarial examples, 0 to 1, with the rejected region marked.]

  16. (1) Transition to Low Confidence.
      1. Compute high-confidence adversarial examples, where $f_k$ is the confidence of class $k$:
         $$\tilde{\delta} = \operatorname*{argmax}_{\|\delta\|_\infty \le \epsilon} \max_{k \ne y} f_k(x + \delta; w)$$
      2. Impose the target distribution $\tilde{y}$ over $K$ classes via the cross-entropy loss:
         $$\tilde{y} = \lambda\, \text{one\_hot}(y) + (1 - \lambda)\, \tfrac{1}{K}, \qquad \lambda = \left(1 - \min(1, \|\delta\|_\infty / \epsilon)\right)^{\rho}$$
      [Figure: target distribution ỹ vs. L∞ perturbation ‖δ‖∞, 0 to 0.03; transition from one-hot to completely uniform at ‖δ‖∞ = ε.]
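
The target distribution from slide 16 can be computed directly from the perturbation norm. This is a minimal sketch, assuming NumPy; the function names `ccat_target` and `soft_cross_entropy` are hypothetical, and the default `rho=1.0` is an arbitrary placeholder because the slide does not give a value for ρ.

```python
import numpy as np

def ccat_target(y, delta, eps, num_classes, rho=1.0):
    """Target y_tilde = lam * one_hot(y) + (1 - lam) / K from slide 16."""
    norm = np.max(np.abs(delta))                 # ||delta||_inf
    lam = (1.0 - min(1.0, norm / eps)) ** rho    # 1 at delta = 0, 0 at ||delta||_inf >= eps
    one_hot = np.zeros(num_classes)
    one_hot[y] = 1.0
    return lam * one_hot + (1.0 - lam) / num_classes

def soft_cross_entropy(probs, target):
    """Cross-entropy between predicted probabilities and a soft target."""
    return -np.sum(target * np.log(np.clip(probs, 1e-12, 1.0)))

# No perturbation keeps the one-hot label; a perturbation of norm eps gives uniform.
no_attack = ccat_target(y=2, delta=np.zeros(4), eps=0.03, num_classes=10)
full_attack = ccat_target(y=2, delta=np.full(4, 0.03), eps=0.03, num_classes=10)
loss = soft_cross_entropy(np.full(10, 0.1), full_attack)
```

Larger values of `rho` make the target fall towards uniform faster as the perturbation grows, so it is worth treating ρ as a tunable parameter in this sketch.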

  18. (2) Robustness by Confidence Thresholding (training ε = 0.03). [Figure, SVHN: confidence vs. L∞ perturbation in adversarial direction, 0 to 0.05; legend: correct, adversarial; annotation: ≤ ε (seen).]

  19. (2) Robustness by Confidence Thresholding (training ε = 0.03). [Figure, SVHN: confidence vs. L∞ perturbation in adversarial direction, 0 to 0.05; legend: correct, adversarial; annotations: ≤ ε (seen), > ε (unseen), confidence threshold, robust by rejecting.]

  20. (2) Robustness by Confidence Thresholding. [Figure, SVHN: confidence vs. L2 perturbation in adversarial direction, 0 to 2; legend: correct, adversarial; annotations: unseen L2 attack, confidence threshold, robust by rejecting.]
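
Rejecting by confidence threshold amounts to comparing the largest predicted probability against a fixed value. The sketch below illustrates this under assumptions: `predict_with_rejection`, the `REJECT` marker, and the threshold `tau=0.5` are hypothetical, and the slides do not say how the threshold is chosen.

```python
import numpy as np

REJECT = -1  # hypothetical marker for rejected inputs

def predict_with_rejection(probs, tau):
    """Return the predicted class per row, or REJECT if confidence < tau."""
    probs = np.asarray(probs)
    confidence = probs.max(axis=1)     # confidence = highest class probability
    labels = probs.argmax(axis=1)
    return np.where(confidence >= tau, labels, REJECT)

probs = np.array([
    [0.90, 0.05, 0.05],   # confident prediction, kept
    [0.40, 0.30, 0.30],   # near-uniform prediction, rejected
])
decisions = predict_with_rejection(probs, tau=0.5)
```

Because the training target moves towards uniform on adversarial examples, such inputs should fall below the threshold and be rejected rather than misclassified.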

  21. (2) Meaningful Extrapolation of Confidence. [Figures: confidence (0 to 1) vs. interpolation factor κ (0 to 1) between x and x′; one panel for adversarial training, one for confidence-calibrated adversarial training.]

  22. Summary: Generalizable Robustness. Confidence-calibrated adversarial training: (1) Transition: low confidence on adversarial examples. (2) Reject low-confidence (adversarial) examples. [Figures (training ε = 0.03): confidence vs. L2 perturbation in adversarial direction, 0 to 2 (unseen L2 attack; confidence threshold; robust by rejecting); confidence vs. L∞ perturbation in adversarial direction, 0 to 0.05 (≤ ε seen, > ε unseen; confidence threshold; robust by rejecting).]
      - Robustness to previously unseen perturbations.

  23. “Standard” Robust Test Error. RErr = error on test examples that are “attacked”. Adversarial training (AT): 57.3% RErr. Ours (CCAT): 97.8% RErr.

  24. “Standard” Robust Test Error. RErr = error on test examples that are “attacked”. Adversarial training (AT): 57.3% RErr. Ours (CCAT): 97.8% RErr. [Figures: histograms of confidence on adversarial examples, 0 to 1; AT total: 539/1000; CCAT total: 949/1000.]
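
As a rough illustration of the quantities on slides 23 and 24, the sketch below computes an error rate on attacked test examples and a histogram of confidences on adversarial examples. It is a simplified reading of the slide's one-line definition and assumes the attack already returns the worst case per example; the function names and sample data are hypothetical.

```python
import numpy as np

def robust_test_error(adv_predictions, labels):
    """Fraction of attacked test examples whose prediction differs from the label."""
    return float(np.mean(np.asarray(adv_predictions) != np.asarray(labels)))

def confidence_histogram(adv_probs, bins=10):
    """Histogram over [0, 1] of the confidence on adversarial examples."""
    confidence = np.asarray(adv_probs).max(axis=1)
    return np.histogram(confidence, bins=bins, range=(0.0, 1.0))

labels = np.array([0, 1, 1, 1])
adv_predictions = np.array([0, 1, 2, 1])     # one of four attacks succeeds
rerr = robust_test_error(adv_predictions, labels)

adv_probs = np.array([[0.95, 0.05], [0.45, 0.55]])
counts, edges = confidence_histogram(adv_probs)
```

This "standard" error ignores confidence entirely, so a model that assigns low confidence to adversarial examples gets no credit for it; the histogram shows the information that a thresholded evaluation would use.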
