On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models

  1. On Evaluation of Adversarial Perturbations for Sequence-to-Sequence Models Paul Michel, Xian Li, Graham Neubig, Juan Pino

  3. Adversarial Attacks/Perturbations ● Apply a small (indistinguishable) perturbation to the input that elicits large changes in the output. Figure from Goodfellow et al. (2014)

  8. Indistinguishable Perturbations ● Small perturbations are well-defined in vision ○ Small ℓ2 distance ≈ indistinguishable to the human eye ● What about text?

  15. Not all Text Perturbations are Equal ● Original: He’s very friendly ○ He’s pretty friendly [Similar meaning] ✔ ○ He’s very annoying [Different meaning] ❌ ○ He’s She friendly [Nonsensical] ❌ ○ He’s very freindly [Typo] ✔ ⇒ We can’t expect the model to produce the same output for every perturbation! This paper: why and how you should evaluate adversarial perturbations
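To make this concrete, the sketch below scores each perturbation with character-level edit (Levenshtein) distance, a natural text analogue of a small ℓ2 norm. This metric is our own illustrative choice, not one proposed in the slides, and `levenshtein` is a hypothetical helper written from scratch.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


original = "He's very friendly"
for perturbed, label in [
    ("He's pretty friendly", "similar meaning"),
    ("He's very annoying", "different meaning"),
    ("He's She friendly", "nonsensical"),
    ("He's very freindly", "typo"),
]:
    print(f"{levenshtein(original, perturbed):2d}  {label:17s} {perturbed}")
```

The typo is only 2 edits away, while the nonsensical “He’s She friendly” is exactly as far from the original (4 edits) as the meaning-preserving “He’s pretty friendly”, so a small surface distance alone says little about whether meaning was kept.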

  16. A Framework for Evaluating Adversarial Attacks

  23. Problem Definition ● Original (source): Ils le réinvestissent directement en engageant plus de procès. ○ Reference: They plow it right back into filing more troll lawsuits. ○ Base output: They direct it directly by engaging more cases. ⇒ Evaluate ● Attack ⇒ Adv. src: Ilss le réinvestissent dierctement en engagaent plus de procès. ○ Adv. output: .. de plus. ⇒ Evaluate too!
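The adversarial source above looks like the product of character-level edits: “dierctement” and “engagaent” each swap two adjacent letters of “directement” and “engageant”. A minimal sketch of such a swap perturbation follows. The helper names (`swap_at`, `random_swap`, `perturb_sentence`) and the random choice of positions are our assumptions for illustration; an actual attack would typically choose which words to perturb so as to damage the output as much as possible.

```python
import random


def swap_at(word: str, i: int) -> str:
    """Return `word` with the characters at positions i and i + 1 swapped."""
    if not 0 <= i < len(word) - 1:
        raise ValueError(f"position {i} out of range for {word!r}")
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def random_swap(word: str, rng: random.Random) -> str:
    """Swap one random pair of adjacent interior characters.

    The first and last characters stay fixed; words shorter than 4
    characters are returned unchanged (no interior pair to swap).
    """
    if len(word) < 4:
        return word
    # i and i + 1 must both lie in the interior range [1, len(word) - 2]
    i = rng.randint(1, len(word) - 3)
    return swap_at(word, i)


def perturb_sentence(sentence: str, n_words: int, seed: int = 0) -> str:
    """Apply `random_swap` to `n_words` randomly chosen words (hypothetical helper)."""
    rng = random.Random(seed)
    words = sentence.split()
    for idx in rng.sample(range(len(words)), k=min(n_words, len(words))):
        words[idx] = random_swap(words[idx], rng)
    return " ".join(words)


print(swap_at("directement", 2))  # dierctement
print(swap_at("engageant", 5))    # engagaent
print(perturb_sentence("Ils le réinvestissent directement en engageant plus de procès.", 2))
```

Keeping the first and last letters fixed is a common heuristic for typos that stay readable to humans; the “Ilss” edit in the example is an insertion, which this sketch does not cover.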

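  24. Source Side Evaluation ● Evaluate meaning preservation on the source side: $s_{src}(x, \hat{x})$, where $x$ is the original source and $\hat{x}$ the adversarial source ● Where $s_{src}$ is a similarity metric such that $s_{src}(\text{He’s very friendly}, \text{He’s pretty friendly}) > s_{src}(\text{He’s very friendly}, \text{He’s very annoying})$ and $s_{src}(\text{He’s very friendly}, \text{He’s pretty friendly}) > s_{src}(\text{He’s very friendly}, \text{He’s She friendly})$ [...]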
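The requirement on the metric can be checked mechanically on example triples like these. The sketch below uses word-overlap Jaccard similarity as a deliberately naive stand-in for $s_{src}$ (our assumption, not one of the metrics discussed in the talk). It fails both inequalities: each perturbation shares exactly two of four distinct words with the original, so every pair scores 0.5. The names `word_jaccard` and `respects_ordering` are hypothetical.

```python
from typing import Callable

Similarity = Callable[[str, str], float]


def word_jaccard(a: str, b: str) -> float:
    """Naive similarity: shared distinct words / all distinct words."""
    wa, wb = set(a.split()), set(b.split())
    if not wa and not wb:
        return 1.0  # two empty sentences are treated as identical
    return len(wa & wb) / len(wa | wb)


def respects_ordering(sim: Similarity, x: str, better: str, worse: str) -> bool:
    """True if `sim` rates `better` strictly closer to `x` than `worse`."""
    return sim(x, better) > sim(x, worse)


x = "He's very friendly"
for worse in ["He's very annoying", "He's She friendly"]:
    print(worse, respects_ordering(word_jaccard, x, "He's pretty friendly", worse))  # False for both
```

Any candidate metric can be plugged in as `sim`, which makes it easy to test metrics against a small set of hand-labelled triples before trusting them on real attacks.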

  26. Target Side Evaluation ● Given $s_{tgt}$, a similarity metric on the target side ● Evaluate relative meaning destruction on the target side
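One way to write “relative meaning destruction”, and our understanding of the definition in the accompanying paper, is the relative decrease in target-side similarity to the reference $y$ when the base output $y_M$ is replaced by the adversarial output $\hat{y}_M$: $d_{tgt}(y_M, \hat{y}_M, y) = \frac{s_{tgt}(y_M, y) - s_{tgt}(\hat{y}_M, y)}{s_{tgt}(y_M, y)}$ when the similarity drops, and $0$ otherwise. A minimal sketch, assuming similarity scores in $[0, 1]$ (the function name `relative_decrease` is hypothetical):

```python
def relative_decrease(s_base: float, s_adv: float) -> float:
    """Relative meaning destruction on the target side (sketch).

    s_base: s_tgt(y_M, y), similarity of the base output to the reference
    s_adv:  s_tgt(y_hat_M, y), similarity of the adversarial output to the reference
    Both are assumed to lie in [0, 1].
    """
    if s_adv >= s_base:
        # The attack did not lower the score. With scores in [0, 1] this
        # branch also covers s_base == 0, so the division below is safe.
        return 0.0
    return (s_base - s_adv) / s_base


print(relative_decrease(0.8, 0.4))  # 0.5
print(relative_decrease(0.2, 0.1))  # 0.5
print(relative_decrease(0.5, 0.7))  # 0.0
```

Normalising by the base score makes the measure relative: a drop from 0.8 to 0.4 and a drop from 0.2 to 0.1 both give 0.5, since each halves the original similarity.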

  35. Successful Adversarial Attacks ● Ensure that: source meaning destruction < target meaning destruction ● i.e., destroy the meaning on the target side more than on the source side
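Writing $s_{src}(x, \hat{x}) \in [0, 1]$ for the source-side similarity between original and adversarial input, and $d_{tgt} \in [0, 1]$ for the relative decrease in target-side similarity to the reference, source meaning destruction is $1 - s_{src}$, so the condition reads $1 - s_{src} < d_{tgt}$, i.e. $s_{src} + d_{tgt} > 1$. A minimal sketch, assuming both scores have already been computed on a common $[0, 1]$ scale (the function name `is_successful_attack` is hypothetical):

```python
def is_successful_attack(s_src: float, d_tgt: float) -> bool:
    """True if the target side lost more meaning than the source side.

    s_src: similarity between original and adversarial source, in [0, 1]
    d_tgt: relative decrease of target-side similarity, in [0, 1]
    """
    source_destruction = 1.0 - s_src
    return d_tgt > source_destruction


# A light typo (s_src = 0.9) that halves output quality counts as a success...
print(is_successful_attack(0.9, 0.5))  # True
# ...but a heavy rewrite of the source (s_src = 0.4) with the same damage does not.
print(is_successful_attack(0.4, 0.5))  # False
```

The point of the comparison is that heavily rewriting the input is allowed to change the output: only damage beyond what the source change itself justifies counts as a successful attack.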

  39. Which similarity metric to use? ● Human evaluation ○ 6-point scale, details in paper: “How would you rate the similarity between the meaning of these two sentences?” 0. The meaning is completely different or one of the sentences is meaningless 1. The topic is the same but the meaning is different 2. Some key information is different 3. The key information is the same but the details differ 4. Meaning is essentially the same but some expressions are unnatural 5. Meaning is essentially equal and the two sentences are well-formed [Language] ● BLEU [Papineni et al., 2002] ○ Geometric mean of n-gram precisions + brevity penalty ● METEOR [Banerjee and Lavie, 2005] ○ Word matching taking into account stemming, synonyms, paraphrases... ● chrF [Popović, 2015] ○ Character n-gram F-score
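Of the three automatic metrics, chrF is the easiest to sketch from scratch. The version below is a simplified illustration under our own assumptions: whitespace is removed, character n-grams go up to order 6, precision and recall are each averaged over orders and then combined into an F-β score with β = 2 by default. It will not necessarily reproduce the scores of maintained implementations such as sacrebleu’s chrF, which should be preferred for real evaluation. The names `char_ngrams` and `chrf` are hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams of `text` after removing all whitespace."""
    s = "".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified character n-gram F-score in [0, 1].

    Precision and recall are each averaged over the n-gram orders for which
    they are defined, then combined into an F-beta score; beta > 1 weights
    recall more heavily than precision.
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        matches = sum((hyp & ref).values())  # clipped n-gram matches
        hyp_total, ref_total = sum(hyp.values()), sum(ref.values())
        if hyp_total:
            precisions.append(matches / hyp_total)
        if ref_total:
            recalls.append(matches / ref_total)
    if not precisions or not recalls:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


print(round(chrf("He's very freindly", "He's very friendly"), 3))
print(round(chrf("He's She friendly", "He's very friendly"), 3))
```

Because it works on characters, chrF gives partial credit to near-misses such as “freindly” for “friendly”, whereas word-level BLEU gives no credit for that word.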

  40. Experimental Setting

  41. Data and Models ● Data ○ IWSLT 2016 dataset ○ {Czech, German, French} → English ● Models ○ LSTM-based model ○ Transformer-based model ○ Both word-based and sub-word-based models
