Results: Adversarial Learning Improves Response Generation
Human evaluation vs. a vanilla generation model: Adversarial wins 62%, loses 18%, ties 20%.
Sample response
Tell me ... how long have you had this falling sickness?
• Vanilla Seq2Seq: I don't know what you are talking about.
• Mutual Information: I'm not a doctor.
• Adversarial Learning: A few months, I guess.
Self-Supervised Learning meets Adversarial Learning
• Self-Supervised Dialogue Learning (Wu et al., ACL 2019)
• Uses a self-supervised network (SSN) to learn dialogue structure (sequence ordering).
• Related: REGS (Li et al., 2017); AEL (Xu et al., 2017)
Conclusion
• Deep adversarial learning is a new, diverse, and interdisciplinary research area, highly related to many subareas of NLP.
• GANs have obtained particularly strong results in vision, yet there are both challenges and opportunities for GANs in NLP.
• In a case study, we showed that adversarial learning for dialogue obtains promising results.
• There are plenty of opportunities ahead of us, given current advances in representation learning, reinforcement learning, and self-supervised learning in NLP.
UCSB Postdoctoral Scientist Opportunities
• Please talk to me at NAACL, or email william@cs.ucsb.edu.
Thank you!
• Now we will take a 30-minute break.
Adversarial Examples in NLP
Sameer Singh
sameer@uci.edu · @sameer_ · sameersingh.org
Slides: http://tiny.cc/adversarial
What are Adversarial Examples?
"panda" (57.7% confidence) → "gibbon" (99.3% confidence)
[Goodfellow et al., ICLR 2015]
Sameer Singh, NAACL 2019 Tutorial
What's going on? Fast Gradient Sign Method
[Goodfellow et al., ICLR 2015]
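The fast gradient sign method can be sketched in a few lines. This is a minimal, self-contained illustration on a toy linear model: the weight vector `w`, input `x`, and `epsilon` below are all made up, and for a linear score w·x the gradient with respect to x is simply w.

```python
# Illustrative sketch of the Fast Gradient Sign Method (FGSM):
# x' = x + epsilon * sign(dL/dx). Model and numbers are hypothetical.

def sign(v):
    """Elementwise sign of a vector (list of floats)."""
    return [1.0 if g > 0 else -1.0 if g < 0 else 0.0 for g in v]

def fgsm(x, grad, epsilon=0.1):
    """One FGSM step: move each feature by +/- epsilon (or 0 where grad is 0)."""
    return [xi + epsilon * s for xi, s in zip(x, sign(grad))]

# Toy linear model: score = w . x, so the gradient w.r.t. x is just w.
w = [0.5, -2.0, 0.0]
x = [1.0, 1.0, 1.0]
x_adv = fgsm(x, grad=w, epsilon=0.1)
# each feature moves by +/- 0.1, or stays put where the gradient is zero
```

The key point the slide makes is that each coordinate moves by only a tiny, imperceptible amount, yet the cumulative effect on the model's prediction can be dramatic.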
Applications of Adversarial Attacks
• Security of ML Models: should I deploy or not? What's the worst that can happen?
• Evaluation of ML Models: held-out test error is not enough
• Finding Bugs in ML Models: what kinds of "adversaries" might happen naturally? (Even without any bad actors)
• Interpretability of ML Models? What does the model care about, and what does it ignore?
Challenges in NLP
• Change: L2 distance is not really defined for text. What is imperceivable? What is a small vs. big change? What is the right way to measure this?
• Search: Text is discrete, so we cannot use continuous optimization. How do we search over sequences?
• Effect: Classification tasks fit in well, but what about structured prediction (e.g. sequence labeling) or language generation (e.g. MT or summarization)?
Choices in Crafting Adversaries
Different ways to address the challenges
Choices in Crafting Adversaries
• What is a small change?
• How do we find the attack?
• What does it mean to misbehave?
First: What is a small change?
Change: What is a small change?
• Characters. Pros: often easy to miss; easier to search over. Cons: gibberish, nonsensical words; not useful for interpretability.
• Words. Pros: always from vocabulary; often easy to miss. Cons: ungrammatical changes; meaning also changes.
• Phrase/Sentence. Pros: most natural/human-like; tests long-distance effects. Cons: difficult to guarantee quality; larger space to search.
Main challenge: defining the distance between x and x'.
Change: A Character (or a few)
x = "I love movies" = ['I', ' ', 'l', 'o', 'v', ...]
x' = ['I', ' ', 'l', 'i', 'v', ...]
Edit distance operations: flip, insert, delete
[Ebrahimi et al., ACL 2018, COLING 2018]
Change: Word-level Changes
x = ['I', 'like', 'this', 'movie', '.'] Let's replace "like":
• Random word? x' = ['I', 'lamp', 'this', 'movie', '.']
• Word embedding? x' = ['I', 'really', 'this', 'movie', '.']
• Part of speech? x' = ['I', 'eat', 'this', 'movie', '.']
• Language model? x' = ['I', 'hate', 'this', 'movie', '.']
[Jia and Liang, EMNLP 2017] [Alzantot et al., EMNLP 2018]
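The common skeleton behind all four proposals is "replace one token with a candidate from some source." A minimal sketch, where the neighbor table is a made-up stand-in for whatever the candidate source is (random words, embedding neighbors, POS-matched words, or a language model):

```python
# Word-level substitution sketch. NEIGHBORS is a fabricated stand-in for
# a real candidate source (e.g. nearest neighbors in embedding space).

NEIGHBORS = {
    "like": ["love", "enjoy", "hate"],  # hypothetical candidates
}

def word_substitutions(tokens, i, neighbors=NEIGHBORS):
    """Yield copies of the sentence with token i replaced by each candidate."""
    for cand in neighbors.get(tokens[i], []):
        yield tokens[:i] + [cand] + tokens[i + 1:]

x = ["I", "like", "this", "movie", "."]
candidates = list(word_substitutions(x, 1))
```

The choice of candidate source is exactly what distinguishes the four bullets above: a random word breaks grammar, an embedding neighbor may shift meaning, and a language model keeps fluency but can flip sentiment.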
Change: Paraphrasing via Backtranslation
x and x' should mean the same thing (semantically-equivalent adversaries).
Translate x into multiple languages, then use back-translators to score candidates:
S(x, x') ∝ 0.5 · P(x' | "Este é um bom filme") + 0.5 · P(x' | "c'est un bon film")
• S("This is a good movie", "This is a good movie") = 1
• S("This is a good movie", "That is a good movie") = 0.95
• S("This is a good movie", "Dogs like cats") = 0
[Ribeiro et al., ACL 2018]
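The scoring rule above can be sketched directly. The pivot translations and probability tables below are fabricated for illustration; a real system would query trained translation models for P(x' | pivot).

```python
# Sketch of the semantic-equivalence score: average the probability of
# the candidate x' under back-translators from each pivot translation of x.
# All probabilities here are made up to mirror the slide's example.

PIVOTS = {
    "This is a good movie": ["Este é um bom filme", "c'est un bon film"],
}

BACKTRANS_PROB = {  # hypothetical P(x' | pivot)
    ("This is a good movie", "Este é um bom filme"): 1.0,
    ("This is a good movie", "c'est un bon film"): 1.0,
    ("That is a good movie", "Este é um bom filme"): 0.9,
    ("That is a good movie", "c'est un bon film"): 1.0,
}

def semantic_score(x, x_prime):
    """Average back-translation probability of x' over the pivots of x."""
    pivots = PIVOTS[x]
    return sum(BACKTRANS_PROB.get((x_prime, p), 0.0) for p in pivots) / len(pivots)

assert semantic_score("This is a good movie", "This is a good movie") == 1.0
```

Candidates scoring near 1 are close paraphrases; unrelated sentences like "Dogs like cats" score near 0 and are discarded.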
Change: Sentence Embeddings
Encoder E maps x → z; a (GAN) decoder D maps z' → x'; f is the classifier (x → y, x' → y').
• Deep representations are supposed to encode meaning in vectors
• If (x − x') is difficult to compute, maybe we can work with (z − z')?
[Zhao et al., ICLR 2018]
Choices in Crafting Adversaries: How do we find the attack?
Search: How do we find the attack?
• Only access predictions (usually unlimited queries): create x' and test whether the model misbehaves. Even this is often unrealistic.
• Access probabilities: create x' and test whether the general direction is correct.
• Full access to the model (compute gradients): use the gradient to craft x'.
Search: Gradient-based
1. Compute the gradient (of the loss, or whatever the misbehavior is)
2. Step in that direction (continuous)
3. Find the nearest neighbor (back to a discrete token)
4. Repeat if necessary
Beam search over the above.
[Ebrahimi et al., ACL 2018, COLING 2018]
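The step/project/repeat loop can be sketched with a toy one-dimensional "vocabulary": the embedding values, gradient, and step size below are all invented for illustration, and a real attack would do this in high-dimensional embedding space with model gradients.

```python
# Toy sketch of the gradient-guided search loop: step the continuous
# embedding along the gradient, snap to the nearest vocabulary item,
# and repeat. The 1-D "embeddings" below are hypothetical.

VOCAB = {"good": 1.0, "okay": 0.2, "bad": -1.0}  # made-up 1-D embeddings

def nearest_word(v):
    """Discrete projection: vocabulary word closest to the continuous point."""
    return min(VOCAB, key=lambda w: abs(VOCAB[w] - v))

def gradient_step_attack(word, grad, alpha=0.8, steps=3):
    """Repeat: continuous step, then project back to the vocabulary."""
    for _ in range(steps):
        v = VOCAB[word] + alpha * grad  # step 2: continuous move
        new_word = nearest_word(v)      # step 3: nearest neighbor
        if new_word == word:            # converged: no new token found
            break
        word = new_word
    return word

# A gradient pushing the embedding downward walks "good" -> "okay" -> "bad".
```

Beam search, as the slide notes, simply keeps the top-k candidates at each projection step instead of just one.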
Search: Sampling
1. Generate local perturbations
2. Select the ones that look good
3. Repeat step 1 with these new ones
4. Optional: beam search, genetic algorithm
[Jia and Liang, EMNLP 2017] [Zhao et al., ICLR 2018] [Alzantot et al., EMNLP 2018]
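Those four steps form a generic perturb-score-keep loop. A minimal sketch, where the perturbation operator and the attack score are toy stand-ins (a real attack would perturb words and score with the victim model's loss):

```python
# Sampling-based search sketch: generate perturbations, keep the best
# few, repeat. The perturbation and score functions below are toys.
import random

def attack_by_sampling(x, perturb, score, rounds=3, beam=2, samples=10):
    """Keep the `beam` highest-scoring perturbations each round
    (higher score = worse model behavior = better attack)."""
    pool = [x]
    for _ in range(rounds):
        candidates = [perturb(p) for p in pool for _ in range(samples)]
        pool = sorted(candidates, key=score, reverse=True)[:beam]
    return pool[0]

# Toy setup: perturbation appends a random character; the "score"
# pretends the model misbehaves more the more "z"s the input has.
random.seed(0)
result = attack_by_sampling(
    "abc",
    perturb=lambda s: s + random.choice("az"),
    score=lambda s: s.count("z"),
)
```

Genetic-algorithm variants (Alzantot et al.) additionally recombine members of the pool instead of only perturbing them independently.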
Search: Enumeration (Trial/Error)
1. Make some perturbations
2. See if they work
3. Optional: pick the best one
[Iyyer et al., NAACL 2018] [Ribeiro et al., ACL 2018] [Belinkov and Bisk, ICLR 2018]
Choices in Crafting Adversaries: What does it mean to misbehave?
Effect: What does it mean to misbehave?
• Classification: untargeted (any other class) or targeted (a specific other class)
• Loss-based: maximize the loss on the example, e.g. the perplexity/log-loss of the prediction (MT: "Don't attack me!" → "¡No me ataques!")
• Property-based: test whether a property holds, e.g. MT: a certain word is not generated; NER: no PERSON appears in the output
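A property-based attack objective is just a predicate over the model's output. A small sketch using the slide's NER example; the tagged outputs below are hypothetical, standing in for a real tagger's predictions before and after perturbing the input.

```python
# Property-based misbehavior sketch: the attack succeeds when a stated
# property of the output is violated — here, "a PERSON entity appears."
# The tagged outputs are fabricated examples, not real model predictions.

def person_suppressed(tagged_output):
    """True if no PERSON tag appears in the prediction (attack success)."""
    return all(tag != "PERSON" for _, tag in tagged_output)

before = [("Obama", "PERSON"), ("visited", "O"), ("Paris", "LOC")]
after = [("0bama", "O"), ("visited", "O"), ("Paris", "LOC")]  # perturbed input

assert not person_suppressed(before)  # original output still has a PERSON
assert person_suppressed(after)       # perturbation suppressed the entity
```

Unlike a loss-based objective, such a predicate works even when the task has no single "correct class" to flip, which is what makes it usable for structured prediction and generation.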
Evaluation: Are the attacks "good"?
• Are they effective? Attack/success rate
• Are the changes perceivable? (Human evaluation) Would it have the same label? Does it look natural? Does it mean the same thing?
• Do they help improve the model? Accuracy after data augmentation
• Look at some examples!
Review of the Choices
• Change: character level, word level, phrase/sentence level
• Search: gradient-based, sampling, enumeration
• Effect: targeted or untargeted; choose based on the task
• Evaluation
Research Highlights
In terms of the choices that were made
Noise Breaks Machine Translation!
• Change: random, character-based
• Search: passive; add and test
• Tasks: machine translation
[Belinkov and Bisk, ICLR 2018]
HotFlip
• Change: character-based (extension to words)
• Search: gradient-based; beam search
• Tasks: machine translation, sentiment classification, news classification
[Ebrahimi et al., ACL 2018, COLING 2018]
Search Using Genetic Algorithms
• Change: word-based, language model score
• Search: black-box, population-based (genetic algorithm) search of natural adversaries
• Tasks: textual entailment, sentiment analysis
[Alzantot et al., EMNLP 2018]
Natural Adversaries
• Change: sentence, GAN embedding
• Search: stochastic search
• Tasks: images, textual entailment, machine translation
[Zhao et al., ICLR 2018]
Semantic Adversaries
• Change: sentence via backtranslation
• Search: enumeration
• Tasks: VQA, SQuAD, sentiment analysis
Semantically-Equivalent Adversaries (SEAs): backtranslation proposes paraphrase candidates x' for x.
Semantically-Equivalent Adversarial Rules (SEARs): patterns/rules mined from the (x, x') "diffs", e.g. color → colour.
[Ribeiro et al., ACL 2018]
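Applying a mined rule like "color → colour" is simple string rewriting. A sketch, where the rule list is illustrative (the "What is → What's" pattern mirrors the paper's examples) and a real SEAR pipeline would also filter candidates with the semantic-equivalence score:

```python
# Sketch of applying semantically-equivalent adversarial rules (SEARs).
# The rules below are illustrative examples, not the mined rule set.

RULES = [("color", "colour"), ("What is", "What's")]

def apply_rules(text, rules=RULES):
    """Return one variant per rule whose left-hand side occurs in the text."""
    variants = []
    for old, new in rules:
        if old in text:
            variants.append(text.replace(old, new))
    return variants

variants = apply_rules("What is the color of the car?")
# one variant per matching rule; each is a minimal, meaning-preserving edit
```

Because each rule is a reusable pattern rather than a one-off paraphrase, a handful of mined rules can expose systematic bugs across an entire test set.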
Transformation rule examples (figures): VisualQA, SQuAD, Sentiment Analysis
[Ribeiro et al., ACL 2018]
Adding a Sentence
• Change: add a sentence
• Search: domain knowledge, stochastic search
• Tasks: question answering
[Jia and Liang, EMNLP 2017]
Some Loosely Related Work
Using broader notions of adversaries
CRIAGE: Adversaries for Graph Embeddings
Which link should we add or remove, out of millions of possible links?
[Pezeshkpour et al., NAACL 2019]
"Should Not Change" / "Should Change"
How do dialogue systems behave when the inputs are perturbed in specific ways?
• Should Not Change (like adversarial attacks): random swap, stopword dropout, paraphrasing, grammatical mistakes
• Should Change (overstability test): add negation, antonyms, randomize inputs, change entities
[Niu and Bansal, CoNLL 2018]
Overstability: Anchors
Identify the conditions under which the classifier makes the same prediction.
[Ribeiro et al., AAAI 2018]