No Metrics Are Perfect: Adversarial REward Learning for Visual Storytelling



  1. No Metrics Are Perfect: Adversarial REward Learning for Visual Storytelling. Xin (Eric) Wang*, Wenhu Chen*, Yuan-Fang Wang, William Wang

  2. Image Captioning. Caption: Two young kids with backpacks sitting on the porch.

  3. Visual Storytelling. Story: The brother did not want to talk to his sister. The siblings made up. They started to talk and smile. Their parents showed up. They were happy to see them. (Key qualities: Imagination, Emotion, Subjectiveness)

  4. Visual Storytelling. Story #2: The brother and sister were ready for the first day of school. They were excited to go to their first day and meet new friends. They told their mom how happy they were. They said they were going to make a lot of new friends. Then they got up and got ready to get in the car.

  5. Behavioral cloning methods (e.g. MLE training) are not good enough for visual storytelling.

  6. Reinforcement Learning
  o Directly optimize the existing metrics: BLEU, METEOR, ROUGE, CIDEr
  • Reduce exposure bias
  (Diagram: the Environment provides the Reward Function; Reinforcement Learning (RL) yields the Optimal Policy.)
  Rennie et al. 2017, "Self-Critical Sequence Training for Image Captioning"
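The RL recipe cited on this slide is usually implemented as self-critical sequence training: the reward of a sampled output is baselined by the reward of the greedy decode, so only samples that beat the current greedy output get reinforced. A toy sketch over a fixed candidate set (the candidates, metric scores, and learning rate below are illustrative, not from the paper):

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def self_critical_step(logits, rewards, lr=0.5):
    """One REINFORCE update with the greedy decode's reward as baseline."""
    probs = softmax(logits)
    greedy = max(range(len(probs)), key=probs.__getitem__)
    sample = random.choices(range(len(probs)), weights=probs)[0]
    advantage = rewards[sample] - rewards[greedy]  # r(sample) - r(greedy)
    # d/d logit_i of log pi(sample) = 1[i == sample] - probs[i]
    return [l + lr * advantage * ((i == sample) - p)
            for i, (l, p) in enumerate(zip(logits, probs))]

random.seed(0)
rewards = [0.2, 0.9, 0.1]   # metric score of each candidate (made up)
logits = [0.0, 0.0, 0.0]
for _ in range(200):
    logits = self_critical_step(logits, rewards)
print(softmax(logits))      # probability mass moves to the 0.9 candidate
```

Because the baseline is the model's own greedy output, the update needs no learned value function, which is why this variant became standard for metric-based captioning RL.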

  7. A story generated by directly optimizing METEOR: "We had a great time to have a lot of the. They were to be a of the. They were to be in the. The and it were to be the. The, and it were to be the." Average METEOR score: 40.2 (SOTA model: 35.0)

  8. Meanwhile, a fluent story can score zero: "I had a great time at the restaurant today. The food was delicious. I had a lot of food. I had a great time." BLEU-4 score: 0
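The BLEU-4 = 0 on this slide is easy to reproduce: BLEU-n is built on clipped n-gram precision, and a perfectly fluent story that shares no 4-gram with its reference scores zero. A minimal sketch of the n-gram precision core (the reference sentence here is invented for illustration; real BLEU adds multi-reference clipping, a brevity penalty, and smoothing):

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision, the core of BLEU-n."""
    cand = candidate.split()
    ref = reference.split()
    cand_counts = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    total = sum(cand_counts.values())
    return overlap / total if total else 0.0

story = "i had a great time at the restaurant today the food was delicious"
reference = "we enjoyed our dinner and the meal was very tasty"
print(ngram_precision(story, reference, 1))  # some unigram overlap
print(ngram_precision(story, reference, 4))  # no shared 4-gram -> 0.0
```

The two prints illustrate the slide's point: paraphrases keep some unigram overlap but can lose every 4-gram, so the metric collapses to zero on a perfectly good story.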

  9. No Metrics Are Perfect!

  10. Inverse Reinforcement Learning. In RL, the environment provides the reward function and learning produces the optimal policy; in Inverse Reinforcement Learning (IRL), the environment and the optimal policy (demonstrations) are given, and the reward function is learned.

  11. Adversarial REward Learning (AREL). The reward model acts as the environment: it scores the reference story against the story generated by the policy model under an adversarial objective. Inverse RL updates the reward model; RL updates the policy model with the learned reward.

  12. Policy Model ρ_θ: a CNN encoder over the photo stream feeds a sequence decoder that generates the story. Example output: "My brother recently graduated college. It was a formal cap and gown event. My mom and dad attended. Later, my aunt and grandma showed up. When the event was over he even got congratulated by the mascot."
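The policy model's data flow, pooled CNN features of the photo stream conditioning a decoder that emits the story word by word, can be sketched in a few lines of numpy. The vocabulary, dimensions, mean pooling, and random untrained weights below are all stand-ins, so the output is gibberish; the point is only how greedy decoding is wired:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["<bos>", "<eos>", "my", "brother", "graduated", "college", "."]
V, d = len(vocab), 8

photo_feats = rng.normal(size=(5, d))   # one CNN feature vector per photo
context = photo_feats.mean(axis=0)      # pooled encoder output
embed = rng.normal(size=(V, d))         # word embedding table
W_out = rng.normal(size=(V, 2 * d))     # decoder output projection

def greedy_decode(max_len=10):
    """Greedily emit words from [context; previous word embedding]."""
    tokens, prev = [], vocab.index("<bos>")
    for _ in range(max_len):
        logits = W_out @ np.concatenate([context, embed[prev]])
        prev = int(logits.argmax())
        if vocab[prev] == "<eos>":
            break
        tokens.append(vocab[prev])
    return tokens

print(greedy_decode())   # untrained weights, so the "story" is random tokens
```

In the actual model the decoder is a recurrent network over the whole history rather than just the previous word, but the conditioning on encoded image context is the same.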

  13. Reward Model S_φ: each sentence (e.g. "my mom and dad attended . <EOS>") is combined with CNN image features and passed through convolution, pooling, and a fully connected (FC) layer to produce the reward. (Kim 2014, "Convolutional Neural Networks for Sentence Classification")
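The reward model follows Kim (2014): filters convolve over the word sequence (the paper additionally concatenates CNN image features), max-over-time pooling collapses each filter to one value, and an FC layer maps the pooled vector to a scalar reward. A numpy sketch with random weights (all dimensions and the tanh nonlinearity are illustrative choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)
T, d = 7, 16              # sentence length, embedding size
n_filters, width = 8, 3   # conv filters with a window of 3 words

words = rng.normal(size=(T, d))                    # embeddings of one sentence
filters = rng.normal(size=(n_filters, width, d)) * 0.1
w_fc = rng.normal(size=n_filters) * 0.1            # final FC layer -> scalar

# convolution over time: one activation per (filter, window position)
conv = np.array([[np.sum(words[t:t + width] * f) for t in range(T - width + 1)]
                 for f in filters])
pooled = conv.max(axis=1)                 # max-over-time pooling
reward = float(np.tanh(pooled) @ w_fc)    # scalar reward for the sentence
print(conv.shape, pooled.shape, reward)
```

Max-over-time pooling is what makes the reward length-invariant: each filter reports only its strongest match anywhere in the sentence.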

  14. Associating Reward with Story. Energy-based models associate an energy value F_φ(y) with a sample y, modeling the data as a Boltzmann distribution: q_φ(y) = exp(−F_φ(y)) / Z, where Z is the partition function. Treating the reward as negative energy, q_φ(y) = exp(S_φ(y)) / Z approximates the data distribution, and the optimal reward function S*_φ(X) is achieved when q_φ matches the empirical data distribution. (LeCun et al. 2006, "A Tutorial on Energy-Based Learning")
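Over a finite set of samples this construction is concrete: with reward as negative energy, q_φ is a softmax over rewards and the partition function Z is the normalizer. A small numerical check with made-up reward values:

```python
import math

def boltzmann(energies):
    """q(y) = exp(-F(y)) / Z over a finite set of samples y."""
    weights = [math.exp(-f) for f in energies]
    Z = sum(weights)                 # partition function
    return [w / Z for w in weights]

rewards = [2.0, 0.5, -1.0]           # S_phi(y) for three candidate stories
q = boltzmann([-r for r in rewards])  # reward = negative energy
print(q)  # probabilities sum to 1; the highest-reward story is most likely
```

This is why learning the reward and learning a distribution over stories are the same problem here: raising a story's reward directly raises its Boltzmann probability.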

  15. AREL Objective. We therefore define an adversarial objective with KL-divergences between the reward Boltzmann distribution q_φ, the empirical distribution p_e, and the policy distribution π_θ:
  • Objective of the Reward Model S_φ: min_φ KL(p_e ‖ q_φ) − KL(π_θ ‖ q_φ), i.e. pull q_φ toward human stories and push it away from policy samples.
  • Objective of the Policy Model ρ_θ: min_θ KL(π_θ ‖ q_φ), i.e. move the policy toward the reward Boltzmann distribution.
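Over a finite candidate set the two AREL objectives, min_φ KL(p_e ‖ q_φ) − KL(π_θ ‖ q_φ) for the reward model and min_θ KL(π_θ ‖ q_φ) for the policy, can be evaluated directly. The three distributions and reward values below are invented purely for illustration:

```python
import math

def kl(p, q):
    """KL(p || q) over a finite support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def boltzmann(rewards):
    Z = sum(math.exp(r) for r in rewards)
    return [math.exp(r) / Z for r in rewards]

p_e = [0.7, 0.2, 0.1]        # empirical (human) story distribution
pi_theta = [0.2, 0.5, 0.3]   # current policy distribution
scores = [1.0, 0.0, -0.5]    # reward model outputs S_phi(y)

q_phi = boltzmann(scores)    # reward Boltzmann distribution
reward_obj = kl(p_e, q_phi) - kl(pi_theta, q_phi)  # reward model minimizes
policy_obj = kl(pi_theta, q_phi)                   # policy model minimizes
print(reward_obj, policy_obj)
```

Here q_φ already sits closer to p_e than to π_θ, so the reward objective is negative; training pushes it lower still while the policy chases q_φ, the adversarial dynamic the slide describes.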

  16. Reward Visualization (figure)

  17. Automatic Evaluation

  Method                    BLEU-1  BLEU-2  BLEU-3  BLEU-4  METEOR  ROUGE  CIDEr
  Seq2seq (Huang et al.)      -       -       -       -      31.4     -      -
  HierAttRNN (Yu et al.)      -       -      21.0     -      34.1    29.5   7.5
  XE                         62.3    38.2    22.5    13.7    34.8    29.7   8.7
  BLEU-RL                    62.1    38.0    22.6    13.9    34.6    29.0   8.9
  METEOR-RL                  68.1    35.0    15.4     6.8    40.2    30.0   1.2
  ROUGE-RL                   58.1    18.5     1.6     0.0    27.0    33.8   0.0
  CIDEr-RL                   61.9    37.8    22.5    13.8    34.9    29.7   8.1
  GAN                        62.8    38.8    23.0    14.0    35.0    29.5   9.0
  AREL (ours)                63.7    39.0    23.1    14.0    35.0    29.6   9.5

  Huang et al. 2016, "Visual Storytelling"; Yu et al. 2017, "Hierarchically-Attentive RNN for Album Summarization and Storytelling"

  18. Human Evaluation: Turing Test. (Figure: bar chart, 0-50%, of Win and Unsure rates for XE, BLEU-RL, CIDEr-RL, GAN, and AREL, with gaps of -26.1, -17.5, -13.7, and -6.3 annotated.)

  19. Human Evaluation: Pairwise Comparison
  • Relevance: the story accurately describes what is happening in the photo stream and covers the main objects.
  • Expressiveness: coherence, grammatical and semantic correctness, no repetition, expressive language style.
  • Concreteness: the story narrates concretely what is in the images rather than giving very general descriptions.

  20. Qualitative comparison on one photo stream:
  XE-ss: "We took a trip to the mountains. There were many different kinds of different kinds. We had a great time. He was a great time. It was a beautiful day."
  AREL: "The family decided to take a trip to the countryside. There were so many different kinds of things to see. The family decided to go on a hike. I had a great time. At the end of the day, we were able to take a picture of the beautiful scenery."
  Human-created story: "We went on a hike yesterday. There were a lot of strange plants there. I had a great time. We drank a lot of water while we were hiking. The view was spectacular."

  21. Takeaway
  o Generating and evaluating stories are both challenging due to the complicated nature of stories
  o No existing metrics are perfect for either training or testing
  o AREL is a better learning framework for visual storytelling
    § Can be applied to other generation tasks
  o Our approach is model-agnostic
    § Advanced models → better performance

  22. Thanks! Paper: https://arxiv.org/abs/1804.09160 Code: https://github.com/littlekobe/AREL
