https://arxiv.org/abs/1811.08100
Another Diversity-Promoting Objective Function for Neural Dialogue Generation
Ryo Nakamura, Katsuhito Sudoh, Koichiro Yoshino, Satoshi Nakamura
Graduate School of Information Science, Nara Institute of Science and Technology, Japan
RIKEN, Center for Advanced Intelligence Project AIP, Japan
{nakamura.ryo.nm8, sudoh, koichiro, s-nakamura}@is.naist.jp
AAAI 2019 DEEP-DIAL workshop
Neural Dialogue Generation
[Figure: seq2seq encoder-decoder example; input "What's up?", generated output "Nothing much." starting from the <Go> token]
• An open-domain dialogue system generates a response word by word using a trained neural network (e.g., seq2seq)
• Generation-based systems are more flexible than retrieval-based ones, but fluency and consistency are not good
• In particular, the generated response has low diversity and tends to be a generic response like "I don't know." Why?
During training
[Figure: one-hot target distribution vs. data and model distributions for "Stupid is as stupid does."; frequent words receive a rich training signal, rare words a poor one, leading to a wrong model distribution]
• Frequent words in the training set supply more training penalties than rare words
• Therefore, large occurrence probabilities are assigned to frequent words
During evaluation
[Figure: MAP prediction after MLE training vs. the enormous candidates in real dialogue data, e.g., "Do you have any plans tonight?" → "Not much.", "Hi, Sean. How are you doing?" → "Hey, Erin. What's up?", "How are you?" → "Hey, what's up?"]
• Dialogue generation is a many-to-many transduction task in which the contents vary depending on the context
• Frequent words are applicable in any context, so they tend to be candidates for generation
• As a result, only the most likely response is generated
Breaking down the low diversity problem
During training
• Frequent words supply more penalties than rare words, due to lack of data and data imbalance (Serban et al. 2016)
• Softmax Cross-Entropy (SCE) loss is not good because all words are handled equally regardless of this lack and imbalance
• Measures already suggested: none. We challenged it!
During evaluation
• Maximum A-Posteriori (MAP) prediction predicts only the most likely response
• Measures already suggested: a way to generate unlikely responses using Maximum Mutual Information (MMI) is reported in (Li et al. 2016)
Previous research
• Maximum Mutual Information (Li et al. 2016)
• MMI-antiLM suppresses language-model-like generation by subtracting a language model term from the transduction model term
• They used MLE during training and MMI-antiLM during evaluation
• In practice, MMI-antiLM generates token y as shown in the reconstruction below
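The decoding criterion itself was an image on the original slide and is not present in the extracted text; the following is a reconstruction based on the MMI-antiLM objective of Li et al. (2016), where S is the input context, T the response, and λ the anti-language-model weight:

$$\hat{T} = \operatorname*{arg\,max}_{T} \left\{ \log p(T \mid S) - \lambda \log p(T) \right\}$$

Applied token by token during generation, the language model term log p(y_t | y_<t) is subtracted from the transduction model term log p(y_t | S, y_<t).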
Proposed method: Inverse Token Frequency (ITF) loss
[Figure: SCE loss vs. ITF loss per token on the sentence "You do not talk about Fight Club."]
• SCE loss treats each token class equally
• ITF loss scales a smaller loss for frequent token classes (see the weighting sketch below)
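One way to write the idea down (a sketch, not necessarily the exact formulation in the paper: the weight here is assumed to be the inverse of a token class's training-set frequency raised to a tunable exponent λ):

$$\mathcal{L}_{\mathrm{ITF}} = -\sum_{t} w_{y_t} \log p(y_t \mid S, y_{<t}), \qquad w_{c} = \frac{1}{\mathrm{freq}(c)^{\lambda}}$$

With λ = 0 every class gets weight 1 and the loss reduces to ordinary SCE; larger λ penalizes mistakes on rare tokens relatively more than on frequent ones.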
Advantages compared to previous works
• No special inference method: you can use common greedy search
• ITF loss can be easily incorporated: just replace the loss function!
• Training with ITF loss is as stable as training with SCE loss
• ITF models yield state-of-the-art diversity and maintain quality
Code examples with PyTorch
Switching to the ITF loss is very easy:
• SCE loss: sce_loss = nn.NLLLoss(weight=None)
• ITF loss: the same loss with Inverse Token Frequency weights (the slide's code is not in the extracted text; a sketch follows below)
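A minimal PyTorch sketch of the weight-based view described above. The construction of `itf_weights` (the count source `token_counts`, the exponent `lam`, and the rescaling) is illustrative rather than the authors' released code:

```python
import torch
import torch.nn as nn

# token_counts[c]: occurrences of vocabulary class c in the training set
# (toy 4-word vocabulary here; in practice this comes from the training corpus)
token_counts = torch.tensor([50000.0, 1200.0, 300.0, 7.0])
lam = 0.4  # hypothetical exponent; 0 recovers plain SCE loss

# Inverse Token Frequency weights: frequent classes get smaller weights
itf_weights = 1.0 / token_counts.clamp(min=1.0).pow(lam)
itf_weights = itf_weights / itf_weights.mean()  # keep the loss magnitude comparable

sce_loss = nn.NLLLoss(weight=None)         # baseline: every class weighted equally
itf_loss = nn.NLLLoss(weight=itf_weights)  # proposed idea: down-weight frequent classes

# usage: decoder log-probabilities (batch x vocab) and gold token ids (batch)
log_probs = torch.log_softmax(torch.randn(2, 4), dim=-1)
targets = torch.tensor([0, 3])
print(sce_loss(log_probs, targets).item(), itf_loss(log_probs, targets).item())
```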
Experiment setups
Datasets
• OpenSubtitles (En): 5M turns and 0.4M episodes
• Twitter (En/Ja): 5M/4.5M turns and 2.5M/0.7M episodes
Baselines
• Seq2Seq: 4-layer Bi-LSTM w/ residual connections
• Seq2Seq + Attention
• Seq2Seq + MMI
• MemN2N: considers dialogue history using memory
Evaluation metrics
• BLEU-1/2: n-gram matching between all hypotheses and all references
• DIST-1/2: distinct n-grams in all generated responses (computation sketched below)
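The slide only names DIST-n; for reference, a common way to compute it (distinct n-grams divided by the total number of generated n-grams, as in Li et al. 2016) can be sketched as follows. The function name and whitespace tokenization are illustrative:

```python
from collections import Counter

def dist_n(responses, n):
    """DIST-n: number of distinct n-grams / total n-grams over all generated responses."""
    ngrams = Counter()
    for resp in responses:
        tokens = resp.split()  # illustrative whitespace tokenization
        ngrams.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(ngrams.values())
    return len(ngrams) / total if total > 0 else 0.0

responses = ["i do not know .", "nothing much .", "i do not know ."]
print(dist_n(responses, 1), dist_n(responses, 2))
```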
Result on OpenSubtitles
[Table: previous models vs. ours; the scores are not recoverable from the extracted text]
Results on Twitter
[Tables: previous models vs. ours on the English and Japanese Twitter datasets]
• ITF models outperform MMI on both BLEU-2 and DIST-1
• The ITF model achieves a ground-truth-level DIST-1 score of 16.8
A generated sample on OpenSubtitles
[Figure: example context and generated responses; not recoverable from the extracted text]
A generated sample on Twitter
[Figure: example context and generated responses; not recoverable from the extracted text]
Summary
• SCE loss + MAP prediction => low diversity => dull responses
• SCE loss + MMI inference => high diversity and good quality
• ITF loss + MAP prediction => very high diversity and good quality