Neural Response Ranking for Social Conversation: A Data-Efficient Approach
Igor Shalyminov, Ondřej Dušek, and Oliver Lemon
School of Mathematical and Computer Sciences, Heriot-Watt University
31 October 2018
Outline
● Introduction: Amazon Alexa Prize
● Alana, HWU's Alexa Prize socialbot
● Training signals in conversational data
● Alana's neural response ranker
● Data efficiency aspect of the response ranker
● Results and discussion
Amazon Alexa Prize
● Goal: long, coherent, and engaging conversations with the socialbot
● Grand prize ($1,000,000) for a 20-minute-long engaging conversation
● 2017: >100 applications → 12 teams competing → HWU took 3rd place
● 2018: >200 applications → 8 teams competing → Finals Nov 2018!
Alana, An Ensemble Dialogue Model with Ranking
Alana: Hi, [...] What would you like to talk about?
User: Music
Alana [Persona]: Great! Who is your favorite singer?
User: Bob Dylan
Alana [Evi + driver]: Bob Dylan is an American songwriter, singer, painter, and writer. What are your opinions on Bob Dylan?
User: When was he born?
Alana [Evi + anaphora resolution]: Bob Dylan was born on Saturday May 24th 1941.
User: I love him!
Alana [Persona]: Great! What else do you like?
User: Cooking
Alana [Facts]: I heard that every day, Americans eat an estimated 18 acres of pizza.
Rating: 2, Length: 11
(Papaioannou et al. Alana: Social Dialogue using an Ensemble Model and a Ranker trained on User Feedback, 2017)
Training Signals in Conversational Data
Signals correlation study (Pearson coefficient):

         User feedback (positive)   User feedback (negative)
Rating   0.11                       0.04
Length   0.67                       0.49

Example positive feedback: "that's pretty cool", "you're funny", "thanks buddy", "you're smart", "wow interesting"
Example negative feedback: "stop", "you're stupid", "you're bad", "really harsh", "stop stop stop telling this stupid stuff"
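A correlation study like this boils down to computing the Pearson coefficient between per-dialogue signals. A minimal stdlib sketch, with purely illustrative numbers (not the actual Alana data):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative per-dialogue signals (invented for the example):
# dialogue length in turns, count of positive feedback utterances, user rating.
lengths = [5, 11, 8, 20, 3, 15]
positive_feedback = [0, 2, 1, 4, 0, 3]
ratings = [3, 2, 4, 5, 1, 4]

print(pearson(lengths, positive_feedback))
print(pearson(ratings, positive_feedback))
```

With real data, the same computation over (length, feedback) and (rating, feedback) pairs yields the table above.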
Alana's Neural Response Ranker
● The target score is a point-wise prediction for a <context, response_candidate> pair
● A GRU with word, bot, and NE features encodes the context utterances (3 previous user + system turns) and the response candidates
● A predictor MLP augments this representation with additional dialogue features and outputs the target value
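A minimal forward-pass sketch of this architecture in NumPy. The weights are untrained and random, and all dimensions, the token feature layout, and the two extra dialogue features are assumptions for illustration, not the deck's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUEncoder:
    """Minimal GRU encoding a sequence of feature vectors into its final hidden state."""
    def __init__(self, in_dim, hid_dim):
        s = 1.0 / np.sqrt(hid_dim)
        self.Wz = rng.uniform(-s, s, (hid_dim, in_dim + hid_dim))  # update gate
        self.Wr = rng.uniform(-s, s, (hid_dim, in_dim + hid_dim))  # reset gate
        self.Wh = rng.uniform(-s, s, (hid_dim, in_dim + hid_dim))  # candidate state
        self.hid_dim = hid_dim

    def encode(self, seq):
        h = np.zeros(self.hid_dim)
        for x in seq:
            xh = np.concatenate([x, h])
            z = sigmoid(self.Wz @ xh)
            r = sigmoid(self.Wr @ xh)
            h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
            h = (1 - z) * h + z * h_tilde
        return h

class RankerMLP:
    """One-hidden-layer predictor producing a scalar target score in [0, 1]."""
    def __init__(self, in_dim, hid_dim=32):
        s = 1.0 / np.sqrt(in_dim)
        self.W1 = rng.uniform(-s, s, (hid_dim, in_dim))
        self.W2 = rng.uniform(-s, s, (1, hid_dim))

    def score(self, vec):
        return sigmoid(self.W2 @ np.tanh(self.W1 @ vec))[0]

# Hypothetical sizes: per-token features (word + bot-ID + NE flags) of dim 50.
FEAT_DIM, HID_DIM = 50, 64
context_enc = GRUEncoder(FEAT_DIM, HID_DIM)
candidate_enc = GRUEncoder(FEAT_DIM, HID_DIM)
mlp = RankerMLP(2 * HID_DIM + 2)  # +2 assumed extra dialogue features

context = rng.normal(size=(12, FEAT_DIM))   # tokens of the 3 previous turns
candidate = rng.normal(size=(6, FEAT_DIM))  # tokens of one response candidate
dialogue_feats = np.array([0.3, 0.7])       # e.g. turn number, bot priority

rep = np.concatenate([context_enc.encode(context),
                      candidate_enc.encode(candidate),
                      dialogue_feats])
score = mlp.score(rep)
print(score)  # point-wise prediction for this <context, candidate> pair
```

In training, this scalar would be regressed against the rating or length target for each pair; here the point is only the data flow: GRU encodings of context and candidate, concatenated with dialogue features, fed to the MLP.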
Training Details
Each system turn in a dialogue yields one training example: the preceding turns form the dialog context, the system turn itself is the response candidate, and the targets are derived from the whole dialogue.
Alana: Hi, [...] What would you like to talk about?
User: Music
Alana [Persona]: Great! Who is your favorite singer?
User: Bob Dylan
Alana [Evi + driver]: Bob Dylan is an American songwriter, singer, painter, and writer. What are your opinions on Bob Dylan?
User: When was he born?
Alana [Evi + anaphora resolution]: Bob Dylan was born on Saturday May 24th 1941.
User: I love him!
Alana [Persona]: Great! What else do you like?
User: Cooking
Alana [Facts]: I heard that every day, Americans eat an estimated 18 acres of pizza.
Rating: 2, Length: 11
Targets: the dialogue's user rating and its length, each normalised to [0, 1].
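The construction of training examples from an annotated dialogue can be sketched as follows; the context window, field layout, and normalisation constants are assumptions, not the paper's exact setup:

```python
# Assumed normalisation bounds: Alexa Prize ratings are 1-5; the length cap
# of 50 turns is invented for this sketch.
MAX_RATING = 5.0
MAX_LENGTH = 50.0

def make_examples(turns, rating, length, context_window=3):
    """turns: list of (speaker, utterance) pairs.
    Returns one (context, candidate, targets) tuple per system turn."""
    examples = []
    for i, (speaker, utt) in enumerate(turns):
        if speaker != "Alana":
            continue  # only system turns become response candidates
        # Context: up to 3 previous user+system turn pairs before this one.
        context = turns[max(0, i - 2 * context_window):i]
        targets = {"rating": rating / MAX_RATING,
                   "length": min(length / MAX_LENGTH, 1.0)}
        examples.append((context, utt, targets))
    return examples

dialogue = [("Alana", "Hi, what would you like to talk about?"),
            ("User", "Music"),
            ("Alana", "Great! Who is your favorite singer?"),
            ("User", "Bob Dylan")]
exs = make_examples(dialogue, rating=2, length=11)
print(len(exs))    # one example per Alana turn
print(exs[1][2])   # both targets scaled into [0, 1]
```

Every candidate from the same dialogue shares the same dialogue-level targets, which is what makes the sparse rating signal usable at the turn level.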
Evaluation Details
Evaluation uses explicit user feedback: the system turn immediately preceding positive feedback is the gold response, and a randomly sampled response is the negative.
Dialog context:
Alana: Hi, [...] What would you like to talk about?
User: Music
Alana [Persona]: Great! Who is your favorite singer?
User: Bob Dylan
Gold response (target score: 1.0):
Alana [Evi + driver]: Bob Dylan is an American songwriter, singer, painter, and writer. What are your opinions on Bob Dylan?
User's feedback: You're so smart! When was he born?
Random response (target score: 0.0):
Alana [Coherence]: So, talking about movies, what famous actor or actress would you like to meet? I would love to meet Will Smith. He's just so funny!
Interim Results

Ranker                    Precision@1
Handcrafted               0.478
Linear@length [1]         0.742
Linear@rating [1]         0.773
DualEncoder@length [2]    0.365
DualEncoder@rating [2]    0.584
Neural@length             0.824
Neural@rating             0.847

Training stage: trainset size 500,000 turns (for each target).
Evaluation stage: eval set of ~24,000 tuples of the form <context, gold answer, fake answer, target>. Gold answers are those followed by explicit positive user feedback (previous slide).
[1] VowpalWabbit library
[2] Lu et al. A practical approach to dialogue response generation in closed domains, 2017
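On these tuples, Precision@1 reduces to the fraction of cases where the ranker scores the gold answer above the fake one. A small sketch with a toy stand-in scorer (the scorer and data are invented for illustration):

```python
def precision_at_1(tuples, score_fn):
    """tuples: (context, gold, fake) triples.
    Counts the fraction where the gold answer outscores the fake one."""
    hits = sum(score_fn(ctx, gold) > score_fn(ctx, fake)
               for ctx, gold, fake in tuples)
    return hits / len(tuples)

# Toy scorer standing in for a trained ranker: simply prefers longer responses.
def toy_score(ctx, resp):
    return len(resp.split())

eval_set = [("ctx1", "a detailed informative answer", "no"),
            ("ctx2", "short", "a much longer rambling distractor"),
            ("ctx3", "quite a good reply here", "bad")]
print(precision_at_1(eval_set, toy_score))  # 2 of 3 gold answers ranked first
```

The real evaluation does the same comparison with the trained rankers' point-wise scores over the ~24,000 tuples.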
Results on Extended Datasets
Discussion
● User ratings are very sparse and noisy, and expensive to obtain
● Length can be a proxy for user engagement
● A deep-learning-based response ranker was introduced
  ○ Ranking performance is superior to both the handcrafted baseline and a perceptron-based one (VowpalWabbit)
  ○ Training from two supervision signals was explored
● Given a large amount of conversational data, collecting user ratings can be avoided if optimizing for user engagement
Thank you!
{ is33, o.dusek, o.lemon } @hw.ac.uk
bit.ly/alana_learning_to_rank
@alanathebot
References
1. Papaioannou et al. Alana: Social Dialogue using an Ensemble Model and a Ranker Trained on User Feedback. 2017.
2. Lu et al. A Practical Approach to Dialogue Response Generation in Closed Domains. 2017.
3. Venkatesh et al. On Evaluating and Comparing Conversational Agents. 2017.