Building A User-Centric and Content-Driven Socialbot (PowerPoint presentation transcript, Hao Fang)


  1. Building A User-Centric and Content-Driven Socialbot. Hao Fang. Mari Ostendorf (Chair), Hannaneh Hajishirzi. Committee: Leah M. Ceccarelli (GSR), Eve Riskin, Yejin Choi, Geoffrey Zweig

  2. Agenda o Background o Sounding Board System – 2017 Alexa Prize Winner o A Graph-Based Document Representation for Dialog Control o Multi-Level Evaluation for Socialbot Conversations o Summary and Future Directions

  3. Agenda o Background o Sounding Board System – 2017 Alexa Prize Winner o A Graph-Based Document Representation for Dialog Control o Multi-Level Evaluation for Socialbot Conversations o Summary and Future Directions

  4. Sci-Fi Movies

  5. Daily Life

  6. Types of Conversational AI. Socialbots: "converse coherently and engagingly with humans on popular topics and current events."
     Task Definition: task-oriented / non-task-oriented
     Domain Coverage: single-domain / multi-domain / open-domain
     Dialog Initiative: system-initiative / user-initiative / mixed-initiative

  7. Socialbot Applications o Entertainment, education, healthcare, companionship, … o A conversational gateway to online content (the socialbot as a conversational user interface)

  8. Agenda o Background o Sounding Board System – 2017 Alexa Prize Winner o A Graph-Based Document Representation for Dialog Control o Multi-Level Evaluation for Socialbot Conversations o Summary and Future Directions 7

  9. Design Objectives
     User-Centric: users can control the dialog flow and switch topics at any time; bot responses are adapted to acknowledge user reactions.
     Content-Driven: content covers the wide range of user interests; dialog strategies lead or contribute to the dialog flow.

  10. 2017 Alexa Prize Finals

  11. [image]

  12. Dialog Control for Many Miniskills? Conversation activities (miniskills): o Greet o List Topics o Tell Fun Facts o Tell Jokes o Tell Headlines o Discuss Movies o Personality Test o …

  13. Hierarchical Dialog Management
      o Dialog Context Tracker: dialog state, topic/content/miniskill history, user personality
      o Master Dialog Manager: miniskill polling; topic and miniskill backoff
      o Miniskill Dialog Managers: miniskill dialog control as a finite-state machine; retrieve content & build response plan
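The per-miniskill dialog control described above (a finite-state machine that advances on user acts and builds a response plan) can be sketched as follows. The class, state names, user-act labels, and response templates here are illustrative assumptions, not Sounding Board's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class MiniskillFSM:
    """A minimal miniskill dialog manager: a finite-state machine whose
    transitions are keyed by user-act labels (hypothetical schema)."""
    transitions: dict          # state -> {user_act: next_state}
    responses: dict            # state -> response plan template
    state: str = "start"

    def step(self, user_act: str) -> str:
        """Advance on a user act; unknown acts fall through to a backoff
        state, mirroring the topic/miniskill backoff in the master DM."""
        self.state = self.transitions.get(self.state, {}).get(user_act, "backoff")
        return self.responses[self.state]

# Hypothetical "tell fun facts" miniskill.
facts_fsm = MiniskillFSM(
    transitions={
        "start": {"accept": "tell_fact"},
        "tell_fact": {"continue": "tell_fact", "reject": "exit"},
    },
    responses={
        "tell_fact": "Here's a fun fact about {topic}: {fact}",
        "exit": "Okay, let's talk about something else.",
        "backoff": "Sorry, let me hand this back to the master dialog manager.",
    },
)

print(facts_fsm.step("accept"))    # enters tell_fact, returns its template
print(facts_fsm.step("continue"))  # stays in tell_fact
print(facts_fsm.step("reject"))    # exits the miniskill
```

In this sketch the master dialog manager would fill in the `{topic}`/`{fact}` slots from retrieved content when building the final response plan.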

  14. Social Chat Knowledge. An important type of social chat knowledge is online content. How should content be organized to facilitate dialog control? We need a framework that allows dialog control to be defined in a consistent way.

  15. Knowledge Graph
      o Nodes: content post (fact, movie, news article, …); topic (entity or generic topic)
      o Relational edges between post and topic: topic mention (NER, noun phrase extraction); category tag (Reddit meta-information); movie name, genre, director, actor (IMDB)
      o Dialog Control: move along edges
      Example from the diagram: the post "UT Austin and Google AI use machine learning on data from NASA's Kepler Space Telescope to discover an eighth planet circling a distant star." is linked to the topics AI, Google, science, and astronomy.
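The "move along edges" dialog control can be illustrated with a toy graph built around the example post on this slide. The edge-list layout, relation names, and helper function are assumptions for illustration, not the system's actual data structures.

```python
# Toy knowledge graph: (node, relation, node) triples linking content posts
# to topics, following the slide's node/edge types (assumed representation).
edges = [
    ("post1", "topic_mention", "Google"),
    ("post1", "topic_mention", "AI"),
    ("post1", "category_tag", "science"),
    ("post2", "topic_mention", "Google"),
    ("post2", "category_tag", "astronomy"),
]

def neighbors(node, relation=None):
    """All nodes connected to `node`, optionally filtered by relation type;
    edges are treated as undirected for traversal."""
    out = []
    for a, r, b in edges:
        if relation is not None and r != relation:
            continue
        if a == node:
            out.append(b)
        elif b == node:
            out.append(a)
    return out

# Dialog move: while discussing post1, the user asks for more about Google.
# Hop post1 -> Google -> another post that mentions Google.
next_posts = [p for p in neighbors("Google", "topic_mention") if p != "post1"]
print(next_posts)  # ['post2']
```

A real graph would index edges by node for efficiency; the point here is only that a topic-switch or "tell me more" request becomes a short walk along typed edges.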

  16. Agenda o Background o Sounding Board System – 2017 Alexa Prize Winner o A Graph-Based Document Representation for Dialog Control o Multi-Level Evaluation for Socialbot Conversations o Summary and Future Directions 15

  17. Motivation for a Graph-Based Document Representation
      o Dialog control defined based on moves on the graph: lead the conversation; handle user initiatives
      o Challenges for unstructured documents (e.g., news articles):
        - Storytelling: not all sentences are equally interesting to a listener; need to figure out a coherent presenting order
        - Question Answering & Asking: answer questions about the document; need a smooth transition between sentences
        - Subject Entity: handle entity-based information-seeking requests
        - Opinion Comment: handle opinion-seeking requests

  18. Graph-Based Document Representation (diagram): a storytelling chain links sentence nodes (Sent 1 → Sent 2 → Sent 3 → Sent 4); opinion nodes attach to sentences via comment edges; entity nodes attach via subject edges; question nodes attach via answer edges.

  19. Document Representation Construction
      Pipeline steps: Sentence Node Creation → Entity Node Creation → Subject Edge Creation → Storytelling Chain Creation → Question Generation → Comment Collection
      Text pre-processing: tokenization, sentence splitting, sentence filtering
      NLP tools: part-of-speech tagging, constituency parsing, named entity recognition, entity linking, coreference resolution, dependency parsing

  20. Storytelling Chain Creation
      o Problem formulation: context sentence sequence (s_1, s_2, …, s_L); candidate sentence set {y_1, y_2, …, y_N}, the next N sentences following s_L in the article; each candidate sentence chain (y_j | s_1, s_2, …, s_L) gets a binary label.
      o Data collection: 550 news articles; train/validation/test split 3/1/1 based on article ID.
      o Number of candidate sentence chains: L=1, N=4: 662 positive / 1538 negative; L=2, N=3: 865 positive / 1064 negative.
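The problem setup above (a context of L consecutive article sentences, with the next N sentences as labeled candidates) can be sketched as follows. The `is_good` labeling function is a hypothetical stand-in for the human annotation collected on the 550 news articles.

```python
def make_candidate_chains(sentences, L, N, is_good):
    """Yield (context, candidate, label) triples from one article:
    context = (s_1, ..., s_L), candidates = the next N sentences {y_1..y_N}."""
    for i in range(len(sentences) - L - N + 1):
        context = sentences[i:i + L]
        candidates = sentences[i + L:i + L + N]
        for cand in candidates:
            yield context, cand, is_good(context, cand)

# Toy article with 6 sentences; a trivial stand-in labeler marks everything good.
article = ["S1.", "S2.", "S3.", "S4.", "S5.", "S6."]
chains = list(make_candidate_chains(article, L=1, N=4,
                                    is_good=lambda ctx, y: True))
print(len(chains))  # 2 context positions x 4 candidates = 8
```

With L=1 and N=4 only positions whose 4 following sentences exist yield chains, which is why long articles contribute many more training chains than short ones.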

  21. Model and Features
      o Model: binary logistic regression, used for ranking candidate sentences given (s_1, s_2, …, s_L)
        - input: candidate sentence chain (y_j | s_1, s_2, …, s_L)
        - output: probability score p(y_j | s_1, …, s_L) ∈ [0, 1]
      o Features:
        - SentImportance: i(y_j), from TextRank unsupervised summarization on the document D
        - SentDistance: d(y_j | s_1, …, s_L) = SentIdx(y_j) − SentIdx(s_L)
        - SentEmbedding: e(y_j), from pre-trained BERT
        - ChainEmbedding: c(y_j | s_1, …, s_L)
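A minimal sketch of ranking candidates with a logistic model over such features, assuming illustrative weights; the trained model and the full embedding features from the slide are not reproduced here, only the SentDistance feature and the logistic scoring.

```python
import math

def sent_distance(cand_idx, last_context_idx):
    """SentDistance: SentIdx(y_j) - SentIdx(s_L), per the slide's feature."""
    return cand_idx - last_context_idx

def score(features, weights, bias=0.0):
    """p(y_j | s_1..s_L) in [0, 1] via the logistic (sigmoid) function."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Context ends at article index 3; candidates are the next four sentences.
last_idx = 3
cands = [4, 5, 6, 7]
weights = [-0.8]  # assumed: larger distance lowers the score
probs = {j: score([sent_distance(j, last_idx)], weights) for j in cands}
best = max(probs, key=probs.get)
print(best)  # 4: under this toy model the nearest following sentence wins
```

This also illustrates the first test-set observation: ranking by distance alone just prefers the next sentence, which is not always a good continuation, hence the importance and embedding features.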

  22. Test Set Results (bar chart: % of cases where the highest-ranked sentence has a positive label, for the feature sets SentDistance, SentEmbedding, SentImportance, ChainEmbedding, and All, under the L=1, N=4 and L=2, N=3 settings; reported values span 54.7 to 73.7). Observation: the next sentence in the article is not always a good continuation.

  23. Test Set Results (same chart). Observation: the sentence embedding alone may capture some features about importance/style (e.g., length, informativeness).

  24. Test Set Results (same chart). Observation: sentence importance (document context) is very useful.

  25. Test Set Results (same chart, with gains of +4.4 and +2.7 marked). Observation: dialog context is important as the chain gets longer.

  26. Test Set Results (same chart). Observation: using all features (2050-dimensional) overfits for L=2 (1239 training samples).

  27. Question Generation (pipeline: Sent → Question 1, Question 2)
      o Dependency Parsing: Universal Dependencies
      o Dependent Selection for Answer: question interestingness/importance
      o Question Type Classification: hand-crafted decision tree
      o Clause/Question Planning: template-based planning
      o Clause/Question Realization: dependency-based realization
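The planning and realization steps can be sketched with a hand-coded parse. The templates and the one-line question-type rule below are illustrative assumptions standing in for the hand-crafted decision tree and the Universal Dependencies parser used in the actual pipeline.

```python
# Hand-coded dependency parse for a toy sentence (the pipeline would get
# this from a Universal Dependencies parser).
parse = {
    "root": "found",
    "deps": {"nsubj": "the study",
             "ccomp": "Sprint was the only one to throttle Skype"},
}

# Hypothetical question templates, keyed by question type.
TEMPLATES = {
    "what": "Do you want to know what {nsubj} found?",
    "who": "Do you want to know who found this?",
}

def plan_question(parse, answer_dep):
    """Pick a question type from which dependent is elided as the answer,
    then realize the question from a template (toy one-rule classifier)."""
    qtype = "what" if answer_dep == "ccomp" else "who"
    return TEMPLATES[qtype].format(nsubj=parse["deps"].get("nsubj", "they"))

print(plan_question(parse, "ccomp"))  # asks about the clausal complement
print(plan_question(parse, "nsubj"))  # asks about the subject
```

The real system selects the answer dependent by interestingness/importance and realizes questions from the dependency structure rather than fixed strings; this sketch only shows the plan-then-realize shape of the last two stages.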

  28. Question Generation Example
      Sentence: "Among leading U.S. carriers, Sprint was the only one to throttle Skype, the study found." (dependency parse shown with labels such as root, nsubj, ccomp, xcomp, cop, nmod, compound)
      Clause plan constituents (dependency paths): /root (found), /root/nsubj (study), /root/ccomp (one), …
      Question types: what, whether, who, why, …

  29. Evaluation of Generated Questions
      o Setting: the question serves as a transition clause for introducing Sent 2 given Sent 1: "Do you want to know ______?"
      o 4 question generation methods: generic ("more about this article"); constituency-based (Heilman, 2011); dependency-based; human-written
      o Human judgments on question pairs (A, B, cannot tell); 134 sentences, 5 judgments per pair

  30. Overall Quality: the dependency-based method outperforms the constituency-based one, but does not achieve "human performance". (Stacked bars of win/tie/loss counts for constituency- and dependency-based questions vs. the generic and the human-written questions.)

  31. Informativeness: the dependency-based method generates much more informative questions (rated better than the human-written ones). (Same win/tie/loss comparison.)

  32. Transition Smoothness: dialog context is important! (Same win/tie/loss comparison.)
