Measuring Semantic Coherence of a Conversation


Measuring Semantic Coherence of a Conversation. Svitlana Vakulenko, Maarten de Rijke, Michael Cochez, Vadim Savenkov, Axel Polleres. 6 June 2018. Semantic coherence: an essential property of a conversation, "continuity of senses".


  1. Measuring Semantic Coherence of a Conversation
     Svitlana Vakulenko, Maarten de Rijke, Michael Cochez, Vadim Savenkov, Axel Polleres
     6 June 2018

  2. Semantic coherence
     • An essential property of a conversation: “continuity of senses”
     https://pixabay.com/en/fishing-net-red-thread-network-node-1526496/

  3. Research goal
     ▪ See if we can detect holes in conversations
     ▪ Evaluate existing knowledge models
     ▪ Propose an approach to measure these holes (incoherence)
     ▪ Why: dialogue system design, knowledge engineering

  4. Semantic models
     ▪ Knowledge Graphs
     https://commons.wikimedia.org/wiki/File:Wikidata-gun-ontology-2017-05-11.png

  5. Semantic models
     ▪ Word embeddings
     https://commons.wikimedia.org/wiki/File:2016_02_mini_embedding.png

  6. Semantic models
     ▪ Knowledge Graphs
     ▪ Word embeddings
     https://commons.wikimedia.org/wiki/File:2016_02_mini_embedding.png
     https://commons.wikimedia.org/wiki/File:Wikidata-gun-ontology-2017-05-11.png

  7. Semantic models
     ▪ Knowledge Graphs
     ▪ Word embeddings
     ▪ Knowledge Graph embeddings
     https://commons.wikimedia.org/wiki/File:2016_02_mini_embedding.png
     https://commons.wikimedia.org/wiki/File:Wikidata-gun-ontology-2017-05-11.png

  8. Linking dialogue
     ▪ Take existing knowledge models
     ▪ See if we can detect holes in conversations through these models
     ▪ Propose an approach to measure these holes (incoherence)
     https://pxhere.com/en/photo/1101883

  9. Dialog graph
     Example dialogue:
       mdg: gksudo gedit /etc/apt/source.list (type from command line)
       crunchbang666: the text editor has opened the file source.list but there is no content i typed source instead of sources ... ok so i have it open
       mdg: see the line # deb http://gb.archive. ubuntu all you have to do is delete the "#" character
       crunchbang666: just the deb or the deb-src line too?
     [Figure: graph over the dialogue with nodes labeled w1 to w5, u1 to u4, p1, p2, c1 to c4 and c*; the concept nodes shown include dbr:Gedit, dbr:GNOME, dbr:Text editor, dbr:Deb(file format) and dbr:Ubuntu(OS), connected by genre and wikiPageWikiLink edges.]
     VAKULENKO ET AL. MEASURING SEMANTIC COHERENCE OF A CONVERSATION.

  10. Experiments
     ▪ Ubuntu Dialogue Corpus
     ▪ DBpedia Spotlight API
     ▪ Knowledge Graphs: DBpedia+Wikidata HDT
     ▪ Knowledge Graph embeddings: rdf2vec, KGloVe
     ▪ Word embeddings: word2vec, GloVe
     https://github.com/rkadlec/ubuntu-ranking-dataset-creator
     https://en.wikipedia.org/wiki/File:DBpediaSpotlight.jpg
     https://en.wikipedia.org/wiki/Wikidata

  11. Subgraph induction

  12. Top-k shortest path
       PREFIX ppf: <java:at.ac.wu.arqext.path.>
       PREFIX dbr: <http://dbpedia.org/resource/>
       SELECT * WHERE {
         ?X ppf:topk ("--source" dbr:Directory_service dbr:Gnome dbr:GNOME dbr:Desktop_environment
                      "--target" dbr:Desktop_computer
                      "--k" 5 "--maxlength" 9 "--timeout" 2000)
       }
     http://wikidata.communidata.at
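
The query above asks for the top-k shortest paths from a set of source entities to a target, bounded by a maximum length. The idea can be sketched in plain Python as a breadth-first enumeration of simple paths. This is not the `ppf:topk` implementation; the function name, the toy graph and its edges are hypothetical and chosen only to illustrate the `--source`, `--target`, `--k` and `--maxlength` parameters.

```python
from collections import deque


def top_k_shortest_paths(graph, sources, target, k=5, max_length=9):
    """Return up to k simple paths from any source to target, shortest first.

    Path length is counted in edges; paths longer than max_length are dropped.
    """
    found = []
    # Every source starts its own zero-length path; FIFO order guarantees
    # that paths are examined in order of non-decreasing length.
    queue = deque([s] for s in sources)
    while queue and len(found) < k:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            found.append(path)
            continue
        if len(path) - 1 >= max_length:
            continue  # extending would exceed the length bound
        for neighbour in sorted(graph.get(node, ())):
            if neighbour not in path:  # keep paths simple (no repeated nodes)
                queue.append(path + [neighbour])
    return found


# Hypothetical toy adjacency lists, not real DBpedia edges.
toy_graph = {
    "dbr:GNOME": ["dbr:Desktop_environment"],
    "dbr:Desktop_environment": ["dbr:Desktop_computer", "dbr:Graphical_user_interface"],
    "dbr:Graphical_user_interface": ["dbr:Desktop_computer"],
    "dbr:Directory_service": [],
}

paths = top_k_shortest_paths(
    toy_graph,
    sources=["dbr:GNOME", "dbr:Desktop_environment"],
    target="dbr:Desktop_computer",
    k=3,
)
for p in paths:
    print(len(p) - 1, " -> ".join(p))
```

A timeout, as in the query's `--timeout` argument, is left out of the sketch; on a full knowledge graph some bound of that kind is needed because the number of simple paths grows quickly with `max_length`.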

  13. Subgraph statistics

  14. Shortest paths

  15. Negative sampling
     ▪ random uniform (RUf)
     ▪ vocabulary distribution (VoD)
     ▪ sequence disorder (SqD)
     ▪ horizontal split (HSp)
     ▪ vertical split (VSp)
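
The slide only names the five sampling strategies. The sketch below shows one plausible reading of each name for dialogues represented as token (word or entity) sequences. Every definition here is an assumption made for illustration and may differ from the authors' exact procedure; the function names and toy dialogues are hypothetical.

```python
import random
from collections import Counter


def random_uniform(dialogue, vocabulary, rng):
    """RUf (assumed): replace every token by one drawn uniformly from the vocabulary."""
    return [rng.choice(vocabulary) for _ in dialogue]


def vocabulary_distribution(dialogue, corpus, rng):
    """VoD (assumed): draw replacement tokens in proportion to their corpus frequency."""
    counts = Counter(token for d in corpus for token in d)
    tokens = list(counts)
    weights = [counts[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=len(dialogue))


def sequence_disorder(dialogue, rng):
    """SqD (assumed): keep the same tokens but permute their order."""
    shuffled = list(dialogue)
    rng.shuffle(shuffled)
    return shuffled


def horizontal_split(dialogue_a, dialogue_b):
    """HSp (assumed): first half of one dialogue followed by the second half of another."""
    return dialogue_a[: len(dialogue_a) // 2] + dialogue_b[len(dialogue_b) // 2 :]


def vertical_split(turns_a, turns_b):
    """VSp (assumed): even-position turns from one dialogue, odd-position turns from another."""
    return [a if i % 2 == 0 else b for i, (a, b) in enumerate(zip(turns_a, turns_b))]


# Hypothetical toy dialogues reduced to token sequences.
dialogue_a = ["ubuntu", "gedit", "gnome", "deb"]
dialogue_b = ["python", "pip", "venv", "wheel"]
corpus = [dialogue_a, dialogue_b]
vocabulary = sorted({t for d in corpus for t in d})
rng = random.Random(0)

negatives = {
    "RUf": random_uniform(dialogue_a, vocabulary, rng),
    "VoD": vocabulary_distribution(dialogue_a, corpus, rng),
    "SqD": sequence_disorder(dialogue_a, rng),
    "HSp": horizontal_split(dialogue_a, dialogue_b),
    "VSp": vertical_split(dialogue_a, dialogue_b),
}
for name, sample in negatives.items():
    print(name, sample)
```

Under this reading, the two split strategies keep locally coherent stretches of real dialogue, so they would be harder negatives than the fully random ones.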

  16. Shortest paths

  17. Binary classification
     ▪ Convolutional Neural Network (CNN)
     ▪ Input: sequence of words/entities
     ▪ Output: coherence score in [0, 1]
     [Figure: Word embeddings → Convolutional layer (250 filters, size 3, step 1; ReLU) → Max pool → Hidden layer (ReLU) → Output (Sigmoid); example output 0.8]

  18. Binary classification
     ▪ Convolutional Neural Network (CNN)
     ▪ Input: sequence of words/entities
     ▪ Output: coherence score in [0, 1]
     [Figure: Knowledge Graph embeddings → Convolutional layer (250 filters, size 3, step 1; ReLU) → Max pool → Hidden layer (ReLU) → Output (Sigmoid); example input entities dbr:ubuntu (OS), dbr:desktop, dbr:totem, dbr:vlc, dbr:fsck, dbr:ext2, dbr:partition; example output 0.8]
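
The layer sequence on the slide (embeddings, convolution with 250 filters of size 3 and step 1, max pool, hidden layer, sigmoid output) can be sketched as a forward pass in plain Python. This is a minimal illustration with random, untrained weights, so the score it prints carries no meaning. The hidden layer size, the use of a global max pool, the weight range and all function names are assumptions; the slide does not specify them.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(vector, weights, bias):
    """Fully connected layer without activation: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]


def coherence_score(embedded, params):
    """embedded: list of equal-length vectors, one per word/entity.

    The sequence must contain at least kernel_size vectors.
    """
    size = params["kernel_size"]
    # Convolution with step 1: each filter sees `size` consecutive vectors,
    # flattened into one window.
    windows = [
        [x for vec in embedded[i : i + size] for x in vec]
        for i in range(len(embedded) - size + 1)
    ]
    feature_maps = [
        [relu(sum(w * x for w, x in zip(filt, win)) + b) for win in windows]
        for filt, b in zip(params["filters"], params["filter_bias"])
    ]
    pooled = [max(fm) for fm in feature_maps]  # assumed: global max pool per filter
    hidden = [relu(h) for h in dense(pooled, params["hidden_w"], params["hidden_b"])]
    (logit,) = dense(hidden, params["out_w"], params["out_b"])
    return sigmoid(logit)


def random_params(dim, num_filters=250, kernel_size=3, hidden_units=16, seed=0):
    """Untrained weights; hidden_units=16 is an arbitrary choice for the sketch."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "kernel_size": kernel_size,
        "filters": mat(num_filters, kernel_size * dim),
        "filter_bias": [0.0] * num_filters,
        "hidden_w": mat(hidden_units, num_filters),
        "hidden_b": [0.0] * hidden_units,
        "out_w": mat(1, hidden_units),
        "out_b": [0.0],
    }


# Seven random 8-dimensional vectors stand in for the embeddings of a
# seven-token dialogue (word or knowledge graph embeddings in the slides).
rng = random.Random(1)
sequence = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(7)]
score = coherence_score(sequence, random_params(dim=8))
print(f"coherence score: {score:.3f}")
```

In a trained classifier the weights would be learned from real dialogues labeled as coherent and from negative samples labeled as incoherent; the same forward pass applies whether the input vectors come from word embeddings or knowledge graph embeddings.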

  19. Results

  20. Random uniform

  21. Horizontal split

  22. Semantic spaces

  23. Conclusions and future work
     ▪ GloVe word embeddings show best performance
     ▪ integrating heterogeneous knowledge sources

  24. Conclusions and future work
     ▪ NEL is a bottleneck for KG embeddings
     ▪ End-to-end training (NEL NN-layer)
     [Figure: Knowledge Graph embeddings → Convolutional layer (250 filters, size 3, step 1; ReLU) → Max pool → Hidden layer (ReLU) → Output (Sigmoid); example input entities dbr:ubuntu (OS), dbr:desktop, dbr:totem, dbr:vlc, dbr:fsck, dbr:ext2, dbr:partition; example output 0.8]

  25. Conclusions and future work
     ▪ Dialog graph embeddings
     [Figure: the dialog graph example from slide 9, with the mdg / crunchbang666 dialogue and the concept nodes dbr:Gedit, dbr:GNOME, dbr:Text editor, dbr:Deb(file format) and dbr:Ubuntu(OS)]
