Sequence-to-Sequence Natural Language Generation

Ondřej Dušek
work done with Filip Jurčíček
Institute of Formal and Applied Linguistics, Charles University, Prague
Interaction Lab,


Problem 1: Generation from Unaligned Data – Delexicalization

• Limitation / way to address data sparsity
  • many slot values are seen once or never in training (restaurant names, departure times)
  • → replaced with placeholders for generation, added back in post-processing
• enumerable slots: food type, price range
• non-enumerable slots: restaurant name, phone number, postcode
• Still different from full semantic alignments
  • can be obtained by simple string replacement + the values appear verbatim in the outputs
  • can be applied to some or all slots

Examples (delexicalized DAs and outputs):
inform(direction="X-dir", from_stop="X-from", line=X-line, vehicle=X-vehicle, departure_time=X-departure)
→ "Take line X-line X-vehicle at X-departure from X-from direction X-dir."
inform(name="X-name", good_for_meal=X-meal, kids_allowed=no)
→ "X-name is good for X-meal and no children are allowed."
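Delexicalization itself is just string replacement. A minimal sketch, assuming the slot values are known from the DA (the helper names and the example restaurant name are illustrative, not the authors' code):

```python
def delexicalize(sentence, slot_values):
    """Replace slot values with X-<slot> placeholders.

    slot_values: dict like {"name": "Golden Dragon", "meal": "dinner"}
    """
    for slot, value in slot_values.items():
        sentence = sentence.replace(value, f"X-{slot}")
    return sentence

def relexicalize(sentence, slot_values):
    """Inverse post-processing step: put the real values back."""
    for slot, value in slot_values.items():
        sentence = sentence.replace(f"X-{slot}", value)
    return sentence

slots = {"name": "Golden Dragon", "meal": "dinner"}
delex = delexicalize("Golden Dragon is good for dinner.", slots)
# -> "X-name is good for X-meal."
assert relexicalize(delex, slots) == "Golden Dragon is good for dinner."
```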

Problem 2: Comparing Different NLG Architectures

• NLG pipeline traditionally divided into:
  1. sentence planning – decide on the overall sentence structure
  2. surface realization – decide on specific word forms, linearize
• some NLG systems join this into a single step
• two-step setup simplifies structure generation by abstracting away from surface grammar
• joint setup avoids error accumulation over a pipeline
• we try both in one system + compare

Example MR: inform(name=X-name, type=placetoeat, eattype=restaurant, area=riverside, food=Italian)
Two-step: MR → sentence planning → sentence plan (t-tree: be v:fin; X-name n:subj; restaurant n:obj; Italian adj:attr; river n:near+X) → surface realization → surface text
Joint: MR → joint NLG → surface text
Surface text: "X is an Italian restaurant near the river."

Problem 3: Adapting to the User (Entrainment)

• speakers are influenced by previous utterances
  • adapting (entraining) to each other, reusing lexicon and syntax
• entrainment is natural, subconscious
• entrainment helps conversation success
• natural source of variation
• typical NLG only takes the input DA into account
  • no way of adapting to the user's way of speaking
  • no output variance (must be fabricated, e.g., by sampling)
• entrainment in NLG limited to rule-based systems so far
• our system is trainable and entrains/adapts

Example:
User: "how bout the next ride"
Without entrainment: "Sorry, I did not find a later option."
With entrainment (reusing "the next ride"): "I'm sorry, the next ride was not found."

Problem 4: Multilingual NLG

• English: little morphology
  • vocabulary size relatively small
  • (almost) no morphological agreement
  • no need to inflect proper names → lexicalization = copy names from DA to output
• None of this works with rich morphology → Czech is a good language to try
• Extensions to our generator to address this:
  • 3rd generator mode: generating lemmas & morphological tags
  • inflection for lexicalization (surface form selection)

Czech examples (names from the DA must be inflected, not copied):
Toto se líbí uživateli Janě Novákové. [name in DA: Jana Nováková, nominative → dative]
  "This is liked by user (name)."
Děkujeme, Jane Nováku, vaše hlasování bylo vytvořeno. [name in DA: Jan Novák, nominative → vocative]
  "Thank you, (name), your poll has been created."
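A hedged sketch of "inflection for lexicalization": when a name placeholder is filled in, the surface form is selected according to the morphological case required at that position. The tiny form dictionary and the (token, case) representation are illustrative stand-ins for a real morphological generator, not the authors' implementation:

```python
# illustrative stand-in for a real Czech morphological generator
NAME_FORMS = {
    ("Jana Nováková", "dat"): "Janě Novákové",  # dative
    ("Jan Novák", "voc"): "Jane Nováku",        # vocative
}

def lexicalize(tokens, name_value):
    """Fill X-name placeholders, inflecting the name to the required case.

    tokens: list of (token, case) pairs; case is None for ordinary tokens.
    """
    out = []
    for token, case in tokens:
        if token == "X-name":
            # fall back to the base form if no inflected form is known
            out.append(NAME_FORMS.get((name_value, case), name_value))
        else:
            out.append(token)
    return " ".join(out)

print(lexicalize([("Toto", None), ("se", None), ("líbí", None),
                  ("uživateli", None), ("X-name", "dat"), (".", None)],
                 "Jana Nováková"))
# -> Toto se líbí uživateli Janě Novákové .
```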

Our Solution: Our NLG System

• based on sequence-to-sequence neural network models
• trainable from unaligned pairs of input DAs + sentences
  • learns to produce meaningful outputs from little training data
• multiple operating modes for comparison:
  a) generating sentences token-by-token (joint 1-step NLG)
  b) generating deep syntax trees in bracketed notation (sentence-planner stage of a traditional NLG pipeline)
  c) 3rd generator mode: lemma-tag pairs
• context-aware: adapts to the previous user utterance
• works for English and Czech
  • includes proper name inflection for Czech

Outline

1. Introduction to the problem
   a) our task + the problems we are solving
2. Sequence-to-sequence Generation
   a) basic model architecture
   b) generating directly / via deep syntax trees
   c) experiments on the BAGEL set
3. Context-aware extensions (user adaptation/entrainment)
   a) collecting a context-aware dataset
   b) making the basic seq2seq setup context-aware
   c) experiments on our dataset
4. Generating Czech
   a) creating a Czech NLG dataset
   b) generator extensions for Czech
   c) experiments on our dataset
5. Conclusions and future work ideas

Part 2: Basic Sequence-to-Sequence NLG

Our Seq2seq Generator Architecture

• sequence-to-sequence models with attention
• encoder LSTM RNN: encodes the DA into hidden states
• decoder LSTM RNN: generates output tokens
• attention model: weighs the encoder hidden states
• basic greedy generation
• + beam search, n-best list outputs
• + reranker

Example: encoder input "inform name X-name inform eattype restaurant" → decoder output "X is a restaurant . <STOP>" (decoder input: "<GO> X is a restaurant .")
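For concreteness, a minimal PyTorch sketch of this encoder–decoder-with-attention setup (not the authors' TGen implementation; the sizes, the bilinear attention variant, and the teacher-forcing training interface are illustrative assumptions):

```python
import torch
import torch.nn as nn

class Seq2SeqNLG(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=64, hid=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.decoder = nn.LSTM(emb + hid, hid, batch_first=True)
        self.attn = nn.Linear(hid, hid, bias=False)  # bilinear attention scores
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src, tgt_in):
        """src: (B, S) DA token ids; tgt_in: (B, T) gold prefix (teacher forcing)."""
        enc_states, (h, c) = self.encoder(self.src_emb(src))    # (B, S, hid)
        context = enc_states.new_zeros(src.size(0), enc_states.size(2))
        logits = []
        for t in range(tgt_in.size(1)):
            emb_t = self.tgt_emb(tgt_in[:, t])                  # (B, emb)
            dec_in = torch.cat([emb_t, context], -1).unsqueeze(1)
            dec_out, (h, c) = self.decoder(dec_in, (h, c))      # (B, 1, hid)
            # attention: weigh encoder states by similarity to decoder state
            query = self.attn(dec_out.squeeze(1)).unsqueeze(2)  # (B, hid, 1)
            weights = torch.softmax(
                torch.bmm(enc_states, query).squeeze(2), -1)    # (B, S)
            context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
            logits.append(self.out(dec_out.squeeze(1) + context))
        return torch.stack(logits, 1)                           # (B, T, tgt_vocab)

model = Seq2SeqNLG(src_vocab=100, tgt_vocab=200)
src = torch.randint(0, 100, (2, 6))   # e.g. "inform name X-name ..."
tgt = torch.randint(0, 200, (2, 8))   # e.g. "<GO> X is a restaurant ..."
print(model(src, tgt).shape)          # torch.Size([2, 8, 200])
```

Greedy decoding then feeds each predicted token back as the next decoder input; beam search keeps the n best partial hypotheses instead.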

Reranker

• generator may not cover the input DA perfectly
  • missing / superfluous information
• we would like to penalize such cases
• check whether the output conforms to the input DA + rerank
• NN with an LSTM encoder + sigmoid classification layer
• 1-hot DA representation
• penalty = Hamming distance from the input DA (on the 1-hot vectors)

Example: input DA inform(name=X-name, eattype=bar, area=citycentre); candidate output "X is a restaurant."
The classifier detects inform, name=X-name, eattype=restaurant in the candidate; against the input DA's 1-hot vector this mismatches in eattype=restaurant (superfluous), eattype=bar (missing), and area=citycentre (missing) → penalty = 3
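A minimal sketch of the reranking step, assuming a trained classifier is available as a function from text to a set of DA facts (the `classify` argument and the penalty weight are illustrative assumptions; on the slide this classifier is the LSTM + sigmoid network):

```python
def hamming_penalty(predicted_facts, input_da_facts, all_facts):
    """Hamming distance between the two DAs' 1-hot vectors."""
    return sum((f in predicted_facts) != (f in input_da_facts)
               for f in all_facts)

def rerank(candidates, input_da_facts, all_facts, classify, weight=100.0):
    """candidates: list of (text, log_prob) from the beam; best-first result."""
    def score(cand):
        text, log_prob = cand
        return log_prob - weight * hamming_penalty(
            classify(text), input_da_facts, all_facts)
    return sorted(candidates, key=score, reverse=True)
```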

System Workflow: Joint and Two-step Setups

• main generator (encoder–decoder with attention + beam search + reranker) based on sequence-to-sequence NNs
• input: tokenized DAs
• output:
  • joint mode – sentences
  • 2-step mode – deep syntax trees in bracketed format, post-processed by a surface realizer

Example MR: inform(name=X-name, type=placetoeat, eattype=restaurant, area=riverside, food=Italian)
Bracketed sentence plan (t-tree):
( <root> <root> ( ( X-name n:subj ) be v:fin ( ( Italian adj:attr ) restaurant n:obj ( river n:near+X ) ) ) )
Surface text: "X is an Italian restaurant near the river."
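The bracketed format is a plain in-order linearization of the t-tree: each subtree is wrapped in parentheses, with children placed before or after the node's own lemma + formeme according to their word order. A small sketch reproducing the example above (the Node class and the left/right split are illustrative stand-ins for the real t-tree structures):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    lemma: str
    formeme: str
    left: list = field(default_factory=list)   # children preceding the node
    right: list = field(default_factory=list)  # children following the node

def bracket(node):
    """Linearize a t-tree into the bracketed notation the generator emits."""
    parts = ["( " + bracket(c) + " )" for c in node.left]
    parts.append(f"{node.lemma} {node.formeme}")
    parts += ["( " + bracket(c) + " )" for c in node.right]
    return " ".join(parts)

tree = Node("<root>", "<root>", right=[
    Node("be", "v:fin",
         left=[Node("X-name", "n:subj")],
         right=[Node("restaurant", "n:obj",
                     left=[Node("Italian", "adj:attr")],
                     right=[Node("river", "n:near+X")])])])
print("( " + bracket(tree) + " )")
# ( <root> <root> ( ( X-name n:subj ) be v:fin
#   ( ( Italian adj:attr ) restaurant n:obj ( river n:near+X ) ) ) )
```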

Experiments on the BAGEL Set

• BAGEL dataset: 202 DAs / 404 sentences, restaurant information
  • much less data than previous seq2seq methods used
  • partially delexicalized (names, phone numbers → "X")
  • manual alignment provided, but we do not use it
• 10-fold cross-validation
• automatic metrics: BLEU, NIST
• manual evaluation: semantic errors (missing/irrelevant/repeated information) on 20% of the data

Results

Setup                                    BLEU   NIST   ERR
prev:
  Mairesse et al. (2010) – alignments    ~67    –      0
  Dušek & Jurčíček (2015)                59.89  5.231  30
two-step (ours):
  Greedy with trees                      55.29  5.144  20
  + Beam search (beam size 100)          58.59  5.293  25
  + Reranker (beam size 5)               60.44  5.514  19
    (beam size 10)                       60.77  5.487  24
    (beam size 100)                      60.93  5.510  28
joint (ours):
  Greedy into strings                    52.54  5.052  37
  + Beam search (beam size 100)          55.84  5.228  32
  + Reranker (beam size 5)               61.18  5.507  27
    (beam size 10)                       62.40  5.614  21
    (beam size 100)                      62.76  5.669  19

Sample Outputs

Input DA: inform(name=X-name, type=placetoeat, eattype=restaurant, area=riverside, food=French)
Reference: X is a French restaurant on the riverside.

Greedy with trees:   X is a restaurant providing french and continental and by the river.
+ Beam search:       X is a restaurant that serves french takeaway. [riverside]
+ Reranker:          X is a french restaurant in the riverside area.
Greedy into strings: X is a restaurant in the riverside that serves italian food. [French]
+ Beam search:       X is a restaurant in the riverside that serves italian food. [French]
+ Reranker:          X is a restaurant in the riverside area that serves french food.

(Bracketed items mark information missing from the output.)

Part 3: Entrainment-enabled NLG

Adding Entrainment to Trainable NLG

• Aim: condition generation on the preceding context
• Problem: data sparsity
• Solution: limit context to just the preceding user utterance
  • likely to have the strongest entrainment impact
• Need for context-aware training data: we collected a new set
  • input DA
  • natural language sentence(s)
  • NEW: preceding user utterance

Example:
Preceding user utterance: "I'm headed to Rector Street"
Input DA: inform(from_stop="Fulton Street", vehicle=bus, direction="Rector Street", departure_time=9:13pm, line=M21)
Baseline output: "Go by the 9:13pm bus on the M21 line from Fulton Street directly to Rector Street."
Context-aware output: "Heading to Rector Street from Fulton Street, take a bus line M21 at 9:13pm."

Collecting the Context-aware Dataset (via CrowdFlower)

1. Get natural user utterances in calls to a live dialogue system
   • record calls to the live Alex SDS, using a simple rule-based bigram policy
   • manual transcription + reparsing using the Alex SLU
   • task descriptions use varying synonyms, e.g.:
     – "You want a connection – your departure stop is Marble Hill, and you want to go to Roosevelt Island. Ask how long the journey will take. Ask about a schedule afterwards. Then modify your query: ask for a ride at six o'clock in the evening. Ask for a connection by bus. Do as if you changed your mind: say that your destination stop is City Hall."
     – "You are searching for transit options leaving from Houston Street with the destination of Marble Hill. When you are offered a schedule, ask about the time of arrival at your destination. Then ask for a connection after that. Modify your query: request information about an alternative at six p.m. and state that you prefer to go by bus."
     – "Tell the system that you want to travel from Park Place to Inwood. When you are offered a trip, ask about the time needed. Then ask for another alternative. Change your search: ask about a ride at 6 o'clock p.m. and tell the system that you would rather use the bus."
2. Generate possible response DAs for the user utterances
3. Collect natural language paraphrases for the response DAs
   • interface designed to support entrainment: context at hand, minimal slot description, short instructions
   • checks: contents + spelling, automatic + manual
   • ca. 20% overhead (repeated job submission)

Context in our Seq2seq Generator (1)

• Two direct context-aware extensions:
  a) preceding user utterance prepended to the DA and fed into the encoder
  b) separate context encoder, its hidden states concatenated with the DA encoder's

Example: context "is there a later option" + DA "iconfirm alternative next" → "You want a later option."
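A tiny sketch of extension (a), which needs no architecture change at all: the tokenized user utterance is simply prepended to the tokenized DA, and the combined sequence goes through the existing encoder (the helper and the triple-based DA tokenization are illustrative):

```python
def build_encoder_input(context_utterance, da_triples):
    """Prepend the context tokens to the flattened DA tokens."""
    da_tokens = [tok for triple in da_triples for tok in triple]
    return context_utterance.split() + da_tokens

print(build_encoder_input("is there a later option",
                          [("iconfirm", "alternative", "next")]))
# ['is', 'there', 'a', 'later', 'option', 'iconfirm', 'alternative', 'next']
```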

Context in our Seq2seq Generator (2)

• One (more) reranker: n-gram match
  • promoting outputs that have a word or phrase overlap with the context utterance

Example – context "is there a later time", DA inform_no_match(alternative=next), reranked n-best list:
  No route found later, sorry.                   -2.914
  The next connection is not found.              -3.544
  I'm sorry, I can not find a later ride.        -3.690
  I can not find the next one sorry.             -3.836
  I'm sorry, a later connection was not found.   -4.003
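A minimal sketch of such an n-gram match reranker: each beam candidate's score is boosted in proportion to its unigram and bigram overlap with the preceding user utterance (the weight and whitespace tokenization are illustrative assumptions, not the authors' tuned setup):

```python
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_match_rerank(candidates, context_utterance, weight=0.1):
    """candidates: list of (text, log_prob); returns best-first list."""
    ctx = context_utterance.lower().split()
    def score(cand):
        text, log_prob = cand
        toks = text.lower().split()
        overlap = sum(len(ngrams(toks, n) & ngrams(ctx, n)) for n in (1, 2))
        return log_prob + weight * overlap
    return sorted(candidates, key=score, reverse=True)
```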

Experiments

• Dataset: public transport information
  • 5.5k paraphrases for 1.8k DA–context combinations, delexicalized

Automatic evaluation results:
Setup                          BLEU   NIST
Baseline (context not used)    66.41  7.037
n-gram match reranker          68.68  7.577
Prepending context             63.87  6.456
  + n-gram match reranker      69.26  7.772
Context encoder                63.08  6.818
  + n-gram match reranker      69.17  7.596

• Human pairwise preference ranking (crowdsourced)
  • context-aware setup preferred over the baseline in 52.5% of cases (significant)
