Sequence-to-Sequence Natural Language Generation

Ondřej Dušek
work done with Filip Jurčíček
Institute of Formal and Applied Linguistics, Charles University, Prague
Interaction Lab,


Problem 1: Generation from Unaligned Data – Delexicalization

• Limitation / way to address data sparsity
  • many slot values are seen once or never in training (restaurant names, departure times)
  • → replaced with placeholders for generation, added back in post-processing
• enumerable slots: food type, price range
• non-enumerable slots: restaurant name, phone number, postcode
• Still different from full semantic alignments
  • can be obtained by simple string replacement + the values appear verbatim in the outputs
  • can be applied to some or all slots

Examples (delexicalized DAs and outputs):
inform(direction="X-dir", from_stop="X-from", line=X-line, vehicle=X-vehicle, departure_time=X-departure)
→ "Take line X-line X-vehicle at X-departure from X-from direction X-dir."
inform(name="X-name", good_for_meal=X-meal, kids_allowed=no)
→ "X-name is good for X-meal and no children are allowed."
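Delexicalization itself is just string replacement. A minimal sketch, assuming the slot values are known from the DA (the helper names and the example restaurant name are illustrative, not the authors' code):

```python
def delexicalize(sentence, slot_values):
    """Replace slot values with X-<slot> placeholders.

    slot_values: dict like {"name": "Golden Dragon", "meal": "dinner"}
    """
    for slot, value in slot_values.items():
        sentence = sentence.replace(value, f"X-{slot}")
    return sentence

def relexicalize(sentence, slot_values):
    """Inverse post-processing step: put the real values back."""
    for slot, value in slot_values.items():
        sentence = sentence.replace(f"X-{slot}", value)
    return sentence

slots = {"name": "Golden Dragon", "meal": "dinner"}
delex = delexicalize("Golden Dragon is good for dinner.", slots)
# -> "X-name is good for X-meal."
assert relexicalize(delex, slots) == "Golden Dragon is good for dinner."
```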

Problem 2: Comparing Different NLG Architectures

• NLG pipeline traditionally divided into:
  1. sentence planning – decide on the overall sentence structure
  2. surface realization – decide on specific word forms, linearize
• some NLG systems join this into a single step
• two-step setup simplifies structure generation by abstracting away from surface grammar
• joint setup avoids error accumulation over a pipeline
• we try both in one system + compare

Example MR: inform(name=X-name, type=placetoeat, eattype=restaurant, area=riverside, food=Italian)
Two-step: MR → sentence planning → sentence plan (t-tree: be v:fin; X-name n:subj; restaurant n:obj; Italian adj:attr; river n:near+X) → surface realization → surface text
Joint: MR → joint NLG → surface text
Surface text: "X is an Italian restaurant near the river."

Problem 3: Adapting to the User (Entrainment)

• speakers are influenced by previous utterances
  • adapting (entraining) to each other, reusing lexicon and syntax
• entrainment is natural, subconscious
• entrainment helps conversation success
• natural source of variation
• typical NLG only takes the input DA into account
  • no way of adapting to the user's way of speaking
  • no output variance (must be fabricated, e.g., by sampling)
• entrainment in NLG limited to rule-based systems so far
• our system is trainable and entrains/adapts

Example:
User: "how bout the next ride"
Without entrainment: "Sorry, I did not find a later option."
With entrainment (reusing "the next ride"): "I'm sorry, the next ride was not found."

Problem 4: Multilingual NLG

• English: little morphology
  • vocabulary size relatively small
  • (almost) no morphological agreement
  • no need to inflect proper names → lexicalization = copy names from DA to output
• None of this works with rich morphology → Czech is a good language to try
• Extensions to our generator to address this:
  • 3rd generator mode: generating lemmas & morphological tags
  • inflection for lexicalization (surface form selection)

Czech examples (names from the DA must be inflected, not copied):
Toto se líbí uživateli Janě Novákové. [name in DA: Jana Nováková, nominative → dative]
  "This is liked by user (name)."
Děkujeme, Jane Nováku, vaše hlasování bylo vytvořeno. [name in DA: Jan Novák, nominative → vocative]
  "Thank you, (name), your poll has been created."
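A hedged sketch of "inflection for lexicalization": when a name placeholder is filled in, the surface form is selected according to the morphological case required at that position. The tiny form dictionary and the (token, case) representation are illustrative stand-ins for a real morphological generator, not the authors' implementation:

```python
# illustrative stand-in for a real Czech morphological generator
NAME_FORMS = {
    ("Jana Nováková", "dat"): "Janě Novákové",  # dative
    ("Jan Novák", "voc"): "Jane Nováku",        # vocative
}

def lexicalize(tokens, name_value):
    """Fill X-name placeholders, inflecting the name to the required case.

    tokens: list of (token, case) pairs; case is None for ordinary tokens.
    """
    out = []
    for token, case in tokens:
        if token == "X-name":
            # fall back to the base form if no inflected form is known
            out.append(NAME_FORMS.get((name_value, case), name_value))
        else:
            out.append(token)
    return " ".join(out)

print(lexicalize([("Toto", None), ("se", None), ("líbí", None),
                  ("uživateli", None), ("X-name", "dat"), (".", None)],
                 "Jana Nováková"))
# -> Toto se líbí uživateli Janě Novákové .
```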

Our Solution: Our NLG System

• based on sequence-to-sequence neural network models
• trainable from unaligned pairs of input DAs + sentences
  • learns to produce meaningful outputs from little training data
• multiple operating modes for comparison:
  a) generating sentences token-by-token (joint 1-step NLG)
  b) generating deep syntax trees in bracketed notation (sentence-planner stage of a traditional NLG pipeline)
  c) 3rd generator mode: lemma-tag pairs
• context-aware: adapts to the previous user utterance
• works for English and Czech
  • includes proper name inflection for Czech

Outline

1. Introduction to the problem
   a) our task + the problems we are solving
2. Sequence-to-sequence Generation
   a) basic model architecture
   b) generating directly / via deep syntax trees
   c) experiments on the BAGEL set
3. Context-aware extensions (user adaptation/entrainment)
   a) collecting a context-aware dataset
   b) making the basic seq2seq setup context-aware
   c) experiments on our dataset
4. Generating Czech
   a) creating a Czech NLG dataset
   b) generator extensions for Czech
   c) experiments on our dataset
5. Conclusions and future work ideas

Part 2: Basic Sequence-to-Sequence NLG

Our Seq2seq Generator Architecture

• sequence-to-sequence models with attention
• encoder LSTM RNN: encodes the DA into hidden states
• decoder LSTM RNN: generates output tokens
• attention model: weighs the encoder hidden states
• basic greedy generation
• + beam search, n-best list outputs
• + reranker

Example: encoder input "inform name X-name inform eattype restaurant" → decoder output "X is a restaurant . <STOP>" (decoder input: "<GO> X is a restaurant .")
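For concreteness, a minimal PyTorch sketch of this encoder–decoder-with-attention setup (not the authors' TGen implementation; the sizes, the bilinear attention variant, and the teacher-forcing training interface are illustrative assumptions):

```python
import torch
import torch.nn as nn

class Seq2SeqNLG(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=64, hid=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.decoder = nn.LSTM(emb + hid, hid, batch_first=True)
        self.attn = nn.Linear(hid, hid, bias=False)  # bilinear attention scores
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src, tgt_in):
        """src: (B, S) DA token ids; tgt_in: (B, T) gold prefix (teacher forcing)."""
        enc_states, (h, c) = self.encoder(self.src_emb(src))    # (B, S, hid)
        context = enc_states.new_zeros(src.size(0), enc_states.size(2))
        logits = []
        for t in range(tgt_in.size(1)):
            emb_t = self.tgt_emb(tgt_in[:, t])                  # (B, emb)
            dec_in = torch.cat([emb_t, context], -1).unsqueeze(1)
            dec_out, (h, c) = self.decoder(dec_in, (h, c))      # (B, 1, hid)
            # attention: weigh encoder states by similarity to decoder state
            query = self.attn(dec_out.squeeze(1)).unsqueeze(2)  # (B, hid, 1)
            weights = torch.softmax(
                torch.bmm(enc_states, query).squeeze(2), -1)    # (B, S)
            context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)
            logits.append(self.out(dec_out.squeeze(1) + context))
        return torch.stack(logits, 1)                           # (B, T, tgt_vocab)

model = Seq2SeqNLG(src_vocab=100, tgt_vocab=200)
src = torch.randint(0, 100, (2, 6))   # e.g. "inform name X-name ..."
tgt = torch.randint(0, 200, (2, 8))   # e.g. "<GO> X is a restaurant ..."
print(model(src, tgt).shape)          # torch.Size([2, 8, 200])
```

Greedy decoding then feeds each predicted token back as the next decoder input; beam search keeps the n best partial hypotheses instead.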

Reranker

• generator may not cover the input DA perfectly
  • missing / superfluous information
• we would like to penalize such cases
• check whether the output conforms to the input DA + rerank
• NN with an LSTM encoder + sigmoid classification layer
• 1-hot DA representation
• penalty = Hamming distance from the input DA (on the 1-hot vectors)

Example: input DA inform(name=X-name, eattype=bar, area=citycentre); candidate output "X is a restaurant."
The classifier detects inform, name=X-name, eattype=restaurant in the candidate; against the input DA's 1-hot vector this mismatches in eattype=restaurant (superfluous), eattype=bar (missing), and area=citycentre (missing) → penalty = 3
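A minimal sketch of the reranking step, assuming a trained classifier is available as a function from text to a set of DA facts (the `classify` argument and the penalty weight are illustrative assumptions; on the slide this classifier is the LSTM + sigmoid network):

```python
def hamming_penalty(predicted_facts, input_da_facts, all_facts):
    """Hamming distance between the two DAs' 1-hot vectors."""
    return sum((f in predicted_facts) != (f in input_da_facts)
               for f in all_facts)

def rerank(candidates, input_da_facts, all_facts, classify, weight=100.0):
    """candidates: list of (text, log_prob) from the beam; best-first result."""
    def score(cand):
        text, log_prob = cand
        return log_prob - weight * hamming_penalty(
            classify(text), input_da_facts, all_facts)
    return sorted(candidates, key=score, reverse=True)
```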

System Workflow: Joint and Two-step Setups

• main generator (encoder–decoder with attention + beam search + reranker) based on sequence-to-sequence NNs
• input: tokenized DAs
• output:
  • joint mode – sentences
  • 2-step mode – deep syntax trees in bracketed format, post-processed by a surface realizer

Example MR: inform(name=X-name, type=placetoeat, eattype=restaurant, area=riverside, food=Italian)
Bracketed sentence plan (t-tree):
( <root> <root> ( ( X-name n:subj ) be v:fin ( ( Italian adj:attr ) restaurant n:obj ( river n:near+X ) ) ) )
Surface text: "X is an Italian restaurant near the river."
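The bracketed format is a plain in-order linearization of the t-tree: each subtree is wrapped in parentheses, with children placed before or after the node's own lemma + formeme according to their word order. A small sketch reproducing the example above (the Node class and the left/right split are illustrative stand-ins for the real t-tree structures):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    lemma: str
    formeme: str
    left: list = field(default_factory=list)   # children preceding the node
    right: list = field(default_factory=list)  # children following the node

def bracket(node):
    """Linearize a t-tree into the bracketed notation the generator emits."""
    parts = ["( " + bracket(c) + " )" for c in node.left]
    parts.append(f"{node.lemma} {node.formeme}")
    parts += ["( " + bracket(c) + " )" for c in node.right]
    return " ".join(parts)

tree = Node("<root>", "<root>", right=[
    Node("be", "v:fin",
         left=[Node("X-name", "n:subj")],
         right=[Node("restaurant", "n:obj",
                     left=[Node("Italian", "adj:attr")],
                     right=[Node("river", "n:near+X")])])])
print("( " + bracket(tree) + " )")
# ( <root> <root> ( ( X-name n:subj ) be v:fin
#   ( ( Italian adj:attr ) restaurant n:obj ( river n:near+X ) ) ) )
```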

Experiments on the BAGEL Set

• BAGEL dataset: 202 DAs / 404 sentences, restaurant information
  • much less data than previous seq2seq methods used
  • partially delexicalized (names, phone numbers → "X")
  • manual alignment provided, but we do not use it
• 10-fold cross-validation
• automatic metrics: BLEU, NIST
• manual evaluation: semantic errors (missing/irrelevant/repeated information) on 20% of the data

Results

Setup                                    BLEU   NIST   ERR
prev:
  Mairesse et al. (2010) – alignments    ~67    –      0
  Dušek & Jurčíček (2015)                59.89  5.231  30
two-step (ours):
  Greedy with trees                      55.29  5.144  20
  + Beam search (beam size 100)          58.59  5.293  25
  + Reranker (beam size 5)               60.44  5.514  19
    (beam size 10)                       60.77  5.487  24
    (beam size 100)                      60.93  5.510  28
joint (ours):
  Greedy into strings                    52.54  5.052  37
  + Beam search (beam size 100)          55.84  5.228  32
  + Reranker (beam size 5)               61.18  5.507  27
    (beam size 10)                       62.40  5.614  21
    (beam size 100)                      62.76  5.669  19

Sample Outputs

Input DA: inform(name=X-name, type=placetoeat, eattype=restaurant, area=riverside, food=French)
Reference: X is a French restaurant on the riverside.

Greedy with trees:   X is a restaurant providing french and continental and by the river.
+ Beam search:       X is a restaurant that serves french takeaway. [riverside]
+ Reranker:          X is a french restaurant in the riverside area.
Greedy into strings: X is a restaurant in the riverside that serves italian food. [French]
+ Beam search:       X is a restaurant in the riverside that serves italian food. [French]
+ Reranker:          X is a restaurant in the riverside area that serves french food.

(Bracketed items mark information missing from the output.)

Part 3: Entrainment-enabled NLG

Adding Entrainment to Trainable NLG

• Aim: condition generation on the preceding context
• Problem: data sparsity
• Solution: limit context to just the preceding user utterance
  • likely to have the strongest entrainment impact
• Need for context-aware training data: we collected a new set
  • input DA
  • natural language sentence(s)
  • NEW: preceding user utterance

Example:
Preceding user utterance: "I'm headed to Rector Street"
Input DA: inform(from_stop="Fulton Street", vehicle=bus, direction="Rector Street", departure_time=9:13pm, line=M21)
Baseline output: "Go by the 9:13pm bus on the M21 line from Fulton Street directly to Rector Street."
Context-aware output: "Heading to Rector Street from Fulton Street, take a bus line M21 at 9:13pm."

Collecting the Context-aware Dataset (via CrowdFlower)

1. Get natural user utterances in calls to a live dialogue system
   • record calls to the live Alex SDS, using a simple rule-based bigram policy
   • manual transcription + reparsing using the Alex SLU
   • task descriptions use varying synonyms, e.g.:
     – "You want a connection – your departure stop is Marble Hill, and you want to go to Roosevelt Island. Ask how long the journey will take. Ask about a schedule afterwards. Then modify your query: ask for a ride at six o'clock in the evening. Ask for a connection by bus. Do as if you changed your mind: say that your destination stop is City Hall."
     – "You are searching for transit options leaving from Houston Street with the destination of Marble Hill. When you are offered a schedule, ask about the time of arrival at your destination. Then ask for a connection after that. Modify your query: request information about an alternative at six p.m. and state that you prefer to go by bus."
     – "Tell the system that you want to travel from Park Place to Inwood. When you are offered a trip, ask about the time needed. Then ask for another alternative. Change your search: ask about a ride at 6 o'clock p.m. and tell the system that you would rather use the bus."
2. Generate possible response DAs for the user utterances
3. Collect natural language paraphrases for the response DAs
   • interface designed to support entrainment: context at hand, minimal slot description, short instructions
   • checks: contents + spelling, automatic + manual
   • ca. 20% overhead (repeated job submission)

Context in our Seq2seq Generator (1)

• Two direct context-aware extensions:
  a) preceding user utterance prepended to the DA and fed into the encoder
  b) separate context encoder, its hidden states concatenated with the DA encoder's

Example: context "is there a later option" + DA "iconfirm alternative next" → "You want a later option."
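A tiny sketch of extension (a), which needs no architecture change at all: the tokenized user utterance is simply prepended to the tokenized DA, and the combined sequence goes through the existing encoder (the helper and the triple-based DA tokenization are illustrative):

```python
def build_encoder_input(context_utterance, da_triples):
    """Prepend the context tokens to the flattened DA tokens."""
    da_tokens = [tok for triple in da_triples for tok in triple]
    return context_utterance.split() + da_tokens

print(build_encoder_input("is there a later option",
                          [("iconfirm", "alternative", "next")]))
# ['is', 'there', 'a', 'later', 'option', 'iconfirm', 'alternative', 'next']
```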

Context in our Seq2seq Generator (2)

• One (more) reranker: n-gram match
  • promoting outputs that have a word or phrase overlap with the context utterance

Example – context "is there a later time", DA inform_no_match(alternative=next), reranked n-best list:
  No route found later, sorry.                   -2.914
  The next connection is not found.              -3.544
  I'm sorry, I can not find a later ride.        -3.690
  I can not find the next one sorry.             -3.836
  I'm sorry, a later connection was not found.   -4.003
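A minimal sketch of such an n-gram match reranker: each beam candidate's score is boosted in proportion to its unigram and bigram overlap with the preceding user utterance (the weight and whitespace tokenization are illustrative assumptions, not the authors' tuned setup):

```python
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_match_rerank(candidates, context_utterance, weight=0.1):
    """candidates: list of (text, log_prob); returns best-first list."""
    ctx = context_utterance.lower().split()
    def score(cand):
        text, log_prob = cand
        toks = text.lower().split()
        overlap = sum(len(ngrams(toks, n) & ngrams(ctx, n)) for n in (1, 2))
        return log_prob + weight * overlap
    return sorted(candidates, key=score, reverse=True)
```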

Experiments

• Dataset: public transport information
  • 5.5k paraphrases for 1.8k DA–context combinations, delexicalized

Automatic evaluation results:
Setup                          BLEU   NIST
Baseline (context not used)    66.41  7.037
n-gram match reranker          68.68  7.577
Prepending context             63.87  6.456
  + n-gram match reranker      69.26  7.772
Context encoder                63.08  6.818
  + n-gram match reranker      69.17  7.596

• Human pairwise preference ranking (crowdsourced)
  • context-aware setup preferred over the baseline in 52.5% of cases (significant)
