Sequence-to-Sequence Natural Language Generation (Ondřej Dušek) - PowerPoint PPT Presentation

Sequence-to-Sequence Natural Language Generation
Ondřej Dušek
work done with Filip Jurčíček at Charles University in Prague
November 15, 2016, Interaction Lab meeting (slide 1/20)


Introduction / Problems We Solve: Entrainment in Dialogues and NLG (slide 5/20)

• speakers are influenced by previous utterances
  • adapting (entraining) to each other
  • reusing lexicon and syntax
• entrainment is natural, subconscious
• entrainment helps conversation success
• natural source of variation
• typical NLG takes only the input DA into account
  • no way of adapting to the user's way of speaking
  • no output variance (must be fabricated, e.g., by sampling)
• entrainment in NLG has been limited to rule-based systems so far
• our system is trainable and entrains/adapts

Introduction / Our Solution: Our NLG System (slide 6/20)

• based on sequence-to-sequence neural network models
✓ trainable from unaligned pairs of input DAs + sentences
✓ context-aware: adapts to the previous user utterance
✓ two operating modes:
  a) generating sentences token-by-token (joint 1-step NLG)
  b) generating deep syntax trees in bracketed notation (the sentence-planner stage of a traditional NLG pipeline)
  • we can compare both approaches in a single architecture
✓ learns to produce meaningful outputs from very little training data
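Mode (b) works because a tree can be generated as an ordinary token sequence once it is written in bracketed form. A minimal sketch of such a linearization in Python; the node format (lemma + formeme) and the exact bracketing are illustrative assumptions here, not necessarily the notation used in the system:

```python
# Sketch: linearizing a deep syntax tree into a bracketed token sequence,
# so that a sequence model can generate trees as flat strings.
# Node format (lemma, formeme, children) is an illustrative assumption.

def linearize(node):
    """Turn a (lemma, formeme, children) tree into bracketed tokens."""
    lemma, formeme, children = node
    tokens = ["(", lemma, formeme]
    for child in children:
        tokens.extend(linearize(child))
    tokens.append(")")
    return tokens

# "X is a restaurant": a verb root with subject and object children.
tree = ("be", "v:fin", [
    ("X-name", "n:subj", []),
    ("restaurant", "n:obj", []),
])

print(" ".join(linearize(tree)))
# ( be v:fin ( X-name n:subj ) ( restaurant n:obj ) )
```

The inverse mapping (reading brackets back into a tree) is what lets a sentence-planner stage hand the result to a surface realizer.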

Basic Sequence-to-Sequence NLG / System Architecture: Our Seq2seq Generator (slide 7/20)

• sequence-to-sequence models with attention
  • encoder LSTM RNN: encodes the DA into hidden states
  • decoder LSTM RNN: generates output tokens
  • attention model: weighs the encoder hidden states
• basic greedy generation
  + beam search, n-best list outputs
  + reranker

[Diagram: encoder LSTMs read the DA triples "inform name X-name, inform eattype restaurant"; the decoder, attending over the encoder states, generates "<GO> X is a restaurant . <STOP>"]
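The attention step at the core of this architecture can be sketched in plain Python. The toy vectors and the dot-product scoring below are stand-ins for learned LSTM hidden states and a trained attention network; only the mechanics (score, softmax, weighted sum) carry over:

```python
# Attention sketch: score each encoder hidden state against the current
# decoder state, softmax the scores, and take the weighted sum as the
# context vector fed to the decoder. Toy 2-d states, dot-product scoring.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(encoder_states, decoder_state):
    """Weigh encoder hidden states by similarity to the decoder state."""
    weights = softmax([dot(h, decoder_state) for h in encoder_states])
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Toy encoder states for DA tokens:
# inform, name, X-name, inform, eattype, restaurant
enc = [[0.1, 0.0], [0.9, 0.2], [0.8, 0.1], [0.1, 0.0], [0.2, 0.9], [0.1, 1.0]]
weights, context = attend(enc, decoder_state=[0.0, 1.0])

# This decoder state "asks about" the second dimension, so attention
# peaks on the eattype/restaurant states.
print(max(range(len(weights)), key=weights.__getitem__))  # 5
```

Greedy generation then just takes the argmax token at every decoder step; beam search instead keeps the several best partial hypotheses.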

Basic Sequence-to-Sequence NLG / System Architecture: Reranker (slide 8/20)

• the generator may not cover the input DA perfectly
  • missing / superfluous information
• we would like to penalize such cases
• check whether the output conforms to the input DA + rerank
• NN with an LSTM encoder + sigmoid classification layer
• 1-hot DA representation
• penalty = Hamming distance from the input DA (on the 1-hot vectors)

[Diagram: the candidate "X is a restaurant ." is read by the LSTM encoder and classified into a 1-hot DA vector (1 1 0 1 0 0); compared against the input DA inform(name=X-name, eattype=bar, area=citycentre) encoded as (1 0 1 1 1 0), three positions mismatch (✓ ✓ ✗ ✓ ✗ ✗), giving penalty = 3]
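The reranking penalty itself is simple once the classifier's output is in hand. A sketch, with the 1-hot vectors taken from the slide's example; the LSTM classifier is omitted and the second candidate's sentence and vector are invented for illustration:

```python
# Reranker penalty sketch: the input DA and the classifier's prediction for
# each candidate are 1-hot vectors over DA slot-value positions; the penalty
# is their Hamming distance. The LSTM + sigmoid classifier is not shown.

def hamming_penalty(input_da_vec, predicted_da_vec):
    """Count positions where the candidate's predicted DA differs from the input DA."""
    return sum(a != b for a, b in zip(input_da_vec, predicted_da_vec))

input_da = [1, 0, 1, 1, 1, 0]   # 1-hot input DA, values as in the slide example
candidates = [
    # (candidate sentence, thresholded classifier output)
    ("X is a restaurant .", [1, 1, 0, 1, 0, 0]),
    ("X is a bar in the city centre .", [1, 0, 1, 1, 1, 0]),  # invented match
]

# Rerank the n-best list: prefer candidates whose predicted DA matches the input
ranked = sorted(candidates, key=lambda c: hamming_penalty(input_da, c[1]))
print(hamming_penalty(input_da, candidates[0][1]))  # 3, matching the slide
print(ranked[0][0])  # the zero-penalty candidate wins
```

In the full system this penalty is combined with the generator's own score when reordering the beam-search n-best list.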

Basic Sequence-to-Sequence NLG / Experiments: Experiments on the BAGEL Set (slide 9/20)

• BAGEL dataset: 202 DAs / 404 sentences, restaurant information
  • much less data than previous seq2seq methods
  • partially delexicalized (names, phone numbers → "X")
  • manual alignment provided, but we do not use it
• 10-fold cross-validation
• automatic metrics: BLEU, NIST
• manual evaluation: semantic errors (missing/irrelevant/repeated) on 20% of the data

Basic Sequence-to-Sequence NLG / Experiments on the BAGEL Set: Results (slide 10/20)

Setup                                  | BLEU  | NIST  | ERR
---------------------------------------+-------+-------+----
our two-step:
Greedy with trees                      | 55.29 | 5.144 | 20
 + Beam search (beam size 100)         | 58.59 | 5.293 | 28
 + Reranker (beam size 5)              | 60.44 | 5.514 | 19
             (beam size 10)            | 60.77 | 5.487 | 24
             (beam size 100)           | 60.93 | 5.510 | 25
our joint:
Greedy into strings                    | 52.54 | 5.052 | 37
 + Beam search (beam size 100)         | 55.84 | 5.228 | 32
 + Reranker (beam size 5)              | 61.18 | 5.507 | 27
             (beam size 10)            | 62.40 | 5.614 | 21
             (beam size 100)           | 62.76 | 5.669 | 19
previous work:
Dušek & Jurčíček (2015)                | 59.89 | 5.231 | 30
Mairesse et al. (2010) – alignments    | ~67   | –     | 0
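The beam-size variants in the table differ only in how many partial hypotheses the decoder keeps per step. A generic beam-search sketch; the toy bigram score table stands in for the decoder's token probabilities and is purely illustrative:

```python
# Beam search sketch: keep the `beam_size` highest-scoring partial
# hypotheses at every step, expanding each by all possible next tokens.
import math

def beam_search(next_scores, start, beam_size, steps):
    """next_scores(token) -> list of (next_token, log_prob) continuations."""
    beam = [([start], 0.0)]  # (token sequence, cumulative log probability)
    for _ in range(steps):
        expanded = [(seq + [tok], score + lp)
                    for seq, score in beam
                    for tok, lp in next_scores(seq[-1])]
        expanded.sort(key=lambda h: h[1], reverse=True)
        beam = expanded[:beam_size]
    return beam  # final beam doubles as the n-best list

# Toy bigram "model" over a handful of tokens (log probabilities)
table = {
    "<GO>": [("X", math.log(0.9)), ("it", math.log(0.1))],
    "X": [("is", math.log(0.8)), ("serves", math.log(0.2))],
    "it": [("is", math.log(1.0))],
    "is": [("a", math.log(1.0))],
    "serves": [("food", math.log(1.0))],
}
nbest = beam_search(lambda t: table[t], "<GO>", beam_size=3, steps=3)
print(" ".join(nbest[0][0]))  # <GO> X is a
```

With beam size 1 this reduces to greedy decoding; larger beams give the reranker more candidates to choose from, which is why reranked beam-100 setups score highest above.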

Basic Sequence-to-Sequence NLG / Experiments on the BAGEL Set: Sample Outputs (slide 11/20)

Input DA: inform(name=X-name, type=placetoeat, eattype=restaurant, area=riverside, food=French)
Reference: X is a French restaurant on the riverside.
Greedy with trees: X is a restaurant providing french and continental and by the river.
 + Beam search: X is a restaurant that serves french takeaway. [riverside]
 + Reranker: X is a french restaurant in the riverside area.
Greedy into strings: X is a restaurant in the riverside that serves italian food. [French]
 + Beam search: X is a restaurant in the riverside that serves italian food. [French]
 + Reranker: X is a restaurant in the riverside area that serves french food.

Entrainment-enabled NLG / Introduction: Adding Entrainment to Trainable NLG (slide 12/20)

• Aim: condition generation on the preceding context
• Problem: data sparsity
• Solution: limit the context to just the preceding user utterance
  • likely to have the strongest entrainment impact
• Need for context-aware training data: we collected a new set
  • input DA
  • natural language sentence(s)
  • preceding user utterance (NEW)

Example:
User: I'm headed to Rector Street
DA: inform(from_stop="Fulton Street", vehicle=bus, direction="Rector Street", departure_time=9:13pm, line=M21)
Baseline output: Go by the 9:13pm bus on the M21 line from Fulton Street directly to Rector Street.
Context-aware output: Heading to Rector Street from Fulton Street, take a bus line M21 at 9:13pm.

Entrainment-enabled NLG / Collecting the Set (via CrowdFlower) (slide 13/20)

1. Get natural user utterances in calls to a live dialogue system
  • record calls to the live Alex SDS, using a simple rule-based bigram policy
  • manual transcription + reparsing using the Alex SLU
  • task descriptions use varying synonyms, e.g.:
    • "You want a connection – your departure stop is Marble Hill, and you want to go to Roosevelt Island. Ask how long the journey will take. Ask about a schedule afterwards. Then modify your query: ask for a ride at six o'clock in the evening. Ask for a connection by bus. Do as if you changed your mind: say that your destination stop is City Hall."
    • "You are searching for transit options leaving from Houston Street with the destination of Marble Hill. When you are offered a schedule, ask about the time of arrival at your destination. Then ask for a connection after that. Modify your query: request information about an alternative at six p.m. and state that you prefer to go by bus."
    • "Tell the system that you want to travel from Park Place to Inwood. When you are offered a trip, ask about the time needed. Then ask for another alternative. Change your search: ask about a ride at 6 o'clock p.m. and tell the system that you would rather use the bus."
2. Generate possible response DAs for the user utterances
3. Collect natural language paraphrases for the response DAs
  • interface designed to support entrainment: context at hand, minimal slot description, short instructions
  • checks: contents + spelling, automatic + manual
  • ca. 20% overhead (repeated job submission)

50–53. Context in our Seq2seq Generator (1) (14/20)
• Two direct context-aware extensions:
  a) prepending context – the preceding user utterance is prepended to the DA and fed into the encoder
  b) context encoder – a separate encoder reads the preceding user utterance; its hidden states are concatenated with the DA encoder's
[diagram: encoder–decoder LSTMs with attention; context "is there a later option" + DA "iconfirm alternative next" → output "You want a later option . <STOP>"]
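The two extensions differ only in where the context enters the network. A minimal sketch of the input-side plumbing (plain Python/NumPy, not TGen's actual code; the `<DA>` separator token and the axis along which states are stacked are assumptions):

```python
import numpy as np

def prepend_context(context_tokens, da_tokens):
    """Variant (a): feed the preceding user utterance and the DA
    to the encoder as one token sequence (separator token assumed)."""
    return context_tokens + ["<DA>"] + da_tokens

def concat_context_states(context_states, da_states):
    """Variant (b): run a separate context encoder, then stack both
    encoders' state sequences so the decoder's attention can see
    context and DA positions alike (exact wiring is a sketch)."""
    # context_states: (ctx_len, hidden), da_states: (da_len, hidden)
    return np.concatenate([context_states, da_states], axis=0)

# toy example
ctx = ["is", "there", "a", "later", "option"]
da = ["iconfirm", "alternative", "next"]
print(prepend_context(ctx, da))

h_ctx = np.ones((5, 4))   # 5 context tokens, hidden size 4
h_da = np.zeros((3, 4))   # 3 DA tokens
print(concat_context_states(h_ctx, h_da).shape)  # (8, 4)
```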

54–56. Context in our Seq2seq Generator (2) (15/20)
• One (more) reranker: n-gram match
  • promoting outputs that have a word or phrase overlap with the context utterance
• Example beam (context: "is there a later time", DA: inform_no_match(alternative=next)), with log-probabilities:
  -2.914  No route found later, sorry.
  -3.544  The next connection is not found.
  -3.690  I'm sorry, I cannot find a later ride.
  -3.836  I cannot find the next one, sorry.
  -4.003  I'm sorry, a later connection was not found.
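The n-gram match reranker can be sketched as rescoring each beam hypothesis with a bonus proportional to its n-gram overlap with the context utterance; the weight and the n-gram range below are illustrative assumptions, not the system's tuned values:

```python
def ngram_overlap(hyp, context, max_n=2):
    """Count n-grams (n = 1..max_n) shared between a hypothesis
    and the context utterance (both as token lists)."""
    score = 0
    for n in range(1, max_n + 1):
        hyp_ngrams = {tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1)}
        ctx_ngrams = {tuple(context[i:i + n]) for i in range(len(context) - n + 1)}
        score += len(hyp_ngrams & ctx_ngrams)
    return score

def rerank(beam, context, weight=0.1):
    """beam: list of (log_prob, tokens) pairs; boost hypotheses that
    reuse the user's wording, then sort best-first."""
    return sorted(beam,
                  key=lambda h: h[0] + weight * ngram_overlap(h[1], context),
                  reverse=True)

context = "is there a later time".split()
beam = [(-2.914, "no route found later , sorry .".split()),
        (-3.544, "the next connection is not found .".split())]
best = rerank(beam, context)
```

Here both hypotheses share one unigram with the context ("later" vs. "is"), so the original ranking survives; a hypothesis echoing a whole phrase from the user would be promoted past higher-probability competitors.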

57–60. Experiments (16/20)
• Dataset: public transport information
  • 5.5k paraphrases for 1.8k DA–context combinations
  • delexicalized
• Automatic evaluation results:

  Setup                          BLEU   NIST
  Baseline (context not used)    66.41  7.037
  + n-gram match reranker        68.68  7.577
  Prepending context             63.87  6.456
  + n-gram match reranker        69.17  7.596
  Context encoder                63.08  6.818
  + n-gram match reranker        69.26  7.772

• Human pairwise preference ranking (crowdsourced)
  • baseline × prepending context + n-gram match reranker
  • context-aware setup preferred in 52.5% of cases (significant)
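"Delexicalized" means slot values are replaced with placeholders before training and substituted back after generation, so the generator never has to learn individual stop or line names. A toy sketch (the placeholder naming scheme is an assumption):

```python
def delexicalize(text, slots):
    """Replace slot values with placeholders before training."""
    for slot, value in slots.items():
        text = text.replace(value, "X-" + slot)
    return text

def relexicalize(text, slots):
    """Substitute the actual values back into a generated sentence."""
    for slot, value in slots.items():
        text = text.replace("X-" + slot, value)
    return text

slots = {"line": "M15", "from_stop": "Bowling Green"}
d = delexicalize("Take the M15 from Bowling Green.", slots)
# -> "Take the X-line from X-from_stop."
```

This is also why "lexicalized generation" appears later as future work: producing the surface values directly would remove the substitution step but requires far more training data.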

61–62. Output Examples (17/20)

  Context:   is there a later option
  Input DA:  iconfirm(alternative=next)
  Baseline:                                  Next connection.
  n-gram match reranker:                     You want a later connection.
  Prepending context + n-gram match reranker: You want a later connection.
  Context encoder + n-gram match reranker:   You want a later option.

  Context:   i need to find a bus connection
  Input DA:  inform_no_match(vehicle=bus)
  Baseline:                                  No bus found, sorry.
  n-gram match reranker:                     I did not find a bus route.
  Prepending context + n-gram match reranker: I'm sorry, I cannot find a bus connection.
  Context encoder + n-gram match reranker:   I'm sorry, I cannot find a bus connection.

63. Output Examples (18/20)

  Context:   i rather take the bus
  Input DA:  inform(vehicle=bus, departure_time=8:01am, direction=Cathedral Parkway, from_stop=Bowling Green, line=M15)
  Baseline:                                  At 8:01am by bus line M15 from Bowling Green to Cathedral Parkway.
  n-gram match reranker:                     At 8:01am by bus line M15 from Bowling Green to Cathedral Parkway.
  Prepending context + n-gram match reranker: At 8:01am by bus line M15 from Bowling Green to Cathedral Parkway.
  Context encoder + n-gram match reranker:   You can take the M15 bus from Bowling Green to Cathedral Parkway at 8:01am.

64–70. Conclusion (19/20)
Our System…
✓ works with unaligned data
  • better than our previous work on the BAGEL set
✓ produces valid outputs even with limited training data
✓ allows comparing 2-step & joint NLG
  • generates sentences / trees
✓ is the 1st trainable NLG system capable of entrainment
  • entrainment better than baseline
Future Ideas
• Lexicalized generation
• Longer context + better n-gram matching
• Integrate into an end-to-end SDS

71. Thank You for Your Attention (20/20)
• Contact: Ondřej Dušek (o.dusek@hw.ac.uk, EM 1.56)
• Code: bit.ly/tgen_nlg
• Dataset: bit.ly/nlgdata

72–77. Two-Step and Joint NLG Setups (backup slide 1/6)
• NLG pipeline traditionally divided into:
  1. sentence planning – decide on the overall sentence structure
  2. surface realization – decide on specific word forms, linearize
• some NLG systems join this into a single step
• the two-step setup simplifies structure generation by abstracting away from surface grammar
• the joint setup avoids error accumulation over a pipeline
• we can do both in one system
[diagram, two-step: MR inform(name=X-name, type=placetoeat, eattype=restaurant, area=riverside, food=Italian) → sentence planning → sentence plan (t-tree, zone=en: be v:fin with children X-name n:subj and restaurant n:obj; restaurant with children Italian adj:attr and river n:near+X) → surface realization → "X is an Italian restaurant near the river."; joint: MR → joint NLG → surface text directly]

78–83. System Workflow (backup slide 2/6)
• main generator based on sequence-to-sequence NNs: encoder–decoder with attention + beam search + reranker
• input: tokenized DAs
• output:
  • 2-step mode: deep syntax trees in bracketed notation, post-processed by a surface realizer, e.g.
    ( <root> <root> ( ( X-name n:subj ) be v:fin ( ( Italian adj:attr ) restaurant n:obj ( river n:near+X ) ) ) )
  • joint mode: sentences
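The bracketed notation can be read back into a node structure with a small recursive parser. This sketch infers the format from the example above (each node is written as left children, lemma, formeme, right children) and is not TGen's own reader; it also keeps children in one ordered list rather than distinguishing left from right of the head:

```python
def parse_ttree(s):
    """Parse a bracketed deep-syntax tree into nested dicts."""
    tokens = s.split()
    pos = 0

    def parse_node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        children, lemma, formeme = [], None, None
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())   # subtree consumes its tokens
            elif lemma is None:
                lemma = tokens[pos]; pos += 1   # first bare token = lemma
            else:
                formeme = tokens[pos]; pos += 1  # second bare token = formeme
        pos += 1  # skip ')'
        return {"lemma": lemma, "formeme": formeme, "children": children}

    return parse_node()

tree = parse_ttree("( <root> <root> ( ( X-name n:subj ) be v:fin "
                   "( ( Italian adj:attr ) restaurant n:obj "
                   "( river n:near+X ) ) ) )")
# tree["children"][0] is the "be v:fin" node with children X-name and restaurant
```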

84. Sample Outputs (backup slide 3/6)

  Input DA:  inform(name=X-name, type=placetoeat, eattype=restaurant, area=citycentre, near=X-near, food="Chinese takeaway", food=Japanese)
  Reference: X is a Chinese takeaway and Japanese restaurant in the city centre near X.
  Greedy with trees:   X is a restaurant offering chinese takeaway in the centre of town near X. [Japanese]
  + Beam search:       X is a restaurant that serves fusion chinese takeaway in the riverside area near X. [Japanese, citycentre]
  + Reranker:          X is a japanese restaurant in the city centre near X providing chinese food. [takeaway]
  Greedy into strings: X is a restaurant offering italian and indian takeaway in the city centre area near X. [Japanese, Chinese]
  + Beam search:       X is a restaurant and japanese food and chinese takeaway.
  + Reranker:          X is a restaurant serving japanese food in the centre of the city that offers chinese takeaway.
  (bracketed slots are missing from the output)
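The greedy vs. beam-search contrast above is the standard decoding trade-off: greedy commits to the single most probable token at each step, while beam search keeps several partial hypotheses alive. A generic sketch over a toy next-token distribution (the scorer `toy_lm` is hypothetical, not the system's decoder; greedy decoding corresponds to `beam_size=1`):

```python
import math

def beam_search(step_fn, start, beam_size=3, max_len=5, eos="<STOP>"):
    """Keep the beam_size highest log-probability partial sequences.
    step_fn(seq) -> list of (token, prob) continuations."""
    beam = [(0.0, [start])]
    for _ in range(max_len):
        candidates = []
        for logp, seq in beam:
            if seq[-1] == eos:          # finished hypotheses carry over
                candidates.append((logp, seq))
                continue
            for tok, p in step_fn(seq):
                candidates.append((logp + math.log(p), seq + [tok]))
        beam = sorted(candidates, reverse=True)[:beam_size]
        if all(seq[-1] == eos for _, seq in beam):
            break
    return beam

def toy_lm(seq):
    # hypothetical next-token table standing in for the decoder softmax
    table = {"<GO>": [("no", 0.6), ("sorry", 0.4)],
             "no": [("route", 1.0)],
             "sorry": [("<STOP>", 1.0)],
             "route": [("<STOP>", 1.0)]}
    return table[seq[-1]]

best = beam_search(toy_lm, "<GO>", beam_size=2)
# best[0][1] == ["<GO>", "no", "route", "<STOP>"]
```

Beam search alone widens the search but still optimizes only likelihood, which is why the slide's reranker step is needed to penalize missing or hallucinated slots.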
