Building Adaptable and Scalable Natural Language Generation Systems
Yannis Konstas

Natural Language Generation is everywhere (Machine Translation)


  1. Sequence-to-sequence model. An Encoder reads the input (a meaning-representation graph such as know :ARG0 I :ARG1 (planet :ARG1-of inhabit)), a Decoder emits the output sentence ("I know the planet ..."), and Attention connects the two. Starting from <s>, decoding picks the most probable sentence word by word:

  ŵ = argmax_w Π_i p(w_i | w_<i, h^(s))

  2-3. Linearization: graph → depth-first search. The AMR graph for "US officials held an expert group meeting in January 2002 in New York." (hold, with :ARG0, :ARG1, :time, and :location children) is turned into a token sequence by a depth-first traversal:

  hold :ARG0 (person :ARG0-of (have-role :ARG1 United_States :ARG2 official)) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group)) :time (date-entity 2002 1) :location New_York
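The depth-first linearization can be sketched as follows. This is a minimal illustration, assuming a graph stored as nested (concept, [(role, child), ...]) tuples; the tuple representation and the :year/:month roles on date-entity are my choices for the example, not the presentation's data structures.

```python
def linearize(node):
    """Depth-first traversal: emit the concept, then each role followed by
    its subtree, parenthesizing subtrees that are more than one token."""
    concept, edges = node
    tokens = [concept]
    for role, child in edges:
        tokens.append(role)
        child_tokens = linearize(child)
        if len(child_tokens) > 1:          # complex subtree -> parentheses
            tokens += ["("] + child_tokens + [")"]
        else:
            tokens += child_tokens
    return tokens

# The "hold" graph from the slide, in the assumed tuple format.
graph = ("hold", [
    (":ARG0", ("person", [
        (":ARG0-of", ("have-role", [
            (":ARG1", ("United_States", [])),
            (":ARG2", ("official", []))]))])),
    (":ARG1", ("meet", [
        (":ARG0", ("person", [
            (":ARG1-of", ("expert", [])),
            (":ARG2-of", ("group", []))]))])),
    (":time", ("date-entity", [(":year", ("2002", [])),
                               (":month", ("1", []))])),
    (":location", ("New_York", [])),
])

print(" ".join(linearize(graph)))
```

The traversal visits children in the order they are stored, so the same graph always yields the same token sequence.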

  4-10. Encoding: linearization → RNN encoding. The linearized graph

  hold :ARG0 (person :ARG0-of (have-role :ARG1 United_States :ARG2 official)) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group)) :time (date-entity 2002 1) :location New_York

  is encoded in three steps:
  - Token embeddings: each token (hold, ARG0, (, person, ARG0-of, ...) is mapped to a vector.
  - Recurrent Neural Network (RNN): the embeddings are read left to right, producing hidden states h_1^(s) ... h_5^(s) ...
  - Bi-directional RNN: a right-to-left pass is added, and the forward and backward hidden states at each position are concatenated.
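The three encoding steps above can be sketched in a few lines. This is a toy bi-directional vanilla (tanh) RNN with invented dimensions and random weights, purely to show the data flow; the presentation's actual architecture and sizes are not specified here.

```python
import math
import random

random.seed(0)
tokens = ["hold", ":ARG0", "(", "person", ":ARG0-of"]
d_emb, d_hid = 4, 3
# Token embeddings (step 1): one random vector per token type.
emb = {t: [random.uniform(-1, 1) for _ in range(d_emb)] for t in tokens}
W = [[random.uniform(-0.5, 0.5) for _ in range(d_emb)] for _ in range(d_hid)]
U = [[random.uniform(-0.5, 0.5) for _ in range(d_hid)] for _ in range(d_hid)]

def step(x, h):
    """Vanilla RNN cell: h_t = tanh(W x_t + U h_{t-1})."""
    return [math.tanh(sum(w * xi for w, xi in zip(W[k], x)) +
                      sum(u * hi for u, hi in zip(U[k], h)))
            for k in range(d_hid)]

def run(seq):
    """One directional pass (step 2), returning all hidden states."""
    h, states = [0.0] * d_hid, []
    for tok in seq:
        h = step(emb[tok], h)
        states.append(h)
    return states

# Bi-directional pass (step 3): concatenate forward and backward states.
fwd = run(tokens)                 # left-to-right
bwd = run(tokens[::-1])[::-1]     # right-to-left, re-aligned to positions
h_s = [f + b for f, b in zip(fwd, bwd)]   # h_i^(s), size 2 * d_hid
print(len(h_s), len(h_s[0]))      # 5 6
```

Each h_i^(s) now summarizes the token's left and right context, which is what the attention mechanism later attends over.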

  11-16. Decoding: RNN encoding → RNN decoding with beam search. The decoder RNN is initialized from the encoder states (init h^(s), over h_1 ... h_N^(s)) and starts from the empty prefix ∅. At each step a softmax scores candidate next words under p(w_i | w_<i, h^(s)), and beam search keeps only the highest-scoring partial hypotheses, e.g.:

  step 1: w_11 Holding, w_12 Helds, w_13 Hold, w_14 US, ...
  step 2: w_21 Hold a, w_22 Hold the, w_23 Held a, w_24 Held the, ...
  step k: w_k1 The US officials held, w_k2 US officials held a, w_k3 US officials hold the, w_k4 US officials will hold a, ...
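The beam-search loop can be sketched over a toy next-word table. The probabilities below are invented stand-ins for the decoder softmax p(w_i | w_<i, h^(s)); only the control flow (expand every live hypothesis, keep the k best) reflects the slides.

```python
import math

# Toy conditional distribution: previous word -> {next word: probability}.
NEXT = {
    "<s>": {"US": 0.5, "Holding": 0.3, "Held": 0.2},
    "US": {"officials": 0.9, "meeting": 0.1},
    "officials": {"held": 0.8, "hold": 0.2},
    "held": {"a": 0.6, "the": 0.4},
    "Holding": {"a": 0.5, "the": 0.5},
    "Held": {"a": 0.5, "the": 0.5},
    "a": {"meeting": 1.0},
    "the": {"meeting": 1.0},
    "meeting": {"in": 1.0},
}

def beam_search(k=2, steps=4):
    beams = [(0.0, ["<s>"])]                       # (log-prob, prefix)
    for _ in range(steps):
        candidates = []
        for score, prefix in beams:
            for word, p in NEXT.get(prefix[-1], {}).items():
                candidates.append((score + math.log(p), prefix + [word]))
        if not candidates:
            break
        beams = sorted(candidates, reverse=True)[:k]   # keep the k best
    return beams

best_score, best = beam_search()[0]
print(best)   # ['<s>', 'US', 'officials', 'held', 'a']
```

With k=1 this reduces to greedy decoding; a larger k lets a prefix that scores slightly worse early on (here "Holding ...") survive long enough to be compared fairly.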

  17-23. Attention. When predicting w_2 = "held" (choosing among "a", "the", "meeting", ...), the decoder state h_3 is paired with a context vector c_3 computed over the encoder states h_1^(s) ... h_5^(s) (hold, ARG0, (, person, ARG0-of):

  a_i = softmax(f(h^(s), h_i))
  c_i = Σ_j a_ij h_j^(s)

  The attention weights align each output word of "US officials held an expert group meeting in January 2002" with tokens of the input hold ARG0 (person role US official) ARG1 (meet expert group).
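The two attention equations can be computed directly. A sketch in plain Python, using dot-product scoring as a stand-in for the slide's scoring function f, with small illustrative vectors:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

# Encoder states h_1^(s)..h_5^(s) for: hold, ARG0, (, person, ARG0-of.
H_s = [[0.9, 0.1], [0.2, 0.4], [0.0, 0.1], [0.5, 0.8], [0.1, 0.3]]
h_i = [1.0, 0.5]                     # decoder state h_3 when emitting "held"

scores = [dot(h_j, h_i) for h_j in H_s]   # f(h^(s), h_i) as dot products
a_i = softmax(scores)                     # a_i = softmax(f(h^(s), h_i))
c_i = [sum(a * h_j[d] for a, h_j in zip(a_i, H_s))
       for d in range(len(h_i))]          # c_i = sum_j a_ij h_j^(s)
print(a_i, c_i)
```

With these vectors the largest weight falls on the "hold" state, mirroring the slide's alignment of the output word "held" with the input concept hold.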

  24-28. Pre-processing: linearization → anonymization. Named entities, dates, and numbers are replaced with typed placeholders in both the graph and the sentence: "United States" → loc_0, "New York" → loc_1, 2002 → year_0, 1 → month_0.

  Graph: hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official)) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group)) :time (date-entity year_0 month_0) :location loc_1
  Original: US officials held an expert group meeting in January 2002 in New York .
  Anonymized: loc_0 officials held an expert group meeting in month_0 year_0 in loc_1 .
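Anonymization and its inverse can be sketched as simple string substitution with a saved mapping. Here the entity list is supplied by hand; in the full system the entities would come from the graph's name and date-entity nodes.

```python
def anonymize(sentence, entities):
    """entities: list of (surface form, placeholder) pairs.
    Returns the anonymized sentence plus the map needed to undo it."""
    mapping = {}
    for surface, placeholder in entities:
        sentence = sentence.replace(surface, placeholder)
        mapping[placeholder] = surface
    return sentence, mapping

def deanonymize(sentence, mapping):
    """Restore surface forms after generation."""
    for placeholder, surface in mapping.items():
        sentence = sentence.replace(placeholder, surface)
    return sentence

sent = "US officials held an expert group meeting in January 2002 in New York ."
ents = [("US", "loc_0"), ("January", "month_0"),
        ("2002", "year_0"), ("New York", "loc_1")]
anon, m = anonymize(sent, ents)
print(anon)
# loc_0 officials held an expert group meeting in month_0 year_0 in loc_1 .
```

Because the model only ever sees loc_0, year_0, and so on, rare names and dates no longer inflate the vocabulary, which directly attacks the sparsity problem discussed later.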

  29. Experimental Setup
  AMR LDC2015E86 (SemEval-2016 Task 8)
  ‣ Hand-annotated MR graphs: newswire, forums
  ‣ ~16k training / 1k development / 1k test pairs
  Training
  ‣ Optimize cross-entropy loss
  Evaluation
  ‣ BLEU n-gram precision (Papineni et al., ACL 2002)
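The core of BLEU is modified n-gram precision: each candidate n-gram count is clipped by its maximum count in the reference. A minimal sketch for n=1 (full BLEU combines precisions for n=1..4 with a brevity penalty, which this omits):

```python
from collections import Counter

def modified_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate against one reference."""
    cand = [tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)]
    ref = Counter(tuple(reference[i:i + n])
                  for i in range(len(reference) - n + 1))
    counts = Counter(cand)
    clipped = sum(min(c, ref[g]) for g, c in counts.items())  # clip by ref
    return clipped / len(cand)

ref = "US officials held an expert group meeting".split()
hyp = "United States officials held held a meeting".split()
print(modified_precision(hyp, ref))   # 3/7: the repeated "held" is clipped to 1
```

Clipping is what penalizes the "held held" repetition error shown on the next slides: the second "held" earns no credit.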

  30-33. First Attempt. [Bar chart of test-set BLEU; legend order TreeToStr, TSP, PBMT, NNLG; visible bar values 26.9, 23, 22.4, 22.] All competing systems use a language model trained on a very large corpus; we will emulate this via data augmentation (Sennrich et al., ACL 2016).
  TreeToStr: Flanigan et al., NAACL 2016. TSP: Song et al., EMNLP 2016. PBMT: Pourdamaghani and Knight, INLG 2016.

  34-38. What went wrong?
  Input graph: hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official)) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group)) :time (date-entity year_0 month_0) :location loc_1
  Reference: US officials held an expert group meeting in January 2002 in New York .
  Prediction: United States officials held held a meeting in January 2002 .
  ‣ Repetition ("held held")
  ‣ Coverage ("expert group" and "New York" are dropped)
  a) Sparsity [bar chart: Total vs OOV@1 vs OOV@5 token counts, y-axis 0-18,000]
  b) Average sentence length: 20 words
  c) Limited language-modeling capacity
  (Scores shown on the slide: 74.85%, 44.26%.)

  39-41. Data Augmentation
  Original dataset: ~16k graph-sentence pairs.
  Gigaword: ~183M sentences *only* (no graphs).
  Sample sentences with vocabulary overlap. [Bar chart: % OOV@1 and OOV@5 for Original vs Giga-200k vs Giga-2M; overlap-based sampling lowers both rates.]
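Vocabulary-overlap sampling can be sketched as a simple filter: keep an external sentence only if enough of its tokens already appear in the training vocabulary. The corpus, vocabulary, and threshold below are illustrative, not the presentation's actual sampling procedure.

```python
train_vocab = {"us", "officials", "held", "an", "expert", "group",
               "meeting", "in", "new", "york", "the", "a"}

def overlap(sentence, vocab):
    """Fraction of a sentence's tokens found in the training vocabulary."""
    toks = sentence.lower().split()
    return sum(t in vocab for t in toks) / len(toks)

gigaword = [
    "Officials held a meeting in New York",
    "The quarterly earnings call exceeded analyst expectations",
]
sampled = [s for s in gigaword if overlap(s, train_vocab) >= 0.8]
print(sampled)   # only the first sentence passes
```

Filtering this way biases the augmented data toward sentences the model can actually learn from, which is what drives down the OOV@1 and OOV@5 rates in the chart.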

  42-46. Data Augmentation pipeline. Two encoder-decoder models with attention: Parse to MR (text → graph) and Generate from MR (graph → text). The parser turns unlabeled Gigaword text into (graph, text) pairs, which are then used to re-train the generator: input text → Parse to MR → Generate from MR.

  47-54. Paired Training
  Train MR parser P on the original dataset of (graph, sentence) pairs
  for i = 0 ... N:                       # self-train the parser
      S_i = sample k · 10^i sentences from Gigaword
      parse the S_i sentences with P
      re-train MR parser P on S_i
  Train generator G on the (parsed) S_N pairs
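The paired-training loop above can be sketched with stub models; only the control flow mirrors the slides. The parser P, the parse function, the sample sizes, and both corpora are stand-ins, not the presentation's system.

```python
import random

def train(name, pairs):
    # Stand-in for gradient training; returns a token for "parameters".
    return f"{name} trained on {len(pairs)} pairs"

def parse(parser, sentence):
    return f"(graph-of {sentence})"             # stub MR parse

random.seed(0)
gigaword = [f"sentence {i}" for i in range(10_000)]   # stand-in corpus
original = [("(graph)", "text")] * 16                 # stands in for ~16k pairs

P = train("P", original)                 # train MR parser on original data
k, N = 10, 3
S = []
for i in range(N + 1):                   # self-train the parser
    S = random.sample(gigaword, k * 10 ** i)      # S_i = k * 10^i sentences
    parsed = [(parse(P, s), s) for s in S]        # parse S_i with P
    P = train("P", parsed)                        # re-train P on S_i
G = train("G", [(parse(P, s), s) for s in S])     # train generator G on S_N
print(G)   # G trained on 10000 pairs
```

Growing the sample by a factor of 10 per round lets the parser improve before it is asked to label the much larger batches the generator ultimately trains on.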

  55-65. Training the MR parser and generator. Fine-tune: initialize parameters from the previous step and train on the original dataset.
  1. Train parser P on the original dataset.
  2. Sample S_1 = 200k sentences from Gigaword; parse S_1 with P; train P on S_1; fine-tune P on the original dataset.
  3. Sample S_2 = 2M sentences from Gigaword; parse S_2 with P; train P on S_2; fine-tune P on the original dataset.
  4. Sample S_3 = 2M sentences from Gigaword; parse S_3 with P; train generator G on S_3; fine-tune G on the original dataset.
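The fine-tuning step can be illustrated with a toy one-parameter model: training continues from the pre-trained parameters rather than from scratch, so the augmented data sets the starting point and the original data has the last word. The update rule and data here are invented purely to show this.

```python
def train(theta, data, lr=0.1):
    """Toy update: pull the single parameter theta toward each target."""
    for x in data:
        theta += lr * (x - theta)
    return theta

giga_targets = [2.0] * 50     # stands in for parsed-Gigaword training pairs
orig_targets = [1.0] * 50     # stands in for the original ~16k gold pairs

theta = train(0.0, giga_targets)      # pre-train on augmented data
theta = train(theta, orig_targets)    # fine-tune: continue from theta
print(round(theta, 3))
```

After fine-tuning, theta sits very close to the gold-data optimum while having passed through the augmented-data solution, which is the intended effect of the schedule on slides 60-65.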

  66-67. Final Results. [Bar chart, BLEU axis 0-35; visible baseline values 26.9, 23, 22.4, 22 for TreeToStr, TSP, PBMT, and NNLG; the augmented systems NNLG-200k, NNLG-2M, and NNLG-20M are added alongside them.]
  TreeToStr: Flanigan et al., NAACL 2016. TSP: Song et al., EMNLP 2016. PBMT: Pourdamaghani and Knight, INLG 2016.
