
Neural AMR: Sequence-to-Sequence Models for Parsing and Generation (presentation slides)

Neural AMR: Sequence-to-Sequence Models for Parsing and Generation. Ioannis Konstas, joint work with Srinivasan Iyer, Mark Yatskar, Yejin Choi, and Luke Zettlemoyer.
[Title diagram: two encoder-decoder models with attention, one generating text from an AMR graph and one parsing text into an AMR graph.]


  1-2. Linearization: Graph —> Depth-First Search (human-authored annotation)
[Diagram: AMR graph for the sentence "US officials held an expert group meeting in January 2002 in New York ."]
Linearized AMR:
hold :ARG0 (person :ARG0-of (have-role :ARG1 United_States :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity 2002 1) :location New_York
Sentence: US officials held an expert group meeting in January 2002 in New York .
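As a concrete illustration of the depth-first linearization, here is a minimal Python sketch (my own, not the authors' code) that flattens a toy AMR-like structure given as nested (concept, [(relation, child), ...]) tuples; the representation and function name are assumptions made for the example.

```python
# Minimal sketch of depth-first linearization of an AMR-like graph.
# The nested (concept, [(relation, child), ...]) representation and the
# function below are illustrative, not the paper's actual preprocessing code.

def linearize(node):
    """Depth-first traversal: concept, then ':rel ( child ... )' per edge."""
    concept, edges = node
    tokens = [concept]
    for relation, child in edges:
        tokens.append(relation)
        child_tokens = linearize(child)
        if len(child_tokens) > 1:          # wrap multi-token subgraphs in parentheses
            tokens += ["("] + child_tokens + [")"]
        else:
            tokens += child_tokens
    return tokens

# Sub-graph of the running example: "an expert group meeting"
meet = ("meet", [
    (":ARG0", ("person", [
        (":ARG1-of", ("expert", [])),
        (":ARG2-of", ("group", [])),
    ])),
])

print(" ".join(linearize(meet)))
# meet :ARG0 ( person :ARG1-of expert :ARG2-of group )
```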

  3-7. Pre-processing: Linearization —> Anonymization
[Diagram: the same AMR graph with named entities and date values replaced by indexed placeholders.]
Anonymized linearization:
hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity year_0 month_0) :location loc_1
Anonymized sentence: loc_0 officials held an expert group meeting in month_0 year_0 in loc_1 .
Original sentence: US officials held an expert group meeting in January 2002 in New York .
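A rough sketch of what the anonymization step could look like: entity and date spans are swapped for indexed placeholders (loc_0, month_0, year_0, ...) and the mapping is kept so placeholders can be restored after decoding. The hand-written entity list below stands in for an automatic recognizer; a real implementation would also align entities between the graph and the sentence.

```python
# Illustrative anonymization sketch: replace entity/date spans with indexed
# placeholders and remember the mapping for later de-anonymization.
# The hand-written entity list stands in for an automatic recognizer/aligner.

def anonymize(sentence, entities):
    """entities: list of (surface phrase, placeholder prefix) pairs."""
    mapping, counters = {}, {}
    for phrase, prefix in entities:
        if phrase in sentence:
            idx = counters.get(prefix, 0)
            placeholder = f"{prefix}_{idx}"
            counters[prefix] = idx + 1
            mapping[placeholder] = phrase
            sentence = sentence.replace(phrase, placeholder)
    return sentence, mapping

def deanonymize(sentence, mapping):
    for placeholder, phrase in mapping.items():
        sentence = sentence.replace(placeholder, phrase)
    return sentence

sent = "US officials held an expert group meeting in January 2002 in New York ."
entities = [("US", "loc"), ("New York", "loc"), ("January", "month"), ("2002", "year")]

anon, mapping = anonymize(sent, entities)
print(anon)   # loc_0 officials held an expert group meeting in month_0 year_0 in loc_1 .
print(deanonymize(anon, mapping))
```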

  8. Experimental Setup
Data: AMR LDC2015E86 (SemEval-2016 Task 8)
‣ Hand-annotated AMR graphs: newswire, forums
‣ ~16k training / 1k development / 1k test pairs
Training
‣ Optimize cross-entropy loss
Evaluation
‣ BLEU n-gram precision (Generation) (Papineni et al., 2002)
‣ SMATCH score (Parsing) (Cai and Knight, 2013)
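For reference, a hedged example of the generation metric: corpus-level BLEU computed with NLTK on tokenized sentences (the evaluation scripts actually used may differ); SMATCH for parsing is normally computed with the standalone smatch tool (Cai and Knight, 2013).

```python
# Corpus-level BLEU with NLTK, as an illustration of the generation metric.
# This is a sketch; the tokenization and scripts used in the paper may differ.
from nltk.translate.bleu_score import corpus_bleu

references = [  # one list of reference token lists per test sentence
    [["US", "officials", "held", "an", "expert", "group", "meeting",
      "in", "January", "2002", "in", "New", "York", "."]],
]
hypotheses = [  # one hypothesis token list per test sentence
    ["United", "States", "officials", "held", "a", "meeting",
     "in", "January", "2002", "."],
]

print(f"BLEU: {corpus_bleu(references, hypotheses):.4f}")
```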

  9. Experiments
‣ Vanilla experiment
‣ Limited language model capacity
‣ Paired Training
‣ Data augmentation algorithm

  10-13. First Attempt (Generation)
BLEU on the test set:
TreeToStr (Flanigan et al., NAACL 2016): 23.0
TSP (Song et al., EMNLP 2016): 22.4
PBMT (Pourdamghani and Knight, INLG 2016): 26.9
NeuralAMR (this work, vanilla seq2seq): 22.0
All competing systems use a language model trained on a very large corpus. We will emulate this via data augmentation (Sennrich et al., ACL 2016).

  14-18. What went wrong?
Input AMR (anonymized):
hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity year_0 month_0) :location loc_1
Reference: US officials held an expert group meeting in January 2002 in New York .
Prediction: United States officials held held a meeting in January 2002 .
Error types:
‣ Repetition
‣ Coverage
a) Sparsity [chart: token counts for Total, OOV@1, OOV@5]
b) Average sentence length: 20 words
c) Limited language modeling capacity
[Unlabeled figure values: 74.85%, 44.26%]
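To make the sparsity point concrete, here is a small sketch (mine, not the paper's) that counts rare token types in the training sentences; I am reading OOV@k as "token types occurring at most k times", which is an assumption about the chart rather than something stated on the slide.

```python
# Sketch: measure vocabulary sparsity in the training sentences.
# Assumption (not stated on the slide): OOV@k counts token types that
# occur at most k times in the training data.
from collections import Counter

def sparsity_stats(sentences, thresholds=(1, 5)):
    counts = Counter(tok for sent in sentences for tok in sent.split())
    stats = {"total types": len(counts)}
    for k in thresholds:
        stats[f"types with count <= {k}"] = sum(1 for c in counts.values() if c <= k)
    return stats

train_sentences = [
    "US officials held an expert group meeting in January 2002 in New York .",
    "officials held a meeting in New York .",
]
print(sparsity_stats(train_sentences))
```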

  19-21. Data Augmentation
‣ Original dataset: ~16k graph-sentence pairs
‣ Gigaword: ~183M sentences, text *only* (no AMR graphs)
‣ Sample sentences with vocabulary overlap
[Chart: OOV@1 and OOV@5 rates (%) for the Original data vs. Giga-200k, Giga-2M, and Giga-20M samples]
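One plausible reading of "sample sentences with vocabulary overlap" is to keep only external sentences whose tokens are largely covered by the AMR training vocabulary, so the added data does not inflate the OOV rate; the coverage threshold and helper below are my assumptions, not the paper's exact selection criterion.

```python
# Sketch: keep Gigaword-like sentences whose vocabulary overlaps the AMR
# training vocabulary. The coverage threshold is an assumption, not the
# paper's exact sampling criterion.

def sample_by_overlap(external_sentences, train_vocab, min_coverage=0.9):
    selected = []
    for sent in external_sentences:
        tokens = sent.split()
        covered = sum(tok in train_vocab for tok in tokens)
        if tokens and covered / len(tokens) >= min_coverage:
            selected.append(sent)
    return selected

train_vocab = {"officials", "held", "a", "meeting", "in", "New", "York", "."}
gigaword = [
    "officials held a meeting in New York .",
    "the stock market rallied sharply on Tuesday .",
]
print(sample_by_overlap(gigaword, train_vocab))   # keeps only the first sentence
```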

  22-25. Data Augmentation
[Diagram: two attention-based encoder-decoder models. "Parse to AMR": text -> encoder -> attention -> decoder -> graph. "Generate from AMR": graph -> encoder -> attention -> decoder -> text. External text is parsed into graphs, and the resulting pairs are used to re-train the models.]

  26. Semi-supervised Learning
‣ Self-training: McClosky et al., 2006
‣ Co-training: Yarowsky, 1995; Blum and Mitchell, 1998; Sarkar, 2001; Søgaard and Rishøj, 2010

  27-34. Paired Training
Train AMR parser P on the original dataset (graph-sentence pairs)
for i = 0 … N:   # self-train the parser
    S_i = sample k · 10^i sentences from Gigaword
    parse the S_i sentences with P
    re-train AMR parser P on S_i
Train generator G on S_N
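Written out as code, the loop above might look like the Python sketch below; train(), parse(), and sample_gigaword() are hypothetical helpers standing in for the real seq2seq training, decoding, and sampling machinery. With k = 200k and N = 2, the samples are the 200k / 2M / 20M sets used on the following slides.

```python
# Sketch of the Paired Training loop from the slide. The helpers train(),
# parse(), and sample_gigaword() are placeholders for the actual seq2seq
# training, decoding, and Gigaword sampling code.

def paired_training(original_pairs, train, parse, sample_gigaword,
                    k=200_000, N=2):
    # Train the AMR parser P on the human-annotated graph-sentence pairs.
    parser = train(model="parser", data=original_pairs)

    # Self-train the parser on progressively larger Gigaword samples.
    silver_pairs = original_pairs
    for i in range(N + 1):                           # i = 0 ... N
        sentences = sample_gigaword(k * 10 ** i)     # 200k, 2M, 20M, ...
        silver_pairs = [(parse(parser, s), s) for s in sentences]
        parser = train(model="parser", data=silver_pairs)

    # Train the generator G on the final silver dataset S_N.
    generator = train(model="generator", data=silver_pairs)
    return parser, generator
```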

  35-45. Training the AMR Parser
Fine-tune: initialize parameters from the previous step and train on the original dataset.
‣ Train P on the original dataset
‣ Sample S_1 = 200k sentences from Gigaword; parse S_1 with P; train P on S_1; fine-tune P on the original dataset
‣ Sample S_2 = 2M sentences from Gigaword; parse S_2 with P; train P on S_2; fine-tune P on the original dataset
‣ Sample S_3 = 20M sentences from Gigaword; parse S_3 with P; train P on S_3; fine-tune P on the original dataset

  46-48. Training the AMR Generator
Fine-tune: initialize parameters from the previous step and train on the original dataset.
‣ Sample S_4 = 20M sentences from Gigaword; parse S_4 with P; train G on S_4; fine-tune G on the original dataset
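The fine-tune step ("initialize parameters from the previous step and train on the original dataset") could look roughly like the PyTorch sketch below; the model class, checkpoint handling, optimizer settings, and loss shape are illustrative assumptions, not details taken from the paper.

```python
# Sketch of the fine-tune step: load the parameters learned on the silver
# Gigaword data, then continue training on the original gold dataset.
# Model, data loader, and hyperparameters are illustrative assumptions.
import torch

def fine_tune(model, gold_loader, checkpoint_path, epochs=3, lr=1e-4):
    # Initialize parameters from the previous training stage.
    model.load_state_dict(torch.load(checkpoint_path))

    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        for src, tgt in gold_loader:       # gold (linearized AMR, sentence) batches
            optimizer.zero_grad()
            logits = model(src)            # assumed shape: (tokens, vocab)
            loss = loss_fn(logits, tgt)    # cross-entropy loss, as on slide 8
            loss.backward()
            optimizer.step()
    return model
```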

  49-55. Final Results (Generation)
BLEU on the test set:
TreeToStr (Flanigan et al., NAACL 2016): 23.0
TSP (Song et al., EMNLP 2016): 22.4
PBMT (Pourdamghani and Knight, INLG 2016): 26.9
NeuralAMR (no augmentation): 22.0
NeuralAMR-200k: 27.4
NeuralAMR-2M: 32.3
NeuralAMR-20M: 33.8

  56-60. Final Results (Parsing)
SMATCH on the test set:
SBMT (Pust et al., 2015): 67.1
CharLSTM+CAMR (van Noord and Bos, 2017): 67.3
Seq2Seq (Peng et al., 2017): 52.0
NeuralAMR-20M: 62.1

  61-62. How did we do? (Generation)
Input AMR (anonymized):
hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity year_0 month_0) :location loc_1
Reference: US officials held an expert group meeting in January 2002 in New York .
Prediction: In January 2002 United States officials held a meeting of the group experts in New York .
A longer example:
Reference: The report stated British government must help to stabilize weak states and push for international regulations that would stop terrorists using freely available information to create and unleash new forms of biological warfare such as a modified version of the influenza virus .
Prediction: The report stated that the Britain government must help stabilize the weak states and push international regulations to stop the use of freely available information to create a form of new biological warfare such as the modified version of the influenza .
Remaining errors: Disfluency, Coverage
[Unlabeled figure values: 74.85%, 44.26%]

  63. Summary
‣ Sequence-to-sequence models for Parsing and Generation
‣ Paired Training: a scalable data augmentation algorithm
‣ Achieves state-of-the-art performance on generating from AMR
‣ Best-performing neural AMR parser
‣ Demo, code and pre-trained models: http://ikonstas.net

  64. Thank You
[Same summary bullets as slide 63, followed by the AMR graph (thank-01 :ARG1 you), i.e. "Thank you".]

  65. Bonus Slides

  66-69. Encoding: Linearize —> RNN encoding
Linearized AMR:
hold :ARG0 (person :ARG0-of (have-role :ARG1 United_States :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity 2002 1) :location New_York
‣ Token embeddings for each token of the linearized sequence: hold, ARG0, (, person, ARG0-of, …
‣ A recurrent neural network (RNN) runs over the embeddings, producing hidden states h_1(s), h_2(s), h_3(s), h_4(s), h_5(s), …
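A minimal PyTorch sketch of this encoding step: embed the linearized AMR tokens and run an RNN over them to obtain the hidden states h_i(s) that the attention mechanism later attends to. The toy vocabulary, layer sizes, and the single-layer LSTM are placeholders, not the configuration used in the paper.

```python
# Minimal sketch of the encoder: embed the linearized AMR tokens and run an
# RNN over them to produce hidden states h_1(s) ... h_T(s). Vocabulary, sizes,
# and the single-layer LSTM are illustrative placeholders.
import torch
import torch.nn as nn

tokens = "hold :ARG0 ( person :ARG0-of ( have-role :ARG1 United_States :ARG2 official ) )".split()
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = torch.tensor([[vocab[t] for t in tokens]])           # shape: (1, T)

embed = nn.Embedding(num_embeddings=len(vocab), embedding_dim=64)
encoder = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)

hidden_states, _ = encoder(embed(ids))                     # shape: (1, T, 128)
print(hidden_states.shape)                                 # these are the h_i(s) used by attention
```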
