
Neural AMR: Sequence-to-Sequence Models for Parsing and Generation (presentation slides)

Neural AMR: Sequence-to-Sequence Models for Parsing and Generation. Ioannis Konstas, joint work with Srinivasan Iyer, Mark Yatskar, Yejin Choi, and Luke Zettlemoyer.
[Title diagram: two encoder-decoder models with attention, one generating text from an AMR graph and one parsing text into an AMR graph.]


  1-2. Linearization: Graph —> Depth-First Search (human-authored annotation)
[Diagram: AMR graph for the sentence "US officials held an expert group meeting in January 2002 in New York ."]
Linearized AMR:
hold :ARG0 (person :ARG0-of (have-role :ARG1 United_States :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity 2002 1) :location New_York
Sentence: US officials held an expert group meeting in January 2002 in New York .
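As a concrete illustration of the depth-first linearization, here is a minimal Python sketch (my own, not the authors' code) that flattens a toy AMR-like structure given as nested (concept, [(relation, child), ...]) tuples; the representation and function name are assumptions made for the example.

```python
# Minimal sketch of depth-first linearization of an AMR-like graph.
# The nested (concept, [(relation, child), ...]) representation and the
# function below are illustrative, not the paper's actual preprocessing code.

def linearize(node):
    """Depth-first traversal: concept, then ':rel ( child ... )' per edge."""
    concept, edges = node
    tokens = [concept]
    for relation, child in edges:
        tokens.append(relation)
        child_tokens = linearize(child)
        if len(child_tokens) > 1:          # wrap multi-token subgraphs in parentheses
            tokens += ["("] + child_tokens + [")"]
        else:
            tokens += child_tokens
    return tokens

# Sub-graph of the running example: "an expert group meeting"
meet = ("meet", [
    (":ARG0", ("person", [
        (":ARG1-of", ("expert", [])),
        (":ARG2-of", ("group", [])),
    ])),
])

print(" ".join(linearize(meet)))
# meet :ARG0 ( person :ARG1-of expert :ARG2-of group )
```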

  3-7. Pre-processing: Linearization —> Anonymization
[Diagram: the same AMR graph with named entities and date values replaced by indexed placeholders.]
Anonymized linearization:
hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity year_0 month_0) :location loc_1
Anonymized sentence: loc_0 officials held an expert group meeting in month_0 year_0 in loc_1 .
Original sentence: US officials held an expert group meeting in January 2002 in New York .
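A rough sketch of what the anonymization step could look like: entity and date spans are swapped for indexed placeholders (loc_0, month_0, year_0, ...) and the mapping is kept so placeholders can be restored after decoding. The hand-written entity list below stands in for an automatic recognizer; a real implementation would also align entities between the graph and the sentence.

```python
# Illustrative anonymization sketch: replace entity/date spans with indexed
# placeholders and remember the mapping for later de-anonymization.
# The hand-written entity list stands in for an automatic recognizer/aligner.

def anonymize(sentence, entities):
    """entities: list of (surface phrase, placeholder prefix) pairs."""
    mapping, counters = {}, {}
    for phrase, prefix in entities:
        if phrase in sentence:
            idx = counters.get(prefix, 0)
            placeholder = f"{prefix}_{idx}"
            counters[prefix] = idx + 1
            mapping[placeholder] = phrase
            sentence = sentence.replace(phrase, placeholder)
    return sentence, mapping

def deanonymize(sentence, mapping):
    for placeholder, phrase in mapping.items():
        sentence = sentence.replace(placeholder, phrase)
    return sentence

sent = "US officials held an expert group meeting in January 2002 in New York ."
entities = [("US", "loc"), ("New York", "loc"), ("January", "month"), ("2002", "year")]

anon, mapping = anonymize(sent, entities)
print(anon)   # loc_0 officials held an expert group meeting in month_0 year_0 in loc_1 .
print(deanonymize(anon, mapping))
```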

  8. Experimental Setup
Data: AMR LDC2015E86 (SemEval-2016 Task 8)
‣ Hand-annotated AMR graphs: newswire, forums
‣ ~16k training / 1k development / 1k test pairs
Training
‣ Optimize cross-entropy loss
Evaluation
‣ BLEU n-gram precision (Generation) (Papineni et al., 2002)
‣ SMATCH score (Parsing) (Cai and Knight, 2013)
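For reference, a hedged example of the generation metric: corpus-level BLEU computed with NLTK on tokenized sentences (the evaluation scripts actually used may differ); SMATCH for parsing is normally computed with the standalone smatch tool (Cai and Knight, 2013).

```python
# Corpus-level BLEU with NLTK, as an illustration of the generation metric.
# This is a sketch; the tokenization and scripts used in the paper may differ.
from nltk.translate.bleu_score import corpus_bleu

references = [  # one list of reference token lists per test sentence
    [["US", "officials", "held", "an", "expert", "group", "meeting",
      "in", "January", "2002", "in", "New", "York", "."]],
]
hypotheses = [  # one hypothesis token list per test sentence
    ["United", "States", "officials", "held", "a", "meeting",
     "in", "January", "2002", "."],
]

print(f"BLEU: {corpus_bleu(references, hypotheses):.4f}")
```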

  9. Experiments
‣ Vanilla experiment
‣ Limited language model capacity
‣ Paired Training
‣ Data augmentation algorithm

  10-13. First Attempt (Generation)
BLEU on the test set:
TreeToStr (Flanigan et al., NAACL 2016): 23.0
TSP (Song et al., EMNLP 2016): 22.4
PBMT (Pourdamghani and Knight, INLG 2016): 26.9
NeuralAMR (this work, vanilla seq2seq): 22.0
All competing systems use a language model trained on a very large corpus. We will emulate this via data augmentation (Sennrich et al., ACL 2016).

  14-18. What went wrong?
Input AMR (anonymized):
hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity year_0 month_0) :location loc_1
Reference: US officials held an expert group meeting in January 2002 in New York .
Prediction: United States officials held held a meeting in January 2002 .
Error types:
‣ Repetition
‣ Coverage
a) Sparsity [chart: token counts for Total, OOV@1, OOV@5]
b) Average sentence length: 20 words
c) Limited language modeling capacity
[Unlabeled figure values: 74.85%, 44.26%]
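To make the sparsity point concrete, here is a small sketch (mine, not the paper's) that counts rare token types in the training sentences; I am reading OOV@k as "token types occurring at most k times", which is an assumption about the chart rather than something stated on the slide.

```python
# Sketch: measure vocabulary sparsity in the training sentences.
# Assumption (not stated on the slide): OOV@k counts token types that
# occur at most k times in the training data.
from collections import Counter

def sparsity_stats(sentences, thresholds=(1, 5)):
    counts = Counter(tok for sent in sentences for tok in sent.split())
    stats = {"total types": len(counts)}
    for k in thresholds:
        stats[f"types with count <= {k}"] = sum(1 for c in counts.values() if c <= k)
    return stats

train_sentences = [
    "US officials held an expert group meeting in January 2002 in New York .",
    "officials held a meeting in New York .",
]
print(sparsity_stats(train_sentences))
```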

  19-21. Data Augmentation
‣ Original dataset: ~16k graph-sentence pairs
‣ Gigaword: ~183M sentences, text *only* (no AMR graphs)
‣ Sample sentences with vocabulary overlap
[Chart: OOV@1 and OOV@5 rates (%) for the Original data vs. Giga-200k, Giga-2M, and Giga-20M samples]
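One plausible reading of "sample sentences with vocabulary overlap" is to keep only external sentences whose tokens are largely covered by the AMR training vocabulary, so the added data does not inflate the OOV rate; the coverage threshold and helper below are my assumptions, not the paper's exact selection criterion.

```python
# Sketch: keep Gigaword-like sentences whose vocabulary overlaps the AMR
# training vocabulary. The coverage threshold is an assumption, not the
# paper's exact sampling criterion.

def sample_by_overlap(external_sentences, train_vocab, min_coverage=0.9):
    selected = []
    for sent in external_sentences:
        tokens = sent.split()
        covered = sum(tok in train_vocab for tok in tokens)
        if tokens and covered / len(tokens) >= min_coverage:
            selected.append(sent)
    return selected

train_vocab = {"officials", "held", "a", "meeting", "in", "New", "York", "."}
gigaword = [
    "officials held a meeting in New York .",
    "the stock market rallied sharply on Tuesday .",
]
print(sample_by_overlap(gigaword, train_vocab))   # keeps only the first sentence
```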

  22-25. Data Augmentation
[Diagram: two attention-based encoder-decoder models. "Parse to AMR": text -> encoder -> attention -> decoder -> graph. "Generate from AMR": graph -> encoder -> attention -> decoder -> text. External text is parsed into graphs, and the resulting pairs are used to re-train the models.]

  26. Semi-supervised Learning
‣ Self-training: McClosky et al., 2006
‣ Co-training: Yarowsky, 1995; Blum and Mitchell, 1998; Sarkar, 2001; Søgaard and Rishøj, 2010

  27-34. Paired Training
Train AMR parser P on the original dataset (graph-sentence pairs)
for i = 0 … N:   # self-train the parser
    S_i = sample k · 10^i sentences from Gigaword
    parse the S_i sentences with P
    re-train AMR parser P on S_i
Train generator G on S_N
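Written out as code, the loop above might look like the Python sketch below; train(), parse(), and sample_gigaword() are hypothetical helpers standing in for the real seq2seq training, decoding, and sampling machinery. With k = 200k and N = 2, the samples are the 200k / 2M / 20M sets used on the following slides.

```python
# Sketch of the Paired Training loop from the slide. The helpers train(),
# parse(), and sample_gigaword() are placeholders for the actual seq2seq
# training, decoding, and Gigaword sampling code.

def paired_training(original_pairs, train, parse, sample_gigaword,
                    k=200_000, N=2):
    # Train the AMR parser P on the human-annotated graph-sentence pairs.
    parser = train(model="parser", data=original_pairs)

    # Self-train the parser on progressively larger Gigaword samples.
    silver_pairs = original_pairs
    for i in range(N + 1):                           # i = 0 ... N
        sentences = sample_gigaword(k * 10 ** i)     # 200k, 2M, 20M, ...
        silver_pairs = [(parse(parser, s), s) for s in sentences]
        parser = train(model="parser", data=silver_pairs)

    # Train the generator G on the final silver dataset S_N.
    generator = train(model="generator", data=silver_pairs)
    return parser, generator
```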

  35-45. Training the AMR Parser
Fine-tune: initialize parameters from the previous step and train on the original dataset.
‣ Train P on the original dataset
‣ Sample S_1 = 200k sentences from Gigaword; parse S_1 with P; train P on S_1; fine-tune P on the original dataset
‣ Sample S_2 = 2M sentences from Gigaword; parse S_2 with P; train P on S_2; fine-tune P on the original dataset
‣ Sample S_3 = 20M sentences from Gigaword; parse S_3 with P; train P on S_3; fine-tune P on the original dataset

  46-48. Training the AMR Generator
Fine-tune: initialize parameters from the previous step and train on the original dataset.
‣ Sample S_4 = 20M sentences from Gigaword; parse S_4 with P; train G on S_4; fine-tune G on the original dataset
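The fine-tune step ("initialize parameters from the previous step and train on the original dataset") could look roughly like the PyTorch sketch below; the model class, checkpoint handling, optimizer settings, and loss shape are illustrative assumptions, not details taken from the paper.

```python
# Sketch of the fine-tune step: load the parameters learned on the silver
# Gigaword data, then continue training on the original gold dataset.
# Model, data loader, and hyperparameters are illustrative assumptions.
import torch

def fine_tune(model, gold_loader, checkpoint_path, epochs=3, lr=1e-4):
    # Initialize parameters from the previous training stage.
    model.load_state_dict(torch.load(checkpoint_path))

    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        for src, tgt in gold_loader:       # gold (linearized AMR, sentence) batches
            optimizer.zero_grad()
            logits = model(src)            # assumed shape: (tokens, vocab)
            loss = loss_fn(logits, tgt)    # cross-entropy loss, as on slide 8
            loss.backward()
            optimizer.step()
    return model
```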

  49-55. Final Results (Generation)
BLEU on the test set:
TreeToStr (Flanigan et al., NAACL 2016): 23.0
TSP (Song et al., EMNLP 2016): 22.4
PBMT (Pourdamghani and Knight, INLG 2016): 26.9
NeuralAMR (no augmentation): 22.0
NeuralAMR-200k: 27.4
NeuralAMR-2M: 32.3
NeuralAMR-20M: 33.8

  56-60. Final Results (Parsing)
SMATCH on the test set:
SBMT (Pust et al., 2015): 67.1
CharLSTM+CAMR (van Noord and Bos, 2017): 67.3
Seq2Seq (Peng et al., 2017): 52.0
NeuralAMR-20M: 62.1

  61-62. How did we do? (Generation)
Input AMR (anonymized):
hold :ARG0 (person :ARG0-of (have-role :ARG1 loc_0 :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity year_0 month_0) :location loc_1
Reference: US officials held an expert group meeting in January 2002 in New York .
Prediction: In January 2002 United States officials held a meeting of the group experts in New York .
A longer example:
Reference: The report stated British government must help to stabilize weak states and push for international regulations that would stop terrorists using freely available information to create and unleash new forms of biological warfare such as a modified version of the influenza virus .
Prediction: The report stated that the Britain government must help stabilize the weak states and push international regulations to stop the use of freely available information to create a form of new biological warfare such as the modified version of the influenza .
Remaining errors: Disfluency, Coverage
[Unlabeled figure values: 74.85%, 44.26%]

  63. Summary
‣ Sequence-to-sequence models for Parsing and Generation
‣ Paired Training: a scalable data augmentation algorithm
‣ Achieves state-of-the-art performance on generating from AMR
‣ Best-performing neural AMR parser
‣ Demo, code and pre-trained models: http://ikonstas.net

  64. Thank You
[Same summary bullets as slide 63, followed by the AMR graph (thank-01 :ARG1 you), i.e. "Thank you".]

  65. Bonus Slides

  66-69. Encoding: Linearize —> RNN encoding
Linearized AMR:
hold :ARG0 (person :ARG0-of (have-role :ARG1 United_States :ARG2 official) ) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group) ) :time (date-entity 2002 1) :location New_York
‣ Token embeddings for each token of the linearized sequence: hold, ARG0, (, person, ARG0-of, …
‣ A recurrent neural network (RNN) runs over the embeddings, producing hidden states h_1(s), h_2(s), h_3(s), h_4(s), h_5(s), …
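A minimal PyTorch sketch of this encoding step: embed the linearized AMR tokens and run an RNN over them to obtain the hidden states h_i(s) that the attention mechanism later attends to. The toy vocabulary, layer sizes, and the single-layer LSTM are placeholders, not the configuration used in the paper.

```python
# Minimal sketch of the encoder: embed the linearized AMR tokens and run an
# RNN over them to produce hidden states h_1(s) ... h_T(s). Vocabulary, sizes,
# and the single-layer LSTM are illustrative placeholders.
import torch
import torch.nn as nn

tokens = "hold :ARG0 ( person :ARG0-of ( have-role :ARG1 United_States :ARG2 official ) )".split()
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = torch.tensor([[vocab[t] for t in tokens]])           # shape: (1, T)

embed = nn.Embedding(num_embeddings=len(vocab), embedding_dim=64)
encoder = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)

hidden_states, _ = encoder(embed(ids))                     # shape: (1, T, 128)
print(hidden_states.shape)                                 # these are the h_i(s) used by attention
```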
