Linearization: Graph → Depth-First Search (human-authored annotation)

[Figure: AMR graph with nodes hold, person, have-role, official, meet, expert, group, date-entity (year 2002, month 1), city "New York", country "United States", connected by ARG0/ARG1/ARG2, ARG0-of/ARG1-of/ARG2-of, time, location, and name edges.]

hold
  :ARG0 (person :ARG0-of (have-role
                :ARG1 United_States
                :ARG2 official))
  :ARG1 (meet :ARG0 (person :ARG1-of expert
                            :ARG2-of group))
  :time (date-entity 2002 1)
  :location New_York

US officials held an expert group meeting in January 2002 in New York .
Pre-processing: Linearization → Anonymization

hold
  :ARG0 (person :ARG0-of (have-role
                :ARG1 loc_0
                :ARG2 official))
  :ARG1 (meet :ARG0 (person :ARG1-of expert
                            :ARG2-of group))
  :time (date-entity year_0 month_0)
  :location loc_1

US officials held an expert group meeting in January 2002 in New York .
loc_0 officials held an expert group meeting in month_0 year_0 in loc_1 .
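To make the anonymization step concrete, here is a minimal Python sketch. It assumes a simple string-matching approach; the `locations` mapping passed in and the regular expression for dates are illustrative assumptions, not the actual pre-processing pipeline, which would rely on the name and date-entity structure of the full AMR.

```python
import re

MONTHS = {"1": "January", "2": "February", "3": "March", "4": "April",
          "5": "May", "6": "June", "7": "July", "8": "August",
          "9": "September", "10": "October", "11": "November", "12": "December"}

def anonymize(graph, sentence, locations):
    """Replace named entities and dates in the linearized graph with indexed
    placeholders (loc_0, year_0, month_0, ...), apply the same mapping to the
    sentence, and keep the mapping for de-anonymizing model output later."""
    mapping = {}

    # Locations: graph constant -> surface form,
    # e.g. {"United_States": "US", "New_York": "New York"} (illustrative input).
    for i, (graph_form, surface) in enumerate(locations.items()):
        placeholder = f"loc_{i}"
        if graph_form in graph:
            graph = graph.replace(graph_form, placeholder)
            sentence = sentence.replace(surface, placeholder)
            mapping[placeholder] = surface

    # Dates: ":time (date-entity 2002 1)" -> ":time (date-entity year_0 month_0)".
    match = re.search(r"date-entity (\d{4}) (\d{1,2})", graph)
    if match:
        year, month = match.groups()
        graph = graph.replace(f"date-entity {year} {month}", "date-entity year_0 month_0")
        sentence = sentence.replace(year, "year_0").replace(MONTHS.get(month, month), "month_0")
        mapping["year_0"], mapping["month_0"] = year, MONTHS.get(month, month)

    return graph, sentence, mapping
```

Applied to the example above with `locations={"United_States": "US", "New_York": "New York"}`, this yields the anonymized graph and the sentence "loc_0 officials held an expert group meeting in month_0 year_0 in loc_1 .", together with the mapping used to restore the original strings after decoding.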
Experimental Setup
AMR LDC2015E86 (SemEval-2016 Task 8)
‣ Hand-annotated MR graphs: newswire, forums
‣ ~16k training / 1k development / 1k test pairs
Training
‣ Optimize cross-entropy loss
Evaluation
‣ BLEU n-gram precision (Generation) (Papineni et al., 2002)
‣ SMATCH score (Parsing) (Cai and Knight, 2013)
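As a rough illustration of the generation metric, corpus-level BLEU can be computed with NLTK. This sketch assumes whitespace-tokenized references and hypotheses and is not the exact scoring script behind the reported numbers.

```python
from nltk.translate.bleu_score import corpus_bleu

def bleu(references, hypotheses):
    """Corpus BLEU with one reference per hypothesis, so each entry
    in the reference list is a singleton list of token lists."""
    refs = [[r.split()] for r in references]
    hyps = [h.split() for h in hypotheses]
    return 100 * corpus_bleu(refs, hyps)

print(bleu(["US officials held an expert group meeting in January 2002 in New York ."],
           ["United States officials held a meeting in January 2002 ."]))
```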
Experiments
‣ Vanilla experiment
‣ Limited language model capacity
‣ Paired Training
‣ Data augmentation algorithm
First Attempt (Generation)

BLEU (bar chart):
‣ TreeToStr: 23.0
‣ TSP: 22.4
‣ PBMT: 26.9
‣ NeuralAMR: 22.0

All competing systems use a language model trained on a very large corpus.
We will emulate this via data augmentation (Sennrich et al., ACL 2016).

TreeToStr: Flanigan et al., NAACL 2016
TSP: Song et al., EMNLP 2016
PBMT: Pourdamghani and Knight, INLG 2016
What went wrong?

Input (anonymized linearization):
hold
  :ARG0 (person :ARG0-of (have-role
                :ARG1 loc_0
                :ARG2 official))
  :ARG1 (meet :ARG0 (person :ARG1-of expert
                            :ARG2-of group))
  :time (date-entity year_0 month_0)
  :location loc_1

Reference: US officials held an expert group meeting in January 2002 in New York .
Prediction: United States officials held held a meeting in January 2002 .

Errors:
‣ Repetition
‣ Coverage

[Figure: token counts for Total, OOV@1, OOV@5 (y-axis up to 18,000); annotations 74.85%, 44.26%]
a) Sparsity
b) Average sentence length: 20 words
c) Limited language modeling capacity
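The sparsity statistics above can be reproduced with a simple frequency count. The sketch below assumes OOV@k means target-side vocabulary types that occur at most k times in the training data, which is an interpretation of the slide rather than a stated definition.

```python
from collections import Counter

def oov_at_k(train_sentences, k):
    """Return (number of rare vocabulary types, total vocabulary types),
    where 'rare' means occurring at most k times in the training sentences."""
    counts = Counter(tok for sent in train_sentences for tok in sent.split())
    rare = {tok for tok, c in counts.items() if c <= k}
    return len(rare), len(counts)

# Example over the ~16k-pair training set (hypothetical variable):
# rare1, total = oov_at_k(train_sentences, k=1)
# rare5, _     = oov_at_k(train_sentences, k=5)
```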
Data Augmentation

Original dataset: ~16k graph-sentence pairs
Gigaword: ~183M sentences only (no AMR annotations)

Sample sentences with vocabulary overlap:
[Figure: % OOV@1 and OOV@5 for Original vs. Giga-200k, Giga-2M, Giga-20M samples (y-axis 0-80%)]
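A minimal sketch of one way to sample "sentences with vocabulary overlap": keep Gigaword sentences whose tokens already appear in the original training vocabulary. The selection criterion and the `max_unknown` threshold are assumptions for illustration, not necessarily the paper's exact filter.

```python
def sample_overlapping(gigaword_sentences, train_vocab, n, max_unknown=0):
    """Keep the first n Gigaword sentences with at most `max_unknown` tokens
    outside the training vocabulary, so the external data reinforces rather
    than explodes the vocabulary."""
    selected = []
    for sent in gigaword_sentences:
        unknown = sum(tok not in train_vocab for tok in sent.split())
        if unknown <= max_unknown:
            selected.append(sent)
            if len(selected) == n:
                break
    return selected
```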
Data Augmentation

[Diagram: two attention-based encoder-decoder (seq2seq) models.
 "Parse to AMR": text encoder → graph decoder, with attention.
 "Generate from AMR": graph encoder → text decoder, with attention.
 Text automatically parsed to AMR is fed back to re-train the models.]
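Both boxes in the diagram share the same shape: a sequence-to-sequence model with attention over token sequences, run in opposite directions (text → linearized graph for parsing, linearized graph → text for generation). Below is a compact PyTorch sketch of that shape, with assumed layer sizes and single-layer unidirectional LSTMs; the actual system uses a deeper stacked encoder, so treat this as an illustration rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class Seq2SeqAttention(nn.Module):
    """Minimal encoder-decoder with dot-product attention; the same class
    can serve as the parser (src=text, tgt=graph) or the generator
    (src=graph, tgt=text)."""
    def __init__(self, src_vocab, tgt_vocab, emb=256, hid=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.decoder = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(2 * hid, tgt_vocab)

    def forward(self, src, tgt):
        enc_states, (h, c) = self.encoder(self.src_emb(src))        # (B, S, H)
        dec_states, _ = self.decoder(self.tgt_emb(tgt), (h, c))     # (B, T, H), teacher forcing
        scores = torch.bmm(dec_states, enc_states.transpose(1, 2))  # (B, T, S) attention scores
        weights = torch.softmax(scores, dim=-1)                     # attention over source tokens
        context = torch.bmm(weights, enc_states)                    # (B, T, H) context vectors
        return self.out(torch.cat([dec_states, context], dim=-1))   # (B, T, tgt_vocab) logits
```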
Semi-supervised Learning
‣ Self-training
  ‣ McClosky et al., 2006
‣ Co-training
  ‣ Yarowsky, 1995; Blum and Mitchell, 1998; Sarkar, 2001
  ‣ Søgaard and Rishøj, 2010
Paired Training

Train AMR parser P on the original dataset (graph, sentence) pairs
for i = 0 … N:
    S_i = sample k · 10^i sentences from Gigaword
    Parse S_i sentences with P          } self-train the parser
    Re-train AMR parser P on S_i        }
Train generator G on S_N (graph, sentence) pairs
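The loop above, written out as a Python sketch. The helpers `train`, `fine_tune`, `parse`, and `sample_gigaword` are hypothetical stand-ins for the actual training, decoding, and sampling code; sample sizes follow the 200k / 2M / 20M schedule on the next slides.

```python
def paired_training(original_pairs, gigaword, k=200_000, n_rounds=3):
    """Self-train the parser P on increasingly large Gigaword samples,
    then train the generator G on the final automatically parsed sample."""
    parser = train(original_pairs)                                # gold (sentence, graph) pairs
    sample = []
    for i in range(n_rounds):
        sample = sample_gigaword(gigaword, size=k * 10 ** i)      # 200k, 2M, 20M sentences
        silver = [(sent, parse(parser, sent)) for sent in sample] # parse the sample with P
        parser = train(silver)                                    # re-train on silver data
        parser = fine_tune(parser, original_pairs)                # then fine-tune on gold data
    final = [(sent, parse(parser, sent)) for sent in sample]      # parse the largest sample
    generator = train(final, direction="graph2text")              # train G on silver pairs
    generator = fine_tune(generator, original_pairs, direction="graph2text")
    return parser, generator
```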
Training the AMR Parser
Fine-tune: initialize parameters from the previous step and train on the original dataset.

1. Train P on the original dataset.
2. Sample S_1 = 200k sentences from Gigaword → parse S_1 with P → train P on S_1 → fine-tune P on the original dataset.
3. Sample S_2 = 2M sentences from Gigaword → parse S_2 with P → train P on S_2 → fine-tune P on the original dataset.
4. Sample S_3 = 20M sentences from Gigaword → parse S_3 with P → train P on S_3 → fine-tune P on the original dataset.
Training the AMR Generator
5. Sample S_4 = 20M sentences from Gigaword → parse S_4 with P → train G on S_4 → fine-tune G on the original dataset.
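In PyTorch terms, "fine-tune" is just continued training from the checkpoint of the previous step, now on the gold original dataset only. This is a generic sketch with an assumed checkpoint path and data loader, not the project's actual training script; the model is assumed to return (batch, time, vocab) logits as in the earlier seq2seq sketch.

```python
import torch

def fine_tune(model, original_loader, checkpoint_path, epochs=5, lr=1e-4):
    """Initialize from the model trained on silver (Gigaword) data,
    then continue training on the gold original dataset."""
    model.load_state_dict(torch.load(checkpoint_path))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss(ignore_index=0)          # assume id 0 = padding
    for _ in range(epochs):
        for src, tgt in original_loader:
            logits = model(src, tgt[:, :-1])                     # teacher forcing
            loss = loss_fn(logits.reshape(-1, logits.size(-1)),  # cross-entropy over tokens
                           tgt[:, 1:].reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```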
Final Results (Generation)

BLEU (bar chart):
‣ TreeToStr: 23.0
‣ TSP: 22.4
‣ PBMT: 26.9
‣ NeuralAMR: 22.0
‣ NeuralAMR-200k: 27.4
‣ NeuralAMR-2M: 32.3
‣ NeuralAMR-20M: 33.8

TreeToStr: Flanigan et al., NAACL 2016
TSP: Song et al., EMNLP 2016
PBMT: Pourdamghani and Knight, INLG 2016
Final Results (Parsing)

SMATCH (bar chart):
‣ SBMT: 67.1
‣ CharLSTM+CAMR: 67.3
‣ Seq2Seq: 52.0
‣ NeuralAMR-20M: 62.1

SBMT: Pust et al., 2015
CharLSTM+CAMR: van Noord and Bos, 2017
Seq2Seq: Peng et al., 2017
How did we do? (Generation)

Input: the same anonymized linearization as in "What went wrong?".
Reference: US officials held an expert group meeting in January 2002 in New York .
Prediction: In January 2002 United States officials held a meeting of the group experts in New York .

Reference: The report stated British government must help to stabilize weak states and push for international regulations that would stop terrorists using freely available information to create and unleash new forms of biological warfare such as a modified version of the influenza virus .
Prediction: The report stated that the Britain government must help stabilize the weak states and push international regulations to stop the use of freely available information to create a form of new biological warfare such as the modified version of the influenza .

Errors: Disfluency, Coverage
Summary
‣ Sequence-to-sequence models for Parsing and Generation
‣ Paired Training: scalable data augmentation algorithm
‣ State-of-the-art performance on generating from AMR
‣ Best-performing neural AMR parser
‣ Demo, code, and pre-trained models: http://ikonstas.net

Thank You
(thank-01 :ARG1 you)
Bonus Slides
Encoding: Linearize → RNN encoding

hold
  :ARG0 (person :ARG0-of (have-role
                :ARG1 United_States
                :ARG2 official))
  :ARG1 (meet :ARG0 (person :ARG1-of expert
                            :ARG2-of group))
  :time (date-entity 2002 1)
  :location New_York

‣ Token embeddings: hold, ARG0, (, person, ARG0-of, …
‣ Recurrent neural network (RNN) over the embedded tokens: h_1(s), h_2(s), h_3(s), h_4(s), h_5(s), …
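A minimal PyTorch sketch of this encoder: the linearized graph is split into tokens, each token is embedded, and an LSTM produces the hidden states h_1(s), h_2(s), … that the decoder attends over. The toy vocabulary and the embedding/hidden sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Linearized AMR as a flat token sequence (depth-first traversal).
linearized = ("hold :ARG0 ( person :ARG0-of ( have-role :ARG1 United_States "
              ":ARG2 official ) ) :ARG1 ( meet :ARG0 ( person :ARG1-of expert "
              ":ARG2-of group ) ) :time ( date-entity 2002 1 ) :location New_York").split()

vocab = {tok: i for i, tok in enumerate(sorted(set(linearized)))}   # toy vocabulary
ids = torch.tensor([[vocab[tok] for tok in linearized]])            # (1, seq_len)

embedding = nn.Embedding(len(vocab), 128)    # token embeddings
encoder = nn.LSTM(128, 256, batch_first=True)

hidden_states, _ = encoder(embedding(ids))   # (1, seq_len, 256): h_1(s), h_2(s), ...
```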