Sequence-to-Sequence Model

An encoder reads the input meaning representation and a decoder, guided by attention over the encoder states, emits the output sentence one word at a time:

  ŵ = argmax_w p(w_i | w_{<i}, h^{(s)})

Example: the linearized input "know :ARG0 I :ARG1 ( planet :ARG1-of inhabit )" is decoded, starting from <s>, into "I know the planet …".
Linearization: Graph → Depth-First Search

[AMR graph for "US officials held an expert group meeting in January 2002 in New York.":
root hold, with ARG0 → person (ARG0-of have-role: ARG1 country "United States", ARG2 official),
ARG1 → meet (ARG0 person: ARG1-of expert, ARG2-of group),
time → date-entity (year 2002, month 1), location → city "New York".]

A depth-first traversal of the graph produces the linearized sequence:

  hold
    :ARG0 (person
      :ARG0-of (have-role
        :ARG1 United_States
        :ARG2 official))
    :ARG1 (meet
      :ARG0 (person
        :ARG1-of expert
        :ARG2-of group))
    :time (date-entity 2002 1)
    :location New_York
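Depth-first linearization is easy to make concrete. Below is a minimal Python sketch, assuming the graph is stored as nested (concept, children) tuples; the data structure and function name are illustrative, not the authors' code.

```python
# Minimal DFS linearization sketch. The nested-tuple graph encoding is an
# illustrative assumption, not the authors' data structure.

def linearize(node):
    """Walk the graph depth-first, wrapping non-leaf subgraphs in parentheses."""
    concept, children = node
    tokens = [concept]
    for relation, child in children:
        sub = linearize(child)
        tokens += [relation] + (sub if len(sub) == 1 else ["(", *sub, ")"])
    return tokens

graph = ("hold", [
    (":ARG0", ("person", [
        (":ARG0-of", ("have-role", [
            (":ARG1", ("United_States", [])),
            (":ARG2", ("official", [])),
        ])),
    ])),
    (":location", ("New_York", [])),
])

print(" ".join(linearize(graph)))
# hold :ARG0 ( person :ARG0-of ( have-role :ARG1 United_States
#   :ARG2 official ) ) :location New_York
```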
Encoding: Linearize → RNN Encoding

‣ Token embeddings: each linearized token (hold, :ARG0, (, person, :ARG0-of, …) is mapped to a vector
‣ Recurrent Neural Network (RNN) runs over the embedded sequence
‣ Bi-directional RNN: forward and backward states are concatenated into encoder states h_1^{(s)} … h_N^{(s)}, one per input token
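A minimal sketch of this encoding step, assuming PyTorch; the vocabulary and dimensions are illustrative, not the paper's configuration.

```python
# Minimal encoding sketch (assumes PyTorch; sizes and vocab are illustrative).
import torch
import torch.nn as nn

tokens = "hold :ARG0 ( person :ARG0-of".split()
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}

embed = nn.Embedding(num_embeddings=len(vocab), embedding_dim=64)
birnn = nn.LSTM(input_size=64, hidden_size=128,
                bidirectional=True, batch_first=True)

ids = torch.tensor([[vocab[t] for t in tokens]])  # (1, T) token ids
emb = embed(ids)                                  # (1, T, 64) embeddings
h_s, _ = birnn(emb)                               # (1, T, 256): forward and
                                                  # backward states concatenated
# h_s[0, i] plays the role of h_i^(s) on the slide.
```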
Decoding: RNN Encoding → RNN Decoding (Beam Search)

‣ Initialize the decoder hidden state from the encoder states h_1^{(s)} … h_N^{(s)}
‣ At each step i, a softmax over the vocabulary gives p(w_i | w_{<i}, h^{(s)})
‣ Instead of greedily taking the top word, beam search keeps the k best partial hypotheses:

  step 1: Holding | Helds | Hold | US | …
  step 2: Hold a | Hold the | Held a | Held the | …
  …
  step k: The US officials held | US officials held a | US officials hold the | US officials will hold a | …
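Beam search itself is model-agnostic. A minimal sketch, assuming a hypothetical `step(prefix)` callable that returns the decoder's next-token distribution p(w | prefix, h^{(s)}) as a dict:

```python
import math

def beam_search(step, bos="<s>", eos="</s>", beam_size=5, max_len=20):
    """Keep the `beam_size` highest-scoring partial sentences at each step.
    `step(prefix)` is a hypothetical callable returning the decoder's
    next-token distribution {token: p(token | prefix, h_s)}."""
    beams = [([bos], 0.0)]                         # (prefix, log-probability)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix[-1] == eos:                  # hypothesis is finished
                candidates.append((prefix, score))
                continue
            for tok, p in step(prefix).items():
                candidates.append((prefix + [tok], score + math.log(p)))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_size]
        if all(prefix[-1] == eos for prefix, _ in beams):
            break
    return beams[0][0]                             # best complete hypothesis
```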
Attention

While predicting output word w_2 ("held"), the decoder attends over all encoder states. The attention weights and context vector are:

  a_i = softmax(f(h^{(s)}, h_i))
  c_i = \sum_j a_{ij} h_j^{(s)}

so c_3 is a weighted sum of the source states h_1^{(s)} … h_5^{(s)} (hold, ARG0, (, person, ARG0-of, …).

[Heatmap: attention weights between the linearized graph tokens
(hold ARG0 ( person role US official ) ARG1 ( meet expert group ))
and the output words (US officials held an expert group meeting in January 2002).]
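The two equations translate directly into code. A NumPy sketch, using a plain dot product for the scoring function f (the model's actual f may be a learned bilinear or MLP score):

```python
import numpy as np

def attend(h_s, h_i):
    """a_i = softmax(f(h^(s), h_i)),  c_i = sum_j a_ij * h_j^(s).
    f is a plain dot product here; the model's actual f may be learned."""
    scores = h_s @ h_i                   # one score per source position j
    a = np.exp(scores - scores.max())    # numerically stable softmax
    a /= a.sum()
    return a @ h_s                       # context vector: weighted source sum

h_s = np.random.randn(5, 8)              # encoder states h_1^(s) .. h_5^(s)
h_3 = np.random.randn(8)                 # decoder state while emitting w_3
c_3 = attend(h_s, h_3)                   # the c_3 shown on the slide
```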
Pre-processing: Linearization → Anonymization

Sparse named entities and dates are replaced with indexed placeholders, consistently in the graph and in the sentence:

  "United States" → loc_0    "New York" → loc_1    2002 → year_0    1 → month_0

  hold
    :ARG0 (person
      :ARG0-of (have-role
        :ARG1 loc_0
        :ARG2 official))
    :ARG1 (meet
      :ARG0 (person
        :ARG1-of expert
        :ARG2-of group))
    :time (date-entity year_0 month_0)
    :location loc_1

  US officials held an expert group meeting in January 2002 in New York .
  → loc_0 officials held an expert group meeting in month_0 year_0 in loc_1 .
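A minimal sketch of the anonymization step, assuming the entity spans and types are already known (in practice they come from aligning graph entities to the sentence); the function and type names are illustrative:

```python
def anonymize(sentence, entities):
    """Replace sparse entity strings with indexed placeholders; the returned
    mapping lets us restore the surface forms after generation."""
    counts, mapping = {}, {}
    for surface, etype in entities:
        idx = counts.get(etype, 0)
        counts[etype] = idx + 1
        placeholder = f"{etype}_{idx}"
        mapping[placeholder] = surface
        sentence = sentence.replace(surface, placeholder)
    return sentence, mapping

sent = "US officials held an expert group meeting in January 2002 in New York ."
anon, table = anonymize(sent, [("US", "loc"), ("New York", "loc"),
                               ("January", "month"), ("2002", "year")])
print(anon)
# loc_0 officials held an expert group meeting in month_0 year_0 in loc_1 .
```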
Experimental Setup

AMR LDC2015E86 (SemEval-2016 Task 8)
‣ Hand-annotated MR graphs: newswire, forums
‣ ~16k training / 1k development / 1k test pairs

Training
‣ Optimize cross-entropy loss

Evaluation
‣ BLEU: n-gram precision (Papineni et al., ACL 2002)
First Attempt

BLEU scores:

  TreeToStr  23.0
  TSP        22.4
  PBMT       26.9
  NNLG       22.0  (ours)

All comparison systems use a language model trained on a very large corpus.
We will emulate this via data augmentation (Sennrich et al., ACL 2016).

TreeToStr: Flanigan et al., NAACL 2016
TSP: Song et al., EMNLP 2016
PBMT: Pourdamaghani and Knight, INLG 2016
What went wrong?

Input graph (anonymized):

  hold
    :ARG0 (person
      :ARG0-of (have-role :ARG1 loc_0 :ARG2 official))
    :ARG1 (meet
      :ARG0 (person :ARG1-of expert :ARG2-of group))
    :time (date-entity year_0 month_0)
    :location loc_1

Reference:  US officials held an expert group meeting in January 2002 in New York .
Prediction: United States officials held held a meeting in January 2002 .

‣ Repetition: "held held"
‣ Coverage: "expert group" and "New York" are dropped
‣ a) Sparsity
   [Chart: token counts (0 to 18,000) for Total vs. OOV@1 vs. OOV@5; 74.85% / 44.26%]
   b) Avg. sentence length: 20 words
   c) Limited language-modeling capacity
Data Augmentation

‣ Original dataset: ~16k graph-sentence pairs
‣ Gigaword: ~183M sentences, text *only* (no graphs)
‣ Sample Gigaword sentences with vocabulary overlap with the original dataset

[Chart: % OOV@1 and OOV@5 for Original vs. Giga-200k vs. Giga-2M samples.]
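A sketch of the vocabulary-overlap filter, with an assumed at-most-`max_oov` OOV-token criterion; the exact selection rule is an illustrative guess, not the paper's:

```python
def keeps(sentence, train_vocab, max_oov=0):
    """Keep a Gigaword sentence only if it has at most `max_oov` tokens
    outside the original training vocabulary (threshold is an assumption)."""
    oov = sum(1 for tok in sentence.split() if tok not in train_vocab)
    return oov <= max_oov

train_vocab = {"US", "officials", "held", "an", "expert", "group",
               "meeting", "in", "."}
print(keeps("officials held an expert meeting in .", train_vocab))   # True
print(keeps("officials held an unrelated protest .", train_vocab))   # False
```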
Data Augmentation

‣ Generate from MR: graph → text (encoder-decoder with attention)
‣ Parse to MR: text → graph (a second encoder-decoder with attention, run in the opposite direction)
‣ Re-train: parse unlabeled text into graphs, then re-train the generator on the resulting pairs

  input text → Parse to MR → Generate from MR → output text
Paired Training

  Train MR parser P on the original dataset (graph, sentence) pairs
  for i = 0 … N:                          # self-train the parser
      S_i = sample k · 10^i sentences from Gigaword
      Parse S_i sentences with P
      Re-train MR parser P on S_i
  Train generator G on the S_N (graph, sentence) pairs
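The whole procedure fits in a few lines. A sketch where `train`, `parse`, and `sample` are hypothetical stand-ins for the real components, and the defaults for `k` and `N` are illustrative:

```python
def paired_training(original, gigaword, train, parse, sample, k=200_000, N=2):
    """Paired-training sketch. `train`, `parse`, and `sample` are
    hypothetical stand-ins for the real parser/generator components."""
    P = train("parser", original)                # train parser on gold pairs
    pairs = original
    for i in range(N + 1):                       # for i = 0 ... N
        S = sample(gigaword, k * 10 ** i)        # k * 10^i sentences
        pairs = [(parse(P, s), s) for s in S]    # silver (graph, sentence)
        P = train("parser", pairs)               # self-train the parser
    G = train("generator", pairs)                # train G on the S_N pairs
    return P, G
```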
Training the MR Parser

Fine-tune: initialize parameters from the previous step and train on the original dataset.

1. Train P on the original dataset
2. Sample S_1 = 200k sentences from Gigaword
3. Parse S_1 with P → 200k (graph, sentence) pairs
4. Train P on S_1, then fine-tune P on the original dataset
5. Sample S_2 = 2M sentences from Gigaword; parse S_2 with P
6. Train P on S_2, then fine-tune P on the original dataset
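Fine-tuning is ordinary training that starts from the previous step's weights. A PyTorch-style sketch with hypothetical names (the checkpoint path, the model's loss interface, and the learning rate are all assumptions); the same recipe applies to the generator G below:

```python
import torch

def fine_tune(model, original_dataset, checkpoint, lr=1e-4):
    """Init from the previous step's parameters, then continue training on
    the small original dataset. The loss interface and LR are assumptions."""
    model.load_state_dict(torch.load(checkpoint))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for graph, sentence in original_dataset:
        opt.zero_grad()
        loss = model(graph, sentence)   # assumed to return cross-entropy loss
        loss.backward()
        opt.step()
    return model

# e.g. fine_tune(parser, original_pairs, "parser_S2.pt")   # names hypothetical
```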
Training the MR Generator

1. Sample S_3 = 2M sentences from Gigaword
2. Parse S_3 with P → 2M (graph, sentence) pairs
3. Train G on S_3, then fine-tune G on the original dataset
Final Results

[Bar chart: BLEU for TreeToStr (23.0), TSP (22.4), PBMT (26.9), NNLG (22.0), and the augmented NNLG-200k, NNLG-2M, and NNLG-20M systems.]

TreeToStr: Flanigan et al., NAACL 2016
TSP: Song et al., EMNLP 2016
PBMT: Pourdamaghani and Knight, INLG 2016