
Synchronous Forest Substitution Grammars



  1. Synchronous Forest Substitution Grammars
     Andreas Maletti
     Institute for Natural Language Processing, University of Stuttgart, Germany
     maletti@ims.uni-stuttgart.de
     Porquerolles Island, France (CAI 2013)
     A. Maletti, Synchronous Forest Substitution Grammars, September 4, 2013

  2. Outline
     Motivation
     Main model
     Results

  3. Machine translation
     Translation
     ◮ Input: Official forecasts predicted just 3 percent, Bloomberg said.
     ◮ Reference: Offizielle Prognosen sind von nur 3 Prozent ausgegangen, meldete Bloomberg.
       [official] [forecasts] [are] [of] [only] [3 percent] [assumed] [reported] [Bloomberg]

  4. Machine translation
     Translation
     ◮ Input: Official forecasts predicted just 3 percent, Bloomberg said.
     ◮ Reference: Offizielle Prognosen sind von nur 3 Prozent ausgegangen, meldete Bloomberg.
       [official] [forecasts] [are] [of] [only] [3 percent] [assumed] [reported] [Bloomberg]
     ◮ Google Translate (translate.google.com): Offizielle Prognosen vorhergesagt nur 3 Prozent, sagte Bloomberg.
       [official] [forecasts] [*predicted] [only] [3 percent] [said] [Bloomberg]

  5. Machine translation
     Translation
     ◮ Input: The ECB wants to hold inflation to under two percent, or somewhere in that vicinity.
     ◮ Reference: Die EZB ist bestrebt, die Inflationsrate unter zwei Prozent, oder zumindest knapp an der Zwei-Prozent-Marke zu halten.
       [the] [ECB] [is] [desire] [the] [inflation rate] [below] [two percent] [or] [at least] [close] [at] [the] [two percent mark] [to keep]

  6. Machine translation
     Translation
     ◮ Input: The ECB wants to hold inflation to under two percent, or somewhere in that vicinity.
     ◮ Reference: Die EZB ist bestrebt, die Inflationsrate unter zwei Prozent, oder zumindest knapp an der Zwei-Prozent-Marke zu halten.
       [the] [ECB] [is] [desire] [the] [inflation rate] [below] [two percent] [or] [at least] [close] [at] [the] [two percent mark] [to keep]
     ◮ Google Translate (translate.google.com): Die EZB will die Inflation unter zwei Prozent zu halten, oder irgendwo in der Nähe.
       [the] [ECB] [wants] [the] [inflation] [below] [two percent] [*to keep] [or] [somewhere] [in] [the] [vicinity]

  7. Syntax-based machine translation
     Architecture
     Input → Parser → Machine translation system → Language model → Output

  8. Syntax-based machine translation
     Architecture
     Input → Parser → Machine translation system → Language model → Output
     Formalisms
     ◮ Parser = weighted tree automaton
     ◮ Translation system = some tree transducer
     ◮ Language model = weighted string automaton
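The architecture on this slide can be read as a composition of three stages. The following is only a minimal sketch of that reading: the function names and the toy stand-ins are hypothetical, and a real system would use weighted automata and transducers rather than plain Python functions.

```python
from typing import Callable, List


def translate_sentence(
    sentence: str,
    parse: Callable[[str], object],
    translate: Callable[[object], List[str]],
    lm_score: Callable[[str], float],
) -> str:
    """Input -> parser -> translation system -> language model -> output."""
    tree = parse(sentence)                 # parser: sentence to syntax tree
    candidates = translate(tree)           # translation system: tree to candidate outputs
    return max(candidates, key=lm_score)   # language model: keep the best-scoring candidate


# Toy stand-ins (hypothetical, for illustration only).
def toy_parse(sentence: str) -> tuple:
    return ("S", tuple(sentence.split()))


def toy_translate(tree: tuple) -> List[str]:
    return ["Bloomberg sagte", "sagte Bloomberg"]


def toy_lm_score(text: str) -> float:
    return 1.0 if text.startswith("sagte") else 0.5


best = translate_sentence("Bloomberg said", toy_parse, toy_translate, toy_lm_score)
print(best)
```

Passing the stages in as callables keeps the sketch independent of any particular parser or language model.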

  9. Resources
     Input
     ◮ Parallel text (English and German): Europarl
     ◮ Parsers: BitPar, Charniak, Berkeley

  10. Resources
      Input
      ◮ Parallel text (English and German): Europarl
      ◮ Parsers: BitPar, Charniak, Berkeley
      Example
      ◮ “We must bear in mind the Community as a whole.”
      ◮ “Wir müssen uns davor hüten, alles vergemeinschaften zu wollen.”

  11. Resources
      Input
      ◮ Parallel text (English and German): Europarl
      ◮ Parsers: BitPar, Charniak, Berkeley
      Example
      ◮ “We must bear in mind the Community as a whole.”
      ◮ “Wir müssen uns davor hüten, alles vergemeinschaften zu wollen.”
      Europarl German-English parallel data:
      ◮ 1,920,209 parallel sentences
      ◮ 44,548,491 words in German
      ◮ 47,818,827 words in English
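To put the corpus figures from this slide in perspective, the short calculation below derives the average sentence length per language and the English-to-German word ratio. Only the three counts come from the slide; the variable names are illustrative.

```python
# Corpus counts as stated on the slide.
sentences = 1_920_209
words_de = 44_548_491
words_en = 47_818_827

avg_de = words_de / sentences   # average German sentence length in words
avg_en = words_en / sentences   # average English sentence length in words
ratio = words_en / words_de     # English words per German word

print(f"German:  {avg_de:.1f} words per sentence")   # 23.2
print(f"English: {avg_en:.1f} words per sentence")   # 24.9
print(f"English/German word ratio: {ratio:.3f}")
```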

  12. First step: word alignment
      Alignments by GIZA++ [Och, Ney '03]:
      ◮ We must bear in mind the Community as a whole
        Wir müssen uns davor hüten , alles vergemeinschaften zu wollen

  13. First step: word alignment
      Alignments by GIZA++ [Och, Ney '03]:
      ◮ We must bear in mind the Community as a whole
        Wir müssen uns davor hüten , alles vergemeinschaften zu wollen
      ◮ We can help countries catch up , but not by putting their neighbours on hold
        Wir können Ländern beim Aufholen helfen , aber nicht , indem wir ihre Nachbarn in den Wartesaal schicken
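A word alignment is commonly stored as a set of (source index, target index) pairs. The sketch below shows that representation for the first sentence pair. The three links are hypothetical placeholders chosen for illustration, since the actual GIZA++ links are not recoverable from the slide text, and `links_by_source` is an illustrative helper name.

```python
from typing import Dict, List, Set, Tuple

source = "We must bear in mind the Community as a whole".split()
target = "Wir müssen uns davor hüten , alles vergemeinschaften zu wollen".split()

# Hypothetical links (source index, target index); NOT the real GIZA++ output.
alignment: Set[Tuple[int, int]] = {(0, 0), (1, 1), (6, 7)}


def links_by_source(alignment: Set[Tuple[int, int]], n_source: int) -> Dict[int, List[int]]:
    """Group the target positions by the source position they are linked to."""
    table: Dict[int, List[int]] = {i: [] for i in range(n_source)}
    for i, j in sorted(alignment):
        table[i].append(j)
    return table


for i, js in links_by_source(alignment, len(source)).items():
    aligned = [target[j] for j in js] or ["(unaligned)"]
    print(f"{source[i]:>10} -> {' '.join(aligned)}")
```

Storing links as index pairs rather than word pairs keeps repeated words apart and allows a word to have several links or none.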

  14. Second step: parsing
      Charniak parser [Charniak, Johnson '05]:
        (TOP (S (NP (PRP We)) (VP (MD must) (VP (VB bear) (PP (IN in) (NP (NN mind))) (NP (NP (DT the) (NN Community)) (PP (IN as) (NP (DT a) (NN whole)))))) (. .)))
      BitPar parser [Schmid '06]:
        (TOP (S-TOP (NP-SB/Pl (PPER-HD-Nom.Pl Wir)) (VMFIN-HD-Pl müssen) (VP-OC/inf (NP-DA (PPER-HD-Dat.Pl uns)) (PP-OP/V (PROAV-PH davor)) (VVINF-HD hüten) ($, ,) (VP-OC/zu (VP-OC/inf (NP-OA (PIS-HD-Acc.Sg.Neut alles)) (VVINF-HD vergemeinschaften)) (VZ-HD (PTKZU-PM zu) (VMINF-HD wollen))))) ($. .))
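Parse trees like the ones on this slide are usually exchanged in bracket notation. The sketch below reads such a string into a nested structure and recovers the sentence from the leaves. The bracketed string is a reading of the flattened English tree on the slide and should be treated as an assumption; `parse_brackets` and `leaves` are illustrative helper names, not part of any parser's API.

```python
from typing import List, Union

Tree = Union[str, tuple]  # a leaf word, or (label, [children])


def parse_brackets(text: str) -> Tree:
    """Read a tree in bracket notation, e.g. "(NP (DT a) (NN whole))"."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Tree:
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1  # skip the closing bracket
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the tree")
    return tree


def leaves(tree: Tree) -> List[str]:
    """Return the words at the leaves, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1] for word in leaves(child)]


# Assumed reading of the English tree from the slide.
english = """
(TOP (S (NP (PRP We))
        (VP (MD must)
            (VP (VB bear)
                (PP (IN in) (NP (NN mind)))
                (NP (NP (DT the) (NN Community))
                    (PP (IN as) (NP (DT a) (NN whole))))))
        (. .)))
"""
tree = parse_brackets(english)
print(" ".join(leaves(tree)))
```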

  15. Full example
      Parallel text
      Yugoslav President Voislav signed for Serbia.
        Transliteration: w twlY AltwqyE En SrbyA Alr}ys AlywgwslAfy fwyslAf.
      And then the matter was decided, and everything was put in place.
        Transliteration: f kAn An tm AlHsm w wDEt Al>mwr fy nSAb hA.
      Below are the male and female winners in the different categories.
        Transliteration: w hnA Al>wA}l w Al>wlyAt fy mxtlf Alf}At.

  16. Full example
      Alignment
      Yugoslav President Voislav signed for Serbia
      w twlY AltwqyE En SrbyA Alr}ys AlywgwslAfy fwyslAf

  17. Third step: rule extraction
      English tree:
        (S (NP-SBJ (NML (JJ Yugoslav) (NNP President)) (NNP Voislav)) (VP (VBD signed) (PP (IN for) (NP (NNP Serbia)))))
      Arabic tree:
        (S (CONJ w) (VP (PV twlY) (NP-OBJ (NP (DET-NN AltwqyE)) (PP (PREP En) (NP (NN-PROP SrbyA)))) (NP-SBJ (NP (DET-NN Alr}ys) (DET-ADJ AlywgwslAfy)) (NP (NN-PROP fwyslAf)))))


  22. Third step: rule extraction
      [Figure: the aligned English and Arabic parse trees of the example sentence pair, with a first extracted rule for state q_S drawn next to them. The English rule fragment uses S, q_NP, and q_VP; the Arabic rule fragment uses S, CONJ w, VP, q_VP (twice), and q_NP.]

  23. Third step: rule extraction
      [Figure: the aligned English and Arabic parse trees of the example sentence pair, with two extracted rules drawn next to them. The rule for state q_S uses S, q_NP, and q_VP on the English side and S, CONJ w, VP, q_VP (twice), and q_NP on the Arabic side. The rule for state q_VP uses VP, VBD signed, and q_PP on the English side and PV twlY, NP-OBJ, NP, DET-NN AltwqyE, and q_PP on the Arabic side.]
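Rule extraction depends on which parts of the two sentences the word alignment allows to be paired. The sketch below shows a simplified span-consistency check in the style of phrase-pair extraction. It is not the tree-based extraction procedure from the slides. The alignment links are an assumed reading of the example (the links are not recoverable from the text), and `target_span` and `is_consistent` are illustrative names.

```python
from typing import Optional, Set, Tuple

Alignment = Set[Tuple[int, int]]

english = "Yugoslav President Voislav signed for Serbia".split()
arabic = "w twlY AltwqyE En SrbyA Alr}ys AlywgwslAfy fwyslAf".split()

# Assumed links (English index, Arabic index); "w" is left unaligned.
alignment: Alignment = {(0, 6), (1, 5), (2, 7), (3, 1), (3, 2), (4, 3), (5, 4)}


def target_span(alignment: Alignment, lo: int, hi: int) -> Optional[Tuple[int, int]]:
    """Smallest target span covering all words linked to source positions lo..hi."""
    targets = [j for i, j in alignment if lo <= i <= hi]
    if not targets:
        return None
    return min(targets), max(targets)


def is_consistent(alignment: Alignment, lo: int, hi: int) -> bool:
    """True if no word inside the target span is linked to a source word outside lo..hi."""
    span = target_span(alignment, lo, hi)
    if span is None:
        return False
    t_lo, t_hi = span
    return all(lo <= i <= hi for i, j in alignment if t_lo <= j <= t_hi)


# English VP "signed for Serbia" (positions 3..5).
print(target_span(alignment, 3, 5), is_consistent(alignment, 3, 5))  # (1, 4) True
# English NP-SBJ "Yugoslav President Voislav" (positions 0..2).
print(target_span(alignment, 0, 2), is_consistent(alignment, 0, 2))  # (5, 7) True
# "Voislav signed" (positions 2..3) is not a constituent and is not consistent.
print(is_consistent(alignment, 2, 3))  # False
```

Under these assumed links, the English VP maps to the Arabic words twlY AltwqyE En SrbyA, which are contiguous but span two sibling subtrees (PV and NP-OBJ) rather than a single constituent.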
