Syntax-based Machine Translation using Multi Bottom-up Tree Transducers

Syntax-based Machine Translation using Multi Bottom-up Tree Transducers Andreas Maletti Fabienne Braune, Daniel Quernheim, Nina Seemann Institute for Natural Language Processing Universität Stuttgart, Germany maletti@ims.uni-stuttgart.de


  1. Syntax-based Machine Translation using Multi Bottom-up Tree Transducers Andreas Maletti Fabienne Braune, Daniel Quernheim, Nina Seemann Institute for Natural Language Processing Universität Stuttgart, Germany maletti@ims.uni-stuttgart.de Uppsala — November 8, 2012 Syntax-based MT using MBOT A. Maletti 1 ·

  2. Overview
     1. Motivation
     2. Extended Multi Bottom-up Tree Transducers
     3. The Theory
     4. The Application

  3. Motivation: Machine translation
     Translation
     Input: Official forecasts predicted just 3 percent, Bloomberg said.
     Reference: Offizielle Prognosen sind von nur 3 Prozent ausgegangen, meldete Bloomberg.
       [official] [forecasts] [are] [of] [only] [3 percent] [assumed] [reported] [Bloomberg]
     Our MBOT translator (untuned): offiziellen prognosen vorausgesagt nur 3 % bloomberg habe.
       [official] [forecasts] [*predicted] [only] [3 %] [Bloomberg] [*has]
     Google Translate (translate.google.com): Offizielle Prognosen vorausgesagt nur 3 Prozent, sagte Bloomberg.
       [official] [forecasts] [*predicted] [only] [3 percent] [said] [Bloomberg]

  6. Motivation: Machine translation
     Translation
     Input: The ECB wants to hold inflation to under two percent, or somewhere in that vicinity.
     Reference: Die EZB ist bestrebt, die Inflationsrate unter zwei Prozent, oder zumindest knapp an der Zwei-Prozent-Marke zu halten.
       [the] [ECB] [is] [desire] [the] [inflation rate] [below] [two percent] [or] [at least] [close] [at] [the] [two percent mark] [to keep]
     Google Translate (translate.google.com): Die EZB will die Inflation zu halten unter zwei Prozent, oder irgendwo in der Nähe.
       [the] [ECB] [wants] [the] [inflation] [*to keep] [below] [two percent] [or] [somewhere] [in] [the] [vicinity]

  8. Motivation: Syntax-based machine translation
     Remark: There is no universally accepted definition.
     Syntax-based systems: Input → Parser → Machine translation system → Language model → Output
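The chain of components on this slide can be read as plain function composition. The following is a minimal sketch under that reading: `run_pipeline`, the three toy stages, the lexicon, and the bigram table are all hypothetical stand-ins chosen for illustration, not parts of the presented system.

```python
from functools import reduce
from typing import Any, Callable, List, Sequence, Tuple

def run_pipeline(stages: Sequence[Callable[[Any], Any]], text: str) -> Any:
    """Feed the output of each stage into the next one, left to right."""
    return reduce(lambda value, stage: stage(value), stages, text)

# --- Toy stand-ins (assumptions for illustration only) ---

def toy_parser(sentence: str) -> Tuple[str, List[str]]:
    # A flat "tree": a root label plus the tokens as leaves.
    return ("S", sentence.split())

TOY_LEXICON = {"official": "offizielle", "forecasts": "Prognosen"}
TOY_BIGRAMS = {("offizielle", "Prognosen")}

def toy_translator(tree: Tuple[str, List[str]]) -> List[List[str]]:
    # Word-by-word lookup; emit two candidate word orders.
    _, leaves = tree
    words = [TOY_LEXICON.get(word.lower(), word) for word in leaves]
    return [words, list(reversed(words))]

def toy_language_model(candidates: List[List[str]]) -> str:
    # Pick the candidate containing the most known bigrams.
    def score(words: List[str]) -> int:
        return sum(pair in TOY_BIGRAMS for pair in zip(words, words[1:]))
    return " ".join(max(candidates, key=score))

output = run_pipeline(
    [toy_parser, toy_translator, toy_language_model], "Official forecasts"
)
```

The point of the sketch is only the data flow: the parser turns text into a tree, the translation system proposes candidates, and the language model selects among them.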

  11. Motivation: What do we have?
      Input
      Parallel text (English and German): Europarl
      Parsers: BitPar, Charniak, Berkeley
      Example
      “We must bear in mind the Community as a whole.”
      “Wir müssen uns davor hüten, alles vergemeinschaften zu wollen.”
      Europarl German-English parallel data:
      1,920,209 parallel sentences
      44,548,491 words in German
      47,818,827 words in English
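The corpus figures above imply average sentence lengths that are easy to derive. The short calculation below uses only the three counts from the slide; the variable and function names are illustrative.

```python
# Counts as stated on the slide for the German-English Europarl data.
SENTENCES = 1_920_209
WORDS_DE = 44_548_491
WORDS_EN = 47_818_827

def average_length(words: int, sentences: int) -> float:
    """Mean number of words per sentence."""
    return words / sentences

avg_de = average_length(WORDS_DE, SENTENCES)
avg_en = average_length(WORDS_EN, SENTENCES)
en_per_de = WORDS_EN / WORDS_DE  # English words per German word

print(f"German:  {avg_de:.1f} words per sentence")
print(f"English: {avg_en:.1f} words per sentence")
print(f"English/German word ratio: {en_per_de:.2f}")
```

Rounded, this gives about 23.2 words per German sentence and 24.9 per English sentence, so the English side is roughly 7% longer in word count.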

  13. Motivation: First step: Word alignment
      Alignments by GIZA++ [Och, Ney '03]:
      We must bear in mind the Community as a whole .
      Wir müssen uns davor hüten , alles vergemeinschaften zu wollen .
      We can help countries catch up , but not by putting their neighbours on hold .
      Wir können Ländern beim Aufholen helfen , aber nicht , indem wir ihre Nachbarn in den Wartesaal schicken .
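A word alignment is commonly stored as a set of (source index, target index) links. The sketch below assumes the widely used `i-j` text notation with zero-based token positions; whether the authors' pipeline stores GIZA++ output this way is an assumption. The alignment lines drawn on the slide did not survive extraction, so the three links used here are hypothetical and chosen only because they are uncontroversial.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]

def parse_alignment(line: str) -> Set[Link]:
    """Parse 'i-j' pairs (source index, target index) into a set of links."""
    links = set()
    for pair in line.split():
        source, target = pair.split("-")
        links.add((int(source), int(target)))
    return links

def aligned_words(links: Set[Link], source: List[str], target: List[str]):
    """Return the aligned word pairs, ordered by source position."""
    return [(source[s], target[t]) for s, t in sorted(links)]

english = "We must bear in mind the Community as a whole .".split()
german = "Wir müssen uns davor hüten , alles vergemeinschaften zu wollen .".split()

# Hypothetical links for illustration: We-Wir, must-müssen, and the final periods.
links = parse_alignment("0-0 1-1 10-10")
pairs = aligned_words(links, english, german)
```

Storing links as a set makes many-to-many alignments and unaligned words straightforward to represent, which matters for free translations such as this sentence pair.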

  14. Motivation: Second step: Parsing
      Charniak parser [Charniak, Johnson '05]:
      [Figure: constituency tree for "We must bear in mind the Community as a whole ." with node labels TOP, S, NP, VP, PP and part-of-speech tags PRP, MD, VB, IN, NN, DT]
      BitPar parser [Schmid '06]:
      [Figure: constituency tree for "Wir müssen uns davor hüten , alles vergemeinschaften zu wollen ." with node labels TOP, S-TOP, NP-SB/Pl, VP-OC/inf, VP-OC/zu, NP-DA, PP-OP/V, NP-OA, VZ-HD and tags PPER-HD-Nom.Pl, VMFIN-HD-Pl, PPER-HD-Dat.Pl, PROAV-PH, VVINF-HD, PIS-HD-Acc.Sg.Neut, PTKZU-PM, VMINF-HD, $, and $.]

  15. Motivation: Second step: Parsing
      Charniak parser:
      [Figure: constituency tree for "We can help countries catch up , but not by putting their neighbours on hold ." with node labels TOP, S, FRAG, NP, VP, PP, PRT and part-of-speech tags CC, RB, PRP, MD, IN, VB, VBG, NNS, PRP$, RP, NN]
      BitPar parser:
      [Figure: partial constituency tree (remainder elided as "...") for "Wir können Ländern beim Aufholen helfen" with node labels TOP, CS-TOP, S-TOP, NP-SB/Pl, VP-OC/inf, NP-DA, PP-MO/V and tags PPER-HD-Nom.Pl, VMFIN-HD-Pl, NN-HD-Dat.Pl.Neut, APPRART-AC-Dat.Sg.Neut, NN-HD-Dat.Sg.Neut, VVINF-HD, $, and $.]
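Parser output of this kind is usually exchanged in bracketed notation, for example `(S (NP (PRP We)) (VP ...))`. The sketch below reads such a string into a small tree structure. The `Tree` class, the `parse_bracketed` function, and the example string are assumptions for illustration; the example is a simplified fragment in Penn Treebank style, not the exact tree from the slides.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Tree:
    label: str
    children: List["Tree"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Return the words at the frontier of the tree, left to right."""
        if not self.children:
            return [self.label]
        return [leaf for child in self.children for leaf in child.leaves()]

def parse_bracketed(text: str) -> Tree:
    """Read one bracketed tree such as '(NP (DT a) (NN whole))'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    position = 0

    def read() -> Tree:
        nonlocal position
        token = tokens[position]
        position += 1
        if token != "(":
            return Tree(token)          # a bare token is a leaf (a word)
        node = Tree(tokens[position])   # the label follows the open bracket
        position += 1
        while tokens[position] != ")":
            node.children.append(read())
        position += 1                   # consume the closing bracket
        return node

    return read()

# Illustrative fragment only, not the full tree from the slide.
tree = parse_bracketed("(S (NP (PRP We)) (VP (MD must) (VP (VB bear))))")
words = tree.leaves()
```

Having both parse trees in this form is what later allows tree fragments on the two sides to be related through the word alignment.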

  16. Motivation: Equalizing examples
      Input
      Yugoslav President Voislav signed for Serbia.
      Transliteration: w twlY AltwqyE En SrbyA Alr}ys AlywgwslAfy fwyslAf.
      And then the matter was decided, and everything was put in place.
      Transliteration: f kAn An tm AlHsm w wDEt Al>mwr fy nSAb hA.
      Below are the male and female winners in the different categories.
      Transliteration: w hnA Al>wA}l w Al>wlyAt fy mxtlf Alf}At.

  17. Motivation: Equalizing examples
      Alignment
      Yugoslav President Voislav signed for Serbia
      w twlY AltwqyE En SrbyA Alr}ys AlywgwslAfy fwyslAf

  20. Motivation: Rule extraction
      [Figure: aligned tree pair. English tree over "Yugoslav President Voislav signed for Serbia" with node labels S, NP-SBJ, VP, NML, PP, NP and tags JJ, NNP, VBD, IN. Arabic tree over "w twlY AltwqyE En SrbyA Alr}ys AlywgwslAfy fwyslAf" with node labels S, VP, NP-OBJ, NP-SBJ, NP, PP and tags CONJ, PV, DET-NN, PREP, NN-PROP, DET-ADJ]
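The slide shows only the aligned tree pair, not the extraction procedure. One ingredient common to alignment-based rule extraction is projecting a source span through the alignment and checking whether its image on the target side is a single contiguous block or several. The sketch below illustrates that step only; it is not the authors' algorithm. The function `target_blocks` and the alignment links are hypothetical: the links drawn on the slide were lost, so plausible ones are assumed here.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (source token index, target token index), zero-based

def target_blocks(links: Set[Link], start: int, end: int) -> List[Tuple[int, int]]:
    """Project the source span [start, end] onto the target side and
    return the maximal contiguous blocks of aligned target positions."""
    positions = sorted({t for s, t in links if start <= s <= end})
    blocks: List[Tuple[int, int]] = []
    for position in positions:
        if blocks and position == blocks[-1][1] + 1:
            blocks[-1] = (blocks[-1][0], position)  # extend the current block
        else:
            blocks.append((position, position))     # open a new block
    return blocks

english = "Yugoslav President Voislav signed for Serbia".split()
arabic = "w twlY AltwqyE En SrbyA Alr}ys AlywgwslAfy fwyslAf".split()

# Hypothetical links, assumed for illustration:
# Yugoslav-AlywgwslAfy, President-Alr}ys, Voislav-fwyslAf,
# signed-twlY, signed-AltwqyE, for-En, Serbia-SrbyA.
links = {(0, 6), (1, 5), (2, 7), (3, 1), (3, 2), (4, 3), (5, 4)}

verb_phrase = target_blocks(links, 3, 5)  # "signed for Serbia"
subject = target_blocks(links, 0, 2)      # "Yugoslav President Voislav"
mixed = target_blocks(links, 2, 3)        # "Voislav signed"
```

Under these assumed links, the verb phrase and the subject each map to one contiguous target block, while the span "Voislav signed" maps to two separate blocks. Spans whose image is discontiguous are the cases where a rule needs more than one target fragment.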
