Generation in Machine Translation from Deep Syntactic Trees
Keith Hall (Johns Hopkins University), Petr Němec (Charles University in Prague)
Outline
● Transfer-based MT
● Tectogrammatical Representation (TR) (deep syntax)
● Generation from English TR trees
  ● process
  ● models
● Empirical results
SSST '07 - Hall & Němec
Transfer-based MT
[Diagram, built up over three steps: (1) Source (Czech) and Target (English); (2) Interlingua added; (3) Interlingua replaced by Tectogrammar]
Tecto Transfer-based MT
[Diagram: two columns of levels. Czech side: Czech sentence, surface syntax, deep syntax (Czech Tecto). English side: deep syntax (English Tecto), surface syntax, English sentence. Step labels: parsing, tree transduction (between the two deep-syntax levels), generation. A "?" appears in the final build.]
Transfer-based MT
● Allows us to explore deep syntactic representations
● Factored models are clear
● Need not be a greedy one-best process
  ● although we present one-best generation/results
Tectogrammatical Representation
"Now the network has opened a news bureau in the Hungarian capital"
Node attributes: FORM (word form), LEMM (lemma), FUNC (functor), POS (part-of-speech), T_M (tense & mood)
#2 [LEMM: #, FUNC: SENT]
  opened [LEMM: open, FUNC: PRED, POS: VBN, T_M: SIM_IND]
    network [LEMM: network, FUNC: ACT, POS: NN]
    Now [LEMM: now, FUNC: TWHEN, POS: RB]
    bureau [LEMM: bureau, FUNC: PAT, POS: NN]
      news [LEMM: news, FUNC: RSTR, POS: NN]
    capital [LEMM: capital, FUNC: LOC, POS: NN]
      Hungarian [LEMM: hungarian, FUNC: RSTR, POS: JJ]
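The tree on this slide can be held in a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `TectoNode`, its field names, and the helper functions are hypothetical, and the parent-child attachments follow the slide's layout as read from the example sentence.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TectoNode:
    """One tectogrammatical node (hypothetical container, for illustration)."""
    form: str                          # FORM
    lemma: str                         # LEMM
    functor: str                       # FUNC
    pos: Optional[str] = None          # POS
    tense_mood: Optional[str] = None   # T_M
    children: List["TectoNode"] = field(default_factory=list)

    def add(self, child: "TectoNode") -> "TectoNode":
        """Attach a child and return it, so subtrees can be built inline."""
        self.children.append(child)
        return child


def build_example() -> TectoNode:
    """Build the tree for the slide's example sentence."""
    root = TectoNode("#2", "#", "SENT")
    pred = root.add(TectoNode("opened", "open", "PRED", "VBN", "SIM_IND"))
    pred.add(TectoNode("network", "network", "ACT", "NN"))
    pred.add(TectoNode("Now", "now", "TWHEN", "RB"))
    bureau = pred.add(TectoNode("bureau", "bureau", "PAT", "NN"))
    capital = pred.add(TectoNode("capital", "capital", "LOC", "NN"))
    bureau.add(TectoNode("news", "news", "RSTR", "NN"))
    capital.add(TectoNode("Hungarian", "hungarian", "RSTR", "JJ"))
    return root


def lemmas(node: TectoNode) -> List[str]:
    """Return the lemmas in pre-order (head before its children)."""
    out = [node.lemma]
    for child in node.children:
        out.extend(lemmas(child))
    return out


print(lemmas(build_example()))
```

Note that function words (the, has, a, in) have no nodes of their own here; they are what the generation step has to reintroduce.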
Generation Process
1. Insert syn-semantic (function) words
2. Subtree reordering
● Intermediary surface syntax?
● Reordering constraints?
  ● maximum subtree size
  ● coordination
[Diagram: deep syntax (English Tecto), "?", surface syntax, English sentence]
Generation Model
$$\arg\max_{A,f} P(A, f \mid T) = \arg\max_{A,f} P(f \mid A, T)\, P(A \mid T) \approx \arg\max_{f} P\bigl(f \mid T, \arg\max_{A} P(A \mid T)\bigr)$$
● Insertion: $P(A \mid T)$
● Reordering: $P(f \mid A, T)$
● tecto nodes: $T = \{t_1, \ldots, t_i, \ldots, t_n\}$
● insertion string: $A = \{a_1, \ldots, a_i, \ldots, a_k\}$, $n \le k \le 2n$
● order mapping: $f : \{A \cup T\} \to \{1, \ldots, 2n\}$
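The approximation in the generation model replaces a joint search over insertions and orderings with two stages: pick the single best insertion set, then pick the best ordering given it. The sketch below illustrates that control flow only. The function name `two_stage_argmax`, the toy probability tables, and the representation of f as a permutation are all assumptions made for illustration; none of the numbers come from the talk.

```python
from itertools import permutations


def two_stage_argmax(T, candidates_A, p_insert, p_order):
    """Approximate the argmax over (A, f): fix the best A first, then order."""
    # Stage 1 (insertion): A* = argmax_A P(A | T)
    A_star = max(candidates_A, key=lambda A: p_insert(A, T))
    # Stage 2 (reordering): f* = argmax_f P(f | T, A*), with f represented
    # here as an ordering (permutation) of the nodes in T and A*.
    items = tuple(T) + tuple(A_star)
    f_star = max(permutations(items),
                 key=lambda order: p_order(order, A_star, T))
    return A_star, f_star


# Toy, made-up scores (hypothetical, not from the talk).
TOY_INSERT = {("the",): 0.7, ("a",): 0.3}
TOY_BIGRAM = {("the", "network"): 0.9, ("network", "open"): 0.8}


def toy_p_insert(A, T):
    """Stand-in for P(A | T): a lookup in the toy table."""
    return TOY_INSERT.get(A, 0.0)


def toy_p_order(order, A, T):
    """Stand-in for P(f | A, T): product of toy adjacent-pair scores."""
    score = 1.0
    for left, right in zip(order, order[1:]):
        score *= TOY_BIGRAM.get((left, right), 0.1)
    return score


A_star, f_star = two_stage_argmax(
    ("network", "open"), [("the",), ("a",)], toy_p_insert, toy_p_order)
print(A_star, f_star)
```

Because stage 1 commits to one A before any ordering is scored, the result can differ from the true joint argmax; that is the cost of the approximation.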
Insertion Process
"Now the network has opened a news bureau in the Hungarian capital"
Before insertion:
open [PRED, VBN, SIM_IND]
  network [ACT, NN]
  now [TWHEN, RB]
  bureau [PAT, NN]
    news [RSTR, NN]
  capital [LOC, NN]
    hungarian [RSTR, JJ]
After insertion (inserted function words: has [AUX], the [DT], a [DT], in [PP], the [DT]):
open [PRED, VBN, SIM_IND]
  network [ACT, NN]
    the [DT]
  now [TWHEN, RB]
  bureau [PAT, NN]
    a [DT]
    news [RSTR, NN]
  capital [LOC, NN]
    in [PP]
    the [DT]
    hungarian [RSTR, JJ]
  has [AUX]
Insertion Model
$$P(A \mid T) = \prod_i P(a_i \mid a_1, \ldots, a_{i-1}, T) \approx \prod_i P(a_i \mid t_i, t_{g(i)})$$
● Insertion is dependent on local context:
  ● tecto node (includes: lemma, functor, POS)
  ● parent node
● Three independent models:
  ● articles
  ● prepositions and subordinating conjunctions
  ● modals (deterministic, given functor)
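One simple way to realize a local model of the form P(a_i | t_i, t_g(i)) is relative-frequency estimation over (node context, parent context) pairs. The sketch below assumes that approach; the slides do not say how the distributions were estimated. The class `LocalInsertionModel`, the choice of (functor, POS) tuples as context, and the training counts are all hypothetical.

```python
from collections import Counter, defaultdict


class LocalInsertionModel:
    """Relative-frequency estimate of P(inserted word | node, parent)."""

    def __init__(self):
        # (node_ctx, parent_ctx) -> counts of inserted words seen there
        self.counts = defaultdict(Counter)

    def observe(self, node_ctx, parent_ctx, inserted):
        """Record one training event."""
        self.counts[(node_ctx, parent_ctx)][inserted] += 1

    def prob(self, inserted, node_ctx, parent_ctx):
        """Unsmoothed estimate; 0.0 for an unseen context."""
        seen = self.counts.get((node_ctx, parent_ctx))
        if not seen:
            return 0.0
        return seen[inserted] / sum(seen.values())

    def best(self, node_ctx, parent_ctx, default=None):
        """Most frequent insertion for this context, or `default` if unseen."""
        seen = self.counts.get((node_ctx, parent_ctx))
        if not seen:
            return default
        return seen.most_common(1)[0][0]


# Made-up training events for a preposition model (not from the talk).
model = LocalInsertionModel()
loc_noun, pred_verb = ("LOC", "NN"), ("PRED", "VBN")
for word in ["in", "in", "in", "at"]:
    model.observe(loc_noun, pred_verb, word)

print(model.best(loc_noun, pred_verb), model.prob("in", loc_noun, pred_verb))
```

Under the slide's "three independent models" bullet, one such model per word class (articles, prepositions and subordinating conjunctions) could be queried separately for each node; a real system would also need smoothing or back-off for unseen contexts.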
Reordering Process
"Now the network has opened a news bureau in the Hungarian capital"
[Diagram, three builds showing the nodes around open [PRED, VBN, SIM_IND]; node order as it appears on each build]
Build 1: open, with children network [ACT, NN], bureau [PAT, NN], capital [LOC, NN], now [TWHEN, RB], has [AUX]
Build 2: open, with children network [ACT, NN], has [AUX], bureau [PAT, NN], capital [LOC, NN], now [TWHEN, RB]
Build 3: network [ACT, NN], open [PRED, VBN, SIM_IND], bureau [PAT, NN], capital [LOC, NN], has [AUX], now [TWHEN, RB]
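Subtree reordering can be sketched as a recursive linearization: at each head, choose an order for the head and its immediate children, then splice in each child's own linearized subtree. The code below is an illustration under stated assumptions, not the model from the talk. The scoring function `toy_score` and its precedence table are invented so that the example sentence comes out; the tree uses lemmas and includes the inserted function words, attached as read from the Insertion Process slide.

```python
from itertools import combinations, permutations


def linearize(node, order_score):
    """node = (label, [child nodes]); returns the labels in surface order."""
    label, children = node
    # Index 0 is the head itself; the rest are the children's subtrees.
    spans = [[label]] + [linearize(child, order_score) for child in children]
    heads = [label] + [child[0] for child in children]
    # Exhaustively score every local order of the head and its children.
    best = max(permutations(range(len(heads))),
               key=lambda idx: order_score([heads[i] for i in idx]))
    return [word for i in best for word in spans[i]]


def pairs(order):
    """All (earlier, later) pairs implied by a preferred local order."""
    return set(combinations(order, 2))


# Hypothetical precedence preferences, hand-written for this one sentence.
PREFER = (pairs(["now", "network", "has", "open", "bureau", "capital"])
          | pairs(["the", "network"])
          | pairs(["a", "news", "bureau"])
          | pairs(["in", "the", "hungarian", "capital"]))


def toy_score(seq):
    """Count how many ordered pairs in seq agree with PREFER."""
    return sum(1 for pair in combinations(seq, 2) if pair in PREFER)


TREE = ("open", [
    ("network", [("the", [])]),
    ("now", []),
    ("bureau", [("a", []), ("news", [])]),
    ("capital", [("in", []), ("the", []), ("hungarian", [])]),
    ("has", []),
])

print(" ".join(linearize(TREE, toy_score)))
```

The exhaustive search is factorial in the number of children per head, so it is only practical for small local groups. The output is a sequence of lemmas ("open", not "opened"); producing inflected forms is a separate step not covered here.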