Kyoto-U: Syntactical EBMT System for NTCIR-7 Patent Translation Task
Kyoto University
Toshiaki Nakazawa, Sadao Kurohashi
Overview of Kyoto-U System
Translation Examples
J: 図書館で新聞を読む
E: I read a newspaper in the library
J: 政治の本が売れ残っている
E: A book in politics was left on the shelf
・・・・・
[Figure: the translation examples as aligned dependency trees. First example: 図書館 で (library in) ↔ in the library, 新聞 を (newspaper ACC) ↔ a newspaper, 読む ↔ read. Second example: 政治 の ↔ in politics, 本 が (book NOM) ↔ a book, 売れ残って いる (left unsold) ↔ was left on the shelf.]
Input: 図書館で政治の本を読む。
[Figure: the input dependency tree (図書館 で, 政治 の, 本 を, 読む) matched against the example trees: 図書館 で and 読む are covered by the first example (in the library, read), and 政治 の 本 by the second (a book in politics).]
Output: I read a book in politics in the library
Alignment
J: 交差点で、突然あの車が飛び出して来たのです。
E: The car came at me from the side at the intersection.
[Figure: dependency trees of the Japanese sentence (交差点 で、/ 突然 / あの / 車 が / 飛び出して 来た のです) and of the English sentence (the car / came / at me / from the side / at the intersection).]
1. Transformation into dependency structure
   J: JUMAN/KNP
   E: Charniak's nlparser → dependency tree
2. Detection of word correspondences
Finding Correspondences
• Bilingual dictionaries (500K entries)
• Substring co-occurrence (Cromieres 2006): $\frac{\mathrm{count}(j, e)}{\mathrm{count}(j) \cdot \mathrm{count}(e)} > \theta$
• Numeral normalization: 二百十六万 → 2,160,000 ← 2.16 million
• Transliteration (Katakana words, NEs):
  ローズワイン → rosuwain ⇔ rose wine (similarity: 0.78)
  新宿 → shinjuku ⇔ shinjuku (similarity: 1.0)
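Two of the correspondence sources above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Kyoto-U implementation: the numeral normalizer handles only the kanji digits and units listed in the code (large units in descending order, no positional forms such as 二〇〇八), and the co-occurrence filter counts whole tokens per sentence pair rather than the substrings used by Cromieres (2006). The function names `kanji_to_int`, `english_to_int`, and `cooccurrence_pairs` are hypothetical.

```python
import re
from collections import Counter
from decimal import Decimal
from itertools import product

# Illustrative subset of kanji numerals (assumption: large units appear in descending order).
KANJI_DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
                "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8}
EN_SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def kanji_to_int(text: str) -> int:
    """Normalize a kanji numeral such as 二百十六万 to an int."""
    total = 0     # value of completed 万/億 groups
    group = 0     # value accumulated below the next large unit
    digit = None  # digit still waiting for a unit
    for ch in text:
        if ch in KANJI_DIGITS:
            digit = KANJI_DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit (e.g. 十 in 二百十) stands for 1 x unit.
            group += (1 if digit is None else digit) * SMALL_UNITS[ch]
            digit = None
        elif ch in LARGE_UNITS:
            group += digit or 0
            total += group * LARGE_UNITS[ch]
            group, digit = 0, None
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + group + (digit or 0)


def english_to_int(text: str) -> int:
    """Normalize '2,160,000' or '2.16 million' to an int."""
    m = re.fullmatch(r"([\d,]+(?:\.\d+)?)(?:\s+(thousand|million|billion))?", text.strip())
    if m is None:
        raise ValueError(f"unsupported numeral: {text!r}")
    # Decimal avoids float rounding in values like 2.16 * 10**6.
    value = Decimal(m.group(1).replace(",", "")) * EN_SCALES.get(m.group(2), 1)
    return int(value)


def cooccurrence_pairs(corpus, theta):
    """Keep (j, e) pairs with count(j, e) / (count(j) * count(e)) > theta.

    Assumption: counts are sentence-level co-occurrences of whole tokens.
    """
    count_j, count_e, count_je = Counter(), Counter(), Counter()
    for j_tokens, e_tokens in corpus:
        js, es = set(j_tokens), set(e_tokens)
        count_j.update(js)
        count_e.update(es)
        count_je.update(product(js, es))
    return {(j, e) for (j, e), c in count_je.items()
            if c / (count_j[j] * count_e[e]) > theta}
```

Both sides of the numeral example normalize to the same integer, which is what lets 二百十六万 and "2.16 million" be linked as a correspondence. The threshold θ in `cooccurrence_pairs` has no value on the slide; it is a free parameter here.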
3. Disambiguation of correspondences
4. Handling of remaining phrases (extension to leaf nodes)
5. Registration in the translation example database
Alignment Ambiguities
J: 日本 で [in Japan] / 保険 [insurance] / 会社 に 対して [to the company] / 保険 [insurance] / 請求 の [of claim] / 申し立て が [file] / 可能です よ [be able to]
E: you will have to file an insurance claim with the office in Japan
保険 (insurance) occurs twice on the Japanese side, so English "insurance" has two candidate correspondences.
Alignment: Consistency
[Figure: pairs of candidate correspondences labeled "near" and "far" according to their distance from each other.]
$$\hat{A} = \arg\max_{\text{alignment}} \frac{\sum_{i=1}^{n} \sum_{j=i+1}^{n} cs\big(d_J(a_i, a_j),\, d_E(a_i, a_j)\big)}{n(n-1)/2}$$
• For each pair of candidates $a_i$ and $a_j$, calculate the J-side distance $d_J$ and the E-side distance $d_E$.
• Give a consistency score to the pair based on $d_J$ and $d_E$.
• Calculate consistency scores for all the pairs in a possible set of alignment candidates, and choose the set with the highest average score.
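The arg max objective above can be sketched directly: average the pairwise consistency score over all pairs $i < j$ in a candidate set, then pick the set with the highest average. This is a minimal sketch; the distance functions and the score `cs` are passed in as callables because the slides define several variants of each, and the names `consistency` and `best_alignment` are hypothetical.

```python
from itertools import combinations
from typing import Any, Callable, Sequence

Distance = Callable[[Any, Any], float]   # d_J or d_E for two candidates
Score = Callable[[float, float], float]  # cs(d_J, d_E)


def consistency(candidates: Sequence, d_j: Distance, d_e: Distance, cs: Score) -> float:
    """Average cs(d_J(a_i, a_j), d_E(a_i, a_j)) over all pairs i < j."""
    n = len(candidates)
    if n < 2:
        return 0.0  # assumption: a set with fewer than two candidates has no pairs to score
    total = sum(cs(d_j(a, b), d_e(a, b)) for a, b in combinations(candidates, 2))
    return total / (n * (n - 1) / 2)


def best_alignment(candidate_sets, d_j: Distance, d_e: Distance, cs: Score):
    """Return the candidate set with the highest average pairwise consistency."""
    return max(candidate_sets, key=lambda s: consistency(s, d_j, d_e, cs))
```

As a toy usage, candidates can be (J position, E position) pairs with distances taken as absolute position differences (a stand-in for tree distances) and `cs = lambda dj, de: 1/dj + 1/de`. The set `[(0, 0), (1, 1), (2, 2)]` then averages 5/3 and wins over `[(0, 0), (5, 1), (2, 2)]`, whose middle candidate is far from the others on the J side only.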
Baseline
Distance of each branch: 1
Consistency score: $cs(d_J, d_E) = \frac{1}{d_J} + \frac{1}{d_E}$
Example: a pair at distance 1 on one side and 2 on the other scores 1/1 + 1/2 = 1.5.
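With every branch counting as 1, the distance between two nodes is the number of branches on the tree path between them. A minimal sketch, assuming a dependency tree stored as a hypothetical `parent` dict (node → head, root → `None`):

```python
def tree_distance(parent: dict, a, b) -> int:
    """Number of branches on the path between nodes a and b (each branch = 1)."""
    # Record every ancestor of a (including a itself) with its distance from a.
    up = {}
    node, d = a, 0
    while node is not None:
        up[node] = d
        node = parent.get(node)
        d += 1
    # Climb from b until reaching a node on a's ancestor chain (the lowest common ancestor).
    node, d = b, 0
    while node not in up:
        node = parent[node]
        d += 1
    return up[node] + d


def baseline_cs(d_j: float, d_e: float) -> float:
    """Baseline consistency score: 1/d_J + 1/d_E."""
    return 1 / d_j + 1 / d_e
```

For example, if 交差点で, 突然, and 車が all depend on 飛び出して来たのです and あの depends on 車が (an assumed parse of the earlier example), then 交差点で and 車が are 2 branches apart, and あの and 突然 are 3 apart. `baseline_cs(1, 2)` reproduces the 1.5 shown on the slide.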
Consistency Score
• Based on the frequency of each distance pair $(d_J, d_E)$ in gold-standard alignment data (Mainichi newspaper, 40K sentence pairs) [Uchimoto04]
[Figure: frequency (log scale) plotted against J-side distance and E-side distance.]
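One way to turn such frequencies into a score is a lookup table keyed by $(d_J, d_E)$. The sketch below assumes the score is $\log(1 + \text{frequency})$ and that unseen pairs score 0; both choices are illustrative assumptions, since the slide only shows that the frequencies are plotted on a log scale. The names `learn_cs_table` and `learned_cs` are hypothetical.

```python
import math
from collections import Counter


def learn_cs_table(gold_distance_pairs):
    """Map each observed (d_J, d_E) pair to log(1 + frequency) in gold alignments."""
    freq = Counter(gold_distance_pairs)
    return {pair: math.log1p(count) for pair, count in freq.items()}


def learned_cs(table, d_j, d_e, unseen=0.0):
    """Consistency score looked up from the table; unseen pairs get `unseen`."""
    return table.get((d_j, d_e), unseen)
```

The log keeps very frequent pairs such as (1, 1) from dominating the average; `functools.partial(learned_cs, table)` turns the lookup into a two-argument `cs(d_J, d_E)` callable.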
Distance based on Dependency Type
[Figure: the dependency trees of the insurance-claim example, with each branch labeled by its dependency type (Japanese: デ格, 文節内, 連用, ノ格, ガ格; English: NP, NN, PP) and assigned a distance from 1 to 3.]
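Replacing the unit branch length with a length that depends on the dependency type gives a weighted path distance. In this minimal sketch, the per-type weights and the tree shape are hypothetical: the figure assigns lengths between 1 and 3, but the exact type-to-length mapping cannot be reliably recovered from the slide. `dep_type[node]` is assumed to hold the type of the branch from `node` to its head.

```python
def typed_tree_distance(parent: dict, dep_type: dict, weight: dict, a, b) -> float:
    """Sum of branch lengths on the path between a and b; length depends on dependency type."""
    up = {}
    node, dist = a, 0
    while node is not None:
        up[node] = dist
        head = parent.get(node)
        if head is not None:
            dist += weight[dep_type[node]]
        node = head
    node, dist = b, 0
    while node not in up:
        dist += weight[dep_type[node]]
        node = parent[node]
    return up[node] + dist


# Hypothetical weights for the Japanese-side types shown in the figure.
JA_WEIGHTS = {"文節内": 1, "ノ格": 2, "連用": 3, "デ格": 3, "ガ格": 3}

# Assumed parse of 日本で保険会社に対して保険請求の申し立てが可能ですよ (illustrative only).
PARENT = {"日本で": "可能ですよ", "保険(1)": "会社に対して", "会社に対して": "申し立てが",
          "保険(2)": "請求の", "請求の": "申し立てが", "申し立てが": "可能ですよ",
          "可能ですよ": None}
DEP_TYPE = {"日本で": "デ格", "保険(1)": "文節内", "会社に対して": "連用",
            "保険(2)": "文節内", "請求の": "ノ格", "申し立てが": "ガ格"}
```

Under these weights the two occurrences of 保険 are 7 apart, whereas with all weights set to 1 the same path has length 4. Dependency types that bind tightly, such as 文節内 (within one bunsetsu), keep their nodes close.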
Example of Alignment Improvement
[Figure: alignment produced by the proposed model compared with word-based alignment.]
Translation
Selection of Translation Examples
• Score for an example:
  1. Size of the example [Sato 91]
  2. Similarity of neighboring nodes
  3. Translation probability
• Beam search from the root of the input
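The selection procedure can be sketched as a beam search that visits input nodes top-down from the root. Each partial hypothesis chooses non-overlapping examples for the nodes it has not yet covered and accumulates their scores. Several details are assumptions for illustration: the score is a weighted sum with hypothetical weights, the size term is squared so that larger examples are preferred (a linear size term would sum to the same total for every full cover), and `sim` and `trans` are taken as precomputed per example.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical weights; the slides do not give values.
W_SIZE, W_SIM, W_TRANS = 1.0, 1.0, 1.0


@dataclass(frozen=True)
class Example:
    name: str
    covers: FrozenSet[str]  # input nodes this example matches
    sim: float              # similarity of neighboring nodes (assumed precomputed)
    trans: float            # translation probability (assumed precomputed)


def example_score(ex: Example) -> float:
    # Assumption: squared size, so one large example beats several small ones.
    return W_SIZE * len(ex.covers) ** 2 + W_SIM * ex.sim + W_TRANS * ex.trans


def top_down_order(parent: dict) -> list:
    """Breadth-first order of the input tree, starting at the root."""
    children, root = {}, None
    for node, head in parent.items():
        if head is None:
            root = node
        else:
            children.setdefault(head, []).append(node)
    order, queue = [], [root]
    while queue:
        node = queue.pop(0)
        order.append(node)
        queue.extend(children.get(node, []))
    return order


def beam_search(parent: dict, examples, beam_width: int = 3):
    """Return (score, covered nodes, chosen example names) for the best full cover, or None."""
    beam = [(0.0, frozenset(), ())]
    for node in top_down_order(parent):
        next_beam = []
        for score, covered, chosen in beam:
            if node in covered:
                next_beam.append((score, covered, chosen))
                continue
            # Extend with every example that covers this node without overlapping earlier choices.
            for ex in examples:
                if node in ex.covers and not (ex.covers & covered):
                    next_beam.append((score + example_score(ex),
                                      covered | ex.covers, chosen + (ex.name,)))
        next_beam.sort(key=lambda h: h[0], reverse=True)
        beam = next_beam[:beam_width]
    return beam[0] if beam else None
```

For the input 図書館で政治の本を読む。, with the two examples from the overview plus single-word fallbacks (all numbers hypothetical), the search prefers the pair "I read a newspaper in the library" + "a book in politics" over combinations of smaller fragments.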
[Figure: the translation example "I read a newspaper in the library" (図書館 で / 新聞 を / 読む) matched against the input 図書館で政治の本を読む。; its score is a weighted sum of its size, similarity, and translation-probability terms, with weights w_size, w_sim, and w_trans.]
Combination of TMs
[Figure: the input 図書館で政治の本を読む。 together with the translation examples whose fragments are combined to translate it.]
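A simple way to picture the combination step: the part of an example translation aligned to a replaced input node (here "a newspaper", aligned to 新聞 を) becomes a slot, and the slot is filled with the translation of the fragment that covers the corresponding input node (本 を). This sketch uses a hypothetical string-template representation; the system itself places fragments using the alignment and the dependency structure, which is not modeled here. `make_template` and `combine` are illustrative names.

```python
def make_template(translation: str, replaced: str, slot: str) -> str:
    """Turn an example translation into a template by replacing one aligned phrase with a slot."""
    # Escape literal braces so str.format only sees the slot we insert.
    escaped = translation.replace("{", "{{").replace("}", "}}")
    if replaced not in escaped:
        raise ValueError(f"{replaced!r} not found in {translation!r}")
    return escaped.replace(replaced, "{" + slot + "}", 1)


def combine(fragments: dict, root: str) -> str:
    """Recursively fill each fragment's slots with the translations of the fragments attached there.

    fragments maps a fragment id to (template, {slot name: child fragment id}).
    """
    template, slots = fragments[root]
    filled = {slot: combine(fragments, child) for slot, child in slots.items()}
    return template.format(**filled)
```

Turning "I read a newspaper in the library" into "I read {obj} in the library" and filling `obj` with "a book in politics" yields "I read a book in politics in the library", the output shown in the overview.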