Improving Bilingual Sub-sentential Alignment by Sampling-based Transpotting
Li Gong, Aurélien Max, François Yvon
LIMSI-CNRS & Université Paris-Sud, Orsay, France
Context of this work

Building SMT systems, step 1: align the parallel corpus

[Figure: example sentence pair to align; French side "... une troupe de comédiens déguisés dans ..."]

• the parallel corpus can be huge
• we don’t use / need everything
• we may regularly receive new data

Our method for parallel corpus alignment
• is very simple to describe and implement
• processes each sentence pair independently
• uses new data transparently (plug-and-play)
Outline

1. Method
   • Sampling-based transpotting
   • Sub-sentential alignment extraction
2. Experimental Results
   • Basic alignment task
   • Incremental alignment task
3. Conclusion and future work
Sampling-based transpotting

1. Given a source-target sentence pair, extract an association table:
   one diet coke , please . ↔ un coca zéro , s’il vous plaît .
2. Draw a random sub-corpus from the parallel corpus and compute a profile for each word of the sentence pair
3. Increment the count of each contiguous phrase pair
4. Repeat steps 2 and 3 N times to obtain the association table for the given sentence pair
Example: one sampled sub-corpus and the resulting word profiles

Sentence pair to align:
   one diet coke , please . ↔ un coca zéro , s’il vous plaît .

Sampled sub-corpus:
      English                     French
   1  one coffee , please .       un café , s’il vous plaît .
   2  the coffee is not bad .     ce café est correct .
   3  yes , one tea .             oui , un thé .

Word profiles (one entry per sub-corpus sentence pair; 1 means the word occurs in that pair):
   one      [1, 0, 1]
   diet     [0, 0, 0]
   coke     [0, 0, 0]
   ,        [1, 0, 1]
   please   [1, 0, 0]
   .        [1, 1, 1]
   un       [1, 0, 1]
   coca     [0, 0, 0]
   ...
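The profile computation in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: a profile is taken to be a binary presence vector over the sampled sentence pairs (which matches the vectors on the slide), and words on both sides that share the same non-zero profile are grouped as association candidates. The function names `profile`, `word_profiles` and `group_by_profile` are hypothetical, and the contiguity constraint of step 3 is not enforced here.

```python
from collections import defaultdict

# Sentence pair to align and the sampled sub-corpus, copied from the slide.
SRC = "one diet coke , please ."
TGT = "un coca zéro , s'il vous plaît ."
SUB_CORPUS = [
    ("one coffee , please .", "un café , s'il vous plaît ."),
    ("the coffee is not bad .", "ce café est correct ."),
    ("yes , one tea .", "oui , un thé ."),
]


def profile(word, sentences):
    """Presence vector of `word` over `sentences` (assumption: binary, not counts)."""
    return tuple(int(word in s.split()) for s in sentences)


def word_profiles(sentence, sentences):
    """Map each word of `sentence` to its profile over one side of the sub-corpus."""
    return {w: profile(w, sentences) for w in sentence.split()}


def group_by_profile(src, tgt, sub_corpus):
    """Group source and target words that share the same non-zero profile.

    Assumption: identical profiles signal an association candidate.
    Words absent from the sample (all-zero profile) are left out.
    """
    src_side = [s for s, _ in sub_corpus]
    tgt_side = [t for _, t in sub_corpus]
    groups = defaultdict(lambda: ([], []))
    for w in src.split():
        p = profile(w, src_side)
        if any(p):
            groups[p][0].append(w)
    for w in tgt.split():
        p = profile(w, tgt_side)
        if any(p):
            groups[p][1].append(w)
    # Keep only profiles that occur on both sides.
    return {p: g for p, g in groups.items() if g[0] and g[1]}


src_profiles = word_profiles(SRC, [s for s, _ in SUB_CORPUS])
tgt_profiles = word_profiles(TGT, [t for _, t in SUB_CORPUS])
groups = group_by_profile(SRC, TGT, SUB_CORPUS)

for p, (src_words, tgt_words) in groups.items():
    print(list(p), " ".join(src_words), "<->", " ".join(tgt_words))
```

On this sample, "please" and "s'il vous plaît" share the profile [1, 0, 0], while "one" and "un" share [1, 0, 1] together with the commas. A single draw is therefore ambiguous, which is why the counts are accumulated over N random draws.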