Hybrid Example-Based – Rule-Based MT: Feeding Apertium with Bilingual Chunks

Hybrid Example-Based – Rule-Based MT: Feeding Apertium with Bilingual Chunks
Felipe Sánchez-Martínez
Dept. Llenguatges i Sistemes Informàtics, Universitat d'Alacant, E-03071 Alacant, Spain
fsanchez@dlsi.ua.es
Work done in collaboration with Andy Way (DCU) and Mikel L. Forcada (UA) at the Centre for Next Generation Localisation – DCU
8th July 2009
Felipe Sánchez-Martínez (Univ. d'Alacant), Hybrid MT: Feeding Apertium with chunks, 8th July 2009

Outline
1 Motivation & goal
2 The Apertium free/open-source MT platform
  Apertium rule-based MT engine
  Apertium: example of translation
3 Integration of bilingual chunks into Apertium
  Considerations
  Translation approach
  Computation of the best coverage
4 Experiments
  Experimental setup
  Results: marker-based chunks
  Results: tree-based chunks
5 Discussion

Motivation & goal
Motivation:
  Rule-based machine translation (RBMT) systems usually do not benefit from the post-editing effort of professional translators
  Some RBMT systems may benefit from the translation units found in translation memories (usually whole sentences)
Goal:
  To integrate sub-sentential translation units into the Apertium free/open-source MT platform
  To test the approach with bilingual chunks obtained automatically using the example-based methods implemented in Matrex

Apertium rule-based MT engine
source text → de-formatter → morph. analyser → PoS tagger → structural transfer (↔ lexical transfer) → morph. generator → post-generator → re-formatter → target text

Apertium: Example of execution /1
Source text:
  Francis' car is broken
De-formatter:
  Francis'[ ]car[ ]is broken
Morphological analyser:
  ^Francis'/Francis<np><ant><m><sg>+'s<gen>$[ ]^car/car<n><sg>$[ ]^is/be<vbser><pri><p3><sg>$ ^broken/break<vblex><pp>$
Part-of-speech tagger:
  ^Francis<np><ant><m><sg>$ ^'s<gen>$[ ]^car<n><sg>$[ ]^be<vbser><pri><p3><sg>$ ^break<vblex><pp>$
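
The analyser output above can be read mechanically. The following is a minimal sketch, not part of Apertium itself, of how such a stream might be split into lexical units. The function names and regular expressions are illustrative assumptions, and the sketch ignores escaped characters and the format information held in [ ] blocks.

```python
import re

# One lexical unit of analyser output: ^surface/analysis(/analysis...)$
LEXICAL_UNIT = re.compile(r"\^([^/$]+)/([^$]+)\$")
TAG = re.compile(r"<([^>]+)>")


def parse_analysis(analysis):
    """Split one analysis such as "Francis<np><sg>+'s<gen>" into (lemma, tags) pairs."""
    parts = []
    for piece in analysis.split("+"):  # '+' joins the parts of a multi-part analysis
        lemma = piece.split("<", 1)[0]
        parts.append((lemma, TAG.findall(piece)))
    return parts


def parse_stream(text):
    """Return a list of (surface form, list of analyses) for every unit in the text."""
    units = []
    for match in LEXICAL_UNIT.finditer(text):
        surface = match.group(1)
        analyses = [parse_analysis(a) for a in match.group(2).split("/")]
        units.append((surface, analyses))
    return units


stream = (
    "^Francis'/Francis<np><ant><m><sg>+'s<gen>$[ ]^car/car<n><sg>$[ ]"
    "^is/be<vbser><pri><p3><sg>$ ^broken/break<vblex><pp>$"
)
units = parse_stream(stream)
for surface, analyses in units:
    print(surface, analyses)
```

Each surface form maps to a list of analyses because the analyser may return several candidates for an ambiguous word; the part-of-speech tagger then keeps one of them.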

Apertium: Example of execution /2
Structural transfer (prechunk) + Lexical transfer:
  ^nom<SN><UNDET><m><sg>{^Francis<np><ant><3><4>$}$ ^pr<GEN>{}$[ ]^nom<SN><UNDET><m><sg>{^coche<n><3><4>$}$[ ]^be pp<Vcop><vblex><pri><p3><sg><GD>{^estar<vblex><3><4><5>$ ^romper<vblex><pp><6><5>$}$
Structural transfer (interchunk):
  [ ]^nom<SN><PDET><m><sg>{^coche<n><3><4>$}$[ ]^pr<PREP>{^de<pr>$}$ ^nom<SN><PDET><m><sg>{^Francis<np><ant><3><4>$}$ ^be pp<Vcop><vblex><pri><p3><sg><m>{^estar<vblex><3><4><5>$ ^romper<vblex><pp><6><5>$}$

Apertium: Example of execution /3
Structural transfer (postchunk):
  [ ]^el<det><def><m><sg>$ ^coche<n><m><sg>$[ ]^de<pr>$ ^Francis<np><ant><m><sg>$ ^estar<vblex><pri><p3><sg>$ ^romper<vblex><pp><m><sg>$
Morphological generator and post-generator:
  [ ]el coche[ ]de Francis está roto
Re-formatter:
  el coche de Francis está roto
Target text:
  el coche de Francis está roto

Considerations
To take into account:
  Do not break the application of structural transfer rules
  Use the longest possible chunks
How can the application of rules be preserved?
  By introducing chunk delimiters as format information:
    . . . is [BCH 12 0]the chunk detected[ECH 12 0] by . . .
  Chunks can then be recognised after the translation:
    . . . es [BCH 12 0]el segmento detectado[ECH 12 0] por . . .
Known problem:
  As a result of the structural transfer rules, format information may be moved around
  Some rules also delete format information (known bug)
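
The delimiter mechanism can be illustrated with a short sketch. The slide does not define the two numbers carried by [BCH ...] and [ECH ...], so the code below treats them as two opaque integer fields; the function names and the translation table are hypothetical and are not part of Apertium or Matrex.

```python
import re

# [BCH a b] ... [ECH a b]; the meaning of the two fields is not given in the
# slides, so they are only treated here as an opaque pair identifying the chunk.
CHUNK = re.compile(r"\[BCH (\d+) (\d+)\](.*?)\[ECH \1 \2\]")


def mark(words, start, end, first, second):
    """Wrap words[start:end] in chunk delimiters and return the marked sentence."""
    inner = " ".join(words[start:end])
    marked = f"[BCH {first} {second}]{inner}[ECH {first} {second}]"
    return " ".join(words[:start] + [marked] + words[end:])


def find_chunks(text):
    """Return ((first, second), enclosed text) for every delimited span."""
    return [((int(m.group(1)), int(m.group(2))), m.group(3))
            for m in CHUNK.finditer(text)]


def substitute(text, translations):
    """Replace each delimited span with a stored translation, if there is one.

    When no translation is stored, the text produced by the MT engine is kept.
    In both cases the delimiters are removed.
    """
    def replace(match):
        key = (int(match.group(1)), int(match.group(2)))
        return translations.get(key, match.group(3))
    return CHUNK.sub(replace, text)


source = mark("is the chunk detected by".split(), 1, 4, 12, 0)
print(source)
translated = "es [BCH 12 0]el segmento detectado[ECH 12 0] por"
print(find_chunks(translated))
print(substitute(translated, {(12, 0): "el fragmento detectado"}))
```

A sketch like this only works while both delimiters survive translation, which is exactly what the known problem above puts at risk.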

Translation approach
1 apply a dynamic-programming algorithm to compute the best coverage of the input sentence
2 translate the input sentence as usual by Apertium
3 use a language model to choose one of the possible translations for each of the bilingual chunks detected
  One source-language chunk may have different target-language translations
  Also consider the Apertium translation
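
Step 3 can be sketched as follows. This is only an illustration under stated assumptions: a toy add-one-smoothed bigram model stands in for the real language model, the candidates are scored together with their immediate left and right context, and all function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Return a log-probability function for a toy add-one-smoothed bigram model."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])            # history counts
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = len(set(unigrams) | {"</s>"})

    def logprob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                   for a, b in zip(tokens, tokens[1:]))

    return logprob


def choose_translation(left, candidates, right, logprob):
    """Pick the candidate that scores best when placed in its context.

    The candidates are expected to include the Apertium translation of the
    chunk as well as the target sides stored for the bilingual chunk.
    """
    def in_context(candidate):
        return " ".join(part for part in (left, candidate, right) if part)
    return max(candidates, key=lambda c: logprob(in_context(c)))


lm = train_bigram(["es el segmento detectado por el sistema",
                   "el segmento detectado"])
best = choose_translation("es", ["el segmento detectado", "el trozo detectado"],
                          "por", lm)
print(best)
```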

Computation of the best coverage: data structure
Store source-language chunks in a trie of strings
[Figure: trie with states numbered 1 to 5 and transitions labelled with words such as "adjourned", "session", "the", "interest", "shown" and "with"]
It allows the best coverage to be computed in O(l) time, where l is the length of the input sentence
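
A word-level trie of this kind might look like the sketch below. The class and method names are hypothetical and the stored chunks are made up for illustration; only the idea of keying each transition on one word comes from the slide.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # word -> TrieNode
        self.chunk_id = None  # set when a stored chunk ends at this node


class ChunkTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, chunk, chunk_id):
        """Store a source-language chunk, one trie transition per word."""
        node = self.root
        for word in chunk.split():
            node = node.children.setdefault(word, TrieNode())
        node.chunk_id = chunk_id

    def matches_from(self, words, start):
        """Yield (end, chunk_id) for every stored chunk that begins at words[start]."""
        node = self.root
        for i in range(start, len(words)):
            node = node.children.get(words[i])
            if node is None:
                return
            if node.chunk_id is not None:
                yield i + 1, node.chunk_id


trie = ChunkTrie()
for chunk_id, chunk in enumerate(
        ["session adjourned", "the interest", "the interest shown", "with the interest"],
        start=1):
    trie.insert(chunk, chunk_id)

words = "the session adjourned with the interest of".split()
print(list(trie.matches_from(words, 1)))  # chunks starting at "session"
print(list(trie.matches_from(words, 3)))  # chunks starting at "with"
```

Because chunks that share a prefix share a path, one walk from a given position finds every chunk that starts there.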

Computation of the best coverage: algorithm
[Figure: example input ". . . like in the session adjourned with the interest of . . ."]
A set of alive states in the trie is maintained to compute all the possible ways to cover the input sentence
A new search is started at every word
At each position, the best coverage up to that position is stored
The algorithm is applied to text segments shorter than sentences
The best coverage can be retrieved when there are no more alive states

Computation of the best coverage
The best coverage:
  is the one that uses the fewest possible chunks, and therefore the longest possible chunks
  each uncovered word counts as one chunk
  if two coverages use the same number of chunks, the one that uses the most frequent chunks is chosen
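
The criteria above can be sketched as a small dynamic program. This is an illustration, not the authors' implementation: it restarts a trie walk at every position instead of maintaining a set of alive states, it interprets "most frequent chunks" as the highest total frequency (an assumption), and it favours clarity over the running time stated earlier. The chunk frequencies are invented.

```python
def build_trie(chunks):
    """chunks: dict mapping a source-language chunk to its frequency."""
    root = {}
    for chunk, freq in chunks.items():
        node = root
        for word in chunk.split():
            node = node.setdefault(word, {})
        node[None] = freq  # the None key marks the end of a stored chunk
    return root


def best_coverage(words, trie):
    """Return a list of (start, end, is_chunk) spans covering all the words.

    Fewest units wins; an uncovered word counts as one unit; ties are broken
    by the highest total chunk frequency (assumed tie-breaking rule).
    """
    n = len(words)
    # best[i] = (units used, -total frequency, spans) for words[:i]
    best = [None] * (n + 1)
    best[0] = (0, 0, [])

    def relax(position, candidate):
        if best[position] is None or candidate[:2] < best[position][:2]:
            best[position] = candidate

    for i in range(n):
        count, neg_freq, spans = best[i]
        # Leave words[i] uncovered: it still costs one unit.
        relax(i + 1, (count + 1, neg_freq, spans + [(i, i + 1, False)]))
        # Start a new search in the trie at this word.
        node = trie
        for j in range(i, n):
            node = node.get(words[j])
            if node is None:
                break
            if None in node:
                relax(j + 1, (count + 1, neg_freq - node[None],
                              spans + [(i, j + 1, True)]))
    return best[n][2]


chunks = {"session adjourned": 5, "the interest": 7,
          "with the interest": 2, "the interest of": 1}
words = "like in the session adjourned with the interest of".split()
coverage = best_coverage(words, build_trie(chunks))
for start, end, is_chunk in coverage:
    print("chunk" if is_chunk else "word ", " ".join(words[start:end]))
```

In this example "with the interest" plus "of" and "with" plus "the interest of" both cost two units, so the tie is broken in favour of the more frequent chunk.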

Experimental setup /1
Data used:
  Corpora distributed for the WMT 09 Workshop for MT
  Language pairs: Spanish–English (es-en), English–Spanish (en-es)
  Linguistic data: apertium-en-es; SVN revision 9284
Software used:
  Apertium
  Giza++ and Moses to calculate word alignments and lexical probabilities
  SRILM to train 5-gram language models
  Matrex to segment training corpora and to align chunks

Experimental setup /2
Training corpus:
  Max. sentence length: 45 words
  Max. word ratio: 1.5 (mean ratio + std. dev.)
  # sent: 1,187,905; # en words: 26,983,025; # es words: 27,951,388
Development corpus:
  # sent: 2,050; # en words: 49,884; # es words: 52,719
Test corpus:
  # sent: 3,027; # en words: 77,438; # es words: 80,580

Experimental setup /3
Methods used to extract bilingual chunks:
  Marker-based bilingual chunks (using Matrex)
  Parse-tree-based bilingual chunks (thanks to John Tinsley)
    Preliminary results, using chunks previously computed from an old version of the Europarl parallel corpus
