Experiments in Term Translation (Mihael Arcan, DERI, NUI Galway)


  1. Experiments in Term Translation. Mihael Arcan, DERI, NUI Galway. Supervised by Dr. Paul Buitelaar. Monnet is supported by the European Union under Grant No. 248458.

  2. Motivation • Generation of ‘multilingual’ ontologies – most ontologies are in English – their terms need to be translated

  3. Overview • Monnet Project • Research – building domain-specific resources • architecture • domain-specific resources • results and evaluation • main findings – term disambiguation • building a contextual-semantic resource – general parallel resource – ontology – future work

  4. Monnet Business Information in EN, DE, NL, ES etc. http://www.monnet-project.eu

  5. Research Objectives & Use Cases. Research Objectives: • Development and use of ‘multilingual ontologies’ – ontologies with rich multilingual descriptors • Exploit ‘domain semantics’ to improve Machine Translation – use of ontological, terminological, linguistic knowledge. Use Cases: • Financial Use Case – Cross-lingual Business Intelligence • Public Services Use Case – Multilingual Access to Government Information

  6. Financial Use Case: Harmonizing Business Registration across Europe • The XBRL (eXtensible Business Reporting Language) Europe Working Group works with Monnet on the xEBR taxonomy • The xEBR (XBRL European Business Register) taxonomy defines common concepts with mappings to country/language-specific taxonomies • Organizations shown on the slide: National Bank of Belgium (Belgium), Eogs / DCCA (Denmark), Registrite ja infosüsteemide Keskus eRik (Estonia), Bilans Service - Infogreffe (France), Bundesanzeiger (Germany), Infocamere (Italy), RSCL (Luxembourg), Kamer van Koophandel (Netherlands), Informa DB – Colegio de Registradores (Spain), Bolagsverket (Sweden), Companies House (United Kingdom), EBR (Europe), GBR (Global), IASCF, Bank of Spain, Software – Audit – Consulting

  7. Public Services Use Case: Translation of Dutch regulation (legal ontology) into several EU languages • Immigration law • Tax law • Student benefit law • Health care benefit law • Social security law • Law on higher education

  8. Basic Ideas of my research • Term translation in isolation (no document or sentence context) – Experiment 1: domain-specific resources generation • addressing out-of-vocabulary issue – Experiment 2: contextual-semantic resource generation • term disambiguation

  9. Experiment 1 • Building and exploiting domain-specific resources: Linguee [1] and Wikipedia [2]. [1] http://www.linguee.com/ [2] http://en.wikipedia.org/

  10. Architecture of Experiment 1 [Diagram. Components: xEBR Taxonomy; Extraction of financial labels; Cross-Lingual Lexicon generation (Wikipedia Title extraction, Generation of financial lexicon); Domain-specific parallel corpus generation (Extraction of a parallel resource, Phrase Table generation); Decoding process]

  11. Domain-specific parallel corpus generation [Diagram. Components: xEBR Taxonomy; Extraction of financial labels; Querying financial labels; Parsing HTML files; Phrase Table generation; Decoding process]
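The first step in the diagram, extraction of financial labels from the xEBR taxonomy, might look roughly like the sketch below. It assumes the labels live in a standard XBRL label linkbase (`link:label` elements carrying an `xml:lang` attribute); the slides do not show the actual xEBR file layout, and the `SAMPLE` fragment and the function name `extract_labels` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of XBRL 2.1 linkbase elements; xml:lang lives in the XML namespace.
LINK_NS = "http://www.xbrl.org/2003/linkbase"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Tiny hand-written fragment, NOT taken from the real xEBR taxonomy files.
SAMPLE = """<link:linkbase xmlns:link="http://www.xbrl.org/2003/linkbase"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <link:labelLink xlink:type="extended">
    <link:label xlink:type="resource" xlink:label="l1" xml:lang="en">Current assets</link:label>
    <link:label xlink:type="resource" xlink:label="l1" xml:lang="de">Umlaufvermögen</link:label>
    <link:label xlink:type="resource" xlink:label="l2" xml:lang="en">Balance sheet</link:label>
  </link:labelLink>
</link:linkbase>"""


def extract_labels(xml_text, lang):
    """Return the text of every label element written in the given language."""
    root = ET.fromstring(xml_text)
    labels = []
    for element in root.iter(f"{{{LINK_NS}}}label"):
        if element.get(XML_LANG) == lang and element.text:
            labels.append(element.text.strip())
    return labels


english_labels = extract_labels(SAMPLE, "en")
german_labels = extract_labels(SAMPLE, "de")
```

The extracted English labels would then serve as the queries for the "Querying financial labels" step; how those queries were issued and how the returned HTML was parsed is not described in the slides.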

  12. Linguee http://www.linguee.com/

  13. Cross-Lingual Lexicon Generation [Diagram. Components: xEBR Taxonomy; Extraction of financial labels; Wikipedia Title extraction; Generation of financial lexicon; Decoding process]
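One plausible way to turn Wikipedia titles into a bilingual lexicon is to follow interlanguage links from an English article to its German counterpart. The sketch below only parses a response shaped like the MediaWiki API's `prop=langlinks` output (legacy JSON format, where the linked title sits under the `"*"` key). Whether the author used the API or a database dump is not stated in the slides, and the sample response and the name `lexicon_from_langlinks` are illustrative assumptions.

```python
import json

# Hand-written sample mimicking a MediaWiki "query / langlinks" response.
# Page ids and the third title are invented for the example.
SAMPLE_RESPONSE = json.dumps({
    "query": {"pages": {
        "1": {"title": "Balance sheet",
              "langlinks": [{"lang": "de", "*": "Bilanz"}]},
        "2": {"title": "Current asset",
              "langlinks": [{"lang": "de", "*": "Umlaufvermögen"}]},
        "3": {"title": "Page without a German article"},
    }}
})


def lexicon_from_langlinks(response_text, target_lang):
    """Map each source title to its linked title in target_lang, if any."""
    pages = json.loads(response_text)["query"]["pages"]
    lexicon = {}
    for page in pages.values():
        # Pages without interlanguage links simply have no "langlinks" key.
        for link in page.get("langlinks", []):
            if link["lang"] == target_lang:
                lexicon[page["title"]] = link["*"]
    return lexicon


en_de = lexicon_from_langlinks(SAMPLE_RESPONSE, "de")
# Inverting the mapping gives the German-to-English direction.
de_en = {german: english for english, german in en_de.items()}
```

Pages with no link in the target language are skipped, so the lexicon only contains titles that exist in both languages.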

  22. Domain-Specific Resource Generation Overview • Linguee parallel corpus on xEBR EN terms – English: 24K sentences (1M tokens) – German: 24K sentences (0.85M tokens) • EuroParl (version 6): 1.7M sentences, 43M English and 40M German tokens • JRC-Acquis (version 3.0): 1.2M sentences, 32M English and 29M German tokens • Wikipedia lexicon generation: 7334 translation pairs, from English to German and the other way around – Current assets <-> Umlaufvermögen – Balance sheet <-> Bilanz – Unpaid calls on subscribed capital • capital -> Kapital – Social security, post-employment and other employee benefit costs • employee benefit -> Sachbezug
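The examples above suggest two kinds of lexicon hits: a whole label found in the lexicon (Current assets) and a sub-term found inside a longer label (capital). A minimal sketch of such a lookup is a greedy longest-match substitution over the label's tokens. This is an assumption about how a lexicon could be applied, not the author's documented method; the function name and the lowercased lexicon keys are hypothetical.

```python
def substitute_with_lexicon(term, lexicon):
    """Replace the longest lexicon matches in `term`, scanning left to right.

    `lexicon` maps lowercased source phrases to target phrases.
    Tokens without a match are copied through unchanged.
    """
    tokens = term.split()
    output = []
    i = 0
    while i < len(tokens):
        # Try the longest span starting at i first, then shorter ones.
        for j in range(len(tokens), i, -1):
            candidate = " ".join(tokens[i:j]).lower()
            if candidate in lexicon:
                output.append(lexicon[candidate])
                i = j
                break
        else:
            # No span starting at i is in the lexicon: keep the token.
            output.append(tokens[i])
            i += 1
    return " ".join(output)


# Entries taken from the slide's examples.
LEXICON = {
    "current assets": "Umlaufvermögen",
    "balance sheet": "Bilanz",
    "capital": "Kapital",
}

full_match = substitute_with_lexicon("Current assets", LEXICON)
partial_match = substitute_with_lexicon("Unpaid calls on subscribed capital", LEXICON)
```

For a label that is only partly covered, the remaining source tokens stay untranslated and would still have to be handled by the translation system.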

  23. Automatic Evaluation • Evaluation on 63 English-German financial labels

     | Translation source | Translation direction | Exact translations | BLEU | Meteor |
     |---|---|---|---|---|
     | Acquis | English to German | 9 | 0.1267 | 0.4795 |
     | Acquis | German to English | 12 | 0.1673 | 0.3726 |
     | Europarl | English to German | 5 | 0.0207 | 0.4120 |
     | Europarl | German to English | 4 | 0.1132 | 0.3258 |
     | Linguee | English to German | 15 | 0.3641 | 0.6309 |
     | Linguee | German to English | 25 | 0.3471 | 0.4084 |
     | Linguee + Domain-specific lexicon substitution | English to German | 22 | 0.3479 | 0.6438 |
     | Linguee + Domain-specific lexicon substitution | German to English | 25 | 0.3237 | 0.4315 |
     | Google Translate | English to German | 21 | 0.4517 | 0.6410 |
     | Google Translate | German to English | 18 | 0.2640 | 0.3688 |
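The "exact translation" figure is presumably the number of system outputs that are identical to the reference label. A small sketch of that count follows; the case and whitespace normalisation is an assumption, since the slides do not say how strictly strings were compared, and the example pairs are taken from other slides of this deck.

```python
def exact_match_count(hypotheses, references):
    """Count hypotheses identical to their reference after light normalisation."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")

    def normalise(text):
        # Assumption: ignore case and collapse runs of whitespace.
        return " ".join(text.lower().split())

    return sum(
        normalise(hyp) == normalise(ref)
        for hyp, ref in zip(hypotheses, references)
    )


hypotheses = ["Finanzergebnis", "Gerechtigkeit", "Bilanz"]
references = ["Finanz- und Beteiligungsergebnis", "Eigenkapital", "Bilanz"]
matches = exact_match_count(hypotheses, references)
```

Exact match is a strict measure for short labels, which is one reason to report it alongside n-gram metrics such as BLEU and Meteor.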

  27. Mono-lingual Human Evaluation

  28. Manual Evaluation of Translation Quality • Evaluation on 63 English-German financial labels • A = Acceptable, C = Can easily be fixed, N = None of both

     | System | Into German: A | Into German: C | Into German: N | Into English: A | Into English: C | Into English: N |
     |---|---|---|---|---|---|---|
     | Linguee+Wikipedia | 58% | 27% | 15% | 56% | 32% | 12% |
     | Google Translate | 55% | 31% | 14% | 56% | 31% | 13% |
     | Linguee | 51% | 37% | 12% | 39% | 40% | 21% |
     | JRC-Acquis | 32% | 28% | 40% | 39% | 31% | 30% |
     | Europarl | 5% | 25% | 70% | 15% | 30% | 55% |

  29. Annotator agreement scores • Evaluation on 63 English-German financial labels • Agreement metrics: S, π, κ, α • Slide annotation: substantial agreement

     | System | Into German: S | Into German: π | Into German: κ | Into German: α | Into English: S | Into English: π | Into English: κ | Into English: α |
     |---|---|---|---|---|---|---|---|---|
     | Linguee+Wikipedia | 0.599 | 0.528 | 0.533 | 0.530 | 0.532 | 0.452 | 0.457 | 0.454 |
     | Google Translate | 0.698 | 0.655 | 0.657 | 0.657 | 0.480 | 0.460 | 0.465 | 0.463 |
     | Linguee | 0.484 | 0.416 | 0.437 | 0.419 | 0.599 | 0.537 | 0.540 | 0.539 |
     | JRC-Acquis | 0.412 | 0.406 | 0.413 | 0.408 | 0.363 | 0.359 | 0.366 | 0.360 |
     | Europarl | 0.515 | 0.270 | 0.269 | 0.273 | 0.552 | 0.493 | 0.499 | 0.495 |
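For readers unfamiliar with the agreement coefficients, the sketch below computes three of them (Bennett's S, Scott's π, Cohen's κ) for the simplest case of two annotators labelling every item with A, C or N. The slides do not state how many annotators took part or which implementation was used, so the two-annotator setup, the toy judgements and the function names are assumptions; Krippendorff's α is left out. The verbal labels follow the widely used Landis and Koch scale, which matches the "fair" and "substantial" wording used in this deck but is not confirmed as the author's source.

```python
from collections import Counter


def agreement_scores(first, second, categories):
    """Chance-corrected agreement between two annotators over the same items."""
    n = len(first)
    observed = sum(x == y for x, y in zip(first, second)) / n
    count_first, count_second = Counter(first), Counter(second)

    # Expected chance agreement under each coefficient's assumption.
    expected_s = 1 / len(categories)                      # uniform categories
    expected_pi = sum(
        ((count_first[c] + count_second[c]) / (2 * n)) ** 2
        for c in categories
    )                                                     # pooled distribution
    expected_kappa = sum(
        (count_first[c] / n) * (count_second[c] / n)
        for c in categories
    )                                                     # per-annotator distributions

    def corrected(expected):
        return (observed - expected) / (1 - expected)

    return {
        "S": corrected(expected_s),
        "pi": corrected(expected_pi),
        "kappa": corrected(expected_kappa),
    }


def landis_koch(value):
    """Verbal interpretation of an agreement coefficient (Landis and Koch scale)."""
    if value < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if value <= upper:
            return label
    return "almost perfect"


# Toy judgements, invented for illustration.
annotator_1 = ["A", "A", "A", "C", "N", "A"]
annotator_2 = ["A", "C", "A", "C", "N", "C"]
scores = agreement_scores(annotator_1, annotator_2, ["A", "C", "N"])
```

The three coefficients differ only in how expected chance agreement is estimated, which is why they stay close to each other in the tables whenever annotators use the categories with similar frequencies.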

  31. Cross-Lingual Human Evaluation

  32. Manual Evaluation of Translation Quality • Evaluation on 142 English financial labels • Translation into German • Slide annotation: fair agreement

     | System | Acceptable | Can easily be fixed | None of both |
     |---|---|---|---|
     | Linguee+Wikipedia | 59.15% | 29.34% | 11.50% |

     | Agreement metric | S | π | κ | α |
     |---|---|---|---|---|
     | Linguee+Wikipedia | 0.467 | 0.355 | 0.357 | 0.355 |

  34. Discussion • Reference reduces semantics – Source: Long-term financial assets – Reference: Finanzanlagen – Translation: Langfristige finanzielle Vermögenswerte • Reference adds semantics – Source: Financial result – Reference: Finanz- und Beteiligungsergebnis – Translation: Finanzergebnis • Domain training needed – Source: equity – Reference: Eigenkapital – Translation (Google Translate): Gerechtigkeit

  35. Main Findings of Experiment 1 • Domain-specific resource gives better results than a bigger, but more general one • Linguee parallel corpus + domain-specific multilingual Wikipedia outperform Google Translate for translating German terms into English
