Experiments in Term Translation
Mihael Arcan, DERI, NUI Galway
Supervised by Dr. Paul Buitelaar
Monnet is supported by the European Union under Grant No. 248458
Motivation
• Generation of ‘multilingual’ ontologies
  – most ontologies are available in English only
  – their terms need to be translated
Overview • Monnet Project • Research – building domain-specific resources • architecture • domain-specific resources • results and evaluation • main findings – term disambiguation • building a contextual-semantic resource – general parallel resource – ontology – future work
Monnet Business Information in EN, DE, NL, ES etc. http://www.monnet-project.eu
Research Objectives & Use Cases
Research Objectives
• Development and use of ‘multilingual ontologies’
  – ontologies with rich multilingual descriptors
• Exploit ‘domain semantics’ to improve Machine Translation
  – use of ontological, terminological, linguistic knowledge
Use Cases
• Financial Use Case
  – Cross-lingual Business Intelligence
• Public Services Use Case
  – Multilingual Access to Government Information
Financial Use Case
Harmonizing Business Registration across Europe
The XBRL (eXtensible Business Reporting Language) Europe Working Group works with Monnet on the xEBR taxonomy.
The xEBR (XBRL European Business Register) taxonomy defines common concepts with mappings to country/language specific taxonomies.
National Bank of Belgium (Belgium)
Eogs / DCCA (Denmark)
Registrite ja infosüsteemide Keskus eRik (Estonia)
Bilans Service - Infogreffe (France)
Bundesanzeiger (Germany)
Infocamere (Italy)
RSCL (Luxembourg)
Kamer van Koophandel (Netherlands)
Informa DB – Colegio de Registradores (Spain)
Bolagsverket (Sweden)
Companies House (United Kingdom)
EBR (Europe)
GBR (Global)
IASCF
Bank of Spain
Software – Audit – Consulting
Public Services Use Case
Translation of Dutch regulation (legal ontology) into several EU languages:
• Immigration law
• Tax law
• Student benefit law
• Health care benefit law
• Social security law
• Law on higher education
Basic Ideas of my research
• Term translation in isolation (no document or sentence context)
  – Experiment 1: generation of domain-specific resources
    • addresses the out-of-vocabulary issue
  – Experiment 2: generation of a contextual-semantic resource
    • addresses term disambiguation
Experiment 1
• Building and exploiting domain-specific resources: Linguee [1] and Wikipedia [2]
[1] http://www.linguee.com/
[2] http://en.wikipedia.org/
Architecture of Experiment 1
[Figure: architecture diagram with two parts that both feed the decoding process.
Cross-lingual lexicon generation: xEBR Taxonomy, extraction of financial labels, Wikipedia title extraction, generation of financial lexicon.
Domain-specific parallel corpus generation: extraction of a parallel resource, phrase table generation.]
Domain-specific parallel corpus generation
[Figure: pipeline diagram with the boxes xEBR Taxonomy, extraction of financial labels, querying financial labels, parsing HTML files, phrase table generation, decoding process.]
Linguee http://www.linguee.com/
Cross-Lingual Lexicon Generation
[Figure: pipeline diagram with the boxes xEBR Taxonomy, extraction of financial labels, Wikipedia title extraction, generation of financial lexicon, decoding process.]
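The lexicon generation step can be illustrated with a minimal sketch. The slides do not say how Wikipedia titles are matched against the taxonomy, so the filtering rule here (keep a title pair when the English title occurs as a word sequence inside some financial label) is an assumption, as are the names `word_ngrams` and `build_domain_lexicon` and the toy title pairs, which stand in for titles linked across language editions.

```python
def word_ngrams(text):
    """Return all lowercase word n-grams (contiguous word sequences) of a label."""
    words = text.lower().split()
    return {
        " ".join(words[i:j])
        for i in range(len(words))
        for j in range(i + 1, len(words) + 1)
    }


def build_domain_lexicon(labels, title_pairs):
    """Keep only the (English, German) title pairs whose English side occurs in a label.

    This filtering rule is an assumption made for illustration.
    """
    domain_phrases = set()
    for label in labels:
        domain_phrases |= word_ngrams(label)
    return {en.lower(): de for en, de in title_pairs if en.lower() in domain_phrases}


# Financial labels taken from the slides.
labels = ["Current assets", "Balance sheet", "Unpaid calls on subscribed capital"]

# Toy title pairs; the last one is an out-of-domain distractor.
title_pairs = [
    ("Current assets", "Umlaufvermögen"),
    ("Balance sheet", "Bilanz"),
    ("Capital", "Kapital"),
    ("Galway", "Galway"),
]

lexicon = build_domain_lexicon(labels, title_pairs)
```

With these toy inputs the out-of-domain pair is dropped and the three financial entries remain, keyed by the lowercased English term.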
Domain-Specific Resource Generation Overview
• Linguee parallel corpus on xEBR EN terms
  – English: 24K sentences (1M tokens)
  – German: 24K sentences (0.85M tokens)
• EuroParl (version 6): 1.7M sentences, 43M English, 40M German tokens
• JRC Acquis (version 3.0): 1.2M sentences, 32M English, 29M German tokens
• Wikipedia lexicon generation: 7334 translation pairs, from English to German and the other way around
  – Current assets <-> Umlaufvermögen
  – Balance sheet <-> Bilanz
  – Unpaid calls on subscribed capital
    • capital -> Kapital
  – Social security, post-employment and other employee benefit costs
    • employee benefit -> Sachbezug
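The last two examples show lexicon entries that cover only part of a longer label. A minimal sketch of how such sub-term matches could be found is given below; the greedy longest-match strategy, the comma handling, and the name `find_lexicon_matches` are assumptions, since the slides do not describe how the lexicon is applied during decoding.

```python
def find_lexicon_matches(label, lexicon):
    """Scan a label left to right and return (phrase, translation) pairs.

    At each position the longest phrase found in the lexicon wins
    (greedy longest match, an assumed strategy).
    """
    words = label.lower().replace(",", " ").split()
    matches = []
    i = 0
    while i < len(words):
        for j in range(len(words), i, -1):  # try the longest span first
            phrase = " ".join(words[i:j])
            if phrase in lexicon:
                matches.append((phrase, lexicon[phrase]))
                i = j  # continue after the matched span
                break
        else:
            i += 1  # no entry starts at this word
    return matches


# Lexicon entries taken from the slide above.
lexicon = {
    "current assets": "Umlaufvermögen",
    "capital": "Kapital",
    "employee benefit": "Sachbezug",
}

capital_matches = find_lexicon_matches("Unpaid calls on subscribed capital", lexicon)
benefit_matches = find_lexicon_matches(
    "Social security, post-employment and other employee benefit costs", lexicon
)
```

The words that remain unmatched would still have to be translated by the phrase table built from the parallel corpus.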
Automatic Evaluation
• Evaluation on 63 English-German financial labels

Translation source                              Translation direction  Exact translation  BLEU    Meteor
Acquis                                          English to German       9                 0.1267  0.4795
Acquis                                          German to English      12                 0.1673  0.3726
Europarl                                        English to German       5                 0.0207  0.4120
Europarl                                        German to English       4                 0.1132  0.3258
Linguee                                         English to German      15                 0.3641  0.6309
Linguee                                         German to English      25                 0.3471  0.4084
Linguee + domain-specific lexicon substitution  English to German      22                 0.3479  0.6438
Linguee + domain-specific lexicon substitution  German to English      25                 0.3237  0.4315
Google Translate                                English to German      21                 0.4517  0.6410
Google Translate                                German to English      18                 0.2640  0.3688
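The "exact translation" column counts labels whose system output equals the reference. A minimal sketch of such a count follows; the case and whitespace normalisation is an assumption (the slides do not state how strictly strings were compared), and the function names and toy lists are illustrative only.

```python
def normalize(term):
    """Lowercase a term and collapse runs of whitespace (assumed normalisation)."""
    return " ".join(term.lower().split())


def exact_match_count(hypotheses, references):
    """Count positions where the system output equals the reference translation."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    return sum(normalize(h) == normalize(r) for h, r in zip(hypotheses, references))


# Toy data built from terms that appear elsewhere in the slides.
references = ["Bilanz", "Umlaufvermögen", "Eigenkapital"]
hypotheses = ["Bilanz", "umlaufvermögen", "Gerechtigkeit"]

matches = exact_match_count(hypotheses, references)
```

Exact match is a strict measure: a translation that is acceptable but worded differently from the reference counts as a miss, which is one reason the deck reports BLEU, Meteor, and human judgements alongside it.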
Mono-lingual Human Evaluation
Manual Evaluation of Translation Quality
• Evaluation on 63 English-German financial labels
A = Acceptable, C = Can easily be fixed, N = None of both

                     Translation into German   Translation into English
                     A     C     N             A     C     N
Linguee+Wikipedia    58%   27%   15%           56%   32%   12%
Google Translate     55%   31%   14%           56%   31%   13%
Linguee              51%   37%   12%           39%   40%   21%
JRC-Acquis           32%   28%   40%           39%   31%   30%
Europarl              5%   25%   70%           15%   30%   55%
Annotator agreement scores
• Evaluation on 63 English-German financial labels

                     Translation into German        Translation into English
Agreement metric     S      π      κ      α         S      π      κ      α
Linguee+Wikipedia    0.599  0.528  0.533  0.530     0.532  0.452  0.457  0.454
Google Translate     0.698  0.655  0.657  0.657     0.480  0.460  0.465  0.463
Linguee              0.484  0.416  0.437  0.419     0.599  0.537  0.540  0.539
JRC-Acquis           0.412  0.406  0.413  0.408     0.363  0.359  0.366  0.360
Europarl             0.515  0.270  0.269  0.273     0.552  0.493  0.499  0.495

Slide annotation: substantial agreement
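Three of the four coefficients share the form (A_o - A_e) / (1 - A_e) and differ only in how the expected agreement A_e is estimated. The sketch below computes S, π and κ for the simplest case of two annotators; the number of annotators in the study is not given on the slides, so the two-annotator setting, the function name, and the toy ratings are assumptions. Krippendorff's α is left out to keep the example short.

```python
from collections import Counter


def agreement_coefficients(ratings_a, ratings_b, categories):
    """Return (S, pi, kappa) for two annotators who rated the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)

    # S: every category is assumed equally likely.
    expected_s = 1 / len(categories)
    # Scott's pi: one category distribution pooled over both annotators.
    expected_pi = sum(((count_a[c] + count_b[c]) / (2 * n)) ** 2 for c in categories)
    # Cohen's kappa: a separate category distribution per annotator.
    expected_kappa = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)

    def chance_corrected(expected):
        # Undefined when expected agreement is 1 (both annotators use one category).
        return (observed - expected) / (1 - expected)

    return tuple(
        chance_corrected(e) for e in (expected_s, expected_pi, expected_kappa)
    )


# Toy ratings with the slide's categories:
# A = acceptable, C = can easily be fixed, N = none of both.
annotator_1 = ["A", "A", "C", "N", "A", "C"]
annotator_2 = ["A", "C", "C", "N", "A", "N"]

s, pi, kappa = agreement_coefficients(annotator_1, annotator_2, ["A", "C", "N"])
```

Because the three coefficients estimate chance agreement differently, they can diverge when the category distribution is skewed, as in the Europarl row for translation into German, where S is much higher than π and κ.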
Cross-Lingual Human Evaluation
Manual Evaluation of Translation Quality
• Evaluation on 142 English financial labels

Translation into German   Acceptable   Can easily be fixed   None of both
Linguee+Wikipedia         59.15%       29.34%                11.50%

Agreement metric (translation into German)   S      π      κ      α
Linguee+Wikipedia                            0.467  0.355  0.357  0.355

Slide annotation: fair agreement
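The labels "substantial agreement" and "fair agreement" on the agreement slides are consistent with the commonly used Landis and Koch (1977) bands for κ. Assuming that scale is the one intended (the slides do not name it), a minimal sketch of the mapping looks like this; the function name is illustrative.

```python
def landis_koch_label(kappa):
    """Map a kappa value to its Landis and Koch (1977) agreement band."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Kappa values taken from the agreement tables in the slides.
cross_lingual_label = landis_koch_label(0.357)   # Linguee+Wikipedia, 142 labels
google_german_label = landis_koch_label(0.657)   # Google Translate, into German
```

These bands are a convention rather than a statistical test, so they are best read as a rough guide to how much weight the human judgements can bear.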
Discussion
• Reference reduces semantics
  Source: Long-term financial assets
  Reference: Finanzanlagen
  Translation: Langfristige finanzielle Vermögenswerte
• Reference adds semantics
  Source: Financial result
  Reference: Finanz- und Beteiligungsergebnis
  Translation: Finanzergebnis
• Domain training needed
  Source: equity
  Reference: Eigenkapital
  Translation (Google Translate): Gerechtigkeit
Main Findings of Experiment 1
• A domain-specific resource gives better results than a larger but more general one
• The Linguee parallel corpus combined with the domain-specific multilingual Wikipedia lexicon outperforms Google Translate when translating German terms into English