
Machine Translation

Andreas Eisele, UdS Computerlinguistik & DFKI (eisele@dfki.de), February 13, 2008. Foundations of Language Science and Technology, WS 2007/8.


  1. Machine Translation. February 13, 2008. Andreas Eisele, UdS Computerlinguistik & DFKI, eisele@dfki.de. Foundations of Language Science and Technology, WS 2007/8.

  2. Machine Translation: Overview
     LT1: Motivation and overview of MT paradigms, including rule-based, statistical, and hybrid techniques
     - Relevance of MT, typical applications and requirements
     - History of MT
     - Basic approaches to MT: rule/grammar-based, statistical, example-based, hybrid/multi-engine
     - Evaluation techniques
     FLST: Focus on the translation task (linguistic issues), including some algorithmic aspects
     - Differences between languages
     - Typical difficulties in translation
     - Treatment of ambiguity

  3. Sources for Information
     - MT in general, history:
       - http://www.MT-Archive.info: electronic repository and bibliography of articles, books and papers on topics in machine translation and computer-based translation tools; regularly updated, contains over 3300 items
       - Hutchins, Somers: An Introduction to Machine Translation. Academic Press, 1992. Available at http://www.hutchinsweb.me.uk/IntroMT-TOC.htm
     - MT systems: Compendium of Translation Software, see http://www.hutchinsweb.me.uk/Compendium.htm
     - Statistical Machine Translation: see www.statmt.org

  4. Use cases and requirements for MT
     a) MT for assimilation [diagram: MT connecting L1 with L2, L3, …, Ln]
        - Requirements: robustness, coverage
        - Daily throughput of online MT systems > 500 M words
     b) MT for dissemination [diagram: MT connecting L1 with L2, L3, …, Ln]
        - Requirement: textual quality
        - Publishable quality can only be authored by humans; translation memories & CAT tools are mandatory for professional translators
     c) MT for direct communication [diagram: MT connecting L1 and L2]
        - Requirements: speech recognition, context dependence
        - Topic of many running and completed research projects (VerbMobil, TC Star, TransTac, …)
        - US military prepares deployment of systems for spoken MT

  5. History of Machine Translation
     - Slides by John Hutchins: http://www.hutchinsweb.me.uk/SUSU-2007-1-ppt.pdf

  6. Possible (rule-based) MT architectures
     [Figure: the "Vauquois Triangle"]

  7. Statistical Machine Translation
     - Based on the "distorted channel" (noisy-channel) paradigm, which has been successful for pattern and speech recognition: a source emits E with probability $P(E)$, and the channel turns E into the observation F with probability $P(F \mid E)$.
     - Decoding: given the observation F, find the most likely cause $E^*$:
       $E^* = \arg\max_E P(E \mid F) = \arg\max_E P(E, F) = \arg\max_E P(E) \cdot P(F \mid E)$
     - Three subproblems, each of which has approximate solutions:
       - Model of $P(E)$: n-gram models, $P(e_1 \dots e_n) = \prod_i P(e_i \mid e_{i-2}\, e_{i-1})$
       - Model of $P(F \mid E)$: transfer of "phrases", $P(F \mid E) = \prod_i P(f_i \mid e_i) \cdot P(d_i)$
       - Search for $E^*$: heuristic (beam) search
     - Models are trained on (parallel) corpora; correspondences (alignments) between languages are estimated via the EM algorithm (GIZA++, F. J. Och)
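The decoding rule on this slide can be illustrated with a very small sketch. It assumes the candidate translations are already enumerated, so the search reduces to an exhaustive argmax rather than the beam search a real decoder would use. The probability tables `LM` and `TM`, their values, and the function name `decode` are hypothetical and invented purely for illustration.

```python
import math

# Hypothetical toy language model P(E); the numbers are made up.
LM = {"the house": 0.6, "house the": 0.01, "the home": 0.3}

# Hypothetical toy translation model P(F|E), keyed by (F, E); numbers made up.
TM = {
    ("das Haus", "the house"): 0.5,
    ("das Haus", "house the"): 0.5,
    ("das Haus", "the home"): 0.4,
}

def decode(f, candidates):
    """Return the candidate E maximizing P(E) * P(F|E), computed in log space."""
    def score(e):
        return math.log(LM[e]) + math.log(TM[(f, e)])
    return max(candidates, key=score)

best = decode("das Haus", ["the house", "house the", "the home"])
print(best)
```

Working in log space avoids numerical underflow once the products run over many words; the language model is what rules out the badly ordered candidate even though the translation model rates it as equally likely.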

  8. Statistical Machine Translation: schematic architecture
     [Diagram: monolingual corpus → counting, smoothing → n-gram model; parallel corpus → alignment, phrase extraction → phrase table; source text → decoder (using both models) → N-best lists → target text]
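The alignment step in this architecture is usually estimated with EM. The following is a minimal sketch in the style of IBM Model 1, assuming tokenized sentence pairs and omitting the NULL word; it is not GIZA++ itself, and the function name `train_ibm1` and the three-sentence corpus are hypothetical.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """Estimate t[(f, e)] ~ P(f | e) from (foreign_words, english_words) pairs."""
    f_vocab = {f for fs, _ in pairs for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each foreign word over the English words
        # of the same sentence, in proportion to the current t.
        for fs, es in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize the expected counts per English word.
        t = defaultdict(float)
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Hypothetical toy parallel corpus (German, English).
pairs = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm1(pairs)
print(round(t[("das", "the")], 3), round(t[("Haus", "the")], 3))
```

Even without any dictionary, co-occurrence across sentences is enough to push probability mass toward "das"/"the" and "Buch"/"book"; phrase extraction then builds on such word alignments.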

  9. Strengths and Weaknesses of SMT vs. RBMT
     Example 1
       English: We seem sometimes to have lost sight of this fact.
       RBMT (translate pro): Wir scheinen manchmal Anblick dieser Tatsache verloren zu haben.
       SMT (Koehn 2005): Manchmal scheinen wir aus den Augen verloren haben, diese Tatsache.
     Example 2
       English: The leaders of Europe have not formulated a clear vision.
       RBMT (translate pro): Die Leiter von Europa haben keine klare Vision formuliert.
       SMT (Koehn 2005): Die Führung Europas nicht formuliert eine klare Vision.
     Example 3
       English: I would like to close with a procedural motion.
       RBMT (translate pro): Ich möchte mit einer verfahrenstechnischen Bewegung schließen.
       SMT (Koehn 2005): Ich möchte abschließend eine Frage zur Geschäftsordnung ε.

  10. Motivation for hybrid MT (1)
     - In the early 90s, SMT and RBMT were seen in sharp contrast.
     - But advantages and disadvantages are complementary.
     - The search for integrated methods is now seen as a natural extension for both approaches.
     [Table: SMT and RBMT rated from -- to ++ on syntax, structural semantics, lexical semantics, lexical adaptivity, and lexical reliability]

  11. Motivation for hybrid MT (2)
     - Statistical and rule-based approaches address different types of knowledge:
       - Rule-based approaches focus on linguistic knowledge
       - Statistical approaches provide a holistic, integrated model that also incorporates (some) implicit knowledge of the world
     - All available types of knowledge are urgently required, as the task is too difficult to ignore important aspects
     - Research on a deep integration of statistical and linguistic approaches is required, but this will take some time
     - In the meantime, we can try to tinker with existing MT engines

  12. Some hybrid MT architectures
     [Figure: hybrid architectures built from SMT modules and RBMT modules]

  13. SMT feeds rule-based MT
     Motivation:
     - Adapting RBMT to new domains requires lots of new lexical entries that are difficult to write manually
     - SMT techniques can help to partially automate this process
     BUT:
     - Not all required information can be learned from data
     - Errors in examples/SMT alignment may creep in, but RBMT has no mechanism to discard implausible outcomes
     - Some manual effort is required

  14. Differences between Languages
     Languages can differ in many ways (studied in language typology).
     Morphology:
     - Morpheme-to-word ratio: isolating ↔ synthetic ↔ polysynthetic
     - Segmentability: agglutinative ↔ fusion language
     Syntax:
     - Word order: SVO vs. SOV vs. VSO vs. V2 vs. unconstrained (+ case marked)
     - Whether to use determiners or not
     - Head-marking vs. dependent-marking
     - Verb-framed vs. satellite-framed:
       EN: The bottle floated out.
       ES: La botella salió flotando. (The bottle exited floating.)

  15. Differences in Specificity of Expressions
     Translation into a language that uses more specific expressions requires us to make decisions that may be rather difficult. (Examples taken from Jurafsky & Martin and Hutchins & Somers.)

  16. Differences in Conceptual Space
     Different expressions in French and English: Jurafsky & Martin's visualisation of data from Hutchins & Somers.

  17. More bilingual lexical differences
     - Bilingual lexical ambiguity (more than one equivalent, whether ambiguous in SL or not):
       - river: fleuve/rivière
       - Taube: dove/pigeon
       - Schraube: screw/bolt/propeller
       - corner: coin or angle; Ecke or Winkel
       - light: léger, clair, facile, allumer, lumière, lampe, feu
       - look: regarder, chercher, sembler
     - Lexical gaps:
       - dacha, cottage, marmelade, vodka, etc.
       - snub: infliger un affront; verächtlich behandeln, or: derb zurückweisen
       - het Turks kennen: to know Turkish
       - kenner van het Turks: *knower of Turkish, someone who knows Turkish
     - Solved (?) by contextual rules (RBMT), or examples (EBMT), or frequencies and "language models" (SMT)
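How frequencies and a language model can settle a lexical choice such as Schraube → screw/bolt/propeller may be sketched as follows. This is only a toy illustration under strong assumptions: a bigram model with add-one smoothing, a single word of left context, and a four-sentence corpus. The names `train_bigrams` and `choose` and all data are hypothetical.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count unigrams and bigrams, with <s> marking the sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def choose(prev_word, candidates, unigrams, bigrams):
    """Pick the candidate with the highest add-one smoothed P(w | prev_word)."""
    vocab_size = len(unigrams)
    def prob(w):
        return (bigrams[(prev_word, w)] + 1) / (unigrams[prev_word] + vocab_size)
    return max(candidates, key=prob)

# Hypothetical target-language corpus.
corpus = [
    "the ship propeller is damaged",
    "a ship propeller turns",
    "tighten the screw",
    "the screw is loose",
]
uni, bi = train_bigrams(corpus)
candidates = ["screw", "bolt", "propeller"]
print(choose("ship", candidates, uni, bi))
print(choose("the", candidates, uni, bi))
```

The same source word thus receives different translations depending on the target-side context, which is the effect the language model contributes in SMT; a realistic system would combine it with translation probabilities and longer n-grams.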
