Machine Translation – Classical and Statistical Approaches
Session 1: Overview

Jonas Kuhn
Universität des Saarlandes, Saarbrücken
The University of Texas at Austin
jonask@coli.uni-sb.de

DGfS/CL Fall School 2005, Ruhr-Universität Bochum, September 19-30, 2005

Session 1: Overview
- Course Overview
  - Week 1: “Classical” approaches
  - Week 2: Data-driven, statistical approaches
- Machine Translation: Some History
- Major architectures/paradigms in “classical” Machine Translation
- Translation challenges – a classification

Jonas Kuhn: MT

Course Overview (1)
- Week 1: “Classical” approaches
  - History & Overview
  - Transfer-based translation
  - Syntax-based transfer [Trujillo 1999]
  - Transfer as LFG projection [Kaplan et al. 1999]
  - Interlingua-based translation [Dorr 1994]
  - Term-rewriting transfer [Emele/Dorna 1998]

Course Overview (2)
- Week 2: Data-driven, statistical approaches
  - The noisy channel model [Brown et al. 1990, Knight 1999]
  - Language modeling
  - Translation modeling
  - Word alignment
  - Phrase alignment [Koehn et al. 2003]
  - Decoding [Koehn 2004]
  - Other uses of word alignments [Yarowsky et al. 2001]
MT: Some History
- Translation
  - c. 2000 BC (Old Babylonian period): bilingual Sumerian-Akkadian text fragments
  - China, 9th century BC: references to translators and interpreters
  - c. mid-8th century BC: reference to interpreting in the Old Testament (Genesis 42:23)
  - 240 BC: Livius Andronicus translates the Odyssey from Greek into Latin
  - 196 BC: Rosetta Stone carved (three scripts: Egyptian hieroglyphs, Demotic, Greek); discovered in 1799 and deciphered by Jean-François Champollion in 1822
- 1947: Memo by Warren Weaver (Rockefeller Foundation)
  “I have a text in front of me which is written in Russian but I am going to pretend that it is really written in English and that it has been coded in some strange symbols. All I need to do is strip off the code in order to retrieve the information contained in the text.”
  (compare the use of computers for cryptography in WW-II)
- 1954: first prototype of a Russian-English MT system (GAT system, Peter Toma; Georgetown University, Washington D.C.)
- 1961, UT Austin: Linguistic Research Center (led by Winfred Lehmann)
  - Fundamental research and development of METAL, a bidirectional English-German transfer system
  - Initially funded by the US Air Force Rome Air Development Center; since 1978 by Siemens
  - First commercial METAL system appeared in 1989
- 1966: The ALPAC Report (Automatic Language Processing Advisory Committee, commissioned by the US National Academy of Sciences)
  - No shortage of human translators; no immediate prospect of MT producing useful translations of general scientific texts
  - Funding for MT was virtually stopped (especially in the USA)
- Groups continuing to work on MT in the 1970s:
  - TAUM group in Montreal: METEO system (used for translating weather forecasts since 1977)
  - Groups in the USSR
  - GETA group in Grenoble, France
  - SUSY group in Saarbrücken, Germany
  - Peter Toma working on Systran (in various organizations)
    - Systran is now available for 36 language pairs; http://www.systransoft.com/
    - Underlying technology of Babel Fish Translation (by AltaVista)
MT: Some History
- 1976: The Commission of the European Communities installs the English-French version of Systran
  - and commissions further Systran language pairs
- 1982-1993: Eurotra – large-scale MT project funded by the European Communities
- 1993-2000: Verbmobil – large-scale speech-to-speech translation project funded by the German ministry for research
- Late 1980s-1990s: Candide project at IBM Watson Research Center – pioneering work in Statistical Machine Translation
  - Basis for all ongoing work in Statistical MT
  - Example: “Surprise Language Project” by DARPA – one month to develop an MT system for a given language (June 2003: Hindi); 11 research institutions participated

Architectures and Paradigms in MT
Classification following Dorr/Jordan/Benoit (1999): A Survey of Current Paradigms in Machine Translation. In: Zelkowitz, Marvin (ed.), Advances in Computers 49, 1-68. Academic Press, London.
- MT Architectures
  - Direct translation
  - Transfer-based translation
  - Interlingua-based translation
- MT Paradigms
  - Linguistic-based paradigms
    - Constraint-based MT, Knowledge-based MT, Lexical-based MT, Rule-based MT, Principle-based MT, Shake-and-Bake MT
  - Non-linguistic-based paradigms
    - Statistical-based MT, Example-based MT, Dialogue-based MT
  - Hybrid paradigms

Transfer vs. Interlingua
- Some slides taken from Arturo Trujillo…
  (author of “Translation Engines”, 1999, Springer)

The Vauquois Triangle
[Figure: the Vauquois triangle]
Transfer vs. Interlingua
- Transfer: Contrasts are fundamental to translation. Statements in one theory (source language) are mapped into statements in another theory (target language).
- Interlingua: Meanings are language independent and can be encoded. They are extracted from SL sentences and rendered as TL sentences.

Multilinguality – Transfer
[Figure: transfer modules connecting English, Catalan, German, Spanish, French and Japanese]

Transfer vs. Interlingua
Transfer:
  + Easier to implement
  + Good for mono- or bi-directional systems
  + Humans work on 2 languages at a time
  - Modifications affect several transfer modules
  - Inefficient for multilinguality
Interlingua:
  + Eliminates redundancy
  + Highly modular
  + Simplifies addition of languages
  - Different linguists may disagree on the representation of meaning
  - Difficult to ensure that the TL generator can produce a sentence from the SL representation

Multilinguality – Interlingua
[Figure: English, Catalan, German, Spanish, French and Japanese connected through a central Interlingua]
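The multilinguality trade-off can be made concrete by counting modules. The sketch below is a simplification and an assumption, not a description of any particular system: a transfer design gets one directional transfer module per ordered language pair, and an interlingua design gets one analyser and one generator per language (per-language analysis and generation components of the transfer design are ignored). The function names are hypothetical.

```python
from itertools import permutations

def transfer_modules(languages):
    """Assumed transfer design: one directional module per ordered (source, target) pair."""
    return list(permutations(languages, 2))

def interlingua_modules(languages):
    """Assumed interlingua design: one analyser (language -> IL) and
    one generator (IL -> language) per language."""
    analysers = [(lang, "IL") for lang in languages]
    generators = [("IL", lang) for lang in languages]
    return analysers + generators

# The six languages shown in the multilinguality figures.
LANGS = ["English", "Catalan", "German", "Spanish", "French", "Japanese"]

if __name__ == "__main__":
    # Print: number of languages, transfer modules n(n-1), interlingua modules 2n.
    for n in range(2, len(LANGS) + 1):
        subset = LANGS[:n]
        print(n, len(transfer_modules(subset)), len(interlingua_modules(subset)))
```

For the six languages in the figures this gives 30 transfer modules against 12 interlingua modules; adding a seventh language would cost 12 new transfer modules but only 2 new interlingua modules. With only two languages the transfer design is smaller (2 against 4), which matches “Good for mono- or bi-directional systems”.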
Classifying translation challenges
- Translation divergence:
  - Meaning is conveyed by the translation, although the syntactic structure and the semantic distribution of meaning components differ between the two languages
- Translation mismatch:
  - Difference in information content between source and target sentence
  - Example (from Dorr 1994): translation of fish into Spanish – pez (alive), pescado (food)

Types of divergence
- Thematic divergence
- Head-switching divergence
- Structural divergence
- Categorial divergence
- Lexical gap (conflational divergence)
- Divergence in lexicalization (lexical divergence)
- Collocational divergence
- Multi-lexeme and idiomatic divergence

Types of divergence
- Thematic divergence
  En: You like her
  Sp: Ella te gusta
  (Lit: She you-ACC pleases)
- Head-switching divergence
  En: The baby just ate
  Sp: El bebé acaba de comer
  (Lit: The baby finishes of to-eat)
- Structural divergence
  En: Luisa entered the house
  Sp: Luisa entró a la casa
  (Lit: Luisa entered to the house)
- Categorial divergence
  En: a little bread
  Sp: un poco de pan
  (Lit: a bit of bread)
- Lexical gap (conflational divergence)
  En: Camillo got up early
  Sp: Camillo madrugó
  En: I stabbed Juan
  Sp: Yo le di puñaladas a Juan
  (Lit: I gave knife-wounds to Juan)
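A translation mismatch such as fish → pez/pescado means the target word depends on information the English word does not carry. The sketch below shows one way a lexical transfer step could make that choice explicit; the feature name `state`, the lexicon contents and the function `select_target` are hypothetical, chosen only to illustrate the idea.

```python
# Hypothetical bilingual lexicon: English lemma -> list of (condition, Spanish lemma).
# A condition maps feature names to required values; an empty condition always matches.
LEXICON = {
    "fish": [({"state": "alive"}, "pez"), ({"state": "food"}, "pescado")],
    "bread": [({}, "pan")],
}

def select_target(lemma, features):
    """Return the first Spanish lemma whose condition is satisfied by `features`."""
    for condition, target in LEXICON.get(lemma, []):
        if all(features.get(name) == value for name, value in condition.items()):
            return target
    # No entry matches: the information needed for the choice is missing.
    raise LookupError(f"cannot translate {lemma!r} with features {features}")
```

`select_target("fish", {"state": "food"})` yields `"pescado"`, while `select_target("fish", {})` raises `LookupError`: the mismatch shows up exactly when the source sentence does not say whether the fish is alive, so that information has to come from context or world knowledge.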
Types of divergence
- Divergence in lexicalization (lexical divergence)
  En: Susan swam across the channel
  Sp: Susan cruzó el canal nadando
  (Lit: Susan crossed the channel swimming)
- Collocational divergence
  En: Jan made a decision
  Sp: Jan tomó/*hizo una decisión
  (Lit: Jan took/*made a decision)
- Multi-lexeme and idiomatic divergence
  En: Socrates kicked the bucket
  Sp: Socrates estiró la pata
  (Lit: Socrates stretched the leg)
  En: Frank is as tall as Orlaith
  Sp: Frank es tan alto como Orlaith
  (Lit: Frank is so tall like Orlaith)

Other translation challenges
- Ambiguity: a language understanding problem (compare Dorr et al. 1999)
  - Syntactic ambiguity
    I saw the man on the hill with the telescope
    - Resolution may not be necessary, since the ambiguity may carry over to the target language
  - Lexical ambiguity
    En: book → Sp: libro / reservar
  - Semantic ambiguity
    - Homography
      En: ball → Sp: pelota (spherical object) / baile (formal dance)
    - Polysemy
      En: kill → Sp: matar (kill a man) / acabar (kill a process)
  - Complex semantic ambiguity
    - Homography
      En: The box was in the pen
      Sp: La caja estaba en el corral / *la pluma
      (corral: enclosure, pluma: writing pen)
    - Metonymy
      En: While driving, John swerved and hit a tree
      Sp: Mientras que John estaba manejando, se desvió y golpeó con un árbol
      (‘While John was driving, (itself) swerved and hit with a tree’)
  - Contextual ambiguity
    En: The computer outputs the data; it is fast
    Sp: La computadora imprime los datos; es rápida (es: singular)
    En: The computer outputs the data; it is stored in ASCII
    Sp: La computadora imprime los datos; están almacenados en ASCII (están: plural)
  - Complex contextual ambiguity
    En: John hit the dog with a stick
    Sp: John golpeó el perro con el palo / que tenía el palo
    (hit … with the stick / (the dog) that had a stick)
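The syntactic ambiguity example (I saw the man on the hill with the telescope) can be quantified with a small enumeration. This is a simplified sketch, not a parser, and it rests on two assumptions: every PP attaches either to one of the given heads (here the verb saw and the noun man) or to the noun of an earlier PP, and attachments may not cross. Under these assumptions 1, 2 and 3 PPs give 2, 5 and 14 readings, the start of the Catalan number sequence. The functions `attachments` and `describe` are hypothetical helpers.

```python
from itertools import product

def attachments(heads, n_pps):
    """Enumerate non-crossing PP attachments (simplified model).

    heads: candidate attachment sites before the first PP, treated as a flat
    list, e.g. ["saw", "man"]. The noun of each PP becomes a candidate site
    for the PPs that follow it. Returns tuples; entry i is the position of
    the head that PP i attaches to.
    """
    base = len(heads)
    readings = []
    # PP i sits at position base + i and may attach to any earlier position.
    for choice in product(*[range(base + i) for i in range(n_pps)]):
        arcs = [(h, base + i) for i, h in enumerate(choice)]
        # Arcs (h1, d1) and (h2, d2) with d1 < d2 cross if h1 < h2 < d1.
        crossing = any(h1 < h2 < d1
                       for (h1, d1) in arcs for (h2, d2) in arcs if d1 < d2)
        if not crossing:
            readings.append(choice)
    return readings

def describe(heads, pps, reading):
    """Render one reading as 'PP -> head word' pairs."""
    words = heads + [pp.split()[-1] for pp in pps]
    return "; ".join(f"{pp!r} -> {words[h]!r}" for pp, h in zip(pps, reading))

if __name__ == "__main__":
    heads = ["saw", "man"]
    pps = ["on the hill", "with the telescope"]
    for reading in attachments(heads, len(pps)):
        print(describe(heads, pps, reading))
```

The excluded configuration is “on the hill” attached to saw while “with the telescope” attaches to man, which would require a crossing structure. Since the ambiguity may carry over to the target language, an MT system does not necessarily have to pick one of these readings.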