
Machine Translation: An Overview. Marcello Federico, FBK, Trento



  1. Machine Translation: An Overview
     Marcello Federico, FBK, Trento, Italy, 2014
     M. Federico MT 2014

     Outline
     • Introduction
     • Motivation
     • Approaches
     • Brief history
     • Evaluation
     • State-of-the-art
     • Examples

     References:
     • P. Koehn, Statistical Machine Translation, Cambridge University Press, 2009.
     • A. Lopez, Statistical Machine Translation, ACM Computing Surveys, vol. 40, no. 3, 2008.
     • D. Jurafsky and J. H. Martin, Speech and Language Processing, Prentice Hall, 2009.
     • C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.

  2. Machine Translation

     Wikipedia: Machine translation, often referred to by the acronym MT, is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another.

     Personal definition: MT generally investigates the automatic translation of "standard" language that can be systematically observed in ordinary communication, e.g. conversations, news, speeches, business letters, user manuals. MT is generally not concerned with literary genres, nor with creative and sophisticated uses of language. For several reasons, this kind of language is simply outside the scope of MT.

     Note: For a very interesting introduction to issues related to the translation of literary works, see Umberto Eco, "Experiences in Translation", University of Toronto Press, 2001.

     Introduction to MT: Why is Machine Translation so Important?
     • Information society and production of multilingual content: 7 billion people, 193 countries, over 150 official languages
     • Globalization and demand for translation services: 1,000 global companies operating in at least 160 countries
     • Size of worldwide translation market: 12.5 billion $ per year ≈ 34 million $ per day
     • Size of translation industry: 3,000 translation companies, 250,000 translators
     • MT can improve productivity of human translators: integration of MT with human translation (post-editing)
     • MT can supply cheap gist translation: competitive quality-cost-speed trade-off

     Source: Common Sense Advisory, 2010

  3. Introduction to MT: Do we need more research in MT?
     [Images: Chinglish examples, some of which result from MT errors.]

  4. Introduction to MT: Why is Machine Translation so Difficult?

     High-quality human translation implies:
     • deep and rich understanding of the source language and text
     • sophisticated and creative command of the target language

     Nowadays, feasible goals for machine translation are tasks where:
     • even approximate translations are helpful (gist translation)
     • professional translators can take advantage of it (computer-assisted translation)
     • the linguistic domain is very focused and limited (apps for travelers)

     In general, the difficulty of translating depends on how similar the target and source languages are in their vocabulary, grammar, and conceptual structure.

     Applications of MT
     [Image: gist translation for social media.]

  5. Applications of MT
     [Image: speech translation app.]
     [Image: integration of MT into computer-assisted translation.]

  6. Differences and Similarities of Languages

     • Universal communicative role of language
       – names for people, words for talking about women, men, children
       – every language seems to have nouns and verbs
     • Differences/similarities across large classes of languages:
       – Morphology: one vs. many morphemes per word, agglutination vs. fusion
       – Syntax: Subject-Verb-Object structure (English) vs. SOV (Japanese) vs. VSO (Irish)
       – Semantics: mapping of semantic roles and meaning of words,
         e.g. direction/manner of motion indicated by verb/satellite in
         the bottle floated out (English) → la botella salió flotando (Spanish)
     • Lexical divergence between languages:
       – Semantic: there is no corresponding word with the same meaning
         wall (English) → Wand / Mauer (German, inside/outside)
       – Syntactic: a word is better translated into another part of speech
         she likes to sing (English, verb) → sie singt gerne (German, adverb)
     • Cultural differences: the philosophical argument asks whether translation is possible at all

     Lexical Divergence

     English          Japanese
     brother          otooto (younger), oniisan (older)
     is               iru (animate subject), aru (inanimate subject)

     English          French
     know             connaître (be acquainted with), savoir (know a proposition)
     they             ils (masculine), elles (feminine)

     German           English
     Berg             hill, mountain

     • some languages make distinctions that other languages don't
     • it is difficult to translate from less specific into more specific information
     • do language differences enforce different conceptual structures?
     • do people who speak different languages think differently?

     Note: Watch the talk by Lera Boroditsky (Stanford University), "How Language Shapes Thought", fora.tv.

  7. Approaches to MT

     Rough classification according to the linguistic representations employed:
     • Direct model: translate and re-order single words or n-grams
       – basically, no linguistic representation is used
     • Transfer model: use explicit knowledge about language differences
       – analyze the lexical and syntactic structure of the source sentence
       – transfer structures from source to target language
       – generate the corresponding sentence in the target language
     • Interlingua model: extract the meaning and express it in the target language
       – analyze the lexical, syntactic and semantic structure of the source sentence
       – interpret the meaning into a canonical interlingua
       – generate the target sentence from the interlingua

     Notice: the knowledge required by the interlingua approach grows linearly with the number of languages, rather than quadratically.

     Vauquois's Triangle
     [Figure: triangle with the source string and target string at the base, syntax and semantics as intermediate levels on each side, and the interlingua at the apex. Analysis runs up the source side and generation runs down the target side; direct translation links the two strings, and transfer links the intermediate levels.]
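
The note on linear versus quadratic growth can be made concrete with a small count. This is only a sketch under a simplifying assumption that is not in the slides: "required knowledge" is measured as the number of separately built modules, with one transfer module per ordered language pair, and one analyzer plus one generator per language for the interlingua. The function names are hypothetical.

```python
def transfer_modules(n_languages: int) -> int:
    """One transfer module per ordered (source, target) pair: n * (n - 1)."""
    return n_languages * (n_languages - 1)


def interlingua_modules(n_languages: int) -> int:
    """One analyzer into the interlingua and one generator out of it per language: 2 * n."""
    return 2 * n_languages


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(f"{n:>2} languages: transfer={transfer_modules(n):>3}  "
              f"interlingua={interlingua_modules(n):>2}")
```

Under this counting, 10 languages need 90 transfer modules but only 20 interlingua modules; the gap widens as languages are added.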

  8. Approaches to MT

     How are knowledge and linguistic information acquired by the system?
     • Hand-crafted: knowledge for analysis, transfer, generation, meaning representation, or direct translation is manually developed
       – most commercial MT systems fall into this category
       – requires lots of human labor and expertise
       – includes: rule-based MT
     • Machine-learned: representations are implemented by mathematical models learnable from data, e.g. parallel corpora of human translations
       – much less human effort is needed
       – requires huge amounts of data: the more, the better!
       – includes: statistical MT and example-based MT

     Transfer-Based MT

     Context-free grammar      Synchronous context-free grammar
     NP  → DT NPB              NP  → DT1 NPB2 / DT1 NPB2
     NPB → JJ NN               NPB → JJ1 NN2 / NN2 JJ1
     NPB → NN                  NPB → NN / NN
     ...                       ...
     DT  → the                 DT  → the / il
     JJ  → north               JJ  → north / settentrionale
     NN  → wind                NN  → wind / vento
     ...                       ...

     Source tree: (NP (DT the) (NPB (JJ north) (NN wind)))
     Target tree: (NP (DT il) (NPB (NN vento) (JJ settentrionale)))
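
The synchronous grammar above can be exercised with a few lines of code. The following is a minimal sketch, not the author's system: it assumes a hypothetical rule encoding in which integers on the target side point at linked items on the source side, and it uses a naive top-down search that only works because this toy grammar has no left recursion. Real systems use large sets of probabilistic rules and proper chart parsing.

```python
# Each rule: (left-hand side, source right-hand side, target right-hand side).
# An integer on the target side is a 0-based index into the source side and
# marks a linked nonterminal; a string on the target side is a target word.
RULES = [
    ("NP",  ("DT", "NPB"), (0, 1)),
    ("NPB", ("JJ", "NN"),  (1, 0)),   # adjective and noun swap order
    ("NPB", ("NN",),       (0,)),
    ("DT",  ("the",),      ("il",)),
    ("JJ",  ("north",),    ("settentrionale",)),
    ("NN",  ("wind",),     ("vento",)),
]
NONTERMINALS = {lhs for lhs, _, _ in RULES}


def derive(symbol, words, start):
    """Yield (end, target_words) for each way `symbol` derives words[start:end]."""
    for lhs, source_rhs, target_rhs in RULES:
        if lhs != symbol:
            continue
        # Each partial analysis is (next position, target yield of each source item).
        partial = [(start, [])]
        for item in source_rhs:
            extended = []
            for pos, yields in partial:
                if item in NONTERMINALS:
                    for end, out in derive(item, words, pos):
                        extended.append((end, yields + [out]))
                elif pos < len(words) and words[pos] == item:
                    extended.append((pos + 1, yields + [[item]]))
            partial = extended
        for end, yields in partial:
            target = []
            for t in target_rhs:
                if isinstance(t, int):
                    target.extend(yields[t])   # copy the linked constituent
                else:
                    target.append(t)           # emit a target-language word
            yield end, target


def translate(sentence, start_symbol="NP"):
    """Return the first target string covering the whole input, or None."""
    words = sentence.split()
    for end, target in derive(start_symbol, words, 0):
        if end == len(words):
            return " ".join(target)
    return None


if __name__ == "__main__":
    print(translate("the north wind"))  # il vento settentrionale
    print(translate("the wind"))        # il vento
```

The reordering lives entirely in the second rule: the source side lists JJ then NN, while the target side references them in the opposite order.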

  9. Transfer-Based MT (continued)

     Note: The grammar is a toy example. Working approaches use a very large set of probabilistic and lexicalized rules.

     Interlingua-Based MT
     • Applied to linguistic domains with a limited number of relations and concepts
       – tourist information, hotel booking, flight reservation, ...
     • The semantics of a sentence can be expressed with a predicate-argument structure
       – I need a twin bed room reservation for tomorrow
       – book-room(date=tomorrow,type=twin)
     • The interlingua language has to be designed carefully (by hand)
       – for some applications, a formalism similar to SQL
     • Processing steps in interlingua-based MT (IBMT):
       – extract content from the source sentence
       – map content into an SQL-like interlingua (IL) format
       – generate the translation from the IL format
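
The three processing steps of interlingua-based MT can be sketched for the hotel-booking example. Everything below is an illustrative assumption rather than anything from the slides: the keyword lexicons, the dictionary used as the interlingua frame, the function names, and the Italian output template are all hypothetical, and content extraction is reduced to plain keyword spotting.

```python
import re

# Hypothetical domain lexicons for a toy hotel-booking interlingua.
ROOM_TYPES = {"single", "twin"}
DATES = {"today", "tomorrow"}

# Hypothetical target-side (Italian) lexicon for generation.
ITALIAN_ROOM = {"single": "camera singola", "twin": "camera a due letti"}
ITALIAN_DATE = {"today": "oggi", "tomorrow": "domani"}


def extract_frame(sentence):
    """Steps 1 and 2: spot domain concepts and fill a book-room frame."""
    frame = {"predicate": "book-room"}
    for token in re.findall(r"[a-z]+", sentence.lower()):
        if token in ROOM_TYPES:
            frame["type"] = token
        elif token in DATES:
            frame["date"] = token
    if "type" not in frame or "date" not in frame:
        raise ValueError("sentence does not fill every slot of the frame")
    return frame


def render_frame(frame):
    """Show the frame in predicate-argument notation."""
    return f"{frame['predicate']}(date={frame['date']},type={frame['type']})"


def generate_italian(frame):
    """Step 3: generate the target sentence from the frame alone."""
    room = ITALIAN_ROOM[frame["type"]]
    date = ITALIAN_DATE[frame["date"]]
    return f"Vorrei prenotare una {room} per {date}."


if __name__ == "__main__":
    frame = extract_frame("I need a twin bed room reservation for tomorrow")
    print(render_frame(frame))
    print(generate_italian(frame))
```

Generation never looks at the source sentence, only at the frame, which is why adding a new target language in this design means adding one generator rather than one module per language pair.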
