Language and Computers
Machine Translation

Based on Dickinson, Brew, & Meurers (2013)

◮ Introduction
  ◮ Examples for Translations
◮ Background: Dictionaries
◮ Linguistic knowledge based systems
  ◮ Direct transfer systems
  ◮ Interlingua-based systems
◮ Machine learning based systems
  ◮ Alignment
  ◮ Statistical Modeling
  ◮ Phrase-Based Translation
◮ What makes MT hard?
◮ Evaluating MT systems
◮ References

What is Machine Translation?

Translation is the process of:
◮ moving texts from one (human) language (source language) to another (target language),
◮ in a way that preserves meaning.

Machine translation (MT) automates (part of) the process:
◮ Fully automatic translation
◮ Computer-aided (human) translation

What is MT good for?

◮ When you need the gist of something and there are no human translators around:
  ◮ translating e-mails & webpages
  ◮ obtaining information from sources in multiple languages (e.g., search engines)
◮ If you have a limited vocabulary and a small range of sentence types:
  ◮ translating weather reports
  ◮ translating technical manuals
  ◮ translating terms in scientific meetings
◮ If you want your human translators to focus on interesting/difficult sentences while avoiding lookup of unknown words and translation of mundane sentences.

Is MT needed?

◮ Translation is of immediate importance for multilingual countries (Canada, India, Switzerland, ...), international institutions (United Nations, International Monetary Fund, World Trade Organization, ...), and multinational or exporting companies.
◮ The European Union has 24 official languages (23 until Croatian was added in 2013). All EU laws and other documents have to be translated into all of these languages.

Example translations: The simple case

◮ It will help to look at a few examples of real translation before talking about how a machine does it.
◮ Take the simple Spanish sentence and its English translation below:

(1) (Yo)  hablo         español.
    I     speak.1st.sg  Spanish
    ‘I speak Spanish.’

◮ Words in this example pretty much translate one-for-one.
◮ But we have to make sure hablo matches Yo, i.e., that the form of the verb agrees with the subject.

Example translations: A slightly more complex case

The order and number of words can differ:

(2) a. ¿Tú   hablas        español?
       You   speak.2nd.sg  Spanish
       ‘Do you speak Spanish?’

    b. ¿Hablas        español?
       Speak.2nd.sg   Spanish
       ‘Do you speak Spanish?’

What goes into a translation

Some things to note about these examples, and thus what we might need to know to translate:
◮ Words have to be translated → dictionaries
◮ Words are grouped into meaningful units → syntax
◮ Word order can differ from language to language
◮ The forms of words within a sentence are systematic, e.g., verbs have to be conjugated, etc.

Different approaches to MT

We’ll look at some basic approaches to MT:
◮ Systems based on linguistic knowledge (Rule-Based MT (RBMT))
  ◮ Direct transfer systems
◮ Machine learning approaches, i.e., statistical machine translation (SMT)
  ◮ SMT is the most popular form of MT right now

Dictionaries

An MT dictionary differs from a “paper” dictionary:
◮ must be computer-usable (electronic form, indexed)
◮ needs to be able to handle various word inflections
◮ can contain (syntactic and semantic) restrictions that a word places on other words
  ◮ e.g., subcategorization information: give needs a giver, a person given to, and an object that is given
  ◮ e.g., selectional restrictions: if X eats, X must be animate
◮ contains frequency information
  ◮ for SMT, may be the only piece of additional information

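The properties listed above can be pictured as fields of a dictionary entry. The following is a minimal sketch, not the format of any real MT system: the class LexEntry, its field names, the feature labels, the Spanish translations, and the frequency values are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class LexEntry:
    """Hypothetical MT dictionary entry (all field names are assumptions)."""
    lemma: str
    translation: str
    forms: dict = field(default_factory=dict)    # inflected form -> features
    subcat: tuple = ()                           # roles the word requires
    selects: dict = field(default_factory=dict)  # role -> required semantic feature
    freq: float = 0.0                            # illustrative relative frequency

LEXICON = [
    LexEntry(
        lemma="give",
        translation="dar",
        forms={"give": "base", "gives": "3sg.pres", "gave": "past"},
        subcat=("giver", "recipient", "object"),
        freq=0.002,  # made-up number
    ),
    LexEntry(
        lemma="eat",
        translation="comer",
        forms={"eat": "base", "eats": "3sg.pres", "ate": "past"},
        subcat=("eater", "food"),
        selects={"eater": "animate"},  # if X eats, X must be animate
        freq=0.001,  # made-up number
    ),
]

# Index every inflected form so lookup does not have to scan the lexicon.
FORM_INDEX = {
    form: (entry, feats)
    for entry in LEXICON
    for form, feats in entry.forms.items()
}

def lookup(form):
    """Return (entry, features) for an inflected form, or None if unknown."""
    return FORM_INDEX.get(form.lower())

def satisfies(entry, role, features):
    """Check a selectional restriction: does a filler with these features fit the role?"""
    required = entry.selects.get(role)
    return required is None or required in features
```

For example, lookup("gives") finds the entry for give together with the features of that inflected form, and satisfies(entry, "eater", {"inanimate"}) rejects an inanimate eater for eat.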
Direct transfer systems

A direct transfer system consists of:
◮ A source language grammar
◮ A target language grammar
◮ Rules relating source language underlying representation (UR) to target language UR
  ◮ A direct transfer system has a transfer component which relates a source language representation to a target language representation.
  ◮ This can also be called a comparative grammar.

We’ll walk through the following French to English example:

(3) Londres  plaît        à   Sam.
    London   is pleasing  to  Sam
    ‘Sam likes London.’

Steps in a transfer system

1. The source language grammar analyzes the input and puts it into an underlying representation (UR).
     Londres plaît à Sam → Londres plaire Sam (source UR)
2. The transfer component relates this source language UR (French UR) to a target language UR (English UR).
     French UR      English UR
     X plaire Y  ↔  Eng(Y) like Eng(X)
     (where Eng(X) means the English translation of X)
     Londres plaire Sam (source UR) → Sam like London (target UR)
3. The target language grammar translates the target language UR into an actual target language sentence.
     Sam like London → Sam likes London

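The three steps can be sketched as a tiny pipeline. This is a toy illustration under strong assumptions, not an actual transfer system: the function names (analyze, transfer, generate, translate), the lookup tables, and the flat word-list UR are all hypothetical, and the generation step simply assumes a third-person singular subject.

```python
# Toy tables covering only the example sentence (all hypothetical).
FR_LEMMAS = {"plaît": "plaire"}             # inflected French form -> lemma
FR_FUNCTION_WORDS = {"à"}                   # words dropped from the UR
FR_EN = {"Londres": "London", "Sam": "Sam"}  # bilingual dictionary

def analyze(sentence):
    """Step 1: source grammar maps a French sentence to a source UR."""
    tokens = sentence.rstrip(".").split()
    return [FR_LEMMAS.get(t, t) for t in tokens if t not in FR_FUNCTION_WORDS]

def eng(word):
    """Eng(X): the English translation of X (unknown words pass through)."""
    return FR_EN.get(word, word)

def transfer(source_ur):
    """Step 2: apply the rule  X plaire Y  ->  Eng(Y) like Eng(X)."""
    if len(source_ur) == 3 and source_ur[1] == "plaire":
        x, _, y = source_ur
        return [eng(y), "like", eng(x)]
    raise ValueError(f"no transfer rule for {source_ur}")

def generate(target_ur):
    """Step 3: target grammar produces a sentence.

    Assumes a third-person singular subject, so the verb takes -s.
    """
    subject, verb, obj = target_ur
    return f"{subject} {verb}s {obj}."

def translate(sentence):
    return generate(transfer(analyze(sentence)))

print(translate("Londres plaît à Sam."))  # Sam likes London.
```

Keeping agreement (like → likes) in generate rather than in transfer mirrors the division of labor on the slide: the transfer rule only swaps and translates the arguments, and the English grammar takes care of English morphology.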
Notes on transfer systems

◮ The transfer mechanism is in theory reversible; e.g., the plaire rule works in both directions
  ◮ Not clear if this is desirable: e.g., Dutch aanvangen should be translated into English as begin, but begin should be translated as beginnen.
◮ Because we have a separate target language grammar, we are able to ensure that the rules of English apply; like → likes.
◮ RBMT systems are still in use today, especially for more exotic language pairs

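The reversibility problem can be made concrete with a small sketch. The table names below are hypothetical, and the beginnen → begin entry is an assumption added so that the reversed table has two candidates; the slide itself only states the aanvangen → begin and begin → beginnen directions.

```python
from collections import defaultdict

# Dutch -> English lexical transfer rules (illustrative).
NL_EN = {"aanvangen": "begin", "beginnen": "begin"}

# The preferred English -> Dutch choice has to be stated separately.
EN_NL_PREFERRED = {"begin": "beginnen"}

def reverse_candidates(rules):
    """Run the rules backwards: each target word maps to all of its sources."""
    reversed_rules = defaultdict(list)
    for source, target in rules.items():
        reversed_rules[target].append(source)
    return dict(reversed_rules)

candidates = reverse_candidates(NL_EN)
round_trip = EN_NL_PREFERRED[NL_EN["aanvangen"]]

print(candidates)   # {'begin': ['aanvangen', 'beginnen']}
print(round_trip)   # beginnen
```

Mechanically reversing the Dutch-to-English rules leaves two candidates for begin, so the English-to-Dutch direction needs its own preference, and translating aanvangen to English and back does not return the original word.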