Statistical Machine Translation
Graham Neubig
Nara Institute of Science and Technology (NAIST)
10/23/2012
Machine Translation
● Automatically translate between languages
Source: 太郎が花子を訪問した。 → Target: Taro visited Hanako.
● Real products/services being created!
NAIST Travel Conversation Translation System (@AHC Lab)
How does machine translation work?
Today I will give a lecture on machine translation .
How does machine translation work?
● Divide the sentence into translatable patterns, reorder, combine
Today → 今日は、 / I will give → を行います / a lecture on → の講義 / machine translation → 機械翻訳 / . → 。
Reordered: 今日は、 機械翻訳 の講義 を行います 。
Combined: 今日は、機械翻訳の講義を行います。
Problem
● There are millions of possible translations!
花子 が 太郎 に 会った
→ Hanako met Taro
→ Hanako met to Taro
→ Hanako ran in to Taro
→ Taro met Hanako
→ The Hanako met the Taro
● How do we tell which is better?
Statistical Machine Translation
● Translation model:
P(“今日” | “today”) = high
P(“今日 は 、” | “today”) = medium
P(“昨日” | “today”) = low
● Reordering model:
P(“鶏 を 食べる” → “eats chicken”) = high
P(“鶏 が 食べる” → “chicken eats”) = high
P(“鶏 が 食べる” → “eats chicken”) = low
● Language model:
P(“Taro met Hanako”) = high
P(“the Taro met the Hanako”) = low
Creating a Machine Translation System
● Learn patterns from documents
Parallel documents → Translation Model, Reordering Model:
太郎が花子を訪問した。 / Taro visited Hanako.
花子にプレゼントを渡した。 / He gave Hanako a present.
...
Monolingual text → Language Model:
United Nations text (English/French/Chinese/Arabic ...), Yomiuri Shimbun and Wikipedia text (Japanese/English)
How Do We Learn Patterns?
● For example, we go to an Italian restaurant w/ a Japanese menu
チーズムース / Mousse di formaggi
タリアテッレ 4種のチーズソース / Tagliatelle al 4 formaggi
本日の鮮魚 / Pesce del giorno
鮮魚のソテー お米とグリーンピース添え / Filetto di pesce su “Risi e Bisi”
ドルチェとチーズ / Dolce e Formaggi
● Try to find the patterns!
Steps in Training a Phrase-based SMT System
● Collecting Data
● Tokenization
● Language Modeling
● Alignment
● Phrase Extraction/Scoring
● Reordering Models
● Decoding
● Evaluation
● Tuning
Collecting Data
● Sentence-parallel data
● Used in: translation model / reordering model
これはペンです。 / This is a pen.
昨日は友達と食べた。 / I ate with my friend yesterday.
象は鼻が長い。 / Elephants' trunks are long.
● Monolingual data (in the target language)
● Used in: language model
This is a pen.
I ate with my friend yesterday.
Elephants' trunks are long.
Good Data is
● Big!
[Figure: translation accuracy rises with LM data size (million words); Brants 2007]
● Clean
● In the same domain as the test data
Collecting Data
● High-quality parallel data from:
● Government organizations
● Newspapers
● Patents
● Crawl the web
● Merge several data sources
Finding Data on the Web
● Find bilingual pages [Resnik 03]
[Image: Mainichi Shimbun]
Finding Data on the Web
● Find bilingual pages [Resnik 03]
● Sentence alignment [Moore 02]
Question 1:
● Write down three candidate sources of parallel data in English-Japanese, or another language pair you are familiar with.
● They should all be of different genres.
Tokenization
● Example: divide Japanese into words
太郎が花子を訪問した。 → 太郎 が 花子 を 訪問 した 。
● Example: make English lowercase, split punctuation
Taro visited Hanako. → taro visited hanako .
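The English side of this step can be sketched in a few lines of Python (a minimal illustration; real systems use dedicated tokenizers, and the function name here is just for this example):

```python
import re

def tokenize_english(text):
    """Lowercase the text and split punctuation off words (minimal sketch)."""
    text = text.lower()
    # Put spaces around punctuation so each mark becomes its own token
    text = re.sub(r"([.,!?;:])", r" \1 ", text)
    return text.split()

print(tokenize_english("Taro visited Hanako."))
# ['taro', 'visited', 'hanako', '.']
```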
Tokenization is Important!
● Just right: can translate properly
太郎 が / 太郎 を → taro ○
● Too long: cannot translate if not in the training data
太郎が in data → taro ○ / 太郎を not in data → ☓
● Too short: may mistranslate
太 郎 が / 太 郎 を → fat ro ☓ (太 alone can translate as “fat”)
Language Modeling
● Assign a probability to each sentence
E1: Taro visited Hanako → P(E1)
E2: the Taro visited the Hanako → P(E2)
E3: Taro visited the bibliography → P(E3)
● More fluent sentences get higher probability
P(E1) > P(E2), P(E1) > P(E3)
n-gram Models
● We want the probability P(W = “Taro visited Hanako”)
● An n-gram model calculates it one word at a time
● Condition on the n-1 previous words, e.g. a 2-gram model:
P(w1=“Taro”)
* P(w2=“visited” | w1=“Taro”)
* P(w3=“Hanako” | w2=“visited”)
* P(w4=“</s>” | w3=“Hanako”)
NOTE: </s> is the sentence-ending symbol
Calculating n-gram Models
● n-gram models are estimated from data:
P(w_i | w_{i-n+1} … w_{i-1}) = c(w_{i-n+1} … w_i) / c(w_{i-n+1} … w_{i-1})
i live in osaka . </s>
i am a graduate student . </s>
my school is in nara . </s>
n=2 →
P(osaka | in) = c(in osaka)/c(in) = 1 / 2 = 0.5
P(nara | in) = c(in nara)/c(in) = 1 / 2 = 0.5
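The counting above can be sketched as follows (a minimal illustration on the three-sentence corpus from the slide; real language models also add smoothing so unseen n-grams do not get zero probability):

```python
from collections import Counter

def train_bigram(corpus):
    """Count 1-grams and 2-grams, appending the sentence-end symbol </s>."""
    uni, bi = Counter(), Counter()
    for sent in corpus:
        words = sent.split() + ["</s>"]
        uni.update(words)
        bi.update(zip(words, words[1:]))  # adjacent word pairs
    return uni, bi

corpus = ["i live in osaka .",
          "i am a graduate student .",
          "my school is in nara ."]
uni, bi = train_bigram(corpus)

def p(w, prev):
    """P(w | prev) = c(prev w) / c(prev)"""
    return bi[(prev, w)] / uni[prev]

print(p("osaka", "in"))  # 0.5
print(p("nara", "in"))   # 0.5
```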
Question 2:
● Calculate the 2-gram probabilities of the n-grams on the worksheet.
Alignment
● Find which words correspond to each other
太郎 が 花子 を 訪問 した 。 ↔ taro visited hanako .
● Done automatically with probabilistic methods
P(花子|hanako) = 0.99
P(太郎|taro) = 0.97
P(visited|訪問) = 0.46
P(visited|した) = 0.04
P(花子|taro) = 0.0001
IBM/HMM Models
● One-to-many alignment models, trained in each direction:
ホテル の 受付 → the hotel front desk
the hotel front desk → ホテル の 受付
● IBM Model 1: no structure (“bag of words”)
● IBM Models 2-5, HMM: add more structure
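IBM Model 1 can be trained with a short EM loop; the sketch below is a simplified illustration on a tiny toy corpus (the function name is illustrative, and real toolkits add the extra structure of Models 2-5 and the HMM):

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=20):
    """EM training of word translation probabilities t(f|e) (IBM Model 1).
    pairs: list of (source_words, target_words) tuples."""
    t = defaultdict(lambda: 1.0)  # uniform start; only co-occurring pairs get updated
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for f_sent, e_sent in pairs:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)  # normalizer over alignments of f
                for e in e_sent:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():  # M-step: t(f|e) = c(f,e) / c(e)
            t[(f, e)] = c / total[e]
    return t

pairs = [("ホテル の 受付".split(), "the hotel front desk".split()),
         ("ホテル".split(), "hotel".split())]
t = ibm_model1(pairs)
# t[("ホテル", "hotel")] ends up much larger than t[("の", "hotel")]
```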
Combining One-to-Many Alignments
● Combine the two directional alignments of ホテル の 受付 ↔ the hotel front desk into one alignment
● Several different heuristics
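One such heuristic starts from the intersection of the two directional alignments and grows it toward their union by adding neighboring links; the sketch below is a simplified version of that idea (the function name is illustrative):

```python
def symmetrize(f2e, e2f):
    """Combine two directional alignments (sets of (f, e) links):
    keep the intersection, then grow with union links that neighbor
    an already-accepted link (simplified grow heuristic)."""
    aligned = f2e & e2f          # high-precision starting point
    union = f2e | e2f
    added = True
    while added:
        added = False
        for (f, e) in sorted(union - aligned):
            # accept a link if any of its 8 neighbors is already accepted
            if any((f + df, e + de) in aligned
                   for df in (-1, 0, 1) for de in (-1, 0, 1)
                   if (df, de) != (0, 0)):
                aligned.add((f, e))
                added = True
    return aligned

f2e = {(0, 0), (1, 1)}   # links found translating f → e
e2f = {(0, 0), (1, 2)}   # links found translating e → f
print(symmetrize(f2e, e2f))
```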
Phrase Extraction
● Use alignments to find phrase pairs
ホテル の → hotel
ホテル の → the hotel
受付 → front desk
ホテルの受付 → hotel front desk
ホテルの受付 → the hotel front desk
Phrase Extraction Criterion
● A phrase pair must have:
● 1) at least one alignment inside the phrase
● 2) no alignments outside the phrase in the same row/column
e.g. “ホテル の → the hotel” is OK: ホテル-hotel is aligned inside, and the unaligned “の” has no alignments outside the phrase
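These two criteria can be sketched as a small extraction loop over source spans (a simplified illustration; a full extractor, such as the one in Moses, also extends phrases over unaligned boundary words, which is how “ホテル の → the hotel” is obtained):

```python
def extract_phrases(f_len, alignment, max_len=4):
    """Extract phrase pairs (source span, target span) consistent with
    a word alignment given as a set of (f_index, e_index) links."""
    phrases = []
    for f1 in range(f_len):
        for f2 in range(f1, min(f1 + max_len, f_len)):
            # target positions linked to the source span [f1, f2]
            e_links = [e for (f, e) in alignment if f1 <= f <= f2]
            if not e_links:
                continue                      # criterion 1: a link inside
            e1, e2 = min(e_links), max(e_links)
            if e2 - e1 + 1 > max_len:
                continue
            # criterion 2: no link from rows [e1, e2] leaves the source span
            if all(f1 <= f <= f2 for (f, e) in alignment if e1 <= e <= e2):
                phrases.append(((f1, f2), (e1, e2)))
    return phrases

# ホテル の 受付 ↔ the hotel front desk; links: ホテル-hotel, 受付-front, 受付-desk
alignment = {(0, 1), (2, 2), (2, 3)}
for spans in extract_phrases(3, alignment):
    print(spans)
```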
Statistical Machine Translation Question 3: ● Given the alignments on the work sheet, which phrases will be extracted by the machine translation system? 28
Phrase Scoring
● Calculate 5 standard features
● Phrase translation probabilities:
P(f|e) = c(f, e) / c(e)
P(e|f) = c(f, e) / c(f)
e.g. c(ホテル の, the hotel) / c(the hotel)
● Lexical translation probabilities (both directions):
– Use word-based translation probabilities (IBM Model 1)
– Helps with sparsity
P(f̄|ē) = Π_{f in f̄} (1/|ē|) Σ_{e in ē} P(f|e)
e.g. (P(ホテル|the) + P(ホテル|hotel))/2 * (P(の|the) + P(の|hotel))/2
● Phrase penalty: 1 for each phrase
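The two phrase translation probabilities can be computed directly from extracted phrase-pair counts (a minimal sketch on made-up counts; the names are illustrative):

```python
from collections import Counter

def score_phrases(phrase_pairs):
    """Compute P(f|e) = c(f,e)/c(e) and P(e|f) = c(f,e)/c(f)
    from a list of extracted (f, e) phrase pairs."""
    c_fe, c_f, c_e = Counter(), Counter(), Counter()
    for f, e in phrase_pairs:
        c_fe[(f, e)] += 1
        c_f[f] += 1
        c_e[e] += 1
    p_f_given_e = {(f, e): c / c_e[e] for (f, e), c in c_fe.items()}
    p_e_given_f = {(f, e): c / c_f[f] for (f, e), c in c_fe.items()}
    return p_f_given_e, p_e_given_f

# toy counts: "ホテル の" extracted twice with "the hotel", once with "hotel"
pairs = [("ホテル の", "the hotel"), ("ホテル の", "the hotel"), ("ホテル の", "hotel")]
p_fe, p_ef = score_phrases(pairs)
print(p_fe[("ホテル の", "the hotel")])  # 1.0  (= 2/2)
print(p_ef[("ホテル の", "the hotel")])  # 0.666... (= 2/3)
```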
Lexicalized Reordering
● Probability of monotone, swap, discontinuous orderings
細い 男 が 太郎 を 訪問 した → the thin man visited Taro
細い → the thin: high monotone probability
太郎 を → Taro: high swap probability
● Conditioning on input/output, left/right, or both
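The orientation of each phrase pair can be classified by comparing the source spans of phrases that are adjacent on the target side (a simplified sketch; real models then condition these orientation counts on the phrases themselves):

```python
def orientation(prev_f_span, cur_f_span):
    """Classify the ordering of two phrases adjacent in target order,
    given their (start, end) source spans."""
    if cur_f_span[0] == prev_f_span[1] + 1:
        return "monotone"       # current phrase directly follows in the source
    if cur_f_span[1] == prev_f_span[0] - 1:
        return "swap"           # current phrase directly precedes in the source
    return "discontinuous"      # anything else

print(orientation((0, 1), (2, 3)))  # monotone
print(orientation((2, 3), (0, 1)))  # swap
print(orientation((0, 1), (4, 5)))  # discontinuous
```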