Guy Dar, Machine Translation Seminar, Tel Aviv University, 2014
} Problems:
◦ Poor grammar.
◦ The distortion model is local. (An instance of the former.)
} Solution (?): an unsupervised syntax-based translation model.
} Which means: no predefined linguistic rules.
} The system learns from a bilingual corpus.
Mandarin (Chinese):
Aozhou    shi  yu    Bei Han      you   bangjiao              de    shaoshu  guojia     zhiyi
Australia is   with  North Korea  have  diplomatic relations  that  few      countries  one of
Correct translation: Australia is one of the few countries that have diplomatic relations with North Korea.
Note: the correct translation requires reversing 5 elements.
} Idea: translate 'linguistic' structures ("templates") into templates, not phrases into phrases.
} How? Rules! For example:
◦ [1] de [2] → the [2] that [1]
◦ [1] zhiyi → one of [1]
◦ yu [1] you [2] → have [2] with [1]
} We can apply rules recursively.
} This way we can derive the correct translation.
} Formal construction:
◦ Each rule has the following form: X → <α, γ, ~>, where X is a non-terminal (variable), α is a string in the source language, and γ is a string in the target language. Both strings consist of non-terminals and terminals, and ~ is a one-to-one correspondence between the non-terminals of α and the non-terminals of γ.
} In our model, we will use only two non-terminals: S and X.
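} To make the representation concrete, here is a minimal Python sketch; the Rule class and the tuple encoding of indexed non-terminals are my own illustration, not the paper's:

from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """A synchronous CFG rule X -> <alpha, gamma, ~>.

    Non-terminals are encoded as ("X", k) tuples; terminals are plain
    strings. The correspondence ~ is implicit: ("X", k) on the source
    side corresponds to ("X", k) on the target side.
    """
    lhs: str     # "S" or "X"
    src: tuple   # alpha: terminals and indexed non-terminals
    tgt: tuple   # gamma: the same non-terminal indices, possibly reordered

# The three example rules from the previous slide:
rules = [
    Rule("X", (("X", 1), "de", ("X", 2)), ("the", ("X", 2), "that", ("X", 1))),
    Rule("X", (("X", 1), "zhiyi"), ("one", "of", ("X", 1))),
    Rule("X", ("yu", ("X", 1), "you", ("X", 2)), ("have", ("X", 2), "with", ("X", 1))),
]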
} Our system will learn rules from the bilingual corpus only.
} The only rules we add manually are two glue rules:
◦ S → <S [1] X [2] , S [1] X [2]>
◦ S → <X [1] , X [1]>
<S [1] , S [1]>   (initial pair)
→ <S [2] X [3] , S [2] X [3]>   using S → <S [1] X [2] , S [1] X [2]>
→ <S [4] X [5] X [3] , S [4] X [5] X [3]>   using S → <S [1] X [2] , S [1] X [2]>
→ <X [6] X [5] X [3] , X [6] X [5] X [3]>   using S → <X [1] , X [1]>
→ <Aozhou X [5] X [3] , Australia X [5] X [3]>   using X → <Aozhou , Australia>
→ <Aozhou shi X [3] , Australia is X [3]>   using X → <shi , is>
→ <Aozhou shi X [7] zhiyi , Australia is one of X [7]>   using X → <X [1] zhiyi , one of X [1]>
→ <Aozhou shi X [8] de X [9] zhiyi , Australia is one of the X [9] that X [8]>   using X → <X [1] de X [2] , the X [2] that X [1]>
→ <Aozhou shi yu X [1] you X [2] de X [9] zhiyi , Australia is one of the X [9] that have X [2] with X [1]>   using X → <yu X [1] you X [2] , have X [2] with X [1]>
} Let us now return to our system.
} Every rule gets a weight (log-linear model):
w(X → <α, γ>) = ∏_i φ_i(X → <α, γ>)^λ_i
} The φ_i are called the features.
} The λ_i are the feature weights.
} In our design, we have the following features:
◦ P(γ | α) – the probability that α is translated as γ.
◦ P(α | γ) – the other way around.
◦ P_w(α | γ), P_w(γ | α) – lexical weights, which estimate how well the individual words are translated (based on the word alignment).
◦ Phrase penalty – a constant e = exp(1); we use it to penalize (or encourage?) long derivations.
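} As a sketch, the rule weight is computed from these features like so; the feature values and λs below are hypothetical placeholders, not the paper's (in the real system they come from corpus estimates and MERT tuning):

import math

features = {
    "p_tgt_given_src": 0.4,    # P(gamma | alpha)
    "p_src_given_tgt": 0.3,    # P(alpha | gamma)
    "lex_tgt_given_src": 0.2,  # P_w(gamma | alpha)
    "lex_src_given_tgt": 0.25, # P_w(alpha | gamma)
    "phrase_penalty": math.e,  # the constant exp(1)
}

lambdas = {
    "p_tgt_given_src": 1.0,
    "p_src_given_tgt": 1.0,
    "lex_tgt_given_src": 0.5,
    "lex_src_given_tgt": 0.5,
    "phrase_penalty": -0.5,    # a negative weight penalizes long derivations
}

def rule_weight(features, lambdas):
    # Log-linear model: w(r) = prod_i phi_i(r) ** lambda_i
    return math.prod(features[f] ** lambdas[f] for f in features)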
} Two special rules:
◦ w(S → <S [1] X [2] , S [1] X [2]>) = exp(-λ_g)
◦ w(S → <X [1] , X [1]>) = 1
} We also give weights to derivations (sequences of rules). For every derivation D:
w(D) = ∏_{r ∈ D} w(r) · p_lm(e)^λ_lm · exp(-λ_wp |e|)
where the product is over all rules used in D, p_lm is the language model, and exp(-λ_wp |e|) is the word penalty, which discourages the use of too many words (as opposed to the phrase penalty).
} Note: for things to go right, we must integrate the extra factors into the rule weights.
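} A minimal sketch of the derivation score, reusing rule_weight() and lambdas from the previous sketch; lm_prob and the λ values here are stand-ins of mine:

import math

def derivation_weight(rules_used, english_words, lm_prob,
                      lam_lm=1.0, lam_wp=0.5):
    # Product of the individual rule weights.
    w = math.prod(rule_weight(feats, lambdas) for feats in rules_used)
    # Language-model factor p_lm(e)^lambda_lm.
    w *= lm_prob(english_words) ** lam_lm
    # Word penalty exp(-lambda_wp * |e|): discourages overly long outputs.
    w *= math.exp(-lam_wp * len(english_words))
    return w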
} Input: a word-aligned bilingual corpus (many-to-many alignments).
} Objective: learn hierarchical rules.
} We are given a pair of word-aligned sentences <f, e, ~> (f for French, e for English, ~ is the word alignment).
} Big picture: first we extract initial phrase pairs, then we refine them into more "sophisticated" rules.
} An initial phrase pair is a pair <f', e'> such that:
◦ f' is a substring of f and e' is a substring of e (a substring must be contiguous, of the form str[i:j]; no 'holes' are allowed);
◦ all words in f' are aligned only to words in e';
◦ and vice versa: no word outside f' is aligned to a word in e'.
} Sound familiar? Philipp Koehn, http://www.statmt.org/book/slides/05-phrase-based-models.pdf
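} This is the standard phrase-pair consistency check; a Python sketch, assuming the alignment is a set of (i, j) links from f[i] to e[j] (0-based indices):

def is_initial_phrase_pair(alignment, f_start, f_end, e_start, e_end):
    """True iff f[f_start:f_end] and e[e_start:e_end] are consistent."""
    has_link = False
    for (i, j) in alignment:
        inside_f = f_start <= i < f_end
        inside_e = e_start <= j < e_end
        if inside_f != inside_e:  # a link crossing the phrase boundary
            return False
        if inside_f and inside_e:
            has_link = True
    return has_link               # at least one link must fall inside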
} Every initial phrase pair gives us a rule X → <f', e'>.
} Now, we construct new rules from existing ones:
◦ if X → <α, γ> is a rule,
◦ and there is an initial phrase pair <f', e'> such that α = α₁ f' α₂ and γ = γ₁ e' γ₂,
◦ then add the rule X → <α₁ X[k] α₂ , γ₁ X[k] γ₂> (see the sketch below).
} Practically, we use additional heuristics to make this procedure more efficient and less ambiguous.
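} A sketch of this 'subtraction' step, assuming rules are stored as tuples of tokens and the sub-phrase is replaced by an indexed non-terminal ("X", k) as in the earlier sketch:

def subtract(rule_src, rule_tgt, f_sub, e_sub, k):
    """Replace one occurrence of <f_sub, e_sub> inside <rule_src, rule_tgt>
    with the non-terminal ("X", k); returns None if either side is absent."""
    def find(seq, sub):
        for start in range(len(seq) - len(sub) + 1):
            if tuple(seq[start:start + len(sub)]) == tuple(sub):
                return start
        return None

    i = find(rule_src, f_sub)
    j = find(rule_tgt, e_sub)
    if i is None or j is None:
        return None
    new_src = rule_src[:i] + (("X", k),) + rule_src[i + len(f_sub):]
    new_tgt = rule_tgt[:j] + (("X", k),) + rule_tgt[j + len(e_sub):]
    return new_src, new_tgt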
} Our estimate will distribute weight equally among all initial phrase pairs;
} then, every initial phrase pair distributes its weight equally among all rules extracted from it.
} Now, we use this estimate to determine P(α | γ) and P(γ | α).
} Notice that we have yet to assign values to our feature weights.
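} A sketch of this heuristic count distribution, assuming `extracted` maps each initial phrase pair to the list of rules extracted from it (my own framing of the slide's description):

from collections import defaultdict

def estimate_counts(extracted):
    counts = defaultdict(float)
    n_pairs = len(extracted)
    for rules in extracted.values():
        weight = 1.0 / n_pairs                   # equal weight per initial phrase pair
        for rule in rules:
            counts[rule] += weight / len(rules)  # split equally among its rules
    return counts

# P(gamma | alpha) is then a rule's count divided by the total count of
# rules sharing the same source side alpha (and symmetrically for P(alpha | gamma)).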
} We are given a sentence f in the foreign language.
} We try to find the derivation with the best score whose French side is f:
argmax_D w(D)  s.t.  f(D) = f
◦ The English side of this derivation will be our translation of f.
} Our algorithm is basically a CKY parser:
◦ an algorithm that checks whether a string belongs to the language of a CFG;
◦ there is a CKY variant for weighted CFGs.
} Since we cannot try all options, we use pruning techniques. (Similar to what we saw in Koehn's chapter on decoding: http://www.statmt.org/book/slides/06-decoding.pdf)
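} For intuition, a minimal sketch of weighted CKY recognition for a monolingual CFG in Chomsky normal form; the real decoder parses with the synchronous grammar and intersects with the language model, which is considerably more involved:

from collections import defaultdict

def cky(words, lexical, binary, start="S"):
    """lexical: {(A, word): weight};  binary: {(A, B, C): weight} for A -> B C.
    Returns the best weight of a `start` parse covering the whole sentence."""
    n = len(words)
    chart = defaultdict(dict)            # chart[(i, j)][A] = best weight
    for i, w in enumerate(words):
        for (A, word), wt in lexical.items():
            if word == w and wt > chart[(i, i + 1)].get(A, 0.0):
                chart[(i, i + 1)][A] = wt
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):    # split point
                for (A, B, C), wt in binary.items():
                    if B in chart[(i, k)] and C in chart[(k, j)]:
                        cand = wt * chart[(i, k)][B] * chart[(k, j)][C]
                        if cand > chart[(i, j)].get(A, 0.0):
                            chart[(i, j)][A] = cand
    return chart[(0, n)].get(start, 0.0)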
} Constituent (linguistics) – a single unit within a hierarchical structure.
} We can factor a constituent feature into the weight of a derivation D:
c(i,j) = 1 if f[i:j] is a constituent, 0 otherwise
} For every rule r, f[i:j] is the slice of the French side that r is 'responsible for' (the [leaves of] the subtree derived from r).
} c(i,j) was learnt from the Penn Chinese Treebank (ver. 3).
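} One way such an indicator feature can multiply into a derivation's weight, sketched below; `constituents` (the set of spans a treebank-trained parser marks as constituents) and λ_c are assumptions of mine, not the paper's exact formulation:

import math

def constituent_factor(spans_used, constituents, lam_c=1.0):
    """spans_used: the (i, j) span of f that each rule in D covers."""
    c_total = sum(1 for span in spans_used if span in constituents)
    return math.exp(lam_c * c_total)  # rewards derivations that respect constituents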
} Languages: Mandarin to English.
} Models compared:
◦ Pharaoh (baseline)
◦ hierarchical model
◦ hierarchical model + constituent feature
} Training set:
◦ translation model – FBIS corpus (7.2M + 9.2M words)
◦ language model – English newswire text (155M words)
} Development set:
◦ 2002 NIST MT evaluation test set
} Test set:
◦ 2003 NIST MT evaluation test set
} Evaluation:
◦ BLEU
} Feature weights were tuned by running Minimum Error-Rate Training (MERT) on the development set.
} Tuning results
} The difference between the baseline and the hierarchical model is statistically significant.
} The new system improved on the state-of-the-art results (in 2005).
} The constituent feature improves results only slightly. (Statistically insignificant.)
} Further study suggests that increasing the maximum initial phrase length from 10 to 15 improves accuracy.
} David Chiang, A Hierarchical Phrase-Based Model for Statistical Machine Translation, http://www.aclweb.org/anthology/P05-1033
} Philipp Koehn, Statistical Machine Translation, http://www.statmt.org/book/
} Wikipedia:
◦ CYK algorithm [last modified Dec. 16, 2014], http://en.wikipedia.org/wiki/CYK_algorithm
◦ Constituent (linguistics) [last modified Nov. 17, 2014], http://en.wikipedia.org/wiki/Constituent_%28linguistics%29