Tuning

Philipp Koehn
presented by Gaurav Kumar

28 September 2017
The Story so Far: Generative Models

• The definition of translation probability follows a mathematical derivation

  $\operatorname{argmax}_e p(e|f) = \operatorname{argmax}_e p(f|e)\, p(e)$

• Occasionally, some independence assumptions are thrown in, for instance IBM Model 1: word translations are independent of each other

  $p(e|f,a) = \frac{1}{Z} \prod_i p(e_i | f_{a(i)})$

• Generative story leads to straightforward estimation
  – maximum likelihood estimation of component probability distributions
  – EM algorithm for discovering hidden variables (alignment)
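To make the Model 1 formula concrete, here is a minimal sketch (not from the slides) that evaluates $p(e|f,a)$ for a toy word-translation table; the table entries, the floor value for unseen pairs, and the function name are all hypothetical.

```python
# Toy word-translation table t(e_i | f_j); values are made up for illustration.
t = {
    ("the", "das"): 0.9,
    ("house", "haus"): 0.8,
    ("home", "haus"): 0.2,
}

def model1_prob(e, f, a, Z=1.0):
    """p(e | f, a) = (1/Z) * prod_i p(e_i | f_{a(i)}), as on the slide."""
    prob = 1.0 / Z
    for i, e_word in enumerate(e):
        prob *= t.get((e_word, f[a[i]]), 1e-9)  # tiny floor for unseen pairs
    return prob

# English "the house" aligned to German "das haus": a(0)=0, a(1)=1
print(model1_prob(["the", "house"], ["das", "haus"], [0, 1]))  # 0.72
```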
Log-linear Models

• IBM Models provided mathematical justification for multiplying components

  $p_{LM} \times p_{TM} \times p_D$

• These may be weighted

  $p_{LM}^{\lambda_{LM}} \times p_{TM}^{\lambda_{TM}} \times p_D^{\lambda_D}$

• Many components $p_i$ with weights $\lambda_i$

  $\prod_i p_i^{\lambda_i}$

• We typically operate in log space

  $\log \prod_i p_i^{\lambda_i} = \sum_i \lambda_i \log(p_i)$
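A small sketch of the log-space computation, with made-up component probabilities and weights; the second print checks that summing weighted log probabilities gives the same value as the log of the weighted product.

```python
import math

def loglinear_score(features, weights):
    """Weighted log-linear model score: sum_i lambda_i * log(p_i)."""
    return sum(weights[name] * math.log(p) for name, p in features.items())

# Illustrative component probabilities and weights (values are made up).
features = {"lm": 0.01, "tm": 0.05, "d": 0.2}
weights = {"lm": 0.5, "tm": 1.0, "d": 0.3}

print(loglinear_score(features, weights))
# Same value as the log of the weighted product, but numerically stabler:
print(math.log(math.prod(p ** weights[n] for n, p in features.items())))
```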
Knowledge Sources

• Many different knowledge sources useful
  – language model
  – reordering (distortion) model
  – phrase translation model
  – word translation model
  – word count
  – phrase count
  – character count
  – drop word feature
  – phrase pair frequency
  – additional language models

• Could be any function $h(e, f, a)$

  $h(e,f,a) = \begin{cases} 1 & \text{if } \exists\, e_i \in e \text{ such that } e_i \text{ is a verb} \\ 0 & \text{otherwise} \end{cases}$
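As an illustration of such an arbitrary feature function, here is a sketch of the verb-existence feature above; the tiny hand-made verb set is a hypothetical stand-in for a real part-of-speech tagger.

```python
# Hypothetical stand-in for a POS tagger: a tiny hand-made verb set.
VERBS = {"go", "goes", "is", "does"}

def h_contains_verb(e, f, a):
    """h(e, f, a) = 1 if some e_i in e is a verb, else 0."""
    return 1 if any(word in VERBS for word in e) else 0

print(h_contains_verb(["he", "goes", "home"], ["er", "geht", "heim"], [0, 1, 2]))  # 1
```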
Set Feature Weights

• Contribution of components $p_i$ determined by weight $\lambda_i$

• Methods
  – manual setting of weights: try a few, take the best
  – automate this process

• Learn weights
  – set aside a development corpus
  – set the weights so that optimal translation performance is achieved on this development corpus
  – requires an automatic scoring method (e.g., BLEU)
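A naive sketch of automating "try a few weight settings, take the best" by random search over a development set. This is not how real systems tune (they use MERT or MIRA, covered later); the `bleu` function below is a crude unigram-precision placeholder, only there to keep the sketch self-contained.

```python
import random

def bleu(hypotheses, references):
    """Placeholder for a real BLEU scorer: crude unigram-precision proxy."""
    matches = sum(len(set(h.split()) & set(r.split()))
                  for h, r in zip(hypotheses, references))
    total = sum(len(h.split()) for h in hypotheses)
    return matches / max(total, 1)

def tune_weights(nbest_lists, references, feature_names, trials=100):
    """Random-search weight tuning on a development corpus.

    nbest_lists: one list of (translation, feature_dict) per dev sentence.
    """
    best_w, best_score = None, float("-inf")
    for _ in range(trials):
        w = {name: random.uniform(-1, 1) for name in feature_names}
        # Pick the highest-scoring candidate per sentence under these weights.
        hyps = [max(cands,
                    key=lambda c: sum(w[k] * v for k, v in c[1].items()))[0]
                for cands in nbest_lists]
        score = bleu(hyps, references)
        if score > best_score:
            best_w, best_score = w, score
    return best_w
```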
Discriminative vs. Generative Models

• Generative models
  – translation process is broken down into steps
  – each step is modeled by a probability distribution
  – each probability distribution is estimated from data by maximum likelihood

• Discriminative models
  – the model consists of a number of features (e.g., the language model score)
  – each feature has a weight, measuring its value for judging a translation as correct
  – feature weights are optimized on development data, so that the system output matches correct translations as closely as possible
Overview

• Generate a set of possible translations of a sentence (candidate translations)

• Each candidate translation is represented by a set of features

• Each feature derives from one property of the translation
  – feature score: value of the property (e.g., language model probability)
  – feature weight: importance of the feature (e.g., the language model feature is more important than the word count feature)

• Task of discriminative training: find good feature weights

• Highest scoring candidate is the best translation according to the model
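A minimal sketch of this model-based selection: score each candidate as the weighted sum of its feature scores and take the argmax. All feature values and weights below are made up for illustration.

```python
def best_candidate(candidates, weights):
    """Pick the highest-scoring candidate translation under the model.

    candidates: list of (translation, feature_scores) pairs, where
    feature_scores maps feature names to (log-)scores.
    """
    def model_score(cand):
        _, feats = cand
        return sum(weights[name] * score for name, score in feats.items())
    return max(candidates, key=model_score)

candidates = [
    ("he does not go home", {"lm": -2.1, "tm": -1.5, "wc": 5}),
    ("he goes not home",    {"lm": -3.0, "tm": -1.2, "wc": 4}),
]
weights = {"lm": 1.0, "tm": 1.0, "wc": -0.1}
print(best_candidate(candidates, weights)[0])  # "he does not go home"
```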
Discriminative Training Approaches

• Reranking: 2-pass approach
  – first pass: run decoder to generate a set of candidate translations
  – second pass: add features, rescore translations

• Tuning
  – integrate all features into the decoder
  – learn feature weights that lead the decoder to the best translation

• Large-scale discriminative training (next lecture)
  – thousands or millions of features
  – optimization over the entire training corpus
  – requires different training methods
Finding Candidate Translations
Finding Candidate Translations

• Number of possible translations is exponential in sentence length

• But: we are mainly interested in the most likely ones

• Recall: decoding
  – do not list all possible translations
  – beam search for the best one
  – dynamic programming and pruning

• How can we find the set of best translations?
Search Graph

[Figure: search graph expanding partial translations ("he", "it", "are" → "goes", "does not" → "go", "home", "to house", "house", ...), each hypothesis annotated with its path score p, e.g. p: -4.182]

• Decoding explores the space of possible translations by expanding the most promising partial translations
⇒ Search graph
Search Graph

[Figure: the same search graph, now also showing the transitions kept from hypothesis recombination]

• Keep transitions from recombinations
  – without: total number of paths = number of full translation hypotheses
  – with: combinatorial expansion

• Example
  – without: 4 full translation hypotheses
  – with: 10 different full paths

• Typically many more paths due to recombination
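To see the combinatorial expansion concretely, here is a small sketch (over a hypothetical toy graph, not the one in the figure) that counts full paths by dynamic programming; recombination is what adds the extra incoming edges, and each such edge multiplies the path count.

```python
from functools import lru_cache

# Hypothetical toy search graph: node -> successors. Recombination adds the
# extra incoming edges (both "he" and "it" lead into the same "does_not").
GRAPH = {
    "start": ["he", "it"],
    "he": ["does_not", "goes_not"],
    "it": ["does_not", "goes_not"],
    "does_not": ["go_home", "go_to_house"],
    "goes_not": ["home", "to_house"],
    "go_home": [], "go_to_house": [], "home": [], "to_house": [],
}

@lru_cache(maxsize=None)
def count_paths(node):
    """Count distinct full paths from `node` by dynamic programming."""
    successors = GRAPH[node]
    if not successors:
        return 1  # a final hypothesis covering all input words
    return sum(count_paths(succ) for succ in successors)

print(count_paths("start"))  # 8 full paths through this toy graph
```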
Word Lattice

[Figure: the search graph drawn as a word lattice, from start state <s> through phrase transitions such as "he", "it", "are", "does not", "goes not", "go home", "to house", each transition labeled with the score it adds, e.g. -0.556, -1.108, -0.484]

• Search graph as finite state machine
  – states: partial translations
  – transitions: applications of phrase translations
  – weights: scores added by the phrase translation
Finite State Machine

• Formally, a finite state machine is a quintuple $(\Sigma, S, s_0, \delta, F)$, where
  – $\Sigma$ is the alphabet of output symbols (in our case, the emitted phrases)
  – $S$ is a finite set of states
  – $s_0 \in S$ is the initial state (in our case, the initial hypothesis)
  – $\delta$ is the state transition function $\delta: S \times \Sigma \to S$
  – $F$ is the set of final states (in our case, hypotheses that have covered all input words)

• Weighted finite state machine
  – scores for emissions from each transition: $\pi: S \times \Sigma \times S \to \mathbb{R}$
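A sketch of this weighted FSM view, with $\delta$ and $\pi$ folded into a single transition map; the state names and scores are made up, though the example path is chosen to sum to -4.182, the best score in the n-best list on the next slide.

```python
from dataclasses import dataclass, field

@dataclass
class WeightedFSM:
    s0: str          # initial state
    finals: set      # final states F
    # delta and pi folded together: (state, phrase) -> (next_state, score)
    transitions: dict = field(default_factory=dict)

    def score_path(self, phrases):
        """Sum transition scores along a phrase sequence; None if invalid."""
        state, total = self.s0, 0.0
        for phrase in phrases:
            if (state, phrase) not in self.transitions:
                return None
            state, score = self.transitions[(state, phrase)]
            total += score
        return total if state in self.finals else None

fsm = WeightedFSM(
    s0="<s>",
    finals={"f"},
    transitions={
        ("<s>", "he"): ("q1", -0.556),
        ("q1", "does not"): ("q2", -1.108),
        ("q2", "go home"): ("f", -2.518),
    },
)
print(fsm.score_path(["he", "does not", "go home"]))  # -4.182
```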
N-Best List

rank  score   sentence
 1    -4.182  he does not go home
 2    -4.334  he does not go to house
 3    -4.672  he goes not to house
 4    -4.715  it goes not to house
 5    -5.012  he goes not home
 6    -5.055  it goes not home
 7    -5.247  it does not go home
 8    -5.399  it does not go to house
 9    -5.912  he does not to go house
10    -6.977  it does not to go house

• Word graph may be too complex for some methods
⇒ Extract n best translations
Computing N-Best Lists

[Figure: the word lattice with back transitions marked along the best path and detour transitions annotated with their costs, e.g. -0.338, -0.484, -1.065, -1.730]

• Represent the graph with back transitions
• Include "detours" with their cost
Path 1

[Figure: best path through the lattice: <s> → he → does not → go home, with the detour transitions branching off it]

• Follow back transitions
⇒ Best path: he does not go home

• Keep note of detours from this path

Base path  Base cost  Detour cost  Detour state
final      -0         -0.152       to house
final      -0         -0.830       not home
final      -0         -1.065       does not
final      -0         -1.730       go house
Path 2

[Figure: second-best path: <s> → he → does not → go → to house, reached via the cheapest detour]

• Take the cheapest detour
• Afterwards, follow back transitions
• Second best path: he does not go to house
• Add its detours to the priority queue

Base path  Base cost  Detour cost  Detour state
to house   -0.152     -0.338       goes not
final      -0         -0.830       not home
final      -0         -1.065       does not
to house   -0.152     -1.065       it
final      -0         -1.730       go house
Path 3

[Figure: third-best path: <s> → he → goes not → to house]

• Third best path: he goes not to house
• Add its detours to the priority queue

Base path            Base cost  Detour cost  Detour state
to house / goes not  -0.490     -0.043       it goes
final                -0         -0.830       not home
final                -0         -1.065       does not
to house             -0.152     -1.065       it
final                -0         -1.730       go house
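Putting the steps together: below is a minimal sketch of n-best extraction from a lattice. Note it is a simpler best-first enumeration with a priority queue, not the slides' exact detour bookkeeping (which avoids re-expanding shared prefixes by following back transitions and queuing only detours), and the toy lattice and its scores are made up, only loosely following the figures.

```python
import heapq

def n_best(graph, start, finals, n):
    """Extract the n highest-scoring full paths from a lattice.

    graph: state -> list of (phrase, next_state, score), all scores <= 0,
    so popping partial paths in best-first order yields complete paths
    in exact score order (uniform-cost search).
    """
    # Max-heap via negated scores: (neg_score, state, phrase_sequence)
    heap = [(0.0, start, [])]
    results = []
    while heap and len(results) < n:
        neg_score, state, phrases = heapq.heappop(heap)
        if state in finals:
            results.append((-neg_score, " ".join(phrases)))
            continue
        for phrase, nxt, score in graph.get(state, []):
            heapq.heappush(heap, (neg_score - score, nxt, phrases + [phrase]))
    return results

# Hypothetical toy lattice (scores made up, loosely following the slides).
graph = {
    "<s>": [("he", "q1", -0.556), ("it", "q2", -0.484)],
    "q1":  [("does not", "q3", -1.108), ("goes not", "q4", -1.5)],
    "q2":  [("does not", "q3", -1.5), ("goes not", "q4", -1.8)],
    "q3":  [("go home", "f", -2.518), ("go to house", "f", -2.7)],
    "q4":  [("home", "f", -2.9), ("to house", "f", -2.6)],
}
for score, sentence in n_best(graph, "<s>", {"f"}, 4):
    print(f"{score:.3f}  {sentence}")  # best first: "he does not go home"
```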