Exploiting Source Similarity for SMT Using Context-Informed Features

Nicolas Stroppa (nstroppa@computing.dcu.ie)
Antal van den Bosch (Antal.vdnBosch@uvt.nl)
Andy Way (away@computing.dcu.ie)

TMI, 2007, Skövde. Stroppa, van den Bosch & Way: DCU & Tilburg

Overview

1. Motivation
2. The Standard Approach
3. Context-Informed Features
4. Memory-Based Disambiguation
5. An Example
6. Evaluation & Results
7. Related Work
8. Conclusions
9. Future Work

Motivation

• SMT is target-similarity-based;
• EBMT is source-similarity-based.

Can we exploit both benefits in one model?

Motivation: SMT is target-similarity-based

The probability of a target sentence w.r.t. an n-gram-based LM can be seen as a measure of similarity between this sentence and the sentences found in the training corpus C. The LM assigns high probabilities to sentences that share many n-grams with the sentences in C, while sentences with few n-gram matches receive low probabilities.

⇒ The LM is used to make the resulting translation as similar as possible to previously seen target sentences.
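The similarity reading of the LM can be illustrated with a toy bigram model. This is a minimal sketch, not taken from the slides: the two-sentence corpus, the add-one smoothing, and the helper names `train_bigram_lm` and `log_prob` are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_bigram_lm(corpus):
    """Collect history and bigram counts from whitespace-tokenized sentences."""
    histories, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])            # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {w for s in corpus for w in s.split()} | {"<s>", "</s>"}
    return histories, bigrams, len(vocab)

def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a sentence (assumed smoothing)."""
    histories, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (histories[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )

# Toy training corpus C (hypothetical)
corpus = ["is there a baseball game today", "there is a soccer game today"]
model = train_bigram_lm(corpus)

# Same words, different order: only the first shares its bigrams with C
seen_like = log_prob("is there a soccer game today", model)
unseen_like = log_prob("today game a there is soccer", model)
```

Both test sentences contain the same words, so the difference in score comes only from how many bigrams they share with the corpus.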

Motivation: EBMT is source-similarity-based

There are 3 processing stages in EBMT:

1. matching fragments of the input string against 'similar' fragments in the reference corpus;
2. identifying the corresponding translation fragments;
3. recombining these translation fragments into the appropriate target text.

Depending on the exact EBMT method used, different notions of 'similarity' are employed. However, all EBMT models rely on retrieving, from the training material, source sentences similar to the new input string.

Motivation: Benefits of a Combined Model

• Source similarity may limit ambiguity problems;
• Target similarity may avoid problems such as boundary friction.

By exploiting the two types of similarity, we might benefit from the strengths of both aspects.

The Standard Approach: Phrase-Based SMT

In SMT, translation is modeled as a decision process, in which the translation $e_1^I = e_1 \ldots e_i \ldots e_I$ of a source sentence $f_1^J = f_1 \ldots f_j \ldots f_J$ is chosen to maximize:

\[
\arg\max_{I, e_1^I} P(e_1^I \mid f_1^J) = \arg\max_{I, e_1^I} P(f_1^J \mid e_1^I) \cdot P(e_1^I) \tag{1}
\]

The Standard Approach: Translation Model

The translation model is the factor $P(f_1^J \mid e_1^I)$:

\[
\arg\max_{I, e_1^I} P(e_1^I \mid f_1^J) = \arg\max_{I, e_1^I} P(f_1^J \mid e_1^I) \cdot P(e_1^I) \tag{2}
\]

The Standard Approach: Language Model

The language model is the factor $P(e_1^I)$:

\[
\arg\max_{I, e_1^I} P(e_1^I \mid f_1^J) = \arg\max_{I, e_1^I} P(f_1^J \mid e_1^I) \cdot P(e_1^I) \tag{3}
\]

The Standard Approach: Log-linear phrase-based SMT

In log-linear phrase-based SMT, the posterior probability $P(e_1^I \mid f_1^J)$ is directly modeled as a (log-linear) combination of features [Och & Ney, ACL-02], which usually comprise M translational features (e.g. sentence length, lexical features, grammatical dependencies) and the language model:

\[
\log P(e_1^I \mid f_1^J) = \sum_{m=1}^{M} \lambda_m h_m(f_1^J, e_1^I, s_1^K) + \lambda_{LM} \log P(e_1^I) \tag{4}
\]

where $s_1^K = s_1 \ldots s_K$ denotes a segmentation of the source and target sentences into the sequences of phrases $(\tilde{f}_1, \ldots, \tilde{f}_K)$ and $(\tilde{e}_1, \ldots, \tilde{e}_K)$, respectively.
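The log-linear combination can be sketched in a few lines. This is an illustrative sketch only: the candidate translations, feature values, and weights below are invented, and `log_linear_score` is a hypothetical helper, not part of any SMT toolkit.

```python
import math

def log_linear_score(features, weights, lm_log_prob, lm_weight):
    """Weighted sum of feature values plus the weighted LM log-probability."""
    if len(features) != len(weights):
        raise ValueError("one weight per feature is required")
    return sum(w * h for w, h in zip(weights, features)) + lm_weight * lm_log_prob

# Hypothetical candidates: (feature values h_1..h_M, LM log-probability)
candidates = {
    "baseball game": ([math.log(0.5), math.log(0.4)], math.log(0.02)),
    "game of baseball": ([math.log(0.5), math.log(0.3)], math.log(0.005)),
    "baseball gone": ([math.log(0.2), math.log(0.1)], math.log(0.0001)),
}
weights = [1.0, 0.5]  # lambda_1, lambda_2 (illustrative values)
lm_weight = 1.0       # lambda_LM (illustrative value)

scores = {
    e: log_linear_score(h, weights, lm, lm_weight)
    for e, (h, lm) in candidates.items()
}
best = max(scores, key=scores.get)  # candidate with the highest combined score
```

In a real system the weights would be tuned on held-out data; here they are fixed by hand to keep the example self-contained.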

The Standard Approach: Log-linear phrase-based SMT

Each feature $h_m$ in log-linear PB-SMT can be rewritten as:

\[
h_m(f_1^J, e_1^I, s_1^K) = \sum_{k=1}^{K} \tilde{h}_m(\tilde{f}_k, \tilde{e}_k, s_k), \tag{5}
\]

where $\tilde{h}_m$ is a feature that applies to a single phrase pair.

That is, while the features in log-linear PB-SMT can in theory apply to entire sentences, in practice (in existing models) they apply to single phrase pairs.

Remarkably, then, the usual translational features involved in those models depend only on an individual pair of source/target phrases, i.e. they do not take into account the contexts of those phrases.
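The decomposition of a sentence-level feature into phrase-pair features can be sketched as follows. The phrase table, its probabilities, and the segmentation are hypothetical, and the sketch drops the segmentation argument $s_k$ for brevity.

```python
import math

# Hypothetical phrase table: P(target phrase | source phrase)
PHRASE_TABLE = {
    ("c'è", "is there"): 0.6,
    ("una partita", "a game"): 0.5,
    ("una partita", "a partita"): 0.1,
    ("di baseball", "baseball"): 0.4,
    ("oggi", "today"): 0.9,
}

def phrase_feature(src_phrase, tgt_phrase):
    """Phrase-level feature: log translation probability of one phrase pair."""
    return math.log(PHRASE_TABLE[(src_phrase, tgt_phrase)])

def sentence_feature(phrase_pairs):
    """Sentence-level feature as a sum over the K phrase pairs of a segmentation."""
    return sum(phrase_feature(f, e) for f, e in phrase_pairs)

# One assumed segmentation of "c'è una partita di baseball oggi"
pairs = [
    ("c'è", "is there"),
    ("una partita", "a game"),
    ("di baseball", "baseball"),
    ("oggi", "today"),
]
total = sentence_feature(pairs)

# The value for ("una partita", "a game") is the same in any sentence:
# nothing about the neighbouring phrases enters phrase_feature.
context_free = phrase_feature("una partita", "a game")
```

Because `phrase_feature` sees only the pair itself, the same value is produced whether the sentence is about baseball or Bach, which is the limitation the slide points out.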

The Standard Approach: Log-linear phrase-based SMT

In this context, the translation process amounts to:

• choosing a segmentation of the source sentence,
• translating each source phrase, and possibly
• re-ordering the target segments obtained.

But translational choices are strongly driven by the target LM. Instead, we will try to use the source context to resolve ambiguities ...

The Standard Approach: Log-linear phrase-based SMT

Why do we need to try to integrate source language context?

Why can't we just add an LM for the source language?

The Standard Approach: Log-linear phrase-based SMT

\[
\arg\max_{I, e_1^I} P(e_1^I \mid f_1^J) = \arg\max_{I, e_1^I} P(f_1^J \mid e_1^I) \cdot P(e_1^I) \tag{6}
\]

The Standard Approach: Log-linear phrase-based SMT

\[
\arg\max_{I, e_1^I} P(e_1^I \mid f_1^J) = \arg\max_{I, e_1^I} \frac{P(f_1^J \mid e_1^I) \cdot P(e_1^I)}{P(f_1^J)} \tag{7}
\]

The Standard Approach: Log-linear phrase-based SMT

\[
\arg\max_{I, e_1^I} P(e_1^I \mid f_1^J) = \arg\max_{I, e_1^I} \frac{P(f_1^J \mid e_1^I) \cdot P(e_1^I) \cdot P(f_1^J)}{P(f_1^J)} \tag{8}
\]

The outcome of the arg max does not change if you add or delete $P(f_1^J)$, because this term does not depend on the candidate translation $e_1^I$.
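A small numeric check makes the point concrete. The probabilities below are invented for illustration; the only property that matters is that the source-side term is a positive constant shared by every candidate.

```python
def argmax(scores):
    """Return the key with the highest score."""
    return max(scores, key=scores.get)

# Hypothetical values of P(f|e) * P(e) for three candidate translations e
joint = {"baseball game": 0.012, "game of baseball": 0.004, "baseball gone": 0.0001}
p_f = 0.3  # hypothetical source-LM probability P(f); identical for every e

divided = {e: v / p_f for e, v in joint.items()}     # P(f) in the denominator
multiplied = {e: v * p_f for e, v in joint.items()}  # a source LM added as a factor

# Scaling every score by the same positive constant leaves the winner unchanged
same = argmax(joint) == argmax(divided) == argmax(multiplied)
```

A source-language LM therefore cannot influence the choice among translations of a fixed input, which is why the source context has to enter through the features themselves.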

Context-Informed Features: Disambiguation

C'è una partita di baseball oggi? ⇔ Is there a baseball game today?

– Possible translations for partita:
    game      partita di calcio ⇔ a soccer game
    gone      è partita ⇔ she has gone
    partita   una partita di Bach ⇔ a partita of Bach

– Possible translations for di:
    of        una tazza di caffè ⇔ a cup of coffee
              prima di partire ⇔ before coming

Examples of ambiguity for the (Italian) word partita, easily solved when considering its context.

Context-Informed Features: Disambiguation

In standard PB-SMT, disambiguation strongly relies on the target LM.

Although the various translation features associated with partita and game, partita and gone, etc., depend on the type of training data used, most LMs may still select the correct translation baseball game as the most probable among all the possible combinations of target words: gone of baseball, game of baseball, baseball partita, baseball game, etc.

If nothing else, this solution is more expensive than simply looking at the source context.

In particular, using context can help prune weak candidates early, allowing more time to be spent on more promising candidates.
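Pruning candidates from the source context can be sketched with a very simple nearest-context lookup. This is not the classifier used in the talk: the training instances, the overlap measure, the threshold, and the helper names `candidate_distribution` and `prune` are all assumptions chosen to keep the example small.

```python
from collections import Counter

# Hypothetical training instances: (left word, source word, right word) -> target
EXAMPLES = [
    (("una", "partita", "di"), "game"),
    (("una", "partita", "di"), "game"),
    (("è", "partita", "ieri"), "gone"),
    (("è", "partita", "presto"), "gone"),
    (("una", "partita", "di"), "partita"),
]

def candidate_distribution(left, word, right):
    """Relative frequency of each target among the examples sharing the most context."""
    def overlap(ctx):
        # Number of matching context positions (0, 1 or 2)
        return (ctx[0] == left) + (ctx[2] == right)

    matching = [(overlap(ctx), tgt) for ctx, tgt in EXAMPLES if ctx[1] == word]
    if not matching:
        return {}
    best = max(o for o, _ in matching)
    counts = Counter(tgt for o, tgt in matching if o == best)
    total = sum(counts.values())
    return {tgt: n / total for tgt, n in counts.items()}

def prune(dist, threshold=0.5):
    """Keep only candidates whose context-conditioned estimate reaches the threshold."""
    return {t: p for t, p in dist.items() if p >= threshold}

# "C'è una partita di baseball oggi?": context of partita is (una, di)
dist = candidate_distribution("una", "partita", "di")
kept = prune(dist)
```

With these toy instances, gone never appears among the nearest contexts and partita falls below the threshold, so only game would be passed on to the decoder.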
