  1. Exploiting Source Similarity for SMT Using Context-Informed Features. Nicolas Stroppa (nstroppa@computing.dcu.ie), Antal van den Bosch (Antal.vdnBosch@uvt.nl), Andy Way (away@computing.dcu.ie)

  2. TMI, 2007, Skövde. Stroppa, van den Bosch & Way: DCU & Tilburg. Overview: 1. Motivation 2. The Standard Approach 3. Context-Informed Features 4. Memory-Based Disambiguation 5. An Example 6. Evaluation & Results 7. Related Work 8. Conclusions 9. Future Work

  3. Motivation • SMT is target-similarity-based; • EBMT is source-similarity-based. Can we exploit both benefits in one model?

  4. Motivation: SMT is target-similarity-based. The probability of a target sentence w.r.t. an n-gram-based LM can be seen as a measure of similarity between this sentence and the sentences found in the training corpus C. The LM assigns high probabilities to sentences that share many n-grams with the sentences in C, while sentences with few n-gram matches receive low probabilities. ⇒ the LM is used to make the resulting translation as similar as possible to previously seen target sentences.
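To make the n-gram-similarity intuition concrete, here is a minimal sketch; the toy corpus and the crude bigram-overlap score are invented for illustration (a real LM assigns smoothed probabilities, not raw overlap counts):

```python
from collections import Counter

def bigrams(tokens):
    """All adjacent token pairs in a sentence."""
    return list(zip(tokens, tokens[1:]))

# Toy training corpus C (hypothetical sentences).
corpus = [
    "the game starts today".split(),
    "the game was close".split(),
]
counts = Counter(bg for sent in corpus for bg in bigrams(sent))

def overlap_score(sentence):
    """Number of sentence bigrams also seen in C: a crude stand-in
    for the similarity an n-gram LM rewards."""
    return sum(counts[bg] > 0 for bg in bigrams(sentence.split()))

# A sentence sharing bigrams with C outscores one that shares none.
print(overlap_score("the game starts now"))  # shares "the game", "game starts"
print(overlap_score("a match begins now"))   # shares no bigram with C
```

The same mechanism, with probabilities instead of counts, is what pulls SMT output toward previously seen target sentences.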

  5. Motivation: EBMT is source-similarity-based. There are 3 processing stages in EBMT: 1. retrieving fragments of the input string that are ‘similar’ to fragments in the reference corpus; 2. identifying the corresponding translation fragments; 3. recombining these translation fragments into the appropriate target text. Depending on the exact EBMT method used, different notions of ‘similarity’ are employed. However, all models of EBMT rely on retrieving, from the training material, source sentences similar to the new input string.
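The three stages can be sketched as a greedy longest-match pipeline. The phrase table and the matching strategy below are hypothetical stand-ins for whatever retrieval and recombination a real EBMT system uses:

```python
# Hypothetical example base: source fragment -> known translation fragment.
phrase_table = {
    "una partita di": "a game of",
    "baseball": "baseball",
}

def ebmt_translate(source):
    tokens = source.split()
    output = []
    i = 0
    while i < len(tokens):
        # 1. retrieve: longest known source fragment starting at position i
        match = None
        for j in range(len(tokens), i, -1):
            frag = " ".join(tokens[i:j])
            if frag in phrase_table:
                match = frag
                break
        if match:
            # 2. identify the corresponding translation fragment
            output.append(phrase_table[match])
            i += len(match.split())
        else:
            output.append(tokens[i])  # pass unknown tokens through
            i += 1
    # 3. recombine the translation fragments into the target text
    return " ".join(output)

print(ebmt_translate("una partita di baseball"))
```

Greedy longest-match is only one possible notion of ‘similarity’; the point is that every variant starts from matches on the source side.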

  6. Motivation: Benefits of a Combined Model • Source similarity may limit ambiguity problems; • Target similarity may avoid problems such as boundary friction. By exploiting the two types of similarity, we might benefit from the strengths of both aspects.

  7. Overview: 1. Motivation 2. The Standard Approach 3. Context-Informed Features 4. Memory-Based Disambiguation 5. An Example 6. Evaluation & Results 7. Related Work 8. Conclusions 9. Future Work

  8. The Standard Approach: Phrase-Based SMT. In SMT, translation is modeled as a decision process, in which the translation e_1^I = e_1 ... e_i ... e_I of a source sentence f_1^J = f_1 ... f_j ... f_J is chosen to maximize:

  arg max_{I, e_1^I} P(e_1^I | f_1^J) = arg max_{I, e_1^I} P(f_1^J | e_1^I) · P(e_1^I)   (1)
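A toy rendering of this decision rule, with invented probabilities for two candidate targets: decoding simply keeps the candidate e that maximizes P(f|e) · P(e).

```python
# Toy illustration of decision rule (1); all probabilities are invented.
tm = {"baseball game": 0.4, "baseball gone": 0.5}   # P(f | e)
lm = {"baseball game": 0.3, "baseball gone": 0.01}  # P(e)

# Decoding keeps the candidate e maximizing P(f|e) * P(e).
best = max(tm, key=lambda e: tm[e] * lm[e])
print(best)
```

Note that the LM term overrules the (slightly) higher translation-model score of the wrong candidate, which is exactly the target-similarity effect from the Motivation slides.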

  9. The Standard Approach: Translation Model. The translation model is the factor P(f_1^J | e_1^I) in:

  arg max_{I, e_1^I} P(e_1^I | f_1^J) = arg max_{I, e_1^I} P(f_1^J | e_1^I) · P(e_1^I)   (2)

  10. The Standard Approach: Language Model. The language model is the factor P(e_1^I) in:

  arg max_{I, e_1^I} P(e_1^I | f_1^J) = arg max_{I, e_1^I} P(f_1^J | e_1^I) · P(e_1^I)   (3)

  11. The Standard Approach: Log-linear phrase-based SMT. In log-linear phrase-based SMT, the posterior probability P(e_1^I | f_1^J) is directly modeled as a (log-linear) combination of features [Och & Ney, ACL-02], usually comprising M translational features (e.g. sentence length, lexical features, grammatical dependencies) and the language model:

  log P(e_1^I | f_1^J) = Σ_{m=1}^{M} λ_m h_m(f_1^J, e_1^I, s_1^K) + λ_LM log P(e_1^I)   (4)

  where s_1^K = s_1 ... s_K denotes a segmentation of the source and target sentences respectively into the sequences of phrases (f̃_1, ..., f̃_K) and (ẽ_1, ..., ẽ_K).

  12. The Standard Approach: Log-linear phrase-based SMT. Each feature h_m in log-linear PB-SMT can be rewritten as:

  h_m(f_1^J, e_1^I, s_1^K) = Σ_{k=1}^{K} h̃_m(f̃_k, ẽ_k, s_k)   (5)

  where h̃_m is a feature that applies to a single phrase pair. That is, while the features in log-linear PB-SMT can in theory apply to entire sentences, in practice (in existing models) they apply to single phrase pairs. Remarkably, then, the usual translational features involved in those models only depend on an individual pair of source/target phrases, i.e. they do not take into account the contexts of those phrases.
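Equations (4) and (5) can be checked with made-up numbers: each sentence-level feature is the sum of its per-phrase values, and the features are then combined log-linearly with the LM term. The two phrase pairs, the feature names, and all weights below are invented for illustration:

```python
import math

# Two hypothetical phrase pairs with two features ("lex", "len").
phrase_features = [              # per-phrase values h~_m(f~_k, e~_k, s_k)
    {"lex": math.log(0.5), "len": 1.0},
    {"lex": math.log(0.8), "len": 0.0},
]
weights = {"lex": 1.0, "len": 0.5}          # lambda_m (made up)
lm_weight, lm_logprob = 0.7, math.log(0.2)  # lambda_LM, log P(e)

# Eq. (5): each sentence-level feature is the sum of its per-phrase values.
h = {m: sum(pf[m] for pf in phrase_features) for m in weights}

# Eq. (4): weighted sum of features plus the weighted LM log-probability.
log_posterior = sum(weights[m] * h[m] for m in weights) + lm_weight * lm_logprob
print(log_posterior)
```

The decomposition in eq. (5) is what makes the slide's point visible in code: every h̃_m sees only its own phrase pair, never the neighbouring context.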

  13. The Standard Approach: Log-linear phrase-based SMT. In this context, the translation process amounts to: • choosing a segmentation of the source sentence, • translating each source phrase, and possibly • re-ordering the target segments obtained. But translational choices are strongly driven by the target LM. Instead, we will try to use the source context to resolve ambiguities...

  14. The Standard Approach: Log-linear phrase-based SMT. Why do we need to try to integrate source-language context? Why can't we just add an LM for the source language?

  15. The Standard Approach: Log-linear phrase-based SMT. Why do we need to try to integrate source-language context? Why can't we just add an LM for the source language?

  arg max_{I, e_1^I} P(e_1^I | f_1^J) = arg max_{I, e_1^I} P(f_1^J | e_1^I) · P(e_1^I)   (6)

  16. The Standard Approach: Log-linear phrase-based SMT. Why do we need to try to integrate source-language context? Why can't we just add an LM for the source language?

  arg max_{I, e_1^I} P(e_1^I | f_1^J) = arg max_{I, e_1^I} [ P(f_1^J | e_1^I) · P(e_1^I) / P(f_1^J) ]   (7)

  17. The Standard Approach: Log-linear phrase-based SMT. Why do we need to try to integrate source-language context? Why can't we just add an LM for the source language?

  arg max_{I, e_1^I} P(e_1^I | f_1^J) = arg max_{I, e_1^I} [ P(f_1^J | e_1^I) · P(e_1^I) · P(f_1^J) / P(f_1^J) ]   (8)

  The outcome of the arg max does not change if you add or delete the factor P(f_1^J): it is constant with respect to e_1^I.
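A quick numeric check of this point, with arbitrary invented scores: a source LM contributes the same factor P(f) to every candidate e, so it cannot change which candidate wins the arg max.

```python
# P(f|e) * P(e) per candidate target; all numbers are invented.
scores = {"e1": 0.12, "e2": 0.005, "e3": 0.07}
p_f = 0.03  # P(f): the same for every candidate e

best_plain = max(scores, key=scores.get)
best_scaled = max(scores, key=lambda e: scores[e] * p_f)
print(best_plain, best_scaled)  # same candidate either way
```

This is why a monolingual source LM is inert at decoding time, and why source context has to enter through features that differ per translation candidate instead.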

  18. Overview: 1. Motivation 2. The Standard Approach 3. Context-Informed Features 4. Memory-Based Disambiguation 5. An Example 6. Evaluation & Results 7. Related Work 8. Conclusions 9. Future Work

  19. Context-Informed Features: Disambiguation. C'è una partita di baseball oggi? ⇔ Is there a baseball game today? • Possible translations for partita: una partita di calcio ⇔ a soccer game; è partita ⇔ she has gone; una partita di Bach ⇔ a partita of Bach. • Possible translations for di: una tazza di caffè ⇔ a cup of coffee; prima di partire ⇔ before leaving. Examples of ambiguity for the (Italian) word partita, easily resolved when considering its context.
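A minimal sketch of source-context disambiguation in the memory-based spirit; the stored examples mirror the slide, but the feature set (one word left, two words right) and the overlap metric are assumed for illustration only:

```python
# Hypothetical memory of contexts in which "partita" was seen, with the
# translation used in each. "</s>" marks a sentence boundary.
memory = [
    ({"l1": "una", "r1": "di", "r2": "calcio"}, "game"),     # una partita di calcio
    ({"l1": "è", "r1": "</s>", "r2": "</s>"}, "gone"),       # è partita
    ({"l1": "una", "r1": "di", "r2": "Bach"}, "partita"),    # una partita di Bach
]

def disambiguate(l1, r1, r2):
    """Return the translation whose stored context overlaps most with the
    query context (ties broken by memory order)."""
    query = {"l1": l1, "r1": r1, "r2": r2}
    def overlap(entry):
        ctx, _ = entry
        return sum(ctx[f] == query[f] for f in query)
    _, translation = max(memory, key=overlap)
    return translation

# "C'è una partita di baseball": the (una _ di) context points at "game".
print(disambiguate("una", "di", "baseball"))
print(disambiguate("è", "</s>", "</s>"))  # the "è partita" context points at "gone"
```

A real memory-based classifier would use weighted feature overlap and k nearest neighbours rather than this raw match count.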

  20. Context-Informed Features: Disambiguation. In standard PB-SMT, disambiguation strongly relies on the target LM. Although the various translation features associated with partita and game, partita and gone, etc., depend on the type of training data used, most LMs may still select the correct translation baseball game as the most probable among all the possible combinations of target words: gone of baseball, game of baseball, baseball partita, baseball game, etc. If nothing else, this solution is more expensive than simply looking at the source context. In particular, using context can help prune weak candidates early, allowing more time to be spent on more promising candidates.
