Faster Decoding for Phrases and Syntax (Kenneth Heafield)


  1. Faster Decoding for Phrases and Syntax Kenneth Heafield

  2. Translation is Expensive
     “speed-up in tuning time but affects the performance”
     “18 days using 12 cores” [Williams et al, WMT 2014]
     “Time-sensitive BLEU score” [Chung and Galley, 2012]
     “Due to time constraints, this procedure was not used” [Servan et al, WMT 2012]
     ⇒ Routine Quality Compromises

  3. Outline: Introduction, Problem, Cube Pruning, Incremental, Conclusion

  4. Blame the Language Model
     “LM queries often account for more than 50% of the CPU” [Green et al, WMT 2014]

  5. Blame the Language Model
     “LM queries often account for more than 50% of the CPU” [Green et al, WMT 2014]
     Faster queries (KenLM)
     More effective queries

  8. Outline: (1) Decoding problem, (2) Cube pruning, (3) Incremental

  9. Decoding Example: Input
     Le garçon a vu l’homme avec un télescope

  10. Decoding Example: Parse with SCFG
      Input: Le garçon a vu l’homme avec un télescope
      Constituent labels in the parse: S:S, X:NP, X:VP, X:VP, X:V, X:NP, X:PP

  11. Decoding Example: Read Target Side
      Constituent labels in the parse: S:S, X:NP, X:VP, X:VP, X:V, X:NP, X:PP
      Target-side options for each source phrase:
      a vu: seen / saw / view
      Le garçon: The boy / A boy
      l’homme: man / the man / some men
      avec un télescope: with the telescope / to an telescope / with a telescope

  12. Decoding Example: One Constituent
      Constituent labels in the parse: S:S, X:NP, X:VP, X:VP, X:V, X:NP, X:PP
      Target-side options for each source phrase:
      a vu: seen / saw / view
      Le garçon: The boy / A boy
      l’homme: man / the man / some men
      avec un télescope: with the telescope / to an telescope / with a telescope

  13. X:VP built from X:V and X:NP over “a vu l’homme”
      X:V hypotheses: seen, saw, view
      X:NP hypotheses: man, the man, some men

  14. X:VP built from X:V and X:NP over “a vu l’homme”
      X:V hypotheses: seen, saw, view
      X:NP hypotheses: man, the man, some men
      X:VP hypotheses:
      seen man, seen the man, seen some men
      saw man, saw the man, saw some men
      view man, view the man, view some men

  15. X:VP built from X:V and X:NP over “a vu l’homme”
      X:V hypotheses (score): seen -3.8, saw -4.0, view -4.0
      X:NP hypotheses (score): man -3.6, the man -4.3, some men -6.3
      X:VP hypotheses (score):
      seen man -8.8, seen the man -7.6, seen some men -9.5
      saw man -8.3, saw the man -6.9, saw some men -8.5
      view man -8.5, view the man -8.9, view some men -10.8

  16. X:VP built from X:V and X:NP over “a vu l’homme”
      X:V hypotheses (score): seen -3.8, saw -4.0, view -4.0
      X:NP hypotheses (score): man -3.6, the man -4.3, some men -6.3
      X:VP hypotheses sorted by score:
      saw the man -6.9
      seen the man -7.6
      saw man -8.3
      saw some men -8.5
      view man -8.5
      seen man -8.8
      view the man -8.9
      seen some men -9.5
      view some men -10.8

  17. X:VP built from X:V and X:NP over “a vu l’homme” (same child hypotheses and sorted hypothesis list as slide 16)
      Scores do not sum: for example, seen (-3.8) plus man (-3.6) gives -7.4, but “seen man” scores -8.8.

  18. X:VP built from X:V and X:NP over “a vu l’homme” (same child hypotheses and sorted hypothesis list as slide 16)
      Pruning is Approximate
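
The two observations on slides 17 and 18 can be checked directly against the numbers on the slides. The sketch below is only an illustration: the scores are copied from the slides, while the names `adjustment`, `greedy`, and `best` are made up for this example. It shows that the combined score differs from the sum of the child scores, and that picking the best child on each side independently does not give the best combination.

```python
# Scores copied from the slides (log-scale, higher is better).
verbs = {"seen": -3.8, "saw": -4.0, "view": -4.0}
nouns = {"man": -3.6, "the man": -4.3, "some men": -6.3}
combined = {
    "seen man": -8.8, "seen the man": -7.6, "seen some men": -9.5,
    "saw man": -8.3, "saw the man": -6.9, "saw some men": -8.5,
    "view man": -8.5, "view the man": -8.9, "view some men": -10.8,
}


def adjustment(verb, noun):
    """Combined score minus the sum of the two child scores (hypothetical helper)."""
    return combined[f"{verb} {noun}"] - (verbs[verb] + nouns[noun])


# Local decision: take the best-scoring child on each side independently.
best_verb = max(verbs, key=verbs.get)
best_noun = max(nouns, key=nouns.get)
greedy = f"{best_verb} {best_noun}"

# Global decision: look at every combined hypothesis.
best = max(combined, key=combined.get)

print(greedy, combined[greedy])  # locally best children
print(best, combined[best])      # globally best combination
print(round(adjustment("seen", "man"), 1), round(adjustment("saw", "the man"), 1))
```

The adjustment is negative for “seen man” and positive for “saw the man”, which is why the ranking of combinations cannot be read off the ranking of the children.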

  19. Appending Strings
      Hypotheses are built by string concatenation. Language model probability changes when this is done:
      $$\frac{p(\text{saw the man})}{p(\text{saw})\,p(\text{the man})} = \frac{p(\text{the} \mid \text{saw})\,p(\text{man} \mid \text{saw the})}{p(\text{the})\,p(\text{man} \mid \text{the})}$$

  20. Appending Strings
      Hypotheses are built by string concatenation. Language model probability changes when this is done:
      $$\frac{p(\text{saw the man})}{p(\text{saw})\,p(\text{the man})} = \frac{p(\text{the} \mid \text{saw})\,p(\text{man} \mid \text{saw the})}{p(\text{the})\,p(\text{man} \mid \text{the})}$$
      Log probability is part of the score
      ⇒ Scores do not sum
      ⇒ Local decisions may not be globally optimal
      ⇒ Search is hard.
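
A small numeric sketch of the ratio on slides 19 and 20 follows. The probabilities are hypothetical values chosen for illustration, not taken from the talk or from any real language model. The sketch scores “saw” and “the man” separately, scores the concatenation with the chain rule, and checks that the difference is exactly the log of the right-hand side of the ratio.

```python
import math

# Hypothetical toy probabilities (assumptions, not from the slides).
p_saw = 0.01                 # p(saw)
p_the = 0.05                 # p(the)
p_man_given_the = 0.02       # p(man | the)
p_the_given_saw = 0.30       # p(the | saw)
p_man_given_saw_the = 0.04   # p(man | saw the)

log = math.log10

# Each fragment scored on its own, without left context.
log_saw = log(p_saw)
log_the_man = log(p_the) + log(p_man_given_the)

# The concatenation scored with full context (chain rule).
log_saw_the_man = log(p_saw) + log(p_the_given_saw) + log(p_man_given_saw_the)

# Log of the right-hand side of the ratio: words of the second fragment
# are rescored now that they have "saw" as left context.
correction = (log(p_the_given_saw) + log(p_man_given_saw_the)
              - log(p_the) - log(p_man_given_the))

print(log_saw + log_the_man)   # sum of fragment scores
print(log_saw_the_man)         # score of the concatenation
print(correction)              # the gap between the two
```

With these made-up numbers the correction is nonzero, so the score of the concatenation is not the sum of the fragment scores.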

  21. Outline: (1) Decoding problem, (2) Cube pruning, (3) Incremental

  22. Beam Search [Lowerre, 1976; Chiang, 2005]
      X:NP hypotheses: man -3.6, the man -4.3, some men -6.3
      X:V hypotheses: seen -3.8, saw -4.0, view -4.0
      Every cell of the grid is scored:
      seen man -8.8, seen the man -7.6, seen some men -9.5
      saw man -8.3, saw the man -6.9, saw some men -8.5
      view man -8.5, view the man -8.9, view some men -10.8
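
As a minimal sketch of the beam search step on slide 22, assuming a single rule with two child lists: every combination is scored, then only the best k are kept. The lookup table `combined` stands in for a real scoring function (language model plus other features) and holds the scores from the slide; the function name `beam_search` and its signature are assumptions for this example.

```python
import heapq

# Child hypotheses and combined scores copied from the slides.
verbs = ["seen", "saw", "view"]
nouns = ["man", "the man", "some men"]
combined = {
    "seen man": -8.8, "seen the man": -7.6, "seen some men": -9.5,
    "saw man": -8.3, "saw the man": -6.9, "saw some men": -8.5,
    "view man": -8.5, "view the man": -8.9, "view some men": -10.8,
}


def beam_search(verbs, nouns, score, k):
    """Score every combination, then keep the k best (hypothetical helper)."""
    candidates = [(score(f"{v} {n}"), f"{v} {n}") for v in verbs for n in nouns]
    # nlargest returns the k highest-scoring entries, best first.
    return heapq.nlargest(k, candidates), len(candidates)


top, num_scored = beam_search(verbs, nouns, combined.__getitem__, k=3)
for score, hyp in top:
    print(hyp, score)
print("hypotheses scored:", num_scored)
```

For this one rule the result is exact, but all nine combinations had to be scored to keep three.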

  23. Cube Pruning [Chiang, 2007]
      X:NP hypotheses: man -3.6, the man -4.3, some men -6.3
      X:V hypotheses: seen -3.8, saw -4.0, view -4.0
      Queue (hypothesis: sum of child scores):
      seen man: -3.8 - 3.6 = -7.4

  24. Cube Pruning [Chiang, 2007]
      X:NP hypotheses: man -3.6, the man -4.3, some men -6.3
      X:V hypotheses: seen -3.8, saw -4.0, view -4.0
      Scored so far: seen man -8.8
      Queue (hypothesis: sum of child scores):
      saw man: -4.0 - 3.6 = -7.6
      seen the man: -3.8 - 4.3 = -8.1

  25. Cube Pruning [Chiang, 2007]
      X:NP hypotheses: man -3.6, the man -4.3, some men -6.3
      X:V hypotheses: seen -3.8, saw -4.0, view -4.0
      Scored so far: seen man -8.8, saw man -8.3
      Queue (hypothesis: sum of child scores):
      view man: -4.0 - 3.6 = -7.6
      seen the man: -3.8 - 4.3 = -8.1
      saw the man: -4.0 - 4.3 = -8.3

  26. Cube Pruning [Chiang, 2007]
      X:NP hypotheses: man -3.6, the man -4.3, some men -6.3
      X:V hypotheses: seen -3.8, saw -4.0, view -4.0
      Scored so far: seen man -8.8, saw man -8.3, view man -8.5
      Queue (hypothesis: sum of child scores):
      seen the man: -3.8 - 4.3 = -8.1
      saw the man: -4.0 - 4.3 = -8.3
      view the man: -4.0 - 4.3 = -8.3
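
The queue trace on slides 23 to 26 can be reproduced with a short sketch. This is a simplified illustration for one rule with two sorted child lists, not the full algorithm from [Chiang, 2007]; the function name `cube_pruning`, the tie-breaking by grid position, and the lookup table standing in for the real scorer are all assumptions. The queue is ordered by the sum of child scores, and only hypotheses popped from the queue are actually scored.

```python
import heapq

# Child hypotheses, sorted best first, and combined scores from the slides.
verbs = [("seen", -3.8), ("saw", -4.0), ("view", -4.0)]
nouns = [("man", -3.6), ("the man", -4.3), ("some men", -6.3)]
combined = {
    "seen man": -8.8, "seen the man": -7.6, "seen some men": -9.5,
    "saw man": -8.3, "saw the man": -6.9, "saw some men": -8.5,
    "view man": -8.5, "view the man": -8.9, "view some men": -10.8,
}


def name(i, j):
    return f"{verbs[i][0]} {nouns[j][0]}"


def cube_pruning(score, k):
    """Score at most k hypotheses, exploring the grid from its best corner."""
    # heapq is a min-heap, so store the negated sum of child scores.
    heap = [(-(verbs[0][1] + nouns[0][1]), 0, 0)]
    queued = {(0, 0)}
    scored = []
    while heap and len(scored) < k:
        _, i, j = heapq.heappop(heap)
        scored.append((score(name(i, j)), name(i, j)))
        # Push the neighbours: next-best verb, next-best noun phrase.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(verbs) and nj < len(nouns) and (ni, nj) not in queued:
                queued.add((ni, nj))
                heapq.heappush(heap, (-(verbs[ni][1] + nouns[nj][1]), ni, nj))
    remaining = [name(i, j) for _, i, j in sorted(heap)]
    return scored, remaining


scored, remaining = cube_pruning(combined.__getitem__, k=3)
print(scored)     # hypotheses scored, in the order they were popped
print(remaining)  # what is still waiting in the queue
```

With k=3 the sketch scores the same three hypotheses as slide 26 and leaves the same three entries in the queue. The best of the three scored hypotheses is “saw man” at -8.3, while the best overall, “saw the man” at -6.9, is still in the queue. This small case shows how the approach trades accuracy for fewer scoring calls.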

  27. Beam Search: Make every dish. Keep the best k, throw the rest out.
      Cube pruning: Combine the best ingredients. Only make k dishes.

  28. Cube Pruning: Hypotheses are Atomic
      Example strings: is a countries that; are a countries that; is a countries which; are a countries which; are a country; ...
      No notion that “a countries” is bad.
