New Territory of Machine Translation: Kyunghyun Cho, Courant (PowerPoint presentation)

  1. New Territory of Machine Translation. Kyunghyun Cho, Courant Institute of Mathematical Sciences & Center for Data Science, New York University

  2. Source text, an IMDb user review (http://www.imdb.com/title/tt0470752/reviews?ref_=tt_urv):
     "I really enjoyed this film. However, that is on the basis that Science Fiction is one of my favourite genres: I can see some audiences finding the philosophical plotting too slow and wordy to hold their interest. But if you like your films deep and thought-provoking, as well as deliciously tense in places, then this might be for you."
     Google Translate output (French):
     "Je vraiment aimé ce film. Cependant, ce qui est sur la base que la science-fiction est un de mes genres préférés: je peux voir certains publics trouver le tracé philosophique trop lent et verbeux pour maintenir leur intérêt. Mais si vous aimez vos films et profonde réflexion, ainsi que délicieusement tendue dans les lieux, alors ce pourrait être pour vous."
     Pipeline for the first sentence:
     I really enjoyed this film.
     → Word segmentation, tokenization, … → (I, really, enjoyed, this, film, .)
     → Machine Translation → (J', ai, vraiment, aimé, ce, film, .)
     → Detokenization, … → J'ai vraiment aimé ce film.
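The tokenize, translate, detokenize pipeline on slide 2 can be sketched as follows. This is a minimal sketch under stated assumptions: the regex tokenizer and the spacing rules in `detokenize` are simplified stand-ins (not what Google Translate or any particular toolkit does), and `translate_tokens` is a hypothetical placeholder for the machine translation step.

```python
import re
from typing import Callable, List

# Assumed, simplified tokenizer: a word (hyphenated parts and a trailing
# elision apostrophe such as "J'" stay attached) or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+(?:-\w+)*['’]?|[^\w\s]")

# Punctuation that is glued to the preceding token when detokenizing.
NO_SPACE_BEFORE = {".", ",", ":", ";", "!", "?"}


def tokenize(sentence: str) -> List[str]:
    """Word segmentation / tokenization step."""
    return TOKEN_RE.findall(sentence)


def detokenize(tokens: List[str]) -> str:
    """Join tokens with spaces, except before punctuation and after an elided apostrophe."""
    text = ""
    for tok in tokens:
        if text and tok not in NO_SPACE_BEFORE and not text.endswith(("'", "’")):
            text += " "
        text += tok
    return text


def pipeline(sentence: str, translate_tokens: Callable[[List[str]], List[str]]) -> str:
    """Tokenize, run the (hypothetical) token-level MT step, then detokenize."""
    return detokenize(translate_tokens(tokenize(sentence)))
```

For example, `tokenize("I really enjoyed this film.")` gives `['I', 'really', 'enjoyed', 'this', 'film', '.']`, and `detokenize(["J'", "ai", "vraiment", "aimé", "ce", "film", "."])` gives `"J'ai vraiment aimé ce film."`. Each stage only sees its own input, which is why errors made by the tokenizer propagate unchecked to the translation step.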

  3. Same review and Google Translate output as slide 2; the pipeline applied to the second sentence:
     However, that is on the basis that Science Fiction is one of my favourite genres:
     → Word segmentation, tokenization, … → (However, ,, that, is, on, the, basis, that, Science, Fiction, is, one, of, my, favourite, genres, :)
     → Machine Translation → (Cependant, ,, ce, qui, est, sur, la, base, que, la, science-fiction, est, un, de, mes, genres, préférés, :)
     → Detokenization, … → Cependant, ce qui est sur la base que la science-fiction est un de mes genres préférés:

  4. Do you see three issues here?

  5. Same review, translation and pipeline as slide 2, now annotated "Sentence-wise Translation".

  6. Same review, translation and pipeline as slide 2, now annotated "Word-level Translation".

  7. Same review, translation and pipeline as slide 2, now annotated "Bilingual Translation".

  8. The three issues: word-level, sentence-wise and bilingual translation.

  9. Same review, translation and pipeline as slide 2.

  10. Neural Machine Translation
     • Input: a source sentence $X = (x_1, x_2, \ldots, x_{T_x})$
     • Output: a target sentence $Y = (y_1, y_2, \ldots, y_{T_y})$
     • Data: a parallel corpus of sentence pairs $\{(X^1, Y^1), (X^2, Y^2), \ldots, (X^N, Y^N)\}$
     • Goal: maximize the log-likelihood
       $$\frac{1}{N} \sum_{n=1}^{N} \sum_{t=1}^{T_y^n} \log p(y_t^n \mid y_{<t}^n, X^n)$$
     [Figure: RNN encoder over $x_1, \ldots, x_4$ (states $h_1, \ldots, h_4$, with $c = h_4$) and decoder (states $z_0, \ldots, z_4$; symbols ⟨s⟩, $y_1, \ldots, y_4$).]
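The objective averages, over the $N$ sentence pairs, each target sentence's summed per-word log-probabilities. A minimal sketch, assuming a hypothetical `model(prefix, source)` that returns a dict mapping each candidate next word to its probability; ending every target with an `<eos>` token is also an assumption here:

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model interface: given the target prefix y_<t and the source X,
# return p(y_t | y_<t, X) as a dict from word to probability.
Model = Callable[[Sequence[str], Sequence[str]], Dict[str, float]]


def average_log_likelihood(model: Model, corpus: List[Tuple[List[str], List[str]]]) -> float:
    """(1/N) * sum over pairs n of sum over t of log p(y_t^n | y_<t^n, X^n)."""
    total = 0.0
    for source, target in corpus:
        for t in range(len(target)):
            next_word_probs = model(target[:t], source)  # condition on y_<t and X
            total += math.log(next_word_probs[target[t]])
    return total / len(corpus)
```

With a model that is uniform over four words, two pairs whose targets have 2 and 3 tokens give $\frac{1}{2} \cdot 5 \log \frac{1}{4}$. Training adjusts the model's parameters to push this value up (in practice, to push its negative down).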

  11. Neural Machine Translation
     • Input and Output: sequences of one-hot vectors
     • One-hot vectors for words
       • Example) "cat", given the vocabulary
           ID     Word
           1      the
           2      a
           …      …
           2093   cat
           …      …
         $e_{\text{cat}} = (0, \ldots, 0, 1, 0, \ldots, 0)^\top$, with the 1 at the 2093rd element
     • No prior encoded: every pair of distinct words is equally far apart
     • Permutation invariant: reassigning the word IDs merely permutes the vector coordinates
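The last two bullets can be checked numerically: any two distinct one-hot vectors are exactly $\sqrt{2}$ apart, and shuffling the vocabulary only permutes coordinates. A minimal sketch with a hypothetical five-word vocabulary (Python indices start at 0, unlike the 1-based IDs on the slide):

```python
import math
import random
from typing import List


def one_hot(word: str, vocab: List[str]) -> List[int]:
    """e_word: all zeros except a 1 at the word's (0-based) position in vocab."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec


def euclidean(u: List[int], v: List[int]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


vocab = ["the", "a", "cat", "dog", "car"]  # hypothetical toy vocabulary
cat, dog, car = (one_hot(w, vocab) for w in ("cat", "dog", "car"))

# No prior: "cat" is exactly as far from "dog" as from "car".
print(euclidean(cat, dog), euclidean(cat, car))

# Permutation invariance: shuffling the IDs leaves every distance unchanged.
shuffled = vocab[:]
random.Random(0).shuffle(shuffled)
print(euclidean(one_hot("cat", shuffled), one_hot("dog", shuffled)))
```

All three printed distances are $\sqrt{2}$: nothing in the representation says that "cat" is closer to "dog" than to "car".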

  12. Neural Machine Translation
     1. Encode the source sentence into a vector:
        $h_t = \begin{cases} \phi_{\text{enc}}(h_{t-1}, x_t), & \text{if } t > 0 \\ 0, & \text{otherwise} \end{cases}$
     2. Initialize the decoder based on $c = h_{T_x}$:
        $z_0 = f_{\text{init}}(c)$
     3. Update the decoder conditioned on $c$:
        $z_t = \phi_{\text{dec}}(z_{t-1}, c, y_{t-1})$
     4. Compute the target word distribution:
        $p(y_t \mid y_{<t}, X) \propto \exp(\phi_{\text{out}}(z_t))$
     5. Sample $y_t$ and go back to 3 unless $y_t = \langle\text{eos}\rangle$.
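Steps 1 to 5 can be traced in a small, untrained sketch. The concrete choices are assumptions, not the slide's: $\phi_{\text{enc}}$, $f_{\text{init}}$ and $\phi_{\text{dec}}$ are single tanh layers, $\phi_{\text{out}}$ is a linear layer followed by a softmax, the weights are random, ⟨s⟩ is fed as an all-zero vector, and a hypothetical `max_len` cap stops sampling if ⟨eos⟩ never appears.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def tanh_sum(*vs: Vector) -> Vector:
    """Elementwise tanh of the sum of one or more vectors."""
    return [math.tanh(sum(xs)) for xs in zip(*vs)]


def softmax(v: Vector) -> Vector:
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


class SimpleEncoderDecoder:
    """Toy RNN encoder-decoder following steps 1-5 (random, untrained weights)."""

    def __init__(self, src_vocab: List[str], tgt_vocab: List[str], hidden: int = 8, seed: int = 0):
        self.rng = random.Random(seed)
        self.src_vocab, self.tgt_vocab, self.hidden = src_vocab, tgt_vocab, hidden
        S, V, H = len(src_vocab), len(tgt_vocab), hidden
        self.W_x, self.U_h = self._rand(H, S), self._rand(H, H)  # phi_enc
        self.W_init = self._rand(H, H)  # f_init
        self.U_z, self.C, self.W_y = self._rand(H, H), self._rand(H, H), self._rand(H, V)  # phi_dec
        self.W_out = self._rand(V, H)  # phi_out

    def _rand(self, rows: int, cols: int) -> Matrix:
        return [[self.rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    @staticmethod
    def _one_hot(word: str, vocab: List[str]) -> Vector:
        return [1.0 if w == word else 0.0 for w in vocab]

    def encode(self, source: List[str]) -> Vector:
        h = [0.0] * self.hidden  # h_0 = 0
        for word in source:  # step 1: h_t = phi_enc(h_{t-1}, x_t)
            h = tanh_sum(matvec(self.W_x, self._one_hot(word, self.src_vocab)), matvec(self.U_h, h))
        return h  # c = h_{T_x}

    def translate(self, source: List[str], max_len: int = 20) -> List[str]:
        c = self.encode(source)
        z = tanh_sum(matvec(self.W_init, c))  # step 2: z_0 = f_init(c)
        y_prev = [0.0] * len(self.tgt_vocab)  # <s> as an all-zero vector (assumption)
        output: List[str] = []
        for _ in range(max_len):
            # step 3: z_t = phi_dec(z_{t-1}, c, y_{t-1})
            z = tanh_sum(matvec(self.U_z, z), matvec(self.C, c), matvec(self.W_y, y_prev))
            probs = softmax(matvec(self.W_out, z))  # step 4: p(y_t | y_<t, X)
            y_t = self.rng.choices(self.tgt_vocab, weights=probs)[0]  # step 5: sample
            if y_t == "<eos>":
                break
            output.append(y_t)
            y_prev = self._one_hot(y_t, self.tgt_vocab)
        return output
```

Because the weights are random, the sampled output is arbitrary; the sketch only shows the data flow, in which the decoder sees the source solely through the single vector $c$.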

  13. Neural Machine Translation
     This is not too great a model, because:
     "You can't cram the meaning of a whole %&!$# sentence into a single $&!#* vector!" (Ray Mooney)

  14. Attention-based Neural MT
     • Encoder: bidirectional RNN
     • Context-dependent word vectors
       • Disambiguation of words' meaning
         • I have a car vs. I have been a researcher
       • Distinguishing multiple occurrences of a single word
         • A black cat is chasing a white cat together with a brown cat.
     [Figure: a forward RNN ($\overrightarrow{h}_1, \ldots, \overrightarrow{h}_4$) and a backward RNN ($\overleftarrow{h}_1, \ldots, \overleftarrow{h}_4$) over $x_1, \ldots, x_4$, giving annotations $h_1, \ldots, h_4$.]
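The "brown cat" example can be checked directly: the three occurrences of "cat" share one one-hot input vector, yet a bidirectional encoder gives each a different annotation because each sees a different left and right context. A minimal sketch, assuming simple tanh RNNs with random weights and an annotation formed by concatenating the forward and backward states (a common choice in attention-based models, not stated on the slide):

```python
import math
import random
from typing import List

Vector = List[float]


def run_rnn(inputs: List[Vector], W: List[Vector], U: List[Vector]) -> List[Vector]:
    """h_t = tanh(W x_t + U h_{t-1}) with h_0 = 0; returns [h_1, ..., h_T]."""
    h: Vector = [0.0] * len(U)
    states = []
    for x in inputs:
        h = [math.tanh(sum(w * xi for w, xi in zip(W[i], x)) + sum(u * hj for u, hj in zip(U[i], h)))
             for i in range(len(U))]
        states.append(h)
    return states


def bidirectional_annotations(words: List[str], hidden: int = 4, seed: int = 0) -> List[Vector]:
    vocab = sorted(set(words))
    rng = random.Random(seed)

    def rand(rows: int, cols: int) -> List[Vector]:
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    xs = [[1.0 if v == w else 0.0 for v in vocab] for w in words]  # one-hot inputs
    forward = run_rnn(xs, rand(hidden, len(vocab)), rand(hidden, hidden))  # reads x_1 .. x_T
    backward = run_rnn(xs[::-1], rand(hidden, len(vocab)), rand(hidden, hidden))[::-1]  # reads x_T .. x_1
    return [f + b for f, b in zip(forward, backward)]  # h_j = [forward h_j ; backward h_j]


words = "A black cat is chasing a white cat together with a brown cat .".split()
annotations = bidirectional_annotations(words)
cats = [annotations[i] for i, w in enumerate(words) if w == "cat"]
```

Here the three entries of `cats` all differ, whereas a context-free lookup of the one-hot "cat" vector would return the same thing three times.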

  15. Attention-based Neural MT
     • Attention mechanism
       • How relevant is $h_j$, given $z_{t-1}$? The score is $e_{j,t} = f_{\text{score}}(z_{t-1}, h_j)$
       • $z_{t-1}$: what has been translated so far
       • Normalized to sum to 1:
         $\alpha_{j,t} = \frac{\exp(e_{j,t})}{\sum_{j'} \exp(e_{j',t})}$
     [Figure: $f_{\text{score}}$ compares the decoder state with the bidirectional annotation $h_2$ to give $e_{2,3}$.]
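The normalization on this slide is a softmax over source positions. The slide leaves $f_{\text{score}}$ abstract; for illustration, the sketch below assumes the additive (one-hidden-layer) scorer $v^\top \tanh(W h_j + U z_{t-1})$ used in Bahdanau et al.'s attention, with hypothetical random parameters `W`, `U`, `v`:

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def f_score(h_j: Vector, z_prev: Vector, W: Matrix, U: Matrix, v: Vector) -> float:
    """Assumed additive scorer: e_{j,t} = v . tanh(W h_j + U z_{t-1})."""
    hidden = [math.tanh(sum(a * x for a, x in zip(W[i], h_j)) + sum(b * s for b, s in zip(U[i], z_prev)))
              for i in range(len(v))]
    return sum(vi * hi for vi, hi in zip(v, hidden))


def attention_weights(annotations: List[Vector], z_prev: Vector, W: Matrix, U: Matrix, v: Vector) -> Vector:
    """alpha_{j,t} = exp(e_{j,t}) / sum_{j'} exp(e_{j',t})."""
    scores = [f_score(h_j, z_prev, W, U, v) for h_j in annotations]
    m = max(scores)  # shifting all scores by a constant leaves the softmax unchanged
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical sizes: 4 annotations of size 6, decoder state of size 5, scorer width 3.
rng = random.Random(0)
annotations = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(4)]
z_prev = [rng.uniform(-1, 1) for _ in range(5)]
W = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(3)]
U = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(3)]
v = [rng.uniform(-1, 1) for _ in range(3)]
alphas = attention_weights(annotations, z_prev, W, U, v)
```

Whatever the scores, the weights are positive and sum to 1; if all annotations were identical, every $\alpha_{j,t}$ would be $1/4$.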

  16. Attention-based Neural MT
     • Decoder
       • Dynamic context: $c_t = \sum_j \alpha_{j,t} h_j$
       • As usual with the simple decoder:
         $z_t = \phi_{\text{dec}}(z_{t-1}, c_t, y_{t-1})$
         $p(y_t \mid y_{<t}, X) \propto \exp(\phi_{\text{out}}(z_t))$
     [Figure: decoder state $z_3$ reads the context $c_3$, a weighted sum of $h_1, \ldots, h_4$ with weights $\alpha_{1,3}, \ldots, \alpha_{4,3}$.]
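The dynamic context is a weighted average of the annotations, recomputed at every decoder step, and it takes the place of the fixed $c$ of the simple decoder. A minimal sketch of one step, assuming a single tanh layer for $\phi_{\text{dec}}$ and a linear layer plus softmax for $\phi_{\text{out}}$ (neither is specified on the slide); the weight names `U_z`, `C`, `W_y`, `W_out` are hypothetical:

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, x: Vector) -> Vector:
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def context_vector(alphas: Vector, annotations: List[Vector]) -> Vector:
    """c_t = sum_j alpha_{j,t} h_j."""
    return [sum(a * h[k] for a, h in zip(alphas, annotations)) for k in range(len(annotations[0]))]


def decoder_step(z_prev: Vector, c_t: Vector, y_prev: Vector,
                 U_z: Matrix, C: Matrix, W_y: Matrix, W_out: Matrix) -> Tuple[Vector, Vector]:
    """z_t = phi_dec(z_{t-1}, c_t, y_{t-1}); returns z_t and p(y_t | y_<t, X)."""
    z_t = [math.tanh(a + b + d) for a, b, d in zip(matvec(U_z, z_prev), matvec(C, c_t), matvec(W_y, y_prev))]
    logits = matvec(W_out, z_t)  # phi_out(z_t)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return z_t, [e / total for e in exps]
```

For instance, uniform weights over the annotations `[[1, 0], [0, 1], [1, 1], [2, 0]]` give $c_t = (1.0, 0.5)$, while putting all the weight on the third annotation gives $c_t = (1, 1)$.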