  1. How Deep Learning is making MT and other areas converge? MARTA R. COSTA-JUSSÀ UNIVERSITAT POLITÈCNICA DE CATALUNYA, BARCELONA

  2. About me
  Career timeline (2004-2015): LIMSI-CNRS (Paris), USP (São Paulo), I2R (Singapore), IPN (Mexico), BM (Barcelona), UPC (Barcelona), working on ASR, SMT, HMT, CLIR, SMT+NN, OM, NMT, S2S, NLI, SLT.

  3. Outline
  Machine Translation and Deep Learning
  Neural Machine Translation
  Neural MT architecture applied to other areas
  ◦ NLP (Chatbot)
  ◦ Speech (End-to-end speech recognition, End-to-end speech translation)
  ◦ Image (Image captioning)
  Neural MT inspired by other areas
  ◦ Image/NLP (Character-aware modelling)
  ◦ Machine Learning (Adversarial networks)
  Discussion

  4. Machine Translation
  A model maps source-language text into target-language text. Three families of models:
  ◦ Rule-based: rules, dictionaries. From the 1950s till now. Systems: Eurotra, Apertium… Ref: (Forcada, 2005)
  ◦ Statistical: co-occurrences, frequency counts. From the 1990s till now. Systems: TC-Star, Moses… Ref: (Koehn, 2010)
  ◦ Neural: neural networks. Starting in 2014… Systems: NEMATUS… Ref: (Cho, 2014)

  5. Neural nets are… Neural networks, a branch of machine learning, are a biologically-inspired programming paradigm which enables a computer to learn from observational data (http://neuralnetworksanddeeplearning.com/)

  6. Deep learning is… A branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using model architectures, with complex structures or otherwise, composed of multiple non-linear transformations (Wikipedia). A set of machine learning algorithms which attempt to learn multiple-layered models of inputs, commonly neural networks (Du et al., 2013).

  7. Neural Machine Translation

  8. Motivation: End-to-end system
  PHRASE-BASED: a training pipeline over a parallel corpus (plus a monolingual corpus) — preprocessing, word alignment, phrase extraction, a translation model (finding the right target words given the source words) and a language model (ensuring that translated words come in the right order) — followed, at test time, by decoding and postprocessing.
  NEURAL: a single encoder-decoder network mapping source language text to target language text, trained end-to-end.

  9. Related work: language modeling
  Find a function that takes as input n-1 words and returns a conditional probability of the next one. Recurrent neural networks remove this fixed context window: via recursion, they can capture dependencies beyond it.
  (Diagram: an RNN reading "I'm fine ." and emitting p(I'm), p(fine | I'm), p(. | fine), up to EOS.)
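
A minimal numpy sketch of one step of such a recurrent language model (all dimensions, word ids and weights are toy assumptions, untrained):

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 5, 8                      # toy vocabulary and hidden sizes (assumed)
E  = rng.normal(size=(V, H))     # input word embeddings
Wh = rng.normal(size=(H, H))     # recurrent weights
Wo = rng.normal(size=(H, V))     # output projection

def rnn_lm_step(h, word_id):
    """One recurrence: fold the current word into the hidden state,
    then return a softmax distribution over the next word."""
    h = np.tanh(E[word_id] + h @ Wh)
    logits = h @ Wo
    p = np.exp(logits - logits.max())
    return h, p / p.sum()

h = np.zeros(H)
for w in [0, 3, 1]:              # word ids for, say, "I'm fine ."
    h, p_next = rnn_lm_step(h, w)
print(p_next)                    # p(next word | everything seen so far)
```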

  10. Architecture: encoder-decoder
  (Diagram: the encoder reads "how are you ?"; the decoder generates "Cómo estás ? EOS".)
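
A greedy encoder-decoder sketch in the same toy style: the encoder compresses the source into a single vector, and the decoder unrolls from it (weights are random and untrained; all sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
V, H, EOS = 6, 8, 0                   # toy vocab size, hidden size, EOS id
Es = rng.normal(size=(V, H))          # source embeddings
Et = rng.normal(size=(V, H))          # target embeddings
Wenc = rng.normal(size=(H, H))        # encoder recurrence
Wdec = rng.normal(size=(H, H))        # decoder recurrence
Wout = rng.normal(size=(H, V))        # decoder output projection

def encode(src_ids):
    h = np.zeros(H)
    for w in src_ids:                 # read the source left to right
        h = np.tanh(Es[w] + h @ Wenc)
    return h                          # fixed-size summary of the sentence

def decode(h, max_len=10):
    out, w = [], EOS                  # generation starts from EOS
    for _ in range(max_len):
        h = np.tanh(Et[w] + h @ Wdec)
        w = int(np.argmax(h @ Wout))  # greedy choice of the next word
        if w == EOS:
            break
        out.append(w)
    return out

print(decode(encode([2, 4, 1])))      # source word ids in, target ids out
```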

  11. Attention-based mechanism
  (Diagram: the same encoder-decoder, where each decoder step receives a weighted sum — the "+" node — of the encoder states as extra input.)
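
The "+" in the diagram is the weighted sum that attention computes. A minimal sketch using dot-product scoring (one of several possible scoring functions; the toy states below are assumptions):

```python
import numpy as np

def attention_context(dec_state, enc_states):
    """Score every encoder state against the current decoder state,
    softmax the scores, and return the weighted sum (context vector)."""
    scores = enc_states @ dec_state             # one score per source word
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax -> soft alignment
    return weights @ enc_states, weights

enc_states = np.random.default_rng(0).normal(size=(4, 8))  # 4 source words
dec_state = enc_states[2] + 0.1                 # a toy decoder query
ctx, w = attention_context(dec_state, enc_states)
print(w)                                        # should peak at position 2
```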

  12. Neural MT architecture applied to other areas: Natural Language Processing, Speech, Image

  13. Natural Language Processing

  14. Chatbot: a computer program that conducts a conversation

  15. Chatbot models
  RULE-BASED: ELIZA — decomposition rules and key-word matching (Weizenbaum, 1966); AIML — Artificial Intelligence Markup Language (Wallace, 1995). Example:
  <category>
  <pattern>WHAT IS YOUR NAME</pattern>
  <template>My name is Michael N.S Evanious.</template>
  </category>
  RETRIEVAL-BASED: IRIS — a chat-oriented dialogue system based on the vector space model framework (Banchs and Li, 2012).
  https://hackernoon.com/chatbot-architecture-496f5bf820ed

  16. Generative-based chatbots
  An encoder-decoder over dialogue turns. This model can be trained end-to-end, but it cannot solve the problem of modeling the dialogue:
  Human: what is your job ?
  Enc-dec: i'm a lawyer .
  Human: what do you do ?
  Enc-dec: i'm a doctor .

  17. Chatbot evaluation
  Question | Count
  Enc-dec preferred over Cleverbot | 97
  Cleverbot preferred over Enc-dec | 60
  Tie | 20
  Disagreement among judges (no consensus of 3 out of 4 judges) | 23
  ✓ The model can generate simple and basic conversations.
  ✗ The model cannot currently deliver realistic conversations.

  18. Chatbot examples
  Human: what is the purpose of being intelligent ?
  CleverBot: The purpose of life is to love God and love people.
  Enc-dec: to find out what it is
  Human: what is your job ?
  CleverBot: I work in mental health
  Enc-dec: i'm a lawyer .
  Human: what do you do ?
  CleverBot: I have no job .
  Enc-dec: i'm a doctor .
  Human: What do you think about england during the reign of elizabeth ?
  CleverBot: I like him. Do you?
  Enc-dec: it was a great place .
  Human: What is the color of a yellow car ?
  CleverBot: Brown.
  Enc-dec: yellow .

  19. Other NLP Tasks with Seq2seq
  Text summarization: the process of shortening a text document with software to create a summary with the major points of the original document.
  Question Answering: automatically producing an answer to a question given a corresponding document.
  Semantic Parsing: mapping natural language into a logical form that can be executed on a knowledge base and return an answer.
  Syntactic Parsing: the process of analysing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar.

  20. Speech Recognition

  21. Speech Recognition system
  (Diagram: microphone → feature extraction → recognizer → decision → recognized sentence. The recognizer turns a feature-vector sequence x = x_1 … x_{|x|} into a word sequence w = w_1 … w_{|w|}, combining acoustic models, a lexicon and language models, and passing N-best hypotheses to the decision step; task information feeds the language models.)

  22. RNN/CNN-HMM + RNN LM
  Acoustic model: RNN/CNN. Phonetic inventory: HMM. Pronunciation: lexicon. Language model: n-gram (+ RNN).
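
The "(N-GRAM +) RNN" language model on this slide is commonly realized by linear interpolation of the two models' probabilities; a minimal sketch (the weight lam is an assumption, normally tuned on held-out data):

```python
def interpolate_lm(p_ngram, p_rnn, lam=0.5):
    """Linearly combine an n-gram LM and an RNN LM probability."""
    return lam * p_ngram + (1.0 - lam) * p_rnn

print(interpolate_lm(0.02, 0.08))  # 0.05
```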

  23. Speech recognition with encoder-decoder with attention
  (Diagram: the attention-based encoder-decoder, with the encoder acting as the acoustic model and the decoder as the language model.)

  24. Listener
  Challenge: speech signals can be hundreds to thousands of frames long. Solution: a pyramidal BLSTM.
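
A sketch of the pyramid step only (the BLSTM layers themselves are omitted; the frame count and feature size are toy assumptions). Each level concatenates consecutive frame pairs, halving the sequence length:

```python
import numpy as np

def pyramid_reduce(frames):
    """One pyramid level: concatenate every pair of consecutive frames,
    halving the sequence length, as in the pyramidal encoder of
    Listen, Attend and Spell."""
    if len(frames) % 2:                       # pad odd-length sequences
        frames = np.vstack([frames, frames[-1:]])
    return frames.reshape(len(frames) // 2, -1)

x = np.random.default_rng(0).normal(size=(1000, 40))  # 1000 speech frames
for _ in range(3):                                    # three pyramid levels
    x = pyramid_reduce(x)
print(x.shape)                                        # (125, 320): 8x shorter
```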

  25. Attend & Spell

  26. End-to-end Speech-to-text
  Model | WER
  CLDNN-HMM* | 8.0
  LAS + LM Rescoring | 10.3
  *Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Network

  27. End-to-end Speech-to-text Translation
  Multi-task learning aims at improving the generalization performance of a task using other related tasks. What is new here compared to previous work? Multi-task training. Two configurations (see the sketch below):
  ◦ One-to-many: one encoder, multiple decoders (e.g. a shared speech encoder feeding speech recognition and speech translation decoders).
  ◦ Many-to-one: multiple encoders, one decoder (e.g. speech and text encoders sharing one translation decoder).
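
A toy numpy sketch of the one-to-many setup: one shared encoder matrix and one output matrix per task (the sizes, task names and mean-pooling are illustrative assumptions; no training loop is shown):

```python
import numpy as np

rng = np.random.default_rng(0)
H = 8
enc = rng.normal(size=(40, H))                    # shared encoder weights
dec = {"recognition": rng.normal(size=(H, 30)),   # one decoder per task
       "translation": rng.normal(size=(H, 30))}

def forward(frames, task):
    h = np.tanh(frames @ enc).mean(axis=0)        # shared speech encoding
    return h @ dec[task]                          # task-specific logits

frames = rng.normal(size=(100, 40))               # a toy utterance
print(forward(frames, "recognition")[:3])         # same encoder feeds
print(forward(frames, "translation")[:3])         # both task decoders
```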

  28. Spanish->English FISHER/CALLHOME BLEU results
  Model | Test 1 | Test 2
  End-to-End ST | 47.3 | 16.6
  Multi-task | 48.7 | 17.4
  ASR / NMT concatenation | 45.4 | 16.6

  29. Example of attention probabilities

  30. Image

  31. Image Captioning (Example: an image captioned "A cat on the mat".)

  32. Encoder-decoder with attention
  (Diagram: the same attention-based encoder-decoder, applied to image input.)
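
A minimal sketch of soft attention over a CNN feature map, in the spirit of attention-based captioning (the grid and channel sizes are assumptions, and the random features stand in for real conv activations):

```python
import numpy as np

def attend_image(dec_state, feat_map):
    """Soft attention over a CNN feature map: flatten the spatial grid
    into regions, score each region against the decoder state, softmax,
    and return the weighted average plus the attention map."""
    H, W, C = feat_map.shape
    regions = feat_map.reshape(H * W, C)
    scores = regions @ dec_state
    a = np.exp(scores - scores.max())
    a /= a.sum()
    return a @ regions, a.reshape(H, W)

feat = np.random.default_rng(0).normal(size=(14, 14, 512))  # conv features
ctx, amap = attend_image(np.random.default_rng(1).normal(size=512), feat)
print(amap.shape)   # (14, 14): where the model "looks" for the next word
```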

  33. Captioning: Show, Attend & Tell

  34. Results on the MS COCO database
  Method | BLEU
  Log-Bilinear (Kiros et al., 2014a) | 24.3
  Enc-Dec (Vinyals et al., 2014a) | 24.6
  +Attention (Xu et al., 2015) | 25.0

  35. Other Computer Vision Tasks with Attention
  Visual Question Answering: given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
  Video Caption Generation: attempts to generate a complete and natural sentence, enriching the single label as in video classification, to capture the most informative dynamics in videos.

  36. Neural MT architecture inspired by other areas

  37. Convolutional Neural Networks for character-aware Neural MT
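
A minimal sketch of the character-aware idea: build a word representation by convolving filters over character embeddings and max-pooling over time (the alphabet size, dimensions, filter width and weights are toy assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
A, D, F, K = 30, 15, 25, 3        # alphabet, char-emb dim, filters, width
char_emb = rng.normal(size=(A, D))
filters  = rng.normal(size=(F, K, D))

def char_cnn_word(char_ids):
    """Word embedding built from characters: slide each filter over the
    character-embedding sequence, then max-pool over positions."""
    x = char_emb[char_ids]                               # (len, D)
    n = len(char_ids) - K + 1                            # valid positions
    conv = np.array([[np.tanh((x[t:t + K] * f).sum()) for t in range(n)]
                     for f in filters])                  # (F, n)
    return conv.max(axis=1)                              # max over time

print(char_cnn_word([3, 7, 7, 12, 4]).shape)   # (25,): one word embedding
```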

  38. German-English BLEU Results
  Method | DE->EN | EN->DE
  Phrase | 20.99 | 17.04
  NMT | 20.64 | 17.15
  +Char | 22.10 | 20.22

  39. Examples

  40. Generative Adversarial Networks
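
The next slides report adversarial training gains for NMT. As a minimal sketch of the underlying idea, here are the standard GAN objectives with the NMT model as generator and a discriminator scoring translations as human (1) or machine (0); this is the generic formulation, not necessarily the exact loss of the cited work:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """D wants to score human translations (d_real) near 1 and
    machine translations (d_fake) near 0."""
    return -np.log(d_real) - np.log(1.0 - d_fake)

def generator_loss(d_fake):
    """The generator (the NMT model) is rewarded when the
    discriminator is fooled into scoring its output near 1."""
    return -np.log(d_fake)

# As D gets better at spotting machine output, the generator's loss grows:
for d_fake in (0.9, 0.5, 0.1):
    print(d_fake, generator_loss(d_fake))
```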

  41. German-to-English BLEU Results
  Method | DE->EN
  Baseline (Shen et al., 2016) | 25.84
  +Adversarial | 27.94

  42. German-to-English Example
  Source: wir mussen verhindern , dass die menschen kenntnis erlangen von dingen , vor allem dann , wenn sie wahr sind .
  Baseline: we need to prevent people who are able to know that people have to do , especially if they are true .
  +Adversarial: we need to prevent people who are able to know about things , especially if they are true .
  REF: we have to prevent people from finding about things , especially when they are true .

  43. Discussion

  44. Implementations of Encoder-Decoder: LSTM, CNN

  45. Attention-based mechanisms
  Soft vs Hard: soft attention weights all pixels; hard attention crops the image and forces attention only on the kept part.
  Global vs Local: a global approach always attends to all source words; a local one only looks at a subset of source words at a time (contrasted in the sketch below).
  Intra vs External: intra attention is within the encoder's input sentence; external attention is across sentences.
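
A small sketch contrasting global and local attention weights (the scores, window width and alignment position are toy assumptions):

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def global_attention(scores):
    return softmax(scores)                 # attend to all source positions

def local_attention(scores, center, width=2):
    w = np.full_like(scores, -np.inf)      # mask everything outside a
    lo, hi = max(0, center - width), center + width + 1
    w[lo:hi] = scores[lo:hi]               # window around the aligned word
    return softmax(w)

s = np.array([0.1, 2.0, 0.3, 1.5, 0.2, 0.8])
print(global_attention(s))                 # weights over all 6 positions
print(local_attention(s, center=3))        # nonzero only near position 3
```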

  46. One large encoder-decoder
  • Text, speech, image… is it all converging to a signal paradigm?
  • If you know how to build a neural MT system, you may easily learn how to build a speech-to-text recognition system...
  • Or you may train them together to achieve zero-shot AI.
  *And other references on this research direction….

  47. Thanks
  MARTA.RUIZ@UPC.EDU • WWW.COSTA-JUSSA.COM
  Acknowledgements: Noé Casas and Carlos Escolano for their valuable feedback on the slides; the MT-Marathon organizers for inviting me to this exciting event.
