  1. Deep Learning: Theory and Practice 30-04-2019 Recurrent Neural Networks

  2. Introduction ❖ The standard DNN/CNN paradigms ❖ (x,y) - ordered pair of data vectors/images (x) and targets (y) ❖ Moving to sequence data ❖ (x(t),y(t)), where this could be a sequence-to-sequence mapping task. ❖ (x(t),y), where this could be a sequence-to-vector mapping task.

  3. Introduction ❖ Differences from standard CNNs/DNNs ❖ (x(t),y(t)), where this could be a sequence-to-sequence mapping task. ❖ Input features / output targets are correlated in time. ❖ Unlike standard models, where each (x,y) pair is independent. ❖ Need to model dependencies in the sequence over time.
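
As a minimal sketch of the two mappings listed above (the PyTorch layer and all sizes are our own choices, not taken from the slides): a sequence-to-sequence task pairs a target with every time step's output, while a sequence-to-vector task keeps only a single summary of the whole sequence.

    import torch
    import torch.nn as nn

    rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
    x = torch.randn(4, 10, 8)       # batch of 4 sequences, 10 steps, 8 features each

    out, h_T = rnn(x)               # out: (4, 10, 16), one output per time step
    seq_to_seq = out                # pair each out[:, t] with a target y(t)
    seq_to_vec = h_T.squeeze(0)     # (4, 16): one summary vector per sequence, paired with a single y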

  4. Introduction to Recurrent Networks “Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville

  5. Recurrent Networks “Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville

  6. Recurrent Networks “Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville

  7. Backpropagation in RNNs: Model Parameters and Gradient Descent
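
The slide's parameter and gradient expressions are not reproduced here; as a hedged illustration of what gradient descent on the shared RNN parameters looks like in practice (PyTorch and all sizes are our own choices), the fragment below backpropagates a sequence loss and applies a plain SGD update.

    import torch
    import torch.nn as nn

    rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
    readout = nn.Linear(16, 1)
    params = list(rnn.parameters()) + list(readout.parameters())
    lr = 0.01

    x = torch.randn(4, 10, 8)            # (batch, time, features)
    y = torch.randn(4, 10, 1)            # per-step targets

    out, _ = rnn(x)
    loss = ((readout(out) - y) ** 2).mean()
    loss.backward()                      # gradients flow back through every time step
    with torch.no_grad():
        for p in params:
            p -= lr * p.grad             # plain gradient-descent update
            p.grad = None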

  8. Recurrent Networks

  9. Backpropagation Through Time

  10. Backpropagation Through Time
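
The derivation on slides 9-10 is not reproduced; the sketch below (NumPy, with toy dimensions and a squared-error loss of our own choosing) unrolls a tiny vanilla RNN forward, then walks backwards through the stored hidden states, accumulating the gradient of the shared weights over every time step - which is what backpropagation through time computes.

    import numpy as np

    # Tiny vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1}), loss only on the final step.
    T, d_x, d_h = 5, 3, 4
    rng = np.random.default_rng(0)
    W_xh = rng.normal(scale=0.1, size=(d_h, d_x))
    W_hh = rng.normal(scale=0.1, size=(d_h, d_h))
    x = rng.normal(size=(T, d_x))
    target = rng.normal(size=d_h)

    # Forward pass: store every hidden state for reuse in the backward pass.
    h = [np.zeros(d_h)]
    for t in range(T):
        h.append(np.tanh(W_xh @ x[t] + W_hh @ h[-1]))
    loss = 0.5 * np.sum((h[-1] - target) ** 2)

    # Backward pass through time: accumulate gradients of the shared weights over all steps.
    dW_xh, dW_hh = np.zeros_like(W_xh), np.zeros_like(W_hh)
    dh = h[-1] - target                      # dL/dh_T
    for t in reversed(range(T)):
        da = dh * (1.0 - h[t + 1] ** 2)      # back through tanh
        dW_xh += np.outer(da, x[t])
        dW_hh += np.outer(da, h[t])
        dh = W_hh.T @ da                     # propagate to h_{t-1}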

  11. Standard Recurrent Networks “Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville
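
The textbook figure on this slide is not reproduced here; the standard recurrence it depicts (following the cited Goodfellow et al. text, with U, W, V the input-to-hidden, hidden-to-hidden, and hidden-to-output weights, and b, c the biases) is:

    a^{(t)} = b + W h^{(t-1)} + U x^{(t)}
    h^{(t)} = \tanh\bigl(a^{(t)}\bigr)
    o^{(t)} = c + V h^{(t)}
    \hat{y}^{(t)} = \mathrm{softmax}\bigl(o^{(t)}\bigr)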

  12. Other Recurrent Networks: Teacher Forcing Networks “Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville

  13. Recurrent Networks: Teacher Forcing Networks “Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville
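
A hedged sketch of teacher forcing during training (the framework and all names are our own assumptions): at step t the ground-truth output from step t-1, rather than the model's own previous prediction, is fed back as part of the input. At test time the true outputs are unavailable, so the model's own predictions are fed back instead.

    import torch
    import torch.nn as nn

    cell = nn.RNNCell(input_size=8 + 1, hidden_size=16)   # input = features + previous target
    readout = nn.Linear(16, 1)

    x = torch.randn(4, 10, 8)                 # (batch, time, features)
    y = torch.randn(4, 10, 1)                 # ground-truth outputs

    h = torch.zeros(4, 16)
    prev_y = torch.zeros(4, 1)                # start symbol / initial output
    loss = 0.0
    for t in range(10):
        h = cell(torch.cat([x[:, t], prev_y], dim=1), h)
        pred = readout(h)
        loss = loss + ((pred - y[:, t]) ** 2).mean()
        prev_y = y[:, t]                      # teacher forcing: feed the true output, not pred
    loss.backward()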

  14. Recurrent Networks: Multiple Input, Single Output

  15. Recurrent Networks: Single Input, Multiple Output

  16. Recurrent Networks: Bi-directional Networks
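
A brief sketch of the bi-directional idea (PyTorch, with our own choice of layer and sizes): the sequence is processed once forwards and once backwards, and the two hidden-state sequences are concatenated at each step.

    import torch
    import torch.nn as nn

    birnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True, bidirectional=True)
    x = torch.randn(4, 10, 8)

    out, h_n = birnn(x)
    print(out.shape)   # torch.Size([4, 10, 32]): forward and backward states concatenated
    print(h_n.shape)   # torch.Size([2, 4, 16]): final state of each direction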

  17. Recurrent Networks: Sequence-to-Sequence Mapping Networks

  18. Long-term Dependency Issues

  19. Vanishing/Exploding Gradients ❖ Gradients propagated across many time steps either vanish or explode ❖ Initial frames may contribute almost nothing to the gradient computation, or may contribute far too much.
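
One common remedy for the exploding side of this problem, sketched below with our own choice of framework and sizes, is to rescale the gradient norm before each update; the vanishing side motivates the gated architectures on the following slides.

    import torch
    import torch.nn as nn

    rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)
    opt = torch.optim.SGD(rnn.parameters(), lr=0.01)

    x = torch.randn(4, 100, 8)                # a long sequence makes blow-ups more likely
    out, _ = rnn(x)
    loss = out.pow(2).mean()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(rnn.parameters(), max_norm=1.0)   # cap the global gradient norm
    opt.step()
    opt.zero_grad()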

  20. Long Short-Term Memory

  21. LSTM Cell ❖ Input gate, forget gate, cell state, output gate, and LSTM output ❖ f - sigmoid function; g, h - tanh functions
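
Writing σ for the slide's sigmoid f, and tanh for its squashing functions g and h, the usual LSTM cell updates can be summarized as below (the weight names are generic, not taken from the slide):

    i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i)                          \quad\text{(input gate)}
    f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f)                          \quad\text{(forget gate)}
    c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)  \quad\text{(cell state)}
    o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o)                          \quad\text{(output gate)}
    h_t = o_t \odot \tanh(c_t)                                               \quad\text{(LSTM output)}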

  22. Long Short-Term Memory Networks

  23. Gated Recurrent Units (GRU)
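
The GRU equations are not shown on the slide; one common formulation (following Cho et al., 2014; sign conventions for the update gate z_t differ between papers and implementations) uses a reset gate r_t and an update gate z_t in place of the LSTM's three gates and separate cell state:

    z_t = \sigma(W_z x_t + U_z h_{t-1})                      \quad\text{(update gate)}
    r_t = \sigma(W_r x_t + U_r h_{t-1})                      \quad\text{(reset gate)}
    \tilde{h}_t = \tanh\bigl(W x_t + U (r_t \odot h_{t-1})\bigr)
    h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t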

  24. Attention in LSTM Networks ❖ Attention provides a mechanism to add relevance weighting ❖ Certain regions of the audio are more important than the rest for the task at hand.
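
A hedged sketch of this idea as attention pooling over audio frames (a minimal version of our own, not the lab's actual model): each frame's recurrent state is scored, the scores are normalized with a softmax, and the weighted average forms the utterance representation, so that relevant regions of the audio dominate.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    lstm = nn.LSTM(input_size=40, hidden_size=64, batch_first=True)
    score = nn.Linear(64, 1)                  # learned frame-relevance scorer

    frames = torch.randn(2, 300, 40)          # e.g. 300 frames of 40-dim audio features
    h, _ = lstm(frames)                       # (2, 300, 64)
    alpha = F.softmax(score(h), dim=1)        # (2, 300, 1): attention weight per frame
    utterance = (alpha * h).sum(dim=1)        # (2, 64): weighted summary used for classification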

  25. Encoder - Decoder Networks with Attention
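
The encoder-decoder diagram is not reproduced; below is a minimal sketch of a single decoder step with additive attention (all layer names and sizes are our own assumptions): the current decoder state scores every encoder state, a softmax over those scores gives the attention weights, and the weighted sum of encoder states is the context vector passed on to the decoder.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    enc_out = torch.randn(1, 50, 128)   # encoder states: (batch, T, d)
    dec_h   = torch.randn(1, 128)       # current decoder hidden state
    W_e = nn.Linear(128, 64)
    W_d = nn.Linear(128, 64)
    v   = nn.Linear(64, 1)

    scores  = v(torch.tanh(W_e(enc_out) + W_d(dec_h).unsqueeze(1)))  # (1, 50, 1)
    weights = F.softmax(scores, dim=1)                               # attention over the T encoder steps
    context = (weights * enc_out).sum(dim=1)                         # (1, 128) context vector for the decoder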

  26. Attention Models

  27. Attention - Speech Example (from our lab; part of an ICASSP 2019 paper).

  28. Language Recognition Evaluation

  29. End-to-end model using GRUs and Attention

  30. Proposed End-to-End Language Recognition Model

  31. Proposed End-to-End Language Recognition Model

  32. Proposed End-to-End Language Recognition Model

  33. Language Recognition Evaluation ❖ State-of-the-art models use the input sequence directly. ❖ We proposed the attention model - attention weighs the importance of each short-term segment feature for the task. ❖ Example utterance with per-segment attention weights: 0-3s: "O... one muscle at all, it was terrible"; 3s-4s: ".... ah .... ah ...."; 4s-9s: "I couldn't scream, I couldn't shout, I couldn't even move my arms up, or my legs"; 9s-11s: "I was trying me hardest, I was really really panicking." ❖ Bharat Padi, et al., “End-to-end language recognition using hierarchical gated recurrent networks”, under review, 2018.

  34. Language Recognition Evaluation

  35. Language Recognition Evaluation
