E9 205 Machine Learning for Signal Processing – Deep Learning for Audio and Vision


  1. E9 205 Machine Learning for Signal Processing – Deep Learning for Audio and Vision, 20-11-2019

  2. Speech Recognition. [Illustration: noise, channel, and automatic speech recognition systems. Courtesy – Google Images.]

  3. Signal Modeling – short-term spectra integrated in mel frequency bands, followed by log compression + DCT, giving mel frequency cepstral coefficients (MFCC) [Davis and Mermelstein, 1979]. [Block diagram: 25 ms short-term spectrum → mel integration + log → DCT.]

  4. Mel Frequency Cepstral Coefficients – MFCC processing is repeated for every short-term frame, yielding a sequence of features; typically 25 ms frames with a 10 ms hop in time.
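
As a rough illustration of the pipeline in slides 3–4, the sketch below computes MFCCs with 25 ms frames and a 10 ms hop. It assumes the librosa library and a hypothetical input file speech.wav; the slides do not prescribe any particular toolkit.

    import librosa

    # Load a speech waveform (file name is a placeholder) at 16 kHz.
    y, sr = librosa.load("speech.wav", sr=16000)

    # Mel-integrated short-term spectra + log compression + DCT = MFCCs.
    mfcc = librosa.feature.mfcc(
        y=y, sr=sr,
        n_mfcc=13,                   # number of cepstral coefficients kept
        n_fft=int(0.025 * sr),       # 25 ms analysis window
        hop_length=int(0.010 * sr),  # 10 ms hop between frames
    )
    print(mfcc.shape)                # (13, number_of_frames)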

  5. Speech Recognition • Map the features to phone classes, using phone-labelled data. [Example: the sequence w – ^ – n and its triphone classes /w/ /^/ /n/.] • Classical machine learning – train a classifier on speech training data that maps each feature frame to the target phoneme class.
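
A minimal sketch of the classical frame-level setup described above: a small feed-forward network maps each feature frame to a phone class. The feature dimension (39) and the number of phone classes (40) are illustrative assumptions, not values from the slides.

    import torch
    import torch.nn as nn

    class FramePhoneClassifier(nn.Module):
        """Maps one acoustic feature frame to phone-class logits."""
        def __init__(self, feat_dim=39, num_phones=40, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(feat_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, num_phones),
            )

        def forward(self, frames):          # frames: (batch, feat_dim)
            return self.net(frames)

    model = FramePhoneClassifier()
    logits = model(torch.randn(8, 39))      # a batch of 8 random frames
    loss = nn.CrossEntropyLoss()(logits, torch.randint(0, 40, (8,)))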

  6. Back to Speech Recognition Mapping Speech Features to Phonemes

  7. Back to Speech Recognition – mapping speech features to phonemes, and phonemes to words. [Pipeline: pronunciation model (dictionary of word pronunciations) → language model (word syntax) → decoded text.]

  8. State of Progress, 2018 – 5.3% error rate; claims of human parity using BLSTM-based models!

  9. Moving to End-to-End: audio features in, text output out.

  10. Image Processing

  11. Visual Geometry Group (VGG) Network
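
As a sketch of the VGG design pattern (stacks of 3×3 convolutions followed by 2×2 max pooling), assuming PyTorch; the channel sizes match the first two stages of VGG-16, but this is a simplified illustration rather than the full network.

    import torch.nn as nn

    def vgg_block(in_ch, out_ch, num_convs):
        """num_convs 3x3 conv+ReLU layers, then 2x2 max pooling."""
        layers = []
        for i in range(num_convs):
            layers += [
                nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ]
        layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        return nn.Sequential(*layers)

    # First two stages of a VGG-16-like network for 224x224 RGB inputs.
    features = nn.Sequential(vgg_block(3, 64, 2), vgg_block(64, 128, 2))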

  12. ImageNet Task 1000 images in each of 1000 categories. In all, there are roughly 1.2 million training images, 50,000 validation images, and 150,000 testing images. ImageNet consists of variable-resolution images. Therefore, the images have been down-sampled to a fixed resolution of 224 × 224.
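
The fixed-resolution preprocessing mentioned in slide 12 can be written, for example, with torchvision transforms (the library choice is an assumption; the slide does not specify tooling):

    from torchvision import transforms

    # Down-sample variable-resolution images to a fixed 224x224 input.
    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),  # fixed spatial resolution
        transforms.ToTensor(),          # HxWxC uint8 image -> CxHxW float tensor
    ])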

  13. Can we go deeper?

  14. Residual Blocks
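
A minimal PyTorch sketch of a residual block: the output adds the input back to the transformed features (an identity shortcut), which is what allows much deeper networks to be trained. This is the basic two-convolution block in simplified form, not the exact published configuration.

    import torch.nn as nn

    class ResidualBlock(nn.Module):
        """y = ReLU(F(x) + x), where F is two 3x3 convolutions."""
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            out = self.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return self.relu(out + x)   # identity shortcut ("skip connection")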

  15. Deep Networks with Residual Blocks

  16. Deep Networks with Residual Blocks

  17. Results with ResNet

  18. Image Segmentation

  19. The Problem of Segmentation

  20. SegNet Architecture
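
A toy sketch of SegNet's characteristic step, assuming PyTorch: the encoder's max-pooling indices are stored and reused by the decoder's unpooling, so upsampled values return to the positions they were pooled from. The layer sizes and the 21-class output are illustrative assumptions.

    import torch
    import torch.nn as nn

    encoder = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU())
    pool = nn.MaxPool2d(2, stride=2, return_indices=True)   # remember max locations
    unpool = nn.MaxUnpool2d(2, stride=2)                     # reuse them for upsampling
    decoder = nn.Conv2d(64, 21, 3, padding=1)                # per-pixel class scores

    x = torch.randn(1, 3, 224, 224)
    feats = encoder(x)
    pooled, indices = pool(feats)            # (1, 64, 112, 112) plus pooling indices
    upsampled = unpool(pooled, indices)      # back to (1, 64, 224, 224), sparse
    logits = decoder(upsampled)              # (1, 21, 224, 224) segmentation logits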

  21. Results from SegNet

  22. U-Net
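
In contrast to SegNet, U-Net concatenates encoder feature maps with the upsampled decoder features (skip connections). A toy sketch in PyTorch, with channel and spatial sizes chosen purely for illustration:

    import torch
    import torch.nn as nn

    up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)  # learned 2x upsampling
    fuse = nn.Conv2d(64 + 64, 64, kernel_size=3, padding=1)    # convolve after concat

    encoder_feats = torch.randn(1, 64, 112, 112)   # saved from the contracting path
    decoder_feats = torch.randn(1, 128, 56, 56)    # coming up the expanding path

    upsampled = up(decoder_feats)                               # -> (1, 64, 112, 112)
    merged = fuse(torch.cat([upsampled, encoder_feats], dim=1)) # skip connection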

  23. Summary of the Course

  24. Distribution (pie chart): Generative Modeling and Dimensionality Reduction – 45%; Discriminative Modeling – 55%.

  25. Generative Modeling and Dimensionality Reduction. [Pie chart of topic shares: Feature Processing, PCA/LDA, Gaussian and GMM, NMF, Linear and Logistic Regression, and kernel methods, with shares of 15%, 15%, 15%, 15%, 8%, and 31%.]

  26. Discriminative Modeling. [Pie chart of topic shares: SVM, Neural Networks, Improving Learning, Improving Generalization, Deep Networks, Conv. Networks, RNNs, Understanding DNNs, Deep Generative Modeling, and Applications, with shares of 11%, 17%, 17%, 11%, 6%, 11%, 6%, 6%, 6%, and 11%.]

  27. When we started …

  28. Dates of Various Rituals
      ❖ 5 Assignments spread over 3 months (roughly one assignment every two weeks).
      ❖ September 1st week - project topic announcements.
      ❖ September 3rd week - 1st Midterm.
      ❖ September 4th week - project topic and team finalization and proposal submission [1 and 2 person teams].
      ❖ October 1st week - Project Proposal.
      ❖ October 3rd week - 2nd Midterm.
      ❖ November 1st week - Project Midterm Presentations.
      ❖ December 1st week - Final Exams.
      ❖ December 2nd week - Project Final Presentations.

  29. Content Delivery – In Class: theory and mathematical foundation; Beyond Class: intuition and analysis, implementation and understanding.
