inclusive design

Inclusive Design Deep Learning - PowerPoint PPT Presentation

Inclusive Design: Deep Learning on Audio in Azure. Swetha Machanavajhala, Software Engineer II; Xiaoyong Zhu, Senior Data Scientist. @swethaMVNV

  1. Inclusive Design: Deep Learning on Audio in Azure. Swetha Machanavajhala, Software Engineer II; Xiaoyong Zhu, Senior Data Scientist. @swethaMVNV

  2. True life-threatening incident: A carbon monoxide detector was beeping for a WEEK at Swetha’s house. Since she is deaf, she was unaware until a neighbor informed her. Swetha Machanavajhala


  4. Inclusive Design

  5. Visualizing Sounds: React in a second

  6. Capturing loudness in real time
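The slide does not say how loudness is measured. As a minimal sketch, assuming samples are floats normalized to [-1, 1] and that loudness means the RMS level of each short frame in dB relative to full scale (dBFS), a live meter could look like this. The names `rms_dbfs` and `loudness_stream` and the -120 dB floor are hypothetical choices for illustration, not taken from the talk.

```python
import math


def rms_dbfs(frame, floor_db=-120.0):
    """RMS level of one frame in dBFS; samples are assumed to lie in [-1, 1]."""
    if not frame:
        return floor_db
    mean_square = sum(x * x for x in frame) / len(frame)
    if mean_square <= 0.0:
        return floor_db  # digital silence: clamp instead of returning -infinity
    return max(floor_db, 10.0 * math.log10(mean_square))


def loudness_stream(samples, frame_size):
    """Yield one dBFS reading per full frame, the way a live meter would."""
    frame = []
    for sample in samples:
        frame.append(sample)
        if len(frame) == frame_size:
            yield rms_dbfs(frame)
            frame = []


# 0.1 s of a full-scale 1 kHz tone at 8 kHz, followed by 0.1 s of silence.
sine = [math.sin(2 * math.pi * 1000 * i / 8000) for i in range(800)]
silence = [0.0] * 800
levels = list(loudness_stream(sine + silence, frame_size=800))
print([round(level, 2) for level in levels])
```

A full-scale sine has a mean square of 0.5, so the first reading sits about 3 dB below full scale, and the silent frame is clamped to the floor.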

  7. Currently…

  8. Currently…

  9. Hearing AI can transcribe phone calls

  10. Hearing AI can do more…

  11. Deep Learning for Audio in Azure (Xiaoyong Zhu)

  12. Landscape: Sound-based predictive maintenance

  13. Landscape: SDK and product to turn machine sounds into actions

  14. Landscape: enables OEMs to embed contextual awareness onto devices.

  15. Dataset

  16. Convert a 1-dimensional array to a 2-dimensional matrix

  17. CNN on audio: Convert 1-dimensional data to 2-dimensional data. Input for CNNs: log-scaled mel-spectrogram (mel-spectrogram = what humans hear). Consider time by mel-spectrogram bands (X by Y axis); 150 bands; each pixel = amplitude at a time/frequency combination.

  20. Selecting the right number of bands is important
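The 1-D to 2-D conversion described on the CNN-on-audio slides can be sketched in plain Python. This is an illustration under stated assumptions, not the presenters' pipeline: it assumes the HTK-style mel formula, a Hann window, triangular filters, and a naive DFT (a real system would use an FFT library). The function names, the 256-sample frame, the 128-sample hop, and the use of 20 bands instead of the 150 on the slide are all choices made here to keep the example small and fast.

```python
import cmath
import math


def hz_to_mel(hz):
    """HTK-style mel scale (one common convention; an assumption here)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a Hann-windowed frame."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges_hz[m], edges_hz[m + 1], edges_hz[m + 2]
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                     for f in bin_hz])
    return bank


def log_mel_spectrogram(signal, sample_rate, n_fft=256, hop=128, n_mels=20):
    """Turn a 1-D signal into a 2-D matrix: rows = time frames, columns = mel bands."""
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        power = power_spectrum(signal[start:start + n_fft])
        mel_energy = [sum(w * p for w, p in zip(row, power)) for row in bank]
        frames.append([10.0 * math.log10(e + 1e-10) for e in mel_energy])
    return frames


# A 1 kHz test tone sampled at 8 kHz (2048 samples).
tone = [math.sin(2 * math.pi * 1000 * i / 8000) for i in range(2048)]
spec = log_mel_spectrogram(tone, sample_rate=8000)
print(len(spec), len(spec[0]))  # 15 time frames x 20 mel bands
```

Each row of `spec` is one column of the "image" a CNN would see. The `n_mels` parameter is the band number the slides refer to: in this sketch, asking for many more bands than the FFT resolution supports leaves some triangular filters without any frequency bin, which is one reason the choice matters.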

  21. Network Architecture: 11 convolutional layers; faster convergence & better accuracy; batch-normalization layers; improving accuracy; self-attention mechanism; longer segments (2 secs); narrow mel-band.
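The slide names a self-attention mechanism without giving details. As a rough sketch of the general idea only, the following applies scaled dot-product self-attention across time frames, assuming identity query/key/value projections (a trained network would learn these projections). The function names and the toy input are hypothetical and do not describe the presenters' model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    frames: list of equal-length feature vectors, one per time step.
    Returns (outputs, weights); weights[i][j] is how much frame i attends to frame j.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in frames:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(w * value[d] for w, value in zip(row, frames))
                        for d in range(dim)])
    return outputs, weights


# Three toy "frames": the first and third are identical, the second differs.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(frames)
print([round(w, 3) for w in weights[0]])
```

Each output frame is a weighted average of all frames, so similar frames reinforce each other regardless of how far apart they are in time.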

  24. Intelligent Sound Prediction - Architecture

  25. Demo: Hearing AI can recognize sounds

  26. Performance

  27. Sound, Movement, Localization, Speech, Specific sounds, Phone calls. Artificial Intelligence proves sounds need not be heard!

