Voice Separation with tiny ML on the edge


  1. Tiny ML Summit 2020. Voice Separation with tiny ML on the edge. Niels H. Pontoppidan, PhD, Research Area Manager, Augmented Hearing Science (Eriksholm Research Centre, Denmark). Main collaborators: Dr. Lars Bramsløw (Eriksholm Research Centre, Denmark), Prof. Tuomas Virtanen (University of Tampere, Finland), Gaurav Naithani (University of Tampere, Finland)

  2. Additional acknowledgements and references
  Acknowledgements: Thomas "Tom" Barker, Giambattista Parascandolo, Joonas Nikunen, Rikke Rossing, Atefeh Hafez, Marianna Vatti, Umaer Hanif, Christian Grant, Christian Hansen
  References:
  • Bramsløw, L., Naithani, G., Hafez, A., Barker, T., Pontoppidan, N. H., & Virtanen, T. (2018). Improving competing voices segregation for hearing impaired listeners using a low-latency deep neural network algorithm. The Journal of the Acoustical Society of America, 144(1), 172–185.
  • Naithani, G., Barker, T., Parascandolo, G., Bramsløw, L., Pontoppidan, N. H., & Virtanen, T. (2017). Low latency sound source separation using convolutional recurrent neural networks. 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 71–75.
  • Naithani, G., Barker, T., Parascandolo, G., Bramsløw, L., Pontoppidan, N. H., & Virtanen, T. (2017). Evaluation of the benefit of neural network based speech separation algorithms with hearing impaired listeners. Proceedings of the 1st International Conference on Challenges in Hearing Assistive Technology (CHAT-17), Stockholm, Sweden.
  • Naithani, G., Parascandolo, G., Barker, T., Pontoppidan, N. H., & Virtanen, T. (2016). Low-latency sound source separation using deep neural networks. 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 272–276.
  • Pontoppidan, N. H., Vatti, M., Rossing, R., Barker, T., & Virtanen, T. (2016). Separating known competing voices for people with hearing loss. Proceedings of the Speech Processing in Realistic Environments Workshop (SPIRE Workshop).
  • Barker, T., Virtanen, T., & Pontoppidan, N. H. (2016). Hearing device comprising a low-latency sound source separation unit.
  • Barker, T., Virtanen, T., & Pontoppidan, N. H. (2014). Hearing device comprising a low-latency sound source separation unit (Patent No. US Patent App. 14/874,641).
  • Barker, T., Virtanen, T., & Pontoppidan, N. H. (2015). Low-latency sound-source-separation using non-negative matrix factorisation with coupled analysis and synthesis dictionaries. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2015, 241–245.

  3. Facts and stats about hearing aids and market
  Market
  • 15+ million units sold per year
  • Global wholesale market of USD 4+ billion per year
  • The six largest manufacturers hold a combined market share of more than 90%
  • Main market: OECD countries
  • 4-6% yearly unit growth, mainly due to demographic development
  • Growing aging population and increasing life expectancy
  Hearing-aid users
  • 10% of the population in OECD countries suffer from hearing loss
  • Only 20% of people suffering from a hearing loss use a hearing aid
  • 35-40% of the population aged 65+ suffer from a hearing loss
  • Average age of a first-time user is 69 years (USA)
  • Average age of all users is 72 years (USA)

  4. Hearing devices
  • Hearing devices help people communicate in simple and complex listening situations, including sound environments where people with normal hearing give up on using phones and headsets
  • Some rely on hearing devices for a few hours a day in specific situations, and many use them during all waking hours
  • Power: 1 mA from zinc-air batteries replaced every week, or Li-Ion batteries recharged every night
  • Hardware design employs many low-voltage and low clock-frequency methods

  5. Enhancing segregation by transforming “mono” to “stereo”

  6. History
  • 1953: The "Cocktail Party Problem" is coined by Colin Cherry, who proposes mono-to-stereo conversion to solve the problem
  • 2000: Sam Roweis presents "One Microphone Source Separation" at NIPS, showing separation of known voices
  • 2018: Bramsløw et al. show for the first time that algorithms improve segregation of known voices for people with hearing loss
  • 2020's: When will tiny ML enable enhanced voice segregation in a hearing device?

  7. Spatial augmentation
  • The algorithm separates the voices into two (or more) channels
  • The hearing device increases the spatial difference cues, i.e. repositions the sound sources further apart, turning "mono" into artificial stereo (a minimal sketch of this step follows below)
  • In case of spatial audio-visual cue conflicts, visual cues are expected to override the auditory cues, just as with ventriloquists
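A minimal sketch of the repositioning step: given two already-separated voice signals, the device could push them toward opposite ears by applying an interaural level difference. The 6 dB figure and the simple gain panning below are illustrative assumptions, not the actual hearing-aid implementation.

```python
import numpy as np

def widen_sources(voice_a, voice_b, ild_db=6.0):
    """Artificial stereo: pan two separated voices apart with opposite
    interaural level differences (ILD). Illustrative sketch only."""
    g = 10 ** (ild_db / 20.0)              # linear gain toward the near ear
    left = g * voice_a + voice_b / g       # voice A emphasised on the left
    right = voice_a / g + g * voice_b      # voice B emphasised on the right
    return np.stack([left, right])

# Example: two 1-second separated signals at 16 kHz (stand-ins for real voices)
fs = 16000
voice_a, voice_b = np.random.randn(fs) * 0.1, np.random.randn(fs) * 0.1
stereo = widen_sources(voice_a, voice_b)
print(stereo.shape)                        # (2, 16000)
```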

  8. Flowchart for DNN training
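The cited papers describe the training flow as a low-latency, mask-based DNN trained on known voices. Below is a minimal sketch of how one training pair could be constructed; the frame length, hop, and ideal-ratio-mask target are illustrative assumptions, not the exact Eriksholm/Tampere setup.

```python
import numpy as np

def stft_mag(x, frame=512, hop=64):
    """Magnitude STFT with a short hop, as needed for low-latency processing."""
    n_frames = 1 + (len(x) - frame) // hop
    win = np.hanning(frame)
    frames = np.stack([win * x[i * hop:i * hop + frame] for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def training_pair(voice_a, voice_b, eps=1e-8):
    """One (input, target) pair: mixture magnitudes in, ratio mask for voice A out.
    Hypothetical illustration of the training flowchart."""
    mag_a, mag_b = stft_mag(voice_a), stft_mag(voice_b)
    mix = stft_mag(voice_a + voice_b)
    mask_a = mag_a / (mag_a + mag_b + eps)   # ideal ratio mask as regression target
    return mix, mask_a
```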

  9. Flowchart for processing
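The processing side mirrors the training flow: each incoming frame is transformed, a mask is predicted, and the masked frame is resynthesised. The sketch below assumes a 16 kHz sample rate and a 64-sample hop, which would give the 250 Hz frame rate mentioned later; `dnn` is a placeholder for the trained feed-forward network.

```python
import numpy as np

def separate_frame(dnn, mix_frame):
    """Process one mixture frame: the DNN predicts a frequency mask for the
    target voice, which is applied to the spectrum before resynthesis.
    `dnn` is a placeholder mapping |spectrum| -> mask values in [0, 1]."""
    win = np.hanning(len(mix_frame))
    spec = np.fft.rfft(win * mix_frame)
    mask = dnn(np.abs(spec))
    return np.fft.irfft(mask * spec, n=len(mix_frame))

# Example with a dummy "network" that passes everything through
fs, frame_len, hop = 16000, 512, 64        # 16000 / 64 = 250 frames per second
frame = np.random.randn(frame_len) * 0.1
out = separate_frame(lambda m: np.ones_like(m), frame)
print(out.shape)                           # (512,)
```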

  10. Enhanced segregation for people with mild/moderate hearing loss
  • DNN processing
  • 4 million weights for the feed-forward DNNs (FDNNs, not optimized)
  • 250 Hz audio frame processing rate
  [Results figure comparing ideal and unprocessed conditions: Bramsløw et al., JASA, 2018]
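These two figures give a rough compute budget. Assuming one multiply-accumulate per weight per forward pass and one forward pass per frame, the load works out to about 1 GMAC/s, as in the back-of-the-envelope calculation below.

```python
weights = 4_000_000       # FDNN weights (not optimized)
frame_rate = 250          # audio frames processed per second
macs_per_second = weights * frame_rate    # one MAC per weight per frame
print(f"{macs_per_second:,} MAC/s")       # 1,000,000,000 MAC/s, i.e. ~1 GMAC/s
```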

  11. How listeners with normal hearing hear two competing voices

  12. How listeners with impaired hearing hear the two voices [this example could be harder to segregate]

  13. How it sounds when the two voices are separated out

  14. Focusing only on the female voice

  15. Focusing only on the male voice

  16. Enhanced segregation for people with mild/moderate hearing loss
  • Processing requirements
  • 4 million weights for the feed-forward DNNs (FDNNs, not optimized)
  • 250 Hz audio frame processing rate
  [Results figure comparing ideal and unprocessed conditions: Bramsløw et al., JASA, 2018]

  17. Next steps
  Feature performance
  • Increasing robustness to additional noise and reverberation
  • Increasing robustness to personal voice changes
  • Breaking the reliance on training on specific voices (transfer learning)
  • Further decreasing network sizes from 4 million weights
  Hardware performance (see Zuzana Jelčicová's poster: Benchmarking and improving NN execution on DSP vs. custom accelerator for hearing instruments)
  • From float to fixed point (a quantization sketch follows below)
  • Parallel MACs
  • Two-step scaling
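As a sketch of the "from float to fixed point" item, the snippet below quantizes one layer's weights symmetrically to int8 with a single per-layer scale. This is a generic illustration; the two-step scaling mentioned on the slide is not reproduced here.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-layer quantization: float weights -> int8 plus a scale
    factor used to dequantize (or to fold into the following layer)."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

# Example: quantize one hypothetical layer and check the worst-case error
w = np.random.randn(256, 512).astype(np.float32) * 0.1
q, s = quantize_int8(w)
print("max abs error:", np.max(np.abs(w - q.astype(np.float32) * s)))
```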

  18. Zuzana Jelčicová: Benchmarking and improving NN execution on DSP vs. custom accelerator for hearing instruments

  19. Tiny ML Summit 2020. Voice Separation with tiny ML on the edge. Niels H. Pontoppidan, PhD, Research Area Manager, Augmented Hearing Science (Eriksholm Research Centre, Denmark). Main collaborators: Dr. Lars Bramsløw (Eriksholm Research Centre, Denmark), Prof. Tuomas Virtanen (University of Tampere, Finland), Gaurav Naithani (University of Tampere, Finland)
