
Spoken Language Understanding on the Edge



  1. Spoken Language Understanding on the Edge
 Alaa Saade, Alice Coucke, Alexandre Caulier, Joseph Dureau, Adrien Ball, Théodore Bluche, David Leroy, Clément Doumouro, Thibault Gisselbrecht, Francesco Caltagirone, Thibaut Lavril, Mael Primet
 Snips, Paris
 EMC2 Workshop @ NeurIPS 2019
 November 13
 Alexandre Caulier

  2. Spoken language understanding system

 Pipeline (example query: "Turn on the lights in the living room")
 • Automatic Speech Recognition engine
   - Acoustic model: audio to phones (t ɜ r n ɑ n ð ə l a ɪ t s ɪ n ð ə ˈl ɪ v ɪ ŋ r u m)
   - Language model: phones to text ("Turn on the lights in the living room")
 • Natural Language Understanding engine: text to intent and slots
   - Intent: SwitchLightOn
   - Slots: room: living room

 Features
 Tested and certified to run on 1GB RAM, 1.4GHz CPU
 • Cloud independent - no remote processing
 • Private by Design - no user data can be collected
 • Accurate - on par with cloud-based solutions
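
 The result produced at the end of this pipeline can be pictured as a small record holding one intent and a set of named slots. The sketch below is only an illustration of that shape: the class name `ParseResult` and its fields are hypothetical and are not taken from the Snips software.

```python
from dataclasses import dataclass, field


@dataclass
class ParseResult:
    """Hypothetical container for an SLU result; not a Snips type."""
    text: str                                   # transcript produced by the ASR engine
    intent: str                                 # intent label chosen by the NLU engine
    slots: dict = field(default_factory=dict)   # slot name -> slot value


# The example query from the slide.
result = ParseResult(
    text="Turn on the lights in the living room",
    intent="SwitchLightOn",
    slots={"room": "living room"},
)
print(result.intent, result.slots)
```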

  3. Acoustic modeling

 A deep neural network outputs, for each time frame, a probability distribution over phones (/a/, /b/, /c/, /d/, /e/).

 Challenges
 • Large deep learning models: computationally & memory intensive
   Here: trade-off between accuracy & computational efficiency, reduced model size (~10MB)
 • Training data: 10K+ hours of in-domain audio with transcript per language
   Here: a few thousand hours of training data
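
 To make "probability over phones per time frame" concrete, here is a minimal sketch that turns raw per-frame network scores into probabilities with a softmax. The phone labels are the placeholders from the slide, the scores are made-up numbers, and the per-frame argmax at the end is only for illustration: in the system described here, decoding goes through the language model rather than picking the best phone frame by frame.

```python
import math

PHONES = ["/a/", "/b/", "/c/", "/d/", "/e/"]  # placeholder labels from the slide


def softmax(scores):
    """Convert raw scores into a probability distribution."""
    top = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up network outputs: one row of raw scores per audio frame.
frame_scores = [
    [2.0, 0.1, 0.0, -1.0, 0.3],
    [0.2, 1.5, 0.1, 0.0, -0.5],
    [0.0, 0.2, 0.1, 2.2, 0.4],
]

posteriors = [softmax(row) for row in frame_scores]

# Illustration only: most likely phone in each frame.
best_per_frame = [PHONES[max(range(len(p)), key=p.__getitem__)] for p in posteriors]
print(best_per_frame)
```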

  4. Assistant Contextualization

 Approach: LM and NLU are consistent and contextualized

 Language Model
 • Probabilities over phones are passed through a decoding graph to produce text ("Turn on the lights in the living room")

 Natural Language Understanding
 • Sentence to intent: logistic regression
 • Sentence to slots: Conditional Random Field

 Properties
 • Lightweight models
 • Out-of-vocabulary management
 • On-device personalization
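
 The slot-filling step can be illustrated with Viterbi decoding over a linear-chain model, which is how a Conditional Random Field picks its best tag sequence. This is a toy sketch, not the Snips implementation: the tag set (`O`, `B-room`, `I-room`), the emission scores and the transition scores are all hand-set assumptions, whereas a real CRF learns its weights from training data.

```python
TAGS = ["O", "B-room", "I-room"]

# Hand-set emission scores (assumption); a missing tag scores 0.
EMISSION = {
    "living": {"B-room": 2.0},
    "room": {"I-room": 1.5, "O": 0.5},
}
DEFAULT_EMISSION = {"O": 1.0}  # words not listed above lean towards "O"

# Hand-set transition scores (assumption); missing pairs score 0.
TRANSITION = {
    ("O", "I-room"): -10.0,      # a slot cannot continue before it starts
    ("B-room", "I-room"): 1.0,
    ("I-room", "I-room"): 0.5,
}


def emission(token, tag):
    return EMISSION.get(token, DEFAULT_EMISSION).get(tag, 0.0)


def viterbi(tokens):
    """Return the highest-scoring tag sequence for the tokens."""
    # scores[t][tag]: best score of any tag sequence ending in `tag` at position t
    start_penalty = {"I-room": -10.0}
    scores = [{t: emission(tokens[0], t) + start_penalty.get(t, 0.0) for t in TAGS}]
    back = []
    for token in tokens[1:]:
        prev, cur, ptr = scores[-1], {}, {}
        for tag in TAGS:
            best = max(TAGS, key=lambda p: prev[p] + TRANSITION.get((p, tag), 0.0))
            cur[tag] = prev[best] + TRANSITION.get((best, tag), 0.0) + emission(token, tag)
            ptr[tag] = best
        scores.append(cur)
        back.append(ptr)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=scores[-1].get)
    path = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        path.append(tag)
    return path[::-1]


def extract_slots(tokens, tags):
    """Group B-/I- tagged tokens into slot values."""
    slots = {}
    for token, tag in zip(tokens, tags):
        if tag == "O":
            continue
        name = tag.split("-", 1)[1]
        if tag.startswith("B-"):
            slots[name] = token
        else:  # "I-" extends the slot opened by the preceding "B-"
            slots[name] = (slots.get(name, "") + " " + token).strip()
    return slots


tokens = "turn on the lights in the living room".split()
tags = viterbi(tokens)
slots = extract_slots(tokens, tags)
print(tags, slots)
```

 The intent side (logistic regression over the sentence) is left out of this sketch to keep it short.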

  5. Benchmarks - Datasets (Open Sourcing)

 Experimental setting

 Method
 • Snips: specialized for the Smart Lights and Music assistants, <100MB, real time on a Raspberry Pi 3
 • Google Speech-to-Text cloud services: one-size-fits-all engine

 Datasets
 • Audio utterances with transcripts & supervision (e.g. Intent: SwitchLightOn, Slots: room: living room)
 • Recorded in close and far-field
 • Smart Lights Assistant: 1.8K utterances, 400 word pronunciations
 • Music Assistant: 3K utterances, 178K word pronunciations

 Metrics
 • End-to-end score: % of perfectly parsed queries
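
 The end-to-end metric, "% of perfectly parsed queries", can be read as an exact-match rate: a query counts only if the predicted intent and every slot match the reference. The sketch below assumes that reading. The function name, the `(intent, slots)` tuple representation, the toy queries and the `SwitchLightOff` label are hypothetical and are not benchmark data.

```python
def perfectly_parsed_rate(predictions, references):
    """Percentage of queries whose intent and slots both match the reference exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one query is required")
    exact = sum(1 for pred, ref in zip(predictions, references) if pred == ref)
    return 100.0 * exact / len(references)


# Toy data (made up): each parse is an (intent, slots) pair.
references = [
    ("SwitchLightOn", {"room": "living room"}),
    ("SwitchLightOn", {"room": "kitchen"}),
    ("SwitchLightOff", {"room": "bedroom"}),
    ("SwitchLightOn", {"room": "garage"}),
]
predictions = [
    ("SwitchLightOn", {"room": "living room"}),   # perfect
    ("SwitchLightOn", {"room": "kitchen"}),       # perfect
    ("SwitchLightOff", {"room": "bathroom"}),     # right intent, wrong slot value
    ("SwitchLightOn", {"room": "garage"}),        # perfect
]

score = perfectly_parsed_rate(predictions, references)
print(f"{score:.1f}% of queries perfectly parsed")
```

 Because a single transcription error in a slot value makes the whole query count as a miss, this score measures the ASR and NLU stages together rather than either one alone.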

  6. Benchmarks

 End-to-End performance (% of perfectly parsed queries)

                                                                  Smart Lights Assistant        Music Assistant
                                                                  (400 word pronunciations)     (178K word pronunciations)
 Contextualized engine (<100MB, real time on a Raspberry Pi 3)    84                            69
 STT cloud service (one-size-fits-all engine)                     79                            48

 Music Assistant, by artist tier

            Tier 1 Artists    Tier 2 Artists    Tier 3 Artists
            (1-1k)            (4.5k-5.5k)       (9k-10k)
 Snips      71 %              68 %              67 %
 Google     69 %              38 %              37 %

 Questions?
