
Developing Your Own Wake Word Engine Just Like Alexa and OK Google - PowerPoint PPT Presentation



  1. Developing Your Own Wake Word Engine Just Like “Alexa” and “OK Google” Xuchen Yao, CEO, KITT.AI Guoguo Chen, CTO, KITT.AI

  2. What’s a “wake word”? In “Alexa, what’s the weather today?”, “Alexa” is the wake word (others: “OK Google”, “Hey Siri”) and the rest is a one-shot query.
     • Wake word (hot word): offline; code runs on the device CPU/DSP/MCU; 7x24, always listening
     • One-shot understanding: online; code runs in the cloud; on demand, with explicit permission

  3. Conversational UI Pipeline: wake up device → speech-to-text (voice in) → text understanding → dialogue management → text-to-speech (voice out)
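Slides 2 and 3 divide the work between an always-on local detector and an on-demand cloud pipeline. Below is a minimal sketch of that hand-off. It assumes a chunked audio stream and three hypothetical callbacks (`detect_wake_word`, `is_end_of_utterance`, `send_to_cloud`), which stand in for the real detector, endpointing, and cloud services:

```python
from enum import Enum, auto
from typing import Callable, Iterable, List


class State(Enum):
    LISTENING = auto()  # always on, offline: only the local wake word detector runs
    STREAMING = auto()  # on demand: audio is captured for the cloud pipeline


def run_pipeline(
    chunks: Iterable[bytes],
    detect_wake_word: Callable[[bytes], bool],      # hypothetical local detector
    is_end_of_utterance: Callable[[bytes], bool],   # hypothetical endpointer
    send_to_cloud: Callable[[List[bytes]], str],    # hypothetical cloud round trip
) -> List[str]:
    """Return one cloud response per utterance that followed a wake word."""
    state = State.LISTENING
    utterance: List[bytes] = []
    responses: List[str] = []
    for chunk in chunks:
        if state is State.LISTENING:
            # Nothing leaves the device until the wake word is heard.
            if detect_wake_word(chunk):
                state = State.STREAMING
                utterance = []
        else:
            utterance.append(chunk)
            if is_end_of_utterance(chunk):
                # Speech-to-text, understanding, dialogue management and
                # text-to-speech all run remotely on the captured audio.
                responses.append(send_to_cloud(utterance))
                state = State.LISTENING
    return responses
```

This design keeps the always-listening part small and private. The device only starts sending audio after a local detection.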

  4. A customizable hotword detection engine, a.k.a. a deep neural network in 2 MB of RAM (hotword.io; video, blog)

  5. Who’s using it (released 5/2016): 10,000+ developers, 7,000+ unique hotwords; dominating the developer community for hotword detection

  6. Use Cases

  7. #1 Hotword: Smart Mirror https://github.com/evancohen/smart-mirror (credits to Evan Cohen) video link

  8. Command & Control: GoPiGo (credits to Paul Matz) video link

  9. Project RePL (credits to Chris Burns) video link

  10. Conversational UI Pipeline, with the wake-up and speech-to-text stages grouped as the “Speech Pipeline”: wake up device → speech-to-text (voice in) → text understanding → dialogue management → text-to-speech (voice out)

  11. Speech Pipeline: Microphone Array → Voice Detection → Wake Word (local) → Speech Recognition (cloud/local)
     • Input conditions: close talking vs. far field (3-9 feet); telephone (8 kHz sampling) vs. others (16 kHz); noises: TV, radio, street, café, car, music; pitch: children, adults, seniors; accent: US/UK/Europe/Asian …
     • Microphone array: 2, 4, or 6 microphones; linear/circular; beam forming
     • Voice detection: Voice Activity Detection, Auto Gain Control, Adaptive Echo Cancellation
     • Wake word: fast response (0.1 second), high accuracy
     • Speech recognition: IBM/Microsoft/Nuance/Google, Alexa Voice Service, Kaldi, PocketSphinx, HTK; command & control; language understanding
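Voice Activity Detection is the easiest block in this pipeline to illustrate. The sketch below applies a plain energy threshold to 10 ms frames. That choice is an assumption made for clarity; production VADs are usually model-based and work alongside the gain control and echo cancellation listed above. `energy_vad` and its -40 dB default are hypothetical:

```python
import numpy as np


def energy_vad(samples: np.ndarray, sample_rate: int = 16000,
               frame_ms: int = 10, threshold_db: float = -40.0) -> np.ndarray:
    """Flag each frame as speech (True) when its RMS level exceeds threshold_db.

    samples: mono float waveform scaled to [-1, 1]. Use sample_rate=8000 for
    telephone audio and 16000 for the other inputs listed on the slide.
    """
    frame_len = sample_rate * frame_ms // 1000          # 160 samples at 16 kHz
    n_frames = len(samples) // frame_len
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    level_db = 20.0 * np.log10(rms + 1e-12)             # epsilon avoids log10(0)
    return level_db > threshold_db
```

At 8 kHz the same call with `sample_rate=8000` uses 80-sample frames, so each frame still covers 10 ms.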

  12. Supported Platforms and Wrappers • Raspberry Pi • Mac OS X • iPhone/iPad/iPod • x86/64bit Ubuntu • Android • Pine 64 • Intel Edison • Samsung Artik • Allwinner R-series • Ingenic X1000 • Rockchip

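  13. Personal vs. Universal models
     • Voice samples needed: personal 3; universal at least 1,500
     • Speaker-independent: personal no; universal yes
     • Speaker-specific: personal sort of; universal no
     • Robust against noise: personal no; universal yes
     • Free: personal yes; universal no
     • Time needed: personal immediately; universal 2 weeks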

  14. Customizing a universal model: define hotword → collect voice (from device or via web API) → train a model → deliver & deploy to beta users → collect voice → evaluate → iterate & improve until the desired performance is reached (>90% detection rate, <= 3 false alarms in 24 hours) → ship & success
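The success criteria on this slide come down to two numbers computed from an evaluation run. The sketch below assumes the evaluation records how many hotword utterances were detected and how many false alarms fired over a known number of hours of hotword-free audio. The `EvalResult` field names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    positives_total: int      # hotword utterances in the evaluation set
    positives_detected: int   # how many of them triggered the detector
    false_alarms: int         # triggers on audio that contained no hotword
    negative_hours: float     # hours of hotword-free audio monitored


def detection_rate(r: EvalResult) -> float:
    return r.positives_detected / r.positives_total


def false_alarms_per_24h(r: EvalResult) -> float:
    # Normalize the observed count to a 24-hour day.
    return r.false_alarms * 24.0 / r.negative_hours


def ready_to_ship(r: EvalResult) -> bool:
    # Targets from the slide: >90% detection rate and <= 3 false alarms in 24 hours.
    return detection_rate(r) > 0.90 and false_alarms_per_24h(r) <= 3.0
```

For example, 186 of 200 detected is a 93% detection rate. Five false alarms over 48 hours scales to 5 × 24 / 48 = 2.5 per 24 hours, so that run passes both targets.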

  15. The Science Behind Wake Words

  16. Challenges: Is this “Alexa”? [figure: a short window vs. a longer window of audio]
     • High detection rate
     • Low false alarm rate
     • Efficient: detect every 0.1 second
     • Small RAM: < 2 MB
     • Too much ambiguity, not much context
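To detect every 0.1 second while still seeing enough context, one option is to score a longer window that slides forward in 0.1 s hops. The sketch below works under that assumption. `score_window` is a hypothetical stand-in for the neural network, and the 1 s window length is a guess, not a figure from the slides:

```python
from collections import deque
from typing import Callable, List

import numpy as np


def stream_detect(samples: np.ndarray,
                  score_window: Callable[[np.ndarray], float],
                  sample_rate: int = 16000, hop_s: float = 0.1,
                  window_s: float = 1.0, threshold: float = 0.5) -> List[float]:
    """Score a sliding window every hop_s seconds; return trigger times in seconds."""
    hop = int(round(sample_rate * hop_s))          # 1600 samples = 0.1 s at 16 kHz
    n_hops = int(round(window_s / hop_s))          # chunks per scoring window
    window: deque = deque(maxlen=n_hops)           # oldest chunk falls out automatically
    triggers: List[float] = []
    for start in range(0, len(samples) - hop + 1, hop):
        window.append(samples[start:start + hop])  # in a live system: next mic buffer
        if len(window) < n_hops:
            continue                               # not enough context yet
        if score_window(np.concatenate(list(window))) >= threshold:
            triggers.append((start + hop) / sample_rate)   # end of the scored window
    return triggers
```

Memory stays bounded because the deque holds only `window_s / hop_s` chunks. Consecutive windows overlap, so a single spoken hotword usually fires several times in a row. A real detector would add a refractory period after each trigger.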

  17. Existing Algorithm

  18. Existing Algorithm

  19. Existing Algorithm • Advantages: – Simplified pipeline – Simplified decoder • Disadvantage: – Requires massive amounts of hotword-specific training data

  20. Possible Ways to Improve • Data augmentation – Adding noise – Adding reverberation – And so on … [figure: original; with added noise; with added noise and reverberation]
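Both augmentations on this slide take only a few lines of array math. The sketch assumes float waveforms at 16 kHz. Noise is scaled to hit a target signal-to-noise ratio, and reverberation is a convolution with a room impulse response. `synthetic_rir` is a crude, hypothetical stand-in for a measured impulse response:

```python
import numpy as np


def add_noise(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix noise (e.g. TV, café, car recordings) into clean speech at snr_db."""
    noise = np.resize(noise, clean.shape)            # repeat or trim to the same length
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    # Choose scale so that 10 * log10(clean_power / (scale**2 * noise_power)) == snr_db.
    scale = np.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise


def add_reverb(signal: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve with a room impulse response and keep the original length."""
    return np.convolve(signal, rir)[: len(signal)]


def synthetic_rir(sample_rate: int = 16000, rt60_s: float = 0.3, seed: int = 0) -> np.ndarray:
    """Exponentially decaying noise: a crude stand-in for a measured RIR."""
    rng = np.random.default_rng(seed)
    n = int(sample_rate * rt60_s)
    t = np.arange(n) / sample_rate
    envelope = np.exp(-6.9 * t / rt60_s)   # about -60 dB at t = rt60 (ln 1000 ≈ 6.9)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0                            # direct path
    return rir / np.sqrt(np.sum(rir ** 2))  # unit energy
```

Chaining the two, for example `add_reverb(add_noise(clean, noise, 10.0), synthetic_rir())`, produces the “noise and reverberation” variant. Drawing a random SNR and room for each copy turns one recording into many training examples.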

  21. Possible Ways to Improve • Network models – Model selection • Feedforward models? Recurrent models? – Model compression • 32-bit float → 16-bit float → 8-bit integer • Parameters with small absolute value
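The compression steps can be sketched on a single weight matrix. Magnitude pruning zeroes the parameters with the smallest absolute values. Symmetric 8-bit quantization stores each weight as an `int8`, plus one float scale per tensor. This is a generic illustration, not necessarily KITT.AI's scheme, and the function names are hypothetical:

```python
import numpy as np


def prune_smallest(w: np.ndarray, fraction: float) -> np.ndarray:
    """Zero out the `fraction` of weights with the smallest absolute value."""
    k = int(fraction * w.size)
    if k == 0:
        return w.copy()
    cutoff = np.sort(np.abs(w), axis=None)[k - 1]   # k-th smallest magnitude
    return np.where(np.abs(w) <= cutoff, 0, w).astype(w.dtype)


def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: w ≈ scale * q with q in [-127, 127]."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights; error is at most scale / 2 per weight."""
    return q.astype(np.float32) * np.float32(scale)
```

At 4, 2, and 1 bytes per parameter, a 500,000-parameter network needs about 2 MB as 32-bit floats, 1 MB as 16-bit floats, and 0.5 MB as 8-bit integers. Pruned zeros only save memory if the weights are then stored in a sparse format.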

  22. Possible Ways to Improve • Decoder redesign – Modeling smaller units • Syllables, phones, etc. – False alarm suppression • Additional classifier?
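One way to decode with smaller units is a left-to-right Viterbi pass over per-frame unit posteriors. Each unit is scored against a filler (“anything else”) output, so the score grows only while the audio matches the hotword. The sketch assumes an acoustic model that emits one filler column plus one column per hotword unit. This keyword/filler formulation is a standard technique, not necessarily the one KITT.AI uses:

```python
import numpy as np


def keyword_score(posteriors: np.ndarray, eps: float = 1e-8) -> float:
    """Best log-likelihood-ratio score of the hotword's units appearing in order.

    posteriors: [T, K+1] per-frame model outputs. Column 0 is the filler unit;
    columns 1..K are the hotword's sub-word units (e.g. phones or syllables)
    in spoken order. Each unit must cover at least one frame.
    """
    log_p = np.log(posteriors + eps)
    llr = log_p[:, 1:] - log_p[:, :1]          # each unit vs. filler, shape [T, K]
    T, K = llr.shape
    prev = np.full(K, -np.inf)
    best = -np.inf
    for t in range(T):
        cur = np.empty(K)
        # First unit: start a fresh path here (score 0) or extend the old one.
        cur[0] = max(0.0, prev[0]) + llr[t, 0]
        # Later units: stay in the same unit or advance from the previous one.
        cur[1:] = np.maximum(prev[1:], prev[:-1]) + llr[t, 1:]
        best = max(best, cur[-1])              # hotword completed at frame t
        prev = cur
    return float(best)
```

Comparing the score against a threshold gives the detection decision. As the slide suggests, an additional classifier could then re-check only the windows that pass, to suppress false alarms.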

  23. Training with Tesla K20/K80 • Positive data – 1,500 hotword samples • Negative data – Thousands of hours of speech • Training time – Half a day with 4 K80 GPUs
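With 1,500 positives against thousands of hours of negatives, a naively shuffled dataset would contain almost no hotword examples per batch. One common remedy, assumed here rather than taken from the slides, is to draw every minibatch with a fixed share of positives and reuse them with replacement. The file names and the 25% share in the example are hypothetical:

```python
import random
from typing import List, Sequence, Tuple


def balanced_batches(positives: Sequence[str], negatives: Sequence[str],
                     batch_size: int, pos_fraction: float, n_batches: int,
                     seed: int = 0) -> List[List[Tuple[str, int]]]:
    """Draw minibatches with a fixed share of hotword (label 1) examples.

    Positives are sampled with replacement because there are few of them;
    negatives are sampled without replacement from the much larger pool.
    """
    rng = random.Random(seed)
    n_pos = round(batch_size * pos_fraction)
    batches = []
    for _ in range(n_batches):
        batch = [(p, 1) for p in rng.choices(positives, k=n_pos)]
        batch += [(n, 0) for n in rng.sample(negatives, k=batch_size - n_pos)]
        rng.shuffle(batch)                     # mix labels within the batch
        batches.append(batch)
    return batches
```

For example, `balanced_batches(pos_files, neg_files, batch_size=64, pos_fraction=0.25, n_batches=10)` yields batches of 16 hotword clips and 48 background clips. Data augmentation keeps the reused positives from being exact repeats.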

  24. Software Architecture Backend Frontend

  25. KITT.AI architecture (diagram): production devices send traffic (audio and messages over WebSocket, plus HTTPS) through an ELB and a message queue in the cloud; the deep learning cloud handles the scientific computing (content data → training → model → deploy)
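The message queue in this diagram decouples the device-facing front end from the slow training back end. Below is a toy sketch of that pattern using Python's standard `queue` and `threading` modules. `TrainingJob`, the model-id format, and the in-memory `deployed` map are hypothetical placeholders for upload storage, GPU training, and model delivery:

```python
import queue
import threading
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TrainingJob:
    hotword: str
    clips: List[bytes]        # voice samples uploaded from devices


def training_worker(jobs: "queue.Queue[Optional[TrainingJob]]",
                    deployed: Dict[str, str]) -> None:
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            # Placeholder for the GPU training step: only a dummy id is built
            # here so the flow can be followed end to end.
            model_id = f"{job.hotword}-{len(job.clips)}clips"
            deployed[job.hotword] = model_id   # "deploy": publish for devices to fetch
        finally:
            jobs.task_done()


if __name__ == "__main__":
    jobs: "queue.Queue[Optional[TrainingJob]]" = queue.Queue()
    deployed: Dict[str, str] = {}
    worker = threading.Thread(target=training_worker, args=(jobs, deployed))
    worker.start()
    jobs.put(TrainingJob("computer", [b"clip"] * 3))  # front end returns immediately
    jobs.put(None)                                     # shut down after the backlog
    worker.join()
    print(deployed)   # {'computer': 'computer-3clips'}
```

The front end only enqueues and returns. Training capacity can then scale independently of the number of connected devices.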

  26. Running Your First Snowboy Demo
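Below is a minimal version of the demo, adapted from the Python example in the Snowboy repository (`examples/Python/demo.py`). `HotwordDetector`, `start`, `play_audio_file`, and `terminate` come from that example's `snowboydecoder.py`. Verify them against your checkout. The 0.5 sensitivity is the value the example uses, not a tuned recommendation:

```python
import signal
import sys

interrupted = False


def signal_handler(signum, frame):
    """Ctrl+C sets a flag instead of killing the process mid-read."""
    global interrupted
    interrupted = True


def interrupt_callback():
    """Polled by the detector loop; returning True stops listening."""
    return interrupted


def main(model_path: str) -> None:
    # snowboydecoder.py comes from Snowboy's Python example directory; it is
    # imported here so this file can be loaded where Snowboy is not installed.
    import snowboydecoder

    signal.signal(signal.SIGINT, signal_handler)
    detector = snowboydecoder.HotwordDetector(model_path, sensitivity=0.5)
    print("Listening... Press Ctrl+C to exit")
    # Plays a "ding" on each detection; sleep_time is the polling interval in seconds.
    detector.start(detected_callback=snowboydecoder.play_audio_file,
                   interrupt_check=interrupt_callback,
                   sleep_time=0.03)
    detector.terminate()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: python demo.py path/to/model.pmdl")
```

Pass the path to a model file. This can be a universal `.umdl` model or a personal `.pmdl` model trained from your own voice samples.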
