Chair of Network Architectures and Services
Department of Informatics
Technical University of Munich
Voice Controlled Smart Spaces
Florian Gratzer
Advisors: Marc-Oliver Pahl, Stefan Liebald
Garching, January 16th, 2017
Overview
• Controlling the environment via voice commands
• VSL as middleware
• Mapping of voice input to actions within the VSL
• Arbitrary devices
• Arbitrary commands
• Configurable mapping
[http://kreyosimages.s3.amazonaws.com/ks_voice_illustration.jpg, http://media.guitarcenter.com/is/image/MMGS7/V250-Condenser-Microphone/H77071000000000-00-500x500.jpg, http://diziusa.com/wp-content/uploads/2014/12/lampe.jpg, http://i.ebayimg.com/00/s/NjY3WDEwMDA=/z/aFsAAMXQyY1TQp3s/$_32.JPG]
Florian Gratzer | Voice Controlled Smart Spaces
Key Requirements
• Short response time
• Low error rate
• Offline functionality
• Voice output
• Support of custom devices
• Easy configuration
Apple HomeKit
• Framework for home automation
• Available for iOS devices
• Can be used by Siri
• Custom built devices not supported
[http://www.apple.com/ios/home/]
Amazon Echo
• Voice controllable speaker
• Amazon Alexa for voice recognition
• Custom built devices not supported
[http://www.giga.de/wp-content/uploads/2016/05/Amazon-Echo.jpg]
Voice Controlled Alarm Clock
• Built for IoT contest
• Voice control of custom built devices
• Pocketsphinx for voice recognition
• Flite for speech synthesis
[https://www.element14.com/community/community/design-challenges/pi-iot/blog/2016/08/15/pi-iot-alarm-clock-16-wiring]
Related work
                           HomeKit   Echo   Alarm Clock
Offline functionality         -       -          +
Easy configuration            +       +          -
Custom devices supported      -       -          +
Design
Voice recognition
• Human speech waveform contains a large amount of information
• Dependent on
  • Speaker
  • Speaking rate
  • Acoustic conditions
• Hardly possible to match samples directly
• Multiple processing steps are required
• Using “Features” for matching
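The dependence on speaking rate can be illustrated with a small sketch. The slides do not say which matching technique is used, so dynamic time warping (DTW) is swapped in here purely as a classical illustration of comparing feature sequences of different lengths. The names `dtw_distance` and `best_match` and the scalar toy features are assumptions made for this example.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of scalar features.

    cost[i][j] holds the cheapest alignment cost of a[:i] with b[:j].
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


def best_match(recording, templates):
    """Return the name of the template (hypothetical dict name -> features)
    with the smallest DTW distance to the recording."""
    return min(templates, key=lambda name: dtw_distance(recording, templates[name]))
```

A recording that repeats every feature of a template twice, as a slower speaker might, still aligns with that template at zero cost, whereas a sample-by-sample comparison would fail because the lengths differ.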
Processing steps
• Start and end time detection
  • Manual
  • (Semi-)automatic
• Feature Extraction
  • Filtering
  • Windowing
  • Extracting features
• Feature Matching
  • Mapping the recording to a sample
[B. Pfister – Sprachverarbeitung ISBN: 9788578110796]
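The first two processing steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline of any engine named in the slides: the filter is a simple pre-emphasis filter, the window is a Hamming window, and the only feature is the log energy per frame. The function names, the frame length of 400 samples, the hop of 160 samples, and the threshold parameter are all assumptions chosen for the example.

```python
import math


def pre_emphasis(samples, alpha=0.97):
    """Filtering: first-order high-pass filter y[n] = x[n] - alpha * x[n-1]."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


def split_frames(samples, frame_len, hop):
    """Windowing, part 1: cut the signal into overlapping frames."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def hamming(frame):
    """Windowing, part 2: taper each frame with a Hamming window."""
    n = len(frame)
    if n == 1:
        return list(frame)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, s in enumerate(frame)]


def log_energy(frame):
    """Extracting features: one scalar per frame (log of the frame energy)."""
    return math.log(sum(s * s for s in frame) + 1e-10)


def extract_features(samples, frame_len=400, hop=160):
    """Filter, window, and reduce the signal to one feature per frame."""
    filtered = pre_emphasis(samples)
    return [log_energy(hamming(f)) for f in split_frames(filtered, frame_len, hop)]


def detect_endpoints(features, threshold):
    """Automatic start and end detection: first and last frame index whose
    feature exceeds the threshold, or None if no frame does."""
    active = [i for i, value in enumerate(features) if value > threshold]
    return (active[0], active[-1]) if active else None
```

At a sampling rate of 16 kHz the assumed defaults correspond to 25 ms frames with a 10 ms hop. Real engines typically extract richer features per frame than a single energy value.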
Speech to text engines
• Evaluated
  • CMU Sphinx
  • Training required
    • Kaldi
    • HTK
  • Designed as front end
    • Jasper
• Used
  • CMU Sphinx
  • Requires
    • Acoustic Model
    • Dictionary
    • Language Model
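Because CMU Sphinx needs a dictionary that covers every word it should recognize, a small check can catch command phrases that use unknown words. The sketch below assumes the common CMU-style plain-text dictionary layout (one word per line followed by its phonemes, with alternative pronunciations marked as `word(2)`). The function names and the sample phrases are hypothetical, and the slides do not show the dictionary actually used.

```python
def parse_dictionary(text):
    """Parse CMU-style dictionary text into {word: [phoneme list, ...]}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;;"):   # skip blanks and comments
            continue
        word, *phones = line.split()
        word = word.split("(")[0].lower()        # "on(2)" is a variant of "on"
        entries.setdefault(word, []).append(phones)
    return entries


def missing_words(phrases, dictionary):
    """Return the sorted words used in the phrases but absent from the dictionary."""
    needed = {word for phrase in phrases for word in phrase.lower().split()}
    return sorted(needed - set(dictionary))
```

Calling `missing_words` with the configured command phrases before starting the recognizer would list every word that still needs a dictionary entry.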
Design
Speech Synthesis
• 2 phases
  • Transcription phase
  • Phonoacoustic phase
• Text to speech engines
  • MaryTTS
  • espeak
Design
Mapping Format
• Multiple actions per voice command
• Voice output support
• GET and SET commands
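The slides do not show the concrete mapping format, so the following is only a sketch of how the three listed properties could fit together. Every name in it is hypothetical: the `Action` and `Mapping` classes, the address strings, and the plain `dict` that stands in for the VSL middleware.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Action:
    kind: str                      # "GET" or "SET"
    address: str                   # hypothetical address of a value in the middleware
    value: Optional[str] = None    # only used by SET


@dataclass
class Mapping:
    phrase: str                    # recognized text, stored in lower case
    actions: List[Action]          # multiple actions per voice command
    response: Optional[str] = None # voice output; {0}, {1}, ... refer to GET results


def find_mapping(text: str, mappings: List[Mapping]) -> Optional[Mapping]:
    """Look up the mapping whose phrase equals the normalized recognized text."""
    wanted = " ".join(text.lower().split())
    for mapping in mappings:
        if mapping.phrase == wanted:
            return mapping
    return None


def execute(mapping: Mapping, store: Dict[str, str]) -> Optional[str]:
    """Run all actions in order; `store` is a stand-in for the middleware.

    Returns the text that would be handed to the speech synthesizer, if any.
    """
    results = []
    for action in mapping.actions:
        if action.kind == "SET":
            store[action.address] = action.value
        elif action.kind == "GET":
            results.append(store.get(action.address, "unknown"))
        else:
            raise ValueError(f"unsupported action kind: {action.kind}")
    if mapping.response is None:
        return None
    return mapping.response.format(*results)
```

A single phrase such as "lights on" could then carry two SET actions for two lamps, while a question could carry one GET action whose result is inserted into the spoken response.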
Design
Configuration Interface
Design
Demo
Evaluation
• Scenario
  • Light control
  • Custom dictionary
• Error rate
  • Target error rate: < 5%
  • Actual error rate: 2%
Evaluation
• Response time
  • Remote control as reference
  • Assumption: remote control within reach, but not in hand
  • Target: < 3 s
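Both evaluation criteria can be checked with a small harness. This is a sketch under assumptions, not the procedure used for the reported numbers: `handle_command` is a hypothetical callable that takes a spoken command and returns what the system understood, and the test cases are pairs of input and expected result. Only the two target values come from the slides.

```python
import time

TARGET_ERROR_RATE = 0.05   # from the slides: error rate below 5%
TARGET_RESPONSE_S = 3.0    # from the slides: response time below 3 s


def evaluate(handle_command, test_cases):
    """Return (error rate, worst response time in seconds) over all test cases.

    `handle_command` is a hypothetical stand-in for the complete pipeline.
    """
    if not test_cases:
        raise ValueError("at least one test case is required")
    errors = 0
    latencies = []
    for spoken, expected in test_cases:
        start = time.perf_counter()
        result = handle_command(spoken)
        latencies.append(time.perf_counter() - start)
        if result != expected:
            errors += 1
    return errors / len(test_cases), max(latencies)


def meets_targets(error_rate, worst_latency):
    """Check measured values against the two targets from the slides."""
    return error_rate < TARGET_ERROR_RATE and worst_latency < TARGET_RESPONSE_S
```

Using the worst measured latency rather than the mean is a design choice of this sketch; it makes the check stricter than an average would.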
Contributions
Future work
• Integrate the system in the lab room
• Test other voice recognition engines
• WhatsApp interface
• Further evaluations of the system
[https://i0.wp.com/thegadgetox.net/wp-content/uploads/2016/02/whatsapp-logo-vector.png?fit=1150%2C1163&ssl=1]
Thank you for your attention!