Auditory System For a Mobile Robot PhD Thesis Jean-Marc Valin Department of Electrical Engineering and Computer Engineering Université de Sherbrooke, Québec, Canada Jean-Marc.Valin@USherbrooke.ca
Motivations ● Robots need information about their environment in order to behave intelligently ● Artificial vision has been popular for a long time, but artificial audition is still new ● Robust audition is essential for human-robot interaction (the "cocktail party" effect)
Approaches To Artificial Audition ● Single microphone – Human-robot interaction – Unreliable ● Two microphones (binaural audition) – Imitate human auditory system – Limited localisation and separation ● Microphone array audition – More information available – Simpler processing
Objectives ● Localise and track simultaneous moving sound sources ● Separate sound sources ● Perform automatic speech recognition ● Remain within robotics constraints – complexity, algorithmic delay – robustness to noise and reverberation – weight/space/adaptability – moving sources, moving robot
Experimental Setup ● Eight microphones on the Spartacus robot ● Two configurations: cube (C1) and shell (C2) ● Noisy conditions ● Two environments – Lab (E1): reverberation time 350 ms – Hall (E2): reverberation time 1 s
Sound Source Localisation
Approaches to Sound Source Localisation ● Binaural – Interaural phase difference (delay) – Interaural intensity difference ● Microphone array – Estimation through TDOAs – Subspace methods (MUSIC) – Direct search (steered beamformer) ● Post-processing – Kalman filtering – Particle filtering
Steered Beamformer ● Delay-and-sum beamformer ● Maximise output energy ● Frequency domain computation
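As a minimal sketch of the frequency-domain computation (not the thesis implementation): each channel of an STFT frame is phase-rotated to time-align it with a candidate direction, and the energy of the aligned sum is accumulated over frequency. The array shapes and the sign convention for the steering delays are assumptions.

```python
import numpy as np

def steered_energy(X, tau, fft_len):
    """Delay-and-sum output energy for one look direction.

    X       : (N, K) one-sided STFT frame, one row per microphone
    tau     : (N,) steering delays in samples toward the candidate direction
    fft_len : analysis FFT length (K = fft_len // 2 + 1)
    """
    k = np.arange(X.shape[1])
    # Phase rotation that time-aligns all channels for this direction
    steer = np.exp(-2j * np.pi * np.outer(tau, k) / fft_len)
    # Energy of the aligned sum, accumulated over frequency
    return np.sum(np.abs(np.sum(X * steer, axis=0)) ** 2)
```

The localisation step then evaluates this energy over a grid of candidate directions and keeps the maxima.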
Spectral Weighting ● Normal cross-correlation peaks are very wide ● The PHAse Transform (PHAT) gives narrow peaks ● Apply a modified weighting – Weighted according to noise and reverberation – Models the precedence effect: sensitivity is reduced just after a loud sound
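To see why PHAT narrows the peaks: dividing the cross-spectrum by its magnitude keeps only phase information, so the inverse transform approaches an impulse at the true delay. A standard GCC-PHAT sketch follows; the noise- and reverberation-dependent weighting the thesis adds on top of it is not shown.

```python
import numpy as np

def gcc_phat(spec_a, spec_b, eps=1e-12):
    """PHAT-weighted cross-correlation of two channel spectra.

    spec_a, spec_b : one-sided spectra of the same frame for two microphones
    Returns the cross-correlation; its peak index estimates the TDOA in samples.
    """
    cross = spec_a * np.conj(spec_b)
    cross /= np.abs(cross) + eps   # PHAT: discard magnitude, keep phase only
    return np.fft.irfft(cross)
```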
Direction Search ● Finding directions with highest energy ● Fixed number of sources Q=4 ● Lookup-and-sum algorithm ● 25 times less complex
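A hedged sketch of the lookup-and-sum idea: the TDOAs for a fixed grid of directions are precomputed into a table, so scoring a direction reduces to table lookups into the pairwise cross-correlations. The grid resolution and the handling of multiple sources are simplified assumptions here.

```python
import numpy as np

def lookup_and_sum(corr, lag_table, Q=4):
    """Score every grid direction by summing pairwise correlations
    at that direction's precomputed TDOAs.

    corr      : (P, L) cross-correlations for the P microphone pairs
    lag_table : (D, P) integer lag (in samples) per direction and pair
    """
    P = corr.shape[0]
    # For each direction, gather each pair's correlation at the expected lag
    energy = corr[np.arange(P)[None, :], lag_table].sum(axis=1)   # (D,)
    # Simplification: keep the Q best directions; the thesis instead finds
    # one source at a time and removes its contribution before the next pass.
    return np.argsort(energy)[-Q:][::-1]
```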
Post-Processing: Particle Filtering ● Need to track sources over time ● Steered beamformer output is noisy ● Represent the pdf as a set of particles ● One set of (1000) particles per source ● State = [position, velocity]
Particle Filtering Steps 1) Prediction 2) Estimation of instantaneous probabilities – As a function of the steered beamformer energy
Particle Filtering Steps (cont.) 3) Source-observation assignment – Need to know which observation is related to which tracked source – For each observation q, compute: ● P(q → false alarm): probability that q is a false detection ● P(q → j): probability that q corresponds to tracked source j ● P(q → new): probability that q is a new source
Particle Filtering Steps (cont.) 4) Particle weights update – Merging past and present information – Taking into account source-observation assignment 5) Addition or removal of sources 6) Estimation of source positions – Weighted mean of the particle positions 7) Resampling
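To make steps 1–7 concrete, here is a minimal single-source update in the spirit of the tracker. It omits the source-observation assignment (step 3) and source addition/removal (step 5); the dynamics, the Gaussian likelihood, and all parameter values are illustrative assumptions, not the thesis's tuned model.

```python
import numpy as np

def track_step(particles, weights, obs, rng,
               dt=0.04, accel_std=0.2, obs_std=0.05):
    """One predict/update/estimate/resample cycle for one tracked source.

    particles : (M, 6) state [direction (unit vector), velocity]
    weights   : (M,) normalised particle weights
    obs       : (3,) beamformer direction already assigned to this source
    """
    # 1) Prediction: constant-velocity motion with random acceleration
    particles[:, :3] += dt * particles[:, 3:]
    particles[:, 3:] += accel_std * rng.standard_normal((len(particles), 3))
    particles[:, :3] /= np.linalg.norm(particles[:, :3], axis=1, keepdims=True)
    # 2, 4) Weight update: Gaussian likelihood of the assigned observation
    d2 = np.sum((particles[:, :3] - obs) ** 2, axis=1)
    weights *= np.exp(-0.5 * d2 / obs_std ** 2)
    weights /= weights.sum()
    # 6) Estimate: weighted mean of the particle directions
    estimate = weights @ particles[:, :3]
    # 7) Resample when the effective sample size collapses
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        idx = rng.choice(len(weights), size=len(weights), p=weights)
        particles[:] = particles[idx]
        weights[:] = 1.0 / len(weights)
    return estimate / np.linalg.norm(estimate)
```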
Localisation Results (E1) ● [Figures: detection accuracy as a function of distance; localisation accuracy]
Tracking Results ● Two sources crossing, with C2 (E1 and E2) ● Video
Tracking Results (cont.) ● Four moving sources, with C2 (E1 and E2)
Sound Source Separation & Speech Recognition
Overview of Sound Source Separation ● Frequency domain processing – Simple, low complexity ● Linear source separation ● Non-linear post-filter ● [Block diagram: microphone inputs X_n(k,l) → geometric source separation → Y_m(k,l) → post-filter → separated sources Ŝ_m(k,l); tracking information feeds the separation stage]
Geometric Source Separation ● Frequency domain: Y(k,l) = W(k) X(k,l) ● Constrained optimization – Minimize the correlation of the outputs: J(W) = ‖R_YY − diag(R_YY)‖² – Subject to the geometric constraint W(k) A(k) = I, where A(k) contains the steering vectors for the tracked source directions ● Modifications to the original GSS algorithm – Instantaneous computation of correlations – Regularisation
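A sketch of one GSS gradient step for a single frequency bin, under the standard formulation above (decorrelate the outputs subject to W A = I). The instantaneous correlation estimate follows the modification listed; the step sizes are assumed values, and the thesis's energy normalisation and regularisation terms are omitted.

```python
import numpy as np

def gss_step(W, x, A, mu=0.01, lam=0.5):
    """One stochastic gradient step of geometric source separation
    at a single frequency bin.

    W : (M, N) demixing matrix   x : (N,) microphone spectra
    A : (N, M) steering matrix from the localisation/tracking stage
    """
    y = W @ x                               # separated outputs
    Ryy = np.outer(y, np.conj(y))           # instantaneous output correlation
    E = Ryy - np.diag(np.diag(Ryy))         # off-diagonal (cross) correlation
    # Gradient of the decorrelation cost ||E||^2 (instantaneous estimate)
    grad_decorr = 4.0 * E @ np.outer(y, np.conj(x))
    # Gradient of the geometric penalty ||W A - I||^2
    grad_geo = 2.0 * (W @ A - np.eye(W.shape[0])) @ np.conj(A.T)
    return W - mu * (grad_decorr + lam * grad_geo)
```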
Multi-Source Post-Filter
Interference Estimation ● Source separation leaks energy between outputs – Incomplete adaptation – Inaccuracy in localisation – Reverberation/diffraction – Imperfect microphones ● Estimate the leakage from the other separated sources
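A one-line sketch of the idea: treat a fraction of the other outputs' energy as leakage into channel m. The fixed leakage coefficient eta is an assumed placeholder, not the thesis's estimated value.

```python
import numpy as np

def leakage_estimate(Y_energy, m, eta=0.3):
    """Interference estimate for separated source m: a fraction of
    the energy of all the other separated outputs, per frequency bin.

    Y_energy : (M, K) |Y_m(k, l)|^2 for the M separated sources
    """
    return eta * (Y_energy.sum(axis=0) - Y_energy[m])
```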
Reverberation Estimation ● Exponential decay model ● [Figure: example for the 500 Hz frequency bin]
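A hedged sketch of an exponential decay model: the reverberant energy estimate decays geometrically from frame to frame and is re-excited by the previous separated output. The exact recursion form and the values of gamma and delta are assumptions for illustration.

```python
def reverb_update(lam_rev, prev_out_energy, gamma=0.98, delta=0.3):
    """Update the reverberation energy estimate for one frequency bin.

    lam_rev         : previous reverberation estimate
    prev_out_energy : |separated output|^2 at the previous frame
    gamma           : frame-to-frame decay (tied to the reverberation time)
    delta           : fraction of output energy feeding the reverberant tail
    """
    return gamma * lam_rev + (1.0 - gamma) * delta * prev_out_energy
```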
Results (SNR) ● Three speakers ● C2 (shell), E1 (lab) ● [Bar chart: output SNR (dB), roughly −7.5 to 15 dB, for sources 1–3 under: input, delay-and-sum, GSS, GSS + single-source post-filter, GSS + multi-source post-filter]
Speech Recognition Accuracy (Nuance) ● Proposed post-filter reduces errors by 50% ● Reverberation removal helps in E2 only ● No significant difference between C1 and C2 ● Digit recognition (word correct) – 3 speakers: 83% – 2 speakers: 90% ● [Bar chart, E2, C2, 3 speakers: word correct (%) for the right, front, and left sources under: single microphone, GSS only, post-filter (no dereverberation), proposed system]
Man vs. Machine ● How does a human compare? ● [Bar chart: word correct (%), listeners 1–5 vs. the proposed system] ● Is it fair? – Yes and no!
Real-Time Application ● Video from AAAI conference
Speech Recognition With Missing Feature Theory ● Speech is transformed into features (~12) ● Not all features are reliable ● MFT = ignore unreliable features – Compute missing feature mask – Use the mask to compute probabilities
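A minimal sketch of marginalisation-based MFT with a diagonal-Gaussian acoustic model: unreliable feature dimensions are simply excluded from the likelihood. The mask criterion and its threshold here are illustrative assumptions, not the thesis's exact rule.

```python
import numpy as np

def reliability_mask(clean_energy, noise_energy, threshold=0.5):
    """Mark a feature reliable when the estimated clean-to-total energy
    ratio is high enough (criterion and threshold are assumptions)."""
    ratio = clean_energy / (clean_energy + noise_energy + 1e-10)
    return ratio > threshold

def masked_log_likelihood(feat, mask, mean, var):
    """Score a diagonal Gaussian using only the reliable dimensions."""
    ll = -0.5 * ((feat - mean) ** 2 / var + np.log(2.0 * np.pi * var))
    return np.sum(ll[mask])
```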
Missing Feature Mask ● Interference: unreliable ● Stationary noise: reliable ● [Mask images: black = reliable, white = unreliable]
Results (MFT) ● Japanese isolated word recognition (SIG2 robot, CTK) – 3 simultaneous sources – 200-word vocabulary – 30, 60, 90 degrees separation ● [Bar chart: word correct (%) for the right, front, and left sources under: GSS, GSS + post-filter, GSS + post-filter + MFT]
Summary of the System
Conclusion ● What have we achieved? – Localisation and tracking of sound sources – Separation of multiple sources – Robust basis for human-robot interaction ● What are the main innovations? – Frequency-domain steered beamformer – Particle filtering with source-observation assignment – Separation post-filtering for multiple sources and reverberation – Integration with missing feature theory
Where From Here? ● Future work – Complete dialogue system – Echo cancellation for the robot's own voice – Use human-inspired techniques – Environmental sound recognition – Embedded implementation ● Other applications – Video-conference: automatically follow speaker with a camera – Automatic transcription
Questions? Comments?