Missing data speech recognition in reverberant acoustic conditions
Kalle, Guy and Jon
Content
1. Introduction to reverberation
2. Speech modulation frequencies
3. Reverberation masking model
4. Test conditions
5. Results
6. Discussion
1. Introduction to reverberation
Fig1. Image expansion. Direct sound.
1. Introduction to reverberation
Fig2. Image expansion. 3rd order reflections.
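The image expansion shown in Fig1 and Fig2 can be illustrated with a minimal sketch for a 2-D rectangular room with walls at 0 and L along each axis. The function names, the room size and the source position are hypothetical and chosen only for illustration; the slides do not state how the figures were generated.

```python
def images_1d(x, length, max_order):
    """Images of a point at x between walls at 0 and length, as (position, order)."""
    out = []
    for k in range(-max_order, max_order + 1):
        if k % 2 == 0:
            pos = k * length + x          # even index: shifted copy of the source
        else:
            pos = (k + 1) * length - x    # odd index: mirrored copy of the source
        out.append((pos, abs(k)))         # |k| wall reflections along this axis
    return out


def image_sources(src, room, max_order):
    """All images (x, y, order) whose total reflection order is <= max_order."""
    xs = images_1d(src[0], room[0], max_order)
    ys = images_1d(src[1], room[1], max_order)
    return [(px, py, ox + oy)
            for px, ox in xs
            for py, oy in ys
            if ox + oy <= max_order]


# Hypothetical 10 m x 8 m room with the source at (2, 3).
direct = image_sources((2.0, 3.0), (10.0, 8.0), 0)   # the real source only
third = image_sources((2.0, 3.0), (10.0, 8.0), 3)    # up to 3rd order reflections
print(len(direct), len(third))
```

With order 0 the list holds only the real source; with order 3 it holds 25 points, which is the kind of diamond-shaped lattice a 3rd order expansion produces in two dimensions.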
1. Introduction to reverberation
Fig3. Room impulse responses. Left: linear amplitude. Right: log-amplitude.
2. Speech modulation frequencies
Fig4. Ratemap for the utterance 5527.
Fig5. FFT of the ratemap of Fig3.
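A modulation spectrum of the kind shown in Fig5 can be sketched by taking the Fourier transform of one rate-map channel along time. The frame rate of 100 Hz, the synthetic 4 Hz envelope and the function name `modulation_spectrum` are assumptions for illustration only; the slides do not give these values.

```python
import cmath
import math


def modulation_spectrum(envelope, frame_rate):
    """Return (frequency in Hz, magnitude) pairs for one channel envelope."""
    n = len(envelope)
    mean = sum(envelope) / n
    centred = [v - mean for v in envelope]   # remove the DC component
    spectrum = []
    for k in range(n // 2 + 1):
        # Plain DFT; fine for short envelopes, an FFT would be used in practice.
        acc = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                  for t, c in enumerate(centred))
        spectrum.append((k * frame_rate / n, abs(acc) / n))
    return spectrum


# Assumed frame rate and a synthetic envelope modulated at 4 Hz.
frame_rate = 100.0
env = [1.0 + 0.5 * math.sin(2 * math.pi * 4.0 * t / frame_rate)
       for t in range(200)]
spec = modulation_spectrum(env, frame_rate)
peak_freq = max(spec, key=lambda p: p[1])[0]
print(peak_freq)
```

For this synthetic envelope the spectrum peaks at the 4 Hz modulation frequency that was put in.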
3. Reverberation masking model
Fig6. Diagram of the model. Block labels recoverable from the figure: auditory filterbank, rate map, cube root compression, downsampling by leaky integrator, masking filter (inset: magnitude response, Magn (dB) against Freq. (Hz)), threshold, mask, missing data speech recogniser.
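The chain of compression, filtering and thresholding named in Fig6 can be sketched for a single rate-map channel as below. This is only a rough illustration under stated assumptions: the slides do not give the filter coefficients or the threshold, so the smoothing constant, the first-difference stand-in for the masking filter, the threshold value and the name `estimate_mask` are all hypothetical.

```python
def estimate_mask(channel, smooth=0.7, threshold=0.05):
    """Binary mask for one channel: 1 = treated as reliable, 0 = masked."""
    # Cube root compression of the (non-negative) rate-map values.
    compressed = [max(v, 0.0) ** (1.0 / 3.0) for v in channel]

    # Leaky integrator smoothing (assumed coefficient).
    smoothed = []
    state = compressed[0]
    for v in compressed:
        state = smooth * state + (1.0 - smooth) * v
        smoothed.append(state)

    # Stand-in for the masking filter: a first difference, which
    # emphasises rising energy and suppresses slowly decaying tails.
    filtered = [0.0] + [b - a for a, b in zip(smoothed, smoothed[1:])]

    # Hand-set threshold (assumed value) turns the filter output into a mask.
    return [1 if f > threshold else 0 for f in filtered]


# Silence, an onset, then a decaying tail such as reverberation would add.
channel = [0.0] * 5 + [1.0] * 5 + [0.8 ** k for k in range(1, 6)]
mask = estimate_mask(channel)
print(mask)
```

In this toy input the onset frames are kept and the decaying tail is masked, which is the qualitative behaviour a reverberation mask is meant to have.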
3. Reverberation masking model
Fig7. Rate maps; Top: clean. Bottom: reverberated.
Fig8. Masks; Top: a-priori. Bottom: reverb. masking.
4. Test conditions
Room 1: 15 x 13 x 6.5 m³, T60 = 0.7 s, D/R = 0 and -10 dB
Room 2: 25 x 20 x 6 m³, T60 = 1.7 s, D/R = 0 and -10 dB
Room 3: 55 x 35 x 14 m³, T60 = 2.7 s, D/R = 0 and -10 dB
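The D/R values above are direct-to-reverberant energy ratios. A minimal sketch of how such a ratio can be measured from an impulse response follows; the split point `direct_len` and the toy impulse responses are assumptions for illustration, and the slides do not say how the D/R ratio was set in the test rooms.

```python
import math


def direct_to_reverberant_db(h, direct_len):
    """D/R in dB: energy of the first direct_len samples over the rest."""
    direct = sum(v * v for v in h[:direct_len])
    reverberant = sum(v * v for v in h[direct_len:])
    return 10.0 * math.log10(direct / reverberant)


# Toy impulse responses: a unit direct pulse followed by a tail.
h_equal = [1.0, 0.5, 0.5, 0.5, 0.5]   # tail energy equals direct energy
h_weak = [1.0] + [1.0] * 10           # tail energy is ten times the direct energy
print(direct_to_reverberant_db(h_equal, 1))
print(direct_to_reverberant_db(h_weak, 1))
```

The first toy response gives 0 dB and the second gives -10 dB, matching the two D/R settings listed for each room.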
5. Results
Recognition       Clean   T60=0.7s   T60=0.7s    T60=1.7s   T60=1.7s    T60=2.7s   T60=2.7s
technique                 D/R=0dB    D/R=-10dB   D/R=0dB    D/R=-10dB   D/R=0dB    D/R=-10dB
Unity mask        98.26   62.48      46.39       42.82      29.24       34.90      24.28
MFCC              99.65   60.40      47.08       47.35      34.46       40.73      28.55
Reverb. masking   -       90.33      85.03       82.25      71.11       66.05      45.52
a priori mask     -       95.82      93.73       92.42      89.12       90.60      87.99
6. Discussion
• Advantage over robust feature techniques:
+ Mask estimation can be changed on the fly when conditions change -> no retraining required when the rule is changed
+ Better performance ???
• Currently all thresholds are hand-tuned; an adaptive system is under development