
Missing data speech recognition in reverberant acoustic conditions



  1. Missing data speech recognition in reverberant acoustic conditions. Kalle, Guy and Jon. "ONE SIX FOUR... !" Mr. M. D. Dummy

  2. Content: 1. Introduction to reverberation; 2. Speech modulation frequencies; 3. Reverberation masking model; 4. Test conditions; 5. Results; 6. Discussion

  3. 1. Introduction to reverberation. Fig. 1: Image expansion. Direct sound.

  4. 1. Introduction to reverberation. Fig. 2: Image expansion. 3rd order reflections.
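
The image expansion in Figs. 1 and 2 can be illustrated with a minimal sketch of the image source method for a 2-D rectangular room. The function name, the source position and the choice of walls at 0 and L along each axis are assumptions made for this example; only the 15 x 13 m floor plan is borrowed from Room 1 of the test conditions, and the slides do not say how the authors generated their images.

```python
from itertools import product

def image_sources_2d(source, room, max_order):
    """Return [(x, y, order), ...] for all image sources up to max_order.

    source: (x, y) position inside the room; room: (Lx, Ly) side lengths.
    Walls are assumed to lie at 0 and L along each axis.
    """
    def mirror(coord, length, m):
        # Cell m holds a copy of the source (even m) or its mirror image (odd m).
        return m * length + (coord if m % 2 == 0 else length - coord)

    images = []
    for mx, my in product(range(-max_order, max_order + 1), repeat=2):
        order = abs(mx) + abs(my)  # total number of wall reflections
        if order <= max_order:
            images.append((mirror(source[0], room[0], mx),
                           mirror(source[1], room[1], my),
                           order))
    return images

# Hypothetical source position in a 15 x 13 m floor plan.
direct = image_sources_2d((4.0, 3.0), (15.0, 13.0), 0)  # the source itself
third = image_sources_2d((4.0, 3.0), (15.0, 13.0), 3)   # 25 images in 2-D
```

Each image contributes one delayed, attenuated impulse to the room impulse response, so the number of arrivals grows quickly with reflection order.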

  5. 1. Introduction to reverberation. Fig. 3: Room impulse responses. Left: linear amplitude. Right: log-amplitude.

  6. 2. Speech modulation frequencies. Fig. 4: Ratemap for utterance 5527. Fig. 5: FFT of the ratemap of Fig. 3.
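
Taking the FFT of a ratemap along time gives its modulation spectrum. The sketch below assumes a ratemap stored as a (channels, frames) array and a frame rate of 100 frames/s; neither is stated on the slides, and the input is a synthetic 4 Hz envelope rather than the utterance in the figure.

```python
import numpy as np

def modulation_spectrum(ratemap, frame_rate):
    """Magnitude FFT along time for each channel of a (channels, frames) ratemap."""
    centred = ratemap - ratemap.mean(axis=1, keepdims=True)  # drop the DC term
    spectrum = np.abs(np.fft.rfft(centred, axis=1))
    freqs = np.fft.rfftfreq(ratemap.shape[1], d=1.0 / frame_rate)
    return freqs, spectrum

# Synthetic stand-in for a ratemap: 3 channels, 2 s at an assumed 100 frames/s,
# every channel envelope modulated at 4 Hz.
frame_rate = 100
t = np.arange(200) / frame_rate
ratemap = 1.0 + 0.5 * np.sin(2 * np.pi * 4.0 * t) * np.ones((3, 1))

freqs, spectrum = modulation_spectrum(ratemap, frame_rate)
peak_hz = freqs[spectrum.argmax(axis=1)]  # dominant modulation frequency per channel
```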

  7. 3. Reverberation masking model. Fig. 6: Diagram of the model. Blocks shown: auditory filterbank, leaky integrator, downsampling, cube root compression, rate map, masking filter, threshold, mask, missing data speech recogniser. Inset plot: Magn (dB) against Freq. (Hz).
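
The diagram names the processing blocks but gives neither the masking filter coefficients nor the threshold, so the following is only a rough sketch of the general shape of such a pipeline. It starts from channel envelopes (the auditory filterbank is skipped), and the 8 ms time constant, the 100 frames/s rate, the zero threshold and in particular the first-difference filter standing in for the masking filter are all assumptions, not the authors' settings.

```python
import numpy as np

def leaky_integrate(x, fs, tau=0.008):
    """First-order leaky integrator y[n] = a*y[n-1] + (1-a)*x[n] along the last axis."""
    a = np.exp(-1.0 / (fs * tau))
    y = np.empty(x.shape, dtype=float)
    acc = np.zeros(x.shape[:-1])
    for n in range(x.shape[-1]):
        acc = a * acc + (1.0 - a) * x[..., n]
        y[..., n] = acc
    return y

def rate_map(envelopes, fs, frame_rate=100):
    """Smooth, downsample and cube-root compress (channels, samples) envelopes."""
    smoothed = leaky_integrate(envelopes, fs)
    hop = fs // frame_rate
    return np.cbrt(smoothed[..., ::hop])

def reverb_mask(ratemap, threshold=0.0):
    """Flag frames as reliable where a filtered ratemap exceeds the threshold.

    The filter is a hypothetical stand-in: a first difference along time,
    which emphasises onsets and suppresses decaying tails.
    """
    filtered = np.diff(ratemap, axis=-1, prepend=ratemap[..., :1])
    return filtered > threshold

# Synthetic envelope: silence, a 100 ms burst, then a slowly decaying tail.
fs = 8000
t = np.arange(fs) / fs
burst = ((t >= 0.2) & (t < 0.3)).astype(float)
tail = np.where(t >= 0.3, np.exp(-(t - 0.3) / 0.1), 0.0)
envelopes = np.tile(burst + tail, (2, 1))

ratemap = rate_map(envelopes, fs)
mask = reverb_mask(ratemap)  # True during the burst onset, False in the tail
```

With this stand-in filter the rising part of the burst is marked reliable and the decaying tail is marked unreliable, which is the qualitative behaviour one would expect from a mask aimed at reverberant energy.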

  8. 3. Reverberation masking model. Fig. 7: Rate maps. Top: clean. Bottom: reverberated. Fig. 8: Masks. Top: a priori. Bottom: reverb. masking.

  9. 4. Test conditions
     Room 1: 15 x 13 x 6.5 m³, T60 = 0.7 s, D/R = 0 and -10 dB
     Room 2: 25 x 20 x 6 m³, T60 = 1.7 s, D/R = 0 and -10 dB
     Room 3: 55 x 35 x 14 m³, T60 = 2.7 s, D/R = 0 and -10 dB
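
T60 and the direct-to-reverberant ratio (D/R) can both be read off a room impulse response. The sketch below uses Schroeder backward integration for T60 and a short direct-path window for D/R; the -5 to -25 dB fit range, the 2.5 ms window and the synthetic decaying-noise impulse response are assumptions for illustration, and the slides do not say how the authors set or measured these quantities.

```python
import numpy as np

def t60_schroeder(h, fs, fit_db=(-5.0, -25.0)):
    """Estimate T60 from the slope of the Schroeder energy decay curve."""
    energy = np.cumsum(h[::-1] ** 2)[::-1]           # backward integration
    edc_db = 10.0 * np.log10(energy / energy[0])
    idx = np.where((edc_db <= fit_db[0]) & (edc_db >= fit_db[1]))[0]
    slope, _ = np.polyfit(idx / fs, edc_db[idx], 1)  # dB per second
    return -60.0 / slope

def direct_to_reverberant_db(h, fs, direct_ms=2.5):
    """Energy ratio (dB) between the first direct_ms of h and the rest."""
    n = int(round(direct_ms * 1e-3 * fs))
    return 10.0 * np.log10(np.sum(h[:n] ** 2) / np.sum(h[n:] ** 2))

def synthetic_rir(fs, t60, dr_db, length_s=3.0, seed=0):
    """Decaying-noise tail plus a direct impulse scaled to the requested D/R."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(length_s * fs)) / fs
    tail = rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / t60)  # -60 dB at t60
    tail[: int(0.005 * fs)] = 0.0                    # gap after the direct path
    h = tail.copy()
    h[0] = np.sqrt(np.sum(tail ** 2) * 10.0 ** (dr_db / 10.0))
    return h

# Stand-in for the Room 1 condition with D/R = -10 dB.
fs = 16000
h = synthetic_rir(fs, t60=0.7, dr_db=-10.0)
t60_est = t60_schroeder(h, fs)
dr_est = direct_to_reverberant_db(h, fs)
```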

  10. 5. Results

      Recognition      Clean    T60=0.7s   T60=0.7s    T60=1.7s   T60=1.7s    T60=2.7s   T60=2.7s
      technique                 D/R=0dB    D/R=-10dB   D/R=0dB    D/R=-10dB   D/R=0dB    D/R=-10dB
      Unity mask       98.26    62.48      46.39       42.82      29.24       34.90      24.28
      MFCC             99.65    60.40      47.08       47.35      34.46       40.73      28.55
      Reverb. masking  -        90.33      85.03       82.25      71.11       66.05      45.52
      a priori mask    -        95.82      93.73       92.42      89.12       90.60      87.99

  11. 6. Discussion
      • Advantage over robust feature techniques:
        + Mask estimation can be changed on the fly when conditions change -> no retraining is required when the rule is changed
        + Better performance ???
      • Currently all thresholds are hand-tuned; an adaptive system is in development
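
The "no retraining" point can be illustrated with marginalisation, one common missing data technique in which a diagonal-covariance Gaussian is scored on the reliable components only. The slides do not say which missing data variant the recogniser uses, so this is a sketch under that assumption, with made-up feature values and model parameters.

```python
import numpy as np

def masked_log_likelihood(x, mask, mean, var):
    """Diagonal-Gaussian log-likelihood over the components flagged reliable."""
    per_dim = -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)
    return float(np.sum(per_dim[mask]))

# Hypothetical 3-channel frame and one fixed (already trained) Gaussian.
x = np.array([1.0, 5.0, 2.0])
mean = np.array([1.2, 0.5, 1.8])
var = np.array([0.5, 0.5, 0.5])

all_reliable = np.array([True, True, True])
tail_masked = np.array([True, False, True])  # channel 1 judged reverberant

score_full = masked_log_likelihood(x, all_reliable, mean, var)
score_masked = masked_log_likelihood(x, tail_masked, mean, var)
```

Switching from one mask to the other changes the score while `mean` and `var` stay untouched, which is why a new mask estimation rule does not require retraining the acoustic models.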
