
Some Notes on the Psychoacoustics and Signal Processing of RASTA-PLP



  1. Some Notes on the Psychoacoustics and Signal Processing of RASTA-PLP Analysis of Speech
     Jörg Buchholz, Wire Communication Laboratory, University of Patras
     • Introduction
     • Reflection Masking Model
     • RASTA-PLP Processing
     • RASTA applied to RIR filtered speech

  2. Comments on Perceptual Modelling
     [Diagram labels: Psychoacoustics, Physiology, Description, Requirements, Perceptual Model (Masking Model), Comparison, Application, Requirements]
     Application: speech enhancement in reverberant environments

  3. Illustration of Masking and Suppression
     [Figure: schematic over time (horizontal axis) and frequency (vertical axis). Along time: pre (backward) masking, simultaneous masking, post (forward) masking. Labelled elements: masker, suppressors, test signals.]

  4. Structure of the Masking Module
     [Block diagram labels: TMM Module (BP-Filterbank), Two-Tone Suppression, Simultaneous Masking, Transformation / Resynthesis Module, Feature Vectors / Directivity, Audible Signal Module; three parallel TMM blocks operating on the signals s_{i-1}(t), s_i(t), s_{i+1}(t)]

  5. Block Diagram of the RASTA-PLP Method
     Speech → FFT → CB-integration (mel-scale) → compressing static NL (log) → linear BP-filtering → expanding static NL (exp) → equal loudness curve → power law of hearing → IFFT / IDFT → solving set of linear equations → cepstral recursion → cepstral coefficients of RASTA-PLP model
     RASTA: the compressing NL, BP-filtering and expanding NL stages
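
Read as equations, the middle of this chain can be written for one critical band k and frame index i. This is only a sketch under assumptions: the symbols P_k(i) (critical-band power), h_RASTA (impulse response of the band-pass filter) and E_k (equal-loudness weight of band k) are introduced here for illustration, and the exponent 0.33 is the cube-root value from the original PLP formulation; the slide itself states none of them.

```latex
\begin{align}
Y_k(i) &= \ln P_k(i)
  && \text{(compressing static nonlinearity)} \\
\tilde{Y}_k(i) &= \left(h_{\mathrm{RASTA}} * Y_k\right)(i)
  && \text{(linear band-pass filtering along } i\text{)} \\
\tilde{P}_k(i) &= \exp \tilde{Y}_k(i)
  && \text{(expanding static nonlinearity)} \\
L_k(i) &= \left(E_k \, \tilde{P}_k(i)\right)^{0.33}
  && \text{(equal-loudness weighting and power law of hearing)}
\end{align}
```

The remaining stages on the slide (IFFT / IDFT, linear equations, cepstral recursion) then operate on L_k(i) across the bands k of each frame.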

  6. Time / Frequency Analysis
     [Figure: short-time analysis of frames i, i+1, ..., i+5 (amplitude vs. time / ms, 0 to 60 ms) → FFT power spectrum (0 to fa/2) → CB integration → mel-scale power spectrum (0 to fa/2) → time trajectory of band k]
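
The analysis steps on this slide can be sketched in NumPy as follows. Everything numeric here is an assumption made for illustration and not taken from the slides: the 25 ms Hamming window, the 10 ms hop, the 20 triangular bands, the common 2595 / 700 mel formula, and the helper names `mel_filterbank` and `band_trajectories`.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel approximation (assumed; the slide does not give a formula).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_bands, n_fft, fs):
    """Triangular weights with edges equally spaced on the mel scale up to fs/2."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_bands + 2))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    fb = np.zeros((n_bands, freqs.size))
    for k in range(n_bands):
        lo, mid, hi = edges_hz[k], edges_hz[k + 1], edges_hz[k + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[k] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def band_trajectories(x, fs, frame_ms=25.0, hop_ms=10.0, n_bands=20):
    """Return an (n_bands, n_frames) array: one power trajectory per band."""
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hamming(frame_len)
    fb = mel_filterbank(n_bands, frame_len, fs)
    traj = np.empty((n_bands, n_frames))
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2   # FFT power spectrum, 0..fs/2
        traj[:, i] = fb @ power                   # band integration
    return traj

# One second of a 1 kHz tone sampled at 8 kHz as a stand-in for speech.
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2.0 * np.pi * 1000.0 * t)
traj = band_trajectories(x, fs)
```

Each row of `traj` is the time trajectory of one band, which is the signal that the RASTA filter later operates on.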

  7. CB-integration (mel-scale)
     [Figure: two panels, magnitude in dB vs. frequency / Hz (logarithmic axis, 10^2 to 10^4 Hz)]

  8. Overview of some RASTA Methods
     • Additive noise (uncorrelated) → Lin-RASTA: $s(t) + n(t) \xrightarrow{\mathrm{FT}} S(\omega) + N(\omega)$
     • Convolutional noise → Log-RASTA: $s(t) \otimes h(t) \xrightarrow{\mathrm{FT}} S(\omega) \cdot H(\omega) \xrightarrow{\log} \log S(\omega) + \log H(\omega)$
     • Convolutional and additive noise → J-RASTA (Lin/Log): $y = \ln(1 + J \cdot x)$, $x = \frac{e^{y} - 1}{J} \approx \frac{e^{y}}{J}$
     The optimal J depends on the noise power.
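
A minimal sketch of the J-RASTA mapping from this slide, y = ln(1 + J·x) with its exact and approximate inverses. The function names and the example values of J are illustrative assumptions; the slide only says that the best J depends on the noise power.

```python
import math

def jrasta_compress(x, J):
    # y = ln(1 + J*x); log1p keeps precision when J*x is tiny.
    return math.log1p(J * x)

def jrasta_expand_exact(y, J):
    # Exact inverse: x = (e^y - 1) / J
    return math.expm1(y) / J

def jrasta_expand_approx(y, J):
    # Approximate inverse from the slide: x ~ e^y / J
    return math.exp(y) / J

# Small J*x: the mapping is nearly linear (Lin-RASTA-like behaviour).
y_small = jrasta_compress(1.0, 1e-6)

# Large J*x: the mapping is nearly ln(J) + ln(x) (Log-RASTA-like behaviour).
y_large = jrasta_compress(1.0, 1e6)

# Round trip through the exact inverse.
x_back = jrasta_expand_exact(jrasta_compress(2.5, 1e-3), 1e-3)
```

The two limits show why a single parameter J can move the processing between the linear and the logarithmic domain.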

  9. RASTA BP-Filter
     [Figure: impulse response (amplitude, 0 to 0.3, vs. time / ms, 0 to 400 ms) and magnitude response (magnitude in dB vs. modulation frequency / Hz, 10^-2 to 10^2 Hz)]
     $H(z) = 0.1 \cdot z^{4} \cdot \frac{2 + z^{-1} - z^{-3} - 2z^{-4}}{1 - 0.98\,z^{-1}}$
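
The transfer function on this slide can be turned into a difference equation, y[n] = 0.98·y[n-1] + 0.1·(2x[n] + x[n-1] - x[n-3] - 2x[n-4]). The sketch below makes two assumptions that are not on the slide: the non-causal factor z^4 is dropped, which only delays the output by four frames, and the frame rate is taken as 100 Hz (10 ms hop) when converting to modulation frequency. The function names are illustrative.

```python
import cmath
import math

TAPS = (2, 1, 0, -1, -2)   # numerator coefficients before the 0.1 gain
GAIN = 0.1
POLE = 0.98

def rasta_filter(x):
    """Causal RASTA band-pass filter applied to one band trajectory."""
    y = []
    prev = 0.0
    for n in range(len(x)):
        fir = GAIN * sum(c * x[n - k] for k, c in enumerate(TAPS) if n - k >= 0)
        prev = POLE * prev + fir
        y.append(prev)
    return y

def magnitude_db(f_mod, frame_rate=100.0):
    """Magnitude of H at modulation frequency f_mod (Hz); frame_rate is assumed."""
    z_inv = cmath.exp(-2j * math.pi * f_mod / frame_rate)
    num = GAIN * sum(c * z_inv ** k for k, c in enumerate(TAPS))
    den = 1.0 - POLE * z_inv
    mag = abs(num / den)
    return 20.0 * math.log10(mag) if mag > 0.0 else float("-inf")

# Impulse response (40 frames) and response to a constant input.
impulse_response = rasta_filter([1.0] + [0.0] * 39)
dc_output = rasta_filter([1.0] * 3000)
```

Because the numerator taps sum to zero, a constant trajectory is driven to zero, while modulation frequencies of a few hertz pass almost unchanged.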

  10. Lin-RASTA Processing of clean speech
      [Figure: three panels, normalised amplitude vs. time / ms (0 to 2000 ms): time trajectory (1 kHz band); BP-filtered time trajectory; negative values set to zero]

  11. Lin-RASTA Processing of noisy speech (-5 dB SNR)
      [Figure: three panels, normalised amplitude vs. time / ms (0 to 2000 ms): time trajectory (1 kHz band); BP-filtered time trajectory; negative values set to zero]
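
The three panels of slides 10 and 11 correspond to three steps: take a band trajectory, band-pass filter it, and set negative values to zero. A minimal NumPy sketch follows. The synthetic trajectory (a 4 Hz modulation at an assumed 100 Hz frame rate) and the constant 0.3 offset standing in for stationary additive noise are illustrative assumptions, not the data shown on the slides.

```python
import numpy as np

def rasta_bandpass(traj, pole=0.98):
    """Causal form of the RASTA filter: the z^4 advance is omitted."""
    taps = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    padded = np.concatenate([np.zeros(4), np.asarray(traj, dtype=float)])
    out = np.zeros(len(traj))
    prev = 0.0
    for n in range(len(traj)):
        # padded[n:n+5] holds x[n-4..n]; reversing aligns it with the taps.
        fir = np.dot(taps, padded[n:n + 5][::-1])
        prev = pole * prev + fir
        out[n] = prev
    return out

def lin_rasta(traj):
    """Band-pass filter a linear-domain trajectory, then zero the negatives."""
    return np.maximum(rasta_bandpass(traj), 0.0)

frame_rate = 100.0                       # assumed: 10 ms hop
t = np.arange(200) / frame_rate          # 2 s of frames
clean = 0.5 * (1.0 + np.sin(2.0 * np.pi * 4.0 * t))
noisy = clean + 0.3                      # constant offset as a noise stand-in

out_clean = lin_rasta(clean)
out_noisy = lin_rasta(noisy)
```

In this toy case the constant offset only produces a transient that decays with the filter pole, so `out_noisy` approaches `out_clean` over time. Real noise also fluctuates, so the suppression is only partial in practice.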

  12. RASTA applied to Room Impulse Responses
      Basic assumptions for RASTA processing:
      • Analysis window length >> filter (RIR) length
      • Filter (RIR) should be constant (slowly changing) for a duration >> analysis window length
      • For ASR: window length ≈ 20-30 ms
      Conflict: a 20-30 ms window is not much longer than a room impulse response, so the first assumption is violated.

      Isolated digits, word error rates:
                      Baseline   Multiresolution
      Clean            8.6 %       13.5 %
      Reverberant     34.8 %       22.8 %
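
The first assumption can be checked numerically: the log spectrum of a windowed frame of s ⊗ h is close to log S + log H only when the filter is much shorter than the window. The sketch below uses white noise as the source and an exponentially decaying random sequence as a stand-in for an RIR; the function name, the lengths and the error measure are all illustrative assumptions.

```python
import numpy as np

def log_additivity_error(filter_len, win_len=256, n_fft=4096, seed=0):
    """Median |log|Y| - (log|S| + log|H|)| over frequency for one frame."""
    rng = np.random.default_rng(seed)
    s = rng.standard_normal(4096)
    decay = np.exp(-np.arange(filter_len) / (filter_len / 4.0))
    h = rng.standard_normal(filter_len) * decay      # toy impulse response
    y = np.convolve(s, h)                            # filtered signal
    start = 2048                                     # frame well past the onset
    w = np.hanning(win_len)
    S = np.abs(np.fft.rfft(s[start:start + win_len] * w, n_fft))
    Y = np.abs(np.fft.rfft(y[start:start + win_len] * w, n_fft))
    H = np.abs(np.fft.rfft(h, n_fft))
    eps = 1e-12                                      # guards against log(0)
    err = np.log(Y + eps) - (np.log(S + eps) + np.log(H + eps))
    return float(np.median(np.abs(err)))

err_short = log_additivity_error(filter_len=8)       # filter << window
err_long = log_additivity_error(filter_len=2048)     # filter >> window
```

With the long filter the frame mostly contains energy from source samples outside the window, so the per-frame product S·H no longer describes it and `err_long` comes out much larger than `err_short`.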

  13. Multiresolution Processing Concept (Avendano)
      [Figure labels: input x(n); block A; two analysis branches producing X(n_1, ω_{1k}) and X(n_2, ω_{2k}); output x(n)]

  14. Equal-Loudness Curves
      [Figure annotations: -12 dB, +18 dB, 0 dB, -6 dB, 0 dB]

  15. Loudness function (Zwicker)
