
Speech Enhancement Based on Adaptive Line Enhancer. Research Thesis, Aviva Atkins, April 7th, 2020. Supervised by Prof. Israel Cohen. PowerPoint presentation.


  1. Speech Enhancement Based on Adaptive Line Enhancer. Research Thesis, Aviva Atkins, April 7th, 2020. Supervised by Prof. Israel Cohen.

  2. Outline
     ▪ Introduction
     ▪ The problem researched
     ▪ The challenges
     ▪ Research contributions
     ▪ Adaptive Line Enhancer background
     ▪ Conventional fixed step size
     ▪ Mutual Information approach
     ▪ Proposed method
     ▪ Conclusions and future research

  3. Noise is Everywhere!
     [Figure: a source affected by reverberation, echo, additive noise, and interference]

  4. Speech Enhancement

  5. Applications

  6. Harmonic noise ▪ Contains deterministic sinusoidal components

  7. The problem researched
     Reducing nonstationary harmonic noise from a speech signal recorded with a single microphone.
     [Figure: source plus nonstationary harmonic additive noise]

  8. The challenges
     ▪ Single channel: only the noisy signal is available, with no access to additional reference signals and no spatial information → only intrinsic properties of speech or noise can be used
     ▪ The vast majority of methods require an estimate of the noise spectrum
     ▪ When the noise is stationary, it can be estimated during segments when speech is absent
     ▪ When the noise is nonstationary, it needs to be tracked continuously → it is more difficult to estimate nonstationary noise
     ▪ Trade-off between noise reduction and speech distortion
     ▪ The developed method needs to be relevant for real-time applications

  9. Research Contributions
     ▪ Introduced a filtering method based on the frequency-domain Adaptive Line Enhancer that enables better reduction of nonstationary harmonic noise
     ▪ Proposed the combined filter: the commonly used forward adaptive linear filter and a non-causal backward adaptive linear filter used together, increasing the reduction span of the noise transient
     ▪ Applied the filter based on a comparison to the noisy spectrum, reducing noise overestimation
     ▪ Applied the filter based on a noise presence indicator for better speech preservation
     ▪ Employed a set of filter lengths to ensure the combined filter spans the whole noise transient

  10. Additional contributions
      ▪ Investigated a statistical model as an alternative to the decision-directed a priori SNR estimator and showed that it can eliminate the musical noise while compromising between signal distortion and noise reduction
      ▪ Introduced a beamformer that enables fine tuning of the compromise between Directivity Factor and White Noise Gain through a simple, computationally efficient algorithm

  11. Why use Adaptive Line Enhancer?
      ▪ Exploits the structure of the harmonic noise
      ▪ Simple, with low computational cost
      ▪ Modifies both magnitude and phase, so it has the potential to improve signal intelligibility and not just quality

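  12. Adaptive Noise Canceller (ANC)
      [Block diagram: the primary input carries source plus noise, $x(n) + v(n)$. The reference input, marked with a question mark on the slide, carries the noise source $v_0(n)$ plus a source component $x_0(n)$. The adaptive filter, driven by the adaptive algorithm, produces $\hat{v}(n)$, which is subtracted from the primary input to give the output $\hat{x}(n)$ plus a distortion term $d(n)$.]
      $$\hat{x} = x + v - \hat{v}$$
      $$\min E[\hat{x}^2] = E[x^2] + \min E[(v - \hat{v})^2]$$
      $$\min E[(\hat{x} - x)^2] = \min E[(v - \hat{v})^2]$$
      Ideal case: $\hat{v} = v$, $\hat{x} = x$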

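  13. Adaptive Line Enhancer (ALE)
      [Block diagram: the input $y(n) = x(n) + v(n)$ (source plus noise) is delayed by $z^{-\tau}$ and fed to the adaptive filter, whose output is $z(n)$. The adaptive algorithm is driven by the error $e(n) = y(n) - z(n)$.]
      ▪ Signal decorrelated, noise correlated: the output is $e(n) = \hat{x}(n)$
      ▪ Noise decorrelated, signal correlated: the output is $z(n) = \hat{x}(n)$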
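
The time-domain structure on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the thesis implementation: the function name `ale_nlms`, the NLMS update, and all parameter values (`L`, `delay`, `mu`, `eps`) are assumptions chosen for the demo. A pure tone stands in for the correlated component and white noise for the decorrelated one.

```python
import numpy as np

def ale_nlms(y, L=16, delay=1, mu=0.1, eps=1e-8):
    """Time-domain adaptive line enhancer with an NLMS update (illustrative).

    Returns (z, e): z is the adaptive filter output, i.e. the part of y that
    is predictable from its delayed past, and e = y - z is the error.
    """
    y = np.asarray(y, dtype=float)
    h = np.zeros(L)            # filter taps, all assumed to start at zero
    z = np.zeros_like(y)
    e = y.copy()               # before the buffer fills, the input passes through
    for n in range(delay + L - 1, len(y)):
        # Delayed tap vector [y(n-delay), ..., y(n-delay-L+1)]
        u = y[n - delay - L + 1 : n - delay + 1][::-1]
        z[n] = h @ u
        e[n] = y[n] - z[n]
        h += mu * e[n] * u / (u @ u + eps)   # normalized LMS step
    return z, e

# Demo with synthetic data (hypothetical signals, not the thesis data)
rng = np.random.default_rng(0)
n = np.arange(4000)
tone = np.sin(2 * np.pi * 0.05 * n)              # correlated across the delay
broadband = 0.1 * rng.standard_normal(n.size)    # decorrelated across the delay
z, e = ale_nlms(tone + broadband)
print("MSE between z and the tone:", np.mean((z[-1000:] - tone[-1000:]) ** 2))
```

In this demo `z` tracks the tone and `e` keeps the broadband part, which corresponds to the "signal decorrelated, noise correlated" case when the tone plays the role of harmonic noise.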

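  14. Adaptive Line Enhancer (ALE): time domain (TD) and frequency domain (FD)
      [Block diagram of the ALE with delay, adaptive filter, and adaptive algorithm, annotated with both sets of signals]
      ▪ TD: input $y(n) = x(n) + v(n)$, error $e(n)$, filter output $z(n)$, estimate $\hat{x}(n)$
      ▪ FD: input $Y(k,m) = X(k,m) + V(k,m)$, error $E(k,m)$, filter output $Z(k,m)$, estimate $\hat{X}(k,m)$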

  15. Adaptive Line Enhancer (ALE), frequency domain (FD)
      [Block diagram: input $Y(k,m) = X(k,m) + V(k,m)$, delay $z^{-\tau}$, adaptive filter output $Z(k,m)$, error $E(k,m) = Y(k,m) - Z(k,m)$, estimate $\hat{X}(k,m)$; the adaptive algorithm uses step size $\mu$]
      $$\mathbf{h}(k,m) = [H_0(k,m), \ldots, H_{L-1}(k,m)]^T$$
      $$\mathbf{y}(k,m-\tau) = [Y(k,m-\tau), \ldots, Y(k,m-\tau-L+1)]^T$$
      $$Z(k,m) = \mathbf{h}^H(k,m)\,\mathbf{y}(k,m-\tau)$$
      NLMS, with a small regularization constant $\epsilon$:
      $$\mathbf{h}(k,m+1) = \mathbf{h}(k,m) + \mu\,\frac{E^*(k,m)\,\mathbf{y}(k,m-\tau)}{\|\mathbf{y}(k,m-\tau)\|^2 + \epsilon}$$
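
A per-bin NLMS recursion of this kind can be sketched as follows. This is an illustration under stated assumptions: the function name `fd_ale_nlms`, the zero initialization of the taps, the pass-through of the first frames, and the default values are not taken from the slides. The input is a complex STFT array with frequency bins along the first axis and frames along the second.

```python
import numpy as np

def fd_ale_nlms(Y, L=3, tau=1, mu=0.5, eps=1e-10):
    """Frequency-domain ALE with one NLMS filter per bin (illustrative).

    Y: complex array of shape (K bins, M frames).
    Returns (Z, E) with Z = h^H y(k, m - tau) and E = Y - Z.
    """
    Y = np.asarray(Y, dtype=complex)
    K, M = Y.shape
    H = np.zeros((K, L), dtype=complex)   # one length-L filter per bin
    Z = np.zeros_like(Y)
    E = Y.copy()                          # early frames pass through unchanged
    for m in range(tau + L - 1, M):
        # Rows hold [Y(k, m-tau), ..., Y(k, m-tau-L+1)] for every bin k
        U = Y[:, m - tau - L + 1 : m - tau + 1][:, ::-1]
        Z[:, m] = np.sum(np.conj(H) * U, axis=1)           # h^H y
        E[:, m] = Y[:, m] - Z[:, m]
        norm = np.sum(np.abs(U) ** 2, axis=1) + eps
        H += mu * np.conj(E[:, m])[:, None] * U / norm[:, None]
    return Z, E

# Demo: bin 0 holds a steady rotating phasor (harmonic-like),
# bin 1 holds white complex noise (a crude stand-in for speech).
rng = np.random.default_rng(0)
M = 200
phasor = np.exp(1j * 0.3 * np.arange(M))
white = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
Z, E = fd_ale_nlms(np.vstack([phasor, white]))
print("residual power, harmonic bin:", np.mean(np.abs(E[0, -50:]) ** 2))
print("residual power, white bin:   ", np.mean(np.abs(E[1, -50:]) ** 2))
```

The harmonic bin is predicted from its own past and is removed from `E`, while the unpredictable bin is left largely intact.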

  16. Conventional Fixed Step Size Example
      [Figure: spectrograms (frequency index vs. frame index) of (a) the clean signal, (b) the noisy signal, (c) the enhanced signal; settings: $L = 3$ and one further parameter set to 1 whose symbol is not legible]
      With the conventional fixed step size, it is difficult to both reduce the noise and maintain high quality of the enhanced signal.

  17. Mutual Information Approach
      Taghia, J., Martin, R., 2016, "A frequency-domain adaptive line enhancer with step-size control based on mutual information for harmonic noise reduction," IEEE Trans. Audio Speech Lang. Process.
      ▪ Frequency-dependent step size, detecting harmonic noise presence per frequency
      ▪ Based on Mutual Information (MI)
      ▪ Step size: $\mu(k) = \mu_0\,Q(k)$, where $Q(k) \in \{0, 1\}$ is a per-frequency decision obtained by comparing an MI statistic, built from the estimates $\hat{I}(k, k^*)$, with a threshold $I_{thr}$
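
The gating of the step size by a binary decision can be illustrated as below. Only the structure, a base step size multiplied by a 0/1 decision per frequency, is taken from the slide. The function name `mi_step_size`, the direction of the comparison (decision is 1 when the statistic is at or above the threshold), and the numeric values are assumptions. The MI statistic itself is not computed here and is passed in as a hypothetical input.

```python
import numpy as np

def mi_step_size(mi_stat, mu0=0.5, thr=0.1):
    """Per-frequency step size mu(k) = mu0 * Q(k) (illustrative).

    mi_stat: one MI-based statistic per frequency bin (hypothetical input).
    Assumption: Q(k) = 1 when mi_stat[k] >= thr, otherwise 0.
    """
    q = (np.asarray(mi_stat, dtype=float) >= thr).astype(float)
    return mu0 * q, q

# Bins whose statistic reaches the threshold adapt; the others are frozen.
mu, q = mi_step_size([0.0, 0.2, 0.1])
print(mu, q)
```

With this gating, bins judged free of harmonic noise keep a zero step size, so their filters do not adapt.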

  18. MI Approach Example
      [Figure: (b) noisy signal spectrogram (frequency index vs. frame index); (c) MI step size $\mu$ vs. frequency [kHz]; $Q = 1$]

  19. MI Approach Example
      [Figure: spectrograms (frequency index vs. frame index) of (a) the clean signal, (b) the enhanced signal with fixed step size, (c) the enhanced signal with MI]

  20. MI Approach
      ▪ Implemented in a block-wise manner
      ▪ Assumption: the noise is stationary for at least the block length
      ▪ They take a block length of 3 seconds
      The assumption does not hold for highly non-stationary signals, such as heart monitor beeping, and the block decision is often zero for such signals.
      [Figure: spectrogram of 3.4 s of heart monitor beeping]
      Taghia, J., Martin, R., 2016, "A frequency-domain adaptive line enhancer with step-size control based on mutual information for harmonic noise reduction," IEEE Trans. Audio Speech Lang. Process.

  21. MI Approach Example, Non-stationary
      [Figure: (a) noisy signal spectrogram (frequency index vs. frame index); (b) MI step size $\mu$ vs. frequency [kHz]; $Q = 0$]

  22. MI Approach Example, Non-stationary
      [Figure: spectrograms (frequency index vs. frame index) of (a) the clean signal, (b) the noisy signal, (c) the enhanced signal with MI, $Q$ ignored]

  23. Non-Stationary noise: filter output estimate
      [Block diagram of the ALE: input $y(n) = x(n) + v(n)$, delay, adaptive filter, adaptive algorithm]
      Questions posed on the diagram: is $e(n) = \hat{x}(n)$, and is $z(n) = \hat{v}(n)$?

  24. Experimental Setup
      ▪ Clean speech: 20 different speech signals from different speakers from the TIMIT database (half male, half female)
      ▪ Sampled at 16 kHz
      ▪ SNR range [0, 20] dB
      ▪ STFT, overlap-add
      ▪ Noise: 26 different non-stationary harmonic noise signals, e.g., heart monitor beeping, train door beeping, house alarm, railroad crossing bells
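
Mixing a clean signal with noise at a prescribed SNR can be sketched as below. This is a generic recipe under assumptions, not the thesis code: the function name `mix_at_snr` is hypothetical, the SNR is defined over the whole signal, and the demo uses synthetic stand-ins rather than TIMIT speech or the recorded noises.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean + noise has the requested global SNR in dB.

    Returns (noisy, scaled_noise). The noise is truncated to the clean length.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)[: clean.size]
    if noise.size < clean.size:
        raise ValueError("noise must be at least as long as the clean signal")
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    scaled = gain * noise
    return clean + scaled, scaled

# Demo with synthetic stand-ins, 1 s at 16 kHz
fs = 16000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
clean = rng.standard_normal(fs)                                   # placeholder for speech
beep = np.sin(2 * np.pi * 1000 * t) * (np.floor(t * 4) % 2 == 0)  # gated 1 kHz tone
noisy, scaled = mix_at_snr(clean, beep, snr_db=10)
print("measured SNR [dB]:", 10 * np.log10(np.mean(clean ** 2) / np.mean(scaled ** 2)))
```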

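  25. Correlation
      $$\boldsymbol{\gamma}_X(k,m,\tau) = \frac{E[X(k,m)\,\mathbf{x}^*(k,m-\tau)]}{E[|X(k,m)|^2]}$$
      $$\boldsymbol{\gamma}_V(k,m,\tau) = \frac{E[V(k,m)\,\mathbf{v}^*(k,m-\tau)]}{E[|V(k,m)|^2]}$$
      [Figure: correlation vs. $\tau$ [Frames]; 1 frame = 32 ms; annotation $\hat{v}(n)$]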
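
A simple way to look at this kind of inter-frame correlation is to replace the expectations with averages over frames. The sketch below does that for a single scalar lag per bin, which is a simplification of the vector form on the slide. The function name `lag_correlation` and the averaging over all available frames are assumptions.

```python
import numpy as np

def lag_correlation(S, tau):
    """Normalized correlation between S(k, m) and S(k, m - tau), per bin.

    S: complex STFT array of shape (K bins, M frames).
    Expectations are approximated by averages over frames (an assumption).
    """
    S = np.asarray(S, dtype=complex)
    M = S.shape[1]
    cur = S[:, tau:]          # S(k, m)
    past = S[:, : M - tau]    # S(k, m - tau)
    num = np.sum(cur * np.conj(past), axis=1)
    den = np.sum(np.abs(cur) ** 2, axis=1)
    return num / den

# A steady phasor stays fully correlated across frames; white noise does not.
rng = np.random.default_rng(0)
M = 500
phasor = np.exp(1j * 0.3 * np.arange(M))
white = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
print(np.abs(lag_correlation(np.vstack([phasor, white]), tau=3)))
```

A component that stays correlated at lag $\tau$ while the other does not is what allows the delayed adaptive filter to separate the two.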

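  26. Proposed Approach
      ▪ Combined filter (CMLNLMS), selecting between the forward error $E_f$, the backward error $E_b$, and the noisy input $Y$:
      $$E_c(k,m) = \begin{cases} E_b(k,m+L), & |E_b(k,m+L)|^2 \le |E_f(k,m)|^2 \ \text{and}\ |E_b(k,m+L)|^2 \le |Y(k,m)|^2 \\ E_f(k,m), & |E_b(k,m+L)|^2 > |E_f(k,m)|^2 \ \text{and}\ |E_f(k,m)|^2 \le |Y(k,m)|^2 \\ Y(k,m), & \text{else} \end{cases}$$
      [Figure labels: C (combined), F (forward), B (backward)]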
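
The selection rule can be written as an element-wise choice between three spectra. This is a sketch under assumptions: the function name `combine_errors` is hypothetical, and the backward error array is assumed to be already aligned so that `E_b[k, m]` holds the value the rule pairs with frame `m`.

```python
import numpy as np

def combine_errors(E_f, E_b, Y):
    """Pick, per time-frequency point, the smaller of the forward and backward
    errors, unless it is larger in magnitude than the noisy input, in which
    case the noisy input is kept (illustrative).

    Assumption: E_b is already time-aligned with E_f and Y.
    """
    E_f, E_b, Y = (np.asarray(a, dtype=complex) for a in (E_f, E_b, Y))
    p_f, p_b, p_y = np.abs(E_f) ** 2, np.abs(E_b) ** 2, np.abs(Y) ** 2
    use_b = (p_b <= p_f) & (p_b <= p_y)
    use_f = (p_b > p_f) & (p_f <= p_y)
    return np.where(use_b, E_b, np.where(use_f, E_f, Y))

# Three points: forward wins, backward wins, neither is below the noisy input.
E_c = combine_errors(E_f=[1, 3, 5], E_b=[2, 1, 6], Y=[4, 4, 4])
print(E_c.real)
```

Falling back to the noisy input when both errors are larger than it is what limits noise overestimation.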

  27. Proposed Approach
      ▪ Harmonic noise presence detector for better speech preservation:
      $$I(k,m) = \begin{cases} 1, & V(k,m) \in \mathcal{H}_0 \\ 0, & V(k,m) \in \mathcal{H}_1 \end{cases}$$
      ▪ Set of filters with changing length, up to the maximal filter length $L$, based on the available number of noise samples
      [Figure labels: C (combined), B (backward), F (forward)]

  28. Performance Measures
      ▪ Distortion index $v_{sd}$
      ▪ Noise reduction factor $\xi_{nr}$
      ▪ Perceptual Evaluation of Speech Quality (PESQ), ITU-T P.862.2
      ▪ Short-Time Objective Intelligibility (STOI)
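
The first two measures can be computed once the clean speech and the noise are available separately before and after filtering. The slide does not give the formulas, so the sketch below assumes the commonly used definitions: noise power before filtering divided by residual noise power, and power of the speech change divided by clean speech power. The function names are hypothetical. PESQ and STOI require dedicated implementations and are not sketched here.

```python
import numpy as np

def noise_reduction_factor_db(noise_in, noise_out):
    """xi_nr in dB: noise power before filtering over residual noise power."""
    p_in = np.mean(np.abs(noise_in) ** 2)
    p_out = np.mean(np.abs(noise_out) ** 2)
    return 10 * np.log10(p_in / p_out)

def speech_distortion_index_db(clean, clean_filtered):
    """v_sd in dB: power of (filtered speech - clean speech) over clean power."""
    clean = np.asarray(clean)
    diff = np.asarray(clean_filtered) - clean
    return 10 * np.log10(np.mean(np.abs(diff) ** 2) / np.mean(np.abs(clean) ** 2))

# Demo: a filter that scales noise by 0.1 and speech by 0.9
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
v = rng.standard_normal(1000)
print(noise_reduction_factor_db(v, 0.1 * v))    # about 20 dB
print(speech_distortion_index_db(x, 0.9 * x))   # about -20 dB
```

A larger noise reduction factor and a lower distortion index are better, and the two usually trade off against each other.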

  29. Transient Reduction
      The combined filter gives better noise reduction, which leads to improved $\xi_{nr}$, PESQ, and STOI levels.
      [Figure: NRR [dB] vs. frame index; settings: $L = 3$, −25 dB indicator threshold, and two further values, 3 and 0.5, whose symbols are not legible]

  30. Step Size
      ▪ An appropriate selection of the step size is required
      ▪ Fixed step size: 0.5
      ▪ max(MI, const): the larger of the MI-based step size and a constant
      [Figure axes: NRR [dB], $v_{sd}$, $\tau$ [Frames], frame index]

  31. Combined & MI-Combined show better results than MI.
      [Figure: four panels plotted against $\tau$ [Frames]: $v_{sd}$ [dB], $\xi_{nr}$ [dB], STOI, PESQ; settings: 0.5 (symbol not legible), −25 dB indicator threshold]
