Speech Enhancement Based on Adaptive Line Enhancer. Research Thesis, Aviva Atkins, April 7th, 2020. Supervised by Prof. Israel Cohen
Outline ▪ Introduction ▪ The problem researched ▪ The challenges ▪ Research contributions ▪ Adaptive Line Enhancer background ▪ Conventional fixed step size ▪ Mutual Information approach ▪ Proposed method ▪ Conclusions and future research
Noise is Everywhere! [Figure: a source signal degraded by reverberation, echo, additive noise, and interference]
Speech Enhancement
Applications
Harmonic noise ▪ Contains deterministic sinusoidal components
The problem researched ▪ Reducing nonstationary harmonic noise in a speech signal recorded with a single microphone [Figure: a source signal corrupted by nonstationary harmonic additive noise]
The challenges ▪ Single channel: only the noisy signal is available, with no additional reference signals and no spatial information → only intrinsic properties of the speech or the noise can be used ▪ Most methods require an estimate of the noise spectrum ▪ When the noise is stationary, it can be estimated during segments where speech is absent ▪ When the noise is nonstationary, it must be tracked continuously → nonstationary noise is more difficult to estimate ▪ Trade-off between noise reduction and speech distortion ▪ The method must be suitable for real-time applications
Research Contributions ▪ Introduced a filtering method based on the frequency-domain Adaptive Line Enhancer (ALE) that achieves better reduction of nonstationary harmonic noise ▪ Proposed the combined filter: the commonly used forward adaptive linear filter together with a non-causal backward adaptive linear filter, which extends the span of the noise transient that is reduced ▪ Applied the filter based on a comparison with the noisy spectrum, reducing noise overestimation ▪ Applied the filter based on a noise presence indicator, for better speech preservation ▪ Employed a set of filter lengths so that the combined filter spans the entire noise transient
Additional contributions ▪ Investigated a statistical model as an alternative to the decision-directed approach for a-priori SNR estimation, and showed that it can eliminate musical noise while balancing signal distortion against noise reduction ▪ Introduced a beamformer that enables fine tuning of the compromise between the Directivity Factor and the White Noise Gain through a simple, computationally efficient algorithm
Why use the Adaptive Line Enhancer? ▪ Exploits the structure of harmonic noise ▪ Simple, with low computational cost ▪ Modifies both magnitude and phase, so it has the potential to improve intelligibility and not just quality
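Adaptive Noise Canceller (ANC) [Figure: ANC block diagram. The primary input carries the source signal $x(n)$ plus noise $v(n)$; the reference input carries noise $v_0(n)$, correlated with $v(n)$; the adaptive filter output $\hat{v}(n)$ is subtracted from the primary input to form the system output $\hat{x}(n)$. The diagram also marks a signal component $x_0(n)$ at the reference input and the resulting distortion.] ▪ $\hat{x} = x + v - \hat{v}$ ▪ Since $x$ is uncorrelated with $v$ and $\hat{v}$: $\min E[\hat{x}^2] = E[x^2] + \min E[(v - \hat{v})^2]$, and therefore $\min E[(\hat{x} - x)^2] = \min E[(v - \hat{v})^2]$ ▪ Ideal case: $\hat{v} = v$, $\hat{x} = x$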
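To make the ANC minimization concrete, here is a minimal time-domain sketch that adapts the filter with LMS (a standard choice; the slide does not name the adaptive algorithm). The signals, the two-tap noise path, the filter length `num_taps`, and the step size `mu` are all illustrative assumptions. Because $x(n)$ is uncorrelated with the reference, driving the output power down drives $\hat{v}(n)$ toward $v(n)$, so the output approaches the clean signal.

```python
import math
import random


def lms_anc(primary, reference, num_taps=4, mu=0.02):
    """Adaptive noise canceller with an LMS-adapted FIR filter (illustrative sketch).

    primary:   x(n) + v(n), the signal plus noise
    reference: v0(n), a noise reference correlated with v(n)
    Returns the system output x_hat(n) = primary(n) - v_hat(n).
    """
    w = [0.0] * num_taps      # adaptive filter weights
    buf = [0.0] * num_taps    # reference delay line, newest sample first
    out = []
    for d, r in zip(primary, reference):
        buf = [r] + buf[:-1]
        v_hat = sum(wi * ui for wi, ui in zip(w, buf))  # noise estimate
        e = d - v_hat                                   # system output x_hat(n)
        # LMS step on E[x_hat^2]; since x is uncorrelated with v0,
        # this also minimizes E[(v - v_hat)^2]
        w = [wi + mu * e * ui for wi, ui in zip(w, buf)]
        out.append(e)
    return out


# Illustrative signals: a sinusoid plus noise that reaches the primary
# input through a hypothetical two-tap path from the reference.
random.seed(0)
n = 5000
x = [0.5 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
v0 = [random.gauss(0.0, 1.0) for _ in range(n)]
v = [0.8 * v0[i] + (0.4 * v0[i - 1] if i > 0 else 0.0) for i in range(n)]
x_hat = lms_anc([a + b for a, b in zip(x, v)], v0)
```

After convergence, the residual $\hat{x} - x = v - \hat{v}$ is a small fraction of the noise power, which is the "ideal case" approached in practice.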
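Adaptive Line Enhancer (ALE) [Figure: time-domain ALE block diagram. The input $y(n) = x(n) + v(n)$ is delayed and passed through an adaptive filter whose output $z(n)$ tracks the correlated component, while the error output $e(n)$ retains the decorrelated component. In the classic setting the signal is correlated and the noise decorrelated; for harmonic noise the roles are reversed: the noise is correlated, so $z(n)$ estimates $v(n)$ and $e(n) = \hat{x}(n)$.]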
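Adaptive Line Enhancer (ALE): time domain (TD) and frequency domain (FD) [Figure: the same ALE structure in both domains. TD: input $y(n) = x(n) + v(n)$, filter output $z(n)$, error output $e(n) = \hat{x}(n)$. FD: input $Y(k,m) = X(k,m) + V(k,m)$, filter output $Z(k,m)$, error output $E(k,m) = \hat{X}(k,m)$, where $k$ is the frequency index and $m$ the frame index.]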
Adaptive Line Enhancer (ALE), FD [Figure: FD ALE block diagram with input $Y(k,m) = X(k,m) + V(k,m)$, delay $z^{-\tau}$, adaptive filter output $Z(k,m)$, step size $\mu$, and error output $E(k,m)$] ▪ Filter: $\mathbf{h}(k,m) = [H_0(k,m), \ldots, H_{L-1}(k,m)]^T$ ▪ Delayed input: $\mathbf{y}(k,m) = [Y(k,m-\tau), \ldots, Y(k,m-\tau-L+1)]^T$ ▪ Filter output: $Z(k,m) = \mathbf{h}^H(k,m)\,\mathbf{y}(k,m)$ ▪ Error: $E(k,m) = Y(k,m) - Z(k,m)$ ▪ NLMS: $\mathbf{h}(k,m+1) = \mathbf{h}(k,m) + \mu \frac{E^*(k,m)\,\mathbf{y}(k,m)}{\epsilon + \|\mathbf{y}(k,m)\|^2}$, with $\epsilon$ a small regularization constant
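The NLMS recursion above can be sketched for a single frequency bin $k$ as follows. This is a minimal illustration, not the thesis implementation: the function name `ale_nlms_bin`, the zero-padding before the first frame, and the regularization `eps` are assumptions. In the usage lines, the bin contains only a hypothetical rotating phasor (a harmonic noise component with no speech), so the error $E(k,m)$ should decay toward zero as $Z(k,m)$ locks onto the noise.

```python
import cmath


def ale_nlms_bin(Y, L=3, tau=1, mu=0.5, eps=1e-8):
    """Frequency-domain ALE for one STFT bin k (sketch).

    Y: complex STFT coefficients Y(k, m) of one bin, m = 0, 1, ...
    Returns (E, Z): Z(k, m) = h^H y(k, m) tracks the correlated (harmonic)
    component and E(k, m) = Y(k, m) - Z(k, m) keeps the decorrelated part.
    """
    h = [0j] * L
    E, Z = [], []
    for m, y_m in enumerate(Y):
        # Delayed regressor y(k, m) = [Y(k, m - tau), ..., Y(k, m - tau - L + 1)]^T,
        # with zeros before the first frame
        y = [Y[m - tau - l] if m - tau - l >= 0 else 0j for l in range(L)]
        z = sum(hl.conjugate() * yl for hl, yl in zip(h, y))  # Z = h^H y
        e = y_m - z
        norm = sum(abs(yl) ** 2 for yl in y)
        # NLMS: h(k, m + 1) = h(k, m) + mu * E*(k, m) y(k, m) / (eps + ||y||^2)
        h = [hl + mu * e.conjugate() * yl / (eps + norm) for hl, yl in zip(h, y)]
        E.append(e)
        Z.append(z)
    return E, Z


# Hypothetical harmonic noise in one bin: a phasor advancing 0.3 rad per frame
V = [cmath.exp(1j * 0.3 * m) for m in range(200)]
E, Z = ale_nlms_bin(V)
```

With speech present, $E(k,m)$ would retain the speech component because speech decorrelates over the delay $\tau$ while the phasor does not.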
Conventional Fixed Step Size Example [Figure: spectrograms (frequency index vs. frame index) of (a) the clean signal, (b) the noisy signal, and (c) the enhanced signal; … = 1, $L = 3$] ▪ With a conventional fixed step size, it is difficult both to reduce the noise and to maintain high quality in the enhanced signal
Mutual Information Approach: Taghia, J., Martin, R., 2016, "A frequency-domain adaptive line enhancer with step-size control based on mutual information for harmonic noise reduction," IEEE Trans. Audio Speech Lang. Process. ▪ Frequency-dependent step size that detects harmonic noise presence per frequency ▪ Based on Mutual Information (MI) ▪ Step size $\mu(k)$: derived from the estimated MI $\hat{I}(k, \tau_k^*)$ of frequency bin $k$ at its best delay $\tau_k^*$; bins whose MI exceeds the threshold $I_{thr}$ are marked as containing harmonic noise (indicator 1, otherwise 0), the step sizes are normalized to a constant total over the $K$ bins, and a decision $Q$ gates the result
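The per-frequency MI detection can be illustrated with a simplified sketch. Taghia and Martin's actual MI estimator and step-size normalization are not reproduced here; instead, this swaps in the closed-form MI of two jointly circular complex Gaussian variables, $I = -\log(1 - |\rho|^2)$, where $\rho$ is their correlation coefficient. The names `gaussian_mi` and `mi_detect`, the delay range `tau_max`, and the threshold value `I_thr` are hypothetical.

```python
import cmath
import math
import random


def gaussian_mi(a, b):
    """MI in nats between two zero-mean circular complex Gaussian sequences,
    I = -log(1 - |rho|^2), with rho estimated by sample averages."""
    num = sum(p * q.conjugate() for p, q in zip(a, b))
    den = sum(abs(p) ** 2 for p in a) * sum(abs(q) ** 2 for q in b)
    rho2 = abs(num) ** 2 / den if den > 0 else 0.0
    return -math.log(max(1.0 - rho2, 1e-12))  # clamp: |rho| -> 1 for a pure tone


def mi_detect(Y_bin, tau_max=5, I_thr=1.0):
    """For one bin within one block, return (tau_star, I_hat, P):
    the best delay, its MI, and the presence decision P = 1 if I_hat > I_thr."""
    scores = {tau: gaussian_mi(Y_bin[tau:], Y_bin[:-tau])  # pairs Y(m), Y(m - tau)
              for tau in range(1, tau_max + 1)}
    tau_star = max(scores, key=scores.get)
    I_hat = scores[tau_star]
    return tau_star, I_hat, 1 if I_hat > I_thr else 0


random.seed(1)
harmonic = [cmath.exp(1j * 0.7 * m) for m in range(100)]
speech_like = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]
detected = (mi_detect(harmonic)[2], mi_detect(speech_like)[2])
print(detected)
```

The sketch works on one block at a time, mirroring the block-wise processing of the MI approach; it inherits the same weakness when the noise is not stationary across the block.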
MI Approach Example [Figure: (b) spectrogram of the noisy signal (frequency index vs. frame index); (c) MI step size $\mu$ per frequency (Freq. [kHz]) vs. frame index; $Q = 1$]
MI Approach Example [Figure: spectrograms (frequency index vs. frame index) of (a) the clean signal, (b) the enhanced signal with a fixed step size, and (c) the enhanced signal with the MI step size]
MI Approach ▪ Implemented block-wise ▪ Assumption: the noise remains stationary for at least one block length ▪ Block length of 3 seconds (Taghia and Martin, 2016) ▪ The assumption does not hold for highly nonstationary noise, such as heart-monitor beeping, for which the block decision is often zero [Figure: spectrogram of 3.4 s of heart-monitor beeping]
MI Approach Example, Nonstationary Noise [Figure: (a) spectrogram of the noisy signal (frequency index vs. frame index); (b) MI step size $\mu$ per frequency (Freq. [kHz]) vs. frame index; $Q = 0$]
MI Approach Example, Nonstationary Noise [Figure: spectrograms (frequency index vs. frame index) of (a) the clean signal, (b) the noisy signal, and (c) the enhanced signal with the MI step size, with the decision $Q$ ignored]
Nonstationary Noise: Filter Output Estimate [Figure: ALE block diagram with input $y(n) = x(n) + v(n)$, asking whether, for nonstationary noise, the filter output still satisfies $z(n) = \hat{v}(n)$ and the error output $e(n) = \hat{x}(n)$]
Experimental Setup ▪ Clean speech: 20 speech signals from different speakers in the TIMIT database (half male, half female) ▪ Sampled at 16 kHz ▪ SNR range: [0, 20] dB ▪ STFT with overlap-add ▪ Noise: 26 different nonstationary harmonic noise signals, e.g., heart-monitor beeping, train-door beeping, house alarms, and railroad-crossing bells
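Mixing speech and noise at a prescribed SNR within the [0, 20] dB range can be sketched as below. Scaling the noise rather than the speech, the name `mix_at_snr`, and the stand-in signals are assumptions for illustration. The frame length follows from the 32 ms frames used in the experiments at 16 kHz.

```python
import math
import random

FS = 16000                   # sampling rate [Hz]
FRAME_LEN = FS * 32 // 1000  # 32 ms frame -> 512 samples


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) = snr_db, then add it.
    Assumes equal-length, non-silent inputs; returns (noisy, scaled_noise)."""
    p_s = sum(s * s for s in speech) / len(speech)
    p_n = sum(v * v for v in noise) / len(noise)
    gain = math.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    scaled = [gain * v for v in noise]
    return [s + v for s, v in zip(speech, scaled)], scaled


random.seed(0)
speech = [math.sin(2 * math.pi * 200 * t / FS) for t in range(FS)]  # stand-in for speech
noise = [random.gauss(0.0, 1.0) for _ in range(FS)]
noisy, scaled = mix_at_snr(speech, noise, snr_db=5.0)
```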
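Correlation ▪ $\boldsymbol{\gamma}_X(k,m,\tau) = \frac{E[X(k,m)\,\mathbf{x}^*(k,m-\tau)]}{E[|X(k,m)|^2]}$ ▪ $\boldsymbol{\gamma}_V(k,m,\tau) = \frac{E[V(k,m)\,\mathbf{v}^*(k,m-\tau)]}{E[|V(k,m)|^2]}$ [Figure: speech and noise correlation vs. delay $\tau$ [frames]; 1 frame = 32 ms]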
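A sample estimate of the correlation vector for one bin can be sketched by replacing the expectations with averages over frames, which assumes the signal is stationary over the averaging window. The function name `corr_vector` and the test signals are hypothetical. A harmonic component keeps $|\gamma| = 1$ at every delay, while a white (speech-like, decorrelated) sequence does not.

```python
import cmath
import random


def corr_vector(X, tau, L):
    """Estimate gamma(k, tau) = E[X(k, m) x*(k, m - tau)] / E[|X(k, m)|^2] for one bin,
    with x(k, m - tau) = [X(k, m - tau), ..., X(k, m - tau - L + 1)]^T.
    Expectations are replaced by averages over frames."""
    frames = range(tau + L - 1, len(X))  # frames whose delayed vector is complete
    power = sum(abs(X[m]) ** 2 for m in frames) / len(frames)
    return [sum(X[m] * X[m - tau - l].conjugate() for m in frames) / len(frames) / power
            for l in range(L)]


harmonic = [cmath.exp(1j * 0.4 * m) for m in range(500)]
random.seed(2)
white = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(500)]
g_harm = corr_vector(harmonic, tau=3, L=3)
g_white = corr_vector(white, tau=3, L=3)
```

This contrast is what lets the ALE separate the two: with a delay of a few frames, the filter can only predict the component that stays correlated.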
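Proposed Approach ▪ Combined filter (CMLNLMS): $E_c(k,m) = \begin{cases} E_b(k,m+L), & |E_b(k,m+L)|^2 \le |E_f(k,m)|^2 \text{ and } |E_b(k,m+L)|^2 \le |Y(k,m)|^2 \\ E_f(k,m), & |E_b(k,m+L)|^2 > |E_f(k,m)|^2 \text{ and } |E_f(k,m)|^2 \le |Y(k,m)|^2 \\ Y(k,m), & \text{else} \end{cases}$ [Figure: combined (C), forward (F), and backward (B) filters]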
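The three-case selection rule can be written per time-frequency bin as below. The function name `combine` is hypothetical, and the backward output is assumed to be already aligned, i.e. the caller passes $E_b(k,m+L)$. Picking the lower-energy filter output, and falling back to $Y(k,m)$ when that output would exceed the noisy input, is what limits noise overestimation.

```python
def combine(E_f, E_b, Y):
    """Combined-filter output E_c(k, m) for one bin.

    E_f: forward-filter output E_f(k, m)
    E_b: backward-filter output aligned to frame m, i.e. E_b(k, m + L)
    Y:   noisy coefficient Y(k, m)
    """
    p_b, p_f, p_y = abs(E_b) ** 2, abs(E_f) ** 2, abs(Y) ** 2
    if p_b <= p_f and p_b <= p_y:
        return E_b
    if p_b > p_f and p_f <= p_y:
        return E_f
    return Y  # neither filter output is below the noisy input: keep Y


print(combine(0.5 + 0j, 0.2j, 1.0))  # the backward output has the least energy here
```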
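Proposed Approach ▪ Harmonic noise presence indicator for better speech preservation: $I(k,m) = \begin{cases} 1, & V(k,m) \in \mathcal{H}_0 \\ 0, & V(k,m) \in \mathcal{H}_1 \end{cases}$ ▪ A set of filters of increasing length, up to the maximal filter length $L$, chosen according to the number of available noise samples [Figure: combined (C), backward (B), and forward (F) filters]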
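A minimal sketch of how the indicator and the growing filter length might be applied; both helpers are hypothetical. `noise_present` is a per-bin boolean from some harmonic-noise detector (mapping the slide's $I(k,m)$ onto it depends on how $\mathcal{H}_0$ and $\mathcal{H}_1$ are defined, which the slide does not state), and the length rule assumes the regressor may only use frames recorded after the noise onset.

```python
def filter_length(noise_frames, L_max, tau):
    """Hypothetical rule: number of taps whose delayed frames (m - tau, m - tau - 1, ...)
    still fall inside the noise segment, capped at L_max and at least 1."""
    return max(1, min(L_max, noise_frames - tau))


def gate(E_c, Y, noise_present):
    """Keep the filtered coefficient where harmonic noise is detected; elsewhere
    pass the noisy coefficient through unchanged to preserve speech."""
    return [e if present else y for e, y, present in zip(E_c, Y, noise_present)]


# Taps available 2, 4, and 10 frames after the noise onset (tau = 1, L_max = 5)
lengths = [filter_length(n, L_max=5, tau=1) for n in (2, 4, 10)]
out = gate([0.1, 0.2, 0.3], [1.0, 2.0, 3.0], [True, False, True])
```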
Performance Measures ▪ Speech distortion index $v_{sd}$ ▪ Noise reduction factor $\xi_{nr}$ ▪ Perceptual Evaluation of Speech Quality (PESQ), ITU-T P.862.2 ▪ Short-Time Objective Intelligibility (STOI)
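The two non-perceptual measures can be computed as below, assuming the standard definitions from the noise-reduction literature: $v_{sd} = E[(x - x_f)^2]/E[x^2]$ and $\xi_{nr} = E[v^2]/E[v_r^2]$, where $x_f$ is the filtered clean speech and $v_r$ the residual noise. The slide does not give the formulas, and how $x_f$ and $v_r$ are obtained from an adaptive filter (for instance, by applying the same time-varying filter to speech and noise separately) is an assumption. PESQ and STOI require their reference implementations and are not sketched.

```python
import math


def speech_distortion_index(x, x_f):
    """v_sd = E[(x - x_f)^2] / E[x^2]; x: clean speech, x_f: filtered speech."""
    return sum((a - b) ** 2 for a, b in zip(x, x_f)) / sum(a * a for a in x)


def noise_reduction_factor(v, v_r):
    """xi_nr = E[v^2] / E[v_r^2]; v: input noise, v_r: residual noise."""
    return sum(b * b for b in v) / sum(b * b for b in v_r)


def to_db(ratio):
    return 10.0 * math.log10(ratio)


x = [1.0, 2.0, 3.0]
v = [1.0, -1.0, 1.0, -1.0]
v_sd_db = to_db(speech_distortion_index(x, [0.9 * a for a in x]))  # 1% distortion
xi_nr_db = to_db(noise_reduction_factor(v, [0.1 * b for b in v]))   # 100x less noise power
```

Lower $v_{sd}$ and higher $\xi_{nr}$ are better, which is the trade-off between speech distortion and noise reduction named in the challenges.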
Transient Reduction ▪ The combined filter achieves better noise reduction, which leads to improved $\xi_{nr}$, PESQ, and STOI levels [Figure: NRR [dB] vs. frame index; … = 3, $L = 3$, $\mu = 0.5$, −25 dB indicator threshold]
Step Size ▪ The step size must be selected appropriately: ▪ Fixed step size, $\mu = 0.5$? ▪ $\mu = \max(\mu_{MI}, \text{const})$? [Figure: NRR [dB] vs. frame index, and $v_{sd}$ vs. $\tau$ [frames]]
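The second option, $\mu = \max(\mu_{MI}, \text{const})$, can be read per frequency bin as a floor on the MI step size, which keeps the filter adapting when the MI block decision drives the step size to zero, as it often does for highly nonstationary noise. The function name and the values are illustrative.

```python
def floored_step_size(mu_mi, mu_const):
    """Per-bin step size mu(k) = max(mu_MI(k), const)."""
    return [max(m, mu_const) for m in mu_mi]


mu = floored_step_size([0.0, 0.8, 0.1], mu_const=0.5)
```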
[Figure: $v_{sd}$ [dB], $\xi_{nr}$ [dB], STOI, and PESQ vs. $\tau$ [frames]; $\mu = 0.5$, −25 dB indicator threshold] ▪ The Combined and MI-Combined filters show better results than MI