Real-Time Capable Robust Noise Reduction
Maximilian Luz
Speech Signal Processing and Speech Enhancement, Summer Semester 2019, IMS
Goals
• Single microphone.
• Real-time capable.
• Adaptive to changes in noise/signal.
• Processing in frequency-domain.
• Unsupervised.
Outline
• Basics: Short-Time Fourier Transform, (Weighted) Overlap and Add
• Methods: Spectral Subtraction, MMSE and log-MMSE, Robustification
• Demonstration
Basics
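Short-Time Fourier Transform
• Segmentation: the signal $x(t)$ is split into segments $x_1(t), x_2(t), x_3(t), \dots$ of segment length $N$ with overlap $M$.
• Each segment is multiplied by the window function $h(n)$.
• DFT: each windowed segment is transformed to a spectrum $X_k(f)$, shown as magnitude $|X_k(f)|$.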
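The segmentation, windowing, and DFT steps can be sketched in a few lines. This is a minimal illustration using only the standard library, not the implementation behind the slides: the Hann window, the naive $O(N^2)$ DFT, and all function names are assumptions made here, and the hop size is taken as $N - M$.

```python
import cmath
import math


def hann(N):
    # Periodic Hann window; one possible choice for h(n) (assumption).
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]


def dft(segment):
    # Naive O(N^2) DFT, kept for readability; an FFT would be used in practice.
    N = len(segment)
    return [
        sum(segment[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


def stft(x, N, M):
    # Split x into segments of length N that overlap by M samples,
    # apply the window, and transform each windowed segment.
    hop = N - M
    h = hann(N)
    frames = []
    for start in range(0, len(x) - N + 1, hop):
        segment = [x[start + n] * h[n] for n in range(N)]
        frames.append(dft(segment))
    return frames


# Usage: a sinusoid with exactly two cycles per segment peaks in bin 2.
N, M = 16, 8
x = [math.sin(2 * math.pi * 2 * t / N) for t in range(64)]
X = stft(x, N, M)
magnitudes = [abs(v) for v in X[0]]
peak_bin = max(range(N // 2 + 1), key=lambda k: magnitudes[k])
```

With 64 samples, $N = 16$ and $M = 8$, this yields seven spectra, each with its largest magnitude in bin 2.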
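(Weighted) Overlap and Add
• IDFT: each spectrum $X_k(f)$ is transformed back to a segment $\hat{x}_1(t), \hat{x}_2(t), \hat{x}_3(t), \dots$
• Weighting: each segment is multiplied by the window function $h(n)$.
• Sum: the weighted segments are added with hop size $R$ to form $\hat{x}(t)$.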
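A small sketch of the weighting and summation steps follows. To keep it short, the windowed time-domain segments stand in for the IDFT output (no spectral processing in between). The square-root Hann window and the normalization by the summed squared window are assumptions chosen here so that the unmodified signal is reconstructed; they are not taken from the slides.

```python
import math


def sqrt_hann(N):
    # Square-root periodic Hann window, used for analysis and synthesis (assumption).
    return [math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * n / N)) for n in range(N)]


def segment(x, h, R):
    # Analysis side: windowed segments taken every R samples.
    N = len(h)
    return [[x[s + n] * h[n] for n in range(N)] for s in range(0, len(x) - N + 1, R)]


def weighted_overlap_add(segments, h, R, length):
    # Synthesis side: weight each segment with h again, then sum at hop size R.
    N = len(h)
    out = [0.0] * length
    norm = [0.0] * length
    for i, seg in enumerate(segments):
        for n in range(N):
            out[i * R + n] += seg[n] * h[n]
            norm[i * R + n] += h[n] * h[n]
    # Divide by the summed squared window so the overall weighting is one.
    return [o / w if w > 1e-12 else 0.0 for o, w in zip(out, norm)]


N, R = 16, 8
h = sqrt_hann(N)
x = [math.sin(0.3 * t) for t in range(64)]
x_hat = weighted_overlap_add(segment(x, h, R), h, R, len(x))
# Sample 0 is skipped because the window is exactly zero there.
max_error = max(abs(a - b) for a, b in zip(x[1:], x_hat[1:]))
```

In this setup `max_error` stays at the level of floating-point rounding, which is the property a gain function applied between analysis and synthesis relies on.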
Methods
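Spectral Subtraction
Model: $y(t) = x(t) + d(t)$ (noisy signal = signal + noise).
Processing chain: STFT → $|\cdot|^p$ → subtract the noise estimate $|\hat{D}[k]|^p$ (from Noise Estimation) → $|\cdot|^{1/p}$ → recombine with the noisy phase $\arg Y[k]$ → ISTFT.
$$|\hat{X}[k]| = \left( |Y[k]|^p - |\hat{D}[k]|^p \right)^{1/p}, \qquad \hat{X}[k] = |\hat{X}[k]| \, e^{j \arg Y[k]}$$
Open Questions:
• How to estimate noise?
• How to handle negative magnitude values after subtraction?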
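The per-frame computation can be sketched as follows. The slide leaves open how negative values are handled; this sketch simply clips them to zero (half-wave rectification), which is an assumption made here and only one of several possible answers. Function and variable names are illustrative.

```python
import cmath


def spectral_subtraction(Y, D_mag, p=2.0):
    # Y: complex noisy spectrum of one frame; D_mag: estimated noise magnitudes.
    X_hat = []
    for y, d in zip(Y, D_mag):
        diff = abs(y) ** p - d ** p
        # Assumption: negative results are clipped to zero.
        mag = max(diff, 0.0) ** (1.0 / p)
        # Reuse the noisy phase arg Y[k].
        X_hat.append(cmath.rect(mag, cmath.phase(y)))
    return X_hat


Y = [3 + 4j, 1 + 0j, -2j]
D_mag = [3.0, 2.0, 0.0]
X_hat = spectral_subtraction(Y, D_mag)
```

In the first bin the magnitude drops from 5 to 4 while the phase is kept. In the second bin the noise estimate exceeds the observation, so the output is set to zero.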
Spectral Subtraction: Results
[Figure: spectrograms of the original signal and of the spectral subtraction output; Frequency [kHz] (0 to 4) over Time [s] (0 to 8).]
Issues:
• Too much subtraction leads to speech distortion.
• Residual (musical) noise.
Gain Function
Model: $y(t) = x(t) + d(t)$, with $X[k] = A[k] \, e^{j \alpha[k]}$ and $Y[k] = R[k] \, e^{j \vartheta[k]}$.
Processing chain: STFT → multiply $Y[k]$ by the gain $G[k]$ → ISTFT.
Usually: $G\big(\xi[k], \gamma[k]\big) \in \mathbb{R}$, where
$$\xi[k] := \frac{\lambda_x[k]}{\lambda_d[k]} \quad \text{(a priori SNR)}, \qquad \gamma[k] := \frac{R[k]^2}{\lambda_d[k]} \quad \text{(a posteriori SNR)}$$
with spectral signal power $\lambda_x[k]$ and spectral noise power $\lambda_d[k]$.
These need to be estimated!
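To make the two SNR definitions concrete, the sketch below computes them for a single bin and applies a real-valued gain. The Wiener gain $\xi / (1 + \xi)$ is used only as a simple stand-in for $G$; it is an assumption made here and not the gain function proposed on the slides. All names and numbers are illustrative.

```python
def snr_pair(R, lambda_x, lambda_d):
    # A priori SNR xi and a posteriori SNR gamma for one frequency bin.
    xi = lambda_x / lambda_d
    gamma = R ** 2 / lambda_d
    return xi, gamma


def wiener_gain(xi, gamma):
    # Assumption: Wiener gain as one simple real-valued G; it ignores gamma.
    return xi / (1.0 + xi)


Y = 2.0 + 2.0j  # noisy coefficient Y[k]
xi, gamma = snr_pair(abs(Y), lambda_x=3.0, lambda_d=1.0)
X_hat = wiener_gain(xi, gamma) * Y  # real gain: magnitude is scaled, phase is kept
```

Because the gain is real, only the magnitude $R[k]$ changes and the noisy phase $\vartheta[k]$ is passed through unchanged.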
Minimum Mean-Square Error Spectral Amplitude Estimator (MMSE)
Idea: Minimize
$$E\left\{ \big( A[k] - \hat{A}[k] \big)^2 \right\}$$
Solution (assumes Gaussian distribution):
$$\hat{A}[k] = E\big\{ A[k] \mid Y[k] \big\} = G_{\text{MMSE}}\big(\xi[k], \gamma[k]\big) \cdot R[k]$$
Processing chain: STFT → multiply $Y[k]$ by $G[k]$ → ISTFT. Noise Estimation provides $\hat{\lambda}_d[k]$, from which $\gamma[k]$ and $\hat{\xi}[k]$ are estimated and passed to $G_{\text{MMSE}}$.
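The slide does not spell out $G_{\text{MMSE}}$. The sketch below uses the closed form from Ephraim and Malah (1984), $G = \frac{\sqrt{\pi}}{2} \frac{\sqrt{v}}{\gamma} e^{-v/2} \big[ (1 + v) I_0(v/2) + v I_1(v/2) \big]$ with $v = \frac{\xi}{1 + \xi} \gamma$, and a plain power series for the modified Bessel functions so that it runs with the standard library alone. The helper names are hypothetical, and the series is only suitable for moderate arguments.

```python
import math


def bessel_i(order, x, terms=200):
    # Power series of the modified Bessel function I_0 or I_1 (order 0 or 1).
    term = (x / 2.0) ** order  # k = 0 term
    total = term
    for k in range(1, terms):
        term *= (x * x / 4.0) / (k * (k + order))
        total += term
        if term < 1e-17 * total:
            break
    return total


def mmse_gain(xi, gamma):
    # Closed-form MMSE spectral amplitude gain under the Gaussian model.
    v = xi / (1.0 + xi) * gamma
    bracket = (1.0 + v) * bessel_i(0, v / 2.0) + v * bessel_i(1, v / 2.0)
    return math.sqrt(math.pi) / 2.0 * math.sqrt(v) / gamma * math.exp(-v / 2.0) * bracket


g_low = mmse_gain(1.0, 1.0)       # moderate SNR
g_high = mmse_gain(100.0, 100.0)  # high SNR, close to xi / (1 + xi)
```

At high SNR the gain approaches $\xi / (1 + \xi)$, so the estimator leaves strong speech components nearly untouched.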
Minimum Mean-Square Error Log-Spectral Amplitude Estimator (log-MMSE)
Idea: Minimize
$$E\left\{ \big( \log A[k] - \log \hat{A}[k] \big)^2 \right\}$$
Solution (assumes Gaussian distribution):
$$\hat{A}[k] = \exp\Big( E\big\{ \ln A[k] \mid Y[k] \big\} \Big) = G_{\text{log-MMSE}}\big(\xi[k], \gamma[k]\big) \cdot R[k]$$
Processing chain: STFT → multiply $Y[k]$ by $G[k]$ → ISTFT. Noise Estimation provides $\hat{\lambda}_d[k]$, from which $\gamma[k]$ and $\hat{\xi}[k]$ are estimated and passed to $G_{\text{log-MMSE}}$.
Notes:
• MMSE with different penalization.
• Better measure for speech [Gray et al. 1980].
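As with MMSE, the slide leaves $G_{\text{log-MMSE}}$ implicit. The sketch below uses the closed form from Ephraim and Malah (1985), $G = \frac{\xi}{1 + \xi} \exp\big( \tfrac{1}{2} E_1(v) \big)$ with $v = \frac{\xi}{1 + \xi} \gamma$ and the exponential integral $E_1$. Since the standard library has no $E_1$, a small helper (power series for small arguments, continued fraction otherwise) is included; its name and the switch point at 1 are choices made here.

```python
import math

EULER_GAMMA = 0.5772156649015329


def expint_e1(x):
    # Exponential integral E1(x) = integral from x to infinity of exp(-t)/t, x > 0.
    if x <= 1.0:
        # Power series, accurate for small x.
        total, term = 0.0, 1.0
        for k in range(1, 60):
            term *= -x / k  # term = (-x)^k / k!
            total -= term / k
        return -EULER_GAMMA - math.log(x) + total
    # Continued fraction (modified Lentz iteration), accurate for larger x.
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h * math.exp(-x)


def log_mmse_gain(xi, gamma):
    # Closed-form log-spectral amplitude gain under the Gaussian model.
    v = xi / (1.0 + xi) * gamma
    return xi / (1.0 + xi) * math.exp(0.5 * expint_e1(v))


g_high = log_mmse_gain(100.0, 100.0)  # E1(v) vanishes, gain tends to xi / (1 + xi)
```

For large $v$ the exponential factor tends to one, so the gain again approaches $\xi / (1 + \xi)$.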
MMSE and log-MMSE: Results
[Figure: spectrograms of the original signal and of the spectral subtraction, MMSE, and log-MMSE outputs; Frequency [kHz] (0 to 4) over Time [s] (0 to 8).]
Incorporating Signal Presence Uncertainty (OM-LSA) [Cohen and Berdugo 2001]
Idea: Two hypotheses
$$H_0[k]: \; Y[k] = D[k], \qquad H_1[k]: \; Y[k] = X[k] + D[k]$$
with
$$p[k] := P\big( H_1[k] \mid Y[k] \big), \qquad q[k] := P\big( H_0[k] \big)$$
Solution:
$$G\big(\xi[k], \gamma[k]\big) = G_{H_1}\big(\xi[k], \gamma[k]\big)^{p[k]} \cdot G_{\min}^{\,1 - p[k]}$$
Estimate $p[k]$ via Gaussian model and $q[k]$.
Processing chain: the A Priori Speech Absence Probability Estimation gives $\hat{q}[k]$; the Conditional Speech Presence Probability Estimation gives $\hat{p}[k]$, which combines $G_{H_1}$ and $G_{\min}$ into $G[k]$.
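The combination of the two gains can be sketched directly from the formula. For $p[k]$ the sketch uses the Gaussian-model expression from Cohen and Berdugo (2001), $p = \big[ 1 + \frac{q}{1 - q} (1 + \xi) e^{-v} \big]^{-1}$ with $v = \frac{\xi}{1 + \xi} \gamma$, which the slide only refers to. The default `g_min=0.1` and the function names are assumptions made for illustration.

```python
import math


def speech_presence_probability(xi, gamma, q):
    # Conditional speech presence probability p under the Gaussian model,
    # given the a priori speech absence probability q (0 <= q < 1).
    v = xi / (1.0 + xi) * gamma
    return 1.0 / (1.0 + q / (1.0 - q) * (1.0 + xi) * math.exp(-v))


def om_lsa_gain(g_h1, p, g_min=0.1):
    # Geometric weighting of the speech-present gain and the gain floor.
    return g_h1 ** p * g_min ** (1.0 - p)


p_low = speech_presence_probability(xi=1.0, gamma=0.1, q=0.5)
p_high = speech_presence_probability(xi=1.0, gamma=10.0, q=0.5)
g = om_lsa_gain(g_h1=0.8, p=p_high)
```

With $p = 1$ the gain equals $G_{H_1}$, with $p = 0$ it falls back to the floor $G_{\min}$, and values in between interpolate on a logarithmic scale.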
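Estimating the a priori Speech Absence Probability $q[k]$ [Cohen and Berdugo 2001]
$$\hat{q}[k] = 1 - P_{\text{local}}[k] \cdot P_{\text{global}}[k] \cdot P_{\text{frame}}[k]$$
Starting from $\xi$ after exponential averaging over time:
• $P_{\text{local}}$: averaged over frequency locally.
• $P_{\text{global}}$: averaged over frequency globally.
• $P_{\text{frame}}$: evaluated per frame.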
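A rough sketch of the product rule is given below. The slide only names the averaging steps, so the mapping from an averaged a priori SNR to a likelihood is an assumption: a logarithmic interpolation between two thresholds, with `zeta_min` and `zeta_max` chosen purely for illustration. The frame term is passed in as a number rather than computed.

```python
import math


def likelihood(zeta, zeta_min=0.1, zeta_max=1.0):
    # Map an averaged a priori SNR to [0, 1] on a logarithmic scale (assumption).
    if zeta <= zeta_min:
        return 0.0
    if zeta >= zeta_max:
        return 1.0
    return math.log(zeta / zeta_min) / math.log(zeta_max / zeta_min)


def speech_absence_prior(zeta_local, zeta_global, p_frame):
    # q_hat = 1 - P_local * P_global * P_frame
    p_local = likelihood(zeta_local)
    p_global = likelihood(zeta_global)
    return 1.0 - p_local * p_global * p_frame


q_speech = speech_absence_prior(2.0, 2.0, 1.0)   # all indicators agree on speech
q_noise = speech_absence_prior(0.05, 2.0, 1.0)   # weak local evidence
```

Because the three terms are multiplied, a single low likelihood is enough to push $\hat{q}[k]$ towards one, that is, towards assuming speech absence.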
Estimating the a priori SNR $\xi[k]$ [Cohen and Berdugo 2001; Ephraim and Malah 1984]
Maximum Likelihood:
$$\bar{\gamma}[k, n] = \alpha \, \bar{\gamma}[k, n - 1] + (1 - \alpha) \, \frac{\gamma[k, n]}{\beta}, \qquad 0 \le \alpha \le 1, \; \beta \ge 1$$
$$\hat{\xi}[k, n] = \max\big( \bar{\gamma}[k, n] - 1, \, 0 \big)$$
Decision-Directed:
$$\hat{\xi}[k, n] = \alpha \, G_{H_1}^2\big( \hat{\xi}[k, n - 1], \gamma[k, n - 1] \big) \cdot \gamma[k, n - 1] + (1 - \alpha) \max\big( \gamma[k, n] - 1, \, 0 \big)$$
Decision-directed approach usually has less musical noise.
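Both estimators are one-line recursions per frequency bin, as the following sketch shows. The default values for `alpha` and `beta` are placeholders chosen here, not values from the slides, and the previous gain is passed in as a number instead of being recomputed from $G_{H_1}$.

```python
def ml_xi(gamma_bar_prev, gamma, alpha=0.7, beta=2.0):
    # Maximum-likelihood estimate: smooth gamma over time, then subtract one.
    gamma_bar = alpha * gamma_bar_prev + (1.0 - alpha) * gamma / beta
    return max(gamma_bar - 1.0, 0.0), gamma_bar


def decision_directed_xi(gain_prev, gamma_prev, gamma, alpha=0.98):
    # Decision-directed estimate: mix the previous frame's enhanced SNR
    # with the current instantaneous estimate max(gamma - 1, 0).
    return alpha * gain_prev ** 2 * gamma_prev + (1.0 - alpha) * max(gamma - 1.0, 0.0)


xi_ml, gamma_bar = ml_xi(gamma_bar_prev=4.0, gamma=8.0, alpha=0.5, beta=2.0)
xi_dd = decision_directed_xi(gain_prev=0.5, gamma_prev=8.0, gamma=5.0, alpha=0.5)
```

In a frame loop, `gamma_bar` and the previous gain are carried over from frame $n - 1$ to frame $n$ for every bin $k$.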
Adaptive and Robust Noise Estimation ($\lambda_d[k]$) [Cohen and Berdugo 2001]
Minima controlled recursive averaging (MCRA):
$$\hat{\lambda}_d[k, n + 1] = \tilde{\alpha}_d[k, n] \cdot \hat{\lambda}_d[k, n] + \big( 1 - \tilde{\alpha}_d[k, n] \big) \cdot \big| Y[k, n] \big|^2$$
$$\tilde{\alpha}_d[k, n] = \alpha_d + (1 - \alpha_d) \, p[k, n]$$
Estimation of $p$ from $|Y|^2$, in order:
• Average over frequency ($S_f$).
• Exponential average over time.
• Localized minimum $S_{\min}$ (window $L$).
• Decision ratio $S_r$.
• Speech indicator $I$: $S_f / S_{\min} > \delta \Rightarrow 1$, $\le \delta \Rightarrow 0$.
• Exponential average over time, giving $p$.
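The recursion for the noise power can be sketched per bin as below. The threshold `delta`, the smoothing constants `alpha_p` and `alpha_d`, and the function names are assumptions for illustration; the smoothed spectrum and its localized minimum are represented only by their ratio.

```python
def speech_indicator(ratio, delta=5.0):
    # 1 if the ratio of smoothed power to its localized minimum exceeds delta.
    return 1.0 if ratio > delta else 0.0


def update_presence(p_prev, indicator, alpha_p=0.2):
    # Exponential averaging of the indicator over time gives p.
    return alpha_p * p_prev + (1.0 - alpha_p) * indicator


def update_noise(lambda_d, Y, p, alpha_d=0.95):
    # Time-varying smoothing: the more likely speech is, the less the
    # noise estimate follows the current periodogram |Y|^2.
    alpha_tilde = alpha_d + (1.0 - alpha_d) * p
    return alpha_tilde * lambda_d + (1.0 - alpha_tilde) * abs(Y) ** 2


frozen = update_noise(1.0, 3 + 4j, p=1.0)                  # speech present
tracked = update_noise(1.0, 3 + 4j, p=0.0, alpha_d=0.5)    # speech absent
p_new = update_presence(0.0, speech_indicator(6.0), alpha_p=0.5)
```

With $p = 1$ the estimate is held at its previous value, and with $p = 0$ it is an ordinary recursive average with constant $\alpha_d$.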
Results
[Figure: spectrograms of the original signal and of the MMSE, log-MMSE, and OM-LSA with MCRA outputs; Frequency [kHz] (0 to 4) over Time [s] (0 to 8).]
Demonstration
References
Chen, Jingdong et al. (July 2006). “New insights into the noise reduction Wiener filter”. In: IEEE Transactions on Audio, Speech and Language Processing 14.4, pp. 1218–1234.
Cohen, Israel and Baruch Berdugo (Nov. 2001). “Speech enhancement for non-stationary noise environments”. In: Signal Processing 81.11, pp. 2403–2418.
Ephraim, Y. and D. Malah (Dec. 1984). “Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator”. In: IEEE Transactions on Acoustics, Speech, and Signal Processing 32.6, pp. 1109–1121.
Ephraim, Y. and D. Malah (Apr. 1985). “Speech enhancement using a minimum mean-square error log-spectral amplitude estimator”. In: IEEE Transactions on Acoustics, Speech, and Signal Processing 33.2, pp. 443–445.
Gray, R. et al. (Aug. 1980). “Distortion measures for speech processing”. In: IEEE Transactions on Acoustics, Speech, and Signal Processing 28.4, pp. 367–376.
Loizou, Philipos C. (Feb. 2013). Speech Enhancement. CRC Press.