A Priori SNR Estimation Using Weibull Mixture Model
12. ITG Fachtagung Sprachkommunikation
Aleksej Chinaev, Jens Heitkaemper, Reinhold Haeb-Umbach
Department of Communications Engineering, Paderborn University
7 October 2016
Computer Science, Electrical Engineering and Mathematics; Communications Engineering; Prof. Dr.-Ing. Reinhold Häb-Umbach
Table of contents
1 Problem formulation and motivation
2 A priori SNR estimation based on Weibull mixture model
3 Experimental evaluation
4 Conclusions and outlook
Problem formulation and motivation

Single-channel clean speech $s(t)$ contaminated by an additive noise $n(t)$:
Time domain: $y(t) = s(t) + n(t)$
STFT domain: $Y(k,\ell) = S(k,\ell) + N(k,\ell)$
with frequency bin $k$ and frame index $\ell$.

[Block diagram of the enhancement system: $Y(k,\ell)$ → $|\cdot|^2$ → noise PSD tracker → $\hat{\lambda}_N(k,\ell)$ (noise power spectral density, PSD) → a priori SNR estimator → $\hat{\xi}(k,\ell)$ → gain function → $G(k,\ell)$, applied to $Y(k,\ell)$ → $\hat{S}(k,\ell)$ → ISTFT → $\hat{s}(t)$]

A priori SNR, a key component of the enhancement system:
$\xi(k,\ell) = \frac{\lambda_S(k,\ell)}{\lambda_N(k,\ell)}$
$\lambda_S(k,\ell) = E[|S(k,\ell)|^2]$: clean speech PSD
$\lambda_N(k,\ell) = E[|N(k,\ell)|^2]$: noise PSD

Motivated by generalized spectral subtraction (GSS): denoising of $|Y(k,\ell)|^\alpha$ for $\alpha \in \mathbb{R}_{>0}$, not restricted to $\alpha = 1$ or $\alpha = 2$, with the assumption
$|Y(k,\ell)|^\alpha = |S(k,\ell)|^\alpha + |N(k,\ell)|^\alpha$
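The definitions on this slide can be checked numerically. The following is a minimal sketch, assuming zero-mean circular complex Gaussian coefficients for a single time-frequency bin and sample means in place of the expectations; the helper names and the chosen variances are illustrative and not taken from the talk.

```python
import numpy as np

def complex_gaussian(rng, variance, size):
    """Draw zero-mean circular complex Gaussian samples with E|X|^2 = variance."""
    std = np.sqrt(variance / 2.0)
    return rng.normal(0.0, std, size) + 1j * rng.normal(0.0, std, size)

def a_priori_snr(lambda_s, lambda_n):
    """xi = lambda_S / lambda_N (elementwise if arrays are passed)."""
    return np.asarray(lambda_s, dtype=float) / np.asarray(lambda_n, dtype=float)

rng = np.random.default_rng(0)
lambda_s_true, lambda_n_true = 4.0, 1.0   # illustrative PSD values (assumption)

S = complex_gaussian(rng, lambda_s_true, 200_000)   # clean speech coefficients
N = complex_gaussian(rng, lambda_n_true, 200_000)   # noise coefficients
Y = S + N                                           # additive model in the STFT domain

# Sample means stand in for the expectations E[|S|^2] and E[|N|^2].
lambda_s_hat = np.mean(np.abs(S) ** 2)
lambda_n_hat = np.mean(np.abs(N) ** 2)

xi_hat = a_priori_snr(lambda_s_hat, lambda_n_hat)
xi_hat_db = 10.0 * np.log10(xi_hat)
```

In a real system neither $S$ nor $N$ is observed, which is why the remaining slides deal with estimating $\xi(k,\ell)$ from $Y(k,\ell)$ alone.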
Normalized α-order magnitude (NAOM) domain

[Block diagram of the a priori SNR estimator: $|Y(k,\ell)|^2$ → estimate $P_{S_\alpha}(k)$ and go into the NAOM domain → $Y_\alpha(k,\ell)$ → estimate the parameters $\lambda_m(k,\ell)$, $\pi_m(k,\ell)$ of the WMM $p_{S_\alpha}(s)$ → estimate the clean speech NAOMs $\hat{S}_\alpha(k,\ell)$ → calculate the a priori SNR $\hat{\xi}(k,\ell)$; further inputs: $\hat{\lambda}_N(k,\ell)$ and $\lambda_{N_\alpha}(k,\ell)$]

Normalize $|Y(k,\ell)|^\alpha$ to the root of the averaged power $P_{S_\alpha}(k)$ of $|S(k,\ell)|^\alpha$:
$Y_\alpha(k,\ell) = \frac{|Y(k,\ell)|^\alpha}{\sqrt{P_{S_\alpha}(k)}} = S_\alpha(k,\ell) + N_\alpha(k,\ell)$ with $P_{S_\alpha}(k) = \frac{1}{L} \sum_{\ell=1}^{L} |S(k,\ell)|^{2\alpha}$

Statistical models independent of speaker loudness
Normalized energy of the clean speech NAOMs: $E[S_\alpha^2(k)] = 1$
$S_\alpha(k,\ell)$ and $N_\alpha(k,\ell)$ are realizations of the random variables $S_\alpha(k)$ and $N_\alpha(k)$
Estimate $S_\alpha(k,\ell)$ from $Y_\alpha(k,\ell)$ given models for $S_\alpha(k)$ and $N_\alpha(k)$
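The normalization can be sketched in a few lines. This is an illustration only: it assumes oracle access to the clean speech STFT `S` in order to compute $P_{S_\alpha}(k)$, whereas the estimator in the talk has to estimate this quantity; the function name `to_naom` and the array layout (bins × frames) are assumptions.

```python
import numpy as np

def to_naom(Y, S, alpha):
    """Map STFT coefficients to the NAOM domain.

    Y, S  : complex arrays of shape (K, L) = (frequency bins, frames)
    alpha : magnitude exponent, alpha > 0
    Returns (Y_alpha, S_alpha, P) with P of shape (K,).
    """
    # P_{S_alpha}(k) = (1/L) * sum_l |S(k,l)|^(2*alpha), one value per bin
    P = np.mean(np.abs(S) ** (2.0 * alpha), axis=1, keepdims=True)
    root = np.sqrt(P)
    Y_alpha = np.abs(Y) ** alpha / root
    S_alpha = np.abs(S) ** alpha / root
    return Y_alpha, S_alpha, P[:, 0]
```

Two properties follow directly from the definition: the mean of $S_\alpha^2(k,\ell)$ over frames equals 1 in every bin, and scaling the input signal by a constant leaves the NAOMs unchanged, which is the loudness independence stated on the slide.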
Modeling of noise NAOM coefficients $N_\alpha(k,\ell)$

$N(k,\ell) \sim \mathcal{N}_c(n; 0, \lambda_N(k,\ell))$
$N_\alpha(k,\ell)$ is Weibull distributed:
$p_{N_\alpha(k,\ell)}(n) = \mathrm{Weib}(n; \lambda_{N_\alpha}(k,\ell), \alpha)$
Shape parameter $\alpha \in \mathbb{R}_{>0}$
Scale parameter $\lambda_{N_\alpha}(k,\ell) \in \mathbb{R}_{>0}$, obtained from the noise PSD $\lambda_N(k,\ell)$ and the normalization $P_{S_\alpha}(k)$

[Figure: Weibull PDF $\mathrm{Weib}(n; 1, \alpha)$ for $\lambda = 1$ and $\alpha \in \{0.5, 1, 1.5, 2\}$; horizontal axis $n$]

Model $N_\alpha(k)$ with the Weibull PDF
$p_{N_\alpha(k)}(n) = \mathrm{Weib}(n; \lambda_{N_\alpha}(k), \alpha)$ with $\lambda_{N_\alpha}(k) = \frac{1}{L} \sum_{\ell=1}^{L} \lambda_{N_\alpha}(k,\ell)$

[Figure: histogram and Weibull PDF for $\alpha = 0.7$: NAOM coefficients of a white noise signal and the estimated $p_{N_\alpha(k)}(n)$; axes $n$ and $p_{N_\alpha}(n)$]
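That the α-th power of a complex Gaussian magnitude follows a Weibull law can be verified by simulation. The sketch below uses the textbook Weibull parametrization, in which $|N|^\alpha$ with $N \sim \mathcal{N}_c(0, \lambda_N)$ has shape $2/\alpha$ and scale $\lambda_N^{\alpha/2}$. How this maps onto the slide's $\mathrm{Weib}(n; \lambda, \alpha)$ notation is an assumption, and the NAOM normalization by $\sqrt{P_{S_\alpha}(k)}$ is left out.

```python
import numpy as np

def weibull_cdf(x, scale, shape):
    """Textbook Weibull CDF: 1 - exp(-(x / scale)^shape) for x >= 0."""
    return 1.0 - np.exp(-((np.asarray(x, dtype=float) / scale) ** shape))

rng = np.random.default_rng(1)
alpha, lambda_n = 0.7, 1.0                 # illustrative values (assumption)

# Zero-mean circular complex Gaussian noise with E|N|^2 = lambda_n
std = np.sqrt(lambda_n / 2.0)
N = rng.normal(0.0, std, 100_000) + 1j * rng.normal(0.0, std, 100_000)
n_alpha = np.abs(N) ** alpha               # alpha-order magnitudes (not normalized)

# |N|^2 is exponential with mean lambda_n, so |N|^alpha = (|N|^2)^(alpha/2)
# is Weibull with the following textbook parameters:
shape = 2.0 / alpha
scale = lambda_n ** (alpha / 2.0)

# Compare the empirical CDF with the Weibull CDF on a grid.
grid = np.linspace(0.05, 2.0, 40)
empirical_cdf = np.array([np.mean(n_alpha <= g) for g in grid])
max_cdf_gap = np.max(np.abs(empirical_cdf - weibull_cdf(grid, scale, shape)))
```

A small `max_cdf_gap` indicates that the simulated magnitudes agree with the Weibull model, mirroring the histogram comparison shown for white noise on the slide.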
Modeling of NAOM coefficients of clean speech $S_\alpha(k,\ell)$

$S(k,\ell) \sim \mathcal{N}_c(s; 0, \lambda_S(k,\ell))$
Bimodal Weibull mixture model (WMM) to model $S_\alpha(k)$:
$p_{S_\alpha(k)}(s) = \sum_{m=1}^{2} \pi_m(k) \cdot \mathrm{Weib}(s; \lambda_m(k), \beta)$
$m = 1$: silence
$m = 2$: activity
$\pi_m(k) \in [0, 1]$: weights
$\lambda_m(k)$: scale parameters
$\beta$: shape parameter
$\beta \neq \alpha$: additional degree of freedom in the model

[Figure: histogram and estimated WMM: clean speech NAOMs and the estimated bimodal WMM with its $m = 1$ and $m = 2$ components ($\alpha = 0.7$; $\beta = 2.5$); axes $s$ and $p_{S_\alpha}(s)$]
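The mixture density itself is straightforward to evaluate. The sketch below uses the textbook Weibull PDF with scale $\lambda_m$ and shape $\beta$; the weights and scales in the example are made-up values for illustration, not parameters estimated in the talk.

```python
import numpy as np

def weibull_pdf(x, scale, shape):
    """Textbook Weibull PDF (shape/scale) * (x/scale)^(shape-1) * exp(-(x/scale)^shape)."""
    z = np.asarray(x, dtype=float) / scale
    return (shape / scale) * z ** (shape - 1.0) * np.exp(-(z ** shape))

def wmm_pdf(s, weights, scales, beta):
    """Bimodal Weibull mixture: sum_m pi_m * Weib(s; lambda_m, beta)."""
    return sum(w * weibull_pdf(s, lam, beta) for w, lam in zip(weights, scales))

# Hypothetical parameters: m = 1 (silence) has a small scale, m = 2 (activity) a large one.
weights = (0.6, 0.4)
scales = (0.05, 1.2)
beta = 2.5

s = np.linspace(0.0, 6.0, 60_001)
density = wmm_pdf(s, weights, scales, beta)
total_mass = np.sum(density) * (s[1] - s[0])   # Riemann sum, should be close to 1
```

Because both components share the shape $\beta$, only the scales and weights distinguish silence from speech activity.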
Estimation of WMM parameters and clean speech NAOMs

Estimation of the WMM parameters:
Set $\lambda_1(k)$ according to the $\xi_{\min}$ usually used in a priori SNR estimation [Cappe 94]
Expectation Maximization (EM) algorithm to estimate $\lambda_2(k)$, $\pi_m(k)$
After EM, the weights $\pi_m(k)$ are corrected with the constraint $E[S_\alpha^2(k)] = 1$

Estimation of the clean speech NAOMs:
Maximum a posteriori (MAP) estimation:
$\hat{S}_\alpha^{\mathrm{MAP}}(k,\ell) = \operatorname{argmax}_s \; p_{S_\alpha(k) | Y_\alpha(k,\ell)}(s \mid y)$
$Y_\alpha(k,\ell)$ is a realization of the random variable $Y_\alpha(k) = S_\alpha(k) + N_\alpha(k)$
Approximate, computationally efficient solution for $\beta = \alpha = 1$
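The EM step can be illustrated with a generic batch EM for a two-component Weibull mixture with known, shared shape $\beta$ and fixed $\lambda_1$. This is a sketch under stated assumptions, not the procedure from the talk: it fits directly to samples drawn from a synthetic mixture, it omits the weight correction for $E[S_\alpha^2(k)] = 1$, and the value chosen for $\lambda_1$ is hypothetical (the slide derives it from $\xi_{\min}$).

```python
import numpy as np

def weibull_pdf(x, scale, shape):
    """Textbook Weibull PDF for x > 0 with the given scale and shape."""
    z = np.asarray(x, dtype=float) / scale
    return (shape / scale) * z ** (shape - 1.0) * np.exp(-(z ** shape))

def fit_wmm(samples, lambda_1, beta, lambda_2_init, num_iter=50):
    """Batch EM for a bimodal Weibull mixture with shared, known shape beta.

    lambda_1 (silence scale) stays fixed; lambda_2 and both weights are updated.
    """
    x = np.asarray(samples, dtype=float)
    weights = np.array([0.5, 0.5])
    scales = np.array([lambda_1, lambda_2_init], dtype=float)
    for _ in range(num_iter):
        # E-step: responsibility of each component for each sample
        joint = np.stack([w * weibull_pdf(x, lam, beta)
                          for w, lam in zip(weights, scales)])
        resp = joint / np.maximum(joint.sum(axis=0), 1e-300)
        # M-step: weights are mean responsibilities; for a known shape the
        # weighted ML scale satisfies lambda^beta = sum(r * x^beta) / sum(r)
        weights = resp.mean(axis=1)
        scales[1] = (np.sum(resp[1] * x ** beta) / np.sum(resp[1])) ** (1.0 / beta)
    return weights, scales

# Synthetic data from a known mixture (illustrative values, not speech data)
rng = np.random.default_rng(2)
beta = 2.5
silence = 0.1 * rng.weibull(beta, 10_000)    # scale 0.1
activity = 1.0 * rng.weibull(beta, 10_000)   # scale 1.0
weights, scales = fit_wmm(np.concatenate([silence, activity]),
                          lambda_1=0.1, beta=beta, lambda_2_init=0.5)
```

With well-separated components the estimates of $\lambda_2$ and $\pi_2$ settle near the values used to generate the data after a handful of iterations.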
Calculation of a priori SNR and causal implementation

Go back into the domain of the power spectral density by calculating
$\hat{\xi}(k,\ell) = \max\left( \frac{\left( \hat{S}_\alpha(k,\ell) \cdot \sqrt{P_{S_\alpha}(k)} \right)^{2/\alpha}}{\lambda_N(k,\ell)}, \; \xi_{\min} \right)$

Causal implementation of WMM-based a priori SNR estimators:
Calculate $P_{S_\alpha}(k)$ and $\lambda_{N_\alpha}(k)$ in a causal way
Causal EM for $\lambda_2(k)$ and $\pi_2(k)$ with one EM iteration per time frame

Note: the parameters $\alpha$ and $\beta$ have to be set appropriately → optimization
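The last two stages, MAP estimation of the clean speech NAOM and conversion back to an a priori SNR, can be sketched as follows. The MAP step here is a brute-force grid search over $s \in (0, y)$ using $p(s \mid y) \propto p_{N_\alpha}(y - s)\, p_{S_\alpha}(s)$, which assumes independent $S_\alpha$ and $N_\alpha$ and textbook Weibull parametrizations; it is not the efficient approximate solution mentioned in the talk. All function names and example parameters are illustrative assumptions; $\xi_{\min} = -18$ dB is the value given in the experimental setup.

```python
import numpy as np

def weibull_pdf(x, scale, shape):
    """Textbook Weibull PDF for x > 0 with the given scale and shape."""
    z = np.asarray(x, dtype=float) / scale
    return (shape / scale) * z ** (shape - 1.0) * np.exp(-(z ** shape))

def map_naom(y, weights, scales, beta, noise_scale, noise_shape, num=2000):
    """Grid-search MAP estimate of S_alpha given one observation y = s + n."""
    s = np.linspace(0.0, y, num + 2)[1:-1]             # interior points of (0, y)
    prior = sum(w * weibull_pdf(s, lam, beta) for w, lam in zip(weights, scales))
    likelihood = weibull_pdf(y - s, noise_scale, noise_shape)
    return s[np.argmax(prior * likelihood)]

def a_priori_snr(s_alpha_hat, P, noise_psd, alpha, xi_min_db=-18.0):
    """xi = max((S_alpha * sqrt(P))^(2/alpha) / lambda_N, xi_min)."""
    xi_min = 10.0 ** (xi_min_db / 10.0)
    speech_power = (s_alpha_hat * np.sqrt(P)) ** (2.0 / alpha)
    return np.maximum(speech_power / noise_psd, xi_min)

# Example with hypothetical model parameters
alpha, beta = 0.64, 2.7
s_hat = map_naom(y=1.0, weights=(0.5, 0.5), scales=(0.1, 1.0), beta=beta,
                 noise_scale=0.2, noise_shape=2.0 / alpha)
xi_hat = a_priori_snr(s_hat, P=1.5, noise_psd=1.0, alpha=alpha)
```

Multiplying by $\sqrt{P_{S_\alpha}(k)}$ undoes the NAOM normalization and the exponent $2/\alpha$ maps the α-order magnitude back to a power, so the numerator is an estimate of $|S(k,\ell)|^2$.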
Experimental evaluation

Data and setup:
Clean speech: Wall Street Journal database, 16 kHz (male and female)
7 different noise types of the Noisex92 database: white, pink, f16, hfchannel, factory-1, factory-2, babble
Input global SNR from -5 dB up to 25 dB in 5 dB steps

Spectral speech enhancement framework:
Noise PSD tracking using the Minimum statistics approach [Martin 01]
A priori SNR estimation with $\xi_{\min} = -18$ dB [Cappe 94]
Proposed WMM-based approach with Wiener filter
Reference approach: Decision Directed [Ephraim 84]
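For orientation, the floor $\xi_{\min}$ and the Wiener filter can be written out explicitly. The sketch assumes the standard Wiener gain $G = \xi / (1 + \xi)$; the slide names the Wiener filter but does not spell out the gain formula, and the helper names are illustrative.

```python
import numpy as np

def db_to_linear(value_db):
    """Convert a power ratio from dB to linear scale."""
    return 10.0 ** (value_db / 10.0)

def wiener_gain(xi):
    """Standard Wiener gain G = xi / (1 + xi) for an a priori SNR xi >= 0."""
    xi = np.asarray(xi, dtype=float)
    return xi / (1.0 + xi)

xi_min = db_to_linear(-18.0)          # about 0.0158 in linear scale
gain_floor = wiener_gain(xi_min)      # smallest gain the floor permits

# Apply the floor to some hypothetical a priori SNR estimates, then the gain.
xi_raw = np.array([0.0, 0.01, 1.0, 100.0])
gains = wiener_gain(np.maximum(xi_raw, xi_min))
```

The floor keeps the gain from dropping to zero in noise-only regions, which limits the attenuation applied there.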
Optimization of α and β

Speech quality maximization in terms of the wide-band mean opinion score listening quality objective (MOS-LQO) with
$\Delta\text{MOS-LQO} = \max(\text{MOS-LQO}_{\text{WMM}} - \text{MOS-LQO}_{\text{DD}}, \, 0)$
Averaging over genders, noise types and input global SNR values
$(\alpha_{\text{opt}}, \beta_{\text{opt}}) = (0.64, 2.7)$

[Figure: surface plot of ΔMOS-LQO over α and β]
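The selection of $(\alpha, \beta)$ amounts to a grid search over the clipped MOS-LQO difference. The sketch below assumes the MOS-LQO values have already been averaged over genders, noise types and SNR values before clipping (the slide does not state the order); the grid and the score surface are made up for illustration and are not measured data.

```python
import numpy as np

def delta_mos_lqo(mos_wmm, mos_dd):
    """Clipped improvement max(MOS-LQO_WMM - MOS-LQO_DD, 0), elementwise."""
    return np.maximum(np.asarray(mos_wmm, dtype=float)
                      - np.asarray(mos_dd, dtype=float), 0.0)

def best_alpha_beta(alphas, betas, delta):
    """Return the grid point with the largest delta; delta[i, j] belongs to (alphas[i], betas[j])."""
    i, j = np.unravel_index(np.argmax(delta), delta.shape)
    return alphas[i], betas[j]

# Hypothetical parameter grid and a synthetic single-peak surface (illustration only)
alphas = np.array([0.4, 0.6, 0.8, 1.0])
betas = np.array([1.0, 2.0, 3.0, 4.0])
toy_delta = 0.1 * np.exp(-((alphas[:, None] - 0.6) ** 2) / 0.05
                         - ((betas[None, :] - 3.0) ** 2) / 2.0)

alpha_best, beta_best = best_alpha_beta(alphas, betas, toy_delta)
```

Clipping at zero means that parameter pairs for which the WMM-based estimator scores below the decision-directed reference all count as "no improvement" rather than being ranked among themselves.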
Final experimental results

Clean speech: WSJ database signals other than those used for optimization
Estimation error: Itakura-Saito distance (ISD); estimator's variance: logarithmic error variance (LEV); for both, the smaller the better

Resulting ISD, LEV and MOS-LQO values averaged over noise types:

Measure   Method   SNR, dB:  -5     0      5      10     15     20     25     AVG
ISD       DD                 48.8   44.0   39.6   34.9   30.2   24.5   19.1   34.4
ISD       WMM                42.6   38.1   34.1   30.4   27.3   23.0   18.9   30.6
LEV       DD                 53.1   49.0   46.4   45.1   45.5   47.4   50.5   48.1
LEV       WMM                45.6   43.9   42.6   41.1   39.0   37.0   35.9   40.7
MOS-LQO   DD                 1.11   1.30   1.63   2.09   2.57   3.00   3.39   2.16
MOS-LQO   WMM                1.18   1.46   1.77   2.13   2.62   3.16   3.61   2.28