Artificial Bandwidth Extension Using Deep Neural Networks for Spectral Envelope Estimation
Johannes Abel and Tim Fingscheidt
Institute for Communications Technology, Technische Universität Braunschweig
We Need More Acoustical Bandwidth!
Problem: Speech quality and intelligibility suffer from limited acoustical bandwidth.
Conventional narrowband (NB) telephony call (acoustic bandwidth: 0.3 < f < 3.4 kHz)
  Speech quality: 3.2/5.0 mean opinion score (MOS) points
  Intelligibility: 90% (consonant-vowel-consonant test)
Wideband (WB) telephony call (acoustic bandwidth: 0.05 < f < 7 kHz)
  Speech quality: 4.5/5.0 MOS points
  Intelligibility: 98%
Problem solved?
[Data taken from: Krebber, “Sprachübertragungsqualität von Fernsprech-Handapparaten”, VDI-Fortschrittsberichte, 1995 and Terhardt, “Akustische Kommunikation”, Springer, 1998]
10.05.17 | J. Abel | ABE using DNNs for Spectral Envelope Estimation | 2/16
We Need More Acoustical Bandwidth!
Requirements for a WB call:
1. WB-capable mobile handsets (far-end and near-end)
2. All participants of a call need to be located within a WB-capable cell
3. The provider’s backbone network must be WB-capable
4. Further requirements for international WB calls and for inter-operator connections
If these requirements are not all met at the beginning of a call, only NB mode is possible.
If the requirements are no longer met during a call, the call drops to NB mode. Switching back to WB mode once the requirements are met again is then typically disabled.
Solution: Artificial Bandwidth Extension (ABE)
Estimation of the frequency components from 4 to 7 kHz, a.k.a. the upper band (UB), at the receiver side for a more consistent and WB-like experience.
Outline
1. Motivation
2. ABE Framework
   Overview
   Statistical Models
     Baseline: HMM/GMM
     DNN and HMM/DNN
3. Simulations
4. Summary
2. ABE Framework
[Block diagram: narrowband input speech to wideband output speech. Blocks shown: LP analysis, VAD, NB PSD computation, UB spectral envelope estimation, WB PSD assembly, WB LP analysis, LP synthesis of the estimated UB speech, upsampling (↑2), and two filtering stages. Legend: NB sample index, WB sample index, frame index, sampling frequencies, power spectral density, LP filter coefficients.]
2. ABE Framework
UB Spectral Envelope Estimation
[Block diagram. Blocks shown: feature extraction, statistical model, spectral conversion, UB envelope codebook. Outputs: UB spectral envelope classification (codebook entry) and „UB energy“. Legend: feature vector, codebook entry index, a posteriori probability, estimated UB cepstral vector.]
2. ABE Framework
Statistical Model: HMM/GMM (Baseline)
[Block diagram: LDA transform (LDA matrix), GMM (GMM parameters), forward algorithm (HMM parameters). Legend: state probability, transition probability, likelihood.]
Linear discriminant analysis (LDA) for dimension reduction of features
GMM as acoustic model
Forward algorithm for HMM evaluation
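The forward-algorithm step named on this slide can be illustrated with a minimal, generic sketch. It is not taken from the slides: the function name `forward_posteriors`, its argument layout, and the toy numbers are assumptions chosen for illustration, and the per-frame likelihoods would come from the acoustic model (here, the GMM).

```python
from typing import List, Sequence


def forward_posteriors(
    initial: Sequence[float],
    transition: Sequence[Sequence[float]],
    likelihoods: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Return P(state_t = i | x_1..x_t) for every frame t.

    initial[i]        : state probability P(s_1 = i)
    transition[j][i]  : transition probability P(s_t = i | s_{t-1} = j)
    likelihoods[t][i] : p(x_t | s_t = i), up to a common positive scale
    """
    n_states = len(initial)
    posteriors: List[List[float]] = []
    prev: List[float] = []
    for t, frame_lik in enumerate(likelihoods):
        if t == 0:
            predicted = list(initial)
        else:
            # Propagate the previous state posteriors through the transitions.
            predicted = [
                sum(prev[j] * transition[j][i] for j in range(n_states))
                for i in range(n_states)
            ]
        # Weight by the acoustic likelihood and renormalise per frame.
        alpha = [predicted[i] * frame_lik[i] for i in range(n_states)]
        norm = sum(alpha)
        if norm <= 0.0:
            raise ValueError(f"all states have zero probability at frame {t}")
        prev = [a / norm for a in alpha]
        posteriors.append(prev)
    return posteriors


if __name__ == "__main__":
    # Toy two-state example; all numbers are made up for illustration.
    init = [0.5, 0.5]
    trans = [[0.9, 0.1], [0.2, 0.8]]
    liks = [[0.8, 0.1], [0.6, 0.3], [0.1, 0.7]]
    for t, post in enumerate(forward_posteriors(init, trans, liks)):
        print(t, [round(p, 3) for p in post])
```

Normalising in every frame keeps the recursion numerically stable and means the likelihoods only need to be correct up to a common scale factor.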
2. ABE Framework
Statistical Model: HMM/DNN (new)
[Block diagram: DNN (DNN parameters), prior division, forward algorithm (HMM parameters). Legend: network weights, network offsets.]
Deep neural network (DNN) as acoustic model
The DNN's posterior outputs are converted to likelihoods by dividing them by the priors
Forward algorithm for HMM evaluation
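The prior-division step can be sketched as below. By Bayes' rule, p(x | s) = P(s | x) p(x) / P(s), and p(x) is the same for all states, so dividing the DNN posteriors by the state priors gives likelihoods up to a common scale. The function name, the `floor` safeguard, and the example numbers are assumptions for illustration only.

```python
from typing import List, Sequence


def posteriors_to_scaled_likelihoods(
    dnn_posteriors: Sequence[float],
    state_priors: Sequence[float],
    floor: float = 1e-10,
) -> List[float]:
    """Divide DNN posteriors P(s | x) by state priors P(s).

    The result is proportional to p(x | s); the missing factor p(x)
    is identical for all states and cancels once probabilities are
    normalised per frame.
    """
    if len(dnn_posteriors) != len(state_priors):
        raise ValueError("posteriors and priors must have the same length")
    # `floor` (an assumption of this sketch) avoids division by zero
    # for states that never occurred in the training data.
    return [
        post / max(prior, floor)
        for post, prior in zip(dnn_posteriors, state_priors)
    ]


if __name__ == "__main__":
    # Made-up numbers: three states, one frame.
    posteriors = [0.7, 0.2, 0.1]
    priors = [0.5, 0.25, 0.25]
    print(posteriors_to_scaled_likelihoods(posteriors, priors))
```

In practice the priors are usually estimated as relative state frequencies on the training set; whether the authors did so is not stated on the slide.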
2. ABE Framework
Statistical Model: DNN (new)
[Block diagram: DNN (DNN parameters).]
DNN as statistical model
3. Simulations
Experimental Setup

DNN Experiments
Initial weights for DNN training from restricted Boltzmann machine (RBM) pretraining
DNN topologies under test:
  Number of hidden layers: 1, 2, 3, 4, 5, 6
  Number of units per layer: 512

Datasets
Step                                               Speech database
Codebook, RBM pretraining, HMM/DNN/GMM training    TIMIT train set
DNN validation checks                              TIMIT test set
Result reporting                                   NTT-AT database (EN+DE)

Cepstral distances are computed for the estimated UB envelope and for the estimated UB energy ratio.
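The slide's own cepstral distance formulas did not survive extraction. As a hedged illustration, the sketch below uses one common definition of the cepstral distance in dB, CD = (10 / ln 10) * sqrt(2 * sum_k (c_k - ĉ_k)^2). Whether the authors use exactly this form, which coefficients they include, and how they average are not known from the slides; the function names are hypothetical.

```python
import math
from typing import Sequence


def cepstral_distance_db(ref: Sequence[float], est: Sequence[float]) -> float:
    """Cepstral distance in dB between two cepstral vectors.

    Assumption of this sketch: the common textbook form
    (10 / ln 10) * sqrt(2 * sum of squared coefficient differences).
    """
    if len(ref) != len(est):
        raise ValueError("cepstral vectors must have the same length")
    squared_error = sum((r - e) ** 2 for r, e in zip(ref, est))
    return 10.0 / math.log(10.0) * math.sqrt(2.0 * squared_error)


def mean_cepstral_distance_db(
    ref_frames: Sequence[Sequence[float]],
    est_frames: Sequence[Sequence[float]],
) -> float:
    """Average the per-frame cepstral distance over all frames."""
    if len(ref_frames) != len(est_frames) or not ref_frames:
        raise ValueError("need the same, non-zero number of frames")
    distances = [
        cepstral_distance_db(r, e) for r, e in zip(ref_frames, est_frames)
    ]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    # Made-up cepstral vectors, two frames with two coefficients each.
    reference = [[0.5, -0.2], [0.1, 0.3]]
    estimate = [[0.4, -0.1], [0.1, 0.3]]
    print(mean_cepstral_distance_db(reference, estimate))
```

A distance of 0 dB means the estimated and the reference cepstral vectors coincide; larger values indicate a larger spectral mismatch.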
3. Simulations
Results: Cepstral Distances

All DNNs use 512 units per hidden layer. Each metric has one DNN column and one HMM/DNN column; the flattened header does not show which is which, so they are labelled (a) and (b) here.

#Hidden layers    UB envelope [dB]      UB energy ratio [dB]
                  (a)       (b)         (a)       (b)
1                 5.34      5.34        7.13      7.16
2                 5.41      5.45        7.23      7.23
3                 5.38      5.40        6.97      6.92
4                 5.44      5.50        7.13      7.09
5                 5.40      5.44        7.12      7.04
6                 5.39      5.42        7.05      6.99
HMM/GMM           5.31                  9.12
Oracle            4.44                  1.95

DNN topology has only a small influence on the evaluation metrics.
UB energy cepstral distance decreased by more than 2 dB (improvement!); still big potential for further improvement.
UB envelope reconstruction is very similar in all cases; small potential for further improvement.
3. Simulations
Results: Speech Quality (WB-PESQ)

Statistical model      MOS LQO
HMM/GMM (baseline)     2.73
DNN                    [3.05, 3.08]
HMM/DNN                [2.99, 3.02]
Oracle                 3.26

0.35 MOS LQO points improvement over the HMM/GMM baseline.
Gap to oracle is less than 0.2 MOS LQO points.
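The two annotations on this slide follow from the table values; the short check below recomputes them. It assumes the annotations refer to the upper end of the DNN range, and the variable names are illustrative.

```python
# WB-PESQ scores (MOS LQO) as reported on the slide.
baseline_hmm_gmm = 2.73
dnn_range = (3.05, 3.08)      # lowest and highest DNN score
hmm_dnn_range = (2.99, 3.02)  # lowest and highest HMM/DNN score
oracle = 3.26

# Assumption: the annotations refer to the best DNN configuration.
best_gain = round(dnn_range[1] - baseline_hmm_gmm, 2)
gap_to_oracle = round(oracle - dnn_range[1], 2)

print("gain over baseline:", best_gain)
print("gap to oracle:", gap_to_oracle)
```

Under that assumption the gain is 0.35 points and the remaining gap to the oracle is 0.18 points, consistent with "less than 0.2".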
3. Simulations
Latest ABE Approach and CCR Test

[Block diagram of DNN++: UB spectral envelope estimation with feature extraction, DNN, and spectral conversion.]

CCR condition           CMOS
AMR vs. AMR-WB          2.15
HMM/GMM vs. AMR-WB      1.48
DNN++ vs. AMR-WB        1.31
HMM/GMM vs. DNN++       0.13
AMR vs. HMM/GMM         0.81
AMR vs. DNN++           1.37
4. Summary
DNNs outperform GMMs as acoustic model for artificial bandwidth extension.
Using DNNs led to an improvement of up to 0.35 MOS LQO points when ABE-processed speech is evaluated using WB-PESQ.
  The speech quality gain stems from a superior UB energy estimation rather than from the UB envelope.
  The UB spectral envelope estimation performance of DNNs is similar to that of GMMs.
  There is still large potential for further improvement of the UB energy estimate.
In the CCR test, the DNN-based ABE showed a clear advantage of 1.37 CMOS points over AMR-coded narrowband speech.
Thank you for your attention
Johannes Abel
abel@ifn.ing.tu-bs.de
2. ABE Framework
UB Envelope Codebook
[Block diagram: speech data, UB SLP analysis, relative energy ratio, prediction gain (UB and NB), LBG clustering. Frames are treated separately depending on whether they contain an /s/ or /z/ sound. The UB envelope codebook consists of two parts with 16 and 8 entries; the symbols stating what each part is calculated from were lost in extraction.]
P. Bauer and T. Fingscheidt, “A Statistical Framework for Artificial Bandwidth Extension Exploiting Speech Waveform and Phonetic Transcription,” in Proc. of EUSIPCO, Glasgow, Scotland, Aug. 2009, pp. 1839–1843.
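The slide names LBG clustering as the way the codebook entries are obtained. The sketch below shows a generic Linde-Buzo-Gray procedure with binary splitting. Everything beyond the algorithm's name is an assumption: the squared Euclidean distance, the additive split offset `eps`, the fixed iteration count, and all function names are illustrative, and the toy data are made up.

```python
from typing import List, Sequence

Vector = List[float]


def _mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def _nearest(v: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook entry with the smallest squared distance to v."""
    dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
    return dists.index(min(dists))


def lbg_codebook(
    data: Sequence[Sequence[float]],
    size: int,
    eps: float = 0.01,
    n_iter: int = 20,
) -> List[Vector]:
    """Train a codebook of `size` entries (a power of two) by LBG splitting."""
    if size < 1 or size & (size - 1):
        raise ValueError("size must be a power of two")
    if not data:
        raise ValueError("data must not be empty")
    codebook = [_mean(data)]
    while len(codebook) < size:
        # Split every entry into two slightly shifted copies.
        codebook = [[x + s for x in c] for c in codebook for s in (eps, -eps)]
        for _ in range(n_iter):
            # Assign each vector to its nearest entry, then update centroids.
            cells: List[List[Sequence[float]]] = [[] for _ in codebook]
            for v in data:
                cells[_nearest(v, codebook)].append(v)
            # Empty cells keep their previous entry.
            codebook = [
                _mean(cell) if cell else c for cell, c in zip(cells, codebook)
            ]
    return codebook


if __name__ == "__main__":
    # Made-up one-dimensional data with two obvious clusters.
    toy = [[0.0], [0.1], [10.0], [10.1]]
    print(sorted(lbg_codebook(toy, 2)))
```

Codebook sizes of 16 and 8, as on the slide, are both powers of two and would therefore fit the binary splitting used here.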
3. Simulations
Results: Phoneme Accuracy
Relative classification accuracy of HMM/DNN vs. HMM/GMM for phonemes (measured on validation set)

Phoneme     /f/    /th/   /dh/   /t/    /zh/   …    /s/
Change      +83    +59    +56    +54    +52    …    +8

4 of the 5 phonemes that profit most are fricative sounds.
All phonemes benefit from the DNN as acoustic model.