
Bandwidth Extension of Narrowband Speech for Low Bit-Rate Wideband Coding

IEEE Speech Coding Workshop, Sept 17–20, 2000, Lake Lawn Resort, Delavan, WI. Jean-Marc Valin, Roch Lefebvre, University of Sherbrooke. Bandwidth Extension of Narrowband Speech for Low Bit-Rate Wideband Coding.


  1. IEEE Speech Coding Workshop, Sept 17–20, 2000, Lake Lawn Resort, Delavan, WI
     Jean-Marc Valin, Roch Lefebvre, University of Sherbrooke
     Bandwidth Extension of Narrowband Speech for Low Bit-Rate Wideband Coding

  2. Outline
     • Problem statement
     • Proposed solution
     • System performance
     • Discussion

  3. Problem Statement
     • Telephone Band: 300 - 3400 Hz
     • AM Band: 50 - 7000 Hz
     • How to make telephone-band speech sound like AM-band speech with 500 bits/s? (G.729)
     • We need to recover information from both the low-frequency and the high-frequency bands

  4. Proposed Solution
     • 1) Do our best to recover the wideband information from narrowband speech
     • 2) Use coding for the information that cannot be recovered
       – Recovered information:
         • Low-frequency band
         • High-frequency excitation
       – Coded information:
         • High-frequency spectral envelope
     [Figure: spectrum plot, amplitude (dB) versus frequency (Hz), 0 to 8000 Hz]

  5. System Overview
     [Block diagram: 8 kHz narrowband input, inverse IRM filter, ↑2, giving the 300-3400 Hz band; low-frequency regeneration gives the 50-300 Hz band; high-frequency regeneration, driven by side information, gives the 3400-8000 Hz band; the bands combine into the 16 kHz wideband output]
     • Inverse IRM filter is optional
       – produces a flat response from 200-3500 Hz

  6. Low-Frequency Regeneration (1/2)
     • Assumptions:
       – Only pitch harmonics need to be recovered
         • In general, no more than two pitch harmonics below 200 Hz
       – Absolute phase is not perceptually relevant
     • Frequency of harmonics determined from pitch analysis
     • Amplitudes determined from feed-forward multi-layer perceptron (output in log domain)

  7. Low-Frequency Regeneration (2/2)
     [Block diagram: narrowband speech feeds pitch analysis (pitch delay, 1 value; pitch gain, 1 value) and MFCC calculation (16 cepstral coefficients); these feed the multi-layer perceptron, which outputs scale factors. Low-frequency harmonic synthesis produces the 1st and 2nd harmonics; the remaining blocks are "scale and sum", ↑2 and an LP filter, and the output is the low frequencies]
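
As a rough illustration of slides 6 and 7, the sketch below sums sinusoids at multiples of the pitch frequency, with amplitudes supplied in the log domain. It is not the authors' implementation: the function name, the natural-log convention, the zero starting phase (motivated by the assumption that absolute phase is not perceptually relevant) and the 50-300 Hz band limits are all assumptions made for the example, and the log amplitudes stand in for the perceptron output.

```python
import math

def synthesize_low_harmonics(pitch_delay, log_amplitudes, n_samples,
                             fs=16000, band_hz=(50.0, 300.0)):
    """Sum zero-phase sinusoids at multiples of the pitch frequency.

    pitch_delay    -- pitch period in samples at rate fs (assumed input format)
    log_amplitudes -- natural-log amplitude per harmonic (stand-in for the MLP output)
    band_hz        -- only harmonics strictly inside this band are synthesized
    """
    f0 = fs / pitch_delay
    out = [0.0] * n_samples
    for k, log_a in enumerate(log_amplitudes, start=1):
        freq = k * f0
        if not (band_hz[0] < freq < band_hz[1]):
            continue  # harmonic falls outside the low-frequency band
        amp = math.exp(log_a)  # back from the log domain
        for n in range(n_samples):
            out[n] += amp * math.sin(2.0 * math.pi * freq * n / fs)
    return out

# 100 Hz pitch at 16 kHz: the harmonics at 100 Hz and 200 Hz fall inside the band.
frame = synthesize_low_harmonics(160, [0.0, math.log(0.5)], 256)
```

A pitch above the band (for example a 400 Hz pitch, `pitch_delay=40`) yields an all-zero frame, since no harmonic lies below 300 Hz.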

  8. High-Frequency Extension
     • Excitation-filter model (16 ms frames)
     • Problem is separated in two parts
       – Excitation extension (no side information)
       – Spectral envelope coding (side information)
     [Block diagram: narrowband speech is filtered by A(z), passed through excitation extension, filtered by 1/B(z) and high-passed to give the high-frequency band; LPC analysis of the narrowband speech and the side information (high-frequency spectral envelope) feed the spectral envelope extension]
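
The excitation-filter model on slide 8 can be sketched with two direct-form filters: an analysis filter A(z) that turns speech into an excitation, and a synthesis filter 1/B(z) that shapes an excitation with a spectral envelope. The coefficient convention (A(z) = 1 + a[0] z^-1 + ...) and the function names are assumptions for this example; the excitation extension and the envelope extension themselves are not shown.

```python
def lpc_residual(signal, a):
    """Analysis filter A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ... (FIR)."""
    out = []
    for n in range(len(signal)):
        acc = signal[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * signal[n - k]
        out.append(acc)
    return out

def lpc_synthesis(excitation, b):
    """Synthesis filter 1/B(z), with B(z) = 1 + b[0] z^-1 + b[1] z^-2 + ... (IIR)."""
    out = []
    for n in range(len(excitation)):
        acc = excitation[n]
        for k, bk in enumerate(b, start=1):
            if n - k >= 0:
                acc -= bk * out[n - k]
        out.append(acc)
    return out

# Sanity check of the model: with B(z) = A(z) and an unmodified excitation,
# synthesis undoes analysis and the input samples come back.
speech = [1.0, 0.5, -0.25, 0.75, 0.0, -1.0]
coeffs = [-0.9, 0.2]  # illustrative values, not taken from the slides
rebuilt = lpc_synthesis(lpc_residual(speech, coeffs), coeffs)
```

In the slide's system the two filters differ: A(z) comes from the narrowband signal, while B(z) carries the extended wideband envelope.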

  9. Excitation Extension
     [Block diagram, blocks shown: narrowband excitation, ↑2, absolute value, whitening filter, high pass, wideband excitation]
     [Figure: two rows of three plots (horizontal axes 0 to 20 and 0 to 8)]
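
The absolute value block on slide 9 is a nonlinearity, and a nonlinearity creates frequency components that were not in its input. The sketch below illustrates only that general property on a single sinusoid: after rectification the energy at the original frequency disappears and a component at twice the frequency appears. It does not reproduce the slide's processing chain (upsampling, whitening and high-pass filtering are omitted), and the helper names are made up for the example.

```python
import math

def rectify(x):
    """Full-wave rectification (absolute value of every sample)."""
    return [abs(v) for v in x]

def dft_amplitude(x, k):
    """Amplitude of the sinusoidal component at DFT bin k (0 < k < N/2)."""
    n_pts = len(x)
    re = sum(v * math.cos(2.0 * math.pi * k * n / n_pts) for n, v in enumerate(x))
    im = sum(v * math.sin(2.0 * math.pi * k * n / n_pts) for n, v in enumerate(x))
    return 2.0 * math.hypot(re, im) / n_pts

N = 64
tone = [math.sin(2.0 * math.pi * 4 * n / N) for n in range(N)]  # energy at bin 4 only
rectified = rectify(tone)

at_original = dft_amplitude(rectified, 4)  # close to zero
at_double = dft_amplitude(rectified, 8)    # new component at twice the frequency
```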

  10. Spectral Envelope Coding
     • Spectral envelope calculated from the wideband LPC coefficients
     • Quantization of the 3000-8000 Hz range (40 points)
       – Log domain
       – 8-bit Vector Quantization (500 bits/s side information, using 16 ms frames)
     • Concatenation with envelope obtained from LPC analysis on narrowband speech
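
The 500 bits/s figure follows from sending one 8-bit index per 16 ms frame: 8 bits / 0.016 s = 500 bits/s. The sketch below checks that arithmetic and shows a plain nearest-neighbour vector quantizer on log-domain envelope points. The squared-error distance, the function names and the toy codebook are assumptions for the example; the slides do not say how the codebook was trained or searched.

```python
def side_info_rate_bps(bits_per_frame, frame_ms):
    """Bit rate when one index of bits_per_frame bits is sent every frame_ms."""
    return bits_per_frame * 1000.0 / frame_ms

def vq_encode(envelope_db, codebook):
    """Index of the codebook entry closest to envelope_db (squared error in dB)."""
    best_index, best_err = 0, float("inf")
    for index, entry in enumerate(codebook):
        err = sum((e - c) ** 2 for e, c in zip(envelope_db, entry))
        if err < best_err:
            best_index, best_err = index, err
    return best_index

def vq_decode(index, codebook):
    """Look up the quantized envelope for a received index."""
    return list(codebook[index])

# Toy codebook: 3 entries of 4 points. The slide's setup would use
# 2**8 = 256 entries of 40 points each.
toy_codebook = [[60.0, 50.0, 40.0, 30.0],
                [40.0, 40.0, 40.0, 40.0],
                [30.0, 35.0, 45.0, 50.0]]
index = vq_encode([58.0, 51.0, 42.0, 29.0], toy_codebook)
rate = side_info_rate_bps(8, 16)
```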

  11. Objective results
     • Low-frequency band
       – 3 dB RMS error on harmonic amplitude
     • High-frequency band
       – 3.6 dB RMS error on envelope
       – No objective measure for excitation extension (perceptually very close to original)
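
The slides do not define how the RMS error in dB was computed. One common reading, assumed here, is the root mean square of the per-point differences between estimated and reference amplitudes, each expressed in dB. The sketch below implements that reading for linear amplitudes; the function name is hypothetical.

```python
import math

def rms_error_db(reference, estimate):
    """RMS of the per-point amplitude errors, each expressed in dB.

    Assumes linear, strictly positive amplitudes of equal length.
    """
    diffs = [20.0 * math.log10(e / r) for r, e in zip(reference, estimate)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# One point 20 dB too high and one 20 dB too low give a 20 dB RMS error.
example = rms_error_db([1.0, 1.0], [10.0, 0.1])
```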

  12. Subjective Results
     [Audio examples, female and male speakers:]
     • Original wideband
     • Recovered from original IRM-filtered speech
     • Recovered from G.729 coded speech

  13. Discussion
     • Highlights
       – Expand IRM-filtered telephone-band speech to AM band
       – Very low side information rate (500 bits/s)
     • Areas of improvement
       – Use high-band spectral estimation before coding
       – Use residual low-frequency information (below 300 Hz)
       – Noise robustness
       – Post-filtering
