
Multimedia Communications, Spring 2006-07: Voice Traffic



  1. CS 584 / CMPE 584 Multimedia Communications, Spring 2006-07
     Voice Traffic Characteristics
     Shahab Baqai, LUMS

  2. Voice Communication Characteristics
     • Speech produces a signal that varies slowly in time
     • Telephone-quality speech occupies a bandwidth of about 4 kHz

  3. Voice Coding
     • Voice processing consists of two steps:
       – Speech analysis: converts an analogue voice signal to digital form
       – Speech synthesis: converts digital voice data back into analogue form
     • Two methods are used for voice processing:
       – Waveform coding, e.g. Pulse Code Modulation (PCM) and Code-Excited Linear Prediction (CELP)
       – Vocoding

  4. PCM
     • The signal is sampled at regular intervals
       – Sampling rate = 8 kHz, the Nyquist rate for a 4 kHz bandwidth (2 × 4 kHz)
     • Samples are quantized and transmitted
       – 8 bits/sample × 8000 samples/s ⇒ 64 kbps
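
The PCM arithmetic can be checked with a small sketch. It assumes a signal normalized to [-1, 1] and a plain uniform 8-bit quantizer; the helpers `pcm_encode`, `pcm_decode` and `pcm_bit_rate` and the 1 kHz test tone are hypothetical. Deployed telephone PCM (ITU-T G.711) companding is not modelled here; it uses the non-uniform quantizers described on slide 8.

```python
import math

SAMPLE_RATE_HZ = 8000   # 8 kHz sampling (Nyquist rate for a 4 kHz band)
BITS_PER_SAMPLE = 8     # 8 bits per sample -> 256 quantization levels


def pcm_encode(signal, duration_s, fs=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE):
    """Sample `signal` (a function of time with values in [-1, 1]) every 1/fs
    seconds and map each sample to an integer code in [0, 2**bits - 1]."""
    levels = 2 ** bits
    step = 2.0 / levels                          # width of one quantization interval
    codes = []
    for n in range(round(duration_s * fs)):
        x = signal(n / fs)                       # sample at t = n / fs
        k = int((x + 1.0) / step)                # interval that contains x
        codes.append(min(max(k, 0), levels - 1))  # clip the full-scale edge
    return codes


def pcm_decode(codes, bits=BITS_PER_SAMPLE):
    """Map each code back to the midpoint of its quantization interval."""
    step = 2.0 / 2 ** bits
    return [-1.0 + (k + 0.5) * step for k in codes]


def pcm_bit_rate(fs=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE):
    """Bit rate in bits per second: samples per second times bits per sample."""
    return fs * bits


def tone(t):
    """Hypothetical 1 kHz test tone at half of full scale."""
    return 0.5 * math.sin(2 * math.pi * 1000 * t)


codes = pcm_encode(tone, duration_s=0.01)
print(len(codes), pcm_bit_rate())                # 80 64000
```

Decoding to interval midpoints keeps every reconstruction error within half a quantization step.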

  5. Sampling and Quantization

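  6. Voice Quality Measure
     • Quantization is a source of degradation (noise)
     • It may be measured by the signal-to-quantization-noise ratio:
       $$\mathrm{SNR} = \frac{\sigma_X^2}{\sigma_q^2}, \qquad
         \sigma_X^2 = \sum_{k=1}^{N} \int_{X_{k-1}}^{X_k} x^2 \, p(x) \, dx, \qquad
         \sigma_q^2 = \sum_{k=1}^{N} \int_{X_{k-1}}^{X_k} (x - y_k)^2 \, p(x) \, dx$$
     • Where
       – p(x) is the probability density function of the signal
       – X_k is the k-th decision level (N quantization intervals)
       – y_k is the output (reconstruction) level for the interval [X_{k-1}, X_k]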
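
To make the SNR formula concrete, this sketch evaluates σ_X² and σ_q² by numerical (midpoint-rule) integration over each decision interval. The helper `quantizer_snr` is hypothetical, and the example assumes a signal uniformly distributed on [-1, 1] with a 256-level uniform quantizer whose output levels are the interval midpoints. For that case σ_X² = 1/3 and σ_q² = Δ²/12 = 1/(3N²), so the ratio should come out close to N², about 48.2 dB (roughly 6 dB per bit).

```python
import math


def quantizer_snr(pdf, decision_levels, recon_levels, steps=50):
    """SNR = sigma_X^2 / sigma_q^2, integrating x^2 p(x) and (x - y_k)^2 p(x)
    with the midpoint rule over every interval [X_{k-1}, X_k]."""
    sig = 0.0      # accumulates sigma_X^2
    noise = 0.0    # accumulates sigma_q^2
    for k in range(1, len(decision_levels)):
        lo, hi = decision_levels[k - 1], decision_levels[k]
        y = recon_levels[k - 1]                 # output level y_k of interval k
        h = (hi - lo) / steps
        for i in range(steps):
            x = lo + (i + 0.5) * h              # midpoint of the i-th sub-interval
            w = pdf(x) * h
            sig += x * x * w
            noise += (x - y) ** 2 * w
    return sig / noise


# Assumed example: uniform density 1/2 on [-1, 1], N-level uniform quantizer.
N = 256
X = [-1.0 + 2.0 * k / N for k in range(N + 1)]      # decision levels X_0 .. X_N
Y = [(X[k] + X[k + 1]) / 2 for k in range(N)]        # midpoint output levels y_1 .. y_N
snr = quantizer_snr(lambda x: 0.5, X, Y)
print(round(10 * math.log10(snr), 1))                # 48.2
```

Swapping in a peaked density (speech amplitudes are concentrated near zero) for `pdf` shows how the same quantizer scores differently on different signals.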

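  7. Uniform Quantizer
     • The interval between consecutive decision levels is constant: $X_k - X_{k-1} = \Delta$
     • Problem
       – SNR is not constant; it depends on the signal amplitude
       – The quantization noise power stays roughly fixed (about $\Delta^2/12$) while the signal power falls with amplitude, so a soft speaker is penalized more than a loud speaker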
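
The amplitude dependence can be seen empirically. This sketch passes a loud and a soft sine tone through the same 8-bit mid-rise uniform quantizer and compares their SNRs; the tone frequency (1013 Hz at 8 kHz, chosen to avoid a short repeating sample pattern), the amplitudes 0.9 and 0.03, and the helper names `uniform_quantize` and `snr_db` are all assumptions for illustration.

```python
import math


def uniform_quantize(x, bits=8, full_scale=1.0):
    """Mid-rise uniform quantizer on [-full_scale, full_scale]: all decision
    intervals have the same width Delta = 2 * full_scale / 2**bits."""
    delta = 2.0 * full_scale / 2 ** bits
    k = math.floor(x / delta)                                  # interval index
    k = max(-(2 ** (bits - 1)), min(k, 2 ** (bits - 1) - 1))   # clip to range
    return (k + 0.5) * delta                                   # interval midpoint


def snr_db(amplitude, n=8000, bits=8):
    """Measured SNR (dB) of a sine of the given amplitude through the quantizer."""
    xs = [amplitude * math.sin(2 * math.pi * 1013 * i / 8000) for i in range(n)]
    sig = sum(x * x for x in xs)
    noise = sum((x - uniform_quantize(x, bits)) ** 2 for x in xs)
    return 10 * math.log10(sig / noise)


loud = snr_db(0.9)    # loud talker, close to full scale
soft = snr_db(0.03)   # soft talker, about 30 dB quieter
print(f"loud: {loud:.1f} dB, soft: {soft:.1f} dB")
```

The noise term is about the same in both runs, so the SNR drops roughly dB for dB with the input level.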

  8. Non-Uniform Quantizer
     • A-law (Europe)
     • μ-law (North America)
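
A non-uniform quantizer can be built by compressing the signal with a logarithmic curve and then quantizing uniformly. The sketch below uses the continuous μ-law (μ = 255) and A-law (A = 87.6) characteristics; G.711 itself implements these with piecewise-linear segment approximations, which are not reproduced here, and the function names are hypothetical.

```python
import math

MU = 255.0   # mu-law parameter (North America)
A = 87.6     # A-law parameter (Europe)


def mu_law_compress(x, mu=MU):
    """Continuous mu-law curve: |y| = ln(1 + mu|x|) / ln(1 + mu), x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress: |x| = ((1 + mu)**|y| - 1) / mu."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def a_law_compress(x, a=A):
    """Continuous A-law curve: linear below |x| = 1/A, logarithmic above."""
    ax = abs(x)
    if ax < 1.0 / a:
        y = a * ax / (1.0 + math.log(a))
    else:
        y = (1.0 + math.log(a * ax)) / (1.0 + math.log(a))
    return math.copysign(y, x)


# A soft sample at 3% of full scale is spread over a much larger share of the
# compressed range, so it receives more of the uniform quantizer's levels.
print(round(mu_law_compress(0.03), 3))   # 0.389
print(round(a_law_compress(0.03), 3))
```

After compression, an ordinary 8-bit uniform quantizer gives soft speakers finer effective steps and loud speakers coarser ones, which keeps the SNR roughly constant over a wide range of levels.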

  9. Adaptive Differential PCM (ADPCM)
     • Takes advantage of the slow rate of change of the voice signal:
       – Quantizes and transmits the difference between consecutive samples
       – May use linear prediction of the signal
       – Adapts the quantizer step size to the signal level
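
A simplified ADPCM loop can be sketched as follows. It assumes the simplest predictor (the previous reconstructed sample), 4-bit difference codes and a hypothetical Jayant-style rule that enlarges the step after large codes and shrinks it after small ones; it is not the ITU-T G.726 algorithm. At 8 kHz, 4-bit codes correspond to 32 kbps, half the PCM rate.

```python
import math

CODE_MIN, CODE_MAX = -8, 7          # 4-bit signed difference codes (assumed)
STEP_MIN, STEP_MAX = 1e-3, 0.25     # step-size limits (assumed)


def _adapt(step, code):
    """Hypothetical step adaptation: grow after large codes, shrink after small."""
    if abs(code) >= 6:
        step *= 1.6
    elif abs(code) <= 1:
        step *= 0.9
    return min(max(step, STEP_MIN), STEP_MAX)


def adpcm_encode(samples, step=0.01):
    """Quantize the difference between each sample and the prediction."""
    codes, recon = [], []
    pred = 0.0                                  # prediction = last reconstructed sample
    for x in samples:
        code = round((x - pred) / step)         # quantized difference
        code = max(CODE_MIN, min(CODE_MAX, code))
        pred += code * step                     # same value the decoder will rebuild
        codes.append(code)
        recon.append(pred)
        step = _adapt(step, code)
    return codes, recon


def adpcm_decode(codes, step=0.01):
    """Rebuild the signal by tracking the same predictor and step as the encoder."""
    out, pred = [], 0.0
    for code in codes:
        pred += code * step
        out.append(pred)
        step = _adapt(step, code)
    return out


signal = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
codes, recon = adpcm_encode(signal)
decoded = adpcm_decode(codes)
```

Because the step update depends only on the transmitted codes, the decoder reproduces the encoder's reconstruction exactly without any side information.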

  10. CELP (Code-Excited Linear Prediction) Coding
     • Coder
       – Voice is analyzed in frames of 10~30 ms, each represented by:
         • Synthesis filter: updated by linear prediction
         • Excitation: optimally selected from a codebook so as to minimize a “perceptually” weighted measure of distortion
       – A data frame is produced and transmitted
     • Decoder
       – Excitation signal → LP synthesis filter → reproduced waveform
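
The codebook search at the heart of CELP is an analysis-by-synthesis loop: each candidate excitation is passed through the LP synthesis filter and the one (with its best gain) closest to the target is kept. This toy sketch assumes a random Gaussian codebook of 64 vectors, a 40-sample subframe and a hypothetical first-order filter `a = [0.9]`; it uses plain squared error and omits the perceptual weighting filter, adaptive (pitch) codebook and filter memory that real CELP coders use.

```python
import random


def lp_synthesize(excitation, a):
    """All-pole synthesis filter 1/A(z), A(z) = 1 - sum_i a_i z^-i:
    y[n] = e[n] + sum_i a[i] * y[n-1-i] (zero initial state)."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for i, ai in enumerate(a):
            if n - 1 - i >= 0:
                acc += ai * y[n - 1 - i]
        y.append(acc)
    return y


def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the codevector whose synthesized,
    optimally scaled output is closest to `target` in squared error."""
    best = (None, 0.0, float("inf"))
    for idx, c in enumerate(codebook):
        s = lp_synthesize(c, a)
        ss = sum(v * v for v in s)
        ts = sum(t * v for t, v in zip(target, s))
        gain = ts / ss                                  # least-squares gain
        err = sum((t - gain * v) ** 2 for t, v in zip(target, s))
        if err < best[2]:
            best = (idx, gain, err)
    return best


rng = random.Random(1)
FRAME = 40                                   # 5 ms subframe at 8 kHz (assumed)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(FRAME)] for _ in range(64)]
a = [0.9]                                    # hypothetical LP coefficients
target = [0.7 * v for v in lp_synthesize(codebook[5], a)]
idx, gain, err = search_codebook(target, codebook, a)
print(idx, round(gain, 3))                   # 5 0.7
```

Only the winning index and gain (plus the filter coefficients) need to be transmitted; the decoder holds the same codebook and runs the same synthesis filter.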

  11. Vocoding
     • For very low bit rates (≈ 2 kbps)
     • Based on modeling the speech production mechanism rather than the waveform
       – Speech is processed in frames of 10~25 ms
       – Each frame is classified as voiced or unvoiced
         • Voiced speech: vocal cords vibrating (e.g. vowels)
         • Unvoiced speech: vocal cords not vibrating (e.g. consonants such as /s/ and /f/)
     • Each frame is represented by
       – Coefficients that define the vocal tract resonance characteristics
       – Excitation energy
       – Pitch value
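
The voiced/unvoiced decision and pitch value can be estimated per frame in several ways; this sketch uses one common approach, the autocorrelation peak over lags corresponding to 50–400 Hz. The 0.4 voicing threshold, the 25 ms frame and the helper names are assumptions, and the test frames (a 100 Hz pulse train and white noise) are idealized stand-ins for voiced and unvoiced excitation.

```python
import random


def autocorr(frame, lag):
    """Autocorrelation r[lag] = sum_n x[n] * x[n + lag] within the frame."""
    return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))


def analyze_frame(frame, fs=8000, fmin=50, fmax=400, threshold=0.4):
    """Return (voiced, pitch_hz). The pitch lag is the autocorrelation peak in
    the lag range for fmin..fmax Hz; the frame counts as voiced when that
    peak exceeds `threshold` times the frame energy r[0]."""
    r0 = autocorr(frame, 0)
    if r0 == 0:
        return False, None                      # silent frame
    lo, hi = fs // fmax, fs // fmin             # lags 20..160 at 8 kHz
    best_lag = max(range(lo, hi + 1), key=lambda k: autocorr(frame, k))
    voiced = autocorr(frame, best_lag) / r0 > threshold
    return voiced, (fs / best_lag if voiced else None)


FRAME = 200                                     # 25 ms at 8 kHz
# Voiced-like excitation: glottal pulses every 80 samples (100 Hz pitch).
pulses = [1.0 if n % 80 == 0 else 0.0 for n in range(FRAME)]
# Unvoiced-like excitation: white noise.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(FRAME)]

print(analyze_frame(pulses))                    # (True, 100.0)
print(analyze_frame(noise)[0])
```

A full vocoder would also compute the vocal tract (LPC) coefficients and excitation energy for each frame and transmit these few parameters instead of samples, which is how it reaches rates around 2 kbps.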

  12. Vocoding (cont.)
     • Low quality
       – Unnatural, buzzy sound
     • Works only for human speech
       – Not optimized for other audio signals
     • Little current interest
       – No international standard yet

  13. Motivating Voice Compression
     • MOS: Mean Opinion Score, a subjective measure of voice quality
     • CELP: Code-Excited Linear Prediction
     • LD: Low Delay
     • CS-ACELP: Conjugate Structure Algebraic CELP
     • MP-MLQ: Multi-Pulse excitation with a Maximum Likelihood Quantizer

  14. Speech Activity
     • Speech alternates between two states:
       – Talkspurt
       – Silence

  15. Speech Activity (cont.)
     • Average talkspurt duration ≈ 1.2 s; average silence duration ≈ 1.8 s
     • In a two-party conversation:
       – One speaker talking: 64 ~ 73 %
       – Both speakers talking: 3 ~ 7 %
       – Both speakers silent: 20 ~ 33 %
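
The two-state behaviour can be simulated as an on-off source. This sketch takes the average durations from the slide (talkspurt ≈ 1.2 s, silence ≈ 1.8 s) and assumes exponentially distributed durations, a common modelling choice rather than something stated on the slide; `simulate_activity` is a hypothetical helper. The long-run fraction of time spent talking should approach 1.2 / (1.2 + 1.8) = 0.4.

```python
import random

MEAN_TALK_S = 1.2      # average talkspurt duration
MEAN_SILENCE_S = 1.8   # average silence duration


def simulate_activity(total_s, seed=0):
    """Alternate exponentially distributed talkspurts and silences for
    `total_s` seconds and return the fraction of time spent talking."""
    rng = random.Random(seed)
    t = talk = 0.0
    talking = rng.random() < MEAN_TALK_S / (MEAN_TALK_S + MEAN_SILENCE_S)
    while t < total_s:
        mean = MEAN_TALK_S if talking else MEAN_SILENCE_S
        d = min(rng.expovariate(1.0 / mean), total_s - t)   # state duration
        if talking:
            talk += d
        t += d
        talking = not talking
    return talk / total_s


analytic = MEAN_TALK_S / (MEAN_TALK_S + MEAN_SILENCE_S)
print(round(analytic, 2), round(simulate_activity(100_000), 2))
```

Only the mean durations affect the long-run activity fraction; the exponential assumption matters for how bursty the traffic looks over short intervals.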

  16. Silence Suppression
     • A Voice Activity Detector (VAD) classifies the signal:
       – When silence is detected, only a low-rate description of the background noise is transmitted, from which the receiver regenerates comfort noise
       – When speech is detected, the full fixed-bit-rate stream is transmitted
     • About 60% reduction in data rate
       – The resulting traffic is no longer constant bit rate
       – The statistical multiplexing gain may be significant
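
The multiplexing gain can be estimated with a simple model. With about 60% of the time silent, a 64 kbps source averages roughly 0.4 × 64 = 25.6 kbps. The sketch assumes independent sources, each active with probability 0.4, and sizes a shared link so that the chance of more sources being active than there are 64 kbps channels stays below a hypothetical 1% target. The 48-source example and `channels_needed` are illustrative, and VAD hangover, comfort-noise frames and packet overhead are ignored.

```python
from math import comb

PCM_RATE_KBPS = 64
ACTIVITY = 0.4      # fraction of time in talkspurt (assumed, about 1 - 60%)


def channels_needed(n_sources, p=ACTIVITY, overflow=0.01):
    """Smallest M such that P(more than M of n independent on-off sources are
    active) < overflow, using a binomial model. M = n_sources always works."""
    for m in range(n_sources + 1):
        tail = sum(comb(n_sources, k) * p ** k * (1 - p) ** (n_sources - k)
                   for k in range(m + 1, n_sources + 1))
        if tail < overflow:
            return m


n = 48
m = channels_needed(n)
print(f"average rate per source: {PCM_RATE_KBPS * ACTIVITY:.1f} kbps")
print(f"{n} sources share {m} channels of {PCM_RATE_KBPS} kbps")
```

The ratio n / m is the statistical multiplexing gain under these assumptions; it grows as more independent sources share the link.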
