CS 584 / CMPE 584 Multimedia Communications Spring 2006-07 Voice Traffic Characteristics Shahab Baqai LUMS
Voice Communication Characteristics
• Speech produces a signal that varies slowly in time
• Bandwidth: about 4 kHz (telephone quality)
Voice Coding
• Voice processing comprises two steps:
  – Speech analysis: converts an analogue voice signal to digital form
  – Speech synthesis: converts digital voice data back into analogue form
• Two methods are used for voice processing:
  – Waveform coding
    • Pulse Code Modulation (PCM)
    • Code-excited Linear Prediction (CELP) coding (often classed as a hybrid, analysis-by-synthesis coder)
  – Vocoding
PCM
• The signal is sampled at regular intervals
  – Sampling rate = 8 kHz (the Nyquist rate for a 4 kHz bandwidth)
• Samples are quantized and transmitted
  – 8 bits/sample × 8000 samples/s ⇒ 64 kbps
[Figure: Sampling and Quantization]
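The PCM steps above (sample at 8 kHz, quantize to 8 bits, transmit) can be sketched as below. This is a minimal illustration assuming samples normalized to [-1, 1) and a plain uniform quantizer with midpoint reconstruction; the function names and the 1 kHz test tone are hypothetical, and real telephone PCM (G.711) uses the non-uniform μ-law/A-law quantizers discussed later.

```python
import math

SAMPLE_RATE_HZ = 8000   # Nyquist rate for a 4 kHz voice band
BITS_PER_SAMPLE = 8


def pcm_encode(samples, n_bits=BITS_PER_SAMPLE):
    """Uniformly quantize samples in [-1, 1) to signed integer codes."""
    step = 2.0 / 2 ** n_bits
    codes = []
    for x in samples:
        x = max(-1.0, min(x, 1.0 - step))   # clip to the quantizer range
        codes.append(math.floor(x / step))  # index of the quantization interval
    return codes, step


def pcm_decode(codes, step):
    """Reconstruct each sample at the midpoint of its interval."""
    return [(c + 0.5) * step for c in codes]


def pcm_bit_rate(sample_rate=SAMPLE_RATE_HZ, n_bits=BITS_PER_SAMPLE):
    return sample_rate * n_bits


# 10 ms of a hypothetical 1 kHz test tone at half of full scale
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE_HZ) for n in range(80)]
codes, step = pcm_encode(tone)
decoded = pcm_decode(codes, step)
print(pcm_bit_rate())  # 64000 bit/s
print(min(codes), max(codes))
```

With 8 bits the codes span -128 to 127, and every reconstructed sample lies within half a step of the original.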
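Voice Quality Measure
• Quantization is a source of degradation (noise)
• It may be measured by the signal-to-quantization-noise ratio

$$\mathrm{SNR} = \frac{\sigma_x^2}{\sigma_q^2}, \qquad \sigma_x^2 = \sum_{k=1}^{N} \int_{X_{k-1}}^{X_k} x^2 \, p(x) \, dx, \qquad \sigma_q^2 = \sum_{k=1}^{N} \int_{X_{k-1}}^{X_k} (x - y_k)^2 \, p(x) \, dx$$

  where
  – p(x) is the probability density function of the signal
  – X_k is the k-th decision level
  – y_k is the reconstruction (output) level for the interval [X_{k-1}, X_k]
  – N is the number of quantization intervals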
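The SNR definition above can be evaluated numerically by approximating each integral over [X_{k-1}, X_k] with the midpoint rule. This sketch assumes, purely for illustration, a signal uniformly distributed on [-1, 1] and a 256-level uniform quantizer with midpoint outputs; the helper names are hypothetical. For that case the exact answer is known: σ_x² = 1/3 and σ_q² = Δ²/12, giving SNR = 256², about 48.2 dB.

```python
import math


def quantizer_snr_db(pdf, decision_levels, output_levels, points_per_interval=200):
    """SNR = sigma_x^2 / sigma_q^2 in dB, integrals evaluated with the midpoint rule.

    decision_levels: X_0 < X_1 < ... < X_N  (N + 1 values)
    output_levels:   y_1 ... y_N            (y_k used for x in [X_{k-1}, X_k))
    """
    signal_power = 0.0
    noise_power = 0.0
    for k in range(1, len(decision_levels)):
        lo, hi = decision_levels[k - 1], decision_levels[k]
        y_k = output_levels[k - 1]
        dx = (hi - lo) / points_per_interval
        for i in range(points_per_interval):
            x = lo + (i + 0.5) * dx
            weight = pdf(x) * dx
            signal_power += x * x * weight
            noise_power += (x - y_k) ** 2 * weight
    return 10 * math.log10(signal_power / noise_power)


def uniform_quantizer(n_levels, amplitude=1.0):
    """Decision levels X_k and midpoint output levels y_k of a uniform quantizer."""
    step = 2 * amplitude / n_levels
    X = [-amplitude + k * step for k in range(n_levels + 1)]
    y = [-amplitude + (k + 0.5) * step for k in range(n_levels)]
    return X, y


X, y = uniform_quantizer(256)
snr = quantizer_snr_db(lambda x: 0.5, X, y)   # uniform pdf on [-1, 1]
print(f"{snr:.2f} dB")
```

Passing a different `pdf` (for example a Laplacian, often used as a model of speech amplitudes) shows how the same quantizer performs on a more peaked distribution.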
Uniform Quantizer
• The interval between consecutive decision levels is constant: $X_k - X_{k-1} = \Delta$ (constant)
• Problem: the SNR is not constant
  – It depends on the signal amplitude
  – A soft speaker is penalized more than a loud speaker
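The amplitude dependence can be seen by quantizing test tones of different levels with the same fixed step Δ. This is a sketch under stated assumptions: an 8-bit mid-rise quantizer with full scale ±1, and 440 Hz sinusoids standing in for a loud, a softer and a very soft talker (all hypothetical test signals). The noise power stays near Δ²/12 while the signal power falls, so the SNR drops by roughly 20 dB for every tenfold drop in amplitude.

```python
import math


def uniform_quantize(x, n_bits=8, full_scale=1.0):
    """Mid-rise uniform quantizer: fixed step Delta regardless of signal level."""
    step = 2 * full_scale / 2 ** n_bits
    x = max(-full_scale, min(x, full_scale - step))
    return (math.floor(x / step) + 0.5) * step


def measured_snr_db(amplitude, n_samples=8000, freq_hz=440, fs=8000):
    signal = [amplitude * math.sin(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]
    sig_pow = sum(s * s for s in signal)
    err_pow = sum((s - uniform_quantize(s)) ** 2 for s in signal)
    return 10 * math.log10(sig_pow / err_pow)


for amp in (1.0, 0.1, 0.01):  # a loud, a softer and a very soft "talker"
    print(f"amplitude {amp:4.2f}: SNR = {measured_snr_db(amp):5.1f} dB")
```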
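Non-Uniform Quantizer
• Companding: small steps for small amplitudes and larger steps for large amplitudes, so the SNR stays roughly constant over a wide range of levels
• μ-law (North America)
• A-law (Europe)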
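A non-uniform quantizer can be built by compressing the sample, quantizing it uniformly, and expanding the result. The sketch below uses the continuous μ-law (μ = 255) and A-law (A = 87.6) curves; note that the G.711 standard itself uses segmented, piecewise-linear approximations of these curves, so this is an illustration of the principle rather than a bit-exact G.711 codec. The test tones and helper names are hypothetical. Unlike the uniform quantizer, the measured SNR changes only a few dB between a full-scale tone and one 40 dB quieter.

```python
import math

MU = 255.0   # mu-law parameter (North America)
A = 87.6     # A-law parameter (Europe)


def mu_law_compress(x):
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def a_law_compress(x):
    ax = abs(x)
    if ax < 1 / A:
        y = A * ax / (1 + math.log(A))                  # linear segment near zero
    else:
        y = (1 + math.log(A * ax)) / (1 + math.log(A))  # logarithmic segment
    return math.copysign(y, x)


def a_law_expand(y):
    ay = abs(y)
    if ay < 1 / (1 + math.log(A)):
        x = ay * (1 + math.log(A)) / A
    else:
        x = math.exp(ay * (1 + math.log(A)) - 1) / A
    return math.copysign(x, y)


def quantize(x, n_bits=8):
    """8-bit uniform quantizer applied in the companded domain."""
    step = 2.0 / 2 ** n_bits
    x = max(-1.0, min(x, 1.0 - step))
    return (math.floor(x / step) + 0.5) * step


def companded_snr_db(amplitude, compress, expand, n=8000, f=440, fs=8000):
    s = [amplitude * math.sin(2 * math.pi * f * k / fs) for k in range(n)]
    r = [expand(quantize(compress(v))) for v in s]
    sig = sum(v * v for v in s)
    err = sum((v - w) ** 2 for v, w in zip(s, r))
    return 10 * math.log10(sig / err)


for amp in (1.0, 0.1, 0.01):
    mu = companded_snr_db(amp, mu_law_compress, mu_law_expand)
    al = companded_snr_db(amp, a_law_compress, a_law_expand)
    print(f"amplitude {amp:4.2f}: mu-law {mu:5.1f} dB, A-law {al:5.1f} dB")
```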
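Adaptive Differential PCM
• Takes advantage of the slow rate of change in the voice signal:
  – Quantizes and transmits the difference between consecutive samples (or between a sample and its prediction)
  – May use linear prediction of the signal
  – Adapts the quantizer step size (and the predictor) to the signal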
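The idea can be sketched as a toy ADPCM loop: predict the next sample from the previous reconstruction, quantize only the prediction error with 4 bits, and adapt the step size. Everything here is a hypothetical illustration (first-order predictor a = 0.9, Jayant-style step multipliers 1.6 and 0.9, a 200 Hz test tone); it is not the ITU-T G.726 algorithm. The key design point is that the encoder predicts from its own reconstruction, not from the original signal, so the decoder can track it exactly.

```python
import math


def _next_step(step, code, max_code):
    """Jayant-style adaptation: grow after large codes, shrink after small ones."""
    factor = 1.6 if abs(code) > max_code // 2 else 0.9
    return min(max(step * factor, 1e-4), 0.5)


def adpcm_encode(samples, n_bits=4, a=0.9, step0=0.02):
    """Toy ADPCM encoder; returns the codes and the encoder's local reconstruction."""
    max_code = 2 ** (n_bits - 1) - 1
    step, prediction = step0, 0.0
    codes, recon = [], []
    for x in samples:
        diff = x - prediction                                  # prediction error
        code = max(-max_code - 1, min(max_code, round(diff / step)))
        codes.append(code)
        recon.append(prediction + code * step)                 # what the decoder rebuilds
        prediction = a * recon[-1]                             # first-order linear prediction
        step = _next_step(step, code, max_code)
    return codes, recon


def adpcm_decode(codes, n_bits=4, a=0.9, step0=0.02):
    """Decoder mirrors the encoder's prediction and step adaptation."""
    max_code = 2 ** (n_bits - 1) - 1
    step, prediction = step0, 0.0
    out = []
    for code in codes:
        out.append(prediction + code * step)
        prediction = a * out[-1]
        step = _next_step(step, code, max_code)
    return out


fs = 8000
test_tone = [0.5 * math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]  # hypothetical input
codes, local = adpcm_encode(test_tone)
decoded = adpcm_decode(codes)
signal = sum(x * x for x in test_tone)
noise = sum((x - y) ** 2 for x, y in zip(test_tone, decoded))
print(f"bit rate {4 * fs} bit/s, SNR {10 * math.log10(signal / noise):.1f} dB")
```

At 4 bits per sample this halves the 64 kbps PCM rate to 32 kbps.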
CELP (Code-excited Linear Prediction) Coding
• Coder
  – Voice is analyzed in frames of 10~30 ms, each represented by:
    • A synthesis filter, updated by linear prediction
    • An excitation, optimally selected from a codebook so as to minimize a "perceptually" weighted measure of distortion
  – A data frame is produced and transmitted
• Decoder
  – Excitation signal → LP filter → reproduced waveform
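The two halves of the coder can be sketched as: (1) linear prediction analysis of a frame with the Levinson-Durbin recursion, and (2) an analysis-by-synthesis search that passes every codebook vector through the synthesis filter and keeps the one (with its optimal gain) closest to the target. This is a simplified sketch under loudly stated assumptions: plain squared error instead of the perceptually weighted error, a random Gaussian codebook, no adaptive (pitch) codebook, zero filter state per subframe, and a synthetic AR(2) frame instead of real speech. All names are hypothetical.

```python
import random


def autocorrelation(x, order):
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """LP coefficients a[1..p] such that x[n] is predicted by sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k                                           # prediction error energy
    return a[1:]


def synthesize(excitation, a):
    """All-pole LP synthesis filter 1 / (1 - sum_k a_k z^-k), zero initial state."""
    y = []
    for n, e in enumerate(excitation):
        y.append(e + sum(a[k] * y[n - 1 - k] for k in range(min(len(a), n))))
    return y


def search_codebook(target, codebook, a):
    """Analysis-by-synthesis: best code vector and gain under plain squared error."""
    best_idx, best_gain, best_err = None, 0.0, float("inf")
    target_energy = sum(t * t for t in target)
    for idx, code in enumerate(codebook):
        s = synthesize(code, a)
        num = sum(t * v for t, v in zip(target, s))
        den = sum(v * v for v in s)
        if den == 0.0:
            continue
        err = target_energy - num * num / den   # error when using the optimal gain num / den
        if err < best_err:
            best_idx, best_gain, best_err = idx, num / den, err
    return best_idx, best_gain, best_err


random.seed(1)
# Hypothetical 20 ms frame (160 samples at 8 kHz): a stable AR(2) process
x = [0.0, 0.0]
for _ in range(160):
    x.append(1.3 * x[-1] - 0.6 * x[-2] + random.gauss(0.0, 0.1))
frame = x[2:]

lp = levinson_durbin(autocorrelation(frame, 10), 10)
codebook = [[random.gauss(0.0, 1.0) for _ in range(40)] for _ in range(64)]
index, gain, error = search_codebook(frame[:40], codebook, lp)  # first 5 ms subframe
print(f"code vector {index}, gain {gain:.3f}, residual energy {error:.4f}")
```

Only the LP coefficients, the codebook index and the gain need to be transmitted; the decoder repeats the excitation-through-filter step.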
VoCoding
• For very low bit rates (≅ 2 kbps)
• Based on modeling the speech production mechanism rather than the waveform
  – Speech is processed in frames of 10~25 ms
  – Distinction between voiced and unvoiced frames
    • Voiced speech: vocal cords vibrate (e.g., vowels)
    • Unvoiced speech: vocal cords do not vibrate (e.g., unvoiced consonants such as "s" and "f")
• Speech is represented by
  – Coefficients that define the vocal tract resonance characteristics
  – Excitation energy
  – Pitch value
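A vocoder's voiced/unvoiced decision and pitch value can be sketched with two classic frame measurements: the zero-crossing rate (noise-like unvoiced frames cross zero often) and the normalized autocorrelation (voiced frames are strongly periodic, and the lag of the peak gives the pitch period). The thresholds, the 70 to 400 Hz search range, and the synthetic test frames below are hypothetical choices for illustration; practical vocoders such as LPC-10 use more elaborate, smoothed decisions.

```python
import math
import random

FS = 8000  # sampling rate in Hz


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    return sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:])) / (len(frame) - 1)


def estimate_pitch(frame, fs=FS, fmin=70, fmax=400):
    """Normalized-autocorrelation pitch estimate over lags fs/fmax .. fs/fmin."""
    best_lag, best_corr = None, 0.0
    for lag in range(fs // fmax, fs // fmin + 1):
        head, tail = frame[lag:], frame[:-lag]
        num = sum(p * q for p, q in zip(head, tail))
        den = math.sqrt(sum(p * p for p in head) * sum(q * q for q in tail))
        corr = num / den if den > 0 else 0.0
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return (fs / best_lag if best_lag else None), best_corr


def classify_frame(frame, zcr_max=0.3, corr_min=0.5):
    """Voiced if the frame crosses zero rarely and is strongly periodic."""
    pitch, corr = estimate_pitch(frame)
    if zero_crossing_rate(frame) < zcr_max and corr > corr_min:
        return "voiced", pitch
    return "unvoiced", None


# Hypothetical 25 ms test frames (200 samples)
n = range(200)
voiced = [math.sin(2 * math.pi * 125 * k / FS) + 0.5 * math.sin(2 * math.pi * 250 * k / FS)
          + 0.25 * math.sin(2 * math.pi * 375 * k / FS) for k in n]
random.seed(0)
unvoiced = [random.gauss(0.0, 1.0) for _ in n]
print(classify_frame(voiced))    # vowel-like frame with a 125 Hz fundamental
print(classify_frame(unvoiced))  # noise-like, "s"-like frame
```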
VoCoding (cont.)
• Low quality
  – Unnatural, buzzy
• Works only for human speech
  – Not optimized for other audio signals
• Little current interest
  – No international standard yet
Motivating Voice Compression
  – MOS: Mean Opinion Score, a subjective measure of voice quality
  – CELP: Code-excited Linear Prediction
  – LD: Low Delay
  – CS-ACELP: Conjugate Structure Algebraic CELP
  – MP-MLQ: Multi-Pulse excitation with a Maximum Likelihood Quantizer
Speech Activity
• Speech alternates between two states:
  – Talkspurt
  – Silence
Speech Activity (cont.)
• Average talkspurt duration ≈ 1.2 s; average silence duration ≈ 1.8 s
• In a two-party conversation:
  – One speaker talking: 64 ~ 73 %
  – Both speakers talking: 3 ~ 7 %
  – Both speakers silent: 20 ~ 33 %
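The talkspurt/silence alternation is often modeled as a two-state on-off source. The sketch below assumes exponentially distributed durations (a common modeling assumption, not something stated on the slide) with means of 1.2 s and 1.8 s, and estimates the fraction of time a speaker is active. Whatever the distribution, the long-run activity factor is mean talkspurt / (mean talkspurt + mean silence) = 1.2 / 3.0 = 0.4, which is where the roughly 60% saving from silence suppression comes from.

```python
import random


def simulate_voice_activity(mean_talk_s=1.2, mean_silence_s=1.8, duration_s=36000.0, seed=42):
    """Two-state on-off source with exponentially distributed talkspurts and silences.

    Returns the fraction of time spent in talkspurts (the activity factor).
    """
    rng = random.Random(seed)
    elapsed = talking_time = 0.0
    talking = True
    while elapsed < duration_s:
        mean = mean_talk_s if talking else mean_silence_s
        period = rng.expovariate(1.0 / mean)  # expovariate takes a rate (1 / mean)
        if talking:
            talking_time += period
        elapsed += period
        talking = not talking
    return talking_time / elapsed


activity = simulate_voice_activity()
expected = 1.2 / (1.2 + 1.8)  # mean talkspurt / mean cycle length
print(f"simulated activity {activity:.3f}, expected {expected:.3f}")
```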
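Silence Suppression
• A Voice Activity Detector (VAD) classifies each frame as speech or silence
  – When silence is detected, little or nothing is sent: at most occasional background-noise (comfort noise) updates, from which the receiver regenerates the background noise
  – When speech is detected, the full fixed-rate stream is transmitted
• About 60% reduction in data rate (each speaker is silent roughly 60% of the time)
  – The resulting traffic is no longer constant bit rate
  – Statistical multiplexing gain may be significant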
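The statistical multiplexing gain can be estimated with a binomial model: if N calls are independently active with probability p, the link only needs enough channels to carry the number of simultaneous talkspurts with high probability. This sketch assumes independent sources, p = 0.4 (consistent with the 60% reduction above), one channel per active call, and a hypothetical 1% target for the probability that demand exceeds capacity (speech clipping). The gain N / C grows with the number of multiplexed calls.

```python
from math import comb


def overflow_probability(n_sources, capacity, p_active):
    """P(more than `capacity` of `n_sources` independent on-off sources are active)."""
    return sum(comb(n_sources, k) * p_active ** k * (1 - p_active) ** (n_sources - k)
               for k in range(capacity + 1, n_sources + 1))


def channels_needed(n_sources, p_active, max_overflow=0.01):
    """Smallest number of channels keeping the overflow probability <= max_overflow."""
    for capacity in range(n_sources + 1):
        if overflow_probability(n_sources, capacity, p_active) <= max_overflow:
            return capacity
    return n_sources


for n in (24, 48, 96, 480):
    c = channels_needed(n, 0.4)
    print(f"{n:4d} calls -> {c:4d} channels, multiplexing gain {n / c:.2f}")
```

Real conversations are not independent (the two parties of one call tend to alternate), so this is an approximation.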