Adaptive Loss Concealment for Internet Telephony Applications
Henning Sanneck
GMD FOKUS, Berlin
sanneck@fokus.gmd.de
Supported by the USMInt (DFN) and Multicube (ACTS) projects
Overview
• Motivation
• Receiver-Based Concealment
• Adaptive Packetization / Concealment (Sender/Receiver operation)
• Properties (packet sizes / header overhead, delay)
• Subjective Test
• Conclusions
Motivation: Loss of Speech Packets
• Congestion in the Internet / Mbone → packet loss → speech signal dropouts → need to enhance speech quality
• Solutions: bandwidth adaptation, resource reservation, differentiated services, redundancy/FEC, interleaving, receiver-based concealment
Packet Repetition (Receiver-Based)
[Figure: three waveform panels over sample number n: sender signal s(n), receiver signal s~(n) with a lost packet, and concealed receiver signal ŝ(n). Legend: n: sample number; p(n): pitch period; L: packet size.]
Pitch Waveform Replication (Receiver-Based)
[Figure: three waveform panels over sample number n: sender signal s(n), receiver signal s~(n) with a lost packet, and concealed receiver signal ŝ(n). Legend: p(n): pitch period; L: packet size.]
New approach
[Figure: three waveform panels over sample number n: sender signal s(n), receiver signal s~(n) with a lost packet, and concealed receiver signal ŝ(n). Legend: c: „chunk“ number; p(c): pitch period; L(c,c-1): packet size.]
Adaptive Packetization / Concealment
Sender-supported concealment: choose the packetization interval adaptively
• packet size (size of the lost segment) relates to the „importance“ of the packet content
• pre-processing of the undistorted signal
• enables a simple concealment operation at the receiver (high probability that the contents of adjacent packets resemble each other)
Sender: Adaptive Packetization
• Auto-correlation of the signal → partitioning into „chunks“
• speech content transition check: voiced/unvoiced
• packetization: 2 chunks/packet (header overhead)
[Figure: 110 ms of speech]
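The sender steps above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation behind these slides: the function names, the normalized-autocorrelation score, the voicing threshold of 0.5, and the fixed fallback chunk length for unvoiced segments are all hypothetical choices. Only the pitch range (30 to 160 samples) and the rule of two chunks per packet come from the slides.

```python
import math

P_MIN, P_MAX = 30, 160  # pitch period range in samples (from the slides)


def estimate_pitch_period(frame, p_min=P_MIN, p_max=P_MAX):
    """Return (lag, score): the lag with the highest normalized autocorrelation.

    Only lags for which the frame holds at least two full periods are tried.
    """
    best_lag, best_score = p_min, float("-inf")
    for lag in range(p_min, p_max + 1):
        if len(frame) < 2 * lag:
            break
        n = len(frame) - lag
        num = sum(frame[i] * frame[i + lag] for i in range(n))
        e1 = sum(frame[i] ** 2 for i in range(n))
        e2 = sum(frame[i + lag] ** 2 for i in range(n))
        den = math.sqrt(e1 * e2)
        score = num / den if den > 0 else 0.0
        # Small margin so that ties between a period and its multiples
        # resolve to the shorter lag.
        if score > best_score + 1e-9:
            best_lag, best_score = lag, score
    return best_lag, best_score


def chunk_signal(signal, voiced_threshold=0.5):
    """Partition the signal into (start, length) chunks, one pitch period each.

    Assumption: segments scoring below voiced_threshold are treated as
    unvoiced and get a fixed chunk length of P_MAX.
    """
    chunks, pos = [], 0
    while pos < len(signal):
        lag, score = estimate_pitch_period(signal[pos:pos + 2 * P_MAX])
        length = lag if score >= voiced_threshold else P_MAX
        length = min(length, len(signal) - pos)
        chunks.append((pos, length))
        pos += length
    return chunks


def packetize(chunks, chunks_per_packet=2):
    """Group consecutive chunks into packets (two chunks per packet)."""
    return [chunks[i:i + chunks_per_packet]
            for i in range(0, len(chunks), chunks_per_packet)]
```

For a synthetic 100 Hz tone sampled at 8 kHz, `chunk_signal` yields chunks of 80 samples, and `packetize` pairs them into packets of 160 samples. Real speech would need a more robust pitch detector than this sketch.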
Packet Size Frequency Distribution
• Packet size is now dependent on the speaker‘s pitch (range: p_min = 30, 2p_max = 320 samples; 4...40 ms)
[Figure: relative frequency l·n/L over packet length l [samples] and pitch frequency f_S/p_v [Hz] (relative frequency n weighted with size l)]
Relative Packet Header Overhead
Typical value for IP telephony: O = 20%
• fixed packetization interval: 160 samples [20 ms]
• RTP/UDP/IP per-packet overhead: o = 40 octets
AP/C (mean pitch period: p_v):
Speaker        Estimated overhead o/(o+2p_v) [%]    Measured overhead O [%]
Male low       20.16                                20.14
Male high      22.97                                22.83
Female low     25.72                                24.84
Female high    28.62                                27.98
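The overhead figures can be reproduced with a few lines of arithmetic. The sketch below assumes one octet per sample, which is what makes 40 header octets on 160 samples come out at 20%; the function names and the example pitch period of 80 samples are hypothetical.

```python
def relative_header_overhead(header_octets, payload_octets):
    """Relative overhead O = o / (o + payload)."""
    return header_octets / (header_octets + payload_octets)


def apc_estimated_overhead(mean_pitch_period, header_octets=40,
                           octets_per_sample=1, chunks_per_packet=2):
    """Estimated AP/C overhead o / (o + 2 * p_v).

    Assumption: one octet per sample, so a packet of two chunks carries
    2 * p_v payload octets on average.
    """
    payload = chunks_per_packet * mean_pitch_period * octets_per_sample
    return relative_header_overhead(header_octets, payload)


# Fixed packetization: 160 samples per packet, 40 octets RTP/UDP/IP header.
fixed = relative_header_overhead(40, 160)

# AP/C with a hypothetical mean pitch period of 80 samples (100 Hz at 8 kHz).
adaptive = apc_estimated_overhead(80)
```

A shorter mean pitch period (higher-pitched speaker) gives smaller packets and therefore a larger relative overhead, which matches the trend in the table.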
Receiver: Concealment
[Figure: left packet (chunks c11, c12), lost packet of length l (chunks c21, c22), right packet (chunks c31, c32). Boundary info gives p(c21). The adjacent chunks c12 and c31 are resampled with the factors k = p(c12) / p(c21) and k = p(c31) / p(c22) to form the replacement packet (chunks ĉ21, ĉ22) between the left and right packets.]
• resampling: no specific distortions introduced (as occur, e.g., with PWR)
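The replacement step in the figure can be illustrated with a short sketch. It assumes that the lengths of the two lost chunks are known at the receiver from the boundary information, and it uses plain linear interpolation as the resampling method; the slides do not specify the interpolation, and the function names are hypothetical.

```python
def resample_linear(chunk, target_len):
    """Resample chunk to target_len samples by linear interpolation.

    The first and last samples of the input map to the first and last
    samples of the output (an assumption of this sketch).
    """
    if target_len <= 0 or not chunk:
        return []
    if len(chunk) == 1 or target_len == 1:
        return [chunk[0]] * target_len
    step = (len(chunk) - 1) / (target_len - 1)
    out = []
    for j in range(target_len):
        x = j * step
        i = min(int(x), len(chunk) - 2)
        frac = x - i
        out.append(chunk[i] * (1 - frac) + chunk[i + 1] * frac)
    return out


def conceal_lost_packet(left_chunk, right_chunk, len_first, len_second):
    """Build a replacement packet for a lost packet with two chunks.

    left_chunk is the last chunk of the left packet (c12), right_chunk the
    first chunk of the right packet (c31). len_first and len_second are the
    lengths of the lost chunks (c21, c22).
    """
    return (resample_linear(left_chunk, len_first)
            + resample_linear(right_chunk, len_second))
```

For example, `conceal_lost_packet(c12, c31, 82, 88)` returns 170 samples: c12 stretched or compressed to 82 samples, followed by c31 at 88 samples.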
Receiver: Concealment (contd.)
[Figure: three waveform panels over sample number n: sender signal s(n), receiver signal s~(n) with a lost packet, and concealed receiver signal ŝ(n). Legend: c: „chunk“ number; p(c): pitch period; L(c,c-1): packet size.]
Discussion
• „characteristic“ information: 2 octets (own and following intra-packet boundary)
• additional delay (buffering):
  sender: [p_max, 2p_max - p_min] = 20...36 ms
  receiver: [p_min, 2p_max] = 4...40 ms (on loss only)
• computational complexity/processing delay: low
• backwards compatible with existing tools
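The delay ranges follow directly from the pitch period limits and the 8 kHz sampling rate used in the test signals. A short check, with variable names chosen here for illustration:

```python
FS = 8000                # sampling rate in Hz (8 kHz test signals)
P_MIN, P_MAX = 30, 160   # pitch period limits in samples


def samples_to_ms(n, fs=FS):
    """Convert a sample count to milliseconds."""
    return 1000.0 * n / fs


# Sender buffering: between p_max and 2*p_max - p_min samples.
sender_delay_ms = (samples_to_ms(P_MAX), samples_to_ms(2 * P_MAX - P_MIN))

# Receiver buffering (on loss only): between p_min and 2*p_max samples.
receiver_delay_ms = (samples_to_ms(P_MIN), samples_to_ms(2 * P_MAX))
```

This gives 20 to 36.25 ms at the sender and 3.75 to 40 ms at the receiver, which the slide rounds to 20...36 ms and 4...40 ms.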
Subjective Test: Test Procedure
• Four signals (different speakers), PCM 16 bit linear, 8 kHz
• comparison with „silence substitution“ and PWR
• random, yet isolated packet losses
• 40 test conditions: 4 speakers x (3 algorithms x 3 loss rates + original)
• thirteen non-expert listeners judged on a MOS scale
• Anchoring: Original = 5, „Worst Case“ = 1 (50% loss)
• test conditions in rapid, random sequence
Subjective Test: Results
[Figure: two surface plots of MOS (1 to 5) over sample loss rate [%] and pitch frequency f_S/p_v [Hz]. Panels: MOS: Silence Substitution; MOS: Pitch Waveform Replication.]
Subjective Test: Results (contd.)
[Figure: two surface plots over sample loss rate [%] and pitch frequency f_S/p_v [Hz]. Panels: MOS: Adaptive Packetization / Concealment; Standard deviation of MOS (AP/C).]
Conclusions
• Sender preprocessing (Adaptive Packetization) → pre-defined parts of the signal are dropped → less perceptible distortion, simple concealment
• low overhead (data, delay, processing, deployment)
• Future/ongoing work: frame-based codec support / integration; complement the end-to-end mechanism with queue management at routers (loss burstiness!)
• http://www.fokus.gmd.de/research/cc/glone/products/voice/apc