The Good, the Bad and the Muffled: the Impact of Different Degradations on Internet Speech
Anna Watson and M. Angela Sasse
Department of CS
University College London, London, UK
Proceedings of ACM Multimedia November 2000

Introduction
• Multimedia conference is a growing area
• Well-known that need good quality audio for conferencing to be successful
• Much research focused on improving delay, jitter, loss
• Many think bandwidth will fix
  – But bandwidth has been increasing exponentially while quality not!

Motivation
• Large field trial from 1998-1999
  – 13 UK institutions
  – 150 participants
• Recorded user Perceptual Quality
  – Beginning, Middle, End
  – (Why not only at end?)
  – (Why not continuously?)
• Matched with objective network performance metrics
• Suggested that network was not primary influence on PQ!

Example: Missing Words Throughout
- 1 hour Meeting
- UCL to Glasgow
- SuperJanet
- RTP reports 2-5 secs
• But loss usually far less than 5%!

Problems Cited
• Missing words
  – Likely causes: packet loss, poor speech detection, machine glitches
• Variation in volume
  – Likely causes: insufficient volume settings (mixer), poor headset quality
• Variation in quality among participants
  – Likely causes: high background noise, open microphone, poor headset quality
• Experiments to measure which affect quality

Outline
• Introduction
• Experiments
• Results
• Conclusions
Audioconference Fixed Parameters
• Robust Audio Tool (RAT)
  – Home brewed in UCL
  – Limited repair of packet loss
• Coded in DVI
• 40 ms sample size
• Use “repetition” to repair lost packets
  – Good for small (20ms)
  – Not as good for large (80ms)
  – (Why?)

Audioconference Variables
• Packet loss rates
  – 5% (typical of mcast) and 20% (upper limit to tolerate)
• ‘Bad’ microphone
  – Hard to measure, but Altai A087F
• Volume differences
  – Quiet, normal, loud through “pilot studies”
  – (Why can’t users just adjust volume?)
• Echo
  – From open microphone
  – (What is this?)

Measurement Method: Perceptual Quality
• Not ITU standard (paper at previous ACM MM)
  – Text labels bad
  – Built for Television quality
• Subjective through “slightly” labeled scale
• “Fully subscribe that … speech quality should not be treated as a unidimensional phenomenon …”
• But …

Measurement Method: Physiological
• User “cost”
  – Fatigue, discomfort, physical strain
• Measure user stress
  – Using a sensor on the finger
• Blood Volume Pulse (BVP)
  – Decreases under stress
• Heart Rate (HR)
  – Increases under stress (“Fight” or “Flight”)

Experimental Material
• Take script from ‘real’ audioconference
• Act-out by two males without regional accents
• Actors on Sun Ultra workstations on a LAN
  – Only audio recorded
  – 16 bit samples
  – Used RAT
  – Used silence deletion (hey, project 1!)
• Vary volume and feedback (speakers to mic)
• Split into 2-minute files, 8 kHz, 40 ms packets
• Repetition when loss

Experimental Conditions
• Reference – non-degraded
• 5% loss – both voices, with repetition
• 20% loss – both voices, with repetition
• Echo – one had open mic, not headset
• Quiet – one recorded low volume, other norm
• Loud – one recorded high volume, other norm
• Bad mic – one had low quality mic, other norm
! Determined “Intelligibility” not affected by above
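
The “repetition” repair named in the slides can be sketched roughly as follows. This is a minimal illustration, not RAT's actual code: the function name `repair_by_repetition`, the use of `None` to mark a lost packet, and the choice of silence when no packet has been received yet are all assumptions made for the example. Only the 8 kHz rate and 40 ms packet size come from the slides.

```python
from typing import List, Optional, Sequence

SAMPLE_RATE_HZ = 8000  # 8 kHz audio, as stated for the test files
PACKET_MS = 40         # 40 ms packets, as stated for the test files
SAMPLES_PER_PACKET = SAMPLE_RATE_HZ * PACKET_MS // 1000  # 320 samples


def repair_by_repetition(
    packets: Sequence[Optional[Sequence[int]]],
) -> List[List[int]]:
    """Fill each lost packet (None) with a copy of the last received packet.

    Hypothetical helper: if nothing has been received yet, silence is
    played instead (an assumption, not something the slides specify).
    """
    last_received = [0] * SAMPLES_PER_PACKET  # silence until first arrival
    repaired: List[List[int]] = []
    for packet in packets:
        if packet is None:
            # Lost packet: replay the most recent audio we actually have.
            repaired.append(list(last_received))
        else:
            last_received = list(packet)
            repaired.append(list(packet))
    return repaired


if __name__ == "__main__":
    a = [1] * SAMPLES_PER_PACKET
    b = [2] * SAMPLES_PER_PACKET
    stream = [a, None, b, None, None]  # None marks a lost packet
    out = repair_by_repetition(stream)
    print([p[0] for p in out])
```

A longer packet means a longer stretch of stale audio is replayed for each loss, which is one plausible reason the slides rate repetition as good for 20 ms packets and not as good for 80 ms packets.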
Procedure
• Each listened to seven 2-minute test files twice
  – Played with audio tool
• First file had no degradations (“Perfect”)
  – Users adjusted volume
  – Were told it was “best”
• Randomized order of files
  – Except “perfect” was 1st and 8th
  – So, 7 conditions heard once, then again in another order
• Baseline physiological readings for 15 min
• When done, use 1-100 slider and explain rating (tape-recorded)

Subjects
• 24 subjects
  – 12 men
  – 12 women
• All had good hearing
• Age 18 – 28
• None had previous experience in Internet audio or videoconferencing

Outline
• Introduction
• Experiments
• Results
• Conclusions

Quality Under Degradation
• Statistically significant?

Statistical Significance Tests
• Anova Test
  – For comparing means of two groups: first hearing and second hearing
  – No statistical difference between the two groups
• Analysis of variance
  – Degradation effect significant
    + Reference and all others are different
  – Reference and 5% loss the same
  – Reference and Quiet the same
  – 5% Loss and Quiet the same
  – 20% Loss and Echo and Loud the same

Physiological Results: HR
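
The comparison of group means behind the Anova test can be sketched as a plain one-way analysis of variance. This is only an illustration under stated assumptions: the function name `one_way_anova_f` is hypothetical, the ratings below are invented placeholders rather than data from the study, and the study's within-subject design may well have called for a repeated-measures analysis instead.

```python
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """Return the one-way ANOVA F statistic for two or more groups.

    F = (between-group variance) / (within-group variance).
    A large F suggests the group means differ by more than chance.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual ratings around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    return (ss_between / df_between) / (ss_within / df_within)


if __name__ == "__main__":
    # Invented placeholder ratings, NOT taken from the study.
    first_hearing = [1.0, 2.0, 3.0]
    second_hearing = [2.0, 3.0, 4.0]
    print(one_way_anova_f([first_hearing, second_hearing]))
```

The F value would then be compared against the F distribution with (k - 1, N - k) degrees of freedom to decide significance; that lookup is left out here to keep the sketch dependency-free.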
Physiological Results: BVP
• Statistically significant?

Physiological Statistical Significance Tests
• Bad mic, loud and 20% loss all significantly more stressful than quiet and 5% loss
• Echo significantly more stressful than quiet in the HR data only
• Contrast to quality!
  – Bad Mic worse than 20% loss
  – Least stressful were quiet and 5% loss

Qualitative Results
• Asked subjects to describe why each rating
• Could clearly identify
  – quiet, loud and echo
• Bad mic
  – ‘distant’, ‘far away’ or ‘muffled’
  – ‘on the telephone’, ‘walkie-talkie’ or ‘in a box’

Qualitative Results of Loss
• 5% loss
  – ‘fuzzy’ and ‘buzzy’ (13 of 24 times)
    + From waveform changing in the missing packet and not being in the repeated packet
  – ‘robotic’, ‘metallic’, ‘electronic’ (7 times)
• 20% loss
  – ‘robotic’, ‘metallic’, ‘digital’, ‘electronic’ (15 times)
  – ‘broken up’ and ‘cutting out’ (10 times)
  – ‘fuzzy’ and ‘buzzy’ infrequently (2 times)
• 5 said ‘echo’, 10 described major volume changes
  – Could not reliably see the cause of the degradation

Discussion
• 5% loss is different than reference condition (despite stats) because of descriptions
  – But subjects cannot identify it well
  – Need a tool to identify impairments
• 20% loss is worse than bad mic based on quality, but is the same based on physiological results
  – Need to combine physiological and subjective
• Methodology of field trials to design controlled experiments can help understand media quality issues

Conclusion
• Audio quality degradation not primarily from loss
  – Volume, mic and echo are worse
  – And these are easy to fix! Educating users harder.
• By getting descriptions, should be easier to allow users to diagnose problems
  – Ex: ‘fuzzy’ or ‘buzzy’ to repetition for repair
• Volume changes harder
  – Could be reflected back to the user
  – Could do expert system to make sure certain quality before being allowed in
Future Work
• Delay and jitter compared with other degradations
• Interactive environments rather than just listening
  – Ex: echo probably worse
• Combination effects
  – Ex: bad mic plus too loud