The Good, the Bad and the Muffled: the Impact of Different Degradations on Internet Speech
Anna Watson and M. Angela Sasse
Dept. of CS
University College London, London, UK
Proceedings of ACM Multimedia, November 2000

Introduction
• Multimedia conferencing is a growing area
• Well known that conferencing needs good-quality audio to be successful
• Much research has focused on improving delay, jitter, loss
• Many think bandwidth will fix it
  – But bandwidth has been increasing exponentially while quality has not!

Motivation
• Large field trial from 1998-1999
  – 13 UK institutions
  – 150 participants
• Recorded user Perceptual Quality (PQ)
• Matched with objective network performance metrics
• Suggested that network was not primary influence on PQ!

Example: Missing Words Throughout
• 1-hour meeting, UCL to Glasgow
• But loss usually far less than 5%!

Problems Cited
• Missing words
  – Likely causes: packet loss, poor speech detection, machine glitches
• Variation in volume
  – Likely causes: insufficient volume settings (mixer), poor headset quality
• Variation in quality among participants
  – Likely causes: high background noise, open microphone, poor headset quality
• Experiments to measure which of these affect quality

Outline
• Introduction
• Experiments
• Results
• Conclusions
Audioconference Fixed Parameters
• Robust Audio Tool (RAT)
  – Home brewed at UCL
  – Limited repair of packet loss
• Coded in DVI
• 40 ms sample size
• Use “repetition” to repair lost packets

Audioconference Variables
• Packet loss rates
  – 5% (typical) and 20% (upper limit to tolerate)
• ‘Bad’ microphone
  – Hard to measure, but Altai A087F
• Volume differences
  – Quiet, normal, loud through “pilot studies”
• Echo
  – From open microphone

Measurement Method: PQ
• Not ITU (see previous paper)
• Subjective through “slightly” labeled scale
• “Fully subscribe that … speech quality should not be treated as a unidimensional phenomenon …”
• But …

Measurement Methods: Physiological
• User “cost”
  – Fatigue, discomfort, physical strain
• Measure user stress
  – Using a sensor on the finger
• Blood Volume Pulse (BVP)
  – Decreases under stress
• Heart Rate (HR)
  – Increases under stress (“Fight” or “Flight”)

Experimental Material
• Take script from ‘real’ audioconference
• Acted out by two males without regional accents
• Actors on Sun Ultra workstations on a LAN
  – Only audio recorded
  – 16-bit samples
  – Used RAT
  – Used silence deletion (hey, proj1!)
• Vary volume and feedback (speakers to mic)
• Split into 2-minute files, 8 kHz, 40 ms packets
• Repetition when loss

Experimental Conditions
• Reference – non-degraded
• 5% loss – both voices, with repetition
• 20% loss – both voices, with repetition
• Echo – one had open mic, not headset
• Quiet – one recorded at low volume, other normal
• Loud – one recorded at high volume, other normal
• Bad mic – one had low-quality mic, other normal
• Determined “intelligibility” not affected by the above
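The slides name "repetition" as the repair for lost packets, with 8 kHz audio in 40 ms packets. A minimal sketch of that idea follows; it is not RAT's actual code. The function names, the independent random loss model, and the use of silence when the very first packet is lost are all assumptions made for illustration.

```python
import random

SAMPLE_RATE_HZ = 8000   # from the slides: 8 kHz audio
PACKET_MS = 40          # from the slides: 40 ms packets
SAMPLES_PER_PACKET = SAMPLE_RATE_HZ * PACKET_MS // 1000  # 320 samples


def packetize(samples):
    """Split a list of samples into fixed-size packets (the last may be short)."""
    return [samples[i:i + SAMPLES_PER_PACKET]
            for i in range(0, len(samples), SAMPLES_PER_PACKET)]


def drop_packets(packets, loss_rate, rng):
    """Mark each packet as lost (None) independently with probability loss_rate.

    Independent loss is an assumption of this sketch; real networks often
    lose packets in bursts.
    """
    return [None if rng.random() < loss_rate else p for p in packets]


def repair_by_repetition(received):
    """Replace each lost packet with a copy of the last packet played out.

    If the first packet is lost there is nothing to repeat, so silence is
    substituted (an assumption of this sketch).
    """
    repaired = []
    last = None
    for packet in received:
        if packet is None:
            packet = list(last) if last is not None else [0] * SAMPLES_PER_PACKET
        repaired.append(packet)
        last = packet
    return repaired


if __name__ == "__main__":
    rng = random.Random(0)                        # fixed seed for repeatability
    audio = list(range(SAMPLES_PER_PACKET * 50))  # placeholder signal, 2 seconds
    received = drop_packets(packetize(audio), 0.05, rng)
    repaired = repair_by_repetition(received)
    print("lost packets:", sum(p is None for p in received))
    print("packets after repair:", len(repaired))
```

Because a repeated packet rarely lines up with the waveform it replaces, this kind of repair is a plausible source of the 'fuzzy' and 'buzzy' descriptions reported later in the slides.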
Procedure
• Each listened to seven 2-minute test files twice
  – Played with audio tool
• First file had no degradations (“Perfect”)
  – Users adjusted volume
  – Were told it was “best”
• Randomized order of files
  – Except “perfect” was 1st and 8th
  – So, 7 conditions heard once, then again in another order
• Baseline physiological readings for 15 min
• When done, use 1-100 slider and explain rating (tape-recorded)

Subjects
• 24 subjects
  – 12 men
  – 12 women
• All had good hearing
• Age 18-28
• None had previous experience in Internet audio or videoconferencing

Outline
• Introduction
• Experiments
• Results
• Conclusions

Quality Under Degradation
• Statistically significant?

Statistical Significance Tests
• ANOVA test
  – For comparing means of two groups: first hearing and second hearing
  – No statistical difference between the two groups
• Analysis of variance
  – Degradation effect significant
  – Reference and 5% loss the same
  – Reference and Quiet the same
  – Reference and all others are different
  – 5% loss and Quiet the same
  – 20% loss and Echo and Loud the same

Physiological Results: HR
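The Statistical Significance Tests slide relies on analysis of variance to compare mean ratings across conditions. A minimal sketch of the one-way F statistic is below. The slides do not say which ANOVA variant the authors used (each subject heard every condition, so a repeated-measures design is plausible), so the plain between-groups form here is a simplifying assumption, and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers.

    Plain one-way (between-groups) ANOVA. Raises ZeroDivisionError if there
    is no within-group variation at all.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need >= 2 non-empty groups and more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of each observation around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Arbitrary toy numbers chosen for easy arithmetic, NOT data from the study.
    toy = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
    print(one_way_anova_f(toy))
```

A large F relative to the F distribution with the returned degrees of freedom indicates that at least one condition mean differs; pairwise statements such as "Reference and 5% loss the same" need a follow-up comparison that this sketch does not cover.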
Physiological Results: BVP
• Statistically significant?

Physiological Statistical Significance Tests
• Bad mic, loud and 20% loss all significantly more stressful than quiet and 5% loss
• Echo significantly more stressful than quiet in the HR data only
• Contrast to quality!
  – Mic worse than 20% loss
  – Least stressful were quiet and 5% loss

Qualitative Results
• Asked subjects to describe the reason for each rating
• Could clearly identify
  – quiet, loud and echo
• Bad mic
  – ‘distant’, ‘far away’ or ‘muffled’
  – ‘on the telephone’, ‘walkie-talkie’ or ‘in a box’

Qualitative Results of Loss
• 5% loss
  – ‘fuzzy’ and ‘buzzy’ (13 of 24 times)
    + From the waveform changing in the missing packet and that change not being in the repeated packet
  – ‘robotic’, ‘metallic’, ‘electronic’ (7 times)
• 20% loss
  – ‘robotic’, ‘metallic’, ‘digital’, ‘electronic’ (15 times)
  – ‘broken up’ and ‘cutting out’ (10 times)
  – ‘fuzzy’ and ‘buzzy’ infrequently (2 times)
• 5 said ‘echo’, 10 described major volume changes
  – Could not reliably identify the cause of the degradation

Discussion
• 5% loss is different from the reference condition (despite stats) because of descriptions
  – But subjects cannot identify it well
  – Need a tool to identify impairments
• 20% loss is worse than bad mic based on quality, but is the same based on physiological results
  – Need to combine physiological and subjective measures
• Methodology of using field trials to design controlled experiments can help understand media quality issues

Conclusion
• Audio quality degradation not primarily from loss
  – Volume, mic and echo are worse
  – And these are easy to fix! Educating users is harder.
• By getting descriptions, should be easier to allow users to diagnose problems
  – Ex: ‘fuzzy’ or ‘buzzy’ points to repetition for repair
• Volume changes harder
  – Could be reflected back to the user
  – Could use an expert system to ensure a certain quality before a participant is allowed in
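The Conclusion suggests that user descriptions could help diagnose problems, for example mapping 'fuzzy' or 'buzzy' to repetition repair. A minimal sketch of such a lookup follows, using only the descriptors listed on the Qualitative Results slides. The table name, function name, cause labels, and simple substring matching are assumptions for illustration, not a tool described by the authors.

```python
# Descriptor -> likely degradation, taken from the Qualitative Results slides.
# The cause labels are this sketch's own wording.
DESCRIPTOR_HINTS = {
    "fuzzy": "packet loss repaired by repetition",
    "buzzy": "packet loss repaired by repetition",
    "robotic": "packet loss",
    "metallic": "packet loss",
    "digital": "packet loss",
    "electronic": "packet loss",
    "broken up": "high packet loss (20% condition)",
    "cutting out": "high packet loss (20% condition)",
    "distant": "bad microphone",
    "far away": "bad microphone",
    "muffled": "bad microphone",
    "on the telephone": "bad microphone",
    "walkie-talkie": "bad microphone",
    "in a box": "bad microphone",
}


def suggest_causes(description):
    """Return likely causes for a free-text description, without duplicates.

    Matching is a case-insensitive substring search, which is deliberately
    naive: it is a sketch, not a validated diagnostic.
    """
    text = description.lower()
    causes = []
    for descriptor, cause in DESCRIPTOR_HINTS.items():
        if descriptor in text and cause not in causes:
            causes.append(cause)
    return causes


if __name__ == "__main__":
    print(suggest_causes("It sounded fuzzy and a bit robotic"))
    print(suggest_causes("Muffled, like he was in a box"))
```

The slides note that listeners could not reliably name the cause themselves, which is why a lookup like this starts from their vocabulary and not from a technical label.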
Future Work
• Delay and jitter compared with other degradations
• Interactive environments rather than just listening
  – Ex: echo probably worse
• Combination effects
  – Ex: bad mic plus too loud

Evaluation of Science?
• Category of Paper
• Space devoted to Experiments?
• Good Science?
  – 1-10
  – See if scale meshes with amount of experimental validation