Effect of Telephone-Line Transmission and Digital Audio Format on Formant Tracking Measurements


  1. Effect of Telephone-Line Transmission and Digital Audio Format on Formant Tracking Measurements (27.07.2011)
     Christoph Meinerz, Landeskriminalamt Brandenburg, Germany, christoph.meinerz@gmx.de
     Herbert Masthoff, Department of Phonetics, University of Trier, Germany, masthoff@uni-trier.de

  2. Introduction - Formants, Speaker ID and Audio Compression
     Method - Experimental Setup, Hardware, Software
     Results - Formant Shift
     Conclusion - What to do

  3. Introduction
     • (revival of) reports of formant measurements for speaker identification (e.g. Nolan/Grigoras, 2005; Becker et al., 2007; Jessen et al., 2010; Simpson/French, 2010)
     • reports of effects of telephone transmission and lossy compression on acoustic parameters (Künzel, 2001; Grasmück/Köster, 2004; Gonzalez et al., 2003)
     • the problem is real: telephone intercepts in low-bit-rate .mp3!
     ➡ results of a preliminary study: effects of telephone-line transmission and lossy low-bit-rate audio compression on LPC-based formant measurement, and no intra-speaker variation

  4. Method I: Experimental set-up - "The Plan" (diagram)

  5. Method II: Audio Formats and Hardware
     • mike (tech specs: Sound Studio UoT; mike: Neumann M147 Tube; soundcard: RME Hammerfall)
       - mike .wav: PCM, 44.1 kHz, 705 kbps
       - mike .wma: CBR, 22. kHz, 20 kbps
       - mike .mp3: CBR, 8 kHz, 8 kbps
     • tel (tech specs: "Re-Tel" - Tel. Rec. Adapter 157; soundcard: MBox 2 Pro)
       - tel .wav: PCM, 44.1 kHz, 705 kbps
       - tel .wma: CBR, 22. kHz, 20 kbps
       - tel .mp3: CBR, 8 kHz, 8 kbps
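The bit rates listed for the three formats can be related by simple arithmetic. The sketch below assumes 16-bit mono PCM, which the slide does not state but which is consistent with 705 kbps at 44.1 kHz. The function names are illustrative and not taken from the presentation.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample=16, channels=1):
    """Uncompressed PCM bit rate in kbps (16-bit mono is an assumption)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2.0


def compression_ratio(reference_kbps, coded_kbps):
    """How many times smaller the coded stream is than the reference."""
    return reference_kbps / coded_kbps


if __name__ == "__main__":
    wav_kbps = pcm_bit_rate_kbps(44_100)
    print(f".wav PCM: {wav_kbps:.1f} kbps")
    # Bit rates of the two lossy formats as listed on the slide.
    for label, kbps in ((".wma CBR", 20), (".mp3 CBR", 8)):
        ratio = compression_ratio(wav_kbps, kbps)
        print(f"{label}: {kbps} kbps, about {ratio:.0f}x smaller")
    print(f"8 kHz sampling keeps content below {nyquist_hz(8_000):.0f} Hz")
```

Under that assumption the 8 kbps .mp3 carries roughly 1/88 of the PCM data rate, and its 8 kHz sampling rate limits the signal to frequencies below 4 kHz.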

  6. Results I: Shift of average formant frequency according to format (males). Chart of F1, F2 and F3 for mike .wav, mike .wma, mike .mp3, tel .wav, tel .wma and tel .mp3; vertical axis 0 to 2,400.

  7. Results II: Shift of average formant frequency according to format (females). Chart of F1, F2 and F3 for mike .wav, mike .wma, mike .mp3, tel .wav, tel .wma and tel .mp3; vertical axis 0 to 2,400.

  8. Results III: Mean shift of average formant frequency according to format, in % (all). Chart of F1, F2 and F3 for mike .wav, mike .wma, mike .mp3, tel .wav, tel .wma and tel .mp3; vertical axis 0 to 2,400. Percentage labels in extraction order (assignment to individual bars not recoverable): 100, 98, 82, 83, 80, 77, 100, 98, 90, 87, 83, 82, 104, 100, 98, 102, 83, 98.
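A percentage figure of this kind can be obtained by dividing each condition's mean formant frequency by a reference value. The sketch below assumes that the mike .wav recording is the 100 % reference, as the three 100 % labels suggest. The function name and the Hz values are hypothetical and are not data from the study.

```python
def percent_of_reference(mean_hz, reference="mike .wav"):
    """Express each condition's mean formant frequency as % of the reference."""
    ref = mean_hz[reference]
    return {
        condition: round(100.0 * hz / ref)
        for condition, hz in mean_hz.items()
    }


# Hypothetical mean F3 values in Hz, invented for illustration only.
example_f3 = {"mike .wav": 2000.0, "mike .mp3": 1700.0, "tel .mp3": 1600.0}
print(percent_of_reference(example_f3))
```

A value below 100 then corresponds to a downward shift relative to the reference recording, and a value above 100 to an upward shift.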

  9. Results IV: Sonagraphic symptoms (top: mike .wav, bottom: mike .mp3), panel label 2

  10. Results V: Sonagraphic symptoms (top: tel .wav, bottom: tel .mp3), panel label 2

  11. Results VI: Sonagraphic symptoms (top: mike .wav, bottom: mike .mp3), panel label 1

  12. Results VII: Sonagraphic symptoms (top: tel .wav, bottom: tel .mp3), panel label 1

  13. Summary
     • shift of formant frequencies (all)
       - F3: downward ≈ 2 - 23 %
       - F2: downward ≈ 1 - 17 %
       - F1: mike downward ≈ 1 - 16 %; tel upward ≈ 2 - 4 % (.wav + .wma); tel downward ≈ 1 % (.mp3)
     • highest amount of shift in tel .mp3, 8 kbps
     • telephone line alone produces a shift of F2 and F3 ≈ mike .mp3
     • sonagraphic and auditory symptoms
       - spectral cancellations - "the moth"
       - "musical noise" effect

  14. Conclusion
     • results confirm those already reported (e.g. Becker et al., 2011!)
     • consider shifting effects when measuring formants and in formant-related automatic speaker recognition (LPC)
     • include a larger population for statistical significance - possibly detect a "critical" bit rate
     • possibly cross-check with FFT-based measurements
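For readers unfamiliar with LPC-based formant measurement, the following is a minimal sketch of one textbook variant: the autocorrelation method solved with the Levinson-Durbin recursion, followed by peak-picking on the LPC spectral envelope. It is not the procedure used in the study, which references Praat. All function and parameter names are hypothetical, and the code uses only the Python standard library.

```python
import cmath
import math


def autocorrelation(samples, max_lag):
    """Autocorrelation r[0..max_lag] of a real-valued frame."""
    n = len(samples)
    return [
        sum(samples[i] * samples[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns [1, a1, ..., a_order]."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient of stage i
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        error *= 1.0 - k * k
    return a


def lpc_envelope_peaks(frame, sample_rate, order, step_hz=5.0):
    """Formant candidates: local maxima of the LPC envelope 1/|A(f)|."""
    n = len(frame)
    # Hamming window to reduce edge effects before the autocorrelation.
    windowed = [
        s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    a = levinson_durbin(autocorrelation(windowed, order), order)
    steps = int((sample_rate / 2.0) / step_hz)
    freqs = [i * step_hz for i in range(steps + 1)]
    envelope = []
    for f in freqs:
        w = 2.0 * math.pi * f / sample_rate
        a_of_f = sum(a[k] * cmath.exp(-1j * w * k) for k in range(order + 1))
        envelope.append(1.0 / abs(a_of_f))
    # Keep interior grid points that are higher than both neighbours.
    return [
        freqs[i]
        for i in range(1, len(freqs) - 1)
        if envelope[i - 1] < envelope[i] >= envelope[i + 1]
    ]
```

Applying the same function to time-aligned frames of two versions of a recording (for example mike .wav and tel .mp3) would expose a shift in the envelope peaks directly. An FFT spectrum of the same frame could then serve as the cross-check suggested on the slide.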

  15. Moth-Zilla (Becker et al., Vienna 2011)
     Thank you for your attention!

  16. References
     Becker, T. et al.: Forensic speaker verification using formant features and Gaussian Mixture Models. Interspeech 2008 Special Session: Forensic Speaker Recognition – Traditional and Automatic Approaches, Brisbane.
     Boersma, P./D. Weenink: Praat: doing phonetics by computer [Computer program]. Version 5.2.17, retrieved 26 March 2011 from http://www.praat.org/
     Gonzalez, J. et al.: Acoustic analysis of pathological voices compressed with MPEG System. Journal of Voice, 17, 2003, 126-139.
     Grasmück, C./J.-P. Köster: Die Auswirkung von mp3 und ATRAC-Kompression auf sprechertypische Parameter des Sprachsignals. In: Nolte, B.: Proceedings „Schall und Schwingungen in sensibler Umgebung“, 2004, Bonn, 126-132.
     Harrison, P.: Formant measurement errors for multiple synthetic speakers. IAFPA Annual Conference 2010, Trier.
     Jessen, M. et al.: Correlation between long-term formant measurements and automatic speaker recognition in forensic case material. IAFPA Annual Conference 2010, Trier.
     Künzel, H.J.: Beware of the telephone effect: the influence of telephone transmission on the measurement of formant frequencies. Forensic Linguistics, 8, 2001, 80-99.
     Nolan, F./C. Grigoras: A case for formant analysis in forensic speaker identification. International Journal of Speech, Language and the Law, 12, 2005, 143-173.
     Simpson, S./P. French: Testing the speaker discrimination ability of formant measurements. IAFPA Annual Conference 2010, Trier.
