
Long-Term Formant Distribution as a forensic-phonetic feature



  1. Long-Term Formant Distribution as a forensic-phonetic feature. Michael Jessen and Timo Becker, BKA, Department of Speaker Identification and Audio Analysis (KT54). 3aSC4 Special Session on Forensic Voice Comparison and Forensic Acoustics @ ASA 2nd Pan-American/Iberian Meeting on Acoustics, Cancún, México, 15–19 November, 2010. http://cancun2010.forensic-voice-comparison.net

  2. Structure
     1. Long-Term Formant Distribution: measurement methods and background
     2. LTF and body height
     3. LTF measurement consistency
     4. Language dependence of LTF
     5. Recognition performance based on LTF and automatic speaker recognition
     6. Conclusions
     Nov 17, 2010. Long-Term Formant (LTF) Distribution as a forensic-phonetic feature

  3. Long-Term Formant (LTF) Distribution: terminology. Long-Term Formant Distribution (Nolan & Grigoras, 2005) is a global (as opposed to segment-based) representation of vowel formant frequencies over an entire recording of a speaker (or over a long stretch of speech from that speaker). Formant frequencies are extracted with an LPC-based formant tracker and manually corrected. No segmentation into sounds is performed. The resulting distribution of formant values (mainly F2 and F3) can be characterized in different ways. The simplest way is to calculate the average. More advanced ways include modeling the LTF distribution with Gaussian Mixture Models (GMM) (Becker et al., 2008).
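
The simplest characterization named on this slide, the average of the pooled formant values, can be sketched as follows. This is a minimal illustration and not the authors' tooling: the frame values are invented, and the function name `ltf_summary` is hypothetical. The standard deviation is added only as a second descriptive statistic; GMM modeling is not shown.

```python
from statistics import mean, stdev

# Hypothetical frame-wise formant values in Hz, one (F2, F3) pair per analysis
# frame, pooled over all retained vowel frames. The numbers are invented.
frames = [
    (1450.0, 2500.0),
    (1520.0, 2550.0),
    (1380.0, 2450.0),
    (1610.0, 2600.0),
    (1490.0, 2400.0),
]


def ltf_summary(frames):
    """Return the mean and sample standard deviation of F2 and F3.

    No segmentation into sounds is used: every frame contributes equally,
    which is what makes the representation "long-term".
    """
    f2 = [frame[0] for frame in frames]
    f3 = [frame[1] for frame in frames]
    return {
        "F2": {"mean": mean(f2), "sd": stdev(f2)},
        "F3": {"mean": mean(f3), "sd": stdev(f3)},
    }


summary = ltf_summary(frames)
print(summary["F2"]["mean"], summary["F3"]["mean"])
```

In a real analysis the list would hold thousands of frames exported from the corrected formant tracks rather than five hand-written pairs.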

  4. Illustration of the method. [Figure: speech file, unedited and edited, with an Excel excerpt.] Step 1: Editing the signal so that only vowels with clear formant structure remain. Step 2: LPC analysis and manual correction of the formant tracks. (Workshop LTF - BKA 2010 - M. Jessen)

  5. Step 3: Exporting the formant tracks F1, F2 and F3 for further processing. F1 is of limited reliability in telephone speech; F4 is unreliable or invisible. [Figure: formant values of F1, F2 and F3 (axis range 0 to 4000) plotted frame by frame, one value every 10 ms.]

  6. Example of the raw LTF distribution of a speaker, from the freeware Catalina Forensic Expert opinion v1.0 by Catalin Grigoras (U Colorado Denver). http://www.forensicav.ro/download/CatalinaManual3h.pdf

  7. Correlation between LTF and body height. Pearson's product-moment correlation, one-sided (less). F2: rho = -0.3157, p = 0.00204. F3: rho = -0.3391, p = 0.00098. Significant negative correlations between long-term formant frequencies (F2, F3) and body height. LTF means from 81 speakers in Pool 2010 (telephone-transmitted); thanks to Hanna Feiser for assistance. [Figures: LTF F2 [Hz] and LTF F3 [Hz], each plotted against body height [cm].]
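
As a rough check on how the reported coefficients relate to the sample size, the sketch below computes Pearson's r from paired values and converts a coefficient into the usual t statistic with n - 2 degrees of freedom. It assumes the slide's p-values come from this standard t-test, which the slide does not state; the function names are hypothetical, and the p-value itself is not computed because the Python standard library has no t distribution.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Toy data (invented): perfectly decreasing relation gives r = -1.
r_toy = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])

# Coefficient and sample size as reported on the slide for F2.
t_f2 = t_from_r(-0.315726857072528, 81)
print(round(t_f2, 2))
```

With 79 degrees of freedom, a t value of about -2.96 lies in the region where a one-sided p-value of roughly 0.002 would be expected, which is in line with the figure on the slide.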

  8. Measurement consistency across phoneticians: LT-F2. [Figure: LT-F2 [Hz] for recordings of 20 different speakers; legend: JF, AK, Bay, B1, B2.] LTF means from 20 speakers in the “Digs” dialect corpus under forensically realistic conditions. Pearson correlations (two-sided) between 0.84 and 0.95.

  9. Measurement consistency across phoneticians: LT-F3. [Figure: LT-F3 [Hz] for recordings of 20 different speakers; legend: JF, AK, Bay, B1, B2.] Pearson correlations (two-sided) between 0.98 and 0.99.

  10. Language influence on LTF. [Figure: LT-F3 [Hz] plotted against LT-F2 [Hz]; legend: Russian, German probe1, German probe2, German probe3, Albanian.] For these data, different languages do not differ in the LTF space that they occupy (one-way ANOVA [F(4,55) = 0.44; p = 0.77]). LTF means from three German speakers in the Digs dialect corpus and from Russian and Albanian speakers in case data under analogous conditions (spontaneous telephone speech).
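
Under the standard definitions of one-way ANOVA, the degrees of freedom in F(4,55) correspond to 5 groups (matching the five legend entries) and 60 values in total. The sketch below shows how the F statistic and both degrees of freedom are obtained; the group values are invented, the function name is hypothetical, and which formant measure entered the authors' test is not stated on the slide.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the individual values around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical LTF means in Hz for three groups of three speakers each.
toy_groups = [
    [1400.0, 1500.0, 1600.0],
    [1450.0, 1550.0, 1650.0],
    [1500.0, 1600.0, 1700.0],
]
f_value, df_b, df_w = one_way_anova(toy_groups)
print(f_value, df_b, df_w)

# Degrees of freedom implied by 5 groups and 60 values, as in F(4,55).
implied_df = (5 - 1, 60 - 5)
```

An F value well below 1, like the 0.44 reported on the slide, means the group means vary less than would be expected from the within-group spread alone.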

  11. Speaker recognition tests. 37 target trials and 803 non-target trials, involving 21 speakers from casework, comparing:
      - Baseline = a standard GMM-UBM automatic system
      - FGMM = GMM-modeled LTF

  12. New development at BKA: DiSC-Plot (Discrimination, Scatter, Correlation). [Figure: logLR (lnLR) from Long-Term Formant Distribution plotted against logLR (lnLR) from the automatic speaker recognition system, with target trials (same speaker) and non-target trials (different speakers) marked separately.]

  13. Conclusions: LTF analysis in forensic phonetics and acoustics (1)
      - ☺ LTF (F2 and F3) correlates negatively with body height (relevant for voice profiling).
      - ☺ LTF measurements have high consistency across phonetic experts.
      - ☺ Pending further tests and with some degree of caution, LTF statistics established for one language can be used across languages.
      - ☺ LTF (F2 and F3) does not differ much between different vocal effort levels. Vocal effort differences are a common problem in forensic material.

  14. Conclusions: LTF analysis in forensic phonetics and acoustics (2)
      - Performance of LTF analysis with classical evaluation measures (DET plots, APE plots, Cllr) is worse than performance of automatic speaker recognition, and fusion does not increase overall performance. But:
      - The tests so far are based predominantly on matching conditions; under mismatched conditions, the relative performance of LTF analysis might increase.
      - ☺ Detailed results in the DiSC plot show that LTF and automatic speaker recognition can make different errors: using both methods is a good safeguard against false conclusions.
      - Quite limited LR values in same-speaker comparisons (maximum about LR = 16 in case material for the tests so far): LTF cannot give very strong support for the same-speaker hypothesis.
      - ☺ Different-speaker comparisons can yield very low LR values: LTF can give very strong support for the different-speaker hypothesis.
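
For readers unfamiliar with Cllr, the sketch below computes the log-likelihood-ratio cost in its usual form: half the sum of the mean of log2(1 + 1/LR) over target trials and the mean of log2(1 + LR) over non-target trials. The LR values are invented and the function name is hypothetical; only the maximum same-speaker value of about 16 is taken from the slide.

```python
import math


def cllr(target_lrs, nontarget_lrs):
    """Log-likelihood-ratio cost for lists of likelihood ratios (not logs).

    Lower is better; a system that always outputs LR = 1 scores exactly 1.
    """
    target_cost = sum(math.log2(1.0 + 1.0 / lr) for lr in target_lrs)
    nontarget_cost = sum(math.log2(1.0 + lr) for lr in nontarget_lrs)
    return 0.5 * (
        target_cost / len(target_lrs) + nontarget_cost / len(nontarget_lrs)
    )


# Hypothetical LRs: same-speaker values are capped near 16, while
# different-speaker values can become very small.
toy_target = [16.0, 4.0, 0.5]
toy_nontarget = [0.001, 0.1, 2.0]

cost = cllr(toy_target, toy_nontarget)
uninformative = cllr([1.0], [1.0])

# The slide's maximum same-speaker LR expressed on the natural-log scale.
ln_lr_max = math.log(16.0)
print(round(cost, 3), uninformative, round(ln_lr_max, 2))
```

On the natural-log scale an LR of 16 is only about 2.77, whereas a different-speaker LR of 0.001 is about -6.9, which illustrates the asymmetry described in the last two conclusions.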

  15. References
      Becker, Timo, Michael Jessen and Catalin Grigoras (2008): Forensic speaker verification using formant features and Gaussian mixture models. Proceedings of Interspeech 2008, 1505-1508.
      Kirchhübel, Christin (2009): The effects of Lombard speech on vowel formant measurements. MSc thesis, University of York, UK.
      Moos, Anja (2008): Forensische Sprechererkennung mit der Messmethode LTF (long-term formant distribution). MA thesis, Universität des Saarlandes. www.psy.gla.ac.uk/docs/download.php?type=PUBLS&id=1286.
      Moos, Anja (2010): Long-term formant distribution as a measure of speaker characteristics in read and spontaneous speech. To appear in The Phonetician.
      Nolan, Francis and Catalin Grigoras (2005): A case for formant analysis in forensic speaker identification. International Journal of Speech, Language and the Law 12: 143-173.
      Wagner, Katrin (2010): Der Einfluss der Sprechlautstärke auf die ersten drei Vokalformanten in mobilfunkübertragener Sprache: Forensischer Stimmenvergleich anhand der LTF-Methode. BA thesis, Universität Frankfurt.
