Investigating the forensic applications of global and local temporal representations of speech for dialect discrimination
Leah Bradshaw, Vincent Hughes, and Eleanor Chodroff
Department of Language and Linguistic Science, University of York
INTRODUCTION: Forensic phonetics
• Voice analysis
• Voice comparison
INTRODUCTION: Speaker classification
Process of determining speaker-specific features (e.g., gender, age, dialect, idiosyncratic speech markers, etc.) using:
• Auditory analysis
• Acoustic-phonetic analysis
• Automatic speaker recognition approaches
INTRODUCTION
Acoustic-phonetic analysis frequently involves court-presentable measurements that are strongly focused on segmental information:
• Formants
• F0
• Voice onset time
But what about suprasegmental information, and specifically information about a speaker’s rhythmic pattern?
INTRODUCTION: Rhythm in speaker classification
• Previous studies demonstrate some utility of rhythm for dialect discrimination and forensic purposes
• Limited in its application in research and casework
Ferragne and Pellegrino 2004, Biadsy and Hirschberg 2009, Torgersen and Szakay 2012, Leemann et al. 2012, 2015, Dellwo et al. 2015
REPRESENTING TIME IN SPEECH: Rhythm depends on some temporal representation of speech
Rhythm: temporal characteristics of a spoken utterance
• How can temporal characteristics of a spoken utterance be represented in an acoustic-phonetic analysis?
• In an ASR analysis?
REPRESENTING TIME IN SPEECH: Global temporal representations
Long-term alternations in vocalic and consonantal intervals which may approximate the rhythmic pattern of speech
Rhythm Metrics: measures examining the degree of variability in the duration of pre-specified intervals (e.g., vowels, consonants, CV sequences, adjacent intervals, etc.)
Ramus et al. 1999, Grabe and Low 2002, Dellwo 2006
REPRESENTING TIME IN SPEECH: Rhythm in speaker classification
Syllable-timed vs. stress-timed distinctions
• Syllable-timed: roughly equal syllable durations
• Stress-timed: roughly equal intervals between stressed syllables (more durational variability between stressed and unstressed syllables)
Problematic: too coarse, but possibly a place to start
Pike 1945, Abercrombie 1967, Dauer 1983, Arvaniti 2009
REPRESENTING TIME IN SPEECH: Local temporal representations
Delta (Δ) and delta-delta (ΔΔ) features: reflect the change in spectral properties between adjacent temporal frames and the acceleration of that change
Common in ASR systems
e.g., Lee et al. 1990, Matsui and Furui 1990, Gish and Schmidt 1994
GOALS
1) Analyze the rhythmic profile of four varieties of British English: Cambridge, Multicultural London English, Leicester, and Punjabi-Leicester
2) Investigate the utility of global RMs for discriminating among the dialects
3) Compare global and local temporal representations for dialect discrimination
OUTLINE
• Introduction
• Corpus Description
• Global: Rhythm Metrics
• Local: Deltas and Delta-deltas
• Discussion
METHODS: Four British English dialects
• Cambridge English (CE): South, non-contact (Anglo)
• Multicultural London English (MLE): South, contact (Ethnic); Caribbean descent
• Leicester English (LE): Leicester ("Midlands"), non-contact (Anglo)
• Punjabi-Leicester English (PLE): Leicester ("Midlands"), contact (Ethnic); at least one parent a native Punjabi speaker
Corpora:
• Intonational Variation in English (IViE) corpus: 12 CE, 12 MLE, age 16
• Wormald (2016): 8 LE, 22 PLE, ages 20–53
GLOBAL MEASURES: Rhythm Metrics
• stdevV: standard deviation of vocalic interval durations
• stdevC: standard deviation of consonantal interval durations
• VarcoV: coefficient of variation of vocalic interval durations
GLOBAL MEASURES: Rhythm Metrics
• nPVI-V: normalised Pairwise Variability Index for vocalic interval durations
• nPVI-C: normalised Pairwise Variability Index for consonantal interval durations
• nPVI-CV: normalised Pairwise Variability Index for summed consonantal and vocalic interval durations
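The rhythm metrics listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Duration Analyzer script used in the study: the function names are hypothetical, the sample standard deviation is assumed, and VarcoV and nPVI are scaled by 100 following common practice.

```python
from statistics import mean, stdev


def stdev_metric(durations):
    """Standard deviation of interval durations (stdevV or stdevC).

    Assumption: sample standard deviation; the original script may differ.
    """
    return stdev(durations)


def varco(durations):
    """Coefficient of variation: 100 * standard deviation / mean."""
    return 100 * stdev(durations) / mean(durations)


def npvi(durations):
    """Normalised Pairwise Variability Index over successive intervals.

    Each term is the absolute difference between two adjacent intervals,
    divided by their mean; the terms are averaged and scaled by 100.
    """
    pairs = list(zip(durations, durations[1:]))
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * sum(terms) / len(terms)


# Hypothetical vocalic interval durations in seconds (illustrative only).
vocalic = [0.08, 0.15, 0.06, 0.12, 0.09]
profile = {
    "stdevV": stdev_metric(vocalic),
    "VarcoV": varco(vocalic),
    "nPVI-V": npvi(vocalic),
}
```

For nPVI-CV, the same `npvi` function would be applied to a list in which each item is the summed duration of a consonantal interval and the vocalic interval that goes with it.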
METHODS
• Cambridge, MLE: Praat EasyAlign for British English
• Leicester varieties: alignments accompanied the recordings
• All phone alignments were manually adjusted
• Consonantal and vowel intervals determined based on the phone alignments
• RMs measured with the Duration Analyzer Praat script (Dellwo 2019)
RESULTS: Rhythm Metrics
[Figure: z-scored values of stdevV, stdevC, VarcoV, nPVI-V, nPVI-C, and nPVI-CV by dialect (CE, MLE, LE, PLE)]
• Dialect significantly improved model fit
• No gender differences
RESULTS: Rhythm Metrics
[Figure: z-scored stdevV, VarcoV, and nPVI-CV by dialect (CE, MLE, LE, PLE)]
• Cambridge English: higher stdevV, VarcoV, nPVI-CV
• MLE: average
• Leicester English: higher VarcoV
• Punjabi-Leicester: lower stdevV, VarcoV
All relative to the average production across all four dialects
RESULTS: Rhythm Metrics
[Figure: z-scored stdevC, nPVI-V, and nPVI-C by dialect (CE, MLE, LE, PLE)]
• Cambridge English: lower stdevC
• MLE: lower stdevC, nPVI-V, nPVI-C
• Leicester English: higher stdevC, nPVI-V, nPVI-CV
• Punjabi-Leicester: higher stdevC, lower nPVI-V
All relative to the average production across all four dialects
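The z-scored values in these results express each measurement relative to the average across all four dialects. A minimal sketch of that standardisation follows, assuming the values of one metric are pooled over all speakers and the population standard deviation is used; the function name and the example values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise one rhythm metric across all speakers.

    Assumption: pooling over every speaker from all dialects, so 0 is the
    overall average and 1 is one standard deviation above it.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mu) / sigma for v in values]


# Hypothetical VarcoV values for five speakers (illustrative only).
varco_v = [42.0, 55.0, 48.0, 39.0, 51.0]
varco_v_z = z_scores(varco_v)
```

A positive z-score then reads as "higher than the four-dialect average" and a negative one as "lower", which is how the dialect summaries are phrased.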
RESULTS: Rhythm Metrics
[Figure: cluster plot of speakers based on rhythm metrics, four clusters. Dim1 (54.1%): ~Anglo vs. Contact; Dim2 (33.7%): ~Midlands vs. South]
Purity: 0.64
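Cluster purity, the score reported for this plot, can be computed as follows. This sketch uses the standard definition (each cluster is credited with its most frequent true label, and the credited counts are divided by the number of speakers); whether the study computed it in exactly this way is an assumption, and the function name and toy data are hypothetical.

```python
from collections import Counter


def cluster_purity(cluster_ids, dialect_labels):
    """Fraction of speakers that carry the majority label of their cluster."""
    counts_by_cluster = {}
    for cluster, dialect in zip(cluster_ids, dialect_labels):
        counts_by_cluster.setdefault(cluster, Counter())[dialect] += 1
    majority_total = sum(max(c.values()) for c in counts_by_cluster.values())
    return majority_total / len(dialect_labels)


# Toy assignment (illustrative only): cluster 2 mixes LE and PLE speakers.
clusters = [1, 1, 2, 2]
dialects = ["CE", "CE", "LE", "PLE"]
purity = cluster_purity(clusters, dialects)  # (2 + 1) / 4
```

A purity of 1.0 would mean every cluster contains speakers of a single dialect.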
METHODS: Δ and ΔΔs
MFCCs
• Voice activity automatically detected + manually corrected
• 20 ms frames, shifted by 10 ms
• 0–4000 Hz
• CMVN applied for room/equipment normalization
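CMVN (cepstral mean and variance normalization) rescales each cepstral coefficient so that it has zero mean and unit variance. A minimal sketch follows, assuming normalization over all frames of one recording and MFCC frames stored as lists of floats; the function name is hypothetical and the toolkit used in the study is not stated in the slides.

```python
from statistics import mean, pstdev


def cmvn(frames):
    """Normalize each coefficient to zero mean and unit variance.

    frames: list of MFCC vectors, one per 20 ms frame (10 ms shift).
    Assumption: statistics are computed per recording.
    """
    columns = list(zip(*frames))          # one tuple per coefficient
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    normalized = []
    for frame in frames:
        normalized.append([
            (value - mu) / sd if sd > 0 else 0.0   # constant coefficient -> 0
            for value, mu, sd in zip(frame, means, stds)
        ])
    return normalized


# Two toy frames with two coefficients each (illustrative only).
toy = cmvn([[1.0, 10.0], [3.0, 10.0]])
```

Removing the per-recording mean and scale is what reduces stable channel effects such as room acoustics or recording equipment.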
METHODS: Δ and ΔΔs
Δs and ΔΔs
• Deltas: change between MFCCs in adjacent frames
• Delta-deltas: change between deltas in adjacent vectors
• Averaged for each recording
• MFCCs not included in the analysis
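The delta and delta-delta computation described here can be sketched as simple frame-to-frame differences that are then averaged per recording. This is an assumption-laden illustration: many ASR toolkits estimate deltas with a regression over several frames rather than a plain first difference, and the function names below are hypothetical.

```python
def frame_differences(frames):
    """Difference between each vector and the one before it."""
    return [
        [cur - prev for prev, cur in zip(previous, current)]
        for previous, current in zip(frames, frames[1:])
    ]


def column_means(frames):
    """Average each coefficient over all frames."""
    return [sum(column) / len(frames) for column in zip(*frames)]


def delta_features(mfcc_frames):
    """Per-recording feature vector: mean deltas followed by mean delta-deltas.

    The MFCCs themselves are left out, as in the analysis described above.
    """
    deltas = frame_differences(mfcc_frames)
    delta_deltas = frame_differences(deltas)
    return column_means(deltas) + column_means(delta_deltas)


# Three toy frames with two coefficients each (illustrative only).
features = delta_features([[0.0, 1.0], [1.0, 3.0], [3.0, 6.0]])
```

Each recording is thereby reduced to one vector, which can be clustered in the same way as the rhythm metrics.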
RESULTS: Deltas and delta-deltas
[Figure: cluster plot of speakers based on deltas and delta-deltas, four clusters. Dim1 (12.1%), Dim2 (10.9%)]
Purity: 0.44
DISCUSSION
• Significant differences in RMs among the four British English dialects
• CE and LE more stress-timed, but in different ways
• MLE and PLE more syllable-timed, but in different ways
• Combination of RMs can be used as a Rhythmic Profile
DISCUSSION
• Rhythmic profile is a useful feature in dialect discrimination
• Issue: RMs somewhat correlated
• Future directions: Which RMs and combinations of RMs are best and least redundant? Examine whether these results hold for dialects collected in a single corpus
DISCUSSION
• Proof of concept: global temporal representations outperformed local temporal representations for dialect discrimination
• Demonstrates need for global temporal representation in automatic speaker and language recognition systems (some work done already)
• Forensic application of RMs: directly interpretable, court presentable
Adami et al. 2003, Shriberg et al. 2005, Dehak et al. 2007
Thank you!
Thanks to: Jess Wormald, Paul Foulkes, Peter French, Sam Hellmuth