identification of voices in disguised speech


  1. Identification of voices in disguised speech
     Jessica Clark* & Paul Foulkes**
     * University of York
     ** University of York & JP French Associates
     pf11@york.ac.uk
     IAFPA, Göteborg 2006

  2. 0.1 outline
     • experiment to test ability of lay listeners to identify disguised familiar voices
     • voices have been disguised artificially, as with commercially available voice changers
       – pitch modified

  3. 0.2 structure
     1. introduction
        – rationale for experiment
     2. experimental design
        – speakers
        – listeners
        – Control condition
        – Experimental conditions
     3. results
     4. discussion & conclusion

  4. 1. Introduction

  5. 1. Introduction
     • technical speaker identification is the most frequent task for the forensic phonetician
     • lay identification is also common in legal cases
     • many previous studies have thus examined lay listeners’ ability to identify voices and the factors which affect their ability

  6. 1.1 previous studies
     • identification is not automatic or flawless
     • listeners can make errors even with highly familiar voices
       – Ladefoged did not recognise his mother from a short sample (Ladefoged & Ladefoged 1980)
       – flatmates scored only 68% with 10 second samples (Foulkes & Barron 2000)

  7. 1.1 previous studies
     • identification may be affected by [Bull & Clifford 1984]
       – type of exposure (active/passive)
       – length of sample
       – nature of sample (phone, direct, shouting etc)
       – delay between exposure and test
       – age of listener
       – hearing ability
       – sightedness
       – natural variability across individual listeners
       – specific features of voice
       – degree of familiarity
       – nature and extent of any disguise

  8. 1.2 degree of familiarity
     • all things equal, more familiar voices are easier to identify
     • e.g. Hollien, Majewski & Doherty (1982)
       – listening tests with 10 male voices

     listener group   N    % correct (normal condition)
     familiar         10   98
     trained          47   40
     unfamiliar       14   27

  9. 1.3 disguise
     • all things equal, disguised voices are harder to identify
     • e.g. Hollien, Majewski & Doherty (1982)
       – various forms of disguise used

     listener group   N    % correct (normal)   % correct (disguised)
     familiar         10   98                   79
     trained          47   40                   21
     unfamiliar       14   27                   18

     machine approach (LTAS): 30

  10. 1.3 disguise
      • previous studies have examined various types of disguise
        – whisper, pencils between teeth, hypernasality, dialect change, rate change, professional mimics
      • but little if any work on voice changers
        – hardware based
        – software based
        – easily available

  11. www.crimebusters911.com
      www.blazeaudio.com
      www.maplin.co.uk

  12. 1.3 disguise
      • in our study we chose not to use real voice changers, in favour of total control over effects
      • pitch shift chosen as a universal function

  13. 2. Experimental design

  14. 2.1 design outline
      • simple design
      • listeners asked to identify samples of familiar voices
      • Control condition: unmodified stimuli
      • 4 Experimental conditions: modified stimuli

  15. 2.1 design outline
      • degree of familiarity known to affect rate of successful identification
      • thus we trained listeners to identify a group of speakers
        – controls degree of familiarity
        – all listeners had exactly the same exposure in terms of length & quality of samples
        – identification task carried out under same conditions

  16. 2.2 speakers
      • 4 male speakers
        – 16-18 years old
      • taken from IViE corpus (Grabe, Post & Nolan 2001)
        – Leeds dialect (nearest to York)
        – reading text of Cinderella story

      IViE speaker   Experimental name
      JP             Edward
      JW             Matthew
      MD             Harry
      RP             David

  17. 2.2 speakers
      • training materials created for each speaker
        – c. 90 seconds of Cinderella (302 words)
        – edited out disfluencies, non-speech sounds, long pauses
        – samples normalised for amplitude with Audacity 1.2.5

  18. 2.3 listeners
      • 36 listeners
      • variety of regional/social backgrounds
      • York residents
      • age range 19-55
      • 10 male, 26 female

  19. 2.4 Control condition
      • all 36 listeners
      1. training phase
         – 4 voices * 90 seconds = c. 6 minutes
         – presented by PowerPoint with speakers’ names
         – Toshiba laptop
         – Aiwa A170 headphones
         – individually in quiet room
      2. break
      3. listening test

  20. 2.4 Control condition
      • all 36 listeners
      1. training phase
      2. break
         – 10 minutes
      3. listening test

  21. 2.4 Control condition
      • all 36 listeners
      1. training phase
      2. break
      3. listening test
         – 8 stimuli (2 per speaker)
         – duration c. 10 seconds
         – 5 second gap between
         – extracts from other parts of Cinderella story
         – normalised for amplitude with Audacity 1.2.5
         – answer sheet with names

  22. 2.5 Experimental conditions
      • 4 Experimental conditions
      • listening tests same format as Control condition
      • but stimuli modified for pitch
      • Sound Forge 8.0
        – pitch shift effect
        – accuracy setting ‘high’
        – speech 1 mode
        – preserved durations

  23. 2.5 Experimental conditions
      (i) +8 semitones
      (ii) +4 semitones
      (iii) -4 semitones
      (iv) -8 semitones
      pitch shift > 8 semitones unnatural and partly incomprehensible
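As a reference for the size of these shifts, the standard equal-temperament relation says that a shift of n semitones scales frequency by 2^(n/12). The sketch below applies that relation to the four Experimental conditions. The 120 Hz example F0 is a hypothetical value chosen for illustration, not a measurement from the study, and the code does not attempt to reproduce the duration-preserving pitch shift effect in Sound Forge.

```python
# Sketch: frequency ratios for the four pitch-shift conditions.
# Uses the standard equal-temperament relation ratio = 2 ** (n / 12).

CONDITIONS_SEMITONES = (8, 4, -4, -8)  # conditions (i) to (iv) on the slide
EXAMPLE_F0_HZ = 120.0  # hypothetical speaker F0, for illustration only


def semitone_ratio(semitones: float) -> float:
    """Return the frequency ratio corresponding to a shift in semitones."""
    return 2.0 ** (semitones / 12.0)


def shifted_f0(f0_hz: float, semitones: float) -> float:
    """Return the F0 that results from shifting f0_hz by the given semitones."""
    return f0_hz * semitone_ratio(semitones)


for st in CONDITIONS_SEMITONES:
    print(f"{st:+d} semitones: ratio {semitone_ratio(st):.3f}, "
          f"{EXAMPLE_F0_HZ:.0f} Hz -> {shifted_f0(EXAMPLE_F0_HZ, st):.1f} Hz")
```

On this relation a shift of +8 semitones raises frequency by roughly 59%, while -8 semitones lowers it by roughly 37%.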

  24. 2.5 Experimental conditions

      listener group   N    conditions (semitones)
      A                18   -8, +4
      B                18   -4, +8

  25. 2.5 Experimental conditions
      • listening test 16-92 days after Control test
        – no clear effects for length of delay
      • same training as in Control condition
      • 10 minute break
      • 2 stimuli for familiarisation
      • 8 experimental stimuli per condition
        – consecutive runs for + and - stimuli
        – order reversed for half of each group, but no effect

  26. 3. Results

  27. 3.1 Control condition
      • average correct identification = 4.8/8 (60%)
      [Figure: bar chart of average N correct (0-8) by condition: Minus 8, Minus 4, Control, Plus 4, Plus 8]

  28. 3.1 Control condition
      • individuals’ range 8 to 0
      • 29/36 performed better than chance
      [Figure: Control condition, N correct (0-8) for each listener]
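The chance baseline behind these figures can be worked out from the design: 8 stimuli, each with 4 possible speaker names. The sketch below assumes a listener who guesses independently and uniformly among the four names, which is an idealisation; the slides do not state how "better than chance" was operationalised, so the binomial tail probabilities are offered only as context.

```python
# Sketch: chance-level performance for 8 stimuli and 4 candidate speakers,
# assuming independent, uniform guessing (an assumption, not from the slides).
from math import comb

N_SPEAKERS = 4
N_STIMULI = 8
P_GUESS = 1 / N_SPEAKERS


def expected_by_chance(n: int = N_STIMULI, p: float = P_GUESS) -> float:
    """Expected number of correct answers under pure guessing."""
    return n * p


def prob_at_least(k: int, n: int = N_STIMULI, p: float = P_GUESS) -> float:
    """Binomial probability of k or more correct answers under pure guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


print(f"expected correct by chance: {expected_by_chance():.1f} of {N_STIMULI}")
for k in range(3, N_STIMULI + 1):
    print(f"P(at least {k} correct by guessing) = {prob_at_least(k):.4f}")
```

Under these assumptions a guessing listener would average 2 of 8 correct, against the reported Control average of 4.8 of 8.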

  29. 3.2 Experimental conditions
      • ** sig. lower than in Control (p < .005, Wilcoxon)
      • trend (n.s.) for higher scores in + conditions
      [Figure: bar chart of average N correct (0-8) by condition: Minus 8, Minus 4, Control, Plus 4, Plus 8; four bars marked **]
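For readers unfamiliar with the test named above, the sketch below computes the Wilcoxon signed-rank statistics for paired scores, assuming that the comparison was made within listeners (each listener's Control score against the same listener's score in an Experimental condition). The slides do not give per-listener data, so the two score lists are hypothetical values invented purely to show the mechanics, and no p-value is computed.

```python
# Sketch: Wilcoxon signed-rank statistics (W+ and W-) for paired scores.
# The scores below are HYPOTHETICAL, for illustration only; they are not
# data from the study.


def wilcoxon_signed_rank(control, experimental):
    """Return (W+, W-) for paired samples, dropping zero differences
    and giving tied absolute differences their average rank."""
    diffs = [c - e for c, e in zip(control, experimental) if c != e]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        # Find the run of differences sharing the same absolute value.
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        i = j
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical N-correct scores (out of 8) for six listeners.
hypothetical_control = [6, 5, 7, 4, 8, 5]
hypothetical_shifted = [4, 5, 3, 5, 6, 2]

w_plus, w_minus = wilcoxon_signed_rank(hypothetical_control, hypothetical_shifted)
print(f"W+ = {w_plus}, W- = {w_minus}")
```

A large W+ relative to W- indicates that Control scores tend to exceed the paired Experimental scores; a statistics package would then convert the smaller of the two into a p-value.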

  30. [Figure: four panels (-8 semitones, +8 semitones, -4 semitones, +4 semitones), each showing N correct (0-8) for each listener]
      • variability in listener performance, esp. ±4
      • majority perform above chance except -8

  31. 3.3 variation by listener sex
      • women sig. better in Control (p = .008, Mann-Whitney)
        – trend (n.s.) maintained in Experimental tests
        – same pattern reported by Bull & Clifford (1984)
      [Figure: bar chart of N correct (0-8) for Male and Female listeners by condition: Minus 8, Minus 4, Control, Plus 4, Plus 8; one comparison marked **]

  32. 3.4 summary
      • as predicted, identification rates were lower with disguised voices
        – lowest scores with most extreme form of disguise (±8 semitones)
      • identification rates slightly better when pitch shifted up than down
      • trend for women to perform better than men
      • variability across listeners

  33. 4. Discussion & conclusion

  34. 4. discussion & conclusion
      • tests reported here were not forensically realistic
      • results may be affected by e.g.
        – degree of familiarity with voice
        – content of sample (vocabulary, syntax etc)
        – conditions of exposure (stress etc)
        – specific form of artificial disguise
          • software, hardware system
          • combination of effects

  35. 4. discussion & conclusion
      • considerable variation in listeners’ scores
        – courts should not assume all witnesses are equally good at such tasks
        – supports broader principle that lay witnesses should be tested in their ability to identify a voice

  36. 4. discussion & conclusion
      • but even marked disguise was not catastrophic for listeners
      • a broadly positive conclusion for lay speaker identification
        – a reasonable chance of identifying familiar voices

  37. 4. discussion & conclusion
      • but a less positive conclusion with respect to the use of voice changers as a means of protecting vulnerable witnesses giving evidence
      • more extreme forms of modification may affect intelligibility & naturalness
      • less extreme forms of modification may render witness’s voice recognisable
      • different modifications for different voices?

  38. 4. discussion & conclusion
      • as ever…
      • more work is needed

  39. thanks / tack
      thanks to Peter French, Phil Harrison, Robin How
