Identification of voices in disguised speech
Jessica Clark* & Paul Foulkes**
* University of York
** University of York & JP French Associates
pf11@york.ac.uk
IAFPA, Göteborg 2006

0.1 outline
• experiment to test ability of lay listeners to identify disguised familiar voices
• voices have been disguised artificially, as with commercially available voice changers
  – pitch modified

0.2 structure
1. introduction
  – rationale for experiment
2. experimental design
  – speakers
  – listeners
  – Control condition
  – Experimental conditions
3. results
4. discussion & conclusion

1. Introduction

1. Introduction
• technical speaker identification is the most frequent task for the forensic phonetician
• lay identification is also common in legal cases
• many previous studies have thus examined lay listeners’ ability to identify voices and the factors which affect their ability

1.1 previous studies
• identification is not automatic or flawless
• listeners can make errors even with highly familiar voices
  – Ladefoged did not recognise his mother from a short sample (Ladefoged & Ladefoged 1980)
  – flatmates scored only 68% with 10 second samples (Foulkes & Barron 2000)

1.1 previous studies
• identification may be affected by (Bull & Clifford 1984)
  – type of exposure (active/passive)
  – length of sample
  – nature of sample (phone, direct, shouting etc)
  – delay between exposure and test
  – age of listener
  – hearing ability
  – sightedness
  – natural variability across individual listeners
  – specific features of voice
  – degree of familiarity
  – nature and extent of any disguise

1.2 degree of familiarity
• all things equal, more familiar voices are easier to identify
• e.g. Hollien, Majewski & Doherty (1982)
  – listening tests with 10 male voices

listener group   N    % correct (normal condition)
familiar         10   98
trained          47   40
unfamiliar       14   27

1.3 disguise
• all things equal, disguised voices are harder to identify
• e.g. Hollien, Majewski & Doherty (1982)
  – various forms of disguise used

listener group   N    % correct (normal)   % correct (disguised)
familiar         10   98                   79
trained          47   40                   21
unfamiliar       14   27                   18

machine approach (LTAS) 30

1.3 disguise
• previous studies have examined various types of disguise
  – whisper, pencils between teeth, hypernasality, dialect change, rate change, professional mimics
• but little if any work on voice changers
  – hardware based
  – software based
  – easily available

www.crimebusters911.com
www.blazeaudio.com
www.maplin.co.uk

1.3 disguise
• in our study we chose not to use real voice changers, in favour of total control over effects
• pitch shift chosen as a universal function

2. Experimental design

2.1 design outline
• simple design
• listeners asked to identify samples of familiar voices
• Control condition: unmodified stimuli
• 4 Experimental conditions: modified stimuli

2.1 design outline
• degree of familiarity known to affect rate of successful identification
• thus we trained listeners to identify a group of speakers
  – controls degree of familiarity
  – all listeners had exactly the same exposure in terms of length & quality of samples
  – identification task carried out under same conditions

2.2 speakers
• 4 male speakers
  – 16-18 years old
• taken from IViE corpus (Grabe, Post & Nolan 2001)
  – Leeds dialect (nearest to York)
  – reading text of Cinderella story

IViE speaker   Experimental name
JP             Edward
JW             Matthew
MD             Harry
RP             David

2.2 speakers
• training materials created for each speaker
  – c. 90 seconds of Cinderella (302 words)
  – edited out disfluencies, non-speech sounds, long pauses
  – samples normalised for amplitude with Audacity 1.2.5

2.3 listeners
• 36 listeners
• variety of regional/social backgrounds
• York residents
• age range 19-55
• 10 male, 26 female

2.4 Control condition
• all 36 listeners
1. training phase
  – 4 voices * 90 seconds = c. 6 minutes
  – presented by PowerPoint with speakers’ names
  – Toshiba laptop
  – Aiwa A170 headphones
  – individually in quiet room
2. break
  – 10 minutes
3. listening test
  – 8 stimuli (2 per speaker)
  – duration c. 10 seconds
  – 5 second gap between
  – extracts from other parts of Cinderella story
  – normalised for amplitude with Audacity 1.2.5
  – answer sheet with names

2.5 Experimental conditions
• 4 Experimental conditions
• listening tests same format as Control condition
• but stimuli modified for pitch
• Sound Forge 8.0
  – pitch shift effect
  – accuracy setting ‘high’
  – speech 1 mode
  – preserved durations

2.5 Experimental conditions
(i) +8 semitones
(ii) +4 semitones
(iii) -4 semitones
(iv) -8 semitones
• pitch shifts greater than 8 semitones sounded unnatural and were partly incomprehensible
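
For orientation, a shift of n semitones in equal temperament multiplies frequency by 2^(n/12). The sketch below applies that relation to the four conditions. The function names are illustrative and the 120 Hz starting f0 is a hypothetical value, not a measurement from the speakers. It says nothing about how Sound Forge implements its pitch shift effect.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift of `semitones` (12 semitones = 1 octave)."""
    return 2.0 ** (semitones / 12.0)


def shifted_f0(f0_hz: float, semitones: float) -> float:
    """Fundamental frequency in Hz after applying the shift."""
    return f0_hz * semitone_ratio(semitones)


# Hypothetical f0 for illustration only; not taken from the IViE speakers.
EXAMPLE_F0_HZ = 120.0

if __name__ == "__main__":
    for shift in (+8, +4, -4, -8):  # the four Experimental conditions
        print(f"{shift:+d} st: ratio {semitone_ratio(shift):.3f}, "
              f"f0 {shifted_f0(EXAMPLE_F0_HZ, shift):.1f} Hz")
```

On this relation, +8 semitones raises f0 by a factor of about 1.59 and -8 semitones lowers it to about 0.63 of its original value.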

2.5 Experimental conditions

listener group   N    conditions (semitones)
A                18   -8, +4
B                18   -4, +8

2.5 Experimental conditions
• listening test 16-92 days after Control test
  – no clear effects for length of delay
• same training as in Control condition
• 10 minute break
• 2 stimuli for familiarisation
• 8 experimental stimuli per condition
  – consecutive runs for + and - stimuli
  – order reversed for half of each group, but no effect

3. Results

3.1 Control condition
• average correct identification = 4.8/8 (60%)
[Figure: bar chart of average N correct (0 to 8) by condition: Minus 8, Minus 4, Control, Plus 4, Plus 8]

3.1 Control condition
• individuals’ range 8 to 0
• 29/36 performed better than chance
[Figure: N correct (0 to 8) for each listener in the Control condition]
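
The slides do not state how chance was defined. One plausible reading, assumed here, is independent uniform guessing among the 4 names on the answer sheet for each of the 8 stimuli, which gives an expected score of 2/8. The sketch below computes that expectation and the binomial probability of reaching a given score by guessing alone. The function names are illustrative, and real guessing behaviour need not follow this model.

```python
from math import comb

N_SPEAKERS = 4  # from the design: 4 voices to choose from
N_STIMULI = 8   # from the design: 8 test stimuli


def expected_by_guessing(n: int = N_STIMULI, choices: int = N_SPEAKERS) -> float:
    """Expected number correct if every answer is a uniform random guess."""
    return n / choices


def p_at_least(k: int, n: int = N_STIMULI, p: float = 1 / N_SPEAKERS) -> float:
    """Binomial probability of k or more correct out of n by guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    print(f"expected by guessing: {expected_by_guessing():.1f} / {N_STIMULI}")
    for k in range(N_STIMULI + 1):
        print(f"P(at least {k} correct by guessing) = {p_at_least(k):.4f}")
```

Under this assumed model, scoring 5 or more out of 8 by guessing has a probability of about 0.027.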

3.2 Experimental conditions
• ** sig. lower than in Control (p < .005, Wilcoxon)
• trend (n.s.) for higher scores in + conditions
[Figure: bar chart of average N correct (0 to 8) by condition: Minus 8, Minus 4, Control, Plus 4, Plus 8; the four Experimental conditions are marked **]

[Figure: N correct (0 to 8) for each listener, four panels: -8 semitones, +8 semitones, -4 semitones, +4 semitones]
• variability in listener performance, esp. ±4
• majority perform above chance except -8

3.3 variation by listener sex
• women sig. better in Control (p = .008, Mann-Whitney)
  – trend (n.s.) maintained in Experimental tests
  – same pattern reported by Bull & Clifford (1984)
[Figure: average N correct (0 to 8) by condition (Minus 8, Minus 4, Control, Plus 4, Plus 8) for male and female listeners; ** marks the significant difference]

3.4 summary
• as predicted, identification rates were lower with disguised voices
  – lowest scores with most extreme form of disguise (±8 semitones)
• identification rates slightly better when pitch shifted up than down
• trend for women to perform better than men
• variability across listeners

4. Discussion & conclusion

4. discussion & conclusion
• tests reported here were not forensically realistic
• results may be affected by e.g.
  – degree of familiarity with voice
  – content of sample (vocabulary, syntax etc)
  – conditions of exposure (stress etc)
  – specific form of artificial disguise
    • software, hardware system
    • combination of effects

4. discussion & conclusion
• considerable variation in listeners’ scores
  – courts should not assume all witnesses are equally good at such tasks
  – supports broader principle that lay witnesses should be tested in their ability to identify a voice

4. discussion & conclusion
• but even marked disguise was not catastrophic for listeners
• a broadly positive conclusion for lay speaker identification
  – a reasonable chance of identifying familiar voices

4. discussion & conclusion
• but a less positive conclusion with respect to the use of voice changers as a means of protecting vulnerable witnesses giving evidence
• more extreme forms of modification may affect intelligibility & naturalness
• less extreme forms of modification may render witness’s voice recognisable
• different modifications for different voices?

4. discussion & conclusion
• as ever…
• more work is needed

thanks / tack
thanks to Peter French, Phil Harrison, Robin How
