
Holistic perception of voice quality matters more than L1 when judging speaker similarity in short stimuli
Eugenia San Segundo, Paul Foulkes & Vincent Hughes
University of York
SST 2016, Parramatta, Australia, 7-9 December

1. Introduction - Voice quality (VQ)
Quasi-permanent quality resulting from a combination of long-term laryngeal and supralaryngeal features (Laver, 1980) – broad definition
• Approach: articulatory / perceptual / acoustic
  – perceptual: naïve listeners (holistic) vs experts (featural)
• Idiosyncratic, hence relevant to forensic phonetics:
  – forensic speaker comparison
  – earwitness evidence
→ Differences in speaker similarity ratings by native vs non-native listeners?

2. Hypothesis
→ naïve listeners will rely on holistic VQ perception in order to judge similarity between speakers, regardless of their L1, i.e. no native language advantage (cf. Perrachione et al. 2009)
→ when? under controlled conditions of speaker similarity
→ what? short speech samples
→ why? VQ = only resource available for listeners to judge speaker similarity

3. Materials and method
3.1. Subjects
5 pairs of male MZ twins:
– native Spanish (Madrid)
– no voice pathologies
– similar sounding:
  1. similar age (mean 21, sd 3.7)
  2. similar mean F0 (mean 113 Hz, sd 13 Hz)
  3. similar VQ (expert, featural assessment)

3. Materials and method
3.1. Subjects
– using a simplified version of the VPA scheme: each setting scored 1 / 0 / 2, e.g. mandibular setting (close – neutral – open)
– Similarity Matching Coefficients:
  SMC = (number of setting matches) / (number of settings)
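The SMC formula on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the setting names and the scores in the two profiles are hypothetical, and the only thing taken from the slide is the ratio of matching settings to total settings.

```python
def similarity_matching_coefficient(profile_a, profile_b):
    """SMC = number of setting matches / number of settings."""
    if profile_a.keys() != profile_b.keys():
        raise ValueError("both profiles must score the same settings")
    if not profile_a:
        raise ValueError("profiles must contain at least one setting")
    # A setting counts as a match when both speakers received the same score.
    matches = sum(1 for s in profile_a if profile_a[s] == profile_b[s])
    return matches / len(profile_a)


# Hypothetical simplified VPA profiles (setting names and scores are made up).
speaker_a = {"mandibular": 0, "lingual_tip": 1, "labial": 0, "phonation": 2}
speaker_b = {"mandibular": 0, "lingual_tip": 2, "labial": 0, "phonation": 2}

smc = similarity_matching_coefficient(speaker_a, speaker_b)
print(smc)  # 3 matching settings out of 4
```

An SMC of 1 means the two profiles agree on every setting; 0 means they agree on none.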

3. Materials and method
3.2. Stimuli and listeners
Stimuli
• approximately 3 secs
• from spontaneous conversations
  – interlocutor = controlled
  – same speaking style
• declarative sentences
  – different linguistic content
  – diverse neutral topics
Listeners
• 20 native Spanish speakers
  - age range 22-51; mean 33
• 20 native English speakers
  - age range 19-35; mean 25
  - no knowledge of Spanish!

3. Materials and method
3.3. Design of perceptual test
• MFC Praat experiment: 90 different-speaker pairings – random order
• Instructions for listeners: “please rate their similarity from 1 to 5” (1 = very similar, 5 = very different)
• Test duration = 15 min (break every 30 stimuli)
• Listeners were not told that the test included twin pairs!
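The slide does not say how the 90 pairings were built. One arrangement that gives exactly 90 is to present every pair of the 10 speakers in both orders (10 × 9 = 90). The sketch below assumes that design, and the fixed random seed is likewise an assumption; the speaker codes are the ones shown on the intra-pair distance slide.

```python
import random
from itertools import permutations

# Twin pairs, using the speaker codes from the results slide.
twin_pairs = [("AGF", "SGF"), ("DCT", "JCT"), ("ARJ", "JRJ"),
              ("ASM", "RSM"), ("AMG", "EMG")]

speakers = [s for pair in twin_pairs for s in pair]

# Map each speaker to his twin so every trial can be flagged.
twin_of = {}
for a, b in twin_pairs:
    twin_of[a] = b
    twin_of[b] = a

# Assumption: each different-speaker pairing is presented in both orders.
trials = [(a, b, twin_of[a] == b) for a, b in permutations(speakers, 2)]

# Random presentation order (the seed is arbitrary, for reproducibility only).
random.Random(0).shuffle(trials)

n_trials = len(trials)
n_twin_trials = sum(is_twin for _, _, is_twin in trials)
print(n_trials, n_twin_trials)
```

Under this assumption, 10 of the 90 trials pair a speaker with his own twin, and the other 80 pair unrelated speakers.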

3. Materials and method
3.4. Analysis methods
• Multidimensional Scaling (MDS)
  → to visualize degree of perceived similarity
  → to detect meaningful dimensions that explain observed (dis)similarities
• Mixed-effects modelling
  → to fit models to the similarity ratings
  – Fixed effects (predictors):
    → Listener language
    → SMC between speakers in the target trial
    → Reaction time
    → Twins – whether speakers were twins or not
  – Random effects:
    → Listeners
    → Trial (target sp. comparison)

4. Results
• MDS analysis
  – scree plot: relative magnitude of the sorted eigenvalues
  – stress: 0.03; stress: 0.07
* stress = goodness-of-fit criterion to minimize. Rule of thumb: <0.1 is excellent; >0.20 is poor
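To make the stress criterion concrete, here is a minimal sketch of Kruskal's stress-1. It assumes the observed dissimilarities are used directly as the target distances, a metric simplification; the slides do not say which MDS variant or software was used. The three-point configuration is a toy example, not data from the study.

```python
import math
from itertools import combinations


def stress1(dissimilarities, coords):
    """Kruskal's stress-1: sqrt(sum (d_ij - delta_ij)^2 / sum d_ij^2),
    where d_ij is the distance in the MDS configuration and delta_ij
    is the observed dissimilarity for the same pair."""
    num = 0.0
    den = 0.0
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])
        num += (d - dissimilarities[i][j]) ** 2
        den += d ** 2
    return math.sqrt(num / den)


# Toy 2D configuration (hypothetical): a 3-4-5 right triangle.
coords = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]

# Dissimilarities that the configuration reproduces exactly.
perfect = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
# Dissimilarities that the configuration only approximates.
noisy = [[0, 3.5, 4], [3.5, 0, 4.5], [4, 4.5, 0]]

print(stress1(perfect, coords))  # perfect fit
print(stress1(noisy, coords))    # some misfit
```

Lower values mean the configuration reproduces the rated (dis)similarities more faithfully. Adding dimensions can only lower the achievable stress, which is why it is read alongside the scree plot.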

4. Results
• MDS plots (2D)
  – stress: 0.8

4. Results
• MDS plots (3D)
  – stress: 0.4

4. Results
• Intra-pair Euclidean distances (EDs) based on 7D
  (columns: speakers; rows: listeners; twin pairs ordered from most similar to most different)

  Twin pair   AGF-SGF  DCT-JCT  ARJ-JRJ  ASM-RSM  AMG-EMG
  Spanish     0.341    0.343    0.345    0.369    0.607
  English     0.264    0.219    0.349    0.435    0.445
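The intra-pair distances can be read as plain Euclidean distances between the two twins' positions in the 7D MDS space. In the sketch below the distance function is generic and the two 7D coordinate vectors are hypothetical, since the slides do not report coordinates. The ranking step uses only the distances reported on this slide.

```python
import math


def intra_pair_distance(coords_a, coords_b):
    """Euclidean distance between two speakers in the MDS space."""
    return math.dist(coords_a, coords_b)


# Hypothetical 7D coordinates for one twin pair (not from the study).
twin_a = [0.12, -0.30, 0.05, 0.22, -0.10, 0.08, 0.01]
twin_b = [0.20, -0.25, 0.00, 0.15, -0.05, 0.02, 0.06]
print(intra_pair_distance(twin_a, twin_b))

# Intra-pair EDs as reported on the slide, per listener group.
intra_pair_ed = {
    "Spanish": {"AGF-SGF": 0.341, "DCT-JCT": 0.343, "ARJ-JRJ": 0.345,
                "ASM-RSM": 0.369, "AMG-EMG": 0.607},
    "English": {"AGF-SGF": 0.264, "DCT-JCT": 0.219, "ARJ-JRJ": 0.349,
                "ASM-RSM": 0.435, "AMG-EMG": 0.445},
}

# Rank twin pairs from most similar (smallest ED) to most different.
ranking = {group: sorted(eds, key=eds.get)
           for group, eds in intra_pair_ed.items()}
for group, order in ranking.items():
    print(group, order)
```

Both listener groups place AMG-EMG last, as the most different pair. They disagree only on the order of the two closest pairs: AGF-SGF is closest for Spanish listeners and DCT-JCT for English listeners.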

4. Results
• Mixed-effects modelling
  – Best model → all fixed effects + interactions
  – Significant interactions:
    → Language * Reaction time
    → Reaction time * Twins
    → SMC * Twins

4. Results
→ Language * Reaction time
[Figure: language*reaction_time effect plot. One panel per response (1 to 5) and language (English, Spanish); x-axis: reaction_time (0 to 14); y-axis: response (probability).]

4. Results
→ Reaction time * Twins (language independent effects!)
[Figure: reaction_time*twins effect plot. One panel per response (1 to 5) and twins (No, Yes); x-axis: reaction_time (0 to 14); y-axis: response (probability).]

4. Results
→ SMC * Twins (language independent effects!)
[Figure: smc*twins effect plot. One panel per response (1 to 5) and twins (No, Yes); x-axis: smc (0.1 to 0.8); y-axis: response (probability).]

5. Discussion - MDS
• optimal configuration = 7D space
  – lowest possible stress value
  – confirms VQ multidimensionality (Kreiman & Sidtis 2011)
• from most similar… to least similar twin pairs
  – same cue prominence? different weight?

5. Discussion - Mixed Effects Modelling
• mostly language-independent effects
  – notably: twins rated as more similar than non-twins
• …but one language-dependent effect:
[Figure: language*reaction_time effect plot. One panel per response (1 to 5) and language (English, Spanish); x-axis: reaction_time (0 to 14); y-axis: response (probability). Plot annotations, in order of appearance: mean 0.84 s, sd 0.18; mean 0.82 s, sd 0.14.]

6. Conclusions
• aim → explore the role of holistic VQ perception in speaker similarity ratings
• results → native ≈ non-native ratings of similarity
  → no native advantage
    - short stimuli + homogeneous population (same accent, similar age, etc.)
  → VQ = available resource
• possible implications in earwitness testimony
• future studies:
  → interrelationships between
    - (naïve) holistic VQ perception
    - (expert) featural VQ perception
  → different salience
  → weighting methods

Thanks! Questions?
