F0 of Adolescent Speakers: First Results for the German Ph@ttSessionz Database • Chr. Draxler, F. Schiel, T. Ellbogen • BAS Bavarian Archive of Speech Signals, University of Munich, Germany
Introduction • previous f0 studies of adolescents • small numbers of speakers • limited and artificial speech material, e.g. sustained vowels • no speech data available • forensic databases • not available for German
Ph@ttSessionz: Goals • 1000 speakers • 50% male, 50% female (±5%) • 13-19 years • good dialect coverage • recorded via Internet in secondary schools • 22.05 kHz, 16 bit linear PCM, stereo
Session Contents
item                   #    item                   #
isolated digit         10   date                   3
numbers 11-100         19   time                   3
PC command phrases     12   directory assistance   9
telephone numbers      13   spelling               10
mobile phone keys      3    phonetically rich      30
credit card            3    spontaneous            5
PIN                    3    narrative              2
• SpeechDat and RVG-I compatible
Speaker Data • date of birth, sex, weight, height • dialect region (federal state at age 6) • mother tongue of speaker and family • smoking habits, dental braces, piercings
F0 Analysis • pre-release version of the database • 762 speakers • ~ 49% f, 51% m • good age distribution • biased dialect region distribution • 90829 utterances
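As a quick plausibility check on the figures above, the following sketch computes the average number of analysed utterances per speaker and tests the reported sex distribution against the project goal of 50% each (±5%). The function names are illustrative, and reading "±5%" as five percentage points is an assumption, not something the slides state.

```python
# Figures quoted on the slide (pre-release version of the database).
SPEAKERS = 762
UTTERANCES = 90829
FEMALE_PCT, MALE_PCT = 49, 51  # approximate shares as given on the slide

# Project goal: 50% per sex; the tolerance is read here as 5 percentage points
# (an assumption about what "±5%" means).
TARGET_PCT = 50
TOLERANCE_PCT = 5


def utterances_per_speaker(utterances: int, speakers: int) -> float:
    """Average number of analysed utterances per speaker."""
    if speakers <= 0:
        raise ValueError("speakers must be positive")
    return utterances / speakers


def within_balance(pct: float, target: float = TARGET_PCT,
                   tolerance: float = TOLERANCE_PCT) -> bool:
    """True if a share lies within the tolerated band around the target."""
    return abs(pct - target) <= tolerance


average = utterances_per_speaker(UTTERANCES, SPEAKERS)
print(f"{average:.1f} utterances per speaker on average")
print("female share within goal:", within_balance(FEMALE_PCT))
print("male share within goal:", within_balance(MALE_PCT))
```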
F0 Calculation • Praat built-in algorithm • frequency 75-400 Hz • max candidates 15 • silence/voicing threshold 0.03/0.45 • octave/jump/voiced cost 0.01/0.35/0.14 • f0 mean, min, max (in Hz and mel)
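The settings above can be collected in one place and turned into a Praat script command. This is a minimal sketch, assuming the "built-in algorithm" is Praat's autocorrelation method (`To Pitch (ac)`); the slides do not say which method was used. The time step (0, i.e. automatic) and the "very accurate" flag ("no") are likewise assumptions, and `PitchSettings` and `to_praat_command` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PitchSettings:
    # Values taken from the slide.
    pitch_floor_hz: float = 75.0
    pitch_ceiling_hz: float = 400.0
    max_candidates: int = 15
    silence_threshold: float = 0.03
    voicing_threshold: float = 0.45
    octave_cost: float = 0.01
    octave_jump_cost: float = 0.35
    voiced_unvoiced_cost: float = 0.14
    # Assumptions, not given on the slide.
    time_step_s: float = 0.0      # 0 lets Praat choose the time step
    very_accurate: bool = False


def to_praat_command(s: PitchSettings) -> str:
    """Format the settings as a Praat 'To Pitch (ac)' script line."""
    accurate = "yes" if s.very_accurate else "no"
    return (
        f"To Pitch (ac): {s.time_step_s:g}, {s.pitch_floor_hz:g}, "
        f'{s.max_candidates}, "{accurate}", '
        f"{s.silence_threshold:g}, {s.voicing_threshold:g}, "
        f"{s.octave_cost:g}, {s.octave_jump_cost:g}, "
        f"{s.voiced_unvoiced_cost:g}, {s.pitch_ceiling_hz:g}"
    )


command = to_praat_command(PitchSettings())
print(command)
```

The resulting line would be run inside a Praat script with a Sound object selected; apart from the 400 Hz ceiling, the values coincide with Praat's standard settings for this command.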
F0 mean vs. Age [Figure: mean F0 (axis 0 to 250) by age, 13 to 19 years, for male (m) and female (f) speakers]
F0 vs. BMI [Figure, two panels: mean f0 vs. BMI (female); mean f0 vs. BMI (male). x-axis: BMI, 0 to 40; y-axis: mean f0 in Hz, 0 to 350]
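The BMI used in the F0 vs. BMI plots can be derived from the weight and height recorded in the speaker data. A minimal sketch of the standard definition follows, assuming weight in kilograms and height in metres (the slides do not state the units); the example speaker is hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight divided by the square of height."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical speaker, not taken from the database.
example_bmi = bmi(weight_kg=60.0, height_m=1.70)
print(f"BMI = {example_bmi:.2f}")
```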
F0 Data [Figure, four panels: f0 single digit (f); f0 single digit (m); f0 spelling geographical name (m); f0 spelling geographical name (f). Each panel shows f0 min, f0 max and f0 mean (axis 0 to 400) by age, 13 to 19 years]
F0 Range • F0 abs = F0 max - F0 min • F0 rel = F0 max / F0 min • scale • absolute Hz scale • perception-based mel scale
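The two range measures above can be sketched as follows, on both scales. The slides do not say which Hz-to-mel formula was used, so the common 2595 · log10(1 + f/700) variant is an assumption here, as is converting the minimum and maximum to mel before taking the difference or ratio. The input values are hypothetical.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert Hz to mel (assumed formula: 2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def f0_ranges(f0_min_hz: float, f0_max_hz: float) -> dict:
    """Absolute (max - min) and relative (max / min) F0 range in Hz and mel."""
    if not 0 < f0_min_hz <= f0_max_hz:
        raise ValueError("need 0 < f0_min_hz <= f0_max_hz")
    mel_min = hz_to_mel(f0_min_hz)
    mel_max = hz_to_mel(f0_max_hz)
    return {
        "abs_hz": f0_max_hz - f0_min_hz,
        "rel_hz": f0_max_hz / f0_min_hz,
        "abs_mel": mel_max - mel_min,
        "rel_mel": mel_max / mel_min,
    }


# Hypothetical utterance with f0 between 100 Hz and 200 Hz.
ranges = f0_ranges(100.0, 200.0)
for name, value in ranges.items():
    print(f"{name}: {value:.2f}")
```

Because the mel scale compresses higher frequencies, the relative range in mel comes out smaller than the same range expressed in Hz.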
[Figure: F0 rel (mel scale, axis 0 to 3.5) by utterance class: digit, n. geographical, number, n. company, n. person, command, time, PIN code, date, sentence, telephone, sp. geographical, sp. arbitrary, mobile keys, sp. person, credit card, short text, long production (n. = name, sp. = spelling)]
Outlook • use final release of the database • 864 speakers • refine analysis • re-compute F0 for phrases
Summary • Ph@ttSessionz database • largest database for adolescent speakers • technology development and research • statistically reliable voice data for German • F0 variation dependent on utterance class?