A Corpus For Large-Scale Phonetic Typology
Elizabeth Salesky, Eleanor Chodroff, Tiago Pimentel, Matthew Wiesner, Ryan Cotterell, Jason Eisner, Alan W Black
VoxClamantis in deserto: “a voice crying out in the wilderness”
In the beginning, there was SPEECH
‘ipeuhcan’ (Nahuatl), ‘am Anfang’ (German), ‘በመጀመሪያ’ (Amharic), ‘in the beginning’ (English)
[Image: Tower of Babel]
In the beginning, there was SPEECH
Then the linguist asked: How do speech and language vary?
↳ prior cross-linguistic phonetic studies have relied on reported [language-aggregate] measurements
We create our new corpus, VoxClamantis v1.0, to answer this question!
✔ spoken readings of the Bible
✔ >600 languages
✔ time-aligned phonemic transcriptions
✔ phonetic measures for vowel and sibilant tokens
This talk
① WHY we want this data
② HOW we create it
③ CASE STUDIES validating the corpus & illustrating two possible uses
Why?
Motivation: Variation in and across languages
[Figure: vowel spaces of Spanish (⑤ vowels: /i/, /e/, /a/, /o/, /u/) and Romanian (⑦ vowels: /i/, /e/, /a/, /o/, /u/, /ɨ/, /ə/)]
We know there is phonetic variation within a language, but what are its range and limits?
How does the number and set of phonemic categories influence their realizations?
How?
Resources Needed
① speech
② transcripts
③ phonemic labels
Grapheme-to-Phoneme (G2P), Amharic: በመጀመሪያ → b ə m ə d ʒ ə m ə r i j a
Resources Needed
① speech
② transcripts
③ phonemic labels
④ time alignments
⑤ phonetic measures
Forced alignment (HMM acoustic model), Amharic: በመጀመሪያ → b ə m ə d ʒ ə m ə r i j a
Phonetic measures (R or Praat): formant frequencies, mid-frequency peak, duration…
Extraction Process
① speech
② transcripts
CMU Wilderness (2019): 699 Bible readings! with ① speech! and ② transcripts!
Example: ‘በመጀመሪያ’ (Amharic)
>1TB 😲
>6 years of CPU compute 😲
Extraction Process
① speech
② transcripts
CMU Wilderness dataset
Chapter: ~30min
1 የፍጥረት አጀማመር በመጀመሪያ እግዚአብሔር ( ኤሎሂም ) ሰማያትንና ምድርን ፈጠረ። 2 ምድርም ቅርጽ የለሽና ባዶ ነበረች። ※ የምድርን ጥልቅ ስፍራ ሁሉ ጨለማ ውጦት ነበር። የእግዚአብሔርም ( ኤሎሂም ) መንፈስ በውሆች ላይ ይረብብ ነበር። 3 ከዚያም እግዚአብሔር ( ኤሎሂም ) “ ብርሃን ይሁን ” አለ፤ ብርሃንም ሆነ። 4 እግዚአብሔርም ( ኤሎሂም ) ብርሃኑ መልካም እንደሆነ አየ፤ ብርሃኑን ከጨለማ ለየ። 5 እግዚአብሔርም ( ኤሎሂም ) ብርሃኑን “ ቀን ” ፣ ጨለማውን “ ሌሊት ” ብሎ ጠራው። መሸ፤ ነጋም፤ የመጀመሪያ ቀን። 6 እግዚአብሔር ( ኤሎሂም ) ፣ “ ውሃን ከውሃ የሚለይ ጠፈር በውሆች መካከል ይሁን ” አለ። 7 ስለዚህ እግዚአብሔር ( ኤሎሂም ) ጠፈርን አድርጎ ከጠፈሩ በላይና ከጠፈሩ በታች ያለውን ውሃ ለየ፤ እንዳለውም ሆነ። 8 እግዚአብሔር ( ኤሎሂም ) ጠፈርን “ ሰማይ ” ብሎ ጠራው። መሸ፤ ነጋም፤ ሁለተኛ ቀን። 9 ከዚያም እግዚአብሔር ( ኤሎሂም ) ፣ “ ከሰማይ በታች ያለው ውሃ በአንድ . …
Utterance: < 30s 😲
በመጀመሪያ
Extraction Process
① speech
② transcripts
③ phonemic labels
Which phonemes are present?
text → G2P → phonemes
‘read’, ‘read’ → /ɹɛt/, /ɹɛd/; vowel: /ɛ/ or /i/
Extraction Process
① speech
② transcripts
③ phonemic labels
Phoneme “Transcriptions”: Grapheme-to-Phoneme
① Linguist-created rules (Epitran): 39 of 690 readings (figure label: 64)
② Wisdom of Crowds (Wiktionary/WikiPron) + our own WFST models (Phonetisaurus 🦖): 18 of 690 readings, disjoint from ① (figure label: 165)
③ Naïve baseline (Unitran): all 690 readings 😲 “first-pass transcription”
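To make the idea of a naïve "first-pass" transcription concrete, here is a minimal sketch of context-free G2P: every grapheme is mapped to one fixed phoneme, regardless of its neighbours. The table `TOY_TABLE` and the function `naive_g2p` are hypothetical and invented for illustration; they are not Unitran's actual mapping or API.

```python
# Hypothetical toy table: one fixed phoneme per grapheme.
# This is NOT Unitran's real mapping, only an illustration of the idea.
TOY_TABLE = {"r": "ɹ", "e": "ɛ", "a": "a", "d": "d", "t": "t"}


def naive_g2p(word, table=TOY_TABLE, unknown="?"):
    """Map each grapheme to a phoneme independently of its context."""
    return [table.get(ch, unknown) for ch in word.lower()]


# The same spelling always yields the same phoneme string, so a
# context-free mapping cannot choose between alternative pronunciations.
print(naive_g2p("read"))
```

A mapping like this needs no language-specific resources, which is presumably why a baseline of this kind can cover every reading, but it cannot resolve cases such as ‘read’, where the right vowel depends on context. That limitation is one reason to compare first-pass output against higher-resource G2P where both exist.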
G2P Summary
“High-resource (HR)”: 57 readings (39 + 18, out of 690)
“First-pass (FP)”: ALL 690 readings
🤕 Why provide FP alignments for languages with HR? We’ll come back to that 😊
Extraction Process
① speech
② transcripts
③ phonemic labels
Forced alignment (HMM acoustic model), Amharic: b ə m ə d ʒ ə m ə r i j a
Extraction Process
① speech
② transcripts
③ phonemic labels
④ time alignments
Forced alignment (HMM acoustic model), Amharic: b ə m ə d ʒ ə m ə r i j a
Each phoneme (e.g. b) receives a start time and an end time.
Extraction Process
① speech
② transcripts
③ phonemic labels
④ time alignments
Amharic phoneme tokens: b, ə, m, …, each with a start time and an end time
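One way to picture the aligned phoneme tokens is as small records holding a label and its start and end times. The sketch below assumes a hypothetical `PhonemeToken` class and invented times; it does not reflect the corpus's actual file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhonemeToken:
    """One aligned phoneme: its label plus start/end times in seconds."""
    label: str
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start

    @property
    def midpoint(self) -> float:
        return (self.start + self.end) / 2


# Hypothetical alignment for the first phonemes of the Amharic example;
# the times are invented for illustration only.
tokens = [
    PhonemeToken("b", 0.00, 0.05),
    PhonemeToken("ə", 0.05, 0.12),
    PhonemeToken("m", 0.12, 0.20),
]

for tok in tokens:
    print(f"{tok.label}\t{tok.start:.2f}\t{tok.end:.2f}\t{tok.duration:.2f}")
```

With tokens in this shape, duration falls out directly, and the start/end times tell a later step which stretch of audio to measure.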
Extraction Process: Phonetic Measures
① speech
② transcripts
③ phonemic labels
④ time alignments
⑤ phonetic measures
VOWELS (e.g. a, o): Formants (F1, F2, F3, F4), eg high-amplitude frequencies
SIBILANTS (s, z): Spectral peak, COG, Duration, ...
[Figure: Praat TextGrid]
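Two of the sibilant measures named above can be sketched from a power spectrum: the center of gravity (COG) is the power-weighted mean frequency, and a spectral peak is the frequency with the most power inside a chosen band. The function names, the toy spectrum, and the 3000–7000 Hz band below are assumptions for illustration, not the corpus's actual extraction settings.

```python
def center_of_gravity(freqs, power):
    """Power-weighted mean frequency of a spectrum."""
    total = sum(power)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * p for f, p in zip(freqs, power)) / total


def band_peak(freqs, power, lo, hi):
    """Frequency with the highest power between lo and hi (inclusive)."""
    in_band = [(p, f) for f, p in zip(freqs, power) if lo <= f <= hi]
    if not in_band:
        raise ValueError("no spectral bins inside the band")
    return max(in_band)[1]


# Toy spectrum (Hz, arbitrary power units), invented for illustration.
freqs = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]
power = [1, 1, 2, 4, 8, 4, 2, 1]

cog = center_of_gravity(freqs, power)
peak = band_peak(freqs, power, lo=3000, hi=7000)  # band limits are an assumption
print(f"COG = {cog:.1f} Hz, band peak = {peak} Hz")
```

In practice the spectrum would come from the audio between a token's start and end times, computed with a tool such as Praat or R as the slides mention.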
Evaluation
🤕 Why provide both Unitran and High-Resource alignments?
Use multiple sets of alignments to assess Unitran alignment quality
‣ How much does quality vary across languages?
‣ Are certain phonemes more accurate than others?
‣ What about time alignment accuracy?
See paper! (+ appendices)
Corpus Summary
VoxClamantis v1.0 provides tokens of phoneme-level measurements in hundreds of languages!
‣ 690 recorded readings of the Bible
‣ 635 languages (ISO 639-3)
‣ 70 language families
‣ >400 million aligned phoneme-level segments
‣ Subsequent phonetic measures for all vowels and sibilants
Case Studies
Case studies with VoxClamantis v1.0
48 High-Resource Readings
Vowels: ~50 phonemes
Sibilants: /s/, /z/
① Reproduction of previous results suggests valid resource
② Research at scale suggests cross-linguistic principles
Phonetic Uniformity
Are shared characteristics (eg: vowel height, POA) realized uniformly (eg: measures strongly correlated) within languages?
Vowels: Formants; /i/, /u/: high vowels
Sibilants: Mid-Freq Peak; /s/, /z/: alveolar place of articulation
While variation exists across languages, within language F1 strongly correlated
Supports hypothesis that this may be a universal principle
Reproduce previous results, but with many more languages
Phonetic Dispersion
Is inventory size correlated with articulatory precision?
[Figure: vowel charts for Marshallese (4 vowels) and English (20 vowels)]
Phonetic Dispersion
Is inventory size correlated with articulatory precision?
No (Spearman ρ = 0.11, p = 0.44; Pearson r = 0.11, p = 0.46)
[Figure: vowel charts for Marshallese (4 vowels) and English (20 vowels)]
Previously shown, but not possible to study at scale
Supports hypothesis that this may [not] be a universal principle
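For readers unfamiliar with the two statistics reported above, the following sketch computes Pearson's r (linear association) and Spearman's ρ (Pearson's r on ranks, so it responds to any monotone relation). The per-language numbers are invented toy values, not corpus results, and p-values are left out.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-language values, invented for illustration:
# vowel inventory size and some measure of articulatory precision.
inventory_size = [4, 5, 7, 10, 20]
precision = [1.0, 8.0, 27.0, 64.0, 125.0]

print(f"Pearson r = {pearson(inventory_size, precision):.2f}")
print(f"Spearman rho = {spearman(inventory_size, precision):.2f}")
```

Reporting both statistics, as the slide does, guards against a relationship that is monotone but not linear; here both are close to zero, consistent with the "No" answer.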
CAUTION
‣ Utterance alignment: Filter; in future, realign!
‣ Automatic phoneme labels: Better G(+A)2P; alignment assessment! Curate more resources!
‣ Corpus representation (e.g. speakers): Curate more resources! 😲
Summary
Conclusion
VoxClamantis v1.0 corpus: voxclamantisproject.github.io
‣ aligned phoneme-level segments in hundreds of languages
‣ 57 high-resource, 690 first-pass
😲 methodology is not perfect (version 1.0!)
⬇ download
🥴 use for research
⬆ contribute to v2.0!
Contact Us!
Questions! Comments! Contributions!
voxclamantisproject.github.io
voxclamantisproject@gmail.com