
A Corpus For Large-Scale Phonetic Typology Elizabeth Salesky - PowerPoint PPT Presentation



  1. A Corpus For Large-Scale Phonetic Typology
 Elizabeth Salesky, Eleanor Chodroff, Tiago Pimentel, Matthew Wiesner, Ryan Cotterell, Jason Eisner, Alan W Black
 VoxClamantis in deserto: “a voice crying out in the wilderness”

  2. In the beginning, there was SPEECH
 ‘ipeuhcan’ (Nahuatl), ‘am Anfang’ (German), ‘በመጀመሪያ’ (Amharic), ‘in the beginning’ (English)
 Tower of Babel

  3. In the beginning, there was SPEECH
 ‘ipeuhcan’ (Nahuatl), ‘am Anfang’ (German), ‘በመጀመሪያ’ (Amharic), ‘in the beginning’ (English)
 Then the linguist asked: How do speech and language vary?
 ↳ prior cross-linguistic phonetic studies have relied on reported [language-aggregate] measurements
 We create our new corpus, VoxClamantis v1.0, to answer this question!
 ✔ spoken readings of the Bible
 ✔ >600 languages
 ✔ time-aligned phonemic transcriptions
 ✔ phonetic measures for vowel and sibilant tokens

  4. This talk: ① WHY we want this data ② HOW we create it ③ CASE STUDIES validating the corpus & illustrating two possible uses

  5. Why?

  6. Motivation: Variation in and across languages
 [Figure: vowel-space plots. Spanish (⑤ vowels): /i/ /u/ /o/ /e/ /a/. Romanian (⑦ vowels): /i/ /u/ /o/ /e/ /a/ /ɨ/ /ə/.]
 We know phonetic variation within a language, but what are its range and limits?
 How does the number and set of phonemic categories influence their realizations?

  7. How?

  8. Resources Needed: ① speech ② transcripts ③ phonemic labels
 Amharic: በመጀመሪያ → Grapheme-to-Phoneme (G2P) → b ə m ə d ʒ ə m ə r i j a

  9. Resources Needed: ① speech ② transcripts ③ phonemic labels ④ time alignments ⑤ phonetic measures
 Amharic: በመጀመሪያ → b ə m ə d ʒ ə m ə r i j a
 Forced alignment (HMM acoustic model)
 Phonetic measures (R or Praat): formant frequencies, mid-frequency peak, duration…

  10. Extraction Process: ① speech ② transcripts
 CMU Wilderness (2019): 699 Bible readings! With ① speech! and ② transcripts!
 Amharic: ‘በመጀመሪያ’
 >1TB 😲 >6 years of CPU compute 😲

  11. Extraction Process: ① speech ② transcripts
 CMU Wilderness dataset
 Chapter: ~30min
 1 የፍጥረት አጀማመር በመጀመሪያ እግዚአብሔር ( ኤሎሂም ) ሰማያትንና ምድርን ፈጠረ። 2 ምድርም ቅርጽ የለሽና ባዶ ነበረች። ※ የምድርን ጥልቅ ስፍራ ሁሉ ጨለማ ውጦት ነበር። የእግዚአብሔርም ( ኤሎሂም ) መንፈስ በውሆች ላይ ይረብብ ነበር። 3 ከዚያም እግዚአብሔር ( ኤሎሂም ) “ ብርሃን ይሁን ” አለ፤ ብርሃንም ሆነ። 4 እግዚአብሔርም ( ኤሎሂም ) ብርሃኑ መልካም እንደሆነ አየ፤ ብርሃኑን ከጨለማ ለየ። 5 እግዚአብሔርም ( ኤሎሂም ) ብርሃኑን “ ቀን ” ፣ ጨለማውን “ ሌሊት ” ብሎ ጠራው። መሸ፤ ነጋም፤ የመጀመሪያ ቀን። 6 እግዚአብሔር ( ኤሎሂም ) ፣ “ ውሃን ከውሃ የሚለይ ጠፈር በውሆች መካከል ይሁን ” አለ። 7 ስለዚህ እግዚአብሔር ( ኤሎሂም ) ጠፈርን አድርጎ ከጠፈሩ በላይና ከጠፈሩ በታች ያለውን ውሃ ለየ፤ እንዳለውም ሆነ። 8 እግዚአብሔር ( ኤሎሂም ) ጠፈርን “ ሰማይ ” ብሎ ጠራው። መሸ፤ ነጋም፤ ሁለተኛ ቀን። 9 ከዚያም እግዚአብሔር ( ኤሎሂም ) ፣ “ ከሰማይ በታች ያለው ውሃ በአንድ …
 Utterance: < 30s 😲
 በመጀመሪያ

  12. Extraction Process: ① speech ② transcripts ③ phonemic labels
 Which phonemes are present?
 text: ‘read’, ‘read’ → G2P → phonemes: /ɹɛt/, /ɹɛd/; /ɛ/, /i/

  13. Extraction Process: ① speech ② transcripts ③ phonemic labels
 Phoneme “Transcriptions”: Grapheme-to-Phoneme
 ① Linguist-created rules (Epitran): 39 of 690 readings
 ② Wisdom of Crowds (Wiktionary/WikiPron) + our own WFST models (Phonetisaurus 🦖): 18 of 690 readings (disjoint)
 ③ Naïve baseline (Unitran): all 690 readings 😲 “first-pass transcription”
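To make the rule-based idea behind G2P concrete, here is a minimal sketch of greedy longest-match grapheme-to-phoneme conversion. The rule table `TOY_RULES` and the function `toy_g2p` are hypothetical names invented for illustration; they are not the rules or API of Epitran, Unitran, or Phonetisaurus.

```python
# Hypothetical toy rule table, for illustration only.
# These are NOT the actual rules of Epitran, Unitran, or Phonetisaurus.
TOY_RULES = {
    "sch": "ʃ",
    "ng": "ŋ",
    "a": "a",
    "f": "f",
    "g": "ɡ",
    "m": "m",
    "n": "n",
}


def toy_g2p(word, rules=TOY_RULES):
    """Convert a spelled word to phoneme symbols by greedy longest match."""
    max_len = max(len(grapheme) for grapheme in rules)
    word = word.lower()
    phonemes = []
    i = 0
    while i < len(word):
        # Try the longest possible grapheme first, then shorter ones.
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                phonemes.append(rules[chunk])
                i += size
                break
        else:
            # Unknown grapheme: mark it rather than guess a phoneme.
            phonemes.append("?")
            i += 1
    return phonemes


print(toy_g2p("Anfang"))
```

Longest match matters here because "ng" must win over "n" followed by "g"; a context-free table like this still cannot resolve spellings whose pronunciation depends on the word, which is the ambiguity the ‘read’ example points at.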

  14. G2P Summary
 57 readings (39 + 18 of 690): “High-resource (HR)”
 ALL 690 readings: “First-pass (FP)”
 🤕 Why provide FP alignments for languages with HR? We’ll come back to that 😊

  15. Extraction Process: ① speech ② transcripts ③ phonemic labels
 Amharic: b ə m ə d ʒ ə m ə r i j a
 Forced alignment (HMM acoustic model)

  16. Extraction Process: ① speech ② transcripts ③ phonemic labels ④ time alignments
 Amharic: b ə m ə d ʒ ə m ə r i j a
 Forced alignment (HMM acoustic model): b → start time, end time

  17. Extraction Process: ① speech ② transcripts ③ phonemic labels ④ time alignments
 Amharic: b ə m ə d ʒ ə m ə r i j a
 Forced alignment (HMM acoustic model): b → start time, end time

  18. Extraction Process: ① speech ② transcripts ③ phonemic labels ④ time alignments
 Amharic phoneme tokens: b, ə, m, …, each with a start time and an end time
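A phoneme token as described on this slide is just a label plus a start and end time. The sketch below shows one way such tokens could be represented and checked; the class name `PhonemeToken`, the helper `tokens_from_alignment`, and the time values are all hypothetical and do not reflect the corpus's actual file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhonemeToken:
    """One aligned phoneme: a label with start and end times in seconds."""
    label: str
    start: float
    end: float

    @property
    def duration(self):
        return self.end - self.start


def tokens_from_alignment(rows):
    """Build tokens from (label, start, end) rows, rejecting empty spans."""
    tokens = []
    for label, start, end in rows:
        if end <= start:
            raise ValueError(f"token {label!r} has non-positive duration")
        tokens.append(PhonemeToken(label, start, end))
    return tokens


# Made-up times for the first three segments of the Amharic example.
example = tokens_from_alignment([
    ("b", 0.00, 0.05),
    ("ə", 0.05, 0.12),
    ("m", 0.12, 0.20),
])
for token in example:
    print(token.label, round(token.duration, 3))
```

Duration falls out of the alignment directly, which is why it can be listed among the phonetic measures without any further signal processing.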

  19. Extraction Process: Phonetic Measures
 ① speech ② transcripts ③ phonemic labels ④ time alignments ⑤ phonetic measures
 VOWELS (a, o): formants F1–F4, e.g. high-amplitude frequencies
 SIBILANTS (s, z): spectral peak, COG, duration, ...
 PRAAT TEXTGRID
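As a rough illustration of two of the sibilant measures named above, the sketch below computes a spectral peak and a power-weighted spectral center of gravity (COG) from raw samples using a naive DFT. This is a generic textbook formulation under simplifying assumptions (no windowing, no pre-emphasis, no frequency band restriction); it is not the Praat or R procedure used to build the corpus, and all function names are hypothetical.

```python
import cmath
import math


def power_spectrum(samples, sample_rate):
    """Return (frequency_hz, power) pairs for bins 0..N/2 via a naive DFT."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        spectrum.append((k * sample_rate / n, abs(coeff) ** 2))
    return spectrum


def center_of_gravity(samples, sample_rate):
    """Power-weighted mean frequency of the spectrum."""
    spectrum = power_spectrum(samples, sample_rate)
    total = sum(power for _, power in spectrum)
    if total == 0:
        raise ValueError("signal has no energy")
    return sum(freq * power for freq, power in spectrum) / total


def spectral_peak(samples, sample_rate):
    """Frequency of the highest-power bin."""
    return max(power_spectrum(samples, sample_rate), key=lambda fp: fp[1])[0]


# Synthetic test tone: 64 samples of a 1000 Hz sine at 8000 Hz sampling.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(64)]
print(spectral_peak(tone, rate), round(center_of_gravity(tone, rate), 3))
```

For a pure tone the peak and the COG coincide; for a noisy fricative spectrum they generally differ, which is one reason to record both.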

  20. Evaluation
 🤕 Why provide both Unitran and High-Resource alignments?
 Use multiple sets of alignments to assess Unitran alignment quality:
 ‣ How much does quality vary across languages?
 ‣ Are certain phonemes more accurate than others?
 ‣ What about time alignment accuracy?
 See paper! (+ appendices)
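The slide defers the evaluation details to the paper. As one plausible way to compare two label sequences for the same utterance, the sketch below computes a Levenshtein-based label error rate, treating the high-resource labels as the reference. The choice of metric and the function names are assumptions made for illustration, not a statement of the authors' procedure.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences of phoneme labels."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_label in enumerate(reference, start=1):
        current = [i]
        for j, hyp_label in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_label != hyp_label)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def label_error_rate(reference, hypothesis):
    """Edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference sequence is empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example: a reference labeling and a first-pass guess.
high_resource = ["b", "ə", "m", "ə"]
first_pass = ["b", "e", "m"]
print(label_error_rate(high_resource, first_pass))
```

Computing this rate per language, or per reference phoneme, would speak to the first two questions on the slide; timing accuracy needs a separate comparison of boundary times.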

  21. Corpus Summary
 VoxClamantis v1.0 provides tokens of phoneme-level measurements in hundreds of languages!
 ‣ 690 recorded readings of the Bible
 ‣ 635 languages (ISO 639-3)
 ‣ 70 language families
 ‣ >400 million aligned phoneme-level segments
 ‣ Subsequent phonetic measures for all vowels and sibilants

  22. Case Studies

  23. Case Studies
 Case studies with VoxClamantis v1.0: 48 High-Resource Readings
 Vowels: ~50 phonemes
 Sibilants: /s/, /z/
 ① Reproduction of previous results validates resource
 ② Research at scale suggests cross-linguistic principles

  24. Phonetic Uniformity
 Are shared characteristics (eg: vowel height, POA) realized uniformly (eg: measures strongly correlated) within languages (eg: language)?
 Formants: Vowels. /i/, /u/: high vowels
 Mid-Freq Peak: Sibilants. /s/, /z/: alveolar place of articulation
 While variation exists across languages, within language F1 strongly correlated
 Supports hypothesis that this may be a universal principle
 Reproduce previous results, but with many more languages

  25. Phonetic Dispersion
 Is inventory size correlated with articulatory precision?
 [Figure: VOWELS. Vowel charts for Marshallese (4 vowels) and English (20 vowels).]

  26. Phonetic Dispersion
 Is inventory size correlated with articulatory precision?
 [Figure: vowel charts for Marshallese (4 vowels) and English (20 vowels).]

  27. Phonetic Dispersion
 Is inventory size correlated with articulatory precision?
 No (Spearman ρ = 0.11, p = 0.44; Pearson r = 0.11, p = 0.46)
 [Figure: vowel charts for Marshallese (4 vowels) and English (20 vowels).]
 Supports hypothesis that this may [not] be a universal principle
 Previously shown, but not possible to study at scale
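The two statistics quoted on this slide can be computed as follows. This is a minimal standard-library sketch of Pearson's r and Spearman's ρ (Pearson on average ranks); the input numbers are made up to exercise the functions and are not corpus data, and p-values are left out.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson's r applied to average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up values: a monotone but non-linear relationship.
sizes = [1, 2, 3, 4, 5]
scores = [1, 4, 9, 16, 25]
print(round(pearson(sizes, scores), 4), round(spearman(sizes, scores), 4))
```

Reporting both is a useful cross-check: Spearman only asks whether the relationship is monotone, while Pearson asks whether it is linear, so agreement between the two (as on the slide) suggests the result does not hinge on either assumption.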

  28. CAUTION
 Utterance alignment: filter; in future, realign!
 Automatic phoneme labels: better G(+A)2P; alignment assessment! Curate more resources!
 😲 Corpus representation (e.g. speakers): curate more resources!

  29. Summary

  30. Conclusion
 VoxClamantis v1.0 corpus: voxclamantisproject.github.io
 aligned phoneme-level segments in hundreds of languages
 57 high-resource, 690 first-pass
 😲 methodology is not perfect (version 1.0!)
 ⬇ download 🥴 use for research ⬆ contribute to v2.0!

  31. Contact Us! Questions! Comments! Contributions!
 voxclamantisproject.github.io
 voxclamantisproject@gmail.com
 Elizabeth Salesky, Eleanor Chodroff, Tiago Pimentel, Matthew Wiesner, Ryan Cotterell, Jason Eisner, Alan W Black
 VoxClamantis in deserto: “a voice crying out in the wilderness”
