
Speech Processing 15-492/18-495: Multilinguality



  1. Speech Processing 15-492/18-495 Multilinguality

  2. Dealing with *all* Languages
     - Over 6000 languages
     - Maybe not all commercially interesting … now
     - Major languages (economic)
       - Cell phone manufacturers list 46 languages
       - But even those not all covered

  3. What you need
     - ASR
       - Acoustic model (lots of speakers)
       - Pronunciation lexicon
       - Language model
     - TTS
       - Acoustic model (one speaker)
       - Pronunciation lexicon
       - Text analysis

  4. Writing Systems
     - Romanized writing systems
       - Latin-1 (ISO-8859-1)
       - Covers many Western European languages
     - Cyrillic
       - Covers many Eastern European languages
     - Arabic scripts
       - Arabic(s), Farsi, Urdu, etc.
     - Devanagari
       - Covers many Northern Indian languages
     - Chinese Hanzi
       - Covers some Chinese dialects, but in different versions
     - Many other scripts, some non-standard
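A first step when handling text in an unknown language is often to find out which script it is written in. The sketch below is one possible approach, not something taken from the slides: it reads the script from the start of each character's Unicode name. The function names `script_of` and `dominant_script` and the list of script labels are assumptions made for illustration.

```python
import unicodedata
from collections import Counter

# Script labels as they appear at the start of Unicode character names.
# This list is illustrative and far from complete.
SCRIPTS = ("LATIN", "CYRILLIC", "ARABIC", "HEBREW", "DEVANAGARI",
           "CJK", "HANGUL", "HIRAGANA", "KATAKANA", "THAI")

def script_of(ch: str) -> str:
    """Return a coarse script label for one character."""
    name = unicodedata.name(ch, "")  # e.g. "CYRILLIC SMALL LETTER PE"
    for script in SCRIPTS:
        if name.startswith(script):
            return script
    return "OTHER"

def dominant_script(text: str) -> str:
    """Return the most frequent script among the letters of the text."""
    counts = Counter(script_of(ch) for ch in text if ch.isalpha())
    return counts.most_common(1)[0][0] if counts else "OTHER"

print(dominant_script("привет"))
print(dominant_script("नमस्ते"))
```

Knowing the script narrows down the candidate languages but does not identify the language: Latin and Cyrillic each serve many languages.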

  5. Writing Systems
     - Letter based
       - Latin, Cyrillic
     - Consonant based
       - Arabic, Hebrew
     - Mora based
       - Half syllable or syllable
       - Indian scripts, Japanese native scripts
     - Syllable based
       - Hangul, Chinese

  6. Standards
     - Writing standards
       - Taught at schools, newspapers, computer support
       - Typically standardized spelling
     - May be mostly spoken
       - Occasionally written

  7. Language Specific Issues
     - No explicit markings
       - Stress, accent, tones
     - No word boundaries
       - Chinese, Thai
     - No (short) vowels
       - Arabic, Hebrew
     - Rich morphology
       - Many different words in the languages
       - Finnish, Turkish, Greenlandic
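For languages written without word boundaries, text has to be segmented before a lexicon or language model can be applied. The slides do not name a method; the sketch below uses greedy longest-match segmentation against a word list, a common baseline. The function name `max_match`, the `max_len` parameter, and the toy lexicon are assumptions made for illustration.

```python
def max_match(text, lexicon, max_len=4):
    """Greedy longest-match segmentation of unspaced text."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character is the fallback
        # when nothing in the lexicon matches at this position.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                words.append(text[i:j])
                i = j
                break
    return words

# Toy lexicon for illustration only.
lexicon = {"我", "喜欢", "语音", "处理"}
print(max_match("我喜欢语音处理", lexicon))
```

Greedy matching makes mistakes whenever a long dictionary word overlaps the correct split, so it is a starting point rather than a full solution.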

  8. Genre Specific Issues
     - No capitals, punctuation
       - Unpunctuated
     - Plain vs polite form
       - Speech vs text form
     - Many foreign phrases
       - (technology directed genres)
     - Many new abbreviations
       - E.g. SMS messages
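Genre-specific abbreviations are usually handled in text analysis by expanding them before pronunciation lookup. The following is a minimal sketch of table-driven expansion; the abbreviation table and the name `normalize_sms` are made up for illustration, and a real system would need a table built for the target language and genre.

```python
# Hypothetical abbreviation table; real SMS text needs a much larger one.
ABBREVIATIONS = {"u": "you", "r": "are", "gr8": "great", "2moro": "tomorrow"}

def normalize_sms(text, table=ABBREVIATIONS):
    """Lowercase the text and expand known abbreviations token by token."""
    tokens = text.lower().split()
    return " ".join(table.get(tok, tok) for tok in tokens)

print(normalize_sms("R u free 2moro"))
```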

  9. Character Encoding
     - Unicode vs UTF-8 vs Latin
       - Documents mix them
     - Sometimes accents are omitted
       - For ease of typing
     - Lots of standards
       - Unicode, EUC, BIG5, TIS42, …
       - Everyone has their own standard
       - Some create their own standards
     - Mixed character sets
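Two of the problems above can be sketched with the Python standard library: documents that mix UTF-8 and Latin-1, and text where accents may or may not have been typed. The helper names `decode_mixed` and `strip_accents` are assumptions for illustration, and the Latin-1 fallback is only reasonable when Latin-1 really is the other encoding in the collection.

```python
import unicodedata

def decode_mixed(raw: bytes) -> str:
    """Decode as UTF-8; fall back to Latin-1 if the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")

def strip_accents(text: str) -> str:
    """Remove combining marks so accented and unaccented spellings compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(decode_mixed("café".encode("utf-8")))
print(decode_mixed("café".encode("latin-1")))
print(strip_accents("café"))
```

Stripping accents is useful for matching but loses information, so it is safer to apply it to a lookup key than to the stored text.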

  10. Phoneme Sets
     - Hard to find consensus for new languages
       - Typically lots of different dialects
     - What level of distinction?
       - Some good for speech but not really phonetic
       - /t/ vs /dx/ in “water”
     - Often doesn’t include foreign phones
       - /w/ in German is common for younger people
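The level of distinction is often controlled by a mapping between a detailed phone set and a coarser one. The sketch below collapses the flap /dx/ into /t/, following the “water” example; the mapping table, the function name `collapse_phones`, and the phone string for “water” are illustrative assumptions rather than any particular lexicon's conventions.

```python
# Hypothetical mapping from a detailed phone set to a coarser one.
COLLAPSE = {"dx": "t"}

def collapse_phones(phones, mapping=COLLAPSE):
    """Replace each phone by its coarser equivalent, if one is defined."""
    return [mapping.get(p, p) for p in phones]

# Illustrative transcription of "water" with a flap.
print(collapse_phones(["w", "ao", "dx", "er"]))
```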

  11. Words
     - May be hard to define
       - No word boundaries
     - Rich morphology
       - Words have many variations of compounds
       - Yomenakatta -> could not read
       - Yomemasendeshita -> could not read (polite)
     - Gender specific speech
       - Boku vs atashi
     - Language mixtures

  12. Pronunciation lexicons
     - “Proper” speech vs “actual” speech
     - Hard to generalize
       - Chinese
     - Cross lingual pronunciations
       - “Human” (English/German)

  13. “Industry” way
     - Collect at least 300 hours of spoken speech
       - At least 20 different speakers
       - Mixture of gender, age, etc.
       - Through desired channel (phone/desktop)
     - Collect at least 5 hours from one speaker
       - High quality recording studio
       - Data should be targeted to application
     - Build pronunciation lexicon
       - Expert phonologist
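The collection targets above are easy to check against a recording manifest. The sketch below assumes, based on the earlier “What you need” slide, that the 300-hour multi-speaker target is for ASR and the 5-hour single-speaker target is for TTS. The `Recording` type and both function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Recording:
    speaker: str
    hours: float

def meets_asr_targets(recordings, min_hours=300.0, min_speakers=20):
    """Check total hours and number of distinct speakers."""
    total = sum(r.hours for r in recordings)
    speakers = {r.speaker for r in recordings}
    return total >= min_hours and len(speakers) >= min_speakers

def meets_tts_target(recordings, min_hours=5.0):
    """Check whether any single speaker has recorded enough hours."""
    per_speaker = defaultdict(float)
    for r in recordings:
        per_speaker[r.speaker] += r.hours
    return any(h >= min_hours for h in per_speaker.values())

corpus = [Recording(f"spk{i}", 15.0) for i in range(20)]
print(meets_asr_targets(corpus), meets_tts_target(corpus))
```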

  14. Industry way
     - Probably 3-6 months
       - Lead developer
       - Local language expert
       - Lots of human transcribers
     - Costs?
       - Many hundreds of thousands

  15. Or cheaper (?) …
     - Find existing data
       - Linguistic Data Consortium (UPenn)
       - ELRA (European equivalent)
       - Appen, Australia
     - Find local people who have collected data
     - Found data might be in the wrong format
       - Data cleaning is often the most expensive

  16. Standardized Datasets
     - Global Phone
       - 20+ languages, for ASR/TTS
     - LDC/DARPA/IARPA sets
       - Mostly English, Arabic and Chinese
     - BABEL dataset
       - 35 low resource languages (telephone conversations)
     - Librivox
       - Audio books
     - Voxforge
       - Open source collected languages
     - Mozilla
       - Open source multilingual sets

  17. CMU Wilderness Dataset
     - 500+ languages
       - 20 hours aligned for each language
       - Single speaker
       - Mined from read audio books (Bible)
       - 20+ languages, for ASR/TTS

  18. Actual way
     - Often mixture
       - Found data for initial model
       - Collect data with actual/initial application

  19. Multilingual Systems
     - Support lots of different languages
       - Press 1 for Spanish
       - Press 2 for Gujarati …
     - Automatically detect language
     - Mixed language

  20. Multilingual (Menu)
     - Speak in your language
       - Eki-mai no tsugi no bus no ha?
       - When is the next bus to the station?
     - Need multiple recognizers
       - Run in parallel and take best result
     - Or shared acoustic models
       - Recognizing both languages at once (mix)
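The “run in parallel and take best result” option can be sketched as follows. The recognizers here are stand-in functions that return a `(text, score)` pair; they are hypothetical, as is the name `recognize_best`, and no real ASR API is implied. The sketch also assumes the scores are comparable across recognizers, which generally requires some normalization in practice.

```python
from concurrent.futures import ThreadPoolExecutor

def recognize_best(audio, recognizers):
    """Run every recognizer on the same audio; return (language, text) of the top score."""
    with ThreadPoolExecutor() as pool:
        futures = {lang: pool.submit(fn, audio) for lang, fn in recognizers.items()}
        results = {lang: fut.result() for lang, fut in futures.items()}
    best = max(results, key=lambda lang: results[lang][1])
    return best, results[best][0]

# Stand-in recognizers for illustration; each returns (text, score).
recognizers = {
    "en": lambda audio: ("when is the next bus", 0.4),
    "ja": lambda audio: ("tsugi no bus", 0.9),
}
print(recognize_best(b"fake-audio", recognizers))
```

Running one recognizer per language scales poorly as languages are added, which is one motivation for the shared acoustic model option.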

  21. Multilingual (in line)
     - Code switching
       - Europe, India, bilingual areas
       - Hinglish, Spanglish
     - Borrowed words and phrases
       - Dad, time kyu hua hai
       - One lakh
       - Computer walla
       - Numbers
     - Can be inflected
       - Was updated -> up gedaten
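A simple way to see what code switching means for a system is to label each token with a language. The sketch below does this by lexicon lookup on the example sentence from the slide. The two word lists and the name `tag_languages` are toy assumptions. Lookup of this kind breaks down on inflected borrowings such as the “up gedaten” example, which match neither lexicon.

```python
import string

# Toy lexicons for illustration; the language assignments are assumptions.
LEXICONS = {
    "en": {"dad", "time"},
    "hi": {"kyu", "hua", "hai"},
}

def tag_languages(sentence, lexicons=LEXICONS):
    """Label each token with the first lexicon that contains it, else 'unk'."""
    tagged = []
    for raw in sentence.split():
        token = raw.strip(string.punctuation).lower()
        if not token:
            continue
        label = next((lang for lang, words in lexicons.items() if token in words), "unk")
        tagged.append((token, label))
    return tagged

print(tag_languages("Dad, time kyu hua hai"))
```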

  22. Speech Processing 11-492/18-492 Multilinguality
     - SPICE: making it easier
