Speech Processing 15-492/18-492 Multilinguality Dealing with all - PowerPoint PPT Presentation

Speech Processing 15-492/18-492 Multilinguality

Dealing with *all* Languages
- Over 6000 languages
  - Maybe not all commercially interesting … now
- Major languages (economic)
  - Cell phone manufacturers list 46 languages
  - But even those not all covered

What you need
- ASR
  - Acoustic model (lots of speakers)
  - Pronunciation lexicon
  - Language model
- TTS
  - Acoustic model (one speaker)
  - Pronunciation lexicon
  - Text analysis

Writing Systems
- Romanized writing systems
  - Latin-1 (ISO-8859-1)
  - Covers many Western European languages
- Cyrillic
  - Covers many Eastern European languages
- Arabic scripts
  - Arabic(s), Farsi, Urdu, etc.
- Devanagari
  - Covers many Northern Indian languages
- Chinese Hanzi
  - Covers some Chinese dialects, but in different versions
- Many other scripts, some non-standard
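A first step when handling text in an unknown language is often to find out which scripts it uses. The following is a minimal sketch, assuming that the prefix of each character's Unicode name is a good enough script label. The function names `script_of` and `script_counts` and the list of labels are illustrative choices, not part of the lecture.

```python
import unicodedata
from collections import Counter

# Coarse script labels, matched against the start of the Unicode character name.
SCRIPT_LABELS = (
    "LATIN", "CYRILLIC", "ARABIC", "HEBREW", "DEVANAGARI",
    "CJK", "HANGUL", "HIRAGANA", "KATAKANA", "THAI",
)

def script_of(ch):
    """Return a coarse script label for one character, or "OTHER"."""
    name = unicodedata.name(ch, "")
    for label in SCRIPT_LABELS:
        if name.startswith(label):
            return label
    return "OTHER"

def script_counts(text):
    """Count alphabetic characters per script, ignoring digits, spaces and marks."""
    return Counter(script_of(ch) for ch in text if ch.isalpha())

if __name__ == "__main__":
    print(script_counts("Speech обработка 语音"))
```

A text whose counts are split across several scripts is a hint that it mixes languages or contains borrowed words.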

Writing Systems
- Letter based
  - Latin, Cyrillic
- Consonant based
  - Arabic, Hebrew
- Mora based
  - Half syllable or syllable
  - Indian scripts, Japanese native scripts
- Syllable based
  - Hangul, Chinese

Standards
- Writing standards
  - Taught at schools, newspapers, computer support
  - Typically standardized spelling
- May be mostly spoken
  - Occasionally written

Language Specific Issues
- No explicit markings
  - Stress, accent, tones
- No word boundaries
  - Chinese, Thai
- No (short) vowels
  - Arabic, Hebrew
- Rich morphology
  - Many different words in the languages
  - Finnish, Turkish, Greenlandic
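For languages written without word boundaries, text has to be segmented before a lexicon or language model can be applied. The following is a sketch of greedy longest-match segmentation ("max-match"), a common simple baseline and not necessarily the method used in the course. The function name `max_match` and the toy lexicon are assumptions, and an unspaced English string stands in for Chinese or Thai text.

```python
def max_match(text, lexicon, max_len=None):
    """Greedy left-to-right segmentation: always take the longest lexicon entry.

    Characters that start no known word are emitted as single-character tokens.
    """
    if max_len is None:
        max_len = max((len(word) for word in lexicon), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            # No lexicon entry starts here: fall back to one character.
            words.append(text[i])
            i += 1
    return words

if __name__ == "__main__":
    toy_lexicon = {"the", "table", "down", "there"}
    print(max_match("thetabledownthere", toy_lexicon))
```

Greedy matching can pick a long word that blocks the correct reading of what follows, so practical systems usually score alternative segmentations rather than committing to the first match.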

Genre Specific Issues
- No capitals, punctuation
- Unpunctuated
- Plain vs polite form
- Speech vs text form
- Many foreign phrases
  - (technology-directed genres)
- Many new abbreviations
  - E.g. SMS messages

Character Encoding
- Unicode vs UTF-8 vs Latin
  - Documents mix them
- Sometimes accents are omitted
  - For ease of typing
- Lots of standards
  - Unicode, EUC, BIG5, TIS-620, …
  - Everyone has their own standard
  - Some create their own standards
- Mixed character sets
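Two of the problems above, mixed encodings and omitted accents, can be partly handled when text is loaded. The following is a minimal sketch, assuming the data is either UTF-8 or Latin-1. The helper names `decode_bytes` and `strip_accents` are illustrative. Latin-1 accepts any byte sequence, so it is tried last and may silently mis-decode data that is really in some other encoding.

```python
import unicodedata

def decode_bytes(raw, encodings=("utf-8", "latin-1")):
    """Decode bytes by trying each encoding in order.

    With the default list, Latin-1 always succeeds, so the final
    fallback is only reached when a custom list is passed.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")

def strip_accents(text):
    """Remove combining marks so accented and unaccented spellings compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

if __name__ == "__main__":
    for raw in ("café".encode("utf-8"), "café".encode("latin-1")):
        text = decode_bytes(raw)
        print(text, strip_accents(text))
```

Stripping accents is only appropriate for matching purposes, for example lexicon lookup of text typed without accents. In many languages the accent distinguishes different words, so the original form should be kept as well.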

Phoneme Sets
- Hard to find consensus for new languages
  - Typically lots of different dialects
- What level of distinction?
  - Some good for speech but not really phonetic
  - /t/ vs /dx/ in “water”
- Often doesn’t include foreign phones
  - /w/ in German is common for younger people

Words
- May be hard to define
  - No word boundaries
- Rich morphology
  - Words have many variations of compounds
  - Yomenakatta -> could not read
  - Yomemasendeshita -> could not read (polite)
- Gender specific speech
  - Boku vs atashi
- Language mixtures

Pronunciation lexicons
- “Proper” speech vs “actual” speech
- Hard to generalize
  - Chinese
- Cross lingual pronunciations
  - “Human” (English/German)

“Industry” way
- Collect at least 100 hours of speech
  - At least 20 different speakers
  - Mixture of gender, age, etc.
  - Through desired channel (phone/desktop)
- Collect at least 5 hours from one speaker
  - High quality recording studio
  - Data should be targeted to application
- Build pronunciation lexicon
  - Expert phonologist
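The multi-speaker targets on this slide (at least 100 hours, at least 20 speakers) can be checked against a collection as it grows. The following is a small sketch, assuming the corpus is available as hypothetical `(speaker_id, duration_in_seconds)` pairs. The function name and the returned field names are illustrative.

```python
from collections import defaultdict

MIN_TOTAL_HOURS = 100  # from the slide: at least 100 hours of speech
MIN_SPEAKERS = 20      # from the slide: at least 20 different speakers

def corpus_summary(utterances):
    """Summarize (speaker_id, seconds) pairs and compare with the targets."""
    seconds_per_speaker = defaultdict(float)
    for speaker, seconds in utterances:
        seconds_per_speaker[speaker] += seconds
    total_hours = sum(seconds_per_speaker.values()) / 3600
    n_speakers = len(seconds_per_speaker)
    return {
        "total_hours": total_hours,
        "speakers": n_speakers,
        "meets_targets": total_hours >= MIN_TOTAL_HOURS and n_speakers >= MIN_SPEAKERS,
    }

if __name__ == "__main__":
    # Toy data: 20 speakers with 5 hours each.
    toy = [(f"spk{i:02d}", 5 * 3600) for i in range(20)]
    print(corpus_summary(toy))
```

The check covers quantity only. The balance of gender and age and the recording channel still have to be verified from speaker metadata.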

Industry way
- Probably 3-6 months
  - Lead developer
  - Local language expert
  - Lots of human transcribers
- Costs?
  - Many hundreds of thousands

Or cheaper (?) …
- Find existing data
  - Linguistic Data Consortium (UPenn)
  - ELRA (European equivalent)
  - Appen, Australia
  - Find local people who have collected data
- Found data might be in wrong format
  - Data cleaning is often the most expensive

Actual way
- Often mixture
  - Found data for initial model
  - Collect data with actual/initial application

Multilingual Systems
- Support lots of different languages
  - Press 1 for Spanish
  - Press 2 for Gujarati …
- Automatically detect language
- Mixed language

Multilingual (Menu)
- Speak in your language
  - Eki-mai no tsugi no bus no ha?
  - When is the next bus to the station
- Need multiple recognizers
  - Run in parallel and take best result
- Or shared acoustic models
  - Recognizing both languages at once (mix)
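The "run in parallel and take best result" option can be sketched as follows. The recognizers here are hypothetical callables that take audio and return a `(text, score)` pair. They stand in for real ASR engines, which are not specified in the slides. The sketch also assumes the scores are comparable across languages, which usually requires calibration in practice.

```python
from concurrent.futures import ThreadPoolExecutor

def recognize_best(audio, recognizers):
    """Run every per-language recognizer on the same audio and keep the top score.

    recognizers: dict mapping a language code to a callable(audio) -> (text, score).
    Returns (language, text, score) for the highest-scoring hypothesis.
    """
    if not recognizers:
        raise ValueError("at least one recognizer is required")
    with ThreadPoolExecutor(max_workers=len(recognizers)) as pool:
        futures = {lang: pool.submit(rec, audio) for lang, rec in recognizers.items()}
        results = {lang: future.result() for lang, future in futures.items()}
    best_lang = max(results, key=lambda lang: results[lang][1])
    text, score = results[best_lang]
    return best_lang, text, score

if __name__ == "__main__":
    # Stub recognizers with fixed outputs, for illustration only.
    stubs = {
        "en": lambda audio: ("when is the next bus to the station", 0.41),
        "ja": lambda audio: ("eki-mai no tsugi no bus", 0.87),
    }
    print(recognize_best(b"", stubs))
```

The winning language code doubles as a language identification result, so the same call can be used to choose the language of the rest of the dialog.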

Multilingual (in line)
- Code switching
  - European, India, bilingual areas
  - Hinglish, Spanglish
- Borrowed words and phrases
  - Dad, time kyu hua hai
  - One lakh
  - Computer walla
  - numbers
- Can be inflected
  - Was updated -> up gedaten

HW2: TTS
- Due 3:30pm Monday October 20th
- Install Festival and Festvox
- Find 10 errors in each of two different synthesizers
- Build a voice
  - A Talking Clock
  - A general voice
  - (or both)
