
Pronunciation of Nouns in Text to Speech Systems - PowerPoint PPT Presentation



  1. Pronunciation of Nouns in Text to Speech Systems
     Veera Raghavendra, Lavanya Prahallad
     IIIT Hyderabad, India

  2. Agenda
     • Nature of Indian Language Scripts
     • Convergence and Divergence
     • Fonts and Transliteration Scheme
     • SSML Extensions for Proper Nouns
     07/01/14 Veera Raghavendra and Lavanya Prahallad. II

  3. Nature of Indian Language Scripts
     • Indian language (IL) scripts originated from the ancient Brahmi script.
     • Basic units of the writing system are Aksharas
       • An Akshara is an orthographic representation of a speech sound
       • An Akshara is syllabic in nature
       • A syllable is defined as C*VC*
         • C is a consonant
         • V is a vowel
     • Examples: V, CV, CCV, CVC, CCCV
     • amma:
       • Phone sequence: /a/ /m/ /m/ /aa/
       • Syllables: (/a/) (/m/ /m/ /aa/)
     • Written from left to right
     • Words are separated by spaces, as in European languages
     • Roman digits (0...9) are used as numerals.
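The C*VC* definition and the amma example on this slide can be turned into a small syllabifier. The following is a minimal sketch, not the authors' method: it assumes each syllable closes at its vowel (so consonants attach to the following vowel), that any word-final consonants join the last syllable, and that the vowel inventory `VOWELS` is a small illustrative set chosen here.

```python
# Illustrative vowel inventory (an assumption, not taken from the slides).
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "ei", "o", "oo", "ai", "au"}

def syllabify(phones):
    """Group a phone sequence into C*V syllables; trailing consonants
    are attached to the last syllable as its coda (giving C*VC*)."""
    syllables, current = [], []
    for phone in phones:
        current.append(phone)
        if phone in VOWELS:          # a vowel closes the current syllable
            syllables.append(current)
            current = []
    if current:                      # leftover word-final consonants
        if syllables:
            syllables[-1].extend(current)
        else:
            syllables.append(current)
    return syllables

print(syllabify(["a", "m", "m", "aa"]))
```

For the phone sequence of amma this groups the phones as (/a/) (/m/ /m/ /aa/), matching the slide.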

  4. Convergence and Divergence
     • India is a multilingual nation with 21 recognized official languages and ~1652 dialects.
     • These languages are: Assamese, Tamil, Malayalam, Gujarati, Telugu, Oriya, Urdu, Bengali, Sanskrit, Kashmiri, Sindhi, Punjabi, Konkani, Marathi, Manipuri, Kannada, Bodo, Dogri, Maithili, Santhali and Nepali.
       • Apart from Hindi and English
     • While all of these languages share a common phonetic base, some of them, such as Hindi, Marathi and Nepali, also share a common script known as Devanagari.
     • Languages such as Telugu, Kannada and Tamil have their own scripts.

  5. Fonts and Transliteration Scheme
     • TrueType fonts
       • Use ASCII character codes 1-256 to represent characters
       • Character representation differs from one font to another, even within the same language
       • A separate converter is required for each font
       • Proprietary fonts
     • Unicode
       • A universal character set
       • Provides a unique number for each character in a language
       • Supports all platforms
       • Supports all the languages

  6. Transliteration (OM / IT3)
     • Developed by IISc Bangalore and Carnegie Mellon
     • Developed with user readability in mind: easier to read and type
     • It is case-insensitive.
     • A single transliteration scheme is used for all the Indian languages, as they share the same set of sounds.
     • Each character (corresponding to a phone/sound) is no more than three letters long.
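Two properties stated on this slide, case-insensitivity and a three-letter cap per character, are enough to split a transliterated word into units with a greedy longest-match scan. The sketch below assumes a tiny unit table `UNITS` invented for illustration; it is not the actual OM / IT3 symbol table, and the function name `tokenize` is hypothetical.

```python
# Tiny illustrative unit table (an assumption, NOT the real OM / IT3 table).
UNITS = {"a", "aa", "i", "ii", "u", "uu", "k", "kh", "g", "gh", "m", "n", "r", "s"}

def tokenize(word, units=UNITS, max_len=3):
    """Split a transliterated word into units by greedy longest match.
    The input is lowercased first because the scheme is case-insensitive;
    no unit is longer than max_len letters."""
    word = word.lower()
    tokens, i = [], 0
    while i < len(word):
        # Try the longest candidate first, then shorter ones.
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in units:
                tokens.append(chunk)
                i += size
                break
        else:
            raise ValueError(f"no unit matches at position {i} in {word!r}")
    return tokens

print(tokenize("AMMAA"))
print(tokenize("ghara"))
```

Longest match matters here because a long vowel such as "aa" would otherwise be read as two short "a" units.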

  7. Reference:
     http://speech.iiit.ac.in/Transliteration/
     http://www.cs.cmu.edu/~madhavi/Om/
     [Transliteration charts: Hindi, Telugu]

  8. Particles
     • Hindi and some other Indian languages have a practice of adding a particle such as 'ji' or 'saaheba' after proper nouns.
     • They are added when the speaker wants to show respect to the person being referred to.
     • Examples:
       • Huma maasat'arajii sei milnei gayei (We went to meet the teacher)
       • Aaja pitaajii ghara para rahein'gei (Father will be at home today)

  9. Example of Particle
     <?xml version="1.0"?>
     <speak version="1.0" xml:lang="hin-in" xml:type="IT3">
       <voice gender="female">
         Huma <particle type="ji">maastaar</particle> sei milnei gayei
       </voice>
     </speak>
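A synthesizer front end would first have to locate the tagged nouns. The sketch below uses Python's standard `xml.etree.ElementTree` to pull out each `<particle>` element with its `type` attribute. The helper name `find_particles` is hypothetical, and what the synthesizer then does with each pair is not specified by the slides.

```python
import xml.etree.ElementTree as ET

# The markup from the slide, written with straight quotes so it is well-formed.
SSML = (
    '<speak version="1.0" xml:lang="hin-in" xml:type="IT3">'
    '<voice gender="female">Huma '
    '<particle type="ji">maastaar</particle>'
    ' sei milnei gayei</voice></speak>'
)

def find_particles(ssml_text):
    """Return (noun, particle_type) pairs for every <particle> element."""
    root = ET.fromstring(ssml_text)
    return [
        ((element.text or "").strip(), element.get("type"))
        for element in root.iter("particle")
    ]

print(find_particles(SSML))
```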

  10. Use of Loanword
     • A loanword (or loan word) is a word taken directly into one language from another with little or no translation.
     • Informal experiments suggested that 33% of TTS errors for ILs occur while rendering loan words
     • Such loan words could be detected automatically, owing to the syllabic properties of the Indian languages

  11. Example of Loanword
     • CANCER has to be pronounced as /C/ /AE/ /N/ /S/ /A/ /R/
     • The /AE/ phoneme does not exist in the Indian language phone set
     • <loan>kaansar</loan>
     • Loan (non-native) words could be rendered using different pronunciation dictionaries or letter-to-sound rules
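One way to read the last bullet is as a lookup order: words inside `<loan>` go to a separate loan-word dictionary, and everything else goes through the native letter-to-sound rules. The sketch below illustrates that routing only. `LOAN_LEXICON`, `pronounce`, and the default `native_lts=list` (one phone per letter) are all hypothetical stand-ins, and the two-word input is a test string rather than a real sentence.

```python
import re

LOAN_TAG = re.compile(r"<loan>\s*(.*?)\s*</loan>")

# Hypothetical loan-word dictionary; the phone sequence is the one on the slide.
LOAN_LEXICON = {"kaansar": ["C", "AE", "N", "S", "A", "R"]}

def pronounce(text, loan_lexicon, native_lts=list):
    """Return (word, phones) pairs. Words tagged <loan> are looked up in
    loan_lexicon; all other words (and loan words missing from the lexicon)
    go through native_lts, here a toy one-phone-per-letter placeholder."""
    result, pos = [], 0
    for match in LOAN_TAG.finditer(text):
        for word in text[pos:match.start()].split():
            result.append((word, native_lts(word)))
        word = match.group(1)
        phones = loan_lexicon[word] if word in loan_lexicon else native_lts(word)
        result.append((word, phones))
        pos = match.end()
    for word in text[pos:].split():
        result.append((word, native_lts(word)))
    return result

print(pronounce("ghara <loan> kaansar </loan>", LOAN_LEXICON))
```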

  12. Use of Mention
     • What is a mention?
       • I mention: refers to the first occurrence of a noun
       • II mention: refers to the second occurrence of a noun
     • More emphasis is placed on the first occurrence of a proper noun in a sentence or paragraph
     • The <mention> tag should be used to mark such repeated words when synthesizing speech

  13. Duration Prediction Using Mention Information
     • Duration modeling using mention information of US English

                          RMSE     Correlation
       Without MENTION    0.876    0.4580
       With MENTION       0.869    0.497
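For reference, the two columns reported above are standard measures of how well predicted durations match observed ones. The sketch below computes root-mean-square error and Pearson correlation on made-up numbers; the duration values are invented for illustration and are not the authors' data, and the slides do not say which tools produced the reported figures.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

def correlation(predicted, actual):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)

# Made-up durations (NOT from the slides), just to exercise the functions.
observed = [1.0, 2.0, 3.0]
predicted = [2.0, 3.0, 4.0]
print(rmse(predicted, observed), correlation(predicted, observed))
```

A lower RMSE and a higher correlation both indicate a better fit, which is the direction of the change reported when MENTION is added.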

  14. Example of Mention
     <?xml version="1.0"?>
     <speak version="1.0">
       <voice gender="female">
         <mention occ="1">Gandhi</mention> was a major political and spiritual leader of the Indian Independence Movement.
         <mention occ="2">Gandhi</mention> was the pioneer of satyagraha
       </voice>
     </speak>
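Markup like this could be generated automatically by counting occurrences of each known proper noun. The sketch below is one possible way to do that, assuming the list of proper nouns is supplied by the caller; the function name `tag_mentions` is hypothetical and the slides do not describe how the tags are produced.

```python
import re
from collections import Counter

def tag_mentions(text, proper_nouns):
    """Wrap each occurrence of a listed proper noun in a <mention> tag whose
    occ attribute is the running occurrence count for that noun."""
    if not proper_nouns:
        return text
    counts = Counter()
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(noun) for noun in proper_nouns) + r")\b"
    )

    def replace(match):
        noun = match.group(1)
        counts[noun] += 1
        return f'<mention occ="{counts[noun]}">{noun}</mention>'

    return pattern.sub(replace, text)

print(tag_mentions(
    "Gandhi was a leader. Gandhi was the pioneer of satyagraha.",
    ["Gandhi"],
))
```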

  15. Conclusion
     • Issues in Indian scripts are discussed
     • Discussed the usage of the <particle>, <loan> and <mention> extensions for SSML
