Language and Computers - PowerPoint PPT Presentation

  1. Language and Computers
     Prologue: Encoding Language
     L245 (Based on Dickinson, Brew, & Meurers (2013))
     Indiana University, Spring 2016

  2. Language and Computers
     Computers have a variety of applications involving language:
     ◮ textual searching
     ◮ grammar correction
     ◮ automatic translation
     ◮ question answering
     ◮ plagiarism detection
     ◮ ...

  3. Language and Computers – where to start?
     ◮ If we want to do anything with language, we need a way to represent language.
     ◮ We can interact with the computer in several ways:
       ◮ write or read text
       ◮ speak or listen to speech
     ◮ The computer therefore has to have some way to represent
       ◮ text
       ◮ speech

  4. Outline
     ◮ Writing systems
       ◮ Alphabetic
       ◮ Syllabic
       ◮ Logographic
       ◮ Systems with unusual realization
       ◮ Relation to language
     ◮ Encoding written language
       ◮ ASCII
       ◮ Unicode
     ◮ Spoken language
       ◮ Transcription
       ◮ Why speech is hard to represent
       ◮ Articulation
       ◮ Measuring sound
       ◮ Acoustics
     ◮ Relating written and spoken language
       ◮ From Speech to Text
       ◮ From Text to Speech
     ◮ Language modeling

  5. Writing systems used for human languages
     What is writing?
       “a system of more or less permanent marks used to represent an utterance in such a way that it can be recovered more or less exactly without the intervention of the utterer.”
       (Peter T. Daniels, The World’s Writing Systems)
     Different types of writing systems are used:
     ◮ Alphabetic
     ◮ Syllabic
     ◮ Logographic
     Much of the information on writing systems, and the graphics used, is taken from the great site http://www.omniglot.com.

  6. Alphabetic systems
     Alphabets (phonemic alphabets)
     ◮ represent all sounds, i.e., consonants and vowels
     ◮ Examples: Etruscan, Latin, Korean, Cyrillic, Runic, International Phonetic Alphabet
     Abjads (consonant alphabets)
     ◮ represent consonants only (sometimes plus selected vowels; vowel diacritics generally available)
     ◮ Examples: Arabic, Aramaic, Hebrew
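The abjad principle, with vowel diacritics available but optional, can be illustrated with Unicode text. The following is a minimal sketch, assuming Hebrew as the example and assuming that dropping every nonspacing mark (Unicode category Mn) is an acceptable approximation of "unpointed" writing. The helper name strip_points is hypothetical. Note that it also drops the shin dot, which distinguishes two consonants rather than marking a vowel, and that the letter vav survives because here it spells the vowel /o/ (one of the "selected vowels" an abjad may write).

```python
import unicodedata


def strip_points(text: str) -> str:
    """Remove nonspacing marks (Unicode category Mn), such as Hebrew niqqud.

    Simplification: this also removes the shin dot, a consonant distinction.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# "shalom" written with vowel points, spelled out as escapes:
# U+05E9 shin, U+05C1 shin dot, U+05B8 qamats, U+05DC lamed,
# U+05D5 vav, U+05B9 holam, U+05DD final mem
pointed = "\u05e9\u05c1\u05b8\u05dc\u05d5\u05b9\u05dd"
unpointed = strip_points(pointed)

# Only the consonant letters (plus vav used as a vowel letter) remain.
for ch in unpointed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Run as is, this lists four letters: shin, lamed, vav, and final mem. The three diacritics are gone, yet the word remains readable to anyone who knows it, which is how unpointed Hebrew is normally written.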

  7. Alphabet example: Fraser
     An alphabet used to write Lisu, a Tibeto-Burman language spoken by about 657,000 people in Burma, India, Thailand, and the Chinese provinces of Yunnan and Sichuan.
     [Figure: the Fraser alphabet (from: http://www.omniglot.com/writing/fraser.htm)]

  8. Abjad example: Phoenician
     An abjad used to write Phoenician, created between the 18th and 17th centuries BC; it is assumed to be the forerunner of the Greek and Hebrew alphabets.
     [Figure: the Phoenician abjad (from: http://www.omniglot.com/writing/phoenician.htm)]

  9. A note on the letter-sound correspondence
     ◮ Alphabets use letters to encode sounds (consonants, vowels).
     ◮ But in many languages the correspondence between spelling and pronunciation is quite complex, i.e., not a simple one-to-one correspondence.
     ◮ Example: English
       ◮ same spelling – different sounds: ough: ought, cough, tough, through, though, hiccough
       ◮ silent letters: knee, knight, knife, debt, psychology, mortgage
       ◮ one letter – multiple sounds: exit, use
       ◮ multiple letters – one sound: the, revolution
       ◮ alternate spellings: jail or gaol; but seagh is not a possible spelling of chef (despite the s of sure, the ea of dead, and the gh of laugh)
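The "same spelling, different sounds" point can be made concrete by treating spelling-to-sound as a mapping and counting how many values one spelling maps to. This is a minimal sketch, assuming hand-coded broad British (RP) values for the sound that "ough" spells in each word from the slide; pronunciations vary by dialect, and the table and the function name sounds_for_spelling are illustrative, not taken from the slides.

```python
# The sound that the spelling "ough" stands for in each word from the slide.
# Broad British (RP) values, hand-coded for illustration (an assumption:
# other dialects differ).
OUGH_SOUNDS = {
    "ought": "ɔː",     # /ɔːt/
    "cough": "ɒf",     # /kɒf/
    "tough": "ʌf",     # /tʌf/
    "through": "uː",   # /θruː/
    "though": "əʊ",    # /ðəʊ/
    "hiccough": "ʌp",  # /ˈhɪkʌp/
}


def sounds_for_spelling(mapping: dict[str, str]) -> set[str]:
    """Collect the distinct sounds that one spelling maps to."""
    return set(mapping.values())


sounds = sounds_for_spelling(OUGH_SOUNDS)
print(f'"ough" spells {len(sounds)} different sounds in {len(OUGH_SOUNDS)} words')
```

With these values every word gets a different sound, so the script prints that "ough" spells 6 different sounds in 6 words. A one-to-one writing system would collapse this set to a single element.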

  10. More examples of non-transparent letter-sound correspondences
      French
      (1) a. Versailles → [veRsai]
          b. été, étais, était, étaient → [ete]
      Irish
      (2) a. samhradh (summer) → [sauruh]
          b. scríobhaim (I write) → [shgri:m]
      What is the notation used within the [ ]?

  11. The International Phonetic Alphabet (IPA)
      ◮ Several special alphabets for representing sounds have been developed, the best known being the International Phonetic Alphabet (IPA).
      ◮ The phonetic symbols are unambiguous:
        ◮ designed so that each speech sound gets its own symbol,
        ◮ eliminating
          ◮ multiple symbols used to represent a single sound
          ◮ one symbol used for multiple sounds.
      ◮ Interactive example chart: http://web.uvic.ca/ling/resources/ipa/charts/IPAlab/IPAlab.htm
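The "one symbol per sound" design is easy to see when English digraphs are set against IPA transcriptions. This is a minimal sketch, assuming hand-coded broad transcriptions for three example words (they are not from the slides) and using Python's standard unicodedata module to show which Unicode characters the IPA symbols are. The table and the name letters_vs_symbols are hypothetical.

```python
import unicodedata

# Hand-coded broad IPA transcriptions (illustrative assumption).
# Escapes: \u0283 = ʃ (esh), \u026a = ɪ, \u03b8 = θ (theta), \u014b = ŋ (eng)
TRANSCRIPTIONS = {
    "ship": "\u0283\u026ap",  # spelling "sh" -> one IPA symbol
    "thin": "\u03b8\u026an",  # spelling "th" -> one IPA symbol
    "sing": "s\u026a\u014b",  # spelling "ng" -> one IPA symbol
}


def letters_vs_symbols(word: str) -> tuple[int, int]:
    """Return (number of letters, number of IPA symbols) for a word."""
    return len(word), len(TRANSCRIPTIONS[word])


for word, ipa in TRANSCRIPTIONS.items():
    letters, symbols = letters_vs_symbols(word)
    print(f"{word}: {letters} letters -> /{ipa}/ {symbols} symbols")
    for ch in ipa:
        print(f"    {ch}  U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

Each four-letter spelling needs only three IPA symbols, because the two-letter spellings sh, th, and ng each correspond to a single sound and therefore to a single symbol.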

  12. Syllabic systems
      Syllabaries
      ◮ writing systems with separate symbols for each syllable of a language
      ◮ Examples: Cherokee, Ethiopic, Cypriot, Ojibwe, Hiragana (Japanese)
        (cf. also: http://www.omniglot.com/writing/syllabaries.htm)
      Abugidas (alphasyllabaries)
      ◮ writing systems organized into families
      ◮ each basic symbol represents a consonant with a vowel, but the vowel can be changed by adding a diacritic (= a symbol added to the letter).
      ◮ Examples: Balinese, Javanese, Tamil, Thai, Tagalog
        (cf. also: http://www.omniglot.com/writing/syllabic.htm)
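The difference between a syllabary and an abugida shows up directly in how the syllables ka, ki, ku are encoded in Unicode. This is a minimal sketch, assuming Hiragana as the syllabary and Tamil as the abugida (both are listed on the slide); the dictionaries and the function names tamil_ka_syllable and code_points are hypothetical helpers for illustration.

```python
# Syllabary (Hiragana): every syllable has its own, unrelated code point.
HIRAGANA = {"ka": "\u304b", "ki": "\u304d", "ku": "\u304f"}

# Abugida (Tamil): the consonant letter KA carries an inherent vowel /a/;
# other vowels are written by adding a dependent vowel sign after it.
TAMIL_KA = "\u0b95"                               # TAMIL LETTER KA
TAMIL_VOWEL_SIGNS = {"i": "\u0bbf", "u": "\u0bc1"}  # VOWEL SIGN I, VOWEL SIGN U


def tamil_ka_syllable(vowel: str) -> str:
    """Spell ka/ki/ku in Tamil: base consonant plus an optional vowel sign."""
    if vowel == "a":
        return TAMIL_KA  # inherent vowel: no sign needed
    return TAMIL_KA + TAMIL_VOWEL_SIGNS[vowel]


def code_points(text: str) -> str:
    """Format each character of text as U+XXXX."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for syllable in ("ka", "ki", "ku"):
    hira = HIRAGANA[syllable]
    tamil = tamil_ka_syllable(syllable[1])
    print(f"{syllable}: Hiragana {code_points(hira)} | Tamil {code_points(tamil)}")
```

In the Hiragana column the three syllables share nothing, whereas in the Tamil column every syllable starts with the same consonant U+0B95 and only the added vowel sign changes, which is exactly the "consonant plus modifiable vowel" behavior described for abugidas.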
