Spoken Language Structure
Hsin-min Wang
References:
- X. Huang et al., Spoken Language Processing, Chapter 2

Human Speech Communication
• Spoken language is used to communicate information from a speaker to a listener. Speech production and perception (知覺) are both important components of the speech chain
  – Speech begins with a thought and an intent to communicate in the brain, which activates muscular (肌肉的) movements to produce speech sounds
  – The listener receives the sound in the auditory system (聽覺系統), which processes it for conversion into neurological signals (神經邏輯信號) that the brain can understand
  – The speaker continuously monitors and controls the vocal organs (發聲器官) by receiving his or her own speech as feedback

Components of Human Speech Communication
[Figure: block diagram of the speech chain. Speech generation: Message Formulation → Language System → Neuromuscular Mapping → Vocal Tract System. Speech understanding: Cochlea (耳蝸) Motion → Neural Transduction → Language System → Message Comprehension. Side labels: application semantics and actions; phone, word, and prosody; feature extraction; articulatory parameters.]

Speech Generation
• Message Formulation: creates the concept (message) to be expressed
• Language System: converts the message into a sequence of words
  – the pronunciation of the words (i.e., the phoneme sequence)
  – the prosodic pattern: the duration of each phoneme, the intonation (語調) of the sentence, and the loudness of the sounds
• Neuromuscular (神經肌肉) Mapping: performs articulatory (發聲的) mapping to control the vocal cords (聲帶), lips (唇), jaw (顎), tongue (舌), and velum (軟顎) to produce the sound sequence

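Speech Generation - Explanations
• First, the speaker organizes his or her thoughts and decides on the content of the message to be spoken (Message Formulation)
• The message is turned into an appropriate linguistic form: suitable vocabulary is chosen and, following the rules of the language, words and sentences are composed to express the intended message (wording and phrasing) (Language System)
• In the form of physiological nerve impulses, the commands travel along the motor nerves to the muscles of the vocal cords, tongue, lips, and other organs, and drive those muscles to move (Neuromuscular Mapping)
• The air undergoes pressure changes which, shaped by the vocal tract, produce the ordinary speech sound wave (Vocal Tract System)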

Speech Understanding
• Cochlea (耳蝸) Motion: the signal is passed to the cochlea in the inner ear, which performs frequency analysis as a filter bank
• Neural Transduction (神經傳導): converts the spectral signal into activity signals on the auditory nerve, corresponding roughly to a feature extraction component
• It is unclear how neural activity is mapped into the language system and how message comprehension (理解) is achieved in the brain

From Sound to Phonetics and Phonology
• Speech signals are composed of analog sound patterns that serve as the basis for a discrete, symbolic representation of the spoken language: phonemes (音素), syllables (音節), and words (詞)
• The production and interpretation of these sounds are governed by the syntax (語法) and semantics (語意) of the language spoken
• We will take a bottom-up approach to introduce the basic concepts, from sound to phonetics (語音學) and phonology (音韻學)
  – Syllables and words are followed by syntax and semantics, which form the structure of spoken language processing
• Contents of this part:
  – Sound and human speech systems (speech production and perception)
  – Phonetics and phonology
  – Characteristics of the Chinese language

Sound
• Sound is a longitudinal (縱向的) pressure wave formed of compressions (壓縮) and rarefactions (稀疏) of air molecules (微粒), in a direction parallel to that of the application of energy
• Compressions are zones where air molecules have been forced by the application of energy into a tighter-than-usual configuration
• Rarefactions are zones where air molecules are less tightly packed

Sound (cont.)
• The alternating configurations of compression and rarefaction of air molecules along the path of an energy source are sometimes described by the graph of a sine wave
• The use of the sine graph is only a notational convenience for charting local pressure variations over time
[Figure: sine wave of local air pressure over time, with the crest labeled as maximal compression and the trough as maximal rarefaction]

Measures of Sound
• Amplitude is related to the degree of displacement of the molecules from their resting position
  – Measured on a logarithmic scale in decibels (dB, 分貝)
  – A decibel scale is a means for comparing the intensity (強度) of two sounds: $10 \log_{10}(I / I_0)$, where $I$ and $I_0$ are the two intensity levels
  – The intensity is proportional to the square of the sound pressure $P$. The Sound Pressure Level (SPL) is a measure of the absolute sound pressure $P$ in dB: $\mathrm{SPL}\,(\mathrm{dB}) = 20 \log_{10}(P / P_0)$
  – The reference 0 dB corresponds to the threshold of hearing, which is $P_0 = 0.0002\ \mu\mathrm{bar}$ for a tone of 1 kHz
    • e.g., conversational speech at 3 feet is about 60 dB SPL; a jackhammer (手提鑿岩機) is about 120 dB SPL
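The two decibel formulas on this slide can be checked numerically. The following is a minimal sketch, assuming the reference pressure 0.0002 μbar is expressed in pascals (1 μbar = 0.1 Pa, so P0 = 2e-5 Pa). The function names are illustrative and do not come from the slides.

```python
import math

# Threshold-of-hearing reference: 0.0002 microbar, converted with 1 microbar = 0.1 Pa.
P0_PASCAL = 2e-5


def intensity_ratio_db(i, i0):
    """Level difference in dB between two intensities: 10 log10(I / I0)."""
    return 10.0 * math.log10(i / i0)


def spl_db(p, p0=P0_PASCAL):
    """Sound pressure level in dB: 20 log10(P / P0)."""
    return 20.0 * math.log10(p / p0)


def pressure_from_spl(level_db, p0=P0_PASCAL):
    """Invert the SPL formula to recover the sound pressure in pascals."""
    return p0 * 10.0 ** (level_db / 20.0)


if __name__ == "__main__":
    # Levels quoted on the slide: conversation (60 dB SPL) and a jackhammer (120 dB SPL).
    for name, level in [("conversation", 60.0), ("jackhammer", 120.0)]:
        print(name, level, "dB SPL ->", pressure_from_spl(level), "Pa")
```

Because intensity is proportional to the square of pressure, doubling the pressure and quadrupling the intensity give the same level change (about 6 dB), which is why the pressure form uses a factor of 20 rather than 10.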

Measures of Sound (cont.)
• Absolute threshold of hearing: the maximum amount of energy of a pure tone that cannot be detected by a listener in a noise-free environment
[Figure annotations: sonic boom (音爆), gun muzzle (砲口)]

Speech Production – Articulation
• Speech is produced by air-pressure waves emanating (發出) from the mouth and the nostrils (鼻孔) of a speaker
• The inventory (清單) of phonemes (音素), the basic units of speech, can be split into two classes
  – Consonants (子音): articulated (發音) in the presence of constrictions (壓縮) in the throat or obstructions (阻礙) in the mouth (tongue, teeth, lips) as we speak
  – Vowels (母音/元音): articulated without major constrictions and obstructions

Speech Production – Articulation (cont.)
• Lungs (肺): the source of air during speech
• Vocal cords (larynx, 喉頭): when the vocal folds (聲帶) are held close together and oscillate against one another during a speech sound, the sound is voiced; when the folds are too slack or too tense to vibrate periodically, the sound is unvoiced
• Soft palate (velum, 軟顎): allows passage of air through the nasal cavity (m, n)
• Hard palate (硬顎): the tongue is placed on it to produce certain consonants
• Teeth: brace (支撐) the tongue for certain consonants
• Lips: rounded or spread to affect vowel quality (rounded: /u/; spread: /i/); closed completely to stop the oral air flow for certain consonants (p, b, m)

Speech Production - The Voicing Mechanism
• Voiced sounds
  – Voiced sounds, including vowels, have a roughly regular pattern in both time and frequency structure that voiceless sounds lack
  – They have more energy
  – When the vocal folds vibrate during phoneme articulation, the phoneme is considered voiced; otherwise it is unvoiced
    • Vocal fold vibration ranges from about 60 Hz (man) to 300 Hz (woman or child)
  – The distinct vowel timbres (the quality of a sound, as of an instrument; 音質/音色) are created by using the tongue and lips to shape the main oral resonance cavity in different ways

Speech Production - The Voicing Mechanism (cont.)
• Voiced sounds (cont.)
  – The rate of cycling (opening and closing) of the vocal folds in the larynx during phonation of voiced sounds is called the fundamental frequency (基頻)
    • The fundamental frequency contributes more than any other single factor to the perception of pitch in speech
    • It is used in tone recognition for tonal languages (e.g., Chinese) or as a measure of speaker identity or authenticity
[Figure annotation: a pitch period of about 8 ms corresponds to a fundamental frequency of roughly 120 Hz (1 / 8 ms = 125 Hz)]
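To make the link between the pitch period and the fundamental frequency concrete, here is a rough sketch of one common way to estimate it: pick the lag at which a short frame best matches a shifted copy of itself (autocorrelation). This method is not taken from the slides; it is shown only as an illustration. The test signal, the 8 kHz sampling rate, and all function names are assumptions, and the 60 to 300 Hz search range reuses the vocal fold vibration range quoted earlier in the deck.

```python
import math


def synth_voiced(f0, fs, n_samples):
    """Crude voiced-like test signal: a fundamental plus two weaker harmonics."""
    return [
        math.sin(2 * math.pi * f0 * n / fs)
        + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / fs)
        + 0.25 * math.sin(2 * math.pi * 3 * f0 * n / fs)
        for n in range(n_samples)
    ]


def estimate_f0(signal, fs, f_min=60.0, f_max=300.0):
    """Return fs / lag for the lag with the largest autocorrelation in range."""
    min_lag = int(fs / f_max)  # shortest period considered
    max_lag = int(fs / f_min)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return fs / best_lag


if __name__ == "__main__":
    fs = 8000                                # assumed sampling rate in Hz
    frame = synth_voiced(125.0, fs, 400)     # 50 ms frame with an 8 ms period
    print("estimated F0 (Hz):", estimate_f0(frame, fs))
```

Real speech is only quasi-periodic, so practical pitch trackers add normalization, voicing decisions, and smoothing across frames; none of that is attempted here.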

Speech Production - Pitch
[Figure not recovered; surviving annotations: subtle (細微的), psychoacoustician (心理聲學家)]

Speech Production - Spectrogram
• A spectrogram is a short-term frequency analysis; each vertical slice is a spectral analysis at a single time-point
• The darkness or lightness of a band indicates the relative amplitude or energy present at a given frequency
• The dark horizontal bands show the formants, which are harmonics of the fundamental at the natural resonances of the vocal tract cavity position
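The idea of a short-term frequency analysis can be sketched in a few lines: cut the signal into overlapping frames, window each frame, and take the magnitude of its discrete Fourier transform. The sketch below uses a naive DFT from the standard library so it stays self-contained; the frame length, hop size, Hamming window, and function names are all illustrative assumptions rather than settings from the slides.

```python
import cmath
import math


def frame_spectrum(frame):
    """Magnitude spectrum of one Hamming-windowed frame (bins 0 .. N/2)."""
    n = len(frame)
    windowed = [
        x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    return [
        abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(signal, frame_len=256, hop=128):
    """List of frame spectra; entry [i][k] is the magnitude at frame i, bin k."""
    return [
        frame_spectrum(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(1024)]
    spec = spectrogram(tone)
    peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
    # Bin k corresponds to the frequency k * fs / frame_len.
    print("frames:", len(spec), "peak frequency (Hz):", peak_bin * fs / 256)
```

Plotting each frame's magnitudes as a column of gray levels, with larger values darker, gives the familiar picture in which strong frequency bands appear as dark horizontal stripes.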

Speech Production - Formants
• The resonances (共振/共鳴) of the cavities that are typical of particular articulator configurations (e.g., the different vowel timbres) are called formants (共振峰)

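Explanations for Speech Production
The human speech organs can be divided into three major parts
• Power organs: the respiratory system and its muscles
  – The well-coordinated movement of the respiratory muscles pushes the air in the chest cavity out through the larynx at a steady pressure, providing the power source for phonation.
• Phonation organs: the vocal folds in the larynx
  – Through the joint action of the intrinsic laryngeal muscles and the other surrounding muscles, the vocal folds form a specific glottal configuration. When the airflow from the power source passes through the glottis, it sets the soft vocal fold mucosa into wave motion. Depending on the glottal configuration at that moment, the mucosa produces waves of a specific frequency and shape, so that the passing airflow is interrupted at regular intervals, producing compression and rarefaction waves in the air.
  – The vocal folds are the sounding body itself and provide the main sound source for speech. The series of impulses produced by vocal fold vibration is a periodic wave whose spectrum contains a large number of harmonic components, at frequencies that are integer multiples of the fundamental frequency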
