Spoken Language Structure
Hsin-min Wang
References:
- X. Huang et al., Spoken Language Processing, Chapter 2
Human Speech Communication
▪ Spoken language is used to communicate information from a speaker to a listener. Speech production and perception are both important components of the speech chain
  – Speech begins with a thought and an intent to communicate in the brain, which activates muscular movements to produce speech sounds
  – A listener receives the sound in the auditory system, which processes it for conversion into neurological signals the brain can understand
  – The speaker continuously monitors and controls the vocal organs by receiving his or her own speech as feedback
Components of Human Speech Communication
[Figure: block diagram of the speech chain. Speech generation: Message Formulation → Language System → Neuromuscular Mapping → Vocal Tract System. Speech understanding: Cochlea Motion → Neural Transduction → Language System → Message Comprehension. Side labels: Application Semantics, Actions; Phone, Word, Prosody; Feature Extraction; Articulatory Parameter; Speech Analysis; Speech Generation.]
Speech Generation
▪ Message Formulation: creates the concept (message) to be expressed
▪ Language System: converts the message into a sequence of words
  – the pronunciation of the words (i.e., the phoneme sequence)
  – the prosodic pattern: duration of each phoneme, intonation of the sentence, and loudness of the sounds
▪ Neuromuscular Mapping: performs articulatory mapping to control the vocal cords, lips, jaw, tongue and velum to produce the sound sequence
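Speech Generation - Explanations
▪ First, the speaker organizes his or her thoughts and decides on the content of the message to be conveyed (Message Formulation)
▪ The message is put into an appropriate linguistic form: suitable words are chosen and combined into phrases and sentences according to the rules of the language, so as to express the intended message (Language System)
▪ Neural impulses travel along the motor nerves to the muscles of the vocal cords, tongue, lips and other organs, driving those muscles to move (Neuromuscular Mapping)
▪ The resulting changes in air pressure are shaped by the vocal tract, producing the speech sound wave (Vocal Tract System)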
Speech Understanding
▪ Cochlea Motion: the signal is passed to the cochlea in the inner ear, which performs frequency analysis as a filter bank
▪ Neural Transduction: converts the spectral signal into activity signals on the auditory nerve, corresponding roughly to a feature extraction component
It is unclear how neural activity is mapped into the language system and how message comprehension is achieved in the brain
From Sound to Phonetics and Phonology
▪ Speech signals are composed of analog sound patterns that serve as the basis for a discrete, symbolic representation of the spoken language: phonemes, syllables and words
▪ The production and interpretation of these sounds are governed by the syntax and semantics of the language spoken
▪ We will take a bottom-up approach to introduce the basic concepts from sound to phonetics and phonology
  – Syllables and words are followed by syntax and semantics, which form the structure of spoken language processing
▪ Contents of this part:
  – Sound and Human speech systems (Speech Production and Perception)
  – Phonetics and Phonology
  – Characteristics of the Chinese Language
Sound
▪ Sound is a longitudinal pressure wave formed of compressions and rarefactions of air molecules, in a direction parallel to that of the application of energy
▪ Compressions are zones where air molecules have been forced by the application of energy into a tighter-than-usual configuration
▪ Rarefactions are zones where air molecules are less tightly packed
Sound (cont.)
▪ The alternating configurations of compression and rarefaction of air molecules along the path of an energy source are sometimes described by the graph of a sine wave
▪ The use of the sine graph is only a notational convenience for charting local pressure variations over time
[Figure: sine wave of local pressure over time; the crest marks maximal compression and the trough marks maximal rarefaction]
Measures of Sound
▪ Amplitude is related to the degree of displacement of the molecules from their resting position
  – Measured on a logarithmic scale in decibels (dB)
  – A decibel scale is a means for comparing the intensity of two sounds: $10 \log_{10}(I / I_0)$, where $I$ and $I_0$ are the two intensity levels
  – The intensity is proportional to the square of the sound pressure $P$. The Sound Pressure Level (SPL) is a measure of the absolute sound pressure $P$ in dB: $\mathrm{SPL}(\mathrm{dB}) = 20 \log_{10}(P / P_0)$
  – The reference 0 dB corresponds to the threshold of hearing, which is $P_0 = 0.0002\ \mu\text{bar}$ for a tone of 1 kHz
    • e.g. conversational speech at 3 feet is about 60 dB SPL; a jackhammer is about 120 dB SPL
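The two decibel formulas on this slide can be checked numerically. The following is a minimal sketch, not part of the original slides; the function names are hypothetical, and the reference pressure is converted to pascals on the assumption that 1 μbar = 0.1 Pa, so 0.0002 μbar = 2e-5 Pa.

```python
import math

# Reference pressure from the slide: 0.0002 microbar.
# Assumption for this sketch: expressed in pascals (1 microbar = 0.1 Pa).
P0_PASCAL = 2e-5


def intensity_ratio_db(i, i0):
    """Compare two intensities on the decibel scale: 10 * log10(I / I0)."""
    return 10.0 * math.log10(i / i0)


def spl_db(p, p0=P0_PASCAL):
    """Sound pressure level in dB: 20 * log10(P / P0)."""
    return 20.0 * math.log10(p / p0)


def pressure_from_spl(spl, p0=P0_PASCAL):
    """Invert the SPL formula to recover the sound pressure."""
    return p0 * 10.0 ** (spl / 20.0)
```

Because intensity is proportional to the square of pressure, the two formulas agree: the 60 dB gap between conversation (about 60 dB SPL) and a jackhammer (about 120 dB SPL) corresponds to a factor of 1000 in pressure and a factor of 10^6 in intensity.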
Measures of Sound (cont.)
▪ Absolute threshold of hearing: the maximum amount of energy of a pure tone that cannot be detected by a listener in a noise-free environment (equivalently, the level at which the tone just becomes audible)
[Figure annotations: sonic boom; gun muzzle]
Speech Production – Articulation
▪ Speech is produced by air-pressure waves emanating from the mouth and the nostrils of a speaker
▪ The inventory of phonemes, the basic units of speech, can be split into two classes
  – Consonants: articulated in the presence of constrictions in the throat or obstructions in the mouth (tongue, teeth, lips) as we speak
  – Vowels: articulated without major constrictions and obstructions
Speech Production – Articulation (cont.)
▪ Lungs: source of air during speech
▪ Vocal cords (larynx): when the vocal folds are held close together and oscillate against one another during a speech sound, the sound is voiced; when the folds are too slack or too tense to vibrate periodically, the sound is unvoiced
▪ Soft palate (velum): opens to allow passage of air through the nasal cavity (m, n)
▪ Hard palate: the tongue is placed against it to produce certain consonants
▪ Teeth: brace the tongue for certain consonants
▪ Lips: rounded or spread to affect vowel quality (rounded: /u/; spread: /i/), and closed completely to stop the oral air flow for certain consonants (p, b, m)
Speech Production - The Voicing Mechanism
▪ Voiced sounds
  – Including vowels, have a roughly regular pattern in both time and frequency structures that voiceless sounds lack
  – Have more energy
  – When the vocal folds vibrate during phoneme articulation, the phoneme is considered voiced; otherwise it is unvoiced
    • The vocal folds vibrate at rates from about 60 Hz (man) to about 300 Hz (woman or child)
  – The distinct vowel timbres (timbre: the characteristic quality of a sound, as of a musical instrument) are created by using the tongue and lips to shape the main oral resonance cavity in different ways
Speech Production - The Voicing Mechanism (cont.)
▪ Voiced sounds (cont.)
  – The rate of cycling (opening and closing) of the vocal folds in the larynx during phonation of voiced sounds is called the fundamental frequency
    • The fundamental frequency contributes more than any other single factor to the perception of pitch in speech
    • It is used in tone recognition for tonal languages (e.g. Chinese) or as a measure of speaker identity or authenticity
[Figure: voiced speech waveform with a pitch period of about 8 ms, i.e. a fundamental frequency of roughly 120 Hz]
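The relation between pitch period and fundamental frequency (an 8 ms period gives 1 / 0.008 s = 125 Hz) can be illustrated with a small estimator. This is a minimal sketch under stated assumptions: the slides do not name an estimation method, so autocorrelation is used here as one common choice, the function name `estimate_f0` is hypothetical, and the input is a synthetic sine tone rather than real speech. The search range of 60 to 300 Hz is taken from the vocal fold vibration range quoted earlier.

```python
import math


def estimate_f0(samples, sample_rate, f_min=60.0, f_max=300.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak.

    Only lags whose period lies between 1/f_max and 1/f_min are searched.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    if len(samples) <= max_lag:
        raise ValueError("need more samples than the longest candidate period")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic test signal (an assumption, not real speech): a 40 ms sine tone
# with an 8 ms period, sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 125.0 * n / SAMPLE_RATE) for n in range(320)]
f0 = estimate_f0(tone, SAMPLE_RATE)
```

For this tone the strongest autocorrelation peak falls at a lag of 64 samples (8 ms), so the estimate is about 125 Hz. Real speech is only quasi-periodic, so practical pitch trackers add normalization and smoothing on top of this idea.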
Speech Production - Pitch
[Figure omitted. Glossed terms: subtle; psychoacoustician]
Speech Production - Spectrogram
▪ Spectral analysis at a single time point: a short-term frequency analysis
▪ The darkness or lightness of a band indicates the relative amplitude or energy present at a given frequency
▪ The dark horizontal bands show the formants, which are harmonics of the fundamental at the natural resonances of the vocal tract cavity position
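A spectrogram is a sequence of such short-term frequency analyses, one per frame. The sketch below is an illustration only, not the method used to produce the slide's figure: the function name `stft_magnitudes`, the frame length, the hop size and the Hamming window are all assumptions, and a naive DFT is used to keep the example dependency-free.

```python
import cmath
import math


def stft_magnitudes(signal, frame_len=64, hop=32):
    """Return one magnitude spectrum per frame (rows: time, columns: frequency)."""
    # Hamming window to reduce spectral leakage at the frame edges.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):  # bins from 0 Hz up to Nyquist
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Synthetic input (an assumption): a 1 kHz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000.0 * n / SAMPLE_RATE) for n in range(256)]
spectrogram = stft_magnitudes(tone)
peak_bins = [max(range(len(m)), key=m.__getitem__) for m in spectrogram]
```

Each row of `spectrogram` corresponds to one vertical slice of the image, and larger magnitudes would be drawn darker. Bin `k` corresponds to `k * SAMPLE_RATE / frame_len` Hz, so the 1 kHz tone shows up as a constant band at bin 8 in every frame.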
Speech Production - Formants
▪ The resonances of the cavities that are typical of particular articulator configurations (e.g. the different vowel timbres) are called formants
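Explanations for Speech Production
The human speech organs can be divided into three main parts
▪ Power source: the respiratory system and its muscles
  – Well-coordinated movement of the respiratory muscles pushes air out of the chest cavity at a steady pressure and through the larynx, providing the driving force for phonation.
▪ Phonation organ: the vocal folds in the larynx
  – The intrinsic laryngeal muscles and other surrounding muscles act together on the vocal folds to form a particular glottal configuration. When the airflow from the power source passes through the glottis, it sets the soft mucosa of the vocal folds into wave-like motion. Depending on the glottal configuration, the mucosa vibrates with a particular frequency and pattern, regularly interrupting the airflow and producing a wave of compressions and rarefactions in the air.
  – The vocal folds are the vibrating body itself and provide the main sound source for speech. Their vibration produces a train of impulses, a periodic wave whose spectrum contains many harmonics at integer multiples of the fundamental frequency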
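The final point, that a periodic pulse train has a spectrum made of harmonics at integer multiples of the fundamental frequency, can be verified numerically. This is a minimal sketch under stated assumptions: the glottal source is idealized as a train of unit impulses (real glottal pulses are smoother), and the helper name `dft_magnitudes`, the 8 kHz sample rate and the 64-sample period are illustrative choices.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of a naive DFT for bins 0 .. len(x) // 2."""
    total = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / total)
                    for n in range(total)))
            for k in range(total // 2 + 1)]


SAMPLE_RATE = 8000
PERIOD = 64      # samples per pulse: 8 ms, i.e. a 125 Hz fundamental
N_SAMPLES = 256  # four full periods

# Idealized glottal source: one unit impulse at the start of each period.
impulse_train = [1.0 if n % PERIOD == 0 else 0.0 for n in range(N_SAMPLES)]
magnitudes = dft_magnitudes(impulse_train)

# Frequencies (Hz) of the bins that carry energy, skipping the 0 Hz bin.
harmonic_freqs = [k * SAMPLE_RATE / N_SAMPLES
                  for k, m in enumerate(magnitudes)
                  if k > 0 and m > 1e-6]
```

All of the energy lands on 125 Hz, 250 Hz, 375 Hz and so on, and the bins in between are empty up to rounding error. In real speech the vocal tract resonances then emphasize some of these harmonics over others.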