Sound Media Engineering, part II
Speech Information Processing
Akinori Ito
Graduate School of Engineering, Tohoku Univ.
aito@fw.ipsj.or.jp
Overview of the lecture
● #1: Production and coding of speech (1)
– Speech production, features of speech sound
– Basic codecs: PCM, DPCM, ADPCM
● #2: Coding of speech (2)
– Linear prediction of speech: linear prediction coefficients, PARCOR coefficients and LSP
– CELP coding
– Audio coding
● #3: Speech enhancement
– Spectral subtraction
– Microphone array
Production of speech
● Organs that produce speech
– vocal cords
– larynx
– pharynx
– tongue
– gums
– teeth
– lips
– nasal cavity
[Figure label beside the list: vocal tract]
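Acoustic tube model
● Human speech production is similar to sound production in a wind instrument
[Figure: acoustic tube model of the speech organs. Part labels: nasal cavity, larynx, lips, vocal tract, vocal cords. Annotations: linguistic content, pitch of voice, personality]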
Linguistic and speaker features
[Figure: acoustic tube model. Part labels: nasal cavity, larynx, lips, vocal tract, vocal cords]
● A speaker can control the shape of this part
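Linguistic and speaker features
[Figure: acoustic tube model. Part labels: nasal cavity, larynx, lips, vocal tract, vocal cords]
● A speaker cannot control the shape of this part, nor the total length of the vocal tract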
Speech waveform
● Speech waveforms are complex
[Figure: waveforms of the vowels /a/, /i/, /u/, /e/, /o/]
Speech waveform
● The waveform is complex, but almost periodic
– Fundamental period: $T$ [s]
– Fundamental frequency: $F_0 = 1/T$ [Hz]
[Figure: speech waveform with the fundamental period marked]
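The relation $F_0 = 1/T$ can be tried out on a synthetic signal. The sketch below is only an illustration: the slide does not say how $T$ is measured, so picking the lag that maximizes the autocorrelation is an assumption made here, and the function name `estimate_f0`, the search range of 50 to 500 Hz, and the test signal are all hypothetical.

```python
import math

def estimate_f0(x, fs, f_min=50.0, f_max=500.0):
    """Estimate F0 [Hz] as fs / T, where T is the lag (in samples)
    that maximizes the autocorrelation of x.
    The search range f_min..f_max is an assumption of this sketch."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation at this lag (plain sum of products)
        r = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

# Synthetic "almost periodic" signal: 125 Hz fundamental plus two harmonics
fs = 8000
x = [math.sin(2 * math.pi * 125 * n / fs)
     + 0.5 * math.sin(2 * math.pi * 250 * n / fs)
     + 0.3 * math.sin(2 * math.pi * 375 * n / fs)
     for n in range(1024)]
f0 = estimate_f0(x, fs)
print(f0)
```

Real speech is only approximately periodic, so a practical estimator would need more care (windowing, voicing decisions) than this sketch shows.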
Various "a"
● Two /a/'s with different fundamental frequencies
– Same phone = same vocal tract shape
– Completely different waveforms
– What do these waveforms have in common?
Spectrum of speech
● Spectra of the two /a/'s
– The spectral shapes are similar → shape of the vocal tract
– The "jaggies" (fine ripples) of the spectra differ → fundamental frequency
Spectrum and formant frequencies
● $F_0$: fundamental frequency
● $F_1, F_2, \ldots$: formant frequencies
[Figure: speech spectrum with the fundamental frequency $F_0$ and the formant frequencies $F_1$ to $F_4$ marked]
Speech coding
● Sound (analog) → convert to digital data
– Handle it with a computer
– Transmit it over a digital line
● How do we digitize sound?
– Goals
  ● Good quality when converting back to analog sound
  ● Lower bit rate
– Methodology
  ● Exploit various features of speech
Basics of speech coding
● Sampling
– Observe the temporally continuous signal at discrete times
– Rate of the discrete observation: sampling frequency $f_s$ (the interval between observations is $1/f_s$)
– The original signal can be restored from the sampled data when it contains only frequency components below $f_s/2$ (sampling theorem)
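The condition in the sampling theorem can be illustrated by what happens when it is violated. The following sketch, with a hypothetical helper `sample_sine` and example frequencies chosen here, samples a 5 kHz sine at $f_s = 8$ kHz and compares it with a 3 kHz sine: the two sets of samples are identical up to sign, so the 5 kHz component cannot be told apart from a 3 kHz one after sampling.

```python
import math

def sample_sine(freq_hz, fs_hz, n_samples):
    """Sample sin(2*pi*f*t) at the discrete times t = n / fs."""
    return [math.sin(2 * math.pi * freq_hz * n / fs_hz)
            for n in range(n_samples)]

fs = 8000                           # sampling frequency, so fs/2 = 4000 Hz
above = sample_sine(5000, fs, 32)   # component above fs/2
alias = sample_sine(3000, fs, 32)   # component below fs/2

# The 5 kHz samples equal the negated 3 kHz samples (aliasing)
max_diff = max(abs(a + b) for a, b in zip(above, alias))
print(max_diff)
```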
Basics of speech coding
● Quantization
– Round off the magnitude of the signal to discrete levels
  ● The magnitude of the signal can then be represented by integers
– Spacing between the discrete levels: quantization step
– Difference between the original signal and the quantized signal: quantization error
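As a minimal sketch of these terms, the code below rounds each sample to the nearest multiple of a quantization step and measures the quantization error. The function names, the step size of 0.25 and the test signal are assumptions made for illustration. With rounding to the nearest level, the error magnitude stays within half a step.

```python
import math

def quantize(x, step):
    """Return the integer level index of the level nearest to x."""
    return math.floor(x / step + 0.5)

def dequantize(level, step):
    """Convert an integer level index back to a signal value."""
    return level * step

step = 0.25
signal = [0.9 * math.sin(2 * math.pi * n / 20) for n in range(40)]
levels = [quantize(x, step) for x in signal]

# Quantization error: original value minus quantized value
errors = [x - dequantize(q, step) for x, q in zip(signal, levels)]
max_error = max(abs(e) for e in errors)
print(levels[:10], max_error)
```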
Sampling and quantization: how are they determined?
● The sampling frequency is determined by the highest frequency in the sound
– Telephone: 8 kHz (sound up to 4 kHz)
– High-quality speech: 16 kHz (sound up to 8 kHz)
– CD: 44.1 kHz (sound up to 22.05 kHz)
● The quantization is determined by the dynamic range of the sound
– To code speech is to quantize speech
PCM coding
● PCM (Pulse Code Modulation)
– Represent the quantized values as binary numbers
● What must be decided in PCM
– How many bits to use for one sample
– How to determine the quantization levels
  ● Equal steps: linear quantization
  ● Unequal steps: nonlinear quantization
● Examples of PCM coding
– CD: 16-bit linear quantization
– VoIP (G.711): 8-bit nonlinear quantization
PCM linear quantization
● There is nothing difficult here
[Figure: a waveform on a scale from −10 to 10, with each sample rounded to an integer level]
● CD: quantize in 16 bits (−32768 to +32767)
Nonlinear quantization
● Most samples are nearly zero → the total error can be reduced by finely quantizing values around zero
[Figure: two quantization scales from −10 to 10, one with equal steps and one with finer steps around zero]
Example of nonlinear quantization: G.711
● Speech coding for the 64 kbit/s digital phone line
– 8 kHz sampling, 8-bit nonlinear quantization
– μ-law (Japan, US), A-law (Europe)
– μ-law: 14-bit linear quantization → 8-bit nonlinear quantization
$Y = 128\,\mathrm{sign}(X)\,\dfrac{\log\left(1 + 255\,\frac{|X|}{8192}\right)}{\log 256}$
[Figure: 8-bit μ-law output $Y$ (about −128 to 128) plotted against the 14-bit linear input $X$ (about −8000 to 8000)]
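The μ-law curve in the formula can be evaluated directly. The sketch below implements only that continuous curve and its inverse, with function names chosen here; it is not a G.711 codec, since the standard itself uses a piecewise-linear approximation of this curve and a specific 8-bit code layout that are not reproduced.

```python
import math

def mu_law_compress(x):
    """Continuous mu-law curve: 14-bit linear X -> Y in about -128..128."""
    sign = -1.0 if x < 0 else 1.0
    return 128.0 * sign * math.log(1.0 + 255.0 * abs(x) / 8192.0) / math.log(256.0)

def mu_law_expand(y):
    """Inverse of the curve above: Y -> 14-bit linear X."""
    sign = -1.0 if y < 0 else 1.0
    return sign * (8192.0 / 255.0) * (256.0 ** (abs(y) / 128.0) - 1.0)

# The same input change of 100 uses far more of the output range near zero
span_near_zero = mu_law_compress(100) - mu_law_compress(0)
span_near_peak = mu_law_compress(8100) - mu_law_compress(8000)
print(span_near_zero, span_near_peak)
```

Rounding `mu_law_compress(x)` to an integer therefore gives fine steps for small samples and coarse steps for large ones, which is the idea behind the nonlinear quantization described above.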
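Differential PCM (DPCM)
● In an ordinary speech signal, the values of two contiguous samples do not differ very much → reduce the bit rate by transmitting the differences between samples
[Figure: DPCM block diagram with a subtractor, a quantizer $Q$ and a one-sample delay $z^{-1}$]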
Differential PCM (DPCM)
[Figure: two panels on the same axes (amplitude −25000 to 15000, samples 0 to 4500). Top: original waveform. Bottom: differential waveform]
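A minimal DPCM sketch, under assumptions chosen here: the prediction is the previous reconstructed sample, the quantizer uses a fixed step, and the function names and sample values are hypothetical. The encoder tracks the same reconstruction as the decoder, so the quantization error does not accumulate.

```python
import math

def dpcm_encode(samples, step):
    """Transmit the quantized difference from the previous reconstructed sample."""
    codes = []
    prev = 0  # previous reconstructed sample (same value the decoder will hold)
    for x in samples:
        d = x - prev                      # difference signal
        code = math.floor(d / step + 0.5) # quantize the difference
        codes.append(code)
        prev = prev + code * step         # update the reconstruction
    return codes

def dpcm_decode(codes, step):
    """Rebuild the signal by accumulating the de-quantized differences."""
    out = []
    prev = 0
    for code in codes:
        prev = prev + code * step
        out.append(prev)
    return out

samples = [0, 10, 25, 37, 30, 12, -5]
step = 4
codes = dpcm_encode(samples, step)
decoded = dpcm_decode(codes, step)
print(codes)
print(decoded)
```

For a smooth signal the codes stay much smaller than the samples themselves, which is where the bit-rate reduction comes from.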
Adaptive Differential PCM (ADPCM)
● To enhance the efficiency of DPCM
– Use a more sophisticated prediction than the simple difference
– Adaptively change the quantization step
  ● When the difference between two samples is large, the difference to the next sample is likely to be large too
  ● When the difference between two samples is small, the difference to the next sample is likely to be small too
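Block diagram of ADPCM
[Figure: ADPCM encoder block diagram. The prediction signal $x_e(k)$ is subtracted from the PCM input $x(k)$ to give the differential signal $d(k)$. The adaptive quantizer converts $d(k)$ into the ADPCM output $I(k)$. The adaptive de-quantizer converts $I(k)$ into the quantized differential signal $d_q(k)$. Adding $d_q(k)$ to $x_e(k)$ gives the reconstructed signal $x_r(k)$, and the adaptive predictor produces the next prediction signal from $x_r(k)$ and $d_q(k)$.]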
Calculation algorithm of ADPCM
1. Compute the prediction signal $x_e(k)$
2. Compute the difference: $d(k) = x(k) - x_e(k)$
3. Quantize (ADPCM output): $I(k) = Q[d(k)]$
4. De-quantize: $d_q(k) = Q^{-1}[I(k)]$
5. Reconstruct the signal: $x_r(k) = x_e(k) + d_q(k)$
6. Compute the next prediction: $x_e(k+1) = \mathrm{pred}(x_r(k), d_q(k), \ldots)$
Prediction of speech signal
● ADPCM quantizes the difference between the input signal and the predicted signal
● How to predict the signal
– DPCM: $x_e(k) = x_r(k-1)$
– A little better way: $x_e(k) = 2x_r(k-1) - x_r(k-2)$
– G.726: $x_e(k) = \sum_{i=1}^{2} a_i x_r(k-i) + \sum_{i=1}^{6} b_i d_q(k-i)$
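The first two predictors can be compared on a smooth test signal. The sketch below is a simplification: it predicts from the original samples rather than from the reconstructed signal $x_r$ (there is no quantizer in the loop), and the function names and the 200 Hz test tone are assumptions made here.

```python
import math

def previous_sample(x, k):
    """DPCM-style prediction: x_e(k) = x(k-1)."""
    return x[k - 1]

def linear_extrapolation(x, k):
    """'A little better way': x_e(k) = 2 x(k-1) - x(k-2)."""
    return 2 * x[k - 1] - x[k - 2]

def prediction_error_energy(x, predict):
    """Sum of squared prediction errors over the signal."""
    return sum((x[k] - predict(x, k)) ** 2 for k in range(2, len(x)))

fs = 8000
x = [1000 * math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]

e1 = prediction_error_energy(x, previous_sample)
e2 = prediction_error_energy(x, linear_extrapolation)
print(e1, e2)
```

The extrapolating predictor leaves a smaller error here because the tone is low in frequency relative to $f_s$. For a sinusoid above $f_s/6$ it would leave a larger error than the previous-sample predictor, which is one reason for estimating the prediction coefficients from the signal.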
Determine quantization step adaptively (example)
● Measure the difference from the previous sample using the scale
● If the difference falls in the "blue" region, halve the size of the next scale
● If the difference falls in the "red" region, double the size of the next scale
[Figure: two quantization scales with levels from 7 down to −8, with the "blue" and "red" regions marked]
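The six-step ADPCM calculation and the halve/double rule can be combined into a small sketch. Several things here are assumptions and not taken from the slides: codes with magnitude 0 or 1 are treated as the "blue" region and codes with magnitude 6 or more as the "red" region, the step is limited to the range 1 to 1024, the initial step is 16, and the predictor is simply the previous reconstructed sample. This is not G.726.

```python
import math

def adapt_step(step, code, small=1, large=6, min_step=1.0, max_step=1024.0):
    """Halve the step after a small code, double it after a large one.
    The thresholds and limits are assumptions of this sketch."""
    if abs(code) <= small:
        step /= 2
    elif abs(code) >= large:
        step *= 2
    return min(max(step, min_step), max_step)

def quantize(d, step):
    """4-bit quantizer: nearest level, clamped to -8..7."""
    return max(-8, min(7, math.floor(d / step + 0.5)))

def adpcm_encode(samples, step=16.0):
    codes = []
    x_e = 0.0                          # 1. prediction signal
    for x in samples:
        d = x - x_e                    # 2. difference
        code = quantize(d, step)       # 3. quantize (ADPCM output)
        d_q = code * step              # 4. de-quantize
        x_r = x_e + d_q                # 5. reconstruct
        x_e = x_r                      # 6. next prediction (previous sample)
        step = adapt_step(step, code)  # adapt the scale for the next sample
        codes.append(code)
    return codes

def adpcm_decode(codes, step=16.0):
    out = []
    x_e = 0.0
    for code in codes:
        x_r = x_e + code * step        # same steps 4 and 5 as the encoder
        out.append(x_r)
        x_e = x_r
        step = adapt_step(step, code)  # same adaptation, driven by the codes
    return out

samples = [100.0] * 10
codes = adpcm_encode(samples)
decoded = adpcm_decode(codes)
print(codes)
print(decoded)
```

Because the step is adapted from the transmitted codes only, the decoder can follow the same sequence of steps without any side information.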
For high-efficiency speech coding
● PCM, DPCM and ADPCM encode general sound signals
– DPCM and ADPCM partly exploit properties of the input signal
● Human speech is a small subset of all sound signals → we can enhance the coding efficiency by considering the properties of human speech
● What are the properties of human speech?
High-level speech coding
[Figure: chain of representations from speech through digital data, speech features, phones and words/sentences to semantics. Upper row (coding), labels in order: PCM coder (public phone), CELP coder (mobile phone), "under research", "summarizing telephone?". Lower row (related technologies), labels in order: AD/DA, vocoder, speech synthesis, text-to-speech.]
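Speech production model
[Figure: speech production model. Part labels: nasal cavity, larynx, lips, vocal tract, vocal cords. Components: $S$ (vocal cords), $T$ (vocal tract), $R$ (radiation at the lips), output $X$]
$X = S\,T\,R$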
Speech production model
[Figure: the component $S$]
Speech production model
[Figure: the components $S$, $T$ and $R$]
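Modeling speech using parameters
● Modeling speech using linear prediction (LPC)
– Spectral shape: parameters of the linear prediction filter
– Vocal cord vibration: residue
$x(k) = -\sum_{i=1}^{p} a_i x(k-i) + e(k)$
Estimate the coefficients $a_i$ to minimize the residue $e(k)$
– In the spectral domain
$X(\omega) = \dfrac{E(\omega)}{1 + \sum_{n=1}^{p} a_n e^{-jn\omega}} = E(\omega)\,H(\omega)$
where $j$ is the imaginary unit, $E$ corresponds to $S$, and $H$ corresponds to $T R$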
Analysis and transmission of speech by LPC
● Information to be transmitted
– LP coefficients $a_i$ and residue $e(k)$
● How to transmit them?
– Estimate $a_i$ for a fixed number of samples (a block)
– Calculate $e(k)$ using the estimated $a_i$
– Transmit $a_i$ and $e(k)$ as parameters of the block
● How to restore the signal?
– Use the LPC formula $x(k) = -\sum_{i=1}^{p} a_i x(k-i) + e(k)$
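The analysis and restoration steps can be sketched as a pair of functions. The names are hypothetical, the coefficient list `a` holds $a_1, \ldots, a_p$, and samples before the start of the block are assumed to be zero. The coefficient value used in the demo is an arbitrary example, not an estimate.

```python
def lpc_residual(x, a):
    """Analysis: e(k) = x(k) + sum_i a_i x(k-i), with x(k) = 0 before the block."""
    p = len(a)
    e = []
    for k in range(len(x)):
        acc = sum(a[i] * x[k - 1 - i] for i in range(p) if k - 1 - i >= 0)
        e.append(x[k] + acc)
    return e

def lpc_synthesize(e, a):
    """Restoration: x(k) = -sum_i a_i x(k-i) + e(k)."""
    p = len(a)
    x = []
    for k in range(len(e)):
        acc = sum(a[i] * x[k - 1 - i] for i in range(p) if k - 1 - i >= 0)
        x.append(-acc + e[k])
    return x

a = [-0.9]                       # a_1 of a first-order example (p = 1)
x = [1.0, 1.2, 0.8, 0.5, 0.9, 1.1, 0.4]
e = lpc_residual(x, a)           # what would be transmitted along with a
x_hat = lpc_synthesize(e, a)     # what the receiver restores
print(e)
print(x_hat)
```

With unquantized $a_i$ and $e(k)$ the signal is restored up to floating-point rounding. The saving in a real coder comes from representing $e(k)$ compactly, which this sketch does not attempt.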
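Estimation of LP coefficients
● How to estimate the LP coefficients from $x(1), \ldots, x(k)$
– Solve a system of simultaneous equations (the Yule-Walker equations) → the coefficients are obtained as the least-error solution
– Faster algorithm: the Levinson-Durbin algorithm
● LPC equation
$$
-\begin{pmatrix}
x(k-1) & x(k-2) & \cdots & x(k-p) \\
x(k-2) & x(k-3) & \cdots & x(k-p-1) \\
\vdots & \vdots & \ddots & \vdots \\
x(p) & x(p-1) & \cdots & x(1)
\end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{pmatrix}
=
\begin{pmatrix} x(k) \\ x(k-1) \\ \vdots \\ x(p+1) \end{pmatrix}
-
\begin{pmatrix} e(k) \\ e(k-1) \\ \vdots \\ e(p+1) \end{pmatrix}
$$
$-FA = V - E$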
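A minimal sketch of the Levinson-Durbin recursion follows, assuming the autocorrelation form of the problem: the input is the autocorrelation sequence $r(0), \ldots, r(p)$ of the block, and the sign convention matches $x(k) = -\sum_i a_i x(k-i) + e(k)$. The function name and the example values are hypothetical.

```python
def levinson_durbin(r, p):
    """Solve sum_i a_i r(|i-j|) = -r(j), j = 1..p, recursively in the order.
    Returns (a_1..a_p, prediction error energy)."""
    a = [1.0] + [0.0] * p
    err = r[0]
    for m in range(1, p + 1):
        # Correlation left unexplained by the order-(m-1) predictor
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                   # reflection coefficient of order m
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= (1.0 - k * k)             # error energy shrinks with each order
    return a[1:], err

# Autocorrelation of a first-order process x(k) = 0.9 x(k-1) + e(k),
# normalized so that r(0) = 1: r(n) = 0.9 ** n
r = [1.0, 0.9, 0.81]
coeffs, err = levinson_durbin(r, 2)
print(coeffs, err)
```

For this example the recursion should return $a_1$ close to $-0.9$ and $a_2$ close to $0$, since a first-order predictor already explains the signal. The recursion needs on the order of $p^2$ operations, compared with $p^3$ for solving the equations by general elimination.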