speech information processing

Speech Information Processing Akinori Ito Graduate School of - PowerPoint PPT Presentation

Sound Media Engineering part II Speech Information Processing Akinori Ito Graduate School of Engineering, Tohoku Univ. aito@fw.ipsj.or.jp 1 Overview of the lecture #1: Production and coding of speech (1) Speech production, feature of


  1. Sound Media Engineering, Part II: Speech Information Processing. Akinori Ito, Graduate School of Engineering, Tohoku Univ. aito@fw.ipsj.or.jp

  2. Overview of the lecture ● #1: Production and coding of speech (1) – Speech production, features of speech sounds – Basic codecs: PCM, DPCM, ADPCM ● #2: Coding of speech (2) – Linear prediction of speech: linear prediction coefficients, PARCOR coefficients and LSP – CELP coding – Audio coding ● #3: Speech enhancement – Spectral subtraction – Microphone array

  3. Production of speech ● Organs that produce speech – vocal cords – larynx – pharynx – tongue – gums – teeth – lips – nasal cavity [Figure label: "vocal tract", bracketing part of this list]

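  4. Acoustic tube model ● Human speech production is similar to playing a wind instrument [Figure: acoustic tube model with the labels nasal cavity, larynx, lips, vocal tract and vocal cords; annotations: linguistic content, pitch of voice, personality]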

  5. Linguistic and speaker features [Figure: acoustic tube model with the labels nasal cavity, larynx, lips, vocal tract and vocal cords] ● A speaker can control the shape of this part

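  6. Linguistic and speaker features [Figure: acoustic tube model with the labels nasal cavity, larynx, lips, vocal tract and vocal cords] ● A speaker cannot control the shape of this part or the total length of the vocal tract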

  7. Speech waveform ● Speech waveforms are quite complex [Figure: waveforms of the vowels /a/, /i/, /u/, /e/, /o/]

  8. Speech waveform ● The waveform is complex, but almost periodic – Fundamental period: $T$ [s] – Fundamental frequency: $F_0 = 1/T$ [Hz]
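The relation $F_0 = 1/T$ can be tried out numerically. The following is a minimal sketch, not part of the lecture: it takes the fundamental period to be the lag that maximizes the mean autocorrelation of a synthetic periodic signal. The function name `estimate_f0`, the 60 to 400 Hz search range and the test signal are assumptions chosen for illustration.

```python
import math

def estimate_f0(signal, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 [Hz] as fs divided by the lag with the largest mean autocorrelation."""
    min_lag = int(fs / fmax)
    max_lag = int(fs / fmin)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(signal) - lag
        value = sum(signal[i] * signal[i + lag] for i in range(n)) / n
        if value > best_value:
            best_lag, best_value = lag, value
    return fs / best_lag

fs = 8000
# Synthetic periodic signal (assumed): 100 Hz fundamental plus two harmonics.
x = [math.sin(2 * math.pi * 100 * n / fs)
     + 0.5 * math.sin(2 * math.pi * 200 * n / fs)
     + 0.3 * math.sin(2 * math.pi * 300 * n / fs)
     for n in range(800)]
f0 = estimate_f0(x, fs)
print(f0)
```

Real speech is only approximately periodic, so practical pitch estimators add windowing and voicing decisions that this sketch leaves out.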

  9. Various "a" ● Two /a/'s with different fundamental frequencies – Same phone = same vocal tract shape – Completely different waveforms – What do these waveforms have in common?

  10. Spectrum of speech ● Spectra of two /a/'s – The spectral shapes are similar → shape of the vocal tract – The "jaggies" of the spectra differ → fundamental frequency

  11. Spectrum and formant frequencies ● $F_0$: fundamental frequency ● $F_1, F_2, \ldots$: formant frequencies [Figure: speech spectrum with the fundamental frequency $F_0$ and the formant frequencies $F_1$ to $F_4$ marked]

  12. Speech coding ● Sound (analog) → convert to digital data – Handle it with a computer – Transmit it over a digital line ● How do we digitize sound? – Goals ● Good quality when converted back to analog sound ● Low bit rate – Methodology ● Exploit various features of speech

  13. Basics of speech coding ● Sampling – Observe the temporally continuous signal at discrete times – Rate of the discrete observations: sampling frequency $f_s$ (the sampling period is $1/f_s$) – The original signal can be restored from the sampled data when it contains only frequency components below $f_s/2$ (sampling theorem)
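To see why components must stay below $f_s/2$, the sketch below (an illustration added here, with an assumed helper name `sample_cosine`) samples a 3 kHz cosine and a 5 kHz cosine at $f_s = 8$ kHz. The two tones produce the same samples, so the 5 kHz tone cannot be told apart from the 3 kHz one after sampling.

```python
import math

def sample_cosine(freq_hz, fs_hz, num_samples):
    """Sample cos(2*pi*f*t) at t = n / fs for n = 0 .. num_samples - 1."""
    return [math.cos(2 * math.pi * freq_hz * n / fs_hz) for n in range(num_samples)]

fs = 8000
below_nyquist = sample_cosine(3000, fs, 16)  # 3 kHz < fs/2
above_nyquist = sample_cosine(5000, fs, 16)  # 5 kHz > fs/2, aliases onto 3 kHz
max_diff = max(abs(a - b) for a, b in zip(below_nyquist, above_nyquist))
print(max_diff)  # differs from zero only by floating-point rounding
```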

  14. Basics of speech coding ● Quantization – Round the magnitude of the signal to discrete levels ● The magnitude of the signal can then be represented by integers – Spacing between adjacent discrete levels: quantization step – Difference between the original signal and the quantized signal: quantization error
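A minimal sketch of uniform quantization, assuming a step of 0.25 and made-up sample values: each sample is rounded to the nearest integer level, and the quantization error never exceeds half a step. The names `quantize` and `dequantize` are illustrative.

```python
def quantize(x, step):
    """Map a real-valued sample to the index of the nearest quantization level."""
    return round(x / step)

def dequantize(level, step):
    """Map a level index back to a signal value."""
    return level * step

step = 0.25
samples = [0.03, -0.41, 0.77, 1.20, -0.96]
levels = [quantize(x, step) for x in samples]
errors = [x - dequantize(q, step) for x, q in zip(samples, levels)]
print(levels)
print(max(abs(e) for e in errors))  # bounded by step / 2
```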

  15. Sampling and quantization: how are they determined? ● The sampling frequency is determined by the highest frequency in the sound – Telephone: 8 kHz (sound up to 4 kHz) – High-quality speech: 16 kHz (sound up to 8 kHz) – CD: 44.1 kHz (sound up to 22.05 kHz) ● Quantization is determined by the dynamic range of the sound – To code speech is to quantize speech

  16. PCM coding ● PCM (Pulse Code Modulation) – Represent the quantized values as binary numbers ● What has to be decided in PCM – How many bits to use for one sample – How to choose the quantization levels ● Equal steps: linear quantization ● Unequal steps: nonlinear quantization ● Examples of PCM coding – CD: 16-bit linear quantization – VoIP (G.711): 8-bit nonlinear quantization
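The bit rate of a PCM stream follows directly from the two choices above: sampling frequency times bits per sample (times the number of channels). A small sketch with an assumed helper `pcm_bit_rate`; the two-channel CD case is added for illustration.

```python
def pcm_bit_rate(fs_hz, bits_per_sample, channels=1):
    """Bit rate in bit/s of an uncompressed PCM stream."""
    return fs_hz * bits_per_sample * channels

g711_rate = pcm_bit_rate(8000, 8)              # 8 kHz, 8 bit, one channel
cd_rate = pcm_bit_rate(44100, 16, channels=2)  # 44.1 kHz, 16 bit, stereo
print(g711_rate)  # 64000
print(cd_rate)    # 1411200
```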

  17. PCM linear quantization ● There is nothing difficult about it [Figure: a waveform on an amplitude axis from -10 to 10, with each sample rounded to an integer value] ● CD: quantized to 16 bits (-32768 to +32767)

  18. Nonlinear quantization ● Most samples are close to zero → the total error can be reduced by finely quantizing values around zero [Figure: two panels of quantization levels, each on an amplitude axis from -10 to 10]

  19. Example of nonlinear quantization: G.711 ● Speech coding for 64 kbit/s digital phone lines – 8 kHz sampling, 8-bit nonlinear quantization – μ-law (Japan, US), A-law (Europe) – μ-law: 14-bit linear quantization → 8-bit nonlinear quantization, $Y = 128\,\operatorname{sign}(X)\,\dfrac{\log\left(1 + 255\,|X|/8192\right)}{\log 256}$ [Figure: μ-law curve; horizontal axis: 14-bit linear value (-8000 to 8000); vertical axis: 8-bit μ-law value (-150 to 150)]
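The μ-law curve on this slide can be evaluated directly. The sketch below implements the continuous formula and its inverse; it is not a bit-exact G.711 codec, which uses a piecewise-linear approximation of this curve and integer code words. The function names and the constants `MU` and `X_MAX` are chosen here for illustration.

```python
import math

MU = 255
X_MAX = 8192  # magnitude range of the 14-bit linear input

def mulaw_compress(x):
    """Continuous mu-law curve: 14-bit linear value -> value in [-128, 128]."""
    sign = (x > 0) - (x < 0)
    return 128 * sign * math.log(1 + MU * abs(x) / X_MAX) / math.log(MU + 1)

def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    sign = (y > 0) - (y < 0)
    return sign * X_MAX * ((MU + 1) ** (abs(y) / 128) - 1) / MU

for x in (0, 100, 1000, 8192):
    print(x, mulaw_compress(x), mulaw_expand(mulaw_compress(x)))
```

A small input such as 100 is mapped much further from zero than a linear 14-to-8-bit scaling (100 × 128 / 8192 ≈ 1.6) would place it, which is what gives small samples finer resolution.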

  20. Differential PCM (DPCM) ● In an ordinary speech signal, the values of two consecutive samples do not differ very much → reduce the bit rate by transmitting the differences between samples [Block diagram with a subtractor, a quantizer $Q$ and a unit delay $z^{-1}$]

  21. Differential PCM (DPCM) [Figure: two panels, "Original waveform" and "Differential waveform", both with a vertical axis from -25000 to 15000 and a horizontal axis from 0 to 4500]
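A minimal sketch of the differencing idea, without any quantizer: the encoder sends the difference from the previous sample and the decoder accumulates the differences. The sample values are made up for illustration. In this unquantized form the scheme is lossless; the saving comes from the differences needing fewer bits than the samples themselves.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous one (first: from 0)."""
    prev = 0
    diffs = []
    for x in samples:
        diffs.append(x - prev)
        prev = x
    return diffs

def dpcm_decode(diffs):
    """Rebuild the samples by accumulating the differences."""
    prev = 0
    out = []
    for d in diffs:
        prev += d
        out.append(prev)
    return out

samples = [1000, 1040, 1075, 1100, 1110, 1105, 1080]
diffs = dpcm_encode(samples)
print(diffs)               # after the first value, the differences are small
print(dpcm_decode(diffs))  # identical to the input
```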

  22. Adaptive Differential PCM (ADPCM) ● To improve the efficiency of DPCM – Use a more sophisticated prediction than the simple difference – Change the quantization step adaptively ● When the difference between two samples is large, the difference to the next sample is likely to be large too ● When the difference between two samples is small, the difference to the next sample is likely to be small too

  23. Block diagram of ADPCM [Block diagram: PCM input $x(k)$; prediction signal $x_e(k)$, subtracted from the input; differential signal $d(k)$; adaptive quantizer; ADPCM output $I(k)$; adaptive de-quantizer; quantized differential signal $d_q(k)$; reconstructed signal $x_r(k)$, the sum of $x_e(k)$ and $d_q(k)$; adaptive predictor]

  24. Calculation algorithm of ADPCM 1. Compute the prediction signal $x_e(k)$ 2. Compute the difference $d(k) = x(k) - x_e(k)$ 3. Quantize (ADPCM output): $I(k) = Q(d(k))$ 4. De-quantize: $d_q(k) = Q^{-1}(I(k))$ 5. Reconstruct the signal: $x_r(k) = x_e(k) + d_q(k)$ 6. Compute the next prediction: $x_e(k+1) = \mathrm{pred}(x_r(k), d_q(k), \ldots)$
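The six steps can be sketched as a loop. This is a simplified illustration, not the G.726 algorithm: the quantizer uses a fixed step and a 4-bit code range (both assumptions), and the predictor is simply the previous reconstructed sample. Because the encoder predicts from the reconstructed signal rather than from the input, the decoder can follow the same steps and stay in sync.

```python
def encode(samples, step=4):
    """Closed-loop differential encoder with a fixed-step 4-bit quantizer (assumed)."""
    codes = []
    x_e = 0                                   # 1. initial prediction
    for x in samples:
        d = x - x_e                           # 2. difference
        i = max(-8, min(7, round(d / step)))  # 3. quantize (output code)
        d_q = i * step                        # 4. de-quantize
        x_r = x_e + d_q                       # 5. reconstruct
        x_e = x_r                             # 6. next prediction
        codes.append(i)
    return codes

def decode(codes, step=4):
    """Decoder: repeats steps 4 to 6 of the encoder."""
    out = []
    x_e = 0
    for i in codes:
        x_r = x_e + i * step
        out.append(x_r)
        x_e = x_r
    return out

samples = [0, 3, 9, 14, 20, 23, 21, 15, 8, 2]
codes = encode(samples)
decoded = decode(codes)
print(codes)
print(decoded)
```

As long as no difference falls outside the code range, the reconstruction error stays within half a quantization step and does not accumulate.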

  25. Prediction of speech signal ● ADPCM quantizes the difference between the input signal and the predicted signal ● How to predict the signal – DPCM: $x_e(k) = x_r(k-1)$ – A slightly better way: $x_e(k) = 2x_r(k-1) - x_r(k-2)$ – G.726: $x_e(k) = \sum_{i=1}^{2} a_i x_r(k-i) + \sum_{i=1}^{6} b_i d_q(k-i)$
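The three predictors can be written as small functions. This is a sketch under assumptions: the history lists hold the most recent value last, the function names are illustrative, and the coefficients `a` and `b` of the third form are passed in as fixed numbers, whereas an adaptive predictor would update them over time (that adaptation is not shown).

```python
def predict_previous(x_r):
    """DPCM: x_e(k) = x_r(k-1)."""
    return x_r[-1]

def predict_linear(x_r):
    """x_e(k) = 2 x_r(k-1) - x_r(k-2): extrapolate the last slope."""
    return 2 * x_r[-1] - x_r[-2]

def predict_pole_zero(x_r, d_q, a, b):
    """x_e(k) = sum_i a_i x_r(k-i) + sum_i b_i d_q(k-i); a[0] is a_1, b[0] is b_1."""
    from_signal = sum(a[i] * x_r[-1 - i] for i in range(len(a)))
    from_diffs = sum(b[i] * d_q[-1 - i] for i in range(len(b)))
    return from_signal + from_diffs

ramp = [2, 4, 6, 8]  # made-up reconstructed samples on a straight line
print(predict_previous(ramp))
print(predict_linear(ramp))
print(predict_pole_zero(ramp, [0] * 6, a=[2, -1], b=[0] * 6))
```

On a straight-line signal the second-order form predicts the next sample exactly, while the plain previous-sample predictor is always one slope behind.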

  26. Determine quantization step adaptively (example) ● Measure the difference from the previous sample using the scale ● If the difference falls in the "blue" range, halve the size of the next scale ● If the difference falls in the "red" range, double the size of the next scale [Figure: two quantization scales graduated from -8 to 7]
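A sketch of this halve/double rule. The transcript does not say which codes are "blue" and which are "red", so the thresholds here are hypothetical: codes with magnitude up to `small` halve the step, codes with magnitude of at least `large` double it, and the step is clamped to an assumed range.

```python
def next_step(step, code, small=1, large=6, min_step=1, max_step=1024):
    """Adapt the quantization step from the last 4-bit code (-8 .. 7).

    small, large, min_step and max_step are hypothetical values for illustration.
    """
    if abs(code) <= small:      # assumed "blue" range: small difference
        step = step // 2
    elif abs(code) >= large:    # assumed "red" range: large difference
        step = step * 2
    return max(min_step, min(max_step, step))

step = 8
for code in [7, 7, 3, 0, 0, -8]:
    step = next_step(step, code)
    print(code, step)
```

The decoder can run the same rule on the received codes, so the step size never has to be transmitted.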

  27. For high-efficiency speech coding ● PCM, DPCM and ADPCM encode general sound signals – DPCM and ADPCM partly exploit properties of the input signal ● Human speech is a small subset of all sound signals → coding efficiency can be improved by taking the properties of human speech into account ● What are the properties of human speech?

  28. High-level speech coding [Figure: levels of representation on the encoding side and on the decoding side: speech, digital data, speech feature, phones, words/sentences, semantics. Labels in the figure: AD/DA, PCM coder (public phone), CELP coder (mobile phone), vocoder, "under research", speech synthesis, Text-to-Speech, "summarizing telephone?"]

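  29. Speech production model ● $X(\omega) = S(\omega)\,T(\omega)\,R(\omega)$ – $S(\omega)$: vocal cords – $T(\omega)$: vocal tract – $R(\omega)$: radiation [Figure: acoustic tube model with the labels nasal cavity, larynx, lips, vocal tract and vocal cords]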

  30. Speech production model [Figure: $S(\omega)$]

  31. Speech production model [Figure: $S(\omega)$, $T(\omega)$ and $R(\omega)$]

  32. Modeling speech using parameters ● Modeling speech using linear prediction (LPC) – Spectral shape: parameters of the linear prediction filter – Vocal cord vibration: residue – Estimate the coefficients $a_i$ so as to minimize the residue $e(k)$: $x(k) = -\sum_{i=1}^{p} a_i x(k-i) + e(k)$ – In the spectral domain: $X(\omega) = \dfrac{E(\omega)}{1 + \sum_{n=1}^{p} a_n e^{-j\omega n}} = E(\omega)\,H(\omega)$ [Figure labels: $S(\omega)$, $T(\omega)$, $R(\omega)$]
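The spectral-domain formula can be evaluated for a given set of coefficients. A minimal sketch, assuming $\omega$ is the normalized angular frequency in radians per sample and using a made-up first-order coefficient; the function name is illustrative.

```python
import cmath
import math

def lpc_frequency_response(a, omega):
    """H(omega) = 1 / (1 + sum_n a_n exp(-j omega n)), with a = [a_1, ..., a_p]."""
    denom = 1 + sum(a_n * cmath.exp(-1j * omega * n)
                    for n, a_n in enumerate(a, start=1))
    return 1 / denom

a = [-0.9]  # assumed example: x(k) = 0.9 x(k-1) + e(k)
low = abs(lpc_frequency_response(a, 0.0))
high = abs(lpc_frequency_response(a, math.pi))
print(low, high)  # this filter emphasizes low frequencies
```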

  33. Analysis and transmission of speech by LPC ● Information to be transmitted – LP coefficients $a_i$ and residue $e(k)$ ● How to transmit them? – Estimate $a_i$ for a fixed number of samples (a block) – Calculate $e(k)$ using the estimated $a_i$ – Transmit $a_i$ and $e(k)$ as the parameters of the block ● How to restore the signal? – Use the LPC formula $x(k) = -\sum_{i=1}^{p} a_i x(k-i) + e(k)$
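The analysis and restoration steps can be sketched as a pair of functions that apply the LPC formula in both directions. The coefficient values and the block of samples are made up, and samples before the start of the block are treated as zero, which is an assumption of this sketch.

```python
def lpc_residue(x, a):
    """Analysis: e(k) = x(k) + sum_i a_i x(k-i), with a = [a_1, ..., a_p]."""
    e = []
    for k in range(len(x)):
        acc = sum(a[i] * x[k - 1 - i] for i in range(len(a)) if k - 1 - i >= 0)
        e.append(x[k] + acc)
    return e

def lpc_synthesize(e, a):
    """Synthesis: x(k) = -sum_i a_i x(k-i) + e(k)."""
    x = []
    for k in range(len(e)):
        acc = sum(a[i] * x[k - 1 - i] for i in range(len(a)) if k - 1 - i >= 0)
        x.append(-acc + e[k])
    return x

a = [-0.9, 0.2]                          # assumed coefficients
block = [1.0, 1.5, 1.2, 0.4, -0.3, -0.8]
residue = lpc_residue(block, a)
restored = lpc_synthesize(residue, a)
print(residue)
print(restored)
```

With the unquantized residue the block is restored up to rounding error; a real coder saves bits by representing $e(k)$ and $a_i$ coarsely.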

  34. Estimation of LP coefficients ● How to estimate the LP coefficients from $x(1), \ldots, x(k)$ – Solve a system of simultaneous equations (Yule-Walker equations) → the LP coefficients are obtained as the least-squares-error solution – Faster algorithm: Levinson-Durbin algorithm ● LPC equation, $-FA = V - E$:
  $$-\begin{pmatrix} x(k-1) & x(k-2) & \cdots & x(k-p) \\ x(k-2) & x(k-3) & \cdots & x(k-p-1) \\ \vdots & \vdots & \ddots & \vdots \\ x(p) & x(p-1) & \cdots & x(1) \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{pmatrix} = \begin{pmatrix} x(k) \\ x(k-1) \\ \vdots \\ x(p+1) \end{pmatrix} - \begin{pmatrix} e(k) \\ e(k-1) \\ \vdots \\ e(p+1) \end{pmatrix}$$
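A sketch of the Levinson-Durbin recursion under this lecture's sign convention, $x(k) = -\sum_i a_i x(k-i) + e(k)$. It solves the Yule-Walker equations from the autocorrelation values $r(0), \ldots, r(p)$ in $O(p^2)$ operations. The function names and the example autocorrelation are assumptions for illustration.

```python
def autocorrelation(x, p):
    """r(m) = sum_k x(k) x(k+m) for m = 0 .. p."""
    return [sum(x[k] * x[k + m] for k in range(len(x) - m)) for m in range(p + 1)]

def levinson_durbin(r, p):
    """Return ([a_1, ..., a_p], residual energy) for x(k) = -sum a_i x(k-i) + e(k)."""
    a = [1.0] + [0.0] * p
    err = r[0]
    for m in range(1, p + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        refl = -acc / err                 # reflection (PARCOR-type) coefficient
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + refl * a[m - i]
        new_a[m] = refl
        a = new_a
        err *= 1.0 - refl * refl
    return a[1:], err

# Assumed example: autocorrelation of a first-order process x(k) = 0.9 x(k-1) + e(k).
coeffs, err = levinson_durbin([1.0, 0.9, 0.81], 2)
print(coeffs, err)
```

For this example the recursion gives $a_1 \approx -0.9$ and $a_2 \approx 0$, matching the process that generated the autocorrelation.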
