Speech Processing 15-492/18-492: Speech Recognition Signal Processing (PowerPoint presentation)



  1. Speech Processing 15-492/18-492 Speech Recognition Signal Processing

  2. Analog to Digital
     • Speech (sound) is analog
     • Computers are digital
     • We need to convert
     • Sample from an A-D converter
        • N times a second
     • How many times a second?
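The slide leaves the sampling rate as an open question. The usual answer is the Nyquist criterion: a signal sampled N times a second can only represent frequencies below N/2, and anything higher folds back (aliases) to a lower frequency. Speech recognizers commonly sample at 16 kHz, or at 8 kHz for telephone speech. The minimal NumPy sketch below (the function name `dominant_frequency` is hypothetical) samples a 3 kHz tone at two rates and reports the strongest frequency that survives.

```python
import numpy as np

def dominant_frequency(tone_hz: float, sample_rate: int, duration_s: float = 1.0) -> float:
    """Sample a pure tone at sample_rate and return the strongest frequency in its spectrum."""
    n = int(sample_rate * duration_s)
    t = np.arange(n) / sample_rate              # sample instants: N per second
    x = np.sin(2 * np.pi * tone_hz * t)         # the "analog" tone, seen only at those instants
    spectrum = np.abs(np.fft.rfft(x))
    # Bin k of an n-point FFT corresponds to k * sample_rate / n Hz.
    return float(np.argmax(spectrum) * sample_rate / n)

# 16 kHz sampling (Nyquist limit 8 kHz): the 3 kHz tone is captured correctly.
print(dominant_frequency(3000, 16000))   # 3000.0
# 4 kHz sampling (Nyquist limit 2 kHz): the tone aliases to 4000 - 3000 = 1000 Hz.
print(dominant_frequency(3000, 4000))    # 1000.0
```

Undersampling does not fail loudly: the aliased tone looks like a perfectly plausible 1 kHz signal, which is why the rate has to be chosen before conversion.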

  3. Goals of Signal Processing
     • Distinguish between phonetic types
     • Be invariant to channel/room conditions
     • Be invariant to speaker characteristics
     • Computational efficiency

  4. Time vs Frequency Domain
     • Human ear distinguishes frequencies
     • Initial ASR used time domain features
        • Power
        • Zero crossings (sort of frequency)
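The two early time-domain features named on the slide are simple to compute per frame. A minimal sketch, assuming NumPy and hypothetical helper names; power is taken here as mean squared amplitude and the zero-crossing rate as the fraction of adjacent samples that change sign, which are common but not the only definitions.

```python
import numpy as np

def frame_power(frame: np.ndarray) -> float:
    """Mean squared amplitude of one frame (a time-domain energy feature)."""
    return float(np.mean(frame.astype(float) ** 2))

def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frame)
    return float(np.mean(signs[1:] != signs[:-1]))

sr = 16000
t = np.arange(400) / sr                  # one 25 ms frame at 16 kHz
low = np.sin(2 * np.pi * 200 * t)        # 200 Hz tone
high = np.sin(2 * np.pi * 3000 * t)      # 3 kHz tone, same amplitude

print(frame_power(low), frame_power(high))                # both about 0.5
print(zero_crossing_rate(low), zero_crossing_rate(high))  # the 3 kHz tone crosses far more often
```

The two tones have the same power, so power alone cannot tell them apart; the zero-crossing rate can, which is the sense in which zero crossings are "sort of frequency".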

  5. Source Filter Model [Figure labels: Pitch; Voiced (pulse source); Unvoiced (noise source); Filter (Vocal Tract Model)]
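The model treats speech as a source signal shaped by a filter: a pulse train at the pitch frequency for voiced sounds, or noise for unvoiced sounds, passed through a filter that stands in for the vocal tract. The sketch below is illustrative only. It assumes NumPy, models the vocal tract as a cascade of two-pole resonators (one common simplification, not necessarily what the slide's figure shows), and uses hypothetical formant values and helper names.

```python
import numpy as np

def pulse_train(pitch_hz: float, sample_rate: int, n: int) -> np.ndarray:
    """Voiced source: a unit impulse at the start of every pitch period."""
    x = np.zeros(n)
    period = int(round(sample_rate / pitch_hz))
    x[::period] = 1.0
    return x

def noise_source(n: int, seed: int = 0) -> np.ndarray:
    """Unvoiced source: white Gaussian noise."""
    return np.random.default_rng(seed).standard_normal(n)

def resonator(x: np.ndarray, centre_hz: float, bandwidth_hz: float, sample_rate: int) -> np.ndarray:
    """Two-pole (all-pole) filter with a single resonance at centre_hz."""
    r = np.exp(-np.pi * bandwidth_hz / sample_rate)   # pole radius sets the bandwidth
    theta = 2 * np.pi * centre_hz / sample_rate       # pole angle sets the centre frequency
    a1, a2 = 2 * r * np.cos(theta), -r * r
    y = np.zeros(len(x))
    for i in range(len(x)):
        y[i] = x[i]
        if i >= 1:
            y[i] += a1 * y[i - 1]
        if i >= 2:
            y[i] += a2 * y[i - 2]
    return y

def vocal_tract(x: np.ndarray, formants, sample_rate: int) -> np.ndarray:
    """Hypothetical vocal-tract filter: one resonator per (centre, bandwidth) pair, in cascade."""
    for centre_hz, bandwidth_hz in formants:
        x = resonator(x, centre_hz, bandwidth_hz, sample_rate)
    return x

sr, n = 16000, 1600                      # 100 ms at 16 kHz
formants = [(500, 80), (1500, 100)]      # illustrative values only
voiced = vocal_tract(pulse_train(100, sr, n), formants, sr)
unvoiced = vocal_tract(noise_source(n), formants, sr)

spectrum = np.abs(np.fft.rfft(voiced))
print(np.argmax(spectrum) * sr / n)      # strongest harmonic lies near the first resonance
```

The same filter is applied to both sources; only the excitation changes. That separation is what lets a recognizer try to capture the filter (which carries the phonetic identity) while ignoring the pitch.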

  6. Time domain Signal

  7. Waveform Representation

  8. Speech Spectrogram

  9. /iy/ vs /ae/ • “beat” /b iy t/ and “bat” /b ae t/

  10. Frequency Domain • “pencils” /p eh n s ih l z/

  11. Frequency Domain • “beats pits” / b iy t s p ih t s /

  12. Speech Analysis

  13. Standard Parameterization
     • Split waveform into “frames”
        • Advance every 10 ms
        • Size around 25 ms (overlapping frames)
        • Window them
        • Perform FFT/Mel cepstral analysis
        • Find deltas (difference from previous)
        • Find delta deltas (difference in delta)
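The framing, windowing, FFT, and delta steps can be sketched with NumPy. This is a minimal sketch under stated assumptions: the helper names are hypothetical, a Hamming window is assumed (the slide does not name one), deltas follow the slide's simple "difference from previous" rule (many toolkits instead fit a regression over several neighbouring frames), and for brevity the deltas are taken on log power spectra rather than on mel cepstral coefficients.

```python
import numpy as np

def frame_signal(x: np.ndarray, sample_rate: int,
                 frame_ms: float = 25.0, hop_ms: float = 10.0) -> np.ndarray:
    """Split x into overlapping frames, about 25 ms long and advancing every 10 ms."""
    frame_len = int(round(sample_rate * frame_ms / 1000))   # 400 samples at 16 kHz
    hop = int(round(sample_rate * hop_ms / 1000))           # 160 samples at 16 kHz
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    starts = hop * np.arange(n_frames)
    return x[starts[:, None] + np.arange(frame_len)[None, :]]   # (n_frames, frame_len)

def power_spectra(frames: np.ndarray) -> np.ndarray:
    """Apply a Hamming window to each frame, then take its FFT power spectrum."""
    window = np.hamming(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, axis=1)) ** 2

def deltas(features: np.ndarray) -> np.ndarray:
    """Difference from the previous frame; the first frame has no predecessor, so its delta is 0."""
    d = np.zeros_like(features)
    d[1:] = features[1:] - features[:-1]
    return d

sr = 16000
x = np.random.default_rng(0).standard_normal(sr)    # 1 s of noise as a stand-in for speech
frames = frame_signal(x, sr)                        # (98, 400)
log_spec = np.log(power_spectra(frames) + 1e-10)    # (98, 201)
d = deltas(log_spec)
dd = deltas(d)                                      # delta deltas: difference in delta
features = np.hstack([log_spec, d, dd])             # (98, 603)
```

Because the hop (10 ms) is shorter than the frame (25 ms), each sample contributes to two or three frames; the window tapers the frame edges so that this overlap does not introduce sharp discontinuities into each FFT.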

  14. Summary
     • Time domain vs Frequency domain
     • Parameterization of speech
        • Frequency domain
        • Short-term FFTs
        • FFT vs Mel cepstrum
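The last summary point contrasts raw FFT output with the mel cepstrum. The following is a minimal sketch of one common recipe, assuming NumPy: triangular filters spaced evenly on the mel scale (using the widely used formula mel = 2595 log10(1 + f/700)), the log of each filter's energy, and then a DCT-II. The helper names and the choice of 26 filters and 13 coefficients are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> np.ndarray:
    """Triangular filters evenly spaced on the mel scale; shape (n_filters, n_fft // 2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, centre, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, centre):                 # rising edge
            fbank[i - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):                # falling edge
            fbank[i - 1, k] = (right - k) / (right - centre)
    return fbank

def dct_ii(x: np.ndarray, n_coeffs: int) -> np.ndarray:
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(x)
    k = np.arange(n_coeffs)[:, None]
    j = np.arange(n)[None, :]
    return (x[None, :] * np.cos(np.pi * k * (2 * j + 1) / (2 * n))).sum(axis=1)

def mel_cepstrum(power: np.ndarray, sample_rate: int, n_fft: int,
                 n_filters: int = 26, n_ceps: int = 13) -> np.ndarray:
    """Mel filterbank energies, then log, then DCT."""
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    return dct_ii(np.log(energies + 1e-10), n_ceps)

sr, n_fft = 16000, 512
t = np.arange(400) / sr                                  # one 25 ms frame
frame = np.hamming(400) * np.sin(2 * np.pi * 440 * t)
power = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2         # FFT power values
ceps = mel_cepstrum(power, sr, n_fft)                    # mel cepstral coefficients
print(power.shape, ceps.shape)                           # (257,) (13,)
```

The 257 FFT power values for one frame collapse into 13 coefficients. Keeping only the low-order cepstral coefficients retains the smooth spectral envelope and discards much of the fine harmonic detail, which makes the representation both more compact and closer to the "invariant to speaker characteristics" goal.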
