

  1. Speech Technology: A Practical Introduction
      Topic: Spectrogram, Cepstrum and Mel-Frequency Analysis
      Kishore Prahallad (Email: skishore@cs.cmu.edu)
      Carnegie Mellon University & International Institute of Information Technology Hyderabad
      Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)

  2. Topics
      • Spectrogram
      • Cepstrum
      • Mel-Frequency Analysis
      • Mel-Frequency Cepstral Coefficients

  3. Spectrogram

  4. Speech signal represented as a sequence of spectral vectors FFT FFT FFT Spectrum 4 Speech Technology - Kishore Prahallad (skishore@cs.cmu.edu)

  5. Speech signal represented as a sequence of spectral vectors
      [Figure: an FFT applied to each successive frame gives one spectrum per frame]

  6. Speech signal represented as a sequence of spectral vectors
      [Figure: the spectrum of one frame, plotted as amplitude vs. frequency (Hz)]
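A minimal NumPy sketch of what slides 4 to 6 depict: one FFT turns a short frame of speech into a spectral vector of amplitudes indexed by frequency in Hz. The helper name `frame_spectrum` is hypothetical, and the Hamming window is an assumed (common) choice; the slides do not say which window, if any, is applied.

```python
import numpy as np

def frame_spectrum(frame, sample_rate):
    """Amplitude spectrum of one frame: returns (freqs_hz, amplitudes)."""
    windowed = frame * np.hamming(len(frame))                    # taper the frame edges
    amplitudes = np.abs(np.fft.rfft(windowed))                   # one spectral vector
    freqs_hz = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)  # bin centres in Hz
    return freqs_hz, amplitudes
```

For a 400-sample frame at 16 kHz (25 ms), the bins are 40 Hz apart, so a 1 kHz tone peaks at bin 25.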

  7. Speech signal represented as a sequence of spectral vectors
      Rotate the spectrum by 90 degrees.
      [Figure: rotated spectrum, frequency (Hz) on the vertical axis, amplitude on the horizontal axis]

  8. Speech signal represented as a sequence of spectral vectors
      • Map each spectral amplitude to a grey-level value from 0 to 255, where 0 represents black and 255 represents white.
      • The higher the amplitude, the darker the corresponding region.
      [Figure: rotated spectrum, frequency (Hz) vs. amplitude]
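One way to implement the grey-level mapping described above, as a sketch. It assumes the amplitudes are first converted to dB and limited to a 60 dB dynamic range; both choices are assumptions, not something the slides specify. `to_grey_levels` is a hypothetical helper name.

```python
import numpy as np

def to_grey_levels(magnitudes, dynamic_range_db=60.0):
    """Map spectral magnitudes to 8-bit grey levels (0 = black, 255 = white).

    Louder bins become darker, following the slide's convention. The dB
    conversion and the dynamic-range floor are assumptions, not from the slides.
    """
    db = 20.0 * np.log10(np.maximum(magnitudes, 1e-12))          # amplitude -> dB
    db = np.clip(db, db.max() - dynamic_range_db, db.max())      # floor quiet bins
    span = max(db.max() - db.min(), 1e-12)                       # avoid divide-by-zero
    scaled = (db - db.min()) / span                              # 0 (quiet) .. 1 (loud)
    return np.round(255 * (1.0 - scaled)).astype(np.uint8)       # loud -> 0 (black)
```

For example, magnitudes of 1.0, 0.1 and 0.001 (0, -20 and -60 dB) map to grey levels 0, 85 and 255.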

  9. Speech signal represented as a sequence of spectral vectors
      [Figure: the grey-level spectra of successive frames placed side by side, frequency (Hz) vs. time]

  10. Speech signal represented as a sequence of spectral vectors
      The time vs. frequency representation of a speech signal is referred to as a spectrogram.
      [Figure: spectrogram, frequency (Hz) vs. time]
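Putting the previous steps together, here is a sketch of computing a spectrogram matrix with NumPy. `spectrogram` is a hypothetical helper; the 400-sample frame, 160-sample hop (25 ms and 10 ms at 16 kHz), Hann window and 512-point FFT are assumed values, not taken from the slides.

```python
import numpy as np

def spectrogram(signal, sample_rate, frame_len=400, hop_len=160, n_fft=512):
    """Return (S_db, times, freqs): a time-frequency matrix in dB.

    Rows of S_db are frequency bins (low to high), columns are frames (time).
    """
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop_len)
    frames = np.array([signal[s:s + frame_len] * window for s in starts])
    mags = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))      # one spectral vector per frame
    S_db = 20.0 * np.log10(mags.T + 1e-10)                   # "rotate": frequency on rows
    times = (np.array(list(starts)) + frame_len / 2) / sample_rate  # frame centres (s)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)            # bin centres (Hz)
    return S_db, times, freqs
```

Displaying `S_db` with the first row at the bottom (for example matplotlib's `imshow(S_db, origin="lower", aspect="auto", cmap="gray_r")`) gives the familiar picture; the reversed grey map `"gray_r"` draws high values dark, matching the convention on slide 8.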

  11. Some Real Spectrograms
      Dark regions indicate peaks (formants) in the spectrum.

  12. Why do we care about spectrograms?
      Phones and their properties are better observed in a spectrogram.

  13. Why do we care about spectrograms?
      Sounds can be identified much better by their formants and by the formant transitions.

  14. Why do we care about spectrograms?
      Sounds can be identified much better by their formants and by the formant transitions.
      Hidden Markov Models implicitly model these spectrograms to perform speech recognition.

  15. Usefulness of Spectrograms
      • A time-frequency representation of the speech signal.
      • A tool to study speech sounds (phones): phoneticians study phones and their properties visually.
      • Hidden Markov Models implicitly model spectrograms in speech-to-text systems.
      • Useful for evaluating text-to-speech systems: a high-quality text-to-speech system should produce synthesized speech whose spectrogram closely matches that of the corresponding natural sentence.

  16. Cepstral Analysis

  17. A Sample Speech Spectrum
      [Figure: speech spectrum, magnitude (dB) vs. frequency (Hz)]
      • Peaks denote dominant frequency components in the speech signal.
      • These peaks are referred to as formants.
      • Formants carry the identity of the sound.
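A naive peak picker for a dB spectrum, offered as a sketch: it reports every local maximum. On a raw speech spectrum this also catches the fine spectral details (for voiced speech, the harmonic ripples), not only the formants, which is one motivation for extracting the smooth envelope discussed next. `spectral_peaks` and its `min_height_db` threshold are hypothetical.

```python
import numpy as np

def spectral_peaks(spectrum_db, freqs_hz, min_height_db=None):
    """Return the frequencies (Hz) of strict local maxima in a dB spectrum."""
    s = np.asarray(spectrum_db)
    is_peak = (s[1:-1] > s[:-2]) & (s[1:-1] > s[2:])   # higher than both neighbours
    idx = np.nonzero(is_peak)[0] + 1                   # shift back to full-array indices
    if min_height_db is not None:
        idx = idx[s[idx] >= min_height_db]             # optionally drop weak peaks
    return np.asarray(freqs_hz)[idx]
```

On a synthetic smooth curve with bumps at 500, 1500 and 2500 Hz, it returns exactly those three frequencies.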

  18. What do we want to extract? The spectral envelope
      • The formants and a smooth curve connecting them.
      • This smooth curve is referred to as the spectral envelope.
      [Figure: speech spectrum, magnitude (dB) vs. frequency (Hz)]

  19. Spectral Envelope
      [Figure: the spectrum, its spectral envelope, and its spectral details]

  20. Spectral Envelope
      [Figure: the spectrum log X[k], the spectral envelope log H[k], and the spectral details log E[k]]

  21. Spectral Envelope
      log X[k] = log H[k] + log E[k]
      1. Our goal: separate the spectral envelope and the spectral details from the spectrum.
      2. That is, given log X[k], obtain log H[k] and log E[k] such that log X[k] = log H[k] + log E[k].
      [Figure: the spectrum log X[k], the spectral envelope log H[k], and the spectral details log E[k]]
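The additive form above follows from viewing the spectrum as the product of the envelope and the details (the usual source-filter view, which the slides assume implicitly), with X, H and E taken as magnitudes so that the logarithm is real:

```latex
X[k] = H[k]\,E[k]
\;\Longrightarrow\;
\log X[k] = \log\bigl(H[k]\,E[k]\bigr) = \log H[k] + \log E[k]
```

A linear operation such as the IFFT cannot split a product, but it can split a sum, which is why the separation is attempted in the log domain.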

  22. How can we achieve this separation?

  23. Play a Mathematical Trick
      • Trick: take the FFT of the spectrum!
      • An FFT applied to a spectrum is referred to as an inverse FFT (IFFT).
      • Note: we are dealing with the spectrum in the log domain (part of the trick: in the log domain, the envelope and the details are added rather than multiplied).
      • The IFFT of the log spectrum represents it along a pseudo-frequency axis.
      [Figure: the spectrum, the spectral envelope, and the spectral details]
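A minimal sketch of the trick in NumPy: take the FFT of a frame, the log of its magnitude, and then the IFFT. The result is the real cepstrum x[k]. `real_cepstrum` is a hypothetical name, and the small constant added before the log (to avoid log(0)) is an assumption.

```python
import numpy as np

def real_cepstrum(frame):
    """Cepstrum x[k] of one frame: IFFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-10)   # log domain: product becomes sum
    return np.real(np.fft.ifft(log_mag))         # values along the pseudo-frequency axis
```

For a real frame, log|X[k]| is real and symmetric, so the imaginary part of the IFFT is only rounding noise, which `np.real` discards.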

  24. Play a Mathematical Trick
      [Figure: the spectrum, the spectral envelope, and the spectral details; a pseudo-frequency axis]

  25. Play a Mathematical Trick
      [Figure: the spectrum, the spectral envelope, and the spectral details; a pseudo-frequency axis divided into a low-frequency region and a high-frequency region]

  26. Play a Mathematical Trick
      [Figure: the IFFT maps the spectrum onto a pseudo-frequency axis with a low-frequency region and a high-frequency region]

  27. Play a Mathematical Trick
      Treat the spectral envelope as a sine wave with 4 cycles per second (reading the frequency axis as if it were a time axis), and apply the IFFT.
      [Figure: the spectrum, the spectral envelope, and the spectral details; a pseudo-frequency axis with low- and high-frequency regions]

  28. Play a Mathematical Trick
      Treat the spectral envelope as a sine wave with 4 cycles per second: its IFFT gives a peak at 4 Hz on the pseudo-frequency axis, in the low-frequency region.
      [Figure: the spectrum, the spectral envelope, and the spectral details; a pseudo-frequency axis with low- and high-frequency regions]
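A quick numerical check of the 4-cycle example, treating a 256-point axis as one "second" (the axis length is an arbitrary choice). The IFFT magnitude peaks at index 4, in the low pseudo-frequency region.

```python
import numpy as np

n = 256                                   # points along the (frequency) axis
axis = np.arange(n) / n                   # treat the whole axis as one "second"
slow = np.cos(2 * np.pi * 4 * axis)       # 4 cycles: a smooth, envelope-like curve
mag = np.abs(np.fft.ifft(slow))
peak = int(np.argmax(mag[: n // 2]))      # inspect the non-mirrored half only
print(peak)  # 4
```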



  31. Play a Mathematical Trick
      Treat the spectral details as a sine wave with 100 cycles per second: its IFFT gives a peak at 100 Hz on the pseudo-frequency axis, in the high-frequency region.
      [Figure: the spectrum, the spectral envelope, and the spectral details; a pseudo-frequency axis with low- and high-frequency regions]
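Combining both examples: if the log spectrum is a slow 4-cycle component (envelope-like) plus a fast 100-cycle component (detail-like), the IFFT places them at separate points on the pseudo-frequency axis. The amplitudes 2.0 and 0.5 and the 512-point axis are illustrative assumptions.

```python
import numpy as np

n = 512
axis = np.arange(n) / n
# Hypothetical log spectrum: slow "envelope" plus fast "details".
log_spectrum = 2.0 * np.cos(2 * np.pi * 4 * axis) + 0.5 * np.cos(2 * np.pi * 100 * axis)
mag = np.abs(np.fft.ifft(log_spectrum))[: n // 2]   # non-mirrored half
peaks = sorted(int(i) for i in np.argsort(mag)[-2:])  # two largest components
print(peaks)  # [4, 100]
```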

  32. Play a Mathematical Trick
      [Figure: the IFFT applied to both the spectral envelope and the spectral details; a pseudo-frequency axis with low- and high-frequency regions]


  34. Play a Mathematical Trick
      log X[k] = log H[k] + log E[k]
      [Figure: the IFFT applied to the spectrum log X[k], the spectral envelope log H[k], and the spectral details log E[k]; a pseudo-frequency axis]

  35. Play a Mathematical Trick
      Applying the IFFT to log X[k] = log H[k] + log E[k] gives x[k] = h[k] + e[k].
      Because the IFFT is linear, the sum in the log-spectral domain carries over to the pseudo-frequency domain.
      [Figure: the spectrum log X[k], the spectral envelope log H[k], and the spectral details log E[k]; a pseudo-frequency axis]
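The linearity step can be checked numerically. Here random vectors stand in, hypothetically, for log H[k] and log E[k]:

```python
import numpy as np

rng = np.random.default_rng(0)
log_H = rng.normal(size=64)   # stand-in for the envelope part (hypothetical data)
log_E = rng.normal(size=64)   # stand-in for the detail part (hypothetical data)
lhs = np.fft.ifft(log_H + log_E)                  # x[k]
rhs = np.fft.ifft(log_H) + np.fft.ifft(log_E)     # h[k] + e[k]
print(np.allclose(lhs, rhs))  # True
```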

  36. Play a Mathematical Trick
      x[k] = h[k] + e[k]        log X[k] = log H[k] + log E[k]
      In practice, all you have access to is log X[k]; from it, via the IFFT, you can obtain x[k].

  37. Play a Mathematical Trick
      x[k] = h[k] + e[k]        log X[k] = log H[k] + log E[k]
      Once you know x[k], keep only its low pseudo-frequency region (a low-pass filter) to get h[k].
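A sketch of this low-pass step, often called liftering: zero every cepstral coefficient outside the low pseudo-frequency region (keeping the mirrored coefficients too, so the result stays real), then transform back to obtain a smooth log H[k]. `spectral_envelope` and the cutoff `n_keep=20` are hypothetical choices.

```python
import numpy as np

def spectral_envelope(frame, n_keep=20):
    """Smooth log-magnitude envelope log H[k] via cepstral liftering."""
    n = len(frame)
    log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-10)   # log X[k]
    x = np.real(np.fft.ifft(log_mag))                      # cepstrum x[k]
    lifter = np.zeros(n)
    lifter[:n_keep] = 1.0                                  # keep low pseudo-frequencies ...
    lifter[n - n_keep + 1:] = 1.0                          # ... and their mirror images
    h = x * lifter                                         # h[k]: envelope part
    return np.real(np.fft.fft(h))                          # back to log H[k]
```

A smaller `n_keep` gives a smoother envelope; keeping every coefficient returns log X[k] unchanged.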

  38. Play a Mathematical Trick
      x[k] = h[k] + e[k]        log X[k] = log H[k] + log E[k]
      • x[k] is referred to as the cepstrum.
      • h[k] is obtained by keeping the low pseudo-frequency region of x[k].
      • h[k] represents the spectral envelope and is widely used as a feature for speech recognition.
      [Figure: the IFFT maps the spectrum, the spectral envelope log H[k], and the spectral details log E[k] onto a pseudo-frequency axis]
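As a sketch of how h[k] becomes a feature vector, each frame below is reduced to its first `n_ceps` cepstral coefficients. This is a plain cepstrum, not the mel-frequency cepstral coefficients (MFCCs) listed among the topics, which add a mel filterbank and typically a DCT; `cepstral_features`, the frame sizes and `n_ceps=13` are assumptions.

```python
import numpy as np

def cepstral_features(signal, frame_len=400, hop_len=160, n_ceps=13):
    """Plain (non-mel) cepstral features, shape (num_frames, n_ceps).

    Each row holds the first n_ceps real-cepstrum coefficients of one
    Hamming-windowed frame, i.e. the low pseudo-frequency part h[k].
    """
    window = np.hamming(frame_len)
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frame = signal[start:start + frame_len] * window
        log_mag = np.log(np.abs(np.fft.rfft(frame)) + 1e-10)  # log X[k], half spectrum
        ceps = np.fft.irfft(log_mag, n=frame_len)              # real cepstrum x[k]
        feats.append(ceps[:n_ceps])                            # keep the low region
    return np.array(feats)
```

A side effect worth noticing: scaling the signal by a constant only shifts the coefficient at index 0, since a gain adds the same constant to every bin of the log spectrum.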
