
Speech Processing 11-492/18-495 - PowerPoint PPT Presentation

Speech Processing 11-492/18-495: Sound ID. What is in the audio scene; searching for specific things; cars, talking, music.


  1. Speech Processing 11-492/18-495: Sound ID

  2. What is in the audio scene: Searching for specific things; Cars, talking, music; Generically analyzing the audio; Find the “important” parts

  3. Items and Sequences: Seq1; Seq2

  4. Human judgment is non-standard: Context matters; Context includes a priori knowledge not represented in recording; Two descriptions of a scene from a movie: “There were a series of beeps, and a bomb went off” and “A timer counted down, and then there was a big boom”

  5. A Hierarchical Structure for Sound: Audio data; Lower-level units; Event sequence; Event dependencies

  6. Audio Unit Detection: Low level acoustic units; Similar mcep over time; Find repeated segments over time; Find repeated patterns over time
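
The audio unit detection steps on slide 6 can be illustrated with a minimal sketch. This is not the method used in the course. It assumes each frame is already a small feature vector (standing in for mcep features), uses Euclidean distance with a hand-picked threshold as the similarity test, and greedily labels segments by their mean vector. All function names (`segment_frames`, `label_segments`, `repeated_patterns`) and the threshold are hypothetical and chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def segment_frames(frames, threshold):
    """Split frames into (start, end) runs; a new run starts whenever a
    frame is farther than `threshold` from the frame just before it."""
    if not frames:
        return []
    segments = []
    start = 0
    for i in range(1, len(frames)):
        if euclidean(frames[i - 1], frames[i]) > threshold:
            segments.append((start, i))
            start = i
    segments.append((start, len(frames)))
    return segments


def mean_vector(frames):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def label_segments(frames, segments, threshold):
    """Greedy labeling (an assumption, not a clustering algorithm from the
    slides): a segment reuses the label of the first earlier prototype whose
    mean is within `threshold`, otherwise it becomes a new prototype."""
    prototypes = []
    labels = []
    for start, end in segments:
        mean = mean_vector(frames[start:end])
        for label, proto in enumerate(prototypes):
            if euclidean(mean, proto) <= threshold:
                labels.append(label)
                break
        else:
            prototypes.append(mean)
            labels.append(len(prototypes) - 1)
    return labels


def repeated_patterns(labels, n=2):
    """Count label n-grams and keep those that occur more than once."""
    counts = {}
    for i in range(len(labels) - n + 1):
        key = tuple(labels[i:i + n])
        counts[key] = counts.get(key, 0) + 1
    return {k: c for k, c in counts.items() if c > 1}


# Toy two-dimensional "features": two alternating sound types.
frames = [
    [0.0, 0.0], [0.1, 0.0],
    [5.0, 5.0], [5.1, 5.0],
    [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.1], [5.0, 5.0],
]
segments = segment_frames(frames, threshold=1.0)
labels = label_segments(frames, segments, threshold=1.0)
patterns = repeated_patterns(labels, n=2)
```

On this toy input the frames fall into four segments, the first and third share a unit label (a repeated segment), and the label pair for "unit 0 followed by unit 1" occurs twice (a repeated pattern). Real features would need a proper front end and a more robust similarity measure.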
