Speech Processing 11-492/18-495

  • Speech Processing 11-492/18-495: Sound ID

  • What is in the audio scene
      • Searching for specific things
      • Cars, talking, music
      • Generically analyzing the audio
      • Find the “important” parts

  • Items and Sequences
      • Seq1
      • Seq2

  • Human judgment is non-standard
      • Context matters
      • Context includes a priori knowledge not represented in recording
      • Two descriptions of a scene from a movie
      • “There were a series of beeps, and a bomb went off”
      • “A timer counted down, and then there was a big boom”

  • A Hierarchical Structure for Sound
      • Audio data
      • Lower-level units
      • Event sequence
      • Event dependencies

  • Audio Unit Detection
      • Low-level acoustic units
      • Similar mcep (mel-cepstral coefficients) over time
      • Find repeated segments over time
      • Find repeated patterns over time
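
The audio unit detection idea above can be illustrated with a minimal sketch. This is not the course's actual method. It assumes each frame is already a feature vector (for example mcep values), that "similar" means a Euclidean distance below a hand-picked threshold, and that a simple greedy assignment stands in for whatever clustering is really used. The function names (`segment`, `label_units`, `repeated_patterns`) and the toy 2-D frames are hypothetical.

```python
import math
from collections import Counter


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(frames):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def segment(frames, threshold):
    """Group consecutive frames that stay close to the running segment mean."""
    if not frames:
        return []
    segments = []
    current = [frames[0]]
    for frame in frames[1:]:
        if distance(frame, mean(current)) <= threshold:
            current.append(frame)
        else:
            segments.append(current)
            current = [frame]
    segments.append(current)
    return segments


def label_units(segments, threshold):
    """Greedily give each segment the label of the first close-enough unit.

    A segment whose mean is not within `threshold` of any known unit
    centroid starts a new unit. (Assumed stand-in for real clustering.)
    """
    centroids = []
    labels = []
    for seg in segments:
        m = mean(seg)
        for idx, c in enumerate(centroids):
            if distance(m, c) <= threshold:
                labels.append(idx)
                break
        else:
            centroids.append(m)
            labels.append(len(centroids) - 1)
    return labels


def repeated_patterns(labels, n=2):
    """Return the unit n-grams that occur more than once, with their counts."""
    counts = Counter(tuple(labels[i:i + n]) for i in range(len(labels) - n + 1))
    return {pattern: c for pattern, c in counts.items() if c > 1}


# Toy data: two alternating "sounds" in a made-up 2-D feature space.
frames = [
    [0.0, 0.0], [0.1, 0.0],
    [5.0, 5.0], [5.1, 4.9],
    [0.0, 0.1], [0.05, 0.0],
    [5.0, 5.1], [4.9, 5.0],
]
segments = segment(frames, threshold=1.0)
labels = label_units(segments, threshold=1.0)
print(labels)
print(repeated_patterns(labels, n=2))
```

On the toy data the unit sequence alternates between two labels, so the same bigram of units shows up twice. Real recordings would need proper feature extraction and a more robust similarity measure than a fixed distance threshold.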