Speech Processing 11-492/18-495
Sound ID
What is in the audio scene?
Searching for specific things: cars, talking, music
Generically analyzing the audio: finding the “important” parts
Items and Sequences
[Figure: two example sequences, labeled Seq1 and Seq2]
Human judgment is non-standard
Context matters
Context includes a priori knowledge not represented in the recording
Two descriptions of a scene from a movie:
“There were a series of beeps, and a bomb went off”
“A timer counted down, and then there was a big boom”
A Hierarchical Structure for Sound
[Figure: audio data → lower-level units → event sequence → event dependencies]
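As a minimal sketch of how the upper layers of this hierarchy could be represented, assume each audio frame has already been assigned a lower-level unit label (how those labels are obtained is not shown). Runs of identical labels are collapsed into an event sequence, and event dependencies are approximated here by counting which event follows which. The function names and the first-order transition counting are illustrative assumptions, not the method from these slides.

```python
from collections import Counter
from itertools import groupby


def units_to_events(unit_labels):
    """Collapse runs of identical frame-level unit labels into events.

    Returns a list of (label, start_frame, end_frame) tuples, end exclusive.
    """
    events = []
    start = 0
    for label, run in groupby(unit_labels):
        length = len(list(run))
        events.append((label, start, start + length))
        start += length
    return events


def event_dependencies(events):
    """Count first-order transitions (which event label follows which)."""
    labels = [label for label, _, _ in events]
    return Counter(zip(labels, labels[1:]))


# Hypothetical frame labels loosely following the "beeps then boom" scene.
units = ["beep", "beep", "sil", "beep", "beep", "sil", "boom", "boom"]
events = units_to_events(units)
deps = event_dependencies(events)
```

Here `events` is `[("beep", 0, 2), ("sil", 2, 3), ("beep", 3, 5), ("sil", 5, 6), ("boom", 6, 8)]`, and `deps` records that "beep" is followed by "sil" twice. A richer model of dependencies (longer contexts, timing) would replace the simple bigram count.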
Audio Unit Detection
Low-level acoustic units: stretches with similar MCEP (mel-cepstral) features over time
Find repeated segments over time
Find repeated patterns over time
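One way to make the first two steps concrete is sketched below, under several assumptions: the MCEP vectors are already extracted (one list of floats per frame), similarity is plain Euclidean distance, and both thresholds are hypothetical values to be tuned. Consecutive frames are merged into a segment while they stay similar; segments are then grouped into repeated units by a simple greedy prototype clustering on segment means, a swapped-in technique chosen for brevity rather than the approach used in the course.

```python
import math


def segment_frames(frames, threshold):
    """Split a frame sequence wherever consecutive frames differ by more than threshold.

    Returns (start, end) index pairs, end exclusive.
    """
    if not frames:
        return []
    segments = []
    start = 0
    for i in range(1, len(frames)):
        if math.dist(frames[i - 1], frames[i]) > threshold:
            segments.append((start, i))
            start = i
    segments.append((start, len(frames)))
    return segments


def segment_mean(frames, start, end):
    """Average feature vector over frames[start:end]."""
    n = end - start
    dim = len(frames[start])
    return [sum(frames[t][d] for t in range(start, end)) / n for d in range(dim)]


def label_repeated_segments(frames, segments, threshold):
    """Greedily assign each segment to the first unit whose prototype is within threshold.

    Segments that match no existing prototype start a new unit.
    Returns one integer unit label per segment.
    """
    prototypes = []
    labels = []
    for start, end in segments:
        mean = segment_mean(frames, start, end)
        for unit_id, proto in enumerate(prototypes):
            if math.dist(mean, proto) <= threshold:
                labels.append(unit_id)
                break
        else:
            prototypes.append(mean)
            labels.append(len(prototypes) - 1)
    return labels


# Toy 2-D "MCEP" frames: sound A, then sound B, then sound A again.
frames = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1], [0.1, 0.1]]
segments = segment_frames(frames, threshold=1.0)
labels = label_repeated_segments(frames, segments, threshold=1.0)
```

On this toy input the segments are `[(0, 2), (2, 4), (4, 6)]` and the unit labels are `[0, 1, 0]`, so the first and last segments are detected as the same repeated unit. Finding repeated patterns would then operate on the resulting label sequence rather than on raw frames. `math.dist` requires Python 3.8 or later.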