Josh McDermott, Dept. of Brain and Cognitive Sciences, MIT


  1. Josh McDermott, Dept. of Brain and Cognitive Sciences, MIT. May 6, 2015. NSF Speech Technology Workshop.

  2. My research group: Laboratory for Computational Audition. [Diagram: Psychology (experiments in humans), Neuroscience (auditory neuroscience), Engineering (machine algorithms)] • We study auditory scene analysis and sound recognition • Contact with speech technology through assistive devices and machine intelligence • Funded by McDonnell Foundation and NSF

  3. Recent approach in our lab: train deep convolutional neural networks on speech tasks, compare representations to brain • So far: word recognition, speaker identification in noise • CNN performs about as well as humans • Can use CNN as a hypothesis about neural representation

  4. Ability of shallow vs. deep CNN layers to predict brain responses provides insights into computational complexity. [Figure: primary auditory cortex and speech-selective cortex, plotted against CNN layer (shallow to deep)]
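One way to make the layer-wise comparison on slide 4 concrete is sketched below. The slides do not say how brain responses were predicted from CNN layers, so the ridge regression, the held-out correlation score, and every name here (`fit_ridge`, `heldout_score`, the synthetic `shallow`, `deep`, and `voxel` data) are assumptions for illustration only. The synthetic "voxel" is built from the "deep" features, so the deep layer should predict it better by construction.

```python
import random


def solve_linear_system(matrix, vector):
    """Solve matrix @ x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_ridge(features, responses, penalty):
    """Ridge weights from the normal equations (X^T X + penalty * I) w = X^T y."""
    d = len(features[0])
    gram = [[sum(row[i] * row[j] for row in features) + (penalty if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * y for row, y in zip(features, responses)) for i in range(d)]
    return solve_linear_system(gram, xty)


def predict(features, weights):
    return [sum(f * w for f, w in zip(row, weights)) for row in features]


def correlation(u, v):
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def heldout_score(features, responses, n_train, penalty=1.0):
    """Fit on the first n_train sounds, score by correlation on the rest."""
    weights = fit_ridge(features[:n_train], responses[:n_train], penalty)
    return correlation(predict(features[n_train:], weights), responses[n_train:])


# Synthetic stand-ins (hypothetical): 120 sounds, 4 units per layer.
rng = random.Random(0)
shallow = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(120)]
deep = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(120)]
# The fake voxel response depends only on the deep-layer features plus noise.
voxel = [sum(row) + 0.1 * rng.gauss(0, 1) for row in deep]

shallow_score = heldout_score(shallow, voxel, n_train=80)
deep_score = heldout_score(deep, voxel, n_train=80)
print("shallow layer:", round(shallow_score, 3), "deep layer:", round(deep_score, 3))
```

With real data, the same score would be computed per layer and per cortical region, and the layer at which prediction peaks could then be compared across regions.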

  5. Using speech analysis/synthesis to manipulate grouping cues: • STRAIGHT decomposes speech into excitation and filtering • Excitation modeled sinusoidally • Altered to inharmonic, or replaced with noise to simulate whispering • Do these manipulations affect ability to segregate speech? Joint work with Kawahara & Ellis.
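The three excitation conditions on slide 5 can be illustrated with a toy sketch. This is not STRAIGHT and applies no spectral-envelope filtering; it only builds a harmonic excitation, an inharmonic one with jittered partials, and a noise excitation. The function names, the jitter range of 30% of f0, and the choice to leave the fundamental unjittered are assumptions, not details from the slides.

```python
import math
import random


def harmonic_frequencies(f0, n_partials):
    """Partials at integer multiples of the fundamental f0 (Hz)."""
    return [f0 * k for k in range(1, n_partials + 1)]


def jittered_frequencies(f0, n_partials, max_jitter=0.3, seed=0):
    """Inharmonic partials: each one above the fundamental is shifted by a
    random amount of up to max_jitter * f0 (an assumed jitter range)."""
    rng = random.Random(seed)
    freqs = [f0]
    for k in range(2, n_partials + 1):
        freqs.append(f0 * (k + rng.uniform(-max_jitter, max_jitter)))
    return freqs


def synthesize(freqs, duration, sample_rate):
    """Sum of equal-amplitude sinusoids, normalized by the number of partials."""
    n_samples = int(duration * sample_rate)
    return [
        sum(math.sin(2 * math.pi * f * t / sample_rate) for f in freqs) / len(freqs)
        for t in range(n_samples)
    ]


def noise_excitation(duration, sample_rate, seed=0):
    """Gaussian noise excitation, a crude stand-in for whispered speech."""
    rng = random.Random(seed)
    return [rng.gauss(0, 1) for _ in range(int(duration * sample_rate))]


harmonic = synthesize(harmonic_frequencies(100.0, 10), 0.05, 8000)
inharmonic = synthesize(jittered_frequencies(100.0, 10), 0.05, 8000)
whispered = noise_excitation(0.05, 8000)
```

In a full analysis/synthesis system, each excitation would then be passed through the time-varying filter estimated from the original speech.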

  6. Task: Type in all the words you hear. Stimuli: a single word (“WORD”) or a word pair (“WORD 1” + “WORD 2”). • Single word recognition similar for all conditions. • For word pairs, recognition worse for inharmonic than harmonic speech, suggestive of effect on segregation. • But much larger effect of whispering. • Potentially suggestive of importance of sparsity. [Figure: Mean # Correct Words (0 to 0.9) for Single Word and Word Pairs; conditions: Harmonic, Jittered, Whispered]

  7. Reverberation profoundly distorts sound signals. [Figure: Dry vs. Reverberant] Problem for machine speech recognition. [Figure: Percent Errors] Reverberation is also a challenge for hearing-impaired listeners.

  8. Characterizing the distribution of real-world reverberation. What is the empirical distribution of environmental impulse responses (IRs)? IR Measurement: • Broadcast fixed source signal • Record resulting reverberant signal • From this, infer environmental IR. IR Survey: • 24 text messages/day • Phone returns GPS coordinates • Participants reply to text with photo, address
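The three IR measurement steps on slide 8 (broadcast, record, infer) can be sketched in a noise-free toy setting. The slide only says "fixed source signal"; the Golay complementary pair used here is an assumed choice, picked because the autocorrelations of the two codes sum to a perfect impulse, so cross-correlating each recording with its code recovers the IR exactly. All function names are hypothetical, and a real measurement would also have to handle background noise and the loudspeaker and microphone responses.

```python
def golay_pair(order):
    """Build a Golay complementary pair of length 2**order."""
    a, b = [1.0], [1.0]
    for _ in range(order):
        a, b = a + b, a + [-x for x in b]
    return a, b


def convolve(signal, impulse_response):
    """Linear convolution; stands in for the room acting on the broadcast."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def cross_correlate(recording, code, n_lags):
    """Correlate the recording with the known code at lags 0..n_lags-1."""
    return [
        sum(recording[n + lag] * code[n]
            for n in range(len(code)) if n + lag < len(recording))
        for lag in range(n_lags)
    ]


def estimate_ir(recording_a, recording_b, code_a, code_b, ir_length):
    """Sum the two cross-correlations; the complementary property cancels
    all sidelobes, leaving 2 * N times the impulse response."""
    corr_a = cross_correlate(recording_a, code_a, ir_length)
    corr_b = cross_correlate(recording_b, code_b, ir_length)
    scale = 2 * len(code_a)
    return [(u + v) / scale for u, v in zip(corr_a, corr_b)]


# Toy "room" (made-up values): direct sound plus two reflections.
true_ir = [1.0, 0.0, 0.5, -0.25, 0.125]
code_a, code_b = golay_pair(6)            # step 1: the fixed source signals
recording_a = convolve(code_a, true_ir)   # step 2: simulated recordings
recording_b = convolve(code_b, true_ir)
estimated_ir = estimate_ir(recording_a, recording_b, code_a, code_b, len(true_ir))
print(estimated_ir)                       # step 3: inferred IR
```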

  9. Everyday impulse responses are pretty stereotyped: • Exponential decay • Faster at high frequencies • Exaggerated asymmetry in large rooms • Suggests prior for dereverberation … 271 IRs from 301 surveyed locations. [Figure: Frequency asymmetry (skew of subband RT60) vs. Mean subband RT60 (s), log axis; labels: Survey, KEMAR HATS, 8m, 1st quartile, 4th quartile]
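Since slide 9 describes subband decay as exponential and summarizes it by RT60 (the time for the level to fall by 60 dB), a minimal worked example may help. An exponentially decaying envelope is a straight line in dB, so RT60 follows from the fitted slope as -60 / slope. The function names and the synthetic envelope are assumptions for illustration; the slides do not describe how RT60 was actually estimated.

```python
import math


def decay_envelope(rt60, duration, sample_rate):
    """Amplitude envelope that falls by 60 dB (a factor of 1000) in rt60 seconds."""
    n_samples = int(duration * sample_rate)
    return [10.0 ** (-3.0 * (i / sample_rate) / rt60) for i in range(n_samples)]


def estimate_rt60(envelope, sample_rate):
    """Least-squares line fit to the envelope level in dB; RT60 = -60 / slope."""
    times = [i / sample_rate for i in range(len(envelope))]
    levels_db = [20.0 * math.log10(v) for v in envelope]
    mean_t = sum(times) / len(times)
    mean_l = sum(levels_db) / len(levels_db)
    slope = (
        sum((t - mean_t) * (l - mean_l) for t, l in zip(times, levels_db))
        / sum((t - mean_t) ** 2 for t in times)
    )
    return -60.0 / slope  # slope is in dB per second and negative for a decay


envelope = decay_envelope(rt60=0.5, duration=0.4, sample_rate=1000)
rt60_estimate = estimate_rt60(envelope, sample_rate=1000)
```

Applying the same fit to each frequency subband of a measured IR would give the per-band RT60 values whose frequency dependence the slide describes.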

  10. Challenges to Impacting Technology • Lack of large high-quality labeled data sets in some domains (emotional speech, environmental sounds) • Cultural divides between neuroscience and engineering (different meetings, departments, jargon, funders; possibly getting worse?) • Workshops help, particularly if students have access
