
Description of yourself, your team/lab, your topic area, and who funds it



  1. Description of yourself, your team/lab, your topic area, and who funds it
     • Director of the Center for Language and Speech Processing
       - more than 15 faculty members in language, speech, machine translation, machine learning, cognitive sciences and neurosciences
       - more than 40 graduate students
       - usual funding sources
     • Collaborations with CoE HLT
     • Three-student team working directly with me
       - acoustic processing for ASR
     • Techniques based on temporal cues in the signal and on artificial neural net post-processing
     • Biology-inspired auditory processing
       - funded by IARPA, DARPA and Google

  2. How does your area impact current speech technology (if at all) right now
     • Temporal features (perception of modulations)
       - longer (syllable and beyond) temporal context
       - RASTA, LDA filters, TRAPS, MRASTA, modulation spectrum, ...
     • Data-guided features
       - LDA, convolutive DNNs, ...
     • Parallel processing streams (physiology of hearing)
       - different frequency ranges, different spectro-temporal properties, different expertise (training), different degrees of prior constraints, ...
     • Hierarchical processing (deep learning?)
       - frequency-localized to full spectrum, short context to longer context, ...
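As an illustration of the temporal-feature idea on slide 2, here is a minimal sketch of RASTA-style band-pass filtering applied to one log-spectral trajectory (one value per frame). The numerator taps and the pole value of 0.98 follow the commonly quoted form of the RASTA filter, but treat them as assumptions here: implementations differ (0.94 is also used for the pole), and the function name `rasta_filter` and the 100 frames/s rate are chosen only for this example.

```python
import math


def rasta_filter(trajectory, pole=0.98):
    """Band-pass filter one log-spectral trajectory (one value per frame).

    Assumed coefficients: a 5-tap differentiator followed by a one-pole
    integrator. This is a causal sketch, so the output is delayed
    relative to the input.
    """
    # The taps sum to zero, so a constant offset in the log domain
    # (e.g. a fixed channel) is suppressed once the transient has decayed.
    numer = [0.2, 0.1, 0.0, -0.1, -0.2]
    out = []
    prev_y = 0.0
    for n in range(len(trajectory)):
        acc = 0.0
        for k, b in enumerate(numer):
            if n - k >= 0:
                acc += b * trajectory[n - k]
        prev_y = acc + pole * prev_y
        out.append(prev_y)
    return out


# Demo: a 4 Hz modulation (roughly syllabic rate) riding on a constant offset.
frames_per_second = 100  # assumed frame rate
trajectory = [
    5.0 + math.sin(2 * math.pi * 4 * n / frames_per_second)
    for n in range(2000)
]
filtered = rasta_filter(trajectory)
tail = filtered[-100:]  # four full periods, after the transient
print("peak of filtered modulation:", max(abs(v) for v in tail))
print("residual offset:", sum(tail) / len(tail))
```

In this sketch the slow modulation passes through while the constant offset is removed, which is the behaviour the "perception of modulations" bullet refers to.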

  3. Challenges
     • Human-like processing is not always appreciated by hard-core engineers
     • Communication between engineering and life sciences
       - different goals, different vocabularies, different reward systems, ...
     • Researchers trained in both the life sciences and engineering are rare

  4. What Is The Problem?
     • ML (DNNs)
       - train over all sources of unwanted variability
     • How to deal with previously unseen data?
     • Knowledge from life sciences?
       - Emphasis on higher processing levels (beyond periphery)
     • Hierarchical processing in auditory system
     • Generalizations
     • Performance monitoring
     • Attention (what to ignore)

  5. Dealing with Unknown Unknowns: Biologically-inspired multi-stream processing of sensory information
     • Two questions: (1) how to create processing streams? (2) "smart" fusion?
     • Diagram: signal -> processing streams -> "smart" fusion -> information
     • Figure labels: periphery (~100K neurons), cortex (~10M neurons); rates of ~10 Hz and ~1000 Hz
     • How do we know which combination of processing streams yields "correct" information?
     • Preserving information in a system? The information must "make sense"
       - prior knowledge (learning): typical sound occurrences, typical confusions, and typical temporal patterns of speech sounds
     • Bottom-up streams (bottom-up dominated)
       - modalities, projections, streams within modalities
       - conflicts indicate localized corruptions: leave out affected streams
     • Top-down and bottom-up streams (top-down influenced)
       - priors: different strengths of prior constraints (weak to strong)
       - conflicts indicate unexpected inputs: opportunity for learning
     • Results by environment:

       | environment     | conventional | proposed | best by hand |
       |-----------------|--------------|----------|--------------|
       | clean           | 31 %         | 28 %     | 25 %         |
       | car at 0 dB SNR | 54 %         | 38 %     | 35 %         |

     • [Figure: phoneme sequences for the words "five", "three", "zero" under weak and strong prior constraints; divergence plotted over time]
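The "conflicts indicate localized corruptions: leave out affected streams" idea on slide 5 can be sketched as follows. This is a hypothetical illustration, not the fusion method used in the talk: it assumes each stream emits a posterior vector over the same classes, measures conflict with a symmetric KL divergence, drops the most conflicting stream while its conflict exceeds a threshold, and averages the rest. The names `sym_kl` and `fuse_streams`, the threshold of 1.0, and the example posteriors are all invented for this example.

```python
import math


def sym_kl(p, q, eps=1e-12):
    """Symmetric KL divergence between two probability vectors."""
    return sum(
        (pi - qi) * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
    )


def fuse_streams(posteriors, conflict_threshold=1.0):
    """Average the stream posteriors after leaving out conflicting streams.

    Hypothetical rule: repeatedly drop the stream with the highest mean
    divergence from the other kept streams, as long as that value exceeds
    `conflict_threshold` and more than two streams remain.
    Returns (fused_posterior, kept_stream_indices).
    """
    kept = list(range(len(posteriors)))
    while len(kept) > 2:
        conflict = {}
        for i in kept:
            others = [sym_kl(posteriors[i], posteriors[j]) for j in kept if j != i]
            conflict[i] = sum(others) / len(others)
        worst = max(kept, key=lambda i: conflict[i])
        if conflict[worst] <= conflict_threshold:
            break
        kept.remove(worst)

    n_classes = len(posteriors[0])
    fused = [
        sum(posteriors[i][c] for i in kept) / len(kept)
        for c in range(n_classes)
    ]
    total = sum(fused)
    return [v / total for v in fused], kept


# Demo: two streams agree on class 0, a third (corrupted) stream disagrees.
streams = [
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.05, 0.05, 0.90],
]
fused, kept = fuse_streams(streams)
print("kept streams:", kept)
print("fused posterior:", fused)
```

The design choice worth noting is that the decision uses only agreement among streams, with no reference labels, which is what makes it applicable to previously unseen conditions.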

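  6. TRAINING
     • Separate DNNs trained on clean, 10 dB SNR, and 5 dB SNR data: "clean" DNN, "10 dB" DNN, "5 dB" DNN
     • Diagram: signal -> DNN -> decoder
     • Word error rates, Aurora 4 (rows: training condition; columns: test condition)

       | Training / Test          | Clean | 10 dB SNR | 5 dB SNR |
       |--------------------------|-------|-----------|----------|
       | Clean                    | 3.10  | 15.65     | 36.60    |
       | 10 dB SNR                | 5.06  | 4.35      | 14.70    |
       | 5 dB SNR                 | 9.04  | 4.73      | 7.73     |
       | multi-condition training | 4.28  | 5.17      | 11.86    |
       | multi-band               | 3.06  | 3.12      | 10.29    |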
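To make the Aurora 4 comparison on slide 6 easier to read, the short sketch below computes the relative word-error-rate reduction of the multi-band system over multi-condition training for each test condition. The numbers are copied from the slide; the helper name `relative_reduction` and the dictionary layout are assumptions made for this example.

```python
# Word error rates from slide 6, keyed by system, in test-condition order.
test_conditions = ["Clean", "10 dB SNR", "5 dB SNR"]
wer = {
    "multi-condition training": [4.28, 5.17, 11.86],
    "multi-band": [3.06, 3.12, 10.29],
}


def relative_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


reductions = {
    cond: relative_reduction(base, new)
    for cond, base, new in zip(
        test_conditions, wer["multi-condition training"], wer["multi-band"]
    )
}

for cond in test_conditions:
    print(f"{cond}: {reductions[cond]:.1f} % relative WER reduction")
```

The same helper can be applied to any other pair of rows, for example a mismatched single-condition DNN against the matched one.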

  7. Where Are We Now?
     • Signal processing, information theory, machine learning, ...
     • Diagram: signal processing -> pattern classification -> decoder -> message

  8. And Where Are We Heading?
     • Repetition, fillers, hesitations, interruptions, unfinished and non-grammatical sentences, new words, dialects, emotions, ...
     • Current DARPA and IARPA programs, research agenda of the JHU CoE HLT, industrial efforts (Google, Microsoft, IBM, Amazon, ...)
     • Signal processing, information theory, machine learning, ... & neural information processing, psychophysics, physiology, cognitive science, phonetics and linguistics, ...
     • Engineering and Life Sciences together!
