ESAT/PSI-Speech
Hugo Van hamme
NSF workshop, May 2015
4 July 2012

Leuven: KU Leuven introduction

Leuven - university
• Associatie K.U.Leuven
  – University + 12 university colleges
  – 85000 students, 600 programs
• KU Leuven
  – 35000 students, 350 programs
• Faculty of Engineering
  – 2000 students (Ba+Ma), 60 programs
• Department of Electrical Engineering (ESAT)
  – 150 Ma students, 6 programs
  – 270 PhD students and postdocs
  – …
  – 35 FTE permanent staff
• Centre for Processing of Speech and Images (PSI)
  – 37 PhD students and postdocs
  – 8 FTE permanent staff
• Speech research group
  – 12 PhD students and 1.4 postdocs
  – 5 Master students
  – 2.3 FTE permanent staff: Patrick Wambacq, Dirk Van Compernolle, Hugo Van hamme

ESAT/PSI-Speech research areas
• noise robustness
  – speech enhancement
  – source separation
  – source localization
• new paradigms for speech recognition
  – episodic models
• build and consolidate digital infrastructure for the Dutch language
• speaker properties (text-independent): ID, language, dialect, age, height
• acoustic environment modeling
  – ADL recognition
• zero-resource ASR
  – language acquisition by machines
• speech assessment
  – education

Speech assessment
• Reading tutor (dyslexia) / trainer after CI fitting
• Assess native (?) pronunciation, reading/respeak tracking
• Children’s speech, hesitant, poorly articulated

Zero-resource speech recognition
Why?
• Assistive technologies:
  – people with limited fine motor control
  – alternative to scanning
  – cope with dysarthric voices
• Huge inter-speaker variation
• Timing, extraneous sounds
• Dialects
• Long-term: interacting with robots
  – “Fetch a Hoegaarden Grand Cru from the fridge”
  – “Get my red slippers”
  – “Open the garden window for me”

What’s different?
• Learn acoustic model and language model from examples with noisy, high-level supervision information
  – Not like traditional ASR
  – Not like the zero-resource challenge (IS15)
• Our first steps:
  – Home automation
  – “open the kitchen door”, “kitchen door open”
  – Learn from demonstrations = weak supervision
• Learn acoustic model, vocabulary and grammar (ASR)
• Learn mapping to semantic frames (NLU)
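As a rough illustration of what "mapping to semantic frames" could mean for the home-automation commands above, the sketch below pairs both word orders with one and the same frame. The `SemanticFrame` fields, the hand-written lexicons and the `to_frame` function are assumptions made for this example; the slides describe learning such associations from demonstrations, which this toy lookup does not do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticFrame:
    """Hypothetical frame: an action applied to an object at a location."""
    action: str
    obj: str
    location: str


# Hypothetical hand-written lexicons. In the setting described on the slide
# these word-to-slot associations would be learned, not specified by hand.
ACTIONS = {"open", "close"}
OBJECTS = {"door", "window"}
LOCATIONS = {"kitchen", "garden"}


def _pick(words: set, lexicon: set, slot: str) -> str:
    """Return the single lexicon word present in the utterance."""
    hits = words & lexicon
    if len(hits) != 1:
        raise ValueError(f"expected exactly one {slot}, found {sorted(hits)}")
    return next(iter(hits))


def to_frame(utterance: str) -> SemanticFrame:
    """Map an utterance to a frame, ignoring word order (bag of words)."""
    words = set(utterance.lower().split())
    return SemanticFrame(
        action=_pick(words, ACTIONS, "action"),
        obj=_pick(words, OBJECTS, "object"),
        location=_pick(words, LOCATIONS, "location"),
    )


# Weak supervision: each demonstration labels the utterance as a whole with
# the frame of the demonstrated action; no word-level alignment is given.
demonstrations = [
    ("open the kitchen door", SemanticFrame("open", "door", "kitchen")),
    ("kitchen door open", SemanticFrame("open", "door", "kitchen")),
]

for utterance, frame in demonstrations:
    assert to_frame(utterance) == frame
```

Because the lookup ignores word order, “open the kitchen door” and “kitchen door open” land on the same frame, which is the property the two example commands on the slide point at.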

VIVOCA results

Work ahead
• Larger vocabularies
  – How does a word spurt come about?
• Faster learning
  – Ideally from one example
• More complex instructions and semantic representations
  – Continuous state space
  – Dynamic representation of semantics
  – Uncertainty in meaning
  – ...
  – Related to many current research topics in robotics

What’s needed?
Speech assessment
• Investment in non-native and regional accent data
• Getting government involved is hard (budget cuts etc.)
Zero-resource ASR
• Interaction data
  – grow complexity of task
  – Limited reuse from one task to the next
• Understanding by community of relevance of the problem
  – Cf. reviewer instructions for IS15 Zero-resource Challenge
• Investment attitude in Europe/Belgium
• Industrial interest is growing internationally

Questions ?
