Automatic speech recognition and keyword spotting in under-resourced languages


Automatic speech recognition and keyword spotting in under-resourced languages
Digital Signal Processing Group, E&E Engineering
21 February 2020
DSP group
DSP group: More than speech
http://dsp.sun.ac.za/~trn
● Communication network for wildlife sensors
● Optimised kinetic energy harvesting
● Automatic detection and classification of coughing in audio
● Virtual reality visualisation and analysis of microscopy data
● Sensor network for viticulture
● Interactive document visualisation for the blind
Automatic Language Processing: Then
Automatic Language Processing: Now
Language usage in South Africa
Multilingual corpus of code-switched South African speech
English – isiZulu CS speech
UN project
Target Languages
Speech data
• Ugandan English (UE; 6 h), Luganda (9 h), Acholi (9 h 12 min)
• Somali (1.6 h)
• UE was augmented with SAE data (20 h)
Text data
• 109 million SAE words
• 1 million Luganda words (online newspaper)
• Transcriptions of the audio data
Pronunciation rules: Phonetic experts
ASR-free CNN-DTW keyword spotting
Acoustic modelling
Acoustic models: data perturbation
• Convolutional Neural Networks (CNNs)
• Time-Delay Neural Networks (TDNNs)
• Bi-directional Long Short-Term Memory Neural Networks (BLSTMs)
Language models: data augmentation
• Recurrent Neural Networks (RNNs)
• Long Short-Term Memory Neural Networks (LSTMs)
Somali speech recognition
Multi-pass semi-supervised training
ASR-free CNN-DTW keyword spotting
Aim:
• Rapid deployment of keyword spotting systems in new languages
Idea:
• Use Dynamic Time Warping (DTW) as supervision to train Convolutional Neural Networks (CNNs), starting from a small set of isolated keywords
• Recordings of the keywords serve as exemplars in DTW template matching, which is applied to untranscribed speech
• The DTW scores are then used as targets to train a CNN on the same unlabelled data
• Very little labelled data is required, but a large amount of unlabelled data can be leveraged
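The step from DTW scores to CNN training targets could be sketched roughly as follows. This is only an illustration: the slides state that DTW scores are used as targets but do not say how they are scaled or combined across exemplars. The function name `dtw_targets`, the `max_cost` normalisation, and the choice of keeping the best-matching exemplar per keyword are all assumptions. The DTW costs are taken as already computed.

```python
def dtw_targets(costs, keyword_of_exemplar, max_cost=1.0):
    """Turn per-exemplar DTW costs into one soft target per keyword.

    costs: {utterance_id: [cost against exemplar 0, exemplar 1, ...]}
    keyword_of_exemplar: keyword label for each exemplar index
    Returns {utterance_id: {keyword: target in [0, 1]}}; 1 means a close match.

    Assumption (not from the slides): a lower DTW cost maps linearly to a
    higher target, and costs at or above max_cost map to 0.
    """
    keywords = sorted(set(keyword_of_exemplar))
    targets = {}
    for utt_id, row in costs.items():
        # Keep the best (lowest) cost over all exemplars of each keyword.
        best = {kw: float("inf") for kw in keywords}
        for cost, kw in zip(row, keyword_of_exemplar):
            best[kw] = min(best[kw], cost)
        # Clip the normalised cost to [0, 1] and flip it into a score.
        targets[utt_id] = {
            kw: 1.0 - min(max(c / max_cost, 0.0), 1.0) for kw, c in best.items()
        }
    return targets


# Hypothetical costs for two unlabelled utterances against three exemplars.
example_costs = {"utt1": [0.2, 0.5, 0.9], "utt2": [1.5, 0.75, 0.0]}
example_targets = dtw_targets(example_costs, ["water", "water", "road"])
```

Each resulting dictionary could serve as the regression target vector for one unlabelled utterance, so no transcription of that utterance is needed.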
Features for ASR-free keyword spotting
• Query-by-example: search “string” provided as audio
• Use Dynamic Time Warping to match query with utterances in search collection
• Various feature representations investigated, e.g.
  • Multilingual bottleneck features (2 & 10 languages)
  • Stacked autoencoder
  • Correspondence autoencoder
  • Combinations of these
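Query-by-example matching with DTW can be illustrated with a minimal pure-Python sketch. It assumes cosine distance between frames, a length-normalised alignment cost, and a sliding window the same length as the query. None of these choices are stated in the slides, and the function names are hypothetical. The feature vectors could come from any of the representations listed above.

```python
import math


def frame_distance(a, b):
    """Cosine distance between two feature vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def dtw_cost(query, segment):
    """Length-normalised DTW alignment cost between two feature sequences."""
    n, m = len(query), len(segment)
    inf = float("inf")
    # acc[i][j] = cheapest cost of aligning query[:i] with segment[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], segment[j - 1])
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


def best_match(query, utterance, hop=1):
    """Slide a query-length window over the utterance.

    Returns (lowest cost, start frame of the best window).
    """
    win = len(query)
    best = (float("inf"), -1)
    for start in range(0, max(1, len(utterance) - win + 1), hop):
        cost = dtw_cost(query, utterance[start:start + win])
        if cost < best[0]:
            best = (cost, start)
    return best


# Toy 2-dimensional "features": the query occurs at frames 1-2 of the utterance.
toy_query = [[1.0, 0.0], [0.0, 1.0]]
toy_utterance = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_cost, toy_start = best_match(toy_query, toy_utterance)
```

Every window requires a full alignment table, so the search cost grows with the product of query length and utterance length for each exemplar.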
Results
• Multilingual feature extraction and target-language fine-tuning can be complementary
• CNN keyword spotting does not match the DTW-based system
• BUT it outperforms a CNN classifier trained only on keywords
• Main advantage of the CNN: orders of magnitude faster at runtime than DTW
• Feature extractors trained on well-resourced datasets can improve performance
• Best performance: correspondence autoencoder (CAE) trained on bottleneck features (BNF)
CNN DTW
Correspondence autoencoder
Keyword spotting examples
Current work
Mali
• More volatile environment
• Difficult to install transmitters without raising suspicion
• Bambara, Fulani
• Some transcribed data, no text
