Missing data speech recognition in reverberant acoustic conditions

Nov 06, 2023

Missing data speech recognition in reverberant acoustic conditions
Kalle, Guy and Jon
ONE SIX FOUR... ! Mr. M. D. Dummy

Content
1. Introduction to reverberation
2. Speech modulation frequencies
3. Reverberation masking model
4. Test conditions
5. Results
6. Discussion

1. Introduction to reverberation
Fig1. Image expansion. Direct sound.
Fig2. Image expansion. 3rd order reflections.
Fig3. Room impulse responses. Left: linear amplitude. Right: log-amplitude.
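
The image expansion pictured in this section can be illustrated with a minimal sketch of the image-source idea for a rectangular room. The slides do not give an implementation, so everything here is an assumption made for illustration: the function name `image_sources_2d`, the restriction to two dimensions, and the source position. The 15 x 13 m floor plan is taken from Room 1 of the test conditions.

```python
from itertools import product


def image_sources_2d(room, src, max_order):
    """Return (x, y, order) for every image of `src` in a rectangular room.

    room: (Lx, Ly) side lengths in metres, with walls at 0 and L on each axis.
    The reflection order of an image is |nx| + |ny|, the number of wall
    reflections on the path it represents.
    """
    Lx, Ly = room
    sx, sy = src
    images = []
    for nx, ny in product(range(-max_order, max_order + 1), repeat=2):
        order = abs(nx) + abs(ny)
        if order > max_order:
            continue
        # Even cell index: translated copy of the source.
        # Odd cell index: mirrored copy of the source.
        x = nx * Lx + (sx if nx % 2 == 0 else Lx - sx)
        y = ny * Ly + (sy if ny % 2 == 0 else Ly - sy)
        images.append((x, y, order))
    return images


# Hypothetical source position inside a 15 x 13 m floor plan.
direct_only = image_sources_2d((15.0, 13.0), (4.0, 5.0), 0)
third_order = image_sources_2d((15.0, 13.0), (4.0, 5.0), 3)
print(len(direct_only), len(third_order))
```

With order 0 only the source itself remains (the direct sound); raising the order adds rings of mirrored sources, and each image's distance to the listener divided by the speed of sound gives the arrival time of one reflection in the impulse response.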

2. Speech modulation frequencies
Fig4. Ratemap for utterance 5527.
Fig5. FFT of the ratemap of Fig3.
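
Taking the FFT of a rate map along time gives its modulation spectrum. The sketch below does this for a single channel with a plain DFT from the standard library. The function name `modulation_spectrum`, the 100 frames/s frame rate, and the synthetic 4 Hz envelope are assumptions for illustration, not values from the slides.

```python
import cmath
import math


def modulation_spectrum(envelope, frame_rate):
    """Magnitude DFT of one rate-map channel along time.

    Returns (frequencies in Hz, magnitudes) for bins 0 .. N // 2.
    """
    n = len(envelope)
    mean = sum(envelope) / n
    centred = [v - mean for v in envelope]  # remove DC so it does not dominate
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                  for t, v in enumerate(centred))
        freqs.append(k * frame_rate / n)
        mags.append(abs(acc))
    return freqs, mags


# Synthetic envelope: 2 s at an assumed 100 frames/s, modulated at 4 Hz.
frame_rate = 100
env = [1.0 + 0.5 * math.sin(2 * math.pi * 4 * t / frame_rate)
       for t in range(200)]
freqs, mags = modulation_spectrum(env, frame_rate)
peak_hz = freqs[mags.index(max(mags))]
print(peak_hz)
```

Applied to every channel of a real rate map, the same computation would yield a channel-by-modulation-frequency picture of the kind the second figure of this section describes.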

3. Reverberation masking model
Fig6. Diagram of the model. Block labels: auditory filterbank, rate map, leaky integrator, downsampling, cube root compression, masking filter (magnitude response in dB against frequency in Hz), threshold, mask, missing data speech recogniser.
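
Some of the stages named in the diagram can be sketched for a single channel: leaky integration, downsampling, cube root compression, and thresholding into a binary mask. This is only an illustration under stated assumptions. The slides give neither the filter coefficients nor the threshold rule, so the smoothing constant, the hop size, and in particular the first-difference rule in `estimate_mask` are hypothetical stand-ins and not the authors' masking filter.

```python
def leaky_integrate(x, alpha):
    """First-order smoother: y[n] = alpha * y[n-1] + (1 - alpha) * x[n]."""
    y, state = [], 0.0
    for v in x:
        state = alpha * state + (1.0 - alpha) * v
        y.append(state)
    return y


def rate_map_channel(rectified, alpha=0.9, hop=4):
    """Smooth one rectified (non-negative) filterbank channel, keep every
    `hop`-th sample, and apply cube root compression."""
    smoothed = leaky_integrate(rectified, alpha)
    frames = smoothed[::hop]
    return [f ** (1.0 / 3.0) for f in frames]


def estimate_mask(frames, threshold):
    """Stand-in masking rule (assumption): a frame is reliable (1) when the
    change from the previous frame is at least `threshold`, so steadily
    decaying tails are masked (0)."""
    mask = [1]  # the first frame has no predecessor; treated as reliable
    for prev, cur in zip(frames, frames[1:]):
        mask.append(1 if cur - prev >= threshold else 0)
    return mask


# Toy input: a burst of energy followed by silence, so the smoothed
# envelope rises and then decays.
rectified = [1.0] * 20 + [0.0] * 20
frames = rate_map_channel(rectified)
mask = estimate_mask(frames, threshold=-0.05)
print(mask)
```

The hand-set `threshold` argument mirrors the hand-tuned thresholds mentioned in the discussion; only the rule inside `estimate_mask` would need to change to try a different masking criterion.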

Fig7. Rate maps. Top: clean. Bottom: reverberated.
Fig8. Masks. Top: a-priori. Bottom: reverb. masking.
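
An a-priori mask is computed with access to the clean signal. The slides do not state the criterion used, so the sketch below assumes one simple possibility: a time-frequency cell counts as reliable when the reverberated rate map stays within a tolerance of the clean one. The function name `a_priori_mask` and the tolerance value are hypothetical.

```python
def a_priori_mask(clean, reverberated, tolerance):
    """Mark a time-frequency cell reliable (1) when the reverberated rate map
    stays within `tolerance` of the clean one, else unreliable (0).

    Both inputs are lists of channels, each a list of frame values.
    """
    if len(clean) != len(reverberated):
        raise ValueError("rate maps must have the same number of channels")
    mask = []
    for c_chan, r_chan in zip(clean, reverberated):
        if len(c_chan) != len(r_chan):
            raise ValueError("channels must have the same number of frames")
        mask.append([1 if abs(r - c) <= tolerance else 0
                     for c, r in zip(c_chan, r_chan)])
    return mask


# Toy rate maps: two channels, three frames each.
clean = [[1.0, 0.2, 0.1], [0.5, 0.5, 0.0]]
reverberated = [[1.05, 0.6, 0.5], [0.5, 0.7, 0.4]]
print(a_priori_mask(clean, reverberated, tolerance=0.1))
```

In the toy data only the first frame of each channel survives, which matches the intuition that reverberant energy fills in the regions that follow an onset.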

4. Test conditions
Room 1: 15 x 13 x 6.5 m³, T60 = 0.7 s, D/R = 0 and -10 dB
Room 2: 25 x 20 x 6 m³, T60 = 1.7 s, D/R = 0 and -10 dB
Room 3: 55 x 35 x 14 m³, T60 = 2.7 s, D/R = 0 and -10 dB
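
The D/R figures above are direct-to-reverberant energy ratios. As a rough sketch, the ratio can be measured on an impulse response and the direct part rescaled to hit a target such as 0 or -10 dB. How the authors actually set D/R is not stated; the function names, the split of the response at `direct_len` samples, and the synthetic exponentially decaying tail are assumptions.

```python
import math


def direct_to_reverberant_db(h, direct_len):
    """D/R in dB: energy of the first `direct_len` taps over the rest."""
    direct = sum(v * v for v in h[:direct_len])
    reverb = sum(v * v for v in h[direct_len:])
    return 10.0 * math.log10(direct / reverb)


def set_direct_to_reverberant(h, direct_len, target_db):
    """Rescale only the direct part so that the D/R equals `target_db`."""
    current = direct_to_reverberant_db(h, direct_len)
    gain = 10.0 ** ((target_db - current) / 20.0)
    return [v * gain for v in h[:direct_len]] + list(h[direct_len:])


# Synthetic response: one direct tap plus a tail whose amplitude falls by
# 60 dB over T60 = 0.7 s, at an assumed 1 kHz sampling rate.
fs, t60 = 1000, 0.7
h = [1.0] + [0.1 * 10 ** (-3 * (n / fs) / t60) * (-1) ** n
             for n in range(1, 700)]
for target in (0.0, -10.0):
    h_scaled = set_direct_to_reverberant(h, 1, target)
    print(target, round(direct_to_reverberant_db(h_scaled, 1), 6))
```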

5. Results

Recognition       Clean   T60=0.7s   T60=0.7s   T60=1.7s   T60=1.7s   T60=2.7s   T60=2.7s
technique                 D/R=0dB    D/R=-10dB  D/R=0dB    D/R=-10dB  D/R=0dB    D/R=-10dB
Unity mask        98.26   62.48      46.39      42.82      29.24      34.90      24.28
MFCC              99.65   60.40      47.08      47.35      34.46      40.73      28.55
Reverb. masking   -       90.33      85.03      82.25      71.11      66.05      45.52
A priori mask     -       95.82      93.73      92.42      89.12      90.60      87.99
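
The size of the gap between techniques is easier to see as per-condition differences. The sketch below copies the scores from the results slide and subtracts the MFCC row from the reverberation masking row. The assignment of values to conditions assumes the columns run T60 = 0.7, 1.7, 2.7 s with D/R = 0 dB before -10 dB in each pair, which is one reading of the flattened table; the helper name `gain_over` is hypothetical.

```python
# Scores copied from the results slide. The Clean column is left out because
# the last two rows have no clean entry.
CONDITIONS = [
    ("0.7 s", "0 dB"), ("0.7 s", "-10 dB"),
    ("1.7 s", "0 dB"), ("1.7 s", "-10 dB"),
    ("2.7 s", "0 dB"), ("2.7 s", "-10 dB"),
]
SCORES = {
    "Unity mask":      [62.48, 46.39, 42.82, 29.24, 34.90, 24.28],
    "MFCC":            [60.40, 47.08, 47.35, 34.46, 40.73, 28.55],
    "Reverb. masking": [90.33, 85.03, 82.25, 71.11, 66.05, 45.52],
    "A priori mask":   [95.82, 93.73, 92.42, 89.12, 90.60, 87.99],
}


def gain_over(baseline, technique):
    """Per-condition score difference, technique minus baseline."""
    return [round(t - b, 2)
            for b, t in zip(SCORES[baseline], SCORES[technique])]


gains = gain_over("MFCC", "Reverb. masking")
for (t60, dr), g in zip(CONDITIONS, gains):
    print(f"T60={t60}, D/R={dr}: {g:+.2f}")
```

Under that reading, reverberation masking scores above MFCC in every reverberant condition, with the smallest margin in the most reverberant case, and the a priori mask row stays above both.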

6. Discussion
• Advantage over robust feature techniques:
  + Mask estimation can be changed on the fly when conditions change -> no retraining required when the rule is changed
  + Better performance ???
• Currently all thresholds are hand tuned; an adaptive system is under development
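
The "no retraining" point can be illustrated under one common assumption about missing data recognisers: that frames are scored with diagonal-covariance Gaussians and unreliable components are marginalised out. The slides do not say which scoring scheme was used, so the function below and its name `masked_log_likelihood` are a hedged sketch rather than the authors' method.

```python
import math


def masked_log_likelihood(x, mask, mean, var):
    """Log-likelihood of x under a diagonal Gaussian, using only the
    components the mask marks reliable (1). For a diagonal covariance,
    marginalising out a component is the same as skipping its term."""
    total = 0.0
    for xi, mi, mu, v in zip(x, mask, mean, var):
        if mi:
            total += -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
    return total


# One trained model, scored under two different masks.
mean, var = [1.0, 0.0], [1.0, 1.0]
frame = [1.0, 5.0]  # second component is badly corrupted
print(masked_log_likelihood(frame, [1, 1], mean, var))
print(masked_log_likelihood(frame, [1, 0], mean, var))
```

The model parameters `mean` and `var` are identical in both calls; only the mask changes. That is why a new mask estimation rule can be swapped in at recognition time without touching the trained acoustic models.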
