
Dealing with Noisy and/or Sparse Data: The Case for Hybrid Approaches
Abeer Alwan
Speech Processing and Auditory Perception Laboratory (SPAPL)
Department of Electrical Engineering, UCLA
http://www.ee.ucla.edu/~spapl
alwan@ee.ucla.edu

Key Argument for Hybrid Approaches in Speech Processing: Variability
• The variability in the way humans produce speech (due to, for example, gender, accent, age, and emotion) necessitates data-driven approaches that capture significant trends and behavior in the data.
• The same variability, however, may not be modeled adequately by such systems, especially if the data are limited and/or corrupted by noise.

Projects (last 5 years)
• Hybrid Statistical Modeling and Knowledge-Based Approach to Improve:
  – rapid speaker normalization (including kids' speech)
  – cross-language adaptation
  – height estimation
  – noise-robust ASR
• Speech Production Modeling
  – modeling the voice source using high-speed imaging
• Bird Song and Species Identification
Funding sources in the last 5 years: NSF, DARPA, and industry.

Challenges in ASR of Kids' Speech
• Lack of large databases of children's speech
• Significant intra- and inter-speaker variability
• Significant variability in pronunciations due to different linguistic backgrounds and misarticulations
• Low signal-to-noise ratio in the classroom
• Distinguishing reading errors from pronunciation differences

Effect of Age on Resonances
[Figure: an adult male saying the vowel /uw/; an 8-year-old boy saying the same vowel]
• Children have shorter vocal tracts and hence higher resonances.
• More variability than adults.
• Less control of the articulators.
• Higher pitch.
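The link between shorter vocal tracts and higher resonances follows from the textbook uniform-tube approximation of a neutral vowel. In that model, a tube closed at the glottis and open at the lips resonates at F_k = (2k − 1)c / (4L), so every resonance scales as 1/L.

This is a minimal Python sketch under stated assumptions:
- It uses that idealized model, which does not capture the specific shape of /uw/ shown in the figure.
- The speed of sound (35,000 cm/s) is an approximation.
- The tract lengths of 17.5 cm (adult) and 12.5 cm (child) are hypothetical illustration values, not measurements from the slide.

```python
# Textbook approximation (an assumption, not the slide's model): a uniform tube
# closed at the glottis and open at the lips resonates at odd quarter-wavelengths.
SPEED_OF_SOUND_CM_PER_S = 35000.0  # approximate value for warm, humid air


def uniform_tube_resonances(length_cm, n_resonances=3, c=SPEED_OF_SOUND_CM_PER_S):
    """Return the first n resonances (Hz) of a closed-open tube: F_k = (2k - 1) c / (4 L)."""
    return [(2 * k - 1) * c / (4.0 * length_cm) for k in range(1, n_resonances + 1)]


if __name__ == "__main__":
    # Hypothetical vocal-tract lengths, chosen only to illustrate the 1/L scaling.
    print("adult:", uniform_tube_resonances(17.5))  # [500.0, 1500.0, 2500.0]
    print("child:", uniform_tube_resonances(12.5))  # [700.0, 2100.0, 3500.0]
```

Shortening the tube by about 30% raises every resonance by the same factor (17.5 / 12.5 = 1.4). This is the uniform upward shift that speaker-normalization methods try to undo.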

Pronunciation Modeling
• Knowledge-based hypothesis
  – Acoustic-phonetic knowledge transfer: each English phoneme is mapped to acoustically similar Spanish phonemes
• Linguistic hypotheses regarding consonants (English phoneme → Spanish phoneme):
  – /v/ → /f/ (very)
  – /z/ → /s/ (those)
  – /dh/ → /d/
  – /th/ → /t/ (think)
  – /r/ → /rr/ in word-initial position
  – /y/ → /jh/
  – /s/ → /z/ (listen)
  – Unaspirated /p/, /t/, /k/ in word-initial position (produce)
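One way to use such knowledge-based mappings is to expand each lexicon entry with rule-generated pronunciation variants, so the recognizer can match accented productions. This is a minimal Python sketch, not the lab's system, with these assumptions:
- The rule table is transcribed from the consonant mappings above.
- The lowercase ARPAbet-style phone labels and the names `RULES`, `phone_options`, and `pronunciation_variants` are hypothetical.
- Each rule is applied to the canonical phone only (no chaining).
- The aspiration rule for /p/, /t/, /k/ is left out because it changes a phone's realization rather than substituting one phone for another.

```python
from itertools import product

# Hypothetical rule table transcribed from the slide's English -> Spanish mappings.
# Each entry: (english_phone, substitute_phone, applies_only_word_initially)
RULES = [
    ("v", "f", False),
    ("z", "s", False),
    ("dh", "d", False),
    ("th", "t", False),
    ("r", "rr", True),
    ("y", "jh", False),
    ("s", "z", False),
]


def phone_options(phone, index):
    """Return the canonical phone plus every rule-based substitute allowed at this position."""
    options = [phone]
    for english, substitute, initial_only in RULES:
        if phone == english and (not initial_only or index == 0):
            options.append(substitute)
    return options


def pronunciation_variants(phones):
    """Expand a canonical phone sequence into all combinations of rule-based variants."""
    per_position = [phone_options(p, i) for i, p in enumerate(phones)]
    return [list(combo) for combo in product(*per_position)]


if __name__ == "__main__":
    print(pronunciation_variants(["th", "ih", "ng", "k"]))
    # [['th', 'ih', 'ng', 'k'], ['t', 'ih', 'ng', 'k']]
```

The number of variants grows multiplicatively with the number of rule-matching phones. A practical system would typically weight or prune variants using data, which is where the statistical half of a hybrid approach comes in.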

Using subglottal resonances for speaker ID and speaker normalization (2010-2015)
• The subglottal system is practically time-invariant, unlike the supraglottal vocal tract.
  – It can potentially characterize a speaker better, or at least provide complementary information.
[Figure: three panels plotting frequency (0 to 3000 Hz) against time (0 to 1600 ms); green dots: formants; red dots: SGRs]
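Because SGRs stay roughly fixed within a speaker, a single SGR value could in principle set a frequency-warping factor for speaker normalization, much as in vocal-tract length normalization. This is a minimal Python sketch under stated assumptions, not the lab's exact formulation:
- The warping factor is taken as the ratio of a reference Sg2 to the speaker's Sg2.
- A common piecewise-linear warp keeps the Nyquist frequency fixed.
- The function names, the break-point ratio of 0.85, and the Sg2 values in the example are hypothetical.

```python
def sgr_warping_factor(sg2_speaker_hz, sg2_reference_hz):
    """Hypothetical warping factor that maps the speaker's Sg2 onto a reference Sg2."""
    if sg2_speaker_hz <= 0 or sg2_reference_hz <= 0:
        raise ValueError("SGR frequencies must be positive")
    return sg2_reference_hz / sg2_speaker_hz


def piecewise_linear_warp(freq_hz, alpha, nyquist_hz, break_ratio=0.85):
    """Scale a frequency by alpha below a break point, then interpolate linearly so
    that the Nyquist frequency maps onto itself and the warped axis stays in range."""
    f_break = break_ratio * nyquist_hz * min(1.0, 1.0 / alpha)
    if freq_hz <= f_break:
        return alpha * freq_hz
    slope = (nyquist_hz - alpha * f_break) / (nyquist_hz - f_break)
    return alpha * f_break + slope * (freq_hz - f_break)


if __name__ == "__main__":
    # Hypothetical values: a child with Sg2 = 2000 Hz, a reference speaker with Sg2 = 1500 Hz.
    alpha = sgr_warping_factor(2000.0, 1500.0)
    print(alpha)                                         # 0.75
    print(piecewise_linear_warp(1000.0, alpha, 8000.0))  # 750.0
```

In a recognizer, a warp like this would usually be applied to the filterbank center frequencies before feature extraction. Estimating alpha from one SGR value, rather than a grid search over many alphas, is what would make such an approach attractive when adaptation data are sparse.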

Height estimation: evaluation
• Training data: SGRs and heights of 50 speakers.
• Evaluation data: speech signals of 604 speakers.

                   Using Sg1   Using Sg2   Ganchev et al.
Mean abs. error    5.3 cm      5.4 cm      5.3 cm
RMS error          6.6 cm      6.7 cm      6.8 cm

• Main advantages of the proposed algorithm:
  – Only 1 feature (Sg1 or Sg2), as opposed to 50 vocal-tract features for Ganchev et al.
  – Very little training data (50 speakers vs. 468).
(Speech Communication, 2013)
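The table reports mean absolute error and RMS error for a predictor that uses a single subglottal feature. This is a minimal Python sketch of that evaluation setup, with these assumptions:
- The predictor is assumed to be ordinary least-squares linear regression of height on one SGR. The slide does not state the model, and the published method may differ.
- The function names are hypothetical.
- No data from the study are reproduced.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y ≈ a + b * x with a single feature."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_abs_error(pred, true):
    """Mean absolute error, in the same units as the targets (e.g. cm)."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rms_error(pred, true):
    """Root-mean-square error, which penalizes large misses more than the MAE does."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))
```

Typical use would be to fit `fit_line(train_sg, train_heights)` on the training speakers and predict `a + b * sg` for each evaluation speaker. Both error measures are then reported, as in the table. With only two parameters to estimate, a one-feature model like this needs little training data, which is consistent with the advantage claimed on the slide.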

Concept of Correlogram-Based Time-Frequency Domain Pitch Estimation (2010-2014)
[Diagram: speech → auditory filterbank → filtered time waveforms (low- to high-frequency channels) → short-time autocorrelation in each channel (correlogram) → averaged across channels → summary correlogram]
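The pipeline in the diagram can be sketched in a few lines of NumPy. This is a minimal illustration of the summary-correlogram idea, not the lab's algorithm, with these assumptions:
- Brick-wall FFT bands stand in for an auditory (e.g. gammatone) filterbank.
- Half-wave rectification stands in for hair-cell processing.
- The band edges, the energy threshold, and the 60 to 400 Hz search range are hypothetical choices.

```python
import numpy as np


def bandpass_fft(x, fs, lo_hz, hi_hz):
    """Crude band-pass channel: zero every FFT bin outside [lo_hz, hi_hz).
    Stands in for one channel of an auditory filterbank."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo_hz) | (freqs >= hi_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def short_time_autocorr(x, max_lag):
    """Autocorrelation r[k] = sum_n x[n] * x[n + k] for lags 0..max_lag-1."""
    return np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(max_lag)])


def summary_correlogram_f0(frame, fs, f0_min=60.0, f0_max=400.0, n_channels=16):
    """Estimate F0 (Hz) of one frame from the peak of the summary correlogram."""
    frame = np.asarray(frame, dtype=float)
    max_lag = int(fs / f0_min) + 1
    min_lag = int(fs / f0_max)
    if len(frame) <= max_lag:
        raise ValueError("frame must be longer than one period at f0_min")
    edges = np.geomspace(80.0, 4000.0, n_channels + 1)  # log-spaced band edges (assumed)
    frame_energy = np.dot(frame, frame)
    rows = []  # one normalized autocorrelation per channel: the correlogram
    for lo, hi in zip(edges[:-1], edges[1:]):
        y = bandpass_fft(frame, fs, lo, hi)
        if np.dot(y, y) <= 1e-6 * frame_energy:
            continue  # skip channels with (almost) no energy
        y = np.maximum(y, 0.0)  # half-wave rectification, a crude hair-cell model
        r = short_time_autocorr(y, max_lag)
        rows.append(r / r[0])  # normalize so each channel contributes equally
    if not rows:
        return None  # silent frame: no pitch estimate
    summary = np.mean(rows, axis=0)  # average across channels: the summary correlogram
    best_lag = min_lag + int(np.argmax(summary[min_lag:]))
    return fs / best_lag
```

Averaging across channels is what makes this a time-frequency method. Every channel that carries a harmonic of F0 peaks at the pitch period, so the common lag reinforces in the summary while channel-specific peaks tend to average out. That may also help when some bands are masked by noise.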

Variance and Invariance in Speech Quality
• Data collected in collaboration with the Linguistics Department and the Medical School
• Inter-speaker variability
  – Day/time variability (session variability)
  – Read speech vs. conversational speech
  – Low-affect speech vs. high-affect speech
• Recordings
  – Steady-state vowel /a/ (3 repetitions)
  – Reading sentences
  – Explaining something to someone they do not know
  – Phone call to someone they know
  – Telling something unimportant/joyful/annoying
  – Speaking to pets

Research Directions
• Analysis and recognition of kids' speech (including longitudinal studies)
• Studies of the role of articulatory/linguistic features in speech processing (human and machine)
• Studies of natural (not acted) emotions
• Human and machine recognition of naturally noisy data
• Analysis and recognition of disordered speech
• Articulatory data: ultrasound, MRI, EMMA, high-speed imaging
• Accented speech

Evaluating Proposals/Ideas at Academic Institutions
• Academic research should be exploratory in nature and a source of creative ideas, which may or may not lead to immediate practical success.

Subglottal Resonances
• Subglottal features are useful for (1) height estimation, (2) speaker normalization for ASR, (3) speaker identification, and (4) cross-language adaptation.
  – Effective with limited data.
  – Robust to environmental noise.
• Collaborative research with psychology and speech science.
