
Speaker Verification Under Additive Noise Conditions With Non-stationary SNR Using PMC
Michael L P Wong & Martin J Russell
The University of Birmingham

References
• M.J. Gales and S.J. Young, “Robust Continuous Speech Recognition Using Parallel Model Combination,” IEEE Transactions on Speech and Audio Processing, Vol. 4, No. 5, pp. 352-359, September 1996.
• T. Matsui, T. Kanno and S. Furui, “Speaker Recognition Using HMM Composition in Noisy Environments,” Computer Speech and Language, Vol. 10, pp. 107-116, 1996.
• O. Bellot, D. Matrouf, T. Merlin and J.-F. Bonastre, “Additive and Convolutive Noises Compensation for Speaker Recognition,” Proceedings of ICSLP 2000, Beijing, China, 2000.

Task Definition
• Clean verification speech: good
• Noise-contaminated verification speech with non-stationary SNR: bad

Preview of Results
• Clean speech models tested on non-stationary SNR phrases
  – Speech noise: 38.55% EER
  – Operations room noise: 34.78% EER
• Performance of compensated models
  – Speech noise: 19.92% EER
  – Operations room noise: 18.84% EER

Structure of Presentation
• Stage One
  – Evaluation of PMC on speaker verification tasks: stationary SNR conditions
• Stage Two
  – Recognition of unknown SNR conditions
• Stage Three
  – Modelling the dynamics of SNR in noise-contaminated verification phrases

Problem Formulation
• Text-dependent speaker verification
• Deployment in dynamic real-world environments
• Model-based approach
• Ultimate goal: a multi-noise, multi-SNR scenario

Evaluation Using PMC
• PMC (Parallel Model Combination) has been successful in improving the performance of ASR systems
• Based on work by Mark Gales
• Evaluate the use of PMC in text-dependent speaker verification tasks

Performance of PMC in ASR Experiments
[Figure: accuracy (%) versus signal-to-noise ratio (dB) at 18, 12, 6, 0 and -6 dB, un-compensated versus compensated models. Reference: Gales]

Design Criteria
• Additive noises considered
• Scaling to be performed on noises
  $\mu^{l}_{S \otimes N} = \log\left(\exp(\mu^{l}_{S}) + g \exp(\mu^{l}_{N})\right)$
  $\Sigma^{l}_{S \otimes N} = \log\left(\exp(\Sigma^{l}_{S}) + g \exp(\Sigma^{l}_{N})\right)$
• Compensate only for static parameters
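The combination rule on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `log_add_combine` and the numeric values are hypothetical, the inputs are assumed to be already in the log-spectral domain (cepstral parameters would first have to be mapped there), and the same element-wise rule is applied to whatever parameter vector is passed in.

```python
import math

def log_add_combine(speech_log, noise_log, g=1.0):
    """Element-wise log(exp(s) + g * exp(n)) for two log-domain vectors.

    g is the gain applied to the noise term, as on the slide.
    """
    return [math.log(math.exp(s) + g * math.exp(n))
            for s, n in zip(speech_log, noise_log)]

# Made-up log-domain means, for illustration only.
mu_speech = [2.0, 1.0, 0.0]
mu_noise = [0.0, 1.0, 2.0]
mu_combined = log_add_combine(mu_speech, mu_noise, g=0.5)
```

With g = 0 the noise term vanishes and the speech parameters come back unchanged, which is a quick sanity check for any implementation of the rule.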

Implementation
• Selection of databases
• Preparation of data
• System structure
• Scoring procedures

Selection of Databases
• YOHO speaker verification database
  – Standard database, so performance comparisons are available
• TIMIT database
  – Used for the initialisation of isolated phone models prior to YOHO training
• NOISEX-92 noise database
  – Selection of repetitive noise sources. Two noise sources are reported in this paper: speech noise and operations room noise

Preparation of Data
• Scaling of both enrolment and verification data
• Measurement of verification speech power
  – Silence periods ignored [ref 7, ITU-T Rec.]
• Mixing of speech and noise from –18dB to +18dB at 6dB intervals. Retain multiplication factor, g, and take an average
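The mixing step can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, g is assumed to be an amplitude factor applied to the noise samples, and power is measured over all samples, whereas the slide states that silence periods are ignored when measuring speech power.

```python
import math

def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)

def noise_gain_for_snr(speech, noise, snr_db):
    """Amplitude factor g such that speech power / power(g * noise) equals snr_db in dB."""
    return math.sqrt(mean_power(speech) / (mean_power(noise) * 10 ** (snr_db / 10)))

def mix(speech, noise, snr_db):
    """Add scaled noise to speech; return the mixture and the gain used."""
    g = noise_gain_for_snr(speech, noise, snr_db)
    return [s + g * n for s, n in zip(speech, noise)], g

# Toy signals, for illustration only.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, -0.5, -0.5, 0.5]
mixed, g0 = mix(speech, noise, 0)
```

Under this amplitude-gain assumption, each 6dB drop in SNR multiplies g by 10^(6/20), roughly 1.995.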

System Structure
• Front-end
  – 25ms, Hamming windowed, mel-scale warped
  – 12 cepstral coefficients with 0th energy appended, 1st and 2nd order derivatives included
• HTK software for both training and recognition
• 3-state, 4-component tied-triphone speaker-dependent models; 1-state, 4-component noise models

System Structure
• Training
  – 96 phrases per speaker
  – 118 authorised
  – 20 for General Speaker model
• Recognition
  – 40 phrases used for both FR and FA experiments

Scoring Procedures
• Likelihood ratio test employed
  $\frac{P(X \mid S)}{P(X \mid \mathrm{GSM})} \geq t$
• Performance quoted in % EER
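The scoring procedure can be sketched as below. This is an illustrative sketch only: the function names and scores are hypothetical, the test is written in the log domain (a common choice, assumed here), and the EER is approximated by sweeping the threshold over the observed scores and taking the point where false rejection and false acceptance rates are closest.

```python
def accept(log_p_speaker, log_p_gsm, log_threshold):
    """Likelihood ratio test in the log domain: log P(X|S) - log P(X|GSM) >= log t."""
    return (log_p_speaker - log_p_gsm) >= log_threshold

def equal_error_rate(genuine, impostor):
    """Approximate EER: average of FR and FA at the threshold where they are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        fr = sum(s < t for s in genuine) / len(genuine)      # true speakers rejected
        fa = sum(s >= t for s in impostor) / len(impostor)   # impostors accepted
        gap = abs(fr - fa)
        if best is None or gap < best[0]:
            best = (gap, (fr + fa) / 2)
    return best[1]

# Made-up log likelihood ratio scores, for illustration only.
genuine_scores = [2.0, 3.0, 4.0, 5.0]
impostor_scores = [0.0, 1.0, 2.5, 3.5]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

With only a handful of scores the estimate is coarse; a real evaluation would use many trials per speaker.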

Experiment Methodology
• Establish baseline performance using clean speaker models and clean verification data
• Evaluate performance of clean speaker models under multi-SNR verification data
• Evaluate performance of PMC-compensated speaker models under multi-SNR verification data

Un-compensated Models
• Clean speech with clean models: 0.57% EER
[Figure: equal error rate (%) versus signal-to-noise ratio (dB) from 18dB to -18dB in 2dB steps. Curves: Operations Room Noise, Speech Noise]

Compensated Models
[Figure: equal error rate (%) versus signal-to-noise ratio (dB) from 18dB to -18dB in 6dB steps. Curves: Operations Room Noise, Speech Noise, Operations Room Noise (Std), Speech Noise (Std)]

Stage One Summary
• Text-dependent SV task
• HTK software used with modifications for PMC
• YOHO, TIMIT and NOISEX-92 databases used
• 7 SNR scenarios considered (-18dB to +18dB)

Stage One Summary
• PMC improves SV performance
• 2 additive noises considered
• Static parameters compensated
• Baseline used: clean models, clean/contaminated speech

Experimental Extension
• We now have 7 SNR-specific PMC models
• Can SNR-specific PMC models be used for other SNRs? How sensitive are they?
• If yes, how well do they perform?

Evaluation of Non-ideal PMC Models
• For each SNR-specific PMC model, perform the SV task on noise-contaminated verification phrases from –18dB to +18dB at 2dB intervals
• Observe any degradation in performance from using non-ideal models

Speech Noise Result
[Figure: Speech Noise. Equal error rate (%) versus signal-to-noise ratio (dB) from 18dB to -18dB in 2dB steps, one curve per SNR-specific PMC model. Legend: 18dB (0.06645), 12dB (0.132584), 6dB (0.264541), 0dB (0.527828), -6dB (1.053155), -12dB (2.101321), -18dB (4.192687)]

Operations Room Noise Result
[Figure: Operations Room Noise. Equal error rate (%) versus signal-to-noise ratio (dB) from 18dB to -18dB in 2dB steps, one curve per SNR-specific PMC model. Legend: 18dB (0.074531), 12dB (0.14871), 6dB (0.296715), 0dB (0.592023), -6dB (1.181242), -12dB (2.356887), -18dB (4.702608)]

Discussion
• Select, for a given observation, the SNR-specific PMC model that gives the highest probability
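The selection rule amounts to an argmax over the SNR-specific models. The sketch below assumes each model has already been scored against the observation; the function name and the log-likelihood values are hypothetical.

```python
def select_snr_model(log_likelihoods):
    """Return the SNR label (in dB) of the model with the highest log likelihood."""
    return max(log_likelihoods, key=log_likelihoods.get)

# Made-up log likelihoods of one verification phrase under the 7 SNR-specific models.
scores = {18: -5210.0, 12: -5105.5, 6: -5032.1, 0: -4987.3,
          -6: -5001.9, -12: -5120.4, -18: -5260.8}
best_snr = select_snr_model(scores)
```

Since the seven models are trained at 6dB intervals, a phrase at an intermediate SNR would be assigned to whichever neighbouring model scores higher.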

Automatic Model Selection
[Figure: equal error rate (%) versus signal-to-noise ratio (dB) from 18dB to -18dB in 2dB steps. Curves: Operations Room Noise, Speech Noise]

Stage Two Summary
• Limiting the number of SNR-specific PMC models to 7 does not affect SV performance on unknown SNR
• Better performance is achieved by automatic selection of models

Varying SNR Task

Modelling SNR Dynamics
• Operating models in parallel assumes that SNR changes occur at model boundaries
• Create one model from multiple models, with the SNR dynamics embedded within the transition probabilities

Implementation of a Composite HMM
• Rows and columns correspond to different SNRs; 1st row = entry probability

         +18dB  +12dB   +6dB    0dB   -6dB  -12dB  -18dB
Entry      0.3    0.2    0.1    0.1    0.1    0.1    0.1
+18dB      0.4    0.1    0.1    0.1    0.1    0.1    0.1
+12dB      0.1    0.4    0.1    0.1    0.1    0.1    0.1
+6dB       0.1    0.1    0.4    0.1    0.1    0.1    0.1
0dB        0.1    0.1    0.1    0.4    0.1    0.1    0.1
-6dB       0.1    0.1    0.1    0.1    0.4    0.1    0.1
-12dB      0.1    0.1    0.1    0.1    0.1    0.4    0.1
-18dB      0.1    0.1    0.1    0.1    0.1    0.1    0.4
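The SNR transition structure above can be checked and expanded in code. The sketch below is only an assumption about how such a composite might be formed: it builds the 7-state SNR matrix from the slide, then takes a Kronecker product with a hypothetical 3-state left-to-right speech transition matrix, so that every composite state is an (SNR, speech state) pair. The speech probabilities and all names are invented for illustration.

```python
SNR_LEVELS = [18, 12, 6, 0, -6, -12, -18]
ENTRY = [0.3, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1]  # entry row from the slide

def snr_transition_matrix(n=7, stay=0.4, move=0.1):
    """SNR-to-SNR transitions: 0.4 on the diagonal, 0.1 elsewhere, as on the slide."""
    return [[stay if i == j else move for j in range(n)] for i in range(n)]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]

# Hypothetical 3-state left-to-right speech model (values are made up).
speech_trans = [[0.6, 0.4, 0.0],
                [0.0, 0.6, 0.4],
                [0.0, 0.0, 1.0]]

snr_trans = snr_transition_matrix()
composite = kron(snr_trans, speech_trans)  # 21 x 21: 7 SNR levels x 3 speech states
```

Because both factors are row-stochastic, every row of the composite also sums to one. A product of this form lets the SNR change at any frame rather than only at model boundaries.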

Implementation of a Composite HMM
• 3-dimensional model
• 1-state noise model
• 3-state speech model
• 7-state SNR model
[Figure: three axes labelled SNR, Speech and Noise]

Expectations
• Extracting the true SNR dynamics and embedding them into the transition probabilities will further improve performance [to be evaluated]

Varying SNR Task
