Speech Detection for Text-Dependent Speaker Verification

Speech Detection for Text-Dependent Speaker Verification • Orith Toledo-Ronen • Persay Ltd.

Outline • Motivation • Review of existing techniques • HMM-based speech detection • The Evaluation Track corpus • Experimental results • Summary

Motivation • Improving end-point detection improves text-dependent speaker verification performance • Existing algorithms: energy-based voice activity detector (VAD) • Problem: background speech may pass the energy threshold

Existing Techniques • Energy • Amplitude • Zero-crossing rate • Linear prediction error • Pitch • HMM

Comparison of Techniques • Energy-based VAD - Statistics on frame energy - Threshold setting • HMM-based VAD - Speaker dependent model - Password detection - Filters the noise

Energy-based VAD • Compute the energy of all frames • Find statistics of energy values Ω(E) • Compute the energy threshold T = f(Ω(E)) • Filter out all frames with energy below T
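The four steps above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the slides do not define the statistics Ω(E) or the function f, so the sketch assumes Ω(E) is the minimum and maximum frame log-energy and f places T a fraction `alpha` of the way between them. The names `frame_energies`, `energy_threshold`, `energy_vad`, `frame_len`, and `alpha` are hypothetical.

```python
import math

def frame_energies(samples, frame_len):
    """Log energy (dB) of consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))  # avoid log(0)
    return energies

def energy_threshold(energies, alpha=0.5):
    """Assumed f(Omega(E)): a point between the min and max frame energy."""
    lo, hi = min(energies), max(energies)
    return lo + alpha * (hi - lo)

def energy_vad(samples, frame_len=160, alpha=0.5):
    """Return one boolean per frame: True means the frame is kept."""
    energies = frame_energies(samples, frame_len)
    threshold = energy_threshold(energies, alpha)
    # Frames with energy below T are filtered out.
    return [e >= threshold for e in energies]

# Toy signal: quiet, loud, quiet (two frames each).
signal = [0.001] * 320 + [0.5, -0.5] * 160 + [0.001] * 320
mask = energy_vad(signal)
```

Because the threshold depends only on energy, loud background speech would pass this filter just as the password does, which is the problem the Motivation slide describes.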

HMM-based VAD • A left-to-right hidden Markov model of the phrase • Not phoneme-based • Trained from 3 repetitions

Training • Use the energy-based VAD first • Train the speaker HMM • Train a background HMM from: - noise segments - background speech • Merge the speaker and background HMMs
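One way to picture the merge step is as a single transition matrix in which a background state precedes and follows the left-to-right speaker states. The sketch below assumes a one-state background model used on both sides and a fixed self-loop probability; neither detail is given in the slides, and `merged_transitions` and `stay` are hypothetical names.

```python
def merged_transitions(n_speaker_states, stay=0.5):
    """Transition matrix for: leading noise -> speaker states -> trailing noise.

    State 0 is the leading background state, states 1..n_speaker_states are
    the left-to-right speaker states, and the last state is the trailing
    background state (assumed topology).
    """
    n = n_speaker_states + 2
    trans = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        trans[i][i] = stay            # remain in the current state
        trans[i][i + 1] = 1.0 - stay  # advance to the next state
    trans[n - 1][n - 1] = 1.0         # trailing noise absorbs the rest
    return trans

matrix = merged_transitions(3)
```

With this topology no path can skip a speaker state, so a decoded path that reaches the trailing background state has necessarily passed through the whole phrase model.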

Merging Models • [Figure: the audio aligned against the merged model, with segments labeled Noise, Speaker, Noise]

Detection • Run Viterbi with the merged HMM and find the speaker’s states in the segmentation • Use the HMM VAD as a filter before verification
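The detection step can be illustrated with a small Viterbi decoder. This is a toy sketch under stated assumptions, not the system described in the slides: observations are scalars, each state has a one-dimensional Gaussian emission with unit variance, and the merged model has one leading noise state, two speaker states, and one trailing noise state. All numeric values and the names `viterbi`, `log_gauss`, `MEANS`, and `SPEAKER_STATES` are hypothetical.

```python
import math

NEG_INF = float("-inf")

def log_gauss(x, mu, var=1.0):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def viterbi(obs, log_init, log_trans, emit):
    """Most likely state sequence for obs under the given HMM (log domain)."""
    n_states = len(log_init)
    delta = [log_init[s] + emit(s, obs[0]) for s in range(n_states)]
    back = []
    for x in obs[1:]:
        new_delta, ptr = [], []
        for j in range(n_states):
            best = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best)
            new_delta.append(delta[best] + log_trans[best][j] + emit(j, x))
        delta = new_delta
        back.append(ptr)
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):  # trace the back-pointers to the first frame
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

# Assumed merged model: 0 = noise, 1 and 2 = speaker states, 3 = noise.
MEANS = [0.0, 5.0, 9.0, 0.0]
SPEAKER_STATES = {1, 2}
H = math.log(0.5)
LOG_INIT = [H, H, NEG_INF, NEG_INF]
LOG_TRANS = [
    [H, H, NEG_INF, NEG_INF],
    [NEG_INF, H, H, NEG_INF],
    [NEG_INF, NEG_INF, H, H],
    [NEG_INF, NEG_INF, NEG_INF, 0.0],
]

features = [0.1, -0.2, 5.1, 4.8, 9.2, 8.9, 0.0, 0.3]
path = viterbi(features, LOG_INIT, LOG_TRANS, lambda s, x: log_gauss(x, MEANS[s]))
speech_frames = [t for t, s in enumerate(path) if s in SPEAKER_STATES]
```

Only the frames in `speech_frames` would be passed on to verification; frames aligned to the noise states are dropped.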

Example

The Evaluation Track Corpus • Database: Persay’s TD corpus • Passwords: 9-digit telephone number, 4-digit personal code • Speakers: 45 males, 37 females • Impostors: up to 5 same-gender impostors for each speaker

The Evaluation Track Corpus • Sessions: ~5 calls per speaker, with 3 repetitions of each password in each call • Media: cellular phone • Language: Hebrew

Experimental Results • Results: % Equal Error Rate

Gender  Password  Energy  HMM   H+E   E+H
Male    9-digit   7.2     8.1   8.7   6.7
Male    4-digit   11.1    12.6  10.8  9.0
Female  9-digit   6.3     5.8   7.1   6.4
Female  4-digit   10.8    12.2  12.5  12.4
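For readers unfamiliar with the metric, the equal error rate is the operating point at which the false-acceptance rate on impostor attempts equals the false-rejection rate on target attempts. The sketch below is one simple way to estimate it from two lists of scores; the scores are made up for illustration and are not from the corpus, and `equal_error_rate` is a hypothetical helper, not the evaluation code behind the results.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping the threshold over all observed scores.

    An attempt is accepted when its score is >= the threshold. The estimate
    is the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer

# Made-up scores: higher means more likely to be the claimed speaker.
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(targets, impostors)
```

Multiplying the result by 100 gives a percentage comparable in form to the values reported on the slide.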

Password Rejection • Impostor: the Viterbi path does not reach the speaker’s model • Partial password: the Viterbi path does not cover all the speaker’s states

% Rejected (Target / Impostor)
Gender  Password  H+E     E+H
Male    9-digit   1 / 39  5 / 54
Male    4-digit   0 / 21  3 / 45
Female  9-digit   2 / 52  6 / 82
Female  4-digit   1 / 33  7 / 68
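The two rejection rules can be expressed as a check on the decoded state sequence. This is a sketch that assumes the Viterbi path is available as a list of state indices and that the speaker's states are known as a set; `check_password` and its return strings are hypothetical.

```python
def check_password(path, speaker_states):
    """Apply the two rejection rules to a decoded state sequence."""
    wanted = set(speaker_states)
    visited = set(path) & wanted
    if not visited:
        # The path never enters the speaker's model.
        return "rejected: speaker model not reached"
    if visited != wanted:
        # The path enters the model but misses some of its states.
        return "rejected: partial password"
    return "accepted"

speaker = {1, 2, 3}
full = check_password([0, 0, 1, 1, 2, 3, 3, 4], speaker)
partial = check_password([0, 1, 1, 2, 2, 2], speaker)
noise_only = check_password([0, 0, 0, 0], speaker)
```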

Password Rejection - Cont’d • Persay’s TD corpus was manually cleaned by a human listener. • Rejected by human: 102 target attempts, 115 impostor attempts • Algorithm rejection: 33% of target attempts, 86% of impostor attempts

Password Rejection - Cont’d • Segments rejected by human and algorithm: - non-speech: DTMFs, ring tone, silence - corrupted audio - wrong password - strong background speech • Segments rejected only by human: - all contain the password, but with poor quality - low volume, background speech, error and repair

Summary • We have presented a method for speech detection in a text-dependent speaker verification system. • The HMM-based VAD can be used in combination with an energy-based VAD. • It can detect the password and reject invalid verification audio segments.
