
Speaker Recognition and the ETSI Standard Distributed Speech Recognition Front-End - PowerPoint PPT Presentation

Speaker Recognition and the ETSI Standard Distributed Speech Recognition Front-End. Charles Broun, David Pearce, William Campbell, Holly Kelleher. Motorola


  1. Speaker Recognition and the ETSI Standard Distributed Speech Recognition Front-End
     Charles Broun, William Campbell: Motorola Labs, Human Interface Lab, Tempe, Arizona, USA
     David Pearce, Holly Kelleher: Motorola Limited, Basingstoke, UK

  2. Outline
     • Background
     • Speaker Verification
       – Embedded Process
       – Distributed Process
     • Distributed Speech Recognition (DSR)
     • Classifier
     • Experimental Setup
     • Results
     • Conclusion

  3. Background
     Motivation
     • Issues with Embedded Solutions
       – Mobile devices do not have the necessary memory or battery capacity
       – Updating software requires access to each device
       – Multiple devices may contain different speaker models
     • Potential Benefits of Distributed Solutions
       – Server supports computation and memory requirements
       – Software updates are handled in a single location
       – A single speaker model may support multiple mobile devices, enabling a ‘portable’ interface
     • Distributed Speech Recognition (DSR) Standard
       – This standard addresses the above issues for speech recognition
       – Can work on this standard be leveraged for speaker verification?

  4. Speaker Verification
     Embedded Process
     • Feature extractor and classifier typically combined into a proprietary solution
     • Can jointly optimize both components
     • Target system must support computation & memory requirements of both components
     Block diagram: Input Speech Data → Feature Extractor → Classifier (using Speaker Model) → Score → Compare to Threshold, T → Accept (>T) or Reject (<T)

  5. Speaker Verification
     Distributed Process
     • Feature extractor is standardized
     • Cannot jointly optimize both components
     • Client only supports computational & memory requirements of feature extractor
     • Server supports higher load of classifier
     Block diagram: Input Speech Data → Feature Extractor → Wireless Channel → Classifier (using Speaker Model) → Score → Compare to Threshold, T → Accept (>T) or Reject (<T)
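
The client/server split on this slide can be sketched as two small functions. This is only an illustration under stated assumptions: the names `client_extract`, `server_verify`, `toy_extractor` and `toy_scorer` are hypothetical, the extractor and scorer are toy stand-ins rather than the standardized front-end or the actual classifier, and treating a score exactly equal to T as a rejection is an assumption, since the slide only shows ">T" and "<T".

```python
def client_extract(speech_frames, extractor):
    """Client side: run only the (standardized) feature extractor."""
    return [extractor(frame) for frame in speech_frames]


def server_verify(feature_vectors, score_fn, threshold):
    """Server side: score the received features and compare to threshold T.

    Assumption: a score exactly equal to the threshold is rejected.
    """
    score = score_fn(feature_vectors)
    return score > threshold


# Toy stand-ins (hypothetical, for illustration only).
def toy_extractor(frame):
    # One "feature" per frame: the mean sample value.
    return [sum(frame) / len(frame)]


def toy_scorer(feature_vectors):
    # Average of the first feature across all frames.
    return sum(v[0] for v in feature_vectors) / len(feature_vectors)


features = client_extract([[1.0, 1.0], [3.0, 3.0]], toy_extractor)
accepted = server_verify(features, toy_scorer, threshold=1.5)
```

In a real deployment the list returned by `client_extract` would be compressed and sent over the wireless channel before `server_verify` runs.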

  6. DSR
     Background of DSR Standard
     • Motivation of Standard Front-End
       – Potential benefits of distributed solutions for speech recognition
       – Eliminates voice/vocoder channel mismatch
     • Activities
       – European Telecommunications Standards Institute (ETSI)
       – Aurora Working Group within ETSI
       – First standard published in February 2000

  7. DSR
     ETSI Standard DSR System Concept
     • Terminal front-end targeted to mobile devices
     • Features transmitted over a low-error data channel
     • Speech recognizer runs on high power server
     Block diagram:
       – Terminal DSR Front-End: Parameterisation (Mel-Cepstrum) → Compression (Split VQ) → Frame Structure & Error Protection
       – Wireless Data Channel: 4.8 kbit/s
       – Server DSR Back-End: Error Detection & Mitigation → Decompression → Recognition
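
The 4.8 kbit/s figure can be checked with a little arithmetic. The slides give only the final rate; the bit allocation below (seven split-VQ codebook indices per frame, a 4-bit CRC per frame pair, a 48-bit sync/header per 24-frame multiframe, and a 10 ms frame shift) is my recollection of the published front-end standard and should be treated as an assumption to verify against the specification.

```python
# Assumed bit allocation (not stated on the slides; verify against the standard).
FRAMES_PER_SECOND = 100                      # assumed 10 ms frame shift
PAIR_CODEBOOK_BITS = [6, 6, 6, 6, 6, 6, 8]   # assumed: six cepstral pairs + (C0, logE)
CRC_BITS_PER_FRAME_PAIR = 4                  # assumed error-protection overhead
FRAME_PAIRS_PER_MULTIFRAME = 12              # assumed: 24 frames per multiframe
HEADER_BITS_PER_MULTIFRAME = 48              # assumed sync + header field


def dsr_bit_rate():
    """Return the channel bit rate in bit/s implied by the assumptions above."""
    bits_per_frame = sum(PAIR_CODEBOOK_BITS)
    bits_per_frame_pair = 2 * bits_per_frame + CRC_BITS_PER_FRAME_PAIR
    bits_per_multiframe = (FRAME_PAIRS_PER_MULTIFRAME * bits_per_frame_pair
                           + HEADER_BITS_PER_MULTIFRAME)
    frames_per_multiframe = 2 * FRAME_PAIRS_PER_MULTIFRAME
    # bits per multiframe divided by multiframe duration (frames / frames-per-second)
    return bits_per_multiframe * FRAMES_PER_SECOND // frames_per_multiframe


rate = dsr_bit_rate()
```

Under these assumptions the compressed features alone account for 4400 bit/s, and the framing and error-protection overhead brings the total to the 4800 bit/s quoted on the slide.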

  8. DSR
     ETSI Standard DSR Front-End
     • Feature set consists of 12 mel-cepstrum coefficients, logE, and C0
     • Quantization supports a data rate of 4800 b/s
     • Error protection supports robustness to transmission errors
     Block diagram: Input Speech → ADC → Offcom → Framing → PE → W → FFT → MF → LOG → DCT → Feature Compression → Bit Stream Formatting → To Transmission Channel, with logE computed alongside the cepstral chain
     Abbreviations:
       – ADC: Analog-to-digital conversion
       – Offcom: Offset compensation
       – PE: Pre-emphasis
       – logE: Energy measure computation
       – W: Windowing
       – FFT: Fast Fourier transform
       – MF: Mel-filtering
       – LOG: Non-linear transform
       – DCT: Discrete cosine transform
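
The processing chain on this slide (Offcom, PE, W, FFT, MF, LOG, DCT, plus logE) can be sketched for a single frame as follows. This is an illustrative, non-bit-exact sketch, not the standard's reference code: the function name `front_end_frame` is hypothetical, and the constants (0.999 notch coefficient, 0.97 pre-emphasis, Hamming window, 256-point transform, 23 mel filters starting at 64 Hz, 8 kHz sampling) are assumed typical values that should be checked against the specification. Offset compensation is applied per frame here for simplicity, and a naive DFT stands in for the FFT.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def front_end_frame(frame, fs=8000.0, nfft=256, n_filters=23, n_ceps=12):
    """Return [c1..c12, logE, c0] for one frame (len(frame) must be <= nfft)."""
    # Offcom: first-order notch filter that removes the DC offset (assumed 0.999).
    offcom, prev_in, prev_out = [], 0.0, 0.0
    for s in frame:
        prev_out = s - prev_in + 0.999 * prev_out
        prev_in = s
        offcom.append(prev_out)
    # logE: log frame energy, floored to avoid log(0).
    log_e = math.log(max(sum(v * v for v in offcom), 1e-10))
    # PE: pre-emphasis (assumed coefficient 0.97).
    pe = [offcom[0]] + [offcom[i] - 0.97 * offcom[i - 1] for i in range(1, len(offcom))]
    # W: Hamming window.
    n = len(pe)
    w = [pe[i] * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1))) for i in range(n)]
    # FFT: magnitude spectrum via a naive zero-padded DFT (bins 0..nfft/2).
    mag = [abs(sum(w[i] * cmath.exp(-2j * math.pi * k * i / nfft) for i in range(n)))
           for k in range(nfft // 2 + 1)]
    # MF: triangular filters equally spaced on the mel scale.
    lo_mel, hi_mel = hz_to_mel(64.0), hz_to_mel(fs / 2.0)
    edges = [mel_to_hz(lo_mel + j * (hi_mel - lo_mel) / (n_filters + 1))
             for j in range(n_filters + 2)]
    log_fb = []
    for j in range(1, n_filters + 1):
        lo, ctr, hi = edges[j - 1], edges[j], edges[j + 1]
        acc = 0.0
        for k, m in enumerate(mag):
            f = k * fs / nfft
            if lo <= f <= ctr:
                acc += m * (f - lo) / (ctr - lo)
            elif ctr < f <= hi:
                acc += m * (hi - f) / (hi - ctr)
        # LOG: non-linear transform, floored to avoid log(0).
        log_fb.append(math.log(max(acc, 1e-10)))
    # DCT: cepstral coefficients c0..c12 from the log filter-bank outputs.
    ceps = [sum(log_fb[j] * math.cos(math.pi * i * (j + 0.5) / n_filters)
                for j in range(n_filters))
            for i in range(n_ceps + 1)]
    return ceps[1:] + [log_e, ceps[0]]


# Usage: a 25 ms frame (200 samples at 8 kHz) of a 440 Hz tone.
tone = [1000.0 * math.sin(2.0 * math.pi * 440.0 * i / 8000.0) for i in range(200)]
features = front_end_frame(tone)
```

The result has 14 values per frame, matching the feature set listed on the slide (12 mel-cepstrum coefficients, logE, and C0).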

  9. Classifier
     Polynomial Classifier
     • Given $x = [x_1 \;\; x_2]^t$ and $K = 2$, compute the polynomial basis vector
       $p(x) = [1 \;\; x_1 \;\; x_2 \;\; x_1^2 \;\; x_1 x_2 \;\; x_2^2]^t$
     • Apply a polynomial discriminant function
       $d(x, w) = w^t p(x)$
     • Compute the score as the average across all frames
       $s = \frac{1}{M} \sum_{k=1}^{M} w^t p(x_k) = w^t \, \frac{1}{M} \sum_{k=1}^{M} p(x_k)$
     Block diagram: DSR Feature Vectors $x_k$ → Polynomial Basis Vector $p(x)$ → Discriminant Function $d(x, w)$ (using Speaker Model $w$) → Score Average Σ → $s$
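
The basis expansion and frame-averaged score on this slide can be sketched in a few lines. The function names `poly_basis` and `average_score` are hypothetical, the weight vector below is an arbitrary illustration, and how the speaker model w is trained is not described on the slide and is not covered here.

```python
from itertools import combinations_with_replacement
from math import comb, prod


def poly_basis(x, order):
    """All monomials of the entries of x up to the given order, lowest degree first."""
    terms = []
    for degree in range(order + 1):
        for idx in combinations_with_replacement(range(len(x)), degree):
            # The empty product (degree 0) is the constant term 1.
            terms.append(prod(x[i] for i in idx))
    return terms


def average_score(frames, w, order):
    """s = (1/M) * sum_k w^t p(x_k) over the M feature vectors in frames."""
    total = 0.0
    for x in frames:
        p = poly_basis(x, order)
        total += sum(wi * pi for wi, pi in zip(w, p))
    return total / len(frames)


# The slide's example: x = [x1, x2], K = 2 gives [1, x1, x2, x1^2, x1*x2, x2^2].
basis = poly_basis([2.0, 3.0], 2)

# Illustrative weights that simply pick out x1, so the score is the mean of x1.
s = average_score([[2.0, 3.0], [4.0, 5.0]], [0, 1, 0, 0, 0, 0], 2)

# Size of the basis for 12 features and a 3rd-order polynomial.
n_terms = len(poly_basis([0.0] * 12, 3))
```

Because the score is linear in p(x), the basis vectors can be averaged over frames first and multiplied by w once, which is the second form of the equation on the slide.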

  10. Experimental Setup
     YOHO Database
     • 138 speakers
     • Enrollment
       – 4 sessions
       – 24 phrases
       – “23-45-56”
     • Testing
       – 10 sessions
       – 4 phrases
       – “45-23-56”
     Speaker Verification System
     • Classifier: 3rd-order polynomial
     • Features: 12 MFCCs from the DSR front-end
     • Channel: GSM bit-error masks

  11. Results
     Performance
     Average Equal Error Rate (%) for a 1-Phrase Test (rows: enrollment condition; columns: verification condition)

     Enroll \ Verify   Unquantized   Error-Free   EP1    EP2    EP3
     Unquantized       1.18          -            -      -      -
     Error-Free        -             1.22         1.22   1.26   1.67
     EP1               -             1.22         1.22   1.26   1.67
     EP2               -             1.22         1.22   1.27   1.66
     EP3               -             1.26         1.26   1.30   1.70
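
For readers unfamiliar with the metric, the equal error rate is the operating point where the false-accept rate equals the false-reject rate. The sketch below estimates it with a simple threshold sweep. It is not the authors' evaluation code, the function name `equal_error_rate` is hypothetical, and the scores are made-up illustrative values rather than data from the experiments.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER by sweeping the threshold over all observed scores.

    A trial is accepted when score > threshold. The estimate is the mean of the
    false-accept and false-reject rates at the threshold where they are closest.
    """
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s <= t for s in genuine) / len(genuine)    # true speakers rejected
        far = sum(s > t for s in impostor) / len(impostor)   # impostors accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer


# Illustrative scores only (not from the experiments on the slide).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

With these toy scores, one genuine trial in four is rejected and one impostor trial in four is accepted at the crossover threshold.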

  12. Conclusion
     Demonstrated that the ETSI Standard Distributed Speech Recognition Front-End is viable for speaker verification
