On the use of phone-gram units in recurrent neural networks for language identification
Christian Salamea, Luis F. D'Haro, Ricardo de Córdoba, Rubén San-Segundo
Speech Technology Group, Dept. of Electronic Engineering, Universidad Politécnica de Madrid
Odyssey 2016 - Bilbao
Summary of the paper
Phone-gram units in RNNs:
- Phonemes are used as input, with a 1-of-N codification (N is the total number of phonemes).
- Contextual information is incorporated in the network input: uni-phones, di-phones and tri-phones (phone-grams of order 1, 2 and 3).
- We propose the concatenation of n adjacent phonemes to form these phone-gram units.
RNNLM-P (phone-gram RNN language models) applied to LID, based on a PPRLM architecture:
- For each phonetic recognizer, a phoneme sequence is obtained.
- In evaluation, an entropy score is obtained from the RNNLM for each utterance (see the sketch after this slide).
- The entropy scores are then calibrated and fused.
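The following is a minimal sketch, not the authors' code, of how phone-gram units could be built from a recognizer's phoneme sequence, encoded with a 1-of-N (one-hot) codification, and scored with a per-utterance entropy. The phoneme sequence, vocabulary and probability values are illustrative assumptions.

```python
import numpy as np

def phone_grams(phonemes, order):
    """Concatenate `order` adjacent phonemes into one phone-gram unit."""
    return ["_".join(phonemes[i:i + order]) for i in range(len(phonemes) - order + 1)]

def build_vocab(units):
    """Map each distinct phone-gram to an index for 1-of-N coding."""
    return {u: i for i, u in enumerate(sorted(set(units)))}

def one_hot(unit, vocab):
    """1-of-N codification: a vector of length N with a single 1."""
    v = np.zeros(len(vocab))
    v[vocab[unit]] = 1.0
    return v

def utterance_entropy(token_probs):
    """Entropy score of an utterance: mean negative log2 probability
    assigned by the language model to each phone-gram."""
    return -np.mean(np.log2(token_probs))

# Toy phoneme sequence from one phonetic recognizer (hypothetical).
phonemes = ["k", "a", "s", "a", "b", "l", "a"]
units = phone_grams(phonemes, 1) + phone_grams(phonemes, 2) + phone_grams(phonemes, 3)
vocab = build_vocab(units)
x = one_hot(units[0], vocab)  # example input vector for the RNN LM

# Probabilities a trained RNNLM-P might assign to each unit (made up here).
probs = np.random.uniform(0.05, 0.9, size=len(units))
print("entropy score:", utterance_entropy(probs))
```

In the full system this score is computed once per phonetic recognizer and per target-language model; the resulting score vectors are what gets calibrated and fused.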
Summary of the paper
Phonetic vector representation for vocabulary reduction:
- Neural embedding models are trained with the Skip-gram model, working at the phone level.
- K-means is used to group phone-grams into a reduced vocabulary (see the sketch after this slide).
- A 7.3% improvement is obtained thanks to the vocabulary reduction.
The most relevant RNN parameters considered:
- Number of neurons in the state (hidden) layer (NNE).
- Number of classes (NCS): phone-grams are grouped in the output layer through a factorization process; a high NCS value speeds up RNN training, but the final language model is less accurate.
- Number of previous state layers (MEM): how many previous time steps of context information are taken into account.
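A minimal sketch of the vocabulary-reduction idea, assuming gensim (>= 4) and scikit-learn rather than the authors' toolchain: phone-gram embeddings are learned with the Skip-gram model and then grouped with K-means, so each phone-gram is replaced by its cluster id. The sentences and parameter values below are illustrative only.

```python
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

# Each "sentence" is the phone-gram sequence of one utterance (toy data).
sentences = [
    ["k", "a", "k_a", "s", "a_s", "k_a_s"],
    ["b", "a", "b_a", "l", "a_l", "b_a_l"],
]

# Skip-gram (sg=1) embeddings learned at the phone level.
emb = Word2Vec(sentences, vector_size=16, window=3, min_count=1, sg=1)

# Cluster the embedding vectors; each cluster becomes one unit of the
# reduced vocabulary used by the RNN language model.
units = list(emb.wv.index_to_key)
X = emb.wv[units]
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Mapping from original phone-gram to its cluster id.
reduced_vocab = {u: int(c) for u, c in zip(units, kmeans.labels_)}
print(reduced_vocab)
```

Clustering in embedding space keeps phone-grams that occur in similar contexts together, which is what allows the vocabulary to shrink without a large loss in modelling accuracy.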
Results (Cavg, lower is better)

Single systems:
  System     Cavg
  MFCCs       7.60
  PPRLM      11.57
  RNNLM-P    10.87

Fused systems:
  System                    Cavg    Improvement (%)
  RNNLM-P + PPRLM          10.51     9.2
  PPRLM + MFCCs             5.10    32.9
  RNNLM-P + MFCCs           5.04    33.7
  RNNLM-P + PPRLM + MFCCs   4.80    36.8

Odyssey 2016 - Bilbao
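The fused rows above come from the calibration-and-fusion step mentioned earlier. The paper does not detail the tool used here; below is a minimal sketch of one common way to do it, linear logistic regression over the per-system scores with scikit-learn. The score matrix and labels are fabricated for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [RNNLM-P score, PPRLM score, MFCC/acoustic score] for one trial.
scores = np.array([
    [0.9, 0.8, 0.7],   # target-language trial (hypothetical)
    [0.2, 0.3, 0.4],   # non-target trial
    [0.8, 0.6, 0.9],
    [0.1, 0.2, 0.3],
])
labels = np.array([1, 0, 1, 0])  # 1 = target language, 0 = non-target

# The learned weights act as fusion coefficients; the bias term calibrates
# the fused score so it can be thresholded consistently across systems.
fusion = LogisticRegression().fit(scores, labels)
fused = fusion.decision_function(scores)  # calibrated fused scores
print(fused)
```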