Experiments with Multisource Decoding and ‘A priori’ Fragments
Speech and Hearing Research Group, Dept. Computer Science, University of Sheffield, UK
June 6, 2002
Some Experiments with Multisource Decoding and ‘A priori’ Fragments
The Multisource System
[Figure: block diagram. Noisy speech (frequency vs time) passes through bottom-up processing to give coherent fragments; the multisource decoder performs a top-down search using speech models and speech-and-background fragments, producing a word-sequence hypothesis and a speech/background segregation hypothesis.]
Testing issues:
- Need highly non-stationary test data to properly test the approach.
- Need a strategy that allows the back-end to be tested in isolation from the front-end.
Jun 7, 2002
The Noise Sources
[Figure: spectrograms of the violin and drum maskers.]
- Violins
- Drums
- Speech (AURORA utterances of the opposite gender)
Constructing the test set
- Aurora test set A clean utterances ordered by length.
- 318 male/female pairs of matched length identified, i.e. 318 target utterances and 318 masking utterances.
- 10-second drum and violin extracts downsampled to 8 kHz and filtered with the G.712 filter.
- Drum and violin masking noises for each of the 318 targets cut from the 10-second extracts.
- AURORA targets and masking noise mixed so that the SNR averages 0 dB during target speech.
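The last step, mixing so that the SNR is 0 dB during target speech, amounts to choosing one gain for the noise per utterance. The sketch below shows one reading of that step, assuming the SNR is measured only over samples flagged as target speech; `mix_at_snr`, `snr_during_speech` and the boolean `speech_mask` (e.g. taken from endpoint labels) are hypothetical names, not part of the original tools.

```python
import numpy as np


def mix_at_snr(target, noise, speech_mask, snr_db=0.0):
    """Scale `noise` so that the target-to-noise power ratio, measured only
    over samples where `speech_mask` is True, equals `snr_db`.

    Assumption: `speech_mask` is a boolean array (same length as the signals)
    marking where the target is speaking. Returns (mixture, noise gain).
    """
    target = np.asarray(target, dtype=float)
    noise = np.asarray(noise, dtype=float)
    speech_mask = np.asarray(speech_mask, dtype=bool)
    if not (target.shape == noise.shape == speech_mask.shape):
        raise ValueError("target, noise and speech_mask must have the same shape")

    # Mean power of each signal, restricted to target-speech samples.
    p_target = np.mean(target[speech_mask] ** 2)
    p_noise = np.mean(noise[speech_mask] ** 2)

    # Choose gain g so that p_target / (g**2 * p_noise) = 10**(snr_db / 10).
    gain = np.sqrt(p_target / (p_noise * 10.0 ** (snr_db / 10.0)))
    return target + gain * noise, gain


def snr_during_speech(target, scaled_noise, speech_mask):
    """SNR in dB computed over the target-speech samples only."""
    return 10.0 * np.log10(np.mean(target[speech_mask] ** 2)
                           / np.mean(scaled_noise[speech_mask] ** 2))


if __name__ == "__main__":
    fs = 8000                                  # 8 kHz, as in the test set
    rng = np.random.default_rng(0)
    t = np.arange(fs) / fs                     # 1 s of signal
    speech_mask = (t > 0.25) & (t < 0.75)      # toy "speech" region
    target = np.where(speech_mask, np.sin(2 * np.pi * 200 * t), 0.0)
    noise = rng.standard_normal(fs)            # stand-in for a drum/violin cut
    mixture, gain = mix_at_snr(target, noise, speech_mask, snr_db=0.0)
    print(snr_during_speech(target, gain * noise, speech_mask))
```

Measuring power only during speech keeps leading and trailing silence from diluting the target power, which would otherwise make the effective SNR during speech lower than intended.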
Using A Priori Test Fragments
- Use knowledge of the signals prior to mixing to mark out a set of ‘ideal’ test fragments, i.e. each fragment contains energy from either the target or the mask.
[Figure: the multisource system diagram, with the bottom-up fragment generation replaced by a priori fragments derived from the separate speech and noise sources.]
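A minimal sketch of how such ‘ideal’ fragments could be marked out, assuming the premix target and masker energies are available on the same time-frequency grid: each cell is assigned to whichever source dominates it, and fragments are taken as 4-connected regions with the same dominant source, so every fragment is pure target or pure mask. The function name, the dominance rule and the connectivity choice are assumptions for illustration, not necessarily how the Sheffield fragments were built.

```python
from collections import deque

import numpy as np


def a_priori_fragments(target_energy, mask_energy):
    """Split a (freq x time) grid into fragments that are each dominated by a
    single source. Returns (fragment_ids, sources): fragment_ids holds a
    fragment index per cell; sources[k] is 'speech' or 'noise'."""
    target_energy = np.asarray(target_energy, dtype=float)
    mask_energy = np.asarray(mask_energy, dtype=float)
    dominant = target_energy > mask_energy      # True where the target dominates
    n_freq, n_time = dominant.shape
    ids = np.full(dominant.shape, -1, dtype=int)
    sources = []

    for f0 in range(n_freq):
        for t0 in range(n_time):
            if ids[f0, t0] != -1:
                continue                        # already part of a fragment
            label = dominant[f0, t0]
            k = len(sources)
            sources.append("speech" if label else "noise")
            ids[f0, t0] = k
            # Breadth-first flood fill over 4-connected cells with the same label.
            queue = deque([(f0, t0)])
            while queue:
                f, t = queue.popleft()
                for df, dt in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nf, nt = f + df, t + dt
                    if (0 <= nf < n_freq and 0 <= nt < n_time
                            and ids[nf, nt] == -1 and dominant[nf, nt] == label):
                        ids[nf, nt] = k
                        queue.append((nf, nt))
    return ids, sources


if __name__ == "__main__":
    target = [[5, 5, 0], [0, 0, 0], [0, 5, 5]]
    mask = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    ids, sources = a_priori_fragments(target, mask)
    print(ids)
    print(sources)  # ['speech', 'noise', 'speech']
```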
Example fragments for Speech + Drums
[Figure: panels showing the mixture, the a priori fragments and the correct segmentation; spectrograms from 50 Hz to 4 kHz over 0 to 2 s.]
Example fragments for Speech + Speech
[Figure: panels showing the mixture, the a priori fragments and the correct segmentation; spectrograms from 50 Hz to 4 kHz over 0 to 2 s.]
Recognition Results
Masker:          speech   violins   drums
Standard           28.6       7.9    -8.0
Soft MD            30.1      54.3    47.0
Adaptive           29.4      76.7    45.0
a priori MD        94.8      94.0    94.4
fragments          42.4      65.4    58.6
i.e. disappointing results: there is insufficient information in the speech models to organise the fragments.
Gender Dependency
[Figure: mixture of a male utterance "2 5 2 8 3" and a female utterance "7 0 9 4" (spectrogram, 50 Hz to 4 kHz, 0 to 2 s) with the correct segmentation (blue = male, red = female). Male hypothesis: "− 5 − − 3"; female hypothesis: "7 0 9 4".]
Awaiting results...
High Frequency Recruitment
The decoder is given the correct segmentation in the low-frequency region. Can it selectively recruit the correct high-frequency fragments?
[Figure: mixture "2 5 2 8 3" + "7 0 9 4" (spectrogram, 50 Hz to 4 kHz, 0 to 2 s) showing the correct speech fragments; decoder hypothesis: "8 5 3 8 3".]
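Recruitment can be pictured as a search over segregation hypotheses: with the low-frequency segmentation fixed, each candidate high-frequency fragment is labelled speech or background, and the labelling the decoder scores best wins. The sketch below enumerates every labelling against a hypothetical `score` callable standing in for the speech-model likelihood; exhaustive enumeration is exponential in the number of fragments, so it only illustrates the search space, not how a practical decoder would organise it.

```python
from itertools import product


def recruit_fragments(candidates, score):
    """Try every speech/background labelling of the candidate high-frequency
    fragments and return (best_speech_set, best_score).

    `score(speech_set)` is a hypothetical stand-in for the decoder's score of
    the best word sequence given the fixed low-frequency segmentation plus
    the fragments in `speech_set`. Ties keep the first labelling found.
    """
    best_set, best_score = frozenset(), float("-inf")
    # Each tuple of booleans is one segregation hypothesis: True = speech.
    for labels in product((False, True), repeat=len(candidates)):
        chosen = frozenset(c for c, is_speech in zip(candidates, labels) if is_speech)
        s = score(chosen)
        if s > best_score:
            best_set, best_score = chosen, s
    return best_set, best_score


if __name__ == "__main__":
    truly_speech = {"h1", "h3"}

    def toy_score(chosen):
        # Toy score: reward true target fragments, penalise the rest.
        # A real score would come from the speech models.
        return len(chosen & truly_speech) - len(chosen - truly_speech)

    print(recruit_fragments(["h1", "h2", "h3", "h4"], toy_score))
```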
Hi Freq Recruitment Results
                         speech   violins   drums
full apriori               94.8      94.4    94.4
low freq apriori           78.5      79.7    76.2
low freq + fragments       86.9      88.0    85.3
Kind of works, but we need to check that the results are more than just chance.
Suggests that the multisource decoder may work if a subset of the fragments can be identified as speech prior to decoding.
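Two quick checks on the table above, sketched under stated assumptions: the fraction of the gap between "low freq apriori" and "full apriori" that recruitment closes, and a two-sided exact sign test, one standard way to ask whether per-utterance improvements exceed chance. The sign test needs paired per-utterance outcomes that these slides do not report, so the counts in the demo are made up purely to show the call; the function names are hypothetical.

```python
from math import comb

# Accuracies from the Hi Freq Recruitment Results table, per masker.
FULL_APRIORI = {"speech": 94.8, "violins": 94.4, "drums": 94.4}
LOW_APRIORI = {"speech": 78.5, "violins": 79.7, "drums": 76.2}
LOW_PLUS_FRAGMENTS = {"speech": 86.9, "violins": 88.0, "drums": 85.3}


def gap_recovered(full, low, low_plus):
    """Fraction of the (full a priori - low freq a priori) gap closed."""
    return (low_plus - low) / (full - low)


def sign_test_p(wins, losses):
    """Two-sided exact sign test. `wins`/`losses` count utterances on which
    one system beat / lost to the other; ties are assumed discarded."""
    n = wins + losses
    k = max(wins, losses)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    for masker in FULL_APRIORI:
        frac = gap_recovered(FULL_APRIORI[masker], LOW_APRIORI[masker],
                             LOW_PLUS_FRAGMENTS[masker])
        print(f"{masker:8s} {frac:.2f}")
    # Made-up counts: 15 utterances improved, 5 got worse.
    print(f"p = {sign_test_p(15, 5):.4f}")
```

For these table values the recruited fragments close roughly half of the gap (about 0.52, 0.56 and 0.50 for the speech, violin and drum maskers); whether that is more than chance still depends on the per-utterance comparison.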
Sequential Grouping
- Modelling of primitive grouping forces that occur between fragments.
- May be implemented by adding probabilities to decoding paths, cf. bigram/trigram language models.
- Need to be careful to ensure we preserve the Markov assumption, i.e. given the state, the future must be independent of the past.
[Figure: fragments and word hypotheses w1 to w8 along a time axis T1 to T6.]
Work in progress...
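A minimal sketch of the bigram analogy, assuming a simplified model in which fragments are taken in time order and each is labelled speech (S) or background (B): a grouping log-probability that depends only on the previous fragment's label is added along each path, so the label itself is a sufficient state and Viterbi search remains exact. The emission scores, transition values and `viterbi_labels` name are hypothetical; the real decoder also tracks word states.

```python
import math


def viterbi_labels(emission, transition, labels=("S", "B")):
    """Best speech (S) / background (B) labelling of time-ordered fragments.

    emission:   one dict per fragment, {label: log score of the fragment
                under that label} (hypothetical acoustic scores).
    transition: {(previous_label, label): log grouping probability}, playing
                the role a bigram plays in a language model. Because it
                depends only on the previous label, the Markov assumption holds.
    """
    # best[label] = (total log score, label path) of the best path ending in label
    best = {lab: (emission[0][lab], [lab]) for lab in labels}
    for em in emission[1:]:
        new_best = {}
        for cur in labels:
            prev = max(labels, key=lambda p: best[p][0] + transition[(p, cur)])
            score = best[prev][0] + transition[(prev, cur)] + em[cur]
            new_best[cur] = (score, best[prev][1] + [cur])
        best = new_best
    score, path = max(best.values(), key=lambda sp: sp[0])
    return path, score


if __name__ == "__main__":
    log = math.log
    emission = [{"S": log(0.9), "B": log(0.1)},
                {"S": log(0.4), "B": log(0.6)},
                {"S": log(0.9), "B": log(0.1)}]
    # Grouping favours keeping consecutive fragments in the same source.
    transition = {("S", "S"): log(0.8), ("B", "B"): log(0.8),
                  ("S", "B"): log(0.2), ("B", "S"): log(0.2)}
    path, score = viterbi_labels(emission, transition)
    print(path)  # ['S', 'S', 'S']
```

With uniform transitions the middle fragment would be labelled B on acoustic evidence alone; the grouping term pulls it into the surrounding speech source.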
Summary of Sheffield RESPITE work
- CTK
- Missing Data
- Development of data flow system
- Scripting language
- Decoder Tuning GUI
- Gender Dependency
- Efficient MD computation
- Soft Masks
- Adaptive noise estimates
- HMM Decoder
- Code maintenance
- Use of harmonicity information
- Multisource Decoder Development
- Model Combination
- Evaluations on Aurora
- Representations, e.g. log vs cube-root compression
- Multisource Decoder Theoretical Development
- Experiments with SNR masks
- Soft Fragments
- A priori fragments