morris@idiap.ch http://www.idiap.ch/

Missing-data masks in all-combinations multi-band decoding

It is shown that, for MAP decoding with all-combinations experts:
• multi-band expert weighting can make use of the same soft missing-data mask as is used in "missing-data" ASR
• experts must be combined during, not before, decoding

RESPITE meeting, 25-26 Jan 2002, Page 1
MAP decoder architectures

MAP decoding => experts combined during Viterbi

• All-combination multi-band SMD HMM/MLP (blocks: FE, MLP experts, soft MD mask, decoder → "hello?"): separate MLPs estimate state posteriors for every combination of sub-bands.
• All-combination multi-band SMD HMM/GMM (blocks: FE, GMM, soft MD mask, decoder → "hello?"): a single GMM uses marginal PDFs to estimate state posteriors for every combination of sub-bands.
• Usual missing-data ASR = SMD HMM/GMM (blocks: FE, GMM, soft MD mask, decoder → "hello?"): a single GMM uses marginal PDFs to estimate state densities for each data coefficient.
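The all-combination HMM/GMM expert (second architecture) can be sketched as follows. This is a minimal sketch, assuming a diagonal-covariance Gaussian mixture per state, so that marginalising a sub-band out simply drops its coefficients from the likelihood. The function names, the band layout and the toy numbers are hypothetical, not the setup used in the reported tests.

```python
import itertools
import math

def log_gauss_diag(x, mean, var, dims):
    # Log density of a diagonal Gaussian marginalised onto the coefficients in `dims`;
    # with diagonal covariance, marginalising just drops the other coefficients.
    return sum(-0.5 * (math.log(2 * math.pi * var[d]) + (x[d] - mean[d]) ** 2 / var[d])
               for d in dims)

def log_gmm_marginal(x, gmm, dims):
    # gmm: list of (weight, mean, var) components; log-sum-exp over components.
    terms = [math.log(w) + log_gauss_diag(x, m, v, dims) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def all_combination_posteriors(x, state_gmms, priors, bands):
    # Returns {band_subset: [P(q | x restricted to subset) for each state q]}
    # for every subset of bands, including the empty one (posterior = prior).
    out = {}
    for r in range(len(bands) + 1):
        for subset in itertools.combinations(range(len(bands)), r):
            dims = [d for g in subset for d in bands[g]]
            logs = [math.log(p) + log_gmm_marginal(x, gmm, dims)
                    for gmm, p in zip(state_gmms, priors)]
            top = max(logs)
            z = sum(math.exp(lv - top) for lv in logs)
            out[subset] = [math.exp(lv - top) / z for lv in logs]
    return out

# Hypothetical toy example: 2 states, 4 coefficients split into 2 bands.
state_gmms = [
    [(1.0, [0, 0, 0, 0], [1, 1, 1, 1])],
    [(0.5, [2, 2, 2, 2], [1, 1, 1, 1]), (0.5, [3, 3, 3, 3], [1, 1, 1, 1])],
]
priors = [0.5, 0.5]
bands = [[0, 1], [2, 3]]
post = all_combination_posteriors([0.1, -0.2, 2.5, 2.8], state_gmms, priors, bands)
```

One GMM per state serves all 2^G experts, which is the practical attraction of the GMM route over training a separate MLP per band combination.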
All-combinations posteriors-based decoder can make use of the same mask as the usual missing-data decoder

Notation
• $Q$: state sequence for one utterance
• $X$: spectrotemporal signal for one utterance
• $\hat\Omega = \{\hat\omega_{f,t}\}$, $\hat\omega_{f,t} = P(x_{f,t}\ \text{clean})$: estimated SMD mask
• $M = \{\mu_{g,t}\}$, $\mu_{g,t} = 1 \Leftrightarrow \text{band}_{g,t}$ clean: MD indicator mask ($(g,t) \in M$ denotes $\mu_{g,t} = 1$)
• $P(c) = P(X_{clean})$

Usual missing-data (GMM) MAP objective
$$\hat{Q} = \arg\max_Q E[P(Q \mid X, \Theta)], \qquad E[P(Q \mid X, \Theta)] \propto P(Q) \int p(X \mid Q)\, p(X \mid X_{obs})\, dX$$
$$p(X \mid X_{obs}) = P(c)\, \delta(X - X_{obs}) + (1 - P(c))\, U(0, X_{obs})$$

Posteriors-based (MLP) MAP objective previously tested (assumes $\hat\Omega = 0.5$)
$$\hat{Q} = \arg\max_{Q,M} P(Q \mid X, M)$$

Posteriors-based (MLP) MAP objective using the MD mask (makes use of $\hat\Omega$)
$$\hat{Q} = \arg\max_{Q,M} P(Q, M \mid X, \hat\Omega) = \arg\max_{Q,M} P(Q \mid X, M)\, P(M \mid X)$$
$$P(M \mid X) \cong \prod_{(g,t) \in M} \omega_{g,t} \prod_{(g,t) \notin M} (1 - \omega_{g,t})$$

With $\omega_{g,t} = 0.5$ everywhere, $P(M \mid X)$ is the same for every $M$, so this reduces to the previously tested objective.

During Viterbi, each frame selects a single expert.
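For the usual missing-data objective, the integral against $p(X \mid X_{obs})$ has a closed form per coefficient when each state density is a diagonal Gaussian: the clean term evaluates the density at $x_{obs}$, and the uniform term averages it over $[0, x_{obs}]$. A minimal sketch, assuming a single diagonal Gaussian per state, independent coefficients, positive (spectral energy) observations, and the per-coefficient mask value $\hat\omega_{f,t}$ standing in for $P(c)$. The function names are hypothetical.

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def smd_coeff_likelihood(x_obs, omega, mean, var):
    # E[p(x | q)] under p(x | x_obs) = omega * delta(x - x_obs) + (1 - omega) * U(0, x_obs).
    # Requires x_obs > 0, since the uniform term covers [0, x_obs].
    sd = math.sqrt(var)
    present = math.exp(-0.5 * (x_obs - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
    # Average of the Gaussian density over [0, x_obs] (bounded marginalisation).
    bounded = (norm_cdf((x_obs - mean) / sd) - norm_cdf(-mean / sd)) / x_obs
    return omega * present + (1.0 - omega) * bounded

def smd_frame_log_likelihood(x_obs, omegas, means, variances):
    # Diagonal Gaussian and independent coefficients: the expectation factorises.
    return sum(math.log(smd_coeff_likelihood(x, w, m, v))
               for x, w, m, v in zip(x_obs, omegas, means, variances))
```

At $\omega = 1$ this is the ordinary Gaussian likelihood; at $\omega = 0$ it is pure bounded marginalisation.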
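The mask-weighted posteriors-based objective can be sketched as a Viterbi search in which, at every frame and state, the expert (band subset $m$) maximising $P(q \mid x_t, m)\,P(m \mid x_t)$ is selected. This sketch rests on several assumptions: expert outputs are frame-independent; posteriors are divided by state priors to give scaled likelihoods, as in hybrid HMM/ANN decoding; the priors double as the initial state distribution; and all names are hypothetical.

```python
import math

def mask_prob(subset, omega_t):
    # P(m | x_t) ~= prod_{g in m} omega(g,t) * prod_{g not in m} (1 - omega(g,t))
    p = 1.0
    for g, w in enumerate(omega_t):
        p *= w if g in subset else 1.0 - w
    return p

def safe_log(v):
    return math.log(max(v, 1e-300))

def viterbi_with_expert_selection(expert_post, omega, log_trans, priors):
    """expert_post[t][m][q] = P(q | x_t, m) for band subset m (tuple of band indices);
    omega[t][g] = soft band mask; log_trans[i][j] = log P(j | i); priors[q] = P(q)."""
    T, S = len(expert_post), len(priors)
    score, chosen = [], []
    for t in range(T):
        row_score, row_expert = [], []
        for q in range(S):
            # Expert maximising P(q | x_t, m) * P(m | x_t) for this frame and state.
            m = max(expert_post[t], key=lambda k: expert_post[t][k][q] * mask_prob(k, omega[t]))
            # Log scaled likelihood P(q | x_t, m) / P(q), plus log P(m | x_t).
            row_score.append(safe_log(expert_post[t][m][q] / priors[q])
                             + safe_log(mask_prob(m, omega[t])))
            row_expert.append(m)
        score.append(row_score)
        chosen.append(row_expert)
    # Standard Viterbi recursion over the per-frame scores.
    delta = [safe_log(priors[q]) + score[0][q] for q in range(S)]
    back = []
    for t in range(1, T):
        ptr = [max(range(S), key=lambda i: delta[i] + log_trans[i][j]) for j in range(S)]
        delta = [delta[ptr[j]] + log_trans[ptr[j]][j] + score[t][j] for j in range(S)]
        back.append(ptr)
    q = max(range(S), key=lambda s: delta[s])
    path = [q]
    for ptr in reversed(back):
        q = ptr[q]
        path.append(q)
    path.reverse()
    # One selected expert per frame along the decoded path.
    return path, [chosen[t][q] for t, q in enumerate(path)]

# Hypothetical toy input: 2 bands, band 0 likely clean, band 1 likely noisy.
post_t = {(): [0.5, 0.5], (0,): [0.9, 0.1], (1,): [0.2, 0.8], (0, 1): [0.6, 0.4]}
path, experts = viterbi_with_expert_selection(
    [post_t] * 3, [[0.9, 0.1]] * 3, [[math.log(0.5)] * 2] * 2, [0.5, 0.5])
```

Setting every $\omega_{g,t}$ to 0.5 makes `mask_prob` identical for every subset, so the selection reduces to the largest expert posterior, i.e. the previously tested objective.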
Soft missing-data mask for posteriors-based decoder

Per-coefficient soft mask
[Figure: P(x_coeff(f,t) missing), for coefficients f = 1…6 and frames t = 1…8]
$$\hat\Omega_{coeffs} = \{\hat\omega_{f,t}\}, \qquad \hat\omega_{f,t} = P(x_{f,t}\ \text{clean})$$
$$P(c) = P(X_{clean}) \cong \prod_{f,t} \omega_{f,t}$$

Per-band mask
[Figure: P(x_band(g,t) missing), for bands g = 1…2 and frames t = 1…8]
If P(band clean) = P(all components in band are clean), then
$$\hat\Omega_{band} = \{\omega_{g,t}\}, \qquad \omega_{g,t} = \prod_{f \in g} \omega_{f,t}$$
$$P(M \mid X) \cong \prod_{(g,t) \in M} \omega_{g,t} \prod_{(g,t) \notin M} (1 - \omega_{g,t})$$
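The mask computations follow directly from the formulas. A minimal sketch, assuming independent coefficients and two contiguous bands of three coefficients each (a hypothetical layout matching the 6-coefficient, 2-band, 8-frame example; indices are 0-based in code). The function names and mask values are illustrative only. `math.prod` needs Python 3.8 or later.

```python
import math

def band_mask(omega_coeff, bands):
    # omega_coeff[f][t] = P(x(f,t) clean); bands[g] lists the coefficients f in band g.
    # omega(g,t) = prod_{f in g} omega(f,t): a band is clean iff all its coefficients are.
    n_frames = len(omega_coeff[0])
    return [[math.prod(omega_coeff[f][t] for f in band) for t in range(n_frames)]
            for band in bands]

def p_all_clean(omega_coeff):
    # P(c) = P(X_clean) ~= product of every coefficient's clean probability.
    return math.prod(w for row in omega_coeff for w in row)

def p_mask(clean_cells, omega_band):
    # P(M | X) ~= prod_{(g,t) in M} omega(g,t) * prod_{(g,t) not in M} (1 - omega(g,t))
    p = 1.0
    for g, row in enumerate(omega_band):
        for t, w in enumerate(row):
            p *= w if (g, t) in clean_cells else 1.0 - w
    return p

# Hypothetical toy mask: 6 coefficients x 8 frames, bands {0,1,2} and {3,4,5}.
omega_coeff = [[0.9] * 8 for _ in range(3)] + [[0.4] * 8 for _ in range(3)]
bands = [[0, 1, 2], [3, 4, 5]]
omega_band = band_mask(omega_coeff, bands)
all_clean = {(g, t) for g in range(2) for t in range(8)}
```

Because the band mask is built from the same coefficients, $P(c)$ equals $P(M \mid X)$ for the mask in which every $(g,t)$ is clean.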
Test results are promising, even when $\Omega = 0.5$ ($\hat{Q} = \arg\max_{Q,M} P(Q \mid X, M)$)

[Figure: WER (%) vs. SNR (dB: 0, 10, 20, clean); legend: baseline, MAP FC, SMD; curves labelled 1-3]

Fig. shows WER (Aurora, averaged over 4 noise conditions) for:
1. baseline HMM/GMM
2. HMM/GMM AC multi-stream, $\hat{Q} = \arg\max_{Q,M} P(Q \mid X, M)$ (assumes $\hat\Omega = 0.5$). Stationary band mask. Streams are MFCC with 1st & 2nd differences.
3. Usual missing-data ASR = SMD HMM/GMM (SMD)