RESPITE: Tandem & multistream research

  1. RESPITE: Tandem & multistream research
     Dan Ellis, International Computer Science Institute, Berkeley CA
     Outline
     1 Tandem & LVCSR
     2 Mutual information for multistream design
     3 Other multistream work at ICSI
     4 Other projects:
       • Meeting recorder
       • LabROSA
       • CRAC workshop
     ICSI: RESPITE progress - Dan Ellis 2000-09-15

  2. Recent Tandem work (outline item 1)
     [Figure: tandem system block diagram. Labels: Input sound; plp and msg features; Neural net classifier (one per feature stream); combination (+); PCA orthog'n; Gauss mix models; HTK decoder; Words]
     Relative improvements annotated on the diagram:
     - Combo over msg: +20%
     - Combo over plp: +20%
     - Combo over mfcc: +25%
     - Pre-nonlinearity over posteriors: +12%
     - KLT over direct: +8%
     - Combo-into-HTK over Combo-into-noway: +15%
     - NN over HTK: +15%
     - Tandem over HTK: +35%
     - Tandem over hybrid: +25%
     - Tandem combo over HTK mfcc baseline: +53%
     • Aurora 2000 (mismatched test conditions)
       - normalization much more important: online?
       - baseline WER ratio (smaller is better):
         System             Matched test   Medium mismatch   High mismatch
         plp, utt-norm      78%            69%               63%
         tandem, utt-norm   63%            73%               52%
         tandem, onl-norm   74%            81%               64%
         (Pratibha Jain, OGI)
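
The feature path named on this slide (pre-nonlinearity net outputs, orthogonalized by PCA/KLT before the Gaussian-mixture models) can be sketched roughly as below. This is a minimal NumPy illustration, not the ICSI/OGI implementation: the function name `tandem_features`, the array shapes, and the random stand-in for net outputs are all assumptions made for the example.

```python
import numpy as np

def tandem_features(pre_softmax, n_keep=None):
    """Decorrelate pre-nonlinearity net outputs with PCA (KLT).

    pre_softmax: (n_frames, n_classes) array of linear net outputs,
                 i.e. activations taken before the final softmax.
    Returns (features, basis, mean).
    """
    mean = pre_softmax.mean(axis=0)
    centered = pre_softmax - mean
    cov = np.cov(centered, rowvar=False)
    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    basis = eigvecs[:, order]
    if n_keep is not None:
        basis = basis[:, :n_keep]
    return centered @ basis, basis, mean

# Hypothetical stand-in for net outputs: correlated random activations.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(24, 24))
fake_net_outputs = rng.normal(size=(2000, 24)) @ mixing

feats, basis, mean = tandem_features(fake_net_outputs)
feat_cov = np.cov(feats, rowvar=False)
```

After the projection the feature covariance is (numerically) diagonal, which is the property that suits diagonal-covariance Gaussian mixtures. The resulting `feats` would then be handed to a conventional GMM-HMM trainer in place of the usual cepstral features.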

  3. Tandem for LVCSR
     • DARPA SPINE task (spont. noisy)
     • Collaboration with OGI & CMU
       - tandem needs GMM-HMM expertise!
     • Tight timescale
       - Tandem system not optimized, one stream
     • Evaluation submitted, results not yet official
       - unofficial WERs:
         MFCC/SPHINX:                35%
         Tandem/SPHINX:              30.1%
         full-up CMU (ROVER+MLLR):   26.5%
         CMU + Tandem (ROVER):       25.7%
     • Conclusions:
       - Tandem from CI labels still tractable for LV
       - improvements may not be so dramatic
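
To put the unofficial WERs above on the same footing as the relative improvements quoted on the previous slide, the arithmetic can be worked as follows. The helper name `relative_reduction` is an assumption for this sketch; the four WER figures are the ones listed on the slide.

```python
def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction, as a percentage of the baseline WER."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

# Unofficial SPINE WERs from the slide (percent).
sphinx_gain = relative_reduction(35.0, 30.1)   # Tandem/SPHINX vs. MFCC/SPHINX
rover_gain = relative_reduction(26.5, 25.7)    # CMU + Tandem vs. full-up CMU

print(f"Tandem vs. MFCC (SPHINX): {sphinx_gain:.1f}% relative")
print(f"CMU + Tandem vs. full-up CMU: {rover_gain:.1f}% relative")
```

This gives roughly 14% relative for the single-system comparison and roughly 3% once tandem is added to the full CMU system, consistent with the conclusion that the improvements are less dramatic here.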

  4. Current Tandem work
     • Aurora 2000: Cross-language
       - training Finnish & Italian systems
       - union of all phone sets?
       - clustering of cross-language phones
     • Other targets for neural net training
       - HMM states
       - articulatory targets
     • System variants
       - ‘mixture of posteriors’, Rottland & Rigoll (2000)
       [Figure: system block diagram. Labels: Input sound; Feature calculation; Speech features; Neural net classifier; Phone probabilities; mixture weights; sub-phone states; HMM decoder; Words]
     • Transfer to DC

  5. Mutual Info for multistream design (outline item 2)
     • Combination best for complementary streams
     • Try to predict by looking at Mutual Information:
       - low classification MI implies different information
     • Can also use to choose combination point
       - feature combination (concatenation) for streams with interdependence (high feature MI)
       - else posterior (post-classifier) combination
     [Figure: two scatter plots of feature-element pairs, x-axis PLPa:2, y-axes MSGa:14 and MSGa:2. Panel titles: Hx = 7.30, Hy = 6.99, MI = 0.52; Hx = 7.17, Hy = 6.88, MI = 0.03]
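
The Hx, Hy and MI values quoted for the scatter plots can be estimated from paired feature values with a simple histogram (plug-in) estimator, sketched below. The slide does not say which estimator or bin count was used, so the function names, the 32-bin choice, and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def entropy_bits(p):
    """Entropy in bits of a discrete distribution (zero cells ignored)."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information_bits(x, y, bins=32):
    """Histogram estimate of H(x), H(y) and MI(x; y) = Hx + Hy - Hxy."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1)   # marginal over y
    py = pxy.sum(axis=0)   # marginal over x
    hx = entropy_bits(px)
    hy = entropy_bits(py)
    hxy = entropy_bits(pxy.ravel())
    return hx, hy, hx + hy - hxy

# Synthetic stand-ins for two feature elements (not real PLP/MSG data).
rng = np.random.default_rng(0)
a = rng.normal(size=20000)
dependent = 0.7 * a + 0.7 * rng.normal(size=20000)
independent = rng.normal(size=20000)

_, _, mi_dep = mutual_information_bits(a, dependent)
_, _, mi_ind = mutual_information_bits(a, independent)
```

Under this sketch an interdependent pair yields a clearly higher MI than an independent one, which is the contrast the two panels illustrate. Histogram estimates are biased upward for small sample counts, so values near zero should be read with care.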

  6. MI for multistream: results
     [Figure: Conditional Mutual Information between feature streams PLPa, PLPb, MSGa, MSGb; scale 0 to 0.5 CMI/bits]
     Stream 1   Stream 2   Feature CMI   Classif CMI   FC WER ratio   PC WER ratio
     PLPa       PLPb       0.04          0.26          89.6%          97.6%
     MSGa       MSGb       0.21          0.25          85.8%          101.1%
     PLPa       MSGb       0.11          0.15          78.1%          86.3%
     PLPb       MSGa       0.09          0.24          87.5%          89.7%
     • Low Classif. CMI correlates with good pairs
     • PC vs. FC more complex than Feature CMI
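
The first bullet can be checked directly against the table. The short sketch below only re-reads the four rows above; the tuple layout and variable names are assumptions for the example, and four data points are far too few to call this more than a sanity check.

```python
# (stream 1, stream 2, feature CMI, classifier CMI, FC WER ratio %, PC WER ratio %)
rows = [
    ("PLPa", "PLPb", 0.04, 0.26, 89.6, 97.6),
    ("MSGa", "MSGb", 0.21, 0.25, 85.8, 101.1),
    ("PLPa", "MSGb", 0.11, 0.15, 78.1, 86.3),
    ("PLPb", "MSGa", 0.09, 0.24, 87.5, 89.7),
]

best_by_classif_cmi = min(rows, key=lambda r: r[3])
best_by_fc = min(rows, key=lambda r: r[4])
best_by_pc = min(rows, key=lambda r: r[5])

# Does feature combination (FC) beat posterior combination (PC) for every pair?
fc_always_better = all(r[4] < r[5] for r in rows)

print("Lowest classifier CMI:", best_by_classif_cmi[:2])
print("Lowest FC WER ratio:  ", best_by_fc[:2])
print("Lowest PC WER ratio:  ", best_by_pc[:2])
```

The pair with the lowest classifier CMI (PLPa with MSGb) is also the pair with the lowest WER ratio under both combination schemes. Feature combination gives the lower ratio in every row regardless of feature CMI, which is why the PC vs. FC choice looks more complex than feature CMI alone.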

  7. Other Multistream work: Multifeature combination (Mike Shire) (outline item 3)
     • LDA design of condition-dependent features:
       [Figure: feature processing chain for train and test speech. Labels: Critical Band, Log Power, LDA, RASTA Filter, Scale + Smooth, LPC Cepstra. Plots of LDA filter 1 for clean, light and heavy conditions, against seconds and Hz (dB)]
     • Combine various conditions, test on all:
       [Figure: LDA-RASTA-PLP: Combining CLEAN and REVERB. Panels: Frame Accuracy (%) and Word Error Rate (WER%) against reverb/clean weighting (0.1 to 0.9), one curve per T60 from 0.25 (mild) to 2.5 (severe)]

  8. ‘Oracle nets’ for FC multistream (Barry Chen)
     4 bands → 15 combinations (+priors):
     • (smoothed) ‘oracle’ choice halves WER
     • Can we train a net to make ‘oracle’ choice?
       - based on KL distance between posteriors?
       System                           Word Error Rate
       best net (4 band)                5.1%
       phone-smoothed oracle            2.7%
       KL oracle-net weighted streams   4.9%
     • Not much help in practice...
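
The "4 bands → 15 combinations" count is the number of non-empty subsets of four bands, and the KL distance mentioned above is a divergence between posterior vectors. The sketch below shows both; the slide does not specify how the KL distance was used, so the symmetric form, the `eps` smoothing, and the example posteriors are assumptions.

```python
from itertools import combinations
import math

# Every non-empty subset of the 4 bands is one feature-combination net.
bands = [1, 2, 3, 4]
subsets = [c for k in range(1, len(bands) + 1) for c in combinations(bands, k)]
print(len(subsets), "combinations")

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p), in nats.

    eps guards against log(0); it is an assumption of this sketch.
    """
    return sum((pi - qi) * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))

# Hypothetical per-frame phone posteriors from two different nets.
post_a = [0.7, 0.2, 0.1]
post_b = [0.2, 0.5, 0.3]
same = symmetric_kl(post_a, post_a)
different = symmetric_kl(post_a, post_b)
```

A distance of this kind is zero when two nets agree and grows as their posteriors diverge, so it is one plausible input for a net that tries to predict the ‘oracle’ choice.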

  9. Other projects: Meeting recorder (outline item 4)
     • ASR in conventional meeting environments
       - for transcription/summarization/retrieval
       - distant acoustics!
       - informal, overlapped speech (c/w ShATR)
     • Data collection:
       - wired room at ICSI
       - other systems at UW ...

  10. Meeting Recorder (cont’d)
     • Preliminary analysis
       - transcription & forced alignment (IBM)
       - ground truth in turns/overlaps
       - preliminary distant-mic recordings
     • Research areas
       - meeting dialog: overlaps, turns etc.
       - language modeling for meetings
       - feature design for distant acoustics
     • Future support
       - DARPA ‘ROAR’ program?

  11. LabROSA: The Laboratory for Recognition and Organization of Speech and Audio
     • New research group at Columbia University in the City of New York
       - existing EE dept. signal processing group
       - addition of speech/audio for true multimedia
     • Research: extracting information from sound
       - real-world ASR
       - higher-order: speaker ID, dialog structure
       - nonspeech: events, acoustic environment ID
     • Recruiting students

  12. CRAC2001: “Consistent and Reliable Acoustic Cues for speech and sound analysis”
     • RESPITE Contractual Obligation Workshop:
       - Identifying sources/info (CASA, BSS, SNR est)
       - Robust ASR (MD, MS, compensation)
       - Nonspeech, music applications
       - Psychoacoustics of perception in noise
       - Combinations
     • Satellite event at Eurospeech-2001, Aalborg
       - held on Sunday 2001-09-02 (day before) at Eurospeech location
       - separate registration
     • Workshop structure
       - lecture + posters, am + pm, discussion
       - limit to ~40 participants

  13. CRAC2001 (cont’d)
     • Organizing committee
       - Dan & Martin, co-chairs
       - Fred, Phil, Andy
       - Andrzej Drygajlo (EPFL) & H. Okuno (CASA)
     • Timetable:
       - CFP: imminent
       - Abstracts: April 30th, 2001
     • Actions:
       - help with publicity
       - plan your submission!

