Microphone Array Processing: A Quick Update
Iain McCowan, Guillaume Lathoud, Darren Moore, Olivier Masson
TAM May 2003 – p. 1/6
Outline
• Speech enhancement
• Speaker segmentation
• Files available online
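Speech enhancement
• Goal: improve enhancement of overlapping speech by post-filtering the beamformer outputs.
• Beamformer outputs: $y_n(f)$ for each speaker location $n = 1, \dots, N$.
• Post-filter A (Wiener-like, $|S|^2 / |N|^2$): each beam is weighted by the ratio of its own power to the mean power of the other $N-1$ beams:
$$\hat{y}_n(f) = \frac{|y_n(f)|^2}{\frac{1}{N-1}\sum_{m \neq n} |y_m(f)|^2}\, y_n(f) \tag{1}$$
• Post-filter B (binary mask): each frequency bin is kept only in the beam where it is strongest:
$$\hat{y}_n(f) = \begin{cases} y_n(f) & \text{if } n = \arg\max_m |y_m(f)| \\ 0 & \text{otherwise} \end{cases} \tag{2}$$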
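A minimal NumPy sketch of equations (1) and (2), assuming the beamformer outputs are held in a complex array `Y` whose first axis indexes the N speaker locations (one STFT frame of shape (N, F), or a stack of frames of shape (N, F, T)). The function names `postfilter_a` and `postfilter_b` and the small regulariser `eps` are hypothetical additions for illustration; the gain in (1) is applied exactly as written, with no flooring or smoothing.

```python
import numpy as np


def postfilter_a(Y, eps=1e-12):
    """Wiener-like post-filter, eq. (1).

    Y: complex array whose first axis indexes the N speaker locations,
    e.g. shape (N, F) for one frame or (N, F, T) for a spectrogram.
    Requires N >= 2.
    """
    N = Y.shape[0]
    if N < 2:
        raise ValueError("post-filter A needs at least two beams")
    power = np.abs(Y) ** 2
    # Mean power of the other N - 1 beams in each bin:
    # (sum over all beams - own beam) / (N - 1)
    others = (power.sum(axis=0, keepdims=True) - power) / (N - 1)
    gain = power / (others + eps)  # eps (an assumption) avoids division by zero
    return gain * Y


def postfilter_b(Y):
    """Binary-mask post-filter, eq. (2): keep each bin only in its strongest beam."""
    winner = np.argmax(np.abs(Y), axis=0)  # index of the loudest beam per bin
    beam_index = np.arange(Y.shape[0]).reshape((-1,) + (1,) * (Y.ndim - 1))
    mask = beam_index == winner[None, ...]
    return np.where(mask, Y, 0)
```

Post-filter B assigns every time-frequency bin to exactly one beam, which is consistent with the strong cross-talk reduction reported on the next slide; post-filter A instead scales each bin continuously.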
Speech enhancement
• Subjectively, post-filter B leads to a significant reduction in the cross-talk level.
• To verify this, we ran initial speech recognition experiments on MONC (Multi-channel Overlapping Numbers Corpus, a re-recording of Numbers 95).
• Note: the baseline lapel-microphone WER with no competing speech is 7.0%.
• Word error rates (%) with overlapping speech:

  Condition                   Lapel   Previous array best   Post-filter B
  One overlapping speaker     26.7    19.3                  12.2
  Two overlapping speakers    35.3    26.6                  15.8
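To put a number on "significant", the relative WER reductions implied by the MONC table can be computed directly. This is only arithmetic on the reported results; `relative_reduction` is a hypothetical helper name.

```python
# WER (%) from the MONC table: (lapel, previous array best, post-filter B)
results = {
    "one overlapping speaker": (26.7, 19.3, 12.2),
    "two overlapping speakers": (35.3, 26.6, 15.8),
}


def relative_reduction(baseline, new):
    """Relative WER reduction of `new` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline


for condition, (lapel, prev_array, pf_b) in results.items():
    print(f"{condition}: {relative_reduction(lapel, pf_b):.1f}% vs lapel, "
          f"{relative_reduction(prev_array, pf_b):.1f}% vs previous array")
```

With one overlapping speaker, post-filter B lowers WER by 54.3% relative to the lapel microphone and 36.8% relative to the previous best array; with two overlapping speakers the figures are 55.2% and 40.6%.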
Speaker Segmentation
• Previously, we presented work on segmenting meetings using location features.
• Since then, we have moved to clustering and segmentation across meetings using both location features and standard acoustic features:
  • segment in terms of location and identity (cluster index) concurrently;
  • use a multi-stream HMM to cluster in each feature space independently, while enforcing the same temporal segmentation for both;
  • the system automatically converges to the correct number of locations and identities.
• Initial results show high segmentation accuracy (≈ 95% frame accuracy).
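The core idea of separate models per feature stream but one shared state sequence can be sketched as a two-stream Viterbi decoder. This is a simplified illustration under stated assumptions, not the system described above: each HMM state has a fixed diagonal-Gaussian model per stream (location and acoustic), the per-state score is a weighted sum of the stream log-likelihoods using hypothetical `stream_weights`, and the clustering and re-estimation that let the real system converge to the correct number of locations and identities are omitted. Frame accuracy is taken to be the fraction of frames whose decoded state matches a reference labelling.

```python
import numpy as np


def gaussian_loglik(x, means, variances):
    """Diagonal-Gaussian log-likelihood of every frame under every state.

    x: (T, D) features; means, variances: (K, D). Returns a (T, K) array.
    """
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(
        np.log(2.0 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :],
        axis=2,
    )


def multistream_viterbi(loc_feats, ac_feats, loc_params, ac_params, log_trans,
                        stream_weights=(0.5, 0.5)):
    """Decode one state sequence shared by a location stream and an acoustic stream.

    loc_params / ac_params: (means, variances) for each stream, each of shape (K, D_stream).
    log_trans: (K, K) log transition matrix. stream_weights are hypothetical values.
    """
    # Each stream keeps its own state models, but both feed the same state score.
    ll = (stream_weights[0] * gaussian_loglik(loc_feats, *loc_params)
          + stream_weights[1] * gaussian_loglik(ac_feats, *ac_params))
    T, K = ll.shape
    delta = np.empty((T, K))
    back = np.zeros((T, K), dtype=int)
    delta[0] = ll[0] - np.log(K)  # uniform initial state probabilities
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans  # rows: previous state, cols: next state
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(K)] + ll[t]
    # Backtrack the single best path: this is the shared temporal segmentation.
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


def frame_accuracy(path, reference):
    """Fraction of frames whose decoded state equals the reference label."""
    return float(np.mean(np.asarray(path) == np.asarray(reference)))
```

Because both streams share one backtracked path, a change of speaker must be supported jointly by the location and acoustic evidence, which is what enforcing the same temporal segmentation means here.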
Files available online
• Now available on mmm.idiap.ch:
  • Beamformer outputs for post-filters A and B for each seated speaker location (1–4), Scripted Meeting set only.
  • Beamformer-B files have lower noise than Beamformer-A, though possibly more distortion.
• Beamformer outputs for the whiteboard and presentation regions are not yet available.
  • The current beams are too narrow for the typical movement in these regions; we are investigating a minimum beam-width constraint or adaptive techniques.
• A Beamformer-B mix file is available (BeamB-mix): a simple sum of the 4 speaker beamformer outputs.
  • Note that this mix does not yet cover whiteboard or presentation speech.
  • A low-level buzz is currently audible in this mix file; this will be fixed.
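As a sketch of what a "simple sum" mix involves, assuming the four per-speaker beamformer outputs have already been loaded as float arrays at the same sampling rate (file names and loading are left out). `mix_beamformer_outputs` is a hypothetical helper, and the optional peak normalization is an added assumption to avoid clipping, not part of the described BeamB-mix.

```python
import numpy as np


def mix_beamformer_outputs(signals, normalize=False):
    """Sum per-speaker beamformer output signals into a single mix.

    signals: sequence of 1-D float arrays, one per seated speaker location.
    All signals are truncated to the shortest one before summing.
    If normalize is True (an assumption), the mix is scaled so that its
    peak absolute value is at most 1.0.
    """
    length = min(len(s) for s in signals)
    mix = np.sum([np.asarray(s[:length], dtype=float) for s in signals], axis=0)
    if normalize:
        peak = np.max(np.abs(mix))
        if peak > 1.0:
            mix = mix / peak
    return mix
```

Summing four beams can exceed the range of the individual files, which is why the sketch offers optional scaling; the plain sum matches the description above.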