SPSC - Microphone Array Processing for Distant Speech Recognition

Microphone Array Processing for Distant Speech Recognition
From close-talking microphones to far-field sensors

Hannes Pessentheiner
Signal Processing and Speech Communication Laboratory
Advanced Signal Processing 2

Distant Speech-Interaction in Robust Home Applications

◮ people who would like assistance in everyday life
◮ physically handicapped people

Problem: they want to live independently
- lack of privacy
- dependence on other people

Solution: ambient assisted living (AAL)
- operated by device
- operated by voice command

Figure: AAL scenario of a handicapped woman.

What is a main challenge in voice command?

Distant Speech Recognition (DSR)

◮ most natural human-computer interface
- interaction through speech
- no use of body- or head-mounted microphones

Figure: Block diagram of a simple DSR system.

How does speech capturing work?

Speech Capturing

◮ free field / diffuse field
◮ spherical / planar wave propagation

Figure: Propagation of a spherical (left) and a plane (right) wave.

What to do with multi-channel data?

Speech Enhancement: Source localization & tracking

◮ estimate the speaker's position / direction at each time instant
◮ compute the trajectory of the instantaneous position estimates

Figure: Sound capturing (left), speaker localization (center), and speaker tracking (right).

How to employ directional information?

Speech Enhancement: Beamforming

How to improve the pre-enhanced signal?

Speech Enhancement: Postfiltering

Beamformer-related Postfilter (among others)
◮ Generalized Sidelobe Canceller (GSC): filter-and-sum beamformer $\mathbf{w}^H$ with parallel filter $\mathbf{q}$

Autonomous Postfilter (among others)
◮ single-channel source separation filter
◮ echo & noise attenuation filter
◮ spectral subtraction

What about errors in DSR systems?

Errors in DSR

Major Errors
◮ front-end errors
◮ corrupted training material for single-channel source separation or the speech recognizer

Minor Errors (among others)
◮ distorted features
◮ numerical accuracy (single/double precision)

Figure: Representative front-end errors.

How to measure the performance of a DSR system?

Metrics

Word Error Rate (WER)

\[ \mathrm{WER} = \frac{S + D + I}{S + D + C} \]

where
I ... number of insertions
S ... number of substitutions
D ... number of deletions
C ... number of correct words

Word Accuracy Rate (WACC)

\[ \mathrm{WACC} = 1 - \mathrm{WER} \]

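The WER formula can be illustrated with a minimal sketch. It assumes that S, D, I, and C are counted along a minimum-edit-distance (Levenshtein) alignment of the reference and the recognized word sequence, which is the usual convention but is not stated on the slide. The function names and the example sentences are hypothetical.

```python
def wer_counts(ref, hyp):
    """Return (S, D, I, C) along a minimum-edit alignment of two word lists."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # table[i][j] = (edits, S, D, I, C) for ref[:i] versus hyp[:j]
    table = [[None] * cols for _ in range(rows)]
    table[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, rows):          # only deletions
        table[i][0] = (i, 0, i, 0, 0)
    for j in range(1, cols):          # only insertions
        table[0][j] = (j, 0, 0, j, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            e, s, d, ins, c = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, ins, c + 1)      # correct word
            else:
                diag = (e + 1, s + 1, d, ins, c)  # substitution
            e, s, d, ins, c = table[i - 1][j]
            dele = (e + 1, s, d + 1, ins, c)      # deletion
            e, s, d, ins, c = table[i][j - 1]
            inse = (e + 1, s, d, ins + 1, c)      # insertion
            table[i][j] = min(diag, dele, inse, key=lambda t: t[0])
    return table[-1][-1][1:]


def word_error_rate(ref, hyp):
    """WER = (S + D + I) / (S + D + C); the reference must not be empty."""
    s, d, i, c = wer_counts(ref, hyp)
    return (s + d + i) / (s + d + c)


reference = "turn on the kitchen light".split()
hypothesis = "turn the kitchen lights please".split()
wer = word_error_rate(reference, hypothesis)
wacc = 1.0 - wer
```

In this example the alignment contains one substitution, one deletion, one insertion, and three correct words, so WER = 3/5. Note that S + D + C always equals the number of reference words, and that WER can exceed 1 when many words are inserted.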
Metrics cont'd

Real-valued Kurtosis (peakedness)

\[ K(X) = E\{|X|^4\} - \beta \cdot \left( E\{|X|^2\} \right)^2 \]

where
X ... random variable (RV)
E ... expectation operator
β ... positive constant (β = 3 for a real-valued zero-mean RV, so that K = 0 for a Gaussian)

K > 0 : super-Gaussian probability density function (PDF)
K = 0 : Gaussian PDF
K < 0 : sub-Gaussian PDF

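A minimal sketch of the empirical kurtosis follows. It assumes zero-mean, real-valued samples and β = 3, and replaces the expectations by sample averages. The three test distributions (Gaussian, Laplacian, uniform) are illustrative choices, not data from the lecture.

```python
import random


def kurtosis(samples, beta=3.0):
    """Empirical K(X) = E{|X|^4} - beta * (E{|X|^2})^2 for zero-mean samples."""
    n = len(samples)
    m2 = sum(abs(x) ** 2 for x in samples) / n   # second moment
    m4 = sum(abs(x) ** 4 for x in samples) / n   # fourth moment
    return m4 - beta * m2 ** 2


rng = random.Random(0)   # fixed seed for repeatability
n = 100_000

gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Laplacian sample: exponential magnitude with a random sign
laplacian = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(n)]
uniform = [rng.uniform(-1.0, 1.0) for _ in range(n)]

k_gaussian = kurtosis(gaussian)     # close to zero
k_laplacian = kurtosis(laplacian)   # positive: super-Gaussian
k_uniform = kurtosis(uniform)       # negative: sub-Gaussian
```

The Laplacian sample stands in for the heavy-tailed shape that clean speech sub-band samples tend to have. The sign of K, not its magnitude, is what separates the three classes here, because this form of the kurtosis is not normalized by the variance.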
Metrics cont'd

Negentropy (Gaussian-distance)

\[ N(X) = H(X_{\mathrm{Gaussian}}) - H(X) \]

with differential entropy

\[ H(X) = -\int p_X(x) \log p_X(x) \, dx = -E\{\log p_X(x)\} \]

N = 0 : Gaussian PDF
N > 0 : non-Gaussian PDF

Why consider kurtosis & negentropy?

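The negentropy definition can be checked with closed-form differential entropies. The sketch below assumes that the Gaussian reference has the same variance as X and uses natural logarithms. The Laplacian and uniform PDFs are illustrative choices, and the function names are hypothetical.

```python
import math


def gaussian_entropy(variance):
    """Differential entropy of a Gaussian: 0.5 * ln(2 * pi * e * variance)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * variance)


def negentropy(entropy, variance):
    """N(X) = H(X_Gaussian) - H(X), with X_Gaussian matching the variance of X."""
    return gaussian_entropy(variance) - entropy


# Gaussian with unit variance: negentropy is zero by construction
n_gaussian = negentropy(gaussian_entropy(1.0), 1.0)

# Laplacian with scale b: H = 1 + ln(2b), variance = 2 b^2
b = 1.0
n_laplacian = negentropy(1.0 + math.log(2.0 * b), 2.0 * b * b)

# Uniform on [-a, a]: H = ln(2a), variance = a^2 / 3
a = 1.0
n_uniform = negentropy(math.log(2.0 * a), a * a / 3.0)
```

Both non-Gaussian PDFs give N > 0, whether they are super-Gaussian (Laplacian) or sub-Gaussian (uniform). Negentropy therefore measures the distance from Gaussianity but, unlike the kurtosis, not its direction.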
Distribution of Speech Samples

Figure: Histograms of the real parts of sub-band frequency components (f = 800 Hz) of (a) clean speech, (b) noise-corrupted speech, and (c) reverberated speech snapshots.

◮ the PDF of a sum of independent RVs approaches a Gaussian in the limit
- a mix of speech, reverb, & noise exhibits a Gaussian PDF
- clean speech exhibits a super-Gaussian PDF
- use N and K to restore super-Gaussianity

How to restore super-Gaussianity?

Conventional Beamforming

Assumptions
◮ signal $s(\omega)$ exhibits plane-wave characteristics
◮ microphone $i$ captures the noise-corrupted signal $x_i(\omega)$

\[ \mathbf{x}(\omega) = s(\omega) \, \mathbf{d}(\omega, \mathbf{k}) + \mathbf{v}(\omega) \]

where
ω ... angular frequency
k ... wavenumber vector
d ... array manifold / sound capture model vector
x ... snapshot vector
v ... noise vector

Conventional Beamforming cont'd

Beamforming
◮ linear spatio-temporal filter
◮ compensate $\mathbf{d}(\omega, \mathbf{k})$ for the steering direction

\[ y(\omega) = \mathbf{w}^H(\omega) \, \mathbf{x}(\omega) \quad \text{with} \quad \mathbf{w}^H(\omega) \, \mathbf{d}(\omega, \mathbf{k}) = 1 \]

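The signal model and the distortionless condition can be sketched with NumPy. The code assumes a plane wave, a phase convention of exp(-j k·p) for a microphone at position p, a sound velocity of 343 m/s, and a hypothetical four-microphone linear array with 5 cm spacing. None of these values come from the slides. The weights are the simplest choice that meets the condition, namely the array manifold divided by the number of microphones.

```python
import numpy as np


def steering_vector(positions, direction, omega, c=343.0):
    """Plane-wave array manifold d(omega, k) for microphones at `positions` (M x 3)."""
    unit = np.asarray(direction, dtype=float)
    unit = unit / np.linalg.norm(unit)
    k = (omega / c) * unit                 # wavenumber vector
    return np.exp(-1j * positions @ k)     # one phase term per microphone


# Hypothetical linear array: 4 microphones, 5 cm apart along the x axis
positions = np.array([[0.05 * m, 0.0, 0.0] for m in range(4)])
omega = 2.0 * np.pi * 1000.0               # 1 kHz

d = steering_vector(positions, [1.0, 1.0, 0.0], omega)
w = d / len(d)                             # weights matched to the look direction

s = 0.7 - 0.2j                             # arbitrary source coefficient
x = s * d                                  # noise-free snapshot x = s * d
y = np.vdot(w, x)                          # y = w^H x (vdot conjugates w)
response = np.vdot(w, d)                   # w^H d, should equal 1
```

Because every entry of d has unit magnitude, w^H d = 1 and the noise-free output y equals the source coefficient s. With noise added to x, the same weights average the noise over the microphones.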
Speech Enhancement: Beamforming cont'd

Figure: 3D directivity patterns of two different beamformers.

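Speech Enhancement: Beamforming cont'd

Delay&Sum (DS)

\[ \mathbf{w}(\omega) = \frac{\mathbf{d}(\omega, \mathbf{k})}{N} \]

Minimum Variance Distortionless Response (MVDR)

\[ \arg\min_{\mathbf{w}(\omega)} \; \mathbf{w}^H(\omega) \, \mathbf{R}_{NN}(\omega) \, \mathbf{w}(\omega) \]

subject to the distortionless constraint $\mathbf{w}^H(\omega) \, \mathbf{d}(\omega, \mathbf{k}) = 1$, which yields

\[ \mathbf{w}^H(\omega) = \frac{\mathbf{d}^H(\omega, \mathbf{k}) \, \mathbf{R}_{NN}^{-1}(\omega)}{\mathbf{d}^H(\omega, \mathbf{k}) \, \mathbf{R}_{NN}^{-1}(\omega) \, \mathbf{d}(\omega, \mathbf{k})} \]

where
N ... number of microphones
R_NN ... noise covariance matrix
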
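A minimal NumPy sketch of the MVDR solution, compared against Delay&Sum, is given below. The noise field is an assumption made for illustration: spatially white noise plus one strong interferer, observed by a hypothetical four-microphone uniform linear array whose manifold is written directly in terms of an inter-microphone phase shift.

```python
import numpy as np


def mvdr_weights(R_nn, d):
    """w = R^{-1} d / (d^H R^{-1} d), the Hermitian transpose of the slide's w^H."""
    r_inv_d = np.linalg.solve(R_nn, d)     # avoids forming the explicit inverse
    return r_inv_d / np.vdot(d, r_inv_d)


def output_noise_power(w, R_nn):
    """Noise power at the beamformer output, w^H R w."""
    return float(np.real(np.vdot(w, R_nn @ w)))


M = 4
m = np.arange(M)
d_look = np.exp(-1j * 0.0 * m)             # look direction: broadside (all ones)
d_intf = np.exp(-1j * 1.0 * m)             # interferer: 1 rad phase shift per microphone

# Assumed noise covariance: white noise (0.1) plus interferer with power 10
R_nn = 0.1 * np.eye(M) + 10.0 * np.outer(d_intf, d_intf.conj())

w_ds = d_look / M                          # Delay&Sum
w_mvdr = mvdr_weights(R_nn, d_look)        # MVDR

p_ds = output_noise_power(w_ds, R_nn)
p_mvdr = output_noise_power(w_mvdr, R_nn)
gain_look = np.vdot(w_mvdr, d_look)        # w^H d, should equal 1
```

Both beamformers pass the look direction undistorted, but MVDR places a deep notch toward the interferer, so its output noise power is much lower than that of Delay&Sum in this scenario. With purely white noise (R_NN proportional to the identity), the MVDR weights reduce to the Delay&Sum weights.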
Speech Enhancement: Beamforming cont'd

MVDR with Diagonal Loading
◮ consider the quadratic constraint $0 < \|\mathbf{w}\|^2 < \gamma$
◮ the optimization replaces $\mathbf{R}_{NN}^{-1}(\omega)$ by $\left( \mathbf{R}_{NN}(\omega) + \sigma^2 \mathbf{I} \right)^{-1}$

where
σ² ... loading level
I ... identity matrix

Super-directive MVDR
◮ replace $\mathbf{R}_{NN}$ by the diffuse-field coherence matrix with elements

\[ \Gamma_{m,n} = \mathrm{sinc}\!\left( \frac{\omega \cdot l_{m,n}}{c} \right) \]

where
l_{m,n} ... distance between microphones m and n
c ... sound velocity

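The two variants can be combined in a short NumPy sketch: the coherence matrix Γ replaces R_NN, and a loading level σ² is added to its diagonal. The code assumes the unnormalized sinc, sin(x)/x, a hypothetical four-microphone linear array with 5 cm spacing steered to endfire, and c = 343 m/s. The white noise gain |w^H d|² / (w^H w) is used as the robustness measure.

```python
import numpy as np


def diffuse_coherence(x_pos, omega, c=343.0):
    """Gamma[m, n] = sin(omega * l / c) / (omega * l / c) for a linear array."""
    dist = np.abs(x_pos[:, None] - x_pos[None, :])   # l_{m,n}
    # np.sinc(x) is sin(pi x) / (pi x), so the argument is divided by pi
    return np.sinc(omega * dist / (np.pi * c))


def loaded_mvdr(R, d, sigma2=0.0):
    """MVDR weights with R replaced by R + sigma2 * I."""
    r_inv_d = np.linalg.solve(R + sigma2 * np.eye(len(d)), d)
    return r_inv_d / np.vdot(d, r_inv_d)


def white_noise_gain(w, d):
    """|w^H d|^2 / (w^H w); at most M for unit-magnitude entries in d."""
    return float(np.abs(np.vdot(w, d)) ** 2 / np.real(np.vdot(w, w)))


c = 343.0
x_pos = 0.05 * np.arange(4)                # hypothetical array, 5 cm spacing
omega = 2.0 * np.pi * 500.0                # 500 Hz
d = np.exp(-1j * omega / c * x_pos)        # endfire look direction

gamma = diffuse_coherence(x_pos, omega, c)
w_sd = loaded_mvdr(gamma, d)               # super-directive, no loading
w_dl = loaded_mvdr(gamma, d, sigma2=1e-2)  # with diagonal loading

wng_sd = white_noise_gain(w_sd, d)
wng_dl = white_noise_gain(w_dl, d)
```

At low frequencies and small spacings Γ is close to singular, so the unloaded super-directive weights become large and amplify sensor noise. Diagonal loading shrinks the weight norm and raises the white noise gain, at the cost of some directivity. As σ² grows, the solution approaches Delay&Sum.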
Speech Enhancement: Beamforming cont'd

Figure: One-dimensional directivity patterns of DS and MVDR.

Speech Enhancement: Beamforming cont'd

3D Convex-optimized (CVX)
◮ more constraints & optimization in 3 spatial dimensions

\[ \arg\min_{\mathbf{w}(\omega)} \; \left\| \mathbf{G}(\omega) \cdot [\mathbf{w}(\omega) \otimes \mathbf{I}] - \hat{\mathbf{D}} \right\|_F \]

subject to

\[ \mathbf{w}^H(\omega) \, \mathbf{d}(\omega) = 1 \quad \text{(distortionless response)} \]
\[ \mathbf{w}^H(\omega) \, \mathbf{V}(\omega) = 0 \quad \text{(null steering)} \]
\[ \frac{|\mathbf{w}^T(\omega) \, \mathbf{d}(\omega)|^2}{\mathbf{w}^H(\omega) \, \mathbf{w}(\omega)} \geq \gamma \quad \text{(white noise gain)} \]

where
G ... 3D capturing response matrix
D̂ ... 3D desired response matrix
V ... 3D null steering matrix

Speech Enhancement: Beamforming cont'd

Figure: Two-dimensional CVX directivity pattern with synthesized null and frequency invariance.

Speech Enhancement: Beamforming cont'd

Generalized Sidelobe Canceller (GSC)
◮ combines beamformer and postfilter

Figure: Block diagram of a GSC beamformer.

Speech Enhancement: Beamforming cont'd

Subspace Maximum Kurtosis (MK) / Negentropy (MN)
◮ based on GSC, but with a subspace filter matrix U that
- reduces dimensionality and
- decomposes the signal into spatially correlated and ambient components
◮ use kurtosis or negentropy to detect (sub-/super-)Gaussianity

Figure: Block diagram of an MK/MN beamformer.

Speech Enhancement: Beamforming cont'd

Figure: Modes (eigenvalues) of the spatially correlated and ambient components.

Speech Enhancement: Beamforming cont'd

Super-directive MVDR based on Spherical Harmonics
◮ based on a 3D microphone array
◮ frequency-invariant directivity pattern
◮ directivity stability around the spherical array
◮ redefine the array manifold / sound capture model vector d
◮ all beamforming techniques can be applied

Figure: Eigenmike, a spherical microphone array.