Microphone Array Processing for Distant Speech Recognition
From close-talking microphones to far-field sensors
Hannes Pessentheiner
Signal Processing and Speech Communication Laboratory (SPSC)
Advanced Signal Processing 2

Distant Speech-Interaction in Robust Home Applications
◮ people who would like assistance in everyday life
◮ physically handicapped people
Problem: they want to live independently
- lack of privacy
- dependence on other people
Solution: ambient assisted living (AAL)
- operated by device
- operated by voice command
[Figure: AAL scenario of a handicapped woman.]
What is a main challenge in voice command?

Distant Speech Recognition (DSR)
◮ most natural human-computer interface
- interaction through speech
- no use of body- or head-mounted microphones
[Figure: Block diagram of a simple DSR system.]
How does speech capturing work?

Speech Capturing
◮ free field / diffuse field
◮ spherical / planar wave propagation
[Figure: Propagation of a spherical (left) and a plane (right) wave.]
What to do with multi-channel data?

Speech Enhancement: Source localization & tracking
◮ estimate the speaker's position / direction at each instant of time
◮ compute the trajectory from the instantaneous position estimates
[Figure: Sound capturing (left), speaker localization (center), and speaker tracking (right).]
How to employ directional information?

Speech Enhancement: Beamforming
How to improve the pre-enhanced signal?

Speech Enhancement: Postfiltering
Beamformer-related Postfilter (among others)
◮ Generalized Sidelobe Canceller (GSC): Filter&Sum beamformer $w^H$ with parallel filter $q$
Autonomous Postfilter (among others)
◮ single-channel source separation filter
◮ echo & noise attenuation filter
◮ spectral subtraction
What about errors in DSR systems?
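
Spectral subtraction, listed among the autonomous postfilters, can be illustrated with a minimal sketch. The slides do not specify a variant, so this assumes power spectral subtraction on a single STFT frame with a spectral floor; the function name, the `floor` parameter, and the test frame are hypothetical.

```python
import numpy as np

def spectral_subtraction(noisy_spectrum, noise_psd, floor=0.01):
    """Power spectral subtraction for one STFT frame (illustrative sketch).

    noisy_spectrum: complex spectrum of the beamformer output
    noise_psd:      estimated noise power per frequency bin
    floor:          assumed spectral floor, as a fraction of the input power
    """
    noisy_spectrum = np.asarray(noisy_spectrum, dtype=complex)
    power = np.abs(noisy_spectrum) ** 2
    # Subtract the noise estimate, but never drop below the spectral floor.
    clean_power = np.maximum(power - noise_psd, floor * power)
    # Real-valued gain per bin; the noisy phase is left unchanged.
    gain = np.sqrt(clean_power / np.maximum(power, 1e-12))
    return gain * noisy_spectrum

# Hypothetical frame with 8 frequency bins and a flat noise estimate.
rng = np.random.default_rng(0)
frame = rng.standard_normal(8) + 1j * rng.standard_normal(8)
noise_psd = np.full(8, 0.5)
enhanced = spectral_subtraction(frame, noise_psd)
```

The floor avoids negative power estimates, which would otherwise occur in bins where the noise estimate exceeds the observed power.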

Errors in DSR
Major Errors
◮ front-end errors
◮ corrupted training material for single-channel source separation or speech recognizer
Minor Errors (among others)
◮ distorted features
◮ numerical accuracy (single/double precision)
[Figure: Representative front-end errors.]
How to measure the performance of a DSR system?

Metrics
Word Error Rate (WER)
$$ \mathrm{WER} = \frac{S + D + I}{S + D + C} $$
where
I ... # of insertions
S ... # of substitutions
D ... # of deletions
C ... # of correct words
The denominator $S + D + C$ equals the number of words in the reference.
Word Accuracy Rate (WACC)
$$ \mathrm{WACC} = 1 - \mathrm{WER} $$
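
As a worked illustration of the WER formula, the sketch below counts S, D, I, and C from a minimum-edit-distance alignment of two word lists. The alignment procedure and the example sentences are assumptions for illustration; the slides only give the formula.

```python
def wer_from_counts(s, d, i, c):
    """WER = (S + D + I) / (S + D + C); the denominator is the reference length."""
    n_ref = s + d + c
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    return (s + d + i) / n_ref

def align_counts(ref, hyp):
    """Return (S, D, I, C) for a minimum-edit-distance alignment of word lists."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # Each cell holds (edits, S, D, I, C) for ref[:r] versus hyp[:h].
    table = [[None] * cols for _ in range(rows)]
    table[0][0] = (0, 0, 0, 0, 0)
    for r in range(1, rows):
        table[r][0] = (r, 0, r, 0, 0)      # only deletions
    for h in range(1, cols):
        table[0][h] = (h, 0, 0, h, 0)      # only insertions
    for r in range(1, rows):
        for h in range(1, cols):
            e, s, d, i, c = table[r - 1][h - 1]
            if ref[r - 1] == hyp[h - 1]:
                diag = (e, s, d, i, c + 1)      # correct word
            else:
                diag = (e + 1, s + 1, d, i, c)  # substitution
            e, s, d, i, c = table[r - 1][h]
            dele = (e + 1, s, d + 1, i, c)      # deletion
            e, s, d, i, c = table[r][h - 1]
            ins = (e + 1, s, d, i + 1, c)       # insertion
            table[r][h] = min(diag, dele, ins, key=lambda t: t[0])
    return table[-1][-1][1:]

# Hypothetical voice command and recognizer output.
reference = "turn on the kitchen light".split()
hypothesis = "turn the kitchen lights now".split()
S, D, I, C = align_counts(reference, hypothesis)
wer = wer_from_counts(S, D, I, C)
wacc = 1.0 - wer
```

Because insertions appear only in the numerator, WER can exceed 1 and WACC can become negative for very poor hypotheses.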

Metrics cont'd
Real-valued Kurtosis (peakedness)
$$ K(X) = E\{|X|^4\} - \beta \left( E\{|X|^2\} \right)^2 $$
where
X ... random variable (RV)
E ... expectation operator
β ... positive constant
$K > 0$: super-Gaussian probability density function (PDF)
$K = 0$: Gaussian PDF
$K < 0$: sub-Gaussian PDF
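
The kurtosis definition can be checked numerically with a short sketch. It assumes β = 3, the value for which a real-valued zero-mean Gaussian gives K = 0; the slide only states that β is a positive constant. The sample sizes and distributions are chosen for illustration.

```python
import numpy as np

def kurtosis(x, beta=3.0):
    """Sample estimate of K(X) = E{|X|^4} - beta * (E{|X|^2})^2.

    beta=3 is an assumption: it makes K vanish for a real zero-mean Gaussian.
    """
    x = np.asarray(x)
    m2 = np.mean(np.abs(x) ** 2)
    m4 = np.mean(np.abs(x) ** 4)
    return m4 - beta * m2 ** 2

rng = np.random.default_rng(0)
n = 200_000
# Three zero-mean, unit-variance distributions.
k_gauss = kurtosis(rng.standard_normal(n))                           # close to 0
k_laplace = kurtosis(rng.laplace(scale=1 / np.sqrt(2), size=n))      # super-Gaussian
k_uniform = kurtosis(rng.uniform(-np.sqrt(3), np.sqrt(3), size=n))   # sub-Gaussian
```

With unit variance, the Laplacian samples give a clearly positive value and the uniform samples a clearly negative one, matching the sign convention above.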

Metrics cont'd
Negentropy (Gaussian-distance)
$$ N(X) = H(X_{\mathrm{Gaussian}}) - H(X) $$
with differential entropy
$$ H(X) = -\int p_X(x) \log p_X(x) \, dx = -E\{\log p_X(x)\} $$
$N = 0$: Gaussian PDF
$N > 0$: non-Gaussian PDF
Why consider kurtosis & negentropy?
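
A rough numerical sketch of negentropy follows. It assumes a simple histogram estimate of the differential entropy and takes the Gaussian reference entropy as that of a Gaussian with the same variance, 0.5 log(2πeσ²). The estimator, the bin count, and the test signals are assumptions, not part of the slides.

```python
import numpy as np

def differential_entropy_hist(x, bins=200):
    """Histogram estimate of H(X) = -integral of p(x) log p(x) dx."""
    counts, edges = np.histogram(x, bins=bins)
    widths = np.diff(edges)
    p = counts / counts.sum()          # probability mass per bin
    nz = p > 0                         # skip empty bins (0 * log 0 = 0)
    # Density in a bin is p / width; sum mass times log-density.
    return -np.sum(p[nz] * np.log(p[nz] / widths[nz]))

def negentropy(x, bins=200):
    """N(X) = H(X_Gaussian) - H(X), with a Gaussian of the same variance."""
    h_gauss = 0.5 * np.log(2 * np.pi * np.e * np.var(x))
    return h_gauss - differential_entropy_hist(x, bins)

rng = np.random.default_rng(0)
n_gauss = negentropy(rng.standard_normal(200_000))      # near zero
n_laplace = negentropy(rng.laplace(size=200_000))       # positive
```

Histogram estimates carry a small bias, so the Gaussian case lands near zero rather than exactly on it.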

Distribution of Speech Samples
[Figure: Histograms of the real parts of sub-band frequency components (f = 800 Hz) of (a) clean speech, (b) noise-corrupted speech, and (c) reverberated speech snapshots.]
◮ the PDF of a sum of independent RVs approaches a Gaussian in the limit
- a mix of speech, reverb, & noise exhibits a Gaussian PDF
- clean speech exhibits a super-Gaussian PDF
- use N and K to restore super-Gaussianity
How to restore super-Gaussianity?
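
The statement that a sum of independent RVs approaches a Gaussian can be illustrated with a small simulation. It assumes Laplacian sources as a stand-in for super-Gaussian speech and uses a variance-normalized excess kurtosis so that mixtures of different power are comparable; both choices are illustrative.

```python
import numpy as np

def excess_kurtosis(x):
    """Normalized excess kurtosis: E{x^4} / (E{x^2})^2 - 3 for zero-mean x."""
    x = x - np.mean(x)
    m2 = np.mean(x ** 2)
    return np.mean(x ** 4) / m2 ** 2 - 3.0

rng = np.random.default_rng(1)
n_samples = 100_000
results = {}
for n_sources in (1, 4, 16):
    # Independent super-Gaussian (Laplacian) sources, summed sample by sample.
    sources = rng.laplace(size=(n_sources, n_samples))
    mixture = sources.sum(axis=0)
    results[n_sources] = excess_kurtosis(mixture)
```

For M independent, identically distributed sources the excess kurtosis of the sum scales as 1/M, so `results` shrinks toward the Gaussian value of zero as more sources are mixed.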

Conventional Beamforming
Assumptions
◮ signal $s(\omega)$ exhibits plane-wave characteristics
◮ microphone $i$ captures the noise-corrupted signal $x_i(\omega)$
$$ x(\omega) = s(\omega) \, d(\omega, k) + v(\omega) $$
where
ω ... angular frequency
k ... wavenumber vector
d ... array manifold / sound capture model vector
x ... snapshot vector
v ... noise vector
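
The snapshot model can be sketched in a few lines. The slides do not give the entries of d, so this assumes the common far-field form d_i = exp(-j kᵀp_i) with microphone positions p_i and k = (ω/c)·u for a unit propagation direction u. The sign convention, the array geometry, and all names are assumptions.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def steering_vector(omega, mic_positions, direction, c=SPEED_OF_SOUND):
    """Plane-wave array manifold d(omega, k), one phase term per microphone."""
    u = np.asarray(direction, dtype=float)
    u = u / np.linalg.norm(u)
    k = (omega / c) * u                        # wavenumber vector
    return np.exp(-1j * (mic_positions @ k))   # unit-modulus entries

def snapshot(s, d, noise_std, rng):
    """x = s * d + v with circular complex Gaussian noise v."""
    v = noise_std * (rng.standard_normal(d.shape)
                     + 1j * rng.standard_normal(d.shape)) / np.sqrt(2)
    return s * d + v

# Hypothetical 4-microphone linear array along the x-axis, 5 cm spacing.
mics = np.column_stack([np.arange(4) * 0.05, np.zeros(4), np.zeros(4)])
omega = 2 * np.pi * 1000.0
d = steering_vector(omega, mics, direction=[1.0, 0.0, 0.0])
x = snapshot(1.0 + 0.5j, d, noise_std=0.1, rng=np.random.default_rng(0))
```

For a wave travelling perpendicular to the array axis, all microphones see the same phase and d reduces to a vector of ones.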

Conventional Beamforming cont'd
Beamforming
◮ linear spatio-temporal filter
◮ compensate $d(\omega, k)$ for the steering direction
$$ y(\omega) = w^H(\omega) \, x(\omega) \quad \text{with} \quad w^H(\omega) \, d(\omega, k) = 1 $$

Speech Enhancement: Beamforming cont'd
[Figure: 3D directivity patterns of two different beamformers.]

Speech Enhancement: Beamforming cont'd
Delay&Sum (DS)
$$ w(\omega) = \frac{d(\omega, k)}{N} $$
Minimum Variance Distortionless Response (MVDR)
$$ \arg\min_{w(\omega)} \; w^H(\omega) \, R_{NN}(\omega) \, w(\omega) \quad \text{subject to} \quad w^H(\omega) \, d(\omega, k) = 1 $$
$$ w^H(\omega) = \frac{d^H(\omega, k) \, R_{NN}^{-1}(\omega)}{d^H(\omega, k) \, R_{NN}^{-1}(\omega) \, d(\omega, k)} $$
where
N ... number of microphones
R_NN ... noise covariance matrix
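
The DS and MVDR weights can be compared in a small sketch. It assumes a hypothetical 4-microphone linear array with half-wavelength spacing, one interferer at 40 degrees, and white sensor noise; none of these values come from the slides.

```python
import numpy as np

def ds_weights(d):
    """Delay&Sum: w = d / N."""
    return d / d.shape[0]

def mvdr_weights(d, r_nn):
    """MVDR: w = R^{-1} d / (d^H R^{-1} d), so that w^H d = 1."""
    rinv_d = np.linalg.solve(r_nn, d)
    return rinv_d / np.vdot(d, rinv_d)   # np.vdot conjugates its first argument

def noise_power(w, r_nn):
    """Output noise power w^H R_NN w."""
    return np.real(np.vdot(w, r_nn @ w))

# Assumed scenario: half-wavelength spacing, so the phase step is pi * sin(theta).
n_mics = 4
phase = np.pi * np.arange(n_mics)
d_target = np.exp(-1j * phase * np.sin(np.deg2rad(0.0)))    # look direction
d_interf = np.exp(-1j * phase * np.sin(np.deg2rad(40.0)))   # interferer
r_nn = 0.1 * np.eye(n_mics) + np.outer(d_interf, d_interf.conj())

w_ds = ds_weights(d_target)
w_mvdr = mvdr_weights(d_target, r_nn)

# Beamformer output y = w^H x for a noise-free snapshot x = s * d.
y = np.vdot(w_mvdr, 2.0 * d_target)
```

Both weight vectors pass the look direction undistorted; MVDR additionally places low gain on the interferer, so its output noise power cannot exceed that of DS for the same noise covariance.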

Speech Enhancement: Beamforming cont'd
MVDR with Diagonal Loading
◮ consider the quadratic constraint $0 < |w|^2 < \gamma$
◮ optimization replaces $R_{NN}^{-1}(\omega)$ by $\left( R_{NN}(\omega) + \sigma^2 I \right)^{-1}$
where
σ² ... loading level
I ... identity matrix
Super-directive MVDR
◮ replace $R_{NN}$ by the matrix with entries $\Gamma_{m,n} = \mathrm{sinc}\left( \frac{\omega \, l_{m,n}}{c} \right)$
where
l_{m,n} ... distance between microphones m and n
c ... sound velocity
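
Diagonal loading and the super-directive design can be combined in one sketch. It assumes the unnormalized sinc, sin(x)/x, for the diffuse-field coherence, a hypothetical 4-microphone endfire array, and two arbitrary loading levels. Note that `np.sinc` is the normalized sinc, hence the division by π.

```python
import numpy as np

def diffuse_coherence(omega, mic_positions, c=343.0):
    """Gamma[m, n] = sin(omega * l_mn / c) / (omega * l_mn / c)."""
    diff = mic_positions[:, None, :] - mic_positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)          # pairwise distances l_mn
    # np.sinc(x) = sin(pi x) / (pi x), so rescale the argument by 1 / pi.
    return np.sinc(omega * dist / (np.pi * c))

def loaded_mvdr_weights(d, r, loading):
    """MVDR weights with R replaced by R + loading * I."""
    r_loaded = r + loading * np.eye(r.shape[0])
    rinv_d = np.linalg.solve(r_loaded, d)
    return rinv_d / np.vdot(d, rinv_d)

def white_noise_gain(w, d):
    """WNG = |w^H d|^2 / (w^H w)."""
    return np.abs(np.vdot(w, d)) ** 2 / np.real(np.vdot(w, w))

# Assumed setup: 4 microphones, 5 cm spacing, 500 Hz, endfire look direction.
c = 343.0
mics = np.column_stack([np.arange(4) * 0.05, np.zeros(4), np.zeros(4)])
omega = 2 * np.pi * 500.0
d = np.exp(-1j * (omega / c) * mics[:, 0])
gamma = diffuse_coherence(omega, mics, c)

w_light = loaded_mvdr_weights(d, gamma, loading=1e-4)   # close to super-directive
w_heavy = loaded_mvdr_weights(d, gamma, loading=1e-1)   # more robust
```

Increasing the loading level trades directivity for white noise gain: the weight norm shrinks, which is what the quadratic constraint on |w|² enforces.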

Speech Enhancement: Beamforming cont'd
[Figure: One-dimensional directivity patterns of DS and MVDR.]

Speech Enhancement: Beamforming cont'd
3D Convex-optimized (CVX)
◮ more constraints & optimization in 3 spatial dimensions
$$ \arg\min_{w(\omega)} \left\| G(\omega) \cdot [\, w(\omega) \otimes I \,] - \hat{D} \right\|_F $$
subject to
$$ w^H(\omega) \, d(\omega) = 1 \quad \text{(distortionless response)} $$
$$ w^H(\omega) \, V(\omega) = 0 \quad \text{(null steering)} $$
$$ \frac{| w^T(\omega) \, d(\omega) |^2}{w^H(\omega) \, w(\omega)} \geq \gamma \quad \text{(white noise gain)} $$
where
G ... 3D capturing response matrix
\hat{D} ... 3D desired response matrix
V ... 3D null steering matrix

Speech Enhancement: Beamforming cont'd
[Figure: Two-dimensional CVX directivity pattern with synthesized null and frequency-invariance.]

Speech Enhancement: Beamforming cont'd
Generalized Sidelobe Canceller (GSC)
◮ combine beamformer and postfilter
[Figure: Block diagram of a GSC beamformer.]
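
One textbook way to realize a GSC is sketched below; the slides do not spell out the structure, so the split into a fixed beamformer `w_q`, a blocking matrix `B` with Bᴴd = 0, and active weights `w_a` is an assumption, as are all names and the test scenario. With the noise-optimal active weights, this structure reproduces the MVDR solution.

```python
import numpy as np

def blocking_matrix(d):
    """Columns span the null space of d^H, so that B^H d = 0."""
    _, _, vh = np.linalg.svd(d.conj()[None, :])   # full SVD of the 1 x N matrix d^H
    return vh[1:].conj().T                        # shape (N, N - 1)

def gsc_weights(d, r_nn):
    """Overall GSC weights w = w_q - B w_a."""
    w_q = d / np.vdot(d, d)                       # fixed (quiescent) beamformer
    b = blocking_matrix(d)
    bh = b.conj().T
    # Active weights that minimize the output noise power in the lower branch.
    w_a = np.linalg.solve(bh @ r_nn @ b, bh @ r_nn @ w_q)
    return w_q - b @ w_a

def mvdr_weights(d, r_nn):
    """Reference MVDR solution for comparison."""
    rinv_d = np.linalg.solve(r_nn, d)
    return rinv_d / np.vdot(d, rinv_d)

# Assumed scenario: 4 microphones, one interferer plus white noise.
n_mics = 4
phase = np.pi * np.arange(n_mics)
d = np.exp(-1j * phase * np.sin(np.deg2rad(10.0)))
d_interf = np.exp(-1j * phase * np.sin(np.deg2rad(-50.0)))
r_nn = 0.1 * np.eye(n_mics) + np.outer(d_interf, d_interf.conj())

w_gsc = gsc_weights(d, r_nn)
w_mvdr = mvdr_weights(d, r_nn)
```

The blocking matrix removes the look-direction signal from the lower branch, so the distortionless constraint is met by construction and the adaptation becomes an unconstrained problem.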

Speech Enhancement: Beamforming cont'd
Subspace Maximum Kurtosis (MK) / Negentropy (MN)
◮ based on GSC, but with a subspace filter matrix $U$ that
- reduces dimensionality and
- decomposes the signal into spatially correlated and ambient components
◮ use kurtosis or negentropy to detect (sub-/super-)Gaussianity
[Figure: Block diagram of an MK/MN beamformer.]

Speech Enhancement: Beamforming cont'd
[Figure: Modes (eigenvalues) of spatially correlated and ambient components.]

Speech Enhancement: Beamforming cont'd
Super-directive MVDR based on Spherical Harmonics
◮ based on a 3D microphone array
◮ frequency-invariant directivity pattern
◮ directivity stability around the spherical array
◮ redefine the array manifold / sound capture model vector $d$
◮ all beamforming techniques can be applied
[Figure: Eigenmike, a spherical microphone array.]
