
A Classification Approach to Single Channel Source Separation



1. A Classification Approach to Single Channel Source Separation. CS 6772 Project. Ron Weiss, ronw@ee.columbia.edu

2. Single Channel Source Separation
[Figure: spectrograms of Speech + Babble noise = Mixture (10 dB SNR); frequency (Hz) vs. time (seconds)]
• Have a monaural signal composed of multiple sources, e.g. multiple speakers, speech + music, or speech + background noise
• Want to separate the constituent sources, for noise-robust speech recognition and hearing aids
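A minimal sketch (not from the slides) of setting up this kind of mixture: scale a noise signal so the mix hits a target SNR, then compute the magnitude spectrogram. The file names, the 10 dB target, and the STFT settings are illustrative assumptions.

```python
# Mix speech and noise at a target SNR and compute magnitude spectrograms,
# as in the figure on this slide. File names and settings are assumptions.
import numpy as np
import soundfile as sf
from scipy.signal import stft

speech, sr = sf.read("speech.wav")   # hypothetical input files
noise, _ = sf.read("babble.wav")
noise = noise[:len(speech)]          # assume the noise is at least as long

target_snr_db = 10.0
# Scale the noise so that 10*log10(P_speech / P_noise) equals the target SNR.
p_speech = np.mean(speech ** 2)
p_noise = np.mean(noise ** 2)
noise = noise * np.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
mixture = speech + noise

# Magnitude spectrogram (frequency x time).
f, t, S_mix = stft(mixture, fs=sr, nperseg=512)
spectrogram = np.abs(S_mix)
```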

3. What Data Is Reliable?
[Figure: mixture spectrogram and mask marking the regions where speech energy dominates; frequency (Hz) vs. time (seconds)]
• Only one source is likely to have a significant amount of energy in any given time/frequency cell
• If we can decide which cells are dominated by the source of interest (i.e., which have local SNR greater than some threshold), we can filter out the noise-dominated cells ("refiltering" [5])
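A minimal sketch of the refiltering idea, assuming the signals and STFT settings from the previous sketch: compute an oracle (a priori) binary mask from the local SNR and zero out the noise-dominated cells of the mixture before resynthesis. The function name and the 0 dB threshold are illustrative.

```python
# "Refiltering" [5] with an a priori binary mask: keep time/frequency cells
# where the local SNR exceeds a threshold, zero the rest, and invert the STFT.
import numpy as np
from scipy.signal import stft, istft

def refilter(mixture, speech, noise, sr, snr_threshold_db=0.0, nperseg=512):
    _, _, S_speech = stft(speech, fs=sr, nperseg=nperseg)
    _, _, S_noise = stft(noise, fs=sr, nperseg=nperseg)
    _, _, S_mix = stft(mixture, fs=sr, nperseg=nperseg)

    # Local SNR in each time/frequency cell.
    local_snr_db = 20 * np.log10(np.abs(S_speech) / (np.abs(S_noise) + 1e-12))
    mask = local_snr_db > snr_threshold_db   # True where speech dominates

    # Keep reliable cells of the mixture, discard noise-dominated cells.
    _, resynth = istft(S_mix * mask, fs=sr, nperseg=nperseg)
    return resynth, mask
```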

4. Binary Masks As Classification [6]
• Goal is to classify each spectrogram cell as reliable (dominated by the speech signal) or not
• Separate classifier for each frequency band
• Train on speech mixed with a variety of different noise signals (babble noise, white noise, speech-shaped noise, etc.) at a variety of different levels (−5 to 10 dB SNR)
• Features: raw spectrogram frames, using the current frame plus the previous 5 frames (∼40 ms) of context
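A minimal sketch of how the training examples described on this slide could be assembled, assuming log-magnitude spectrograms and oracle masks like those in the earlier sketches. The helper name and array layout are assumptions, not the project's actual code.

```python
# Build per-band training examples: the current frame plus 5 previous frames
# of spectrogram context as features, the oracle mask value as the label.
import numpy as np

def make_examples(log_spec, mask, band, context=5):
    """Stack current + previous `context` frames as features for one band."""
    n_bands, n_frames = log_spec.shape
    X, y = [], []
    for t in range(context, n_frames):
        X.append(log_spec[:, t - context:t + 1].ravel())  # ~40 ms of context
        y.append(mask[band, t])                            # reliable or not
    return np.array(X), np.array(y, dtype=int)
```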

5. The Relevance Vector Machine [7]
• Bayesian treatment of the SVM
• Huge improvement in sparsity over the SVM (∼50 relevance vectors vs. ∼450 support vectors per classifier on this task)
• Does more than just discriminate: gives an estimate of the posterior probability of class membership
• So masks are no longer strictly binary; the RVM can estimate the probability that each spectrogram cell is reliable
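A minimal sketch of turning per-band posterior probabilities into a soft mask. scikit-learn ships no RVM, so a logistic-regression classifier stands in for the RVM of [7] here purely to show the plumbing; it will not reproduce the sparsity or accuracy discussed on this slide. It reuses the hypothetical `make_examples` helper from the previous sketch.

```python
# Train one probabilistic classifier per frequency band and collect its
# posterior P(reliable) for every time/frequency cell into a soft mask.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_band_classifiers(log_spec, mask, n_bands, context=5):
    models = []
    for band in range(n_bands):
        X, y = make_examples(log_spec, mask, band, context)  # sketch above
        models.append(LogisticRegression(max_iter=1000).fit(X, y))
    return models

def soft_mask(models, log_spec, context=5):
    n_bands, n_frames = log_spec.shape
    P = np.zeros((n_bands, n_frames))
    for t in range(context, n_frames):
        x = log_spec[:, t - context:t + 1].ravel()[None, :]
        for band, m in enumerate(models):
            P[band, t] = m.predict_proba(x)[0, 1]  # P(cell is reliable)
    return P
```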

6. Missing Feature Signal Reconstruction
• What if a significant part of the signal is missing?
• Want to fill in the blanks in the spectrogram of the mixed signal
• Do MMSE reconstruction on the missing dimensions:
  $\hat{x}_m = E[x_m \mid z] = \sum_k \mu_{k,m}\, P(k \mid z)$
• Use a signal model of spectrogram frames: a GMM with diagonal covariance
  $P(k \mid z) \propto P(k)\, P(z \mid k) = P(k) \prod_d P(z_d \mid k)$
• Just marginalize over the missing dimensions to do inference:
  $P(z_d \mid k) = P(r_d)\, \mathcal{N}(z_d \mid \mu_{k,d}, \sigma_{k,d}) + (1 - P(r_d)) \int \mathcal{N}(z_d \mid \mu_{k,d}, \sigma_{k,d})\, dz_d$
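A minimal sketch of the MMSE reconstruction step, assuming a diagonal-covariance GaussianMixture fitted on clean spectrogram frames. For simplicity it uses a hard binary reliability mask (missing dimensions are simply dropped from P(z | k)) rather than the soft P(r_d) weighting in the last equation.

```python
# GMM-based MMSE imputation of missing spectrogram dimensions:
# x_hat_m = E[x_m | z] = sum_k mu_{k,m} P(k | z), with P(k | z) computed
# from the reliable dimensions only (marginalizing out the missing ones).
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

def reconstruct_frame(z, reliable, gmm):
    """Fill in unreliable dimensions of one frame z given a fitted diag GMM."""
    means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_
    K = len(weights)
    log_post = np.log(weights).copy()
    for k in range(K):
        # Only reliable dimensions contribute to P(z | k).
        log_post[k] += norm.logpdf(
            z[reliable], means[k, reliable], np.sqrt(covs[k, reliable])
        ).sum()
    post = np.exp(log_post - log_post.max())
    post /= post.sum()                          # P(k | z)

    x_hat = z.copy()
    missing = ~reliable
    x_hat[missing] = post @ means[:, missing]   # E[x_m | z]
    return x_hat

# Hypothetical usage: fit the prior on frames of clean speech spectrograms.
# gmm = GaussianMixture(n_components=64, covariance_type="diag").fit(clean_frames)
```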

7. Example
[Figure: six spectrograms — speech + factory2 noise, 0.88695 dB SNR; clean speech signal; RVM mask; a priori mask; refiltering using RVM mask, 7.7788 dB SNR; GMM reconstruction, 8.4013 dB SNR; frequency (Hz) vs. time (seconds)]

8. References
[1] J. Barker, P. Green, and M. Cooke. Linking auditory scene analysis and robust ASR by missing data techniques. In WISP, pages 295–307, April 2001.
[2] M. P. Cooke, P. Green, L. B. Josifovski, and A. Vizinho. Robust automatic speech recognition with missing and unreliable acoustic data. Speech Communication, 34:267–285, May 2001.
[3] B. Raj, M. L. Seltzer, and R. M. Stern. Reconstruction of missing features for robust speech recognition. Speech Communication, 43:275–296, 2004.
[4] A. M. Reddy and B. Raj. Soft mask estimation for single channel source separation. In SAPA, 2004.
[5] S. T. Roweis. Factorial models and refiltering for speech separation and denoising. In Proceedings of EuroSpeech, 2003.
[6] M. L. Seltzer, B. Raj, and R. M. Stern. Classifier-based mask estimation for missing feature methods of robust speech recognition. In Proceedings of ICSLP, 2000.
[7] M. Tipping. The relevance vector machine. In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems 12, pages 652–658. MIT Press, 2000.
