
Underdetermined Source Separation Using Speaker Subspace Models - PowerPoint PPT Presentation



  1. Outline Introduction Speaker subspace model Monaural speech separation Binaural separation Conclusions Underdetermined Source Separation Using Speaker Subspace Models Thesis Defense Ron Weiss May 4, 2009 Ron Weiss Underdetermined Source Separation Using Speaker Subspace Models May 4, 2009 1 / 34

  2. Outline
     1. Introduction
     2. Speaker subspace model
     3. Monaural speech separation
     4. Binaural separation
     5. Conclusions

  3. Section 1: Introduction

  4. Audio source separation
     - Many real-world signals contain contributions from multiple sources, e.g. a cocktail party
     - Want to infer the original sources from the mixture, for example for:
       - Robust speech recognition
       - Hearing aids

  5. Previous work
     Instantaneous mixing system (C channels, I sources):
     $$\begin{bmatrix} y_1(t) \\ \vdots \\ y_C(t) \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1I} \\ \vdots & \ddots & \vdots \\ a_{C1} & \cdots & a_{CI} \end{bmatrix} \begin{bmatrix} x_1(t) \\ \vdots \\ x_I(t) \end{bmatrix}$$
     - Simplest case: more channels than sources (overdetermined)
       - Perfect separation is possible
     - Use constraints on the source signals to guide separation
       - Independence constraints (e.g. independent component analysis)
       - Spatial constraints (e.g. beamforming)
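A minimal sketch of the instantaneous mixing model, written as Y = A X with C channels and I sources. Assuming the mixing matrix A is known (in practice it is not, which is exactly why independence or spatial constraints are needed to estimate it), the pseudo-inverse recovers the sources whenever A has full column rank. The second half shows the same inversion failing with one channel and two sources. The function names are illustrative only.

```python
import numpy as np

def mix(A, X):
    """Instantaneous mixing: y_c(t) = sum_i a_ci * x_i(t).

    A: (C, I) mixing matrix; X: (I, T) sources. Returns Y: (C, T).
    """
    return A @ X

def unmix_known_A(A, Y):
    """Least-squares source estimate pinv(A) @ Y.

    Exact (without noise) when A has full column rank, which needs C >= I.
    """
    return np.linalg.pinv(A) @ Y

rng = np.random.default_rng(0)
I, C, T = 2, 3, 1000                  # 2 sources, 3 channels: overdetermined
X = rng.standard_normal((I, T))
A = rng.standard_normal((C, I))
Y = mix(A, X)
print(np.allclose(unmix_known_A(A, Y), X))   # True

# Underdetermined: 1 channel, 2 sources. pinv returns the minimum-norm
# solution consistent with the observation, which is generally not X.
A1 = rng.standard_normal((1, I))
X_hat1 = unmix_known_A(A1, mix(A1, X))
print(np.allclose(X_hat1, X))                # False
```

The underdetermined case is what motivates the stronger, model-based constraints on the following slides.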

  6. Underdetermined source separation
     - More sources than channels, so stronger constraints are needed
     - CASA (computational auditory scene analysis): use perceptual cues similar to those of the human auditory system
       - Segment the STFT into short glimpses of each source, by harmonicity, common onset, etc.
       - Sequential grouping heuristics
       - Create a time-frequency mask for each source
     - Inference based on prior source models

  7. Time-frequency masking
     [Figure: spectrograms of the mixture, a clean source, the masks, and the reconstructed source (8.2 dB SNR); frequency (kHz) vs. time (sec), level in dB]
     - Natural sounds tend to be sparse in time and frequency: 10% of spectrogram cells contain 78% of the energy
     - And redundant: still intelligible when 22% of the source energy is masked
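Two ideas from this slide can be sketched directly on spectrogram arrays. First, an oracle ("ideal binary") mask keeps each time-frequency cell where the target is louder than the interference; this is one common way to build a mask, and the slide does not say how its masks were obtained. Second, `energy_concentration` computes the kind of sparsity statistic quoted above (share of energy in the loudest 10% of cells). The toy inputs are made up for illustration.

```python
import numpy as np

def ideal_binary_mask(target_spec, interferer_spec):
    """Oracle mask: 1 where the target's magnitude exceeds the interferer's."""
    return (np.abs(target_spec) > np.abs(interferer_spec)).astype(float)

def apply_mask(mask, mixture_spec):
    """Keep only the cells assigned to the source (binary or soft mask)."""
    return mask * mixture_spec

def energy_concentration(spec, cell_fraction=0.10):
    """Share of total energy held by the loudest `cell_fraction` of cells."""
    energy = np.sort((np.abs(spec) ** 2).ravel())[::-1]   # descending
    k = max(1, int(round(cell_fraction * energy.size)))
    return energy[:k].sum() / energy.sum()

# Toy 2x2 "STFTs" whose sources never overlap, so masking is lossless.
# (Complex STFTs add linearly, so the mixture is simply s1 + s2.)
s1 = np.array([[1.0, 0.0], [0.0, 2.0]])
s2 = np.array([[0.0, 3.0], [0.5, 0.0]])
mixture = s1 + s2
m1 = ideal_binary_mask(s1, s2)
print(np.allclose(apply_mask(m1, mixture), s1))   # True

# One loud cell among ten: energies 100 + 9 * 1.
spec = np.array([10.0] + [1.0] * 9)
print(round(energy_concentration(spec, 0.10), 3)) # 0.917
```

Real speech overlaps in some cells, so masking is lossy there; the sparsity and redundancy noted on the slide are what keep that loss small.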

  8. Model-based separation
     - Use constraints from prior source models to guide separation
     - Leverage differences in the spectral characteristics of different sources
       - Hidden Markov models, log-spectral features
       - Factorial model inference
     - E.g. the IBM Iroquois system [Kristjansson et al., 2006]
       - Speaker-dependent models
       - Acoustic dynamics and grammar constraints
       - Superhuman performance under some conditions
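Factorial model inference can be sketched for a single log-spectral frame. This sketch assumes each source is modeled by a few Gaussian states with a shared diagonal variance, and it uses the max approximation (each mixture bin is taken to be the louder of the two sources in the log domain), a common simplification in model-based separation. It is not necessarily how the Iroquois system performs inference, and it ignores HMM dynamics. Note that the search over state pairs grows with the product of the two state counts.

```python
import numpy as np

def gaussian_loglik(y, mean, var):
    """Diagonal-Gaussian log-likelihood of one frame y (shape (F,))."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (y - mean) ** 2 / var)

def factorial_frame_inference(y, means1, means2, var):
    """Best (state1, state2) pair for one log-spectral mixture frame.

    Max approximation: log|y| ~ max(log|x1|, log|x2|) per bin, so the pair
    (i, j) predicts the mixture as max(means1[i], means2[j]).
    Returns the best pair and a binary mask for source 1.
    """
    best, best_ll = None, -np.inf
    for i, m1 in enumerate(means1):
        for j, m2 in enumerate(means2):
            ll = gaussian_loglik(y, np.maximum(m1, m2), var)
            if ll > best_ll:
                best, best_ll = (i, j), ll
    i, j = best
    mask1 = (means1[i] > means2[j]).astype(float)   # bins source 1 dominates
    return best, mask1

# Toy models: 2 states per source, 3 frequency bins (log magnitudes).
means1 = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
means2 = np.array([[0.0, 0.0, 5.0], [5.0, 0.0, 0.0]])
y = np.array([5.0, 0.0, 5.0])
pair, mask1 = factorial_frame_inference(y, means1, means2, np.ones(3))
print(pair)   # (0, 0)
```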

  9. Model-based separation: limitations
     - Relies on speaker-dependent models to disambiguate sources
     - What if the task isn’t so well defined?
       - No prior knowledge of speaker identities or grammar
       - Use a speaker-independent (SI) model for all sources
       - Need strong temporal constraints, or the sources will permute:
         “place white by t 4 now” mixed with “lay green with p 9 again”
         Separated source: “place white by t p 9 again”
     - Solution: adapt the speaker-independent model to compensate

  10. Section 2: Speaker subspace model (Model adaptation; Eigenvoices)

  11. Model selection vs. adaptation
     [Figure: speaker models, mean voice, speaker subspace bases, and quantization boundaries]
     - Model selection (e.g. [Kristjansson et al., 2006]): given a set of speaker-dependent (SD) models,
       1. Identify the sources in the mixture
       2. Use the corresponding models for separation
     - How to generalize to speakers outside of the training set?
       - Selection: choose the closest model
       - Adaptation: interpolate
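A toy contrast between selection and adaptation, assuming each speaker is summarized by a parameter supervector. To keep it minimal, the new speaker's supervector is given directly, which sidesteps the real problem of estimating it from a mixture; interpolation here is an unconstrained least-squares combination of the SD models, one simple reading of "interpolate".

```python
import numpy as np

def select_model(sd_supervectors, target):
    """Model selection: index of the training speaker closest to `target`."""
    distances = np.linalg.norm(sd_supervectors - target, axis=1)
    return int(np.argmin(distances))

def interpolate_model(sd_supervectors, target):
    """Adaptation: best least-squares combination of the SD supervectors."""
    w, *_ = np.linalg.lstsq(sd_supervectors.T, target, rcond=None)
    return sd_supervectors.T @ w

S = np.eye(3, 5)                     # 3 toy training speakers (rows), dim 5
target = 0.5 * S[0] + 0.5 * S[1]     # unseen speaker "between" two others
idx = select_model(S, target)
print(idx, round(float(np.linalg.norm(S[idx] - target)), 3))   # 0 0.707
print(np.allclose(interpolate_model(S, target), target))       # True
```

Selection is limited to the closest training point, while interpolation can reach speakers that lie between the training models.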

  12. Model adaptation
     [Figure: original distribution, observations, and adapted distribution in a two-dimensional feature space (feature 1 vs. feature 2)]
     - Adjust model parameters to better match the observations
     - Caveats:
       1. Want to adapt to a single utterance, which is not enough data for MLLR or MAP
          - Need an adaptation framework with few parameters
       2. The observations are a mixture of multiple sources
          - Iterative separation/adaptation algorithm

  13. Eigenvoice adaptation [Kuhn et al., 2000]
     - Train a set of SD models
       - Pack the parameters into a speaker supervector
       - Samples from the space of speaker variation
     - Principal component analysis to find orthonormal bases for the speaker subspace
     - Model is a linear combination of bases
     [Figure: speaker models, speaker subspace bases, other models]
     $$\mu = \bar{\mu} + U w + B h$$
     where $\mu$ is the adapted model, $\bar{\mu}$ the mean voice, $U$ the eigenvoice bases, $w$ the eigenvoice weights, $B$ the channel bases, and $h$ the channel weights.
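A minimal sketch of the eigenvoice recipe on this slide: PCA over SD supervectors gives the mean voice and orthonormal bases, and an adapted model is the mean voice plus a weighted sum of bases (plus an optional channel term). The channel bases `B` are simply taken as inputs here, since how they are trained is not described on the slide; the random "speakers" are placeholders for supervectors of stacked SD model means.

```python
import numpy as np

def train_eigenvoices(supervectors, n_bases):
    """PCA over speaker supervectors (one row per training speaker).

    Returns the mean voice mu_bar (D,) and orthonormal bases U (D, n_bases).
    """
    mu_bar = supervectors.mean(axis=0)
    centered = supervectors - mu_bar
    # Rows of vt are the principal axes, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mu_bar, vt[:n_bases].T

def adapted_mean(mu_bar, U, w, B=None, h=None):
    """mu = mu_bar + U w, plus B h when channel bases are supplied."""
    mu = mu_bar + U @ w
    if B is not None:
        mu = mu + B @ h
    return mu

def project_speaker(mu_bar, U, supervector):
    """Eigenvoice weights of a known supervector (U orthonormal)."""
    return U.T @ (supervector - mu_bar)

rng = np.random.default_rng(1)
speakers = rng.standard_normal((10, 20))   # 10 toy speakers, D = 20
# 10 centered speakers span at most 9 dimensions.
mu_bar, U = train_eigenvoices(speakers, n_bases=9)
w = project_speaker(mu_bar, U, speakers[3])
print(np.allclose(adapted_mean(mu_bar, U, w), speakers[3]))   # True
```

Keeping only the first few bases trades reconstruction accuracy for fewer parameters, which is what makes adaptation to a single utterance feasible.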

  14. Eigenvoice bases
     [Figure: per-phone spectra (frequency in kHz, level in dB) of the mean voice and eigenvoice dimensions 1, 2, and 3]
     - Mean voice = speaker-independent model
     - Eigenvoices shift formant frequencies, add pitch
     - Independent bases to capture channel variation

  15. Section 3: Monaural speech separation (Mixed signal model; Adaptation algorithm; Experiments)

  16. Eigenvoice factorial HMM
     - Model the mixture with a combination of source HMMs
     - Need the adaptation parameters $w_i$ to estimate the source signals $x_i(t)$, and vice versa
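A sketch of decoding a two-source factorial HMM, under stated assumptions: sources evolve independently (so the joint transition matrix is the Kronecker product of the source transition matrices), emissions are diagonal Gaussians on log spectra, and the max approximation predicts the mixture from a joint state. In an eigenvoice version, each `meansN` would be built from the mean voice plus bases times that source's weights $w_i$. The function and its arguments are hypothetical, not the thesis code.

```python
import numpy as np

def factorial_viterbi(Y, means1, means2, trans1, trans2, init1, init2, var):
    """Viterbi path of a two-source factorial HMM on log-spectral frames.

    Y: (T, F) mixture frames. meansN: (KN, F) state means of source N.
    transN: (KN, KN) row-stochastic transitions; initN: (KN,) initial probs.
    Joint state (i, j) is flattened to i * K2 + j.
    Returns a list of (state1, state2) pairs, one per frame.
    """
    K1, K2 = len(means1), len(means2)
    # Max approximation: predicted mixture for every joint state, (K1*K2, F).
    joint_means = np.maximum(means1[:, None, :], means2[None, :, :]).reshape(K1 * K2, -1)
    with np.errstate(divide="ignore"):      # zero probabilities become -inf
        log_A = np.log(np.kron(trans1, trans2))
        log_pi = np.log(np.kron(init1, init2))
    # Frame log-likelihood under every joint state: (T, K1*K2).
    diff = Y[:, None, :] - joint_means[None, :, :]
    log_b = -0.5 * np.sum(np.log(2 * np.pi * var) + diff ** 2 / var, axis=2)

    T, K = len(Y), K1 * K2
    delta = log_pi + log_b[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A         # scores[from, to]
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(K)] + log_b[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):               # backtrack
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return [divmod(s, K2) for s in path]

means1 = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
means2 = np.array([[0.0, 0.0, 5.0], [0.0, 0.0, 0.0]])
trans = np.full((2, 2), 0.5)
init = np.full(2, 0.5)
Y = np.array([[5, 0, 5], [5, 0, 5], [0, 5, 5], [0, 5, 0]], dtype=float)
print(factorial_viterbi(Y, means1, means2, trans, trans, init, init, np.ones(3)))
# [(0, 0), (0, 0), (1, 0), (1, 1)]
```

With uniform transitions the decoder simply follows the emissions; sparser transition matrices are what supply the temporal constraints that keep sources from permuting.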

  17. Adaptation algorithm
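The algorithm figure for this slide did not survive, so the following is only a sketch of one plausible adaptation step, not the thesis algorithm. It assumes a separation pass (e.g. factorial HMM decoding) has already produced a hard state alignment and a time-frequency mask for one source, and it re-estimates that source's eigenvoice weights by weighted least squares. Under the max approximation, mixture bins where the source dominates approximate its log spectrum, so the masked mixture can stand in for the source. Alternating such a step with re-separation gives an iterative separation/adaptation loop. All names and shapes are hypothetical.

```python
import numpy as np

def estimate_eigenvoice_weights(X, states, mu_bar, U, weights=None):
    """Least-squares eigenvoice weights w given a fixed state alignment.

    X: (T, F) log-spectral frames attributed to one source.
    states: (T,) state index per frame.
    mu_bar: (K, F) mean-voice state means; U: (K, F, P) eigenvoice bases.
    weights: optional (T, F) values in [0, 1] (e.g. a T-F mask); bins with
    weight 0 are ignored. Minimises
        sum_{t,f} weight * (X - mu_bar[s_t] - U[s_t] w)^2.
    """
    T, F = X.shape
    P = U.shape[2]
    design = U[states].reshape(T * F, P)           # one row per (t, f) bin
    target = (X - mu_bar[states]).reshape(T * F)
    if weights is not None:
        sw = np.sqrt(np.asarray(weights, dtype=float).reshape(T * F))
        design, target = design * sw[:, None], target * sw
    w, *_ = np.linalg.lstsq(design, target, rcond=None)
    return w

rng = np.random.default_rng(0)
K, F, P, T = 3, 4, 2, 20
mu_bar = rng.standard_normal((K, F))
U = rng.standard_normal((K, F, P))
states = rng.integers(0, K, size=T)
w_true = np.array([0.7, -1.2])
X = mu_bar[states] + U[states] @ w_true            # noiseless toy source
mask = (rng.random((T, F)) > 0.3).astype(float)    # pretend dominance mask
w_hat = estimate_eigenvoice_weights(X, states, mu_bar, U, weights=mask)
print(np.allclose(w_hat, w_true))                  # True
```

Because only P weights are estimated, even a partially masked single utterance can constrain them, which matches the "few parameters" requirement from the model adaptation slide.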

  18. Adaptation example
     [Figure: spectrograms (frequency in kHz vs. time in sec, level in dB) of the mixture t32_swil2a_m18_sbar9n, the separation after adaptation iterations 1, 3, and 5, and the separation using SD models]
