
Nonlinear Aspects of Speech Production: Modulations and Energy Operators



  1. Nonlinear Aspects of Speech Production: Modulations and Energy Operators. Petros Maragos. Computer Vision, Speech Communication & Signal Processing Group, Intelligent Robotics and Automation Laboratory, National Technical University of Athens, Greece (NTUA); Robot Perception and Interaction Unit, Athena Research and Innovation Center (Athena RIC). Summer School on Speech Signal Processing (S4P), DA-IICT, Gandhinagar, India, 9-11 Sept. 2018

  2. Outline
• Nonlinear Speech Processing
• Modulations
• Energy Operators
• AM-FM Speech Model, Demodulation Algorithms
• Applications to Speech Recognition
• Applications to Music Recognition
• Application to Audio Summarization
• Application to Distant Speech Recognition
• Applications of Spatio-Temporal Modulations to Image and Video Processing

  3. Physics of speech airflow → (linear approximation) → linear acoustics → linear models of speech production

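  4. Physics of Speech Airflow
• Airflow variables: $\rho$ = air density; $p$ = pressure; $\mathbf{u}$ = 3D air particle velocity ($\mu$ = viscosity, $\mathbf{g}$ = gravity)
• Governing equations:
  - mass conservation (continuity equation): $\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0$
  - momentum conservation (Navier-Stokes equation): $\rho \left[ \frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} \right] = -\nabla p + \rho \mathbf{g} + \mu \left[ \nabla^2 \mathbf{u} + \tfrac{1}{3} \nabla (\nabla \cdot \mathbf{u}) \right]$
  - state equation: $p / \rho^{\gamma} = \text{const.}$, $\gamma \approx 1.4$
• Time-varying boundary conditions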

  5. Nonlinear Speech Processing • Modulations • Turbulence – Fractals – Chaos

  6. Evidence for Speech Modulations • separated & unstable airflow • vortices • oscillators with time-varying elements • energy pulses (Teager)

  7. Time-varying Oscillators → AM-FM
Simple second-order oscillators with time-varying elements produce modulations:
- If mass or compliance is time-varying → FM [Van der Pol, Proc. IRE 1930]
- If damping is time-varying → AM [Van der Pol, IEE J. London 1946]
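The two cases above can be reproduced numerically. The sketch below integrates $m\ddot{x} + b(t)\dot{x} + k(t)x = 0$ with semi-implicit Euler; the helper names (`simulate`, `zero_crossings`), the 50 Hz natural frequency, the linear stiffness ramp and the sinusoidal damping profile are all hypothetical choices for illustration, not taken from the talk. A rising stiffness (falling compliance) raises the oscillation frequency (FM); damping that alternates in sign makes the envelope shrink and regrow (AM).

```python
import numpy as np

def simulate(m, k, b, fs=20000, dur=1.0, x0=1.0):
    """Integrate m x'' + b(t) x' + k(t) x = 0 with semi-implicit Euler.

    k, b: callables returning the (hypothetical) time-varying stiffness and damping.
    """
    dt = 1.0 / fs
    x, v = x0, 0.0
    out = np.empty(int(dur * fs))
    for i in range(out.size):
        t = i * dt
        v += -(b(t) * v + k(t) * x) / m * dt   # update velocity first
        x += v * dt                            # then position with the new velocity
        out[i] = x
    return out

def zero_crossings(x):
    """Number of sign changes, a crude proxy for oscillation frequency."""
    return int(np.count_nonzero(np.signbit(x[1:]) != np.signbit(x[:-1])))

fs, m = 20000, 1.0
k0 = m * (2 * np.pi * 50.0) ** 2   # hypothetical 50 Hz natural frequency

# Stiffness grows (compliance 1/k shrinks) over time -> rising frequency (FM)
x_fm = simulate(m, k=lambda t: k0 * (1 + 3 * t), b=lambda t: 0.0, fs=fs)

# Damping alternates between positive and negative -> envelope modulation (AM)
x_am = simulate(m, k=lambda t: k0, b=lambda t: 8.0 * np.sin(2 * np.pi * 2 * t), fs=fs)
```

Counting zero crossings in the first and last 0.2 s of `x_fm`, or comparing the peak of `x_am` near t = 0.25 s and t = 0.5 s, makes the FM and AM visible without plotting.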

  8. AM-FM Speech Model, Energy Demodulation Algorithms

  9. AM-FM Speech Modulation Model [Maragos, Kaiser & Quatieri, IEEE T-SP Oct. 1993]
• One single resonance as a damped AM-FM signal: $S(t) = \underbrace{A(t)\, e^{-\sigma t}}_{a(t)} \cos\Big( \underbrace{\omega_c t + \int_0^t q(\tau)\, d\tau}_{\phi(t)} \Big)$
• Instantaneous frequency: $\omega(t) = 2\pi f(t) = \frac{d\phi(t)}{dt} = \omega_c + q(t)$
• If due to a 2nd-order LTI system: $A(t)$ is constant and $\omega(t) = \omega_c$
• Speech signal as a multicomponent AM-FM signal: $\mathrm{Speech}(t) = \sum_k a_k(t) \cos(\phi_k(t))$
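A minimal NumPy sketch of the model above, synthesizing damped AM-FM resonances and summing them into a multicomponent signal. The damping constant `sigma`, the formant centers, the 100 Hz modulation rate and the ±5% frequency deviation are hypothetical values chosen for illustration; `am_fm_resonance` is a hypothetical helper, and the phase integral is approximated with a cumulative trapezoid rule.

```python
import numpy as np

def am_fm_resonance(t, A, sigma, fc, q):
    """S(t) = A(t) e^{-sigma t} cos(w_c t + int_0^t q). Assumes t[0] = 0."""
    wc = 2 * np.pi * fc
    # int_0^t q(tau) dtau via a cumulative trapezoid rule
    phase_dev = np.concatenate(([0.0], np.cumsum(0.5 * (q[1:] + q[:-1]) * np.diff(t))))
    a = A * np.exp(-sigma * t)   # amplitude envelope a(t)
    phi = wc * t + phase_dev     # phase phi(t)
    omega = wc + q               # instantaneous frequency d(phi)/dt
    return a * np.cos(phi), a, omega

fs = 16000                       # hypothetical sampling rate (Hz)
t = np.arange(0, 0.03, 1 / fs)   # 30 ms, starting at t = 0
speech = np.zeros_like(t)
for fc in (500.0, 1500.0, 2500.0):                           # hypothetical formants (Hz)
    A = 1.0 + 0.3 * np.cos(2 * np.pi * 100 * t)              # AM at a 100 Hz rate
    q = 0.05 * 2 * np.pi * fc * np.cos(2 * np.pi * 100 * t)  # FM: +/-5% of w_c
    s_k, a_k, omega_k = am_fm_resonance(t, A, sigma=50.0, fc=fc, q=q)
    speech += s_k                # Speech(t) = sum_k a_k(t) cos(phi_k(t))
```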

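  10. AM-FM Demodulation Problem
Given $x(t) = a(t)\cos(\phi(t))$, estimate $a(t)$ and $\omega(t) = \dot{\phi}(t)$.
• Variational approach
• Hilbert transform: $\hat{x}(t) = x(t) * \frac{1}{\pi t}$ (frequency response $-j\,\mathrm{sgn}(\omega)$), analytic signal $x(t) + j\hat{x}(t)$; $a(t) \approx \sqrt{x^2 + \hat{x}^2}$, $\omega(t) \approx \frac{d}{dt}\arctan\left(\frac{\hat{x}}{x}\right)$
• Energy operators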
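The Hilbert-transform route can be sketched with `scipy.signal.hilbert`, which returns the analytic signal $x + j\hat{x}$ (computed via the FFT). The helper name `hilbert_demod` and the 440 Hz test tone are hypothetical; the phase is unwrapped and differenced to approximate $d\phi/dt$.

```python
import numpy as np
from scipy.signal import hilbert

def hilbert_demod(x, fs):
    """Amplitude envelope and instantaneous frequency (Hz) from the analytic signal."""
    z = hilbert(x)                       # analytic signal x + j*H{x}
    a = np.abs(z)                        # a(t) = sqrt(x^2 + xhat^2)
    phi = np.unwrap(np.angle(z))         # phase arctan(xhat / x), unwrapped
    f = np.diff(phi) * fs / (2 * np.pi)  # d(phi)/dt / (2 pi), length N-1
    return a, f

fs = 8000
t = np.arange(fs) / fs
x = 0.8 * np.cos(2 * np.pi * 440 * t)    # hypothetical test tone
a, f = hilbert_demod(x, fs)
```

Because the FFT-based transform assumes periodicity, estimates near the edges of a real (non-periodic) frame are less reliable than in the middle.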

  11. Energy Tracking in Oscillators
• Harmonic oscillator: mass $m$ on a spring of stiffness $k$, displacement $x(t)$
• Motion equation: $m\ddot{x} + kx = 0$
• Response: $x(t) = A\cos(\omega t + \theta)$, $\omega = \sqrt{k/m}$
• Energy: $E = \frac{1}{2}m\dot{x}^2 + \frac{1}{2}kx^2 = \frac{m}{2}(A\omega)^2 = \text{constant}$
• Energy tracking: $\Psi(x) = \dot{x}^2 - x\ddot{x} = (A\omega)^2 = \frac{2E}{m}$
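A quick numerical check of the energy-tracking identity, using the analytic derivatives of the oscillator response. The parameter values `m`, `k`, `A`, `theta` are hypothetical.

```python
import numpy as np

m, k, A, theta = 0.5, 200.0, 0.03, 0.4   # hypothetical oscillator parameters
w = np.sqrt(k / m)                        # omega = sqrt(k/m)
t = np.linspace(0.0, 1.0, 1001)

x = A * np.cos(w * t + theta)             # response x(t)
xd = -A * w * np.sin(w * t + theta)       # velocity
xdd = -A * w**2 * np.cos(w * t + theta)   # acceleration = -(k/m) x

E = 0.5 * m * xd**2 + 0.5 * k * x**2      # total energy, constant in t
psi = xd**2 - x * xdd                     # Psi(x) = xdot^2 - x xddot
```

Here `psi` stays equal to $(A\omega)^2$ at every instant, i.e. it tracks the oscillator energy up to the factor $2/m$.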

  12. 1D Energy Operators (Teager; Kaiser, ICASSP 1990)
• Continuous-time signals $x(t)$: $\Psi_c[x(t)] = [\dot{x}(t)]^2 - x(t)\,\ddot{x}(t)$
  Property: $\Psi_c[A e^{rt}\cos(\omega_c t + \theta)] = A^2 e^{2rt}\omega_c^2$
• Discrete-time signals $x(n)$: $\Psi_d[x(n)] = x^2(n) - x(n-1)\,x(n+1)$
  - obtained by discretizing derivatives [Maragos, Kaiser & Quatieri, T-SP Apr. 1993]
  - special case of quadratic operators [Atlas & Fang, T-SP 1995]
  Property: $\Psi_d[A r^n\cos(\Omega_c n + \theta)] = A^2 r^{2n}\sin^2(\Omega_c)$
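The discrete operator needs only three neighbouring samples, so it is a one-liner in NumPy. The sketch below (with a hypothetical helper name `teager_kaiser` and hypothetical signal parameters) checks the damped-cosine property on the interior samples, where both neighbours exist.

```python
import numpy as np

def teager_kaiser(x):
    """Psi_d[x(n)] = x(n)^2 - x(n-1) x(n+1) for the interior samples n = 1..N-2."""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

A, r, Omega, theta = 2.0, 0.995, 0.3, 0.7   # hypothetical damped cosine
n = np.arange(200)
x = A * r**n * np.cos(Omega * n + theta)

psi = teager_kaiser(x)
expected = A**2 * r ** (2 * n[1:-1]) * np.sin(Omega) ** 2
```

The match is exact (up to rounding) because $\cos(\phi - \Omega)\cos(\phi + \Omega) = \cos^2\phi - \sin^2\Omega$.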

  13. Energy Separation Algorithm (ESA) (Maragos, Kaiser & Quatieri, IEEE T-SP Oct. 1993)
• Cosine $x(t) = A\cos(\omega_c t + \theta)$: $\Psi[x(t)] = A^2\omega_c^2$, $\Psi[\dot{x}(t)] = A^2\omega_c^4$
• AM-FM signal $x(t) = a(t)\cos\left(\int_0^t \omega(\tau)\,d\tau\right)$: if $a(t)$ and $\omega(t)$ do not vary too fast or too much with respect to $\omega_c$, then
  $\omega(t) \approx \sqrt{\frac{\Psi[\dot{x}(t)]}{\Psi[x(t)]}}$, $\quad |a(t)| \approx \frac{\Psi[x(t)]}{\sqrt{\Psi[\dot{x}(t)]}}$
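For the pure cosine the ESA formulas are exact, which follows in a few lines from the definition $\Psi[x] = \dot{x}^2 - x\ddot{x}$:

```latex
\begin{aligned}
x(t) &= A\cos(\omega_c t + \theta), \qquad \dot{x}(t) = -A\omega_c \sin(\omega_c t + \theta) \\
\Psi[x(t)] &= A^2\omega_c^2 \sin^2(\omega_c t + \theta) + A^2\omega_c^2 \cos^2(\omega_c t + \theta) = A^2\omega_c^2 \\
\Psi[\dot{x}(t)] &= (A\omega_c)^2\,\omega_c^2 = A^2\omega_c^4 \qquad (\dot{x} \text{ is a sinusoid of amplitude } A\omega_c) \\
\sqrt{\frac{\Psi[\dot{x}]}{\Psi[x]}} &= \sqrt{\frac{A^2\omega_c^4}{A^2\omega_c^2}} = \omega_c, \qquad
\frac{\Psi[x]}{\sqrt{\Psi[\dot{x}]}} = \frac{A^2\omega_c^2}{A\,\omega_c^2} = A \quad (A > 0)
\end{aligned}
```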

  14. Discrete ESA (DESA-2)
• AM-FM signal: $x[n] = a[n]\cos\left(\int_0^n \Omega(m)\,dm\right)$
• Energy tracking: $\Psi(x[n]) \approx a^2[n]\sin^2(\Omega[n])$, $\quad \Psi(x[n+1] - x[n-1]) \approx 4a^2[n]\sin^4(\Omega[n])$
• DESA-2:
  $\Omega[n] \approx \arcsin\left(\sqrt{\frac{\Psi(x[n+1] - x[n-1])}{4\,\Psi(x[n])}}\right)$, $\quad |a[n]| \approx \frac{2\,\Psi(x[n])}{\sqrt{\Psi(x[n+1] - x[n-1])}}$
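A minimal NumPy sketch of DESA-2 as written above, tested on a slowly modulated AM-FM signal. The helper names `tkeo` and `desa2`, the small `eps` guard, and the modulation parameters are hypothetical choices; the arcsin form is only valid for $0 < \Omega < \pi/2$ rad/sample (below a quarter of the sampling rate).

```python
import numpy as np

def tkeo(x):
    """Psi_d[x(n)] = x(n)^2 - x(n-1) x(n+1), interior samples only."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def desa2(x, eps=1e-12):
    """DESA-2 estimates of |a[n]| and Omega[n] (rad/sample) for n = 2..N-3."""
    x = np.asarray(x, dtype=float)
    y = x[2:] - x[:-2]        # y[n] = x[n+1] - x[n-1], centered at n = 1..N-2
    psi_x = tkeo(x)[1:-1]     # Psi(x[n]) for n = 2..N-3
    psi_y = tkeo(y)           # Psi(y[n]) for n = 2..N-3
    ratio = np.clip(psi_y / (4 * psi_x + eps), 0.0, 1.0)
    omega = np.arcsin(np.sqrt(ratio))
    amp = 2 * psi_x / np.sqrt(np.maximum(psi_y, eps))
    return amp, omega

n = np.arange(2000)
a_true = 1.0 + 0.2 * np.cos(0.01 * n)           # hypothetical slow AM
omega_true = 0.5 + 0.05 * np.sin(0.005 * n)     # hypothetical slow FM (rad/sample)
x = a_true * np.cos(np.cumsum(omega_true))
amp, omega = desa2(x)                           # aligned with n[2:-2]
```

Only five consecutive samples enter each estimate, which is what gives the ESA its very fine time resolution.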

  15. ESA Applied to Synthetic AM-FM [Figure panels vs. sample index (0-400): AM-FM signal; square root of energy; instantaneous frequency / π; amplitude envelope; frequency error / π (axis about ±0.0007); amplitude error (axis about ±0.007)]

  16. ESA Applied to Speech Resonance [Figure panels vs. time (0-30 ms): speech signal; square root of energy; amplitude envelope; bandpass speech; instantaneous frequency (Hz, axis 2800-3800 Hz); plus speech spectrum (dB) vs. frequency (0-6 kHz)]

  17. ESA in Noise and BP Filtering (Bovik, Maragos & Quatieri, IEEE T-SP Dec. 1993)
• AM-FM signal plus noise: $x(t) = a(t)\cos\left(\int_0^t \omega(\tau)\,d\tau\right) + n(t)$
• Noise $n(t)$: wide-sense stationary, Gaussian, zero-mean, power spectrum $N(\xi)$
• Bandpass filter: $x(t) \rightarrow G(\xi) \rightarrow y(t)$, with $\mathrm{SNR}(t) = \frac{a^2(t)}{\int_{\text{passband}} N(\xi)\,d\xi}$
• ESA amplitude/frequency estimates $\hat{a}(t)$, $\hat{\omega}(t)$: the expected values $E[\hat{\omega}^2(t)]$ and $E[\hat{a}^2(t)]$ equal $\omega^2(t)$ and $a^2(t)$ multiplied by correction factors that depend on $\mathrm{SNR}(t)$ (and, for the amplitude, on the filter response $G(\omega(t))$)

  18. Multiband Demodulation and F/B Tracking [A. Potamianos & P. Maragos, JASA 1996]
• A filterbank with center frequencies $f_1, f_2, \ldots, f_N$ splits the signal into bands $x(t, f_1), \ldots, x(t, f_N)$
• An ESA on each band gives amplitude $a(t, f_i)$ and instantaneous frequency $f(t, f_i)$
• Averaging $f(t, f_i)$ with weights $a^2(t, f_i)$ gives short-time frequency $F(t, f)$ and bandwidth $B(t, f)$ estimates
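A minimal sketch of the multiband front end: a bank of real Gabor bandpass filters, each followed by DESA-2. The filter shape $e^{-\alpha^2 t^2}\cos(2\pi f_c t)$, its normalization to unit gain at $f_c$, the value of `alpha`, the band centers, and all helper names are assumptions for illustration; the filterbank design used in the JASA paper may differ.

```python
import numpy as np

def tkeo(x):
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def desa2(x, eps=1e-20):
    """DESA-2 amplitude and frequency (rad/sample), aligned with x[2:-2]."""
    y = x[2:] - x[:-2]
    psi_x, psi_y = tkeo(x)[1:-1], tkeo(y)
    omega = np.arcsin(np.sqrt(np.clip(psi_y / (4 * psi_x + eps), 0.0, 1.0)))
    amp = 2 * psi_x / np.sqrt(np.maximum(psi_y, eps))
    return amp, omega

def gabor_filter(fc, fs, alpha):
    """Real Gabor impulse response, truncated at |t| = 3/alpha, unit gain at fc."""
    half = int(3 * fs / alpha)
    t = np.arange(-half, half + 1) / fs
    g = np.exp(-(alpha * t) ** 2) * np.cos(2 * np.pi * fc * t)
    gain = abs(np.sum(g * np.exp(-2j * np.pi * fc * t)))   # |G| at fc
    return g / gain

def multiband_demod(x, fs, centers, alpha):
    """Per band: (amplitude a(t, f_i), instantaneous frequency f(t, f_i) in Hz)."""
    out = []
    for fc in centers:
        xb = np.convolve(x, gabor_filter(fc, fs, alpha), mode="same")
        amp, omega = desa2(xb)
        out.append((amp, omega * fs / (2 * np.pi)))
    return out

fs = 8000
t = np.arange(fs) / fs
x = np.cos(2 * np.pi * 1000 * t)   # hypothetical test tone at 1 kHz
bands = multiband_demod(x, fs, [500.0, 1000.0, 1500.0], alpha=np.pi * 100)
```

For the tone, only the band centered at 1 kHz carries a strong amplitude; the other bands report a tiny $a(t, f_i)$, which is why the $a^2$ weighting matters when combining bands.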

  19. Frequency and Bandwidth Estimates
• Center frequency estimates (frame of duration $T$):
  $F_u = \frac{1}{T}\int_0^T f(t)\,dt$, $\quad F_w = \frac{\int_0^T f(t)\,a^2(t)\,dt}{\int_0^T a^2(t)\,dt}$
• Bandwidth estimates:
  $B_u^2 = \frac{1}{T}\int_0^T \big(f(t) - F_u\big)^2 dt$, $\quad B_w^2 = \frac{\int_0^T \big[(\dot{a}(t)/2\pi)^2 + (f(t) - F_w)^2 a^2(t)\big]\,dt}{\int_0^T a^2(t)\,dt}$
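The four estimates translate directly into sums over one analysis frame. In this sketch (hypothetical helper `freq_bandwidth`), the integrals are replaced by sums, $\dot{a}(t)$ is approximated with `np.gradient`, and `a`, `f` are assumed to be the ESA outputs of one band, with `f` in Hz.

```python
import numpy as np

def freq_bandwidth(a, f, fs):
    """Unweighted (F_u, B_u) and amplitude-weighted (F_w, B_w) estimates for one frame."""
    a = np.asarray(a, dtype=float)
    f = np.asarray(f, dtype=float)
    Fu = f.mean()
    Bu = np.sqrt(np.mean((f - Fu) ** 2))
    w = a ** 2                          # weights a^2(t)
    Fw = np.sum(f * w) / np.sum(w)
    a_dot = np.gradient(a) * fs         # da/dt by finite differences
    Bw = np.sqrt(np.sum((a_dot / (2 * np.pi)) ** 2 + (f - Fw) ** 2 * w) / np.sum(w))
    return Fu, Fw, Bu, Bw

# Constant amplitude and frequency: both bandwidths should vanish
Fu, Fw, Bu, Bw = freq_bandwidth(np.ones(200), np.full(200, 1000.0), fs=10000)
```

The weighted estimates favor the instants where the band carries most energy, and the $\dot{a}$ term adds the spread contributed by amplitude modulation.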

  20. Speech Pyknogram [ A. Potamianos & P. Maragos, JASA 1996 ]

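  21. Smooth Energy Operators and Tracking
• Teager-Kaiser Energy Operator (TKEO): $\Psi[x(t)] = \dot{x}^2(t) - x(t)\,\ddot{x}(t)$
• AM-FM signals: $\Psi[a(t)\cos(\phi(t))] \approx a^2(t)\,\omega^2(t)$
• Regularized or Gabor TKEO: $\Psi[x * g] = (x * \dot{g})^2 - (x * g)(x * \ddot{g})$, where the Gabor filter's impulse response is $g(t) = e^{-\alpha^2 t^2}\cos(\omega_c t)$
• Wideband signals (sums of nonstationary sinusoids): simultaneous narrowband component separation, energy tracking and denoising
• 2D Gabor TKEO: $\Phi[f * g] = \|\nabla(f * g)\|^2 - (f * g)\,\nabla^2(f * g)$
Refs: Dimitriadis & Maragos, Speech Com. 2006; Kokkinos, Evangelopoulos & Maragos, T-PAMI 2009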
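Because differentiation commutes with convolution, the Gabor TKEO needs no numerical derivatives of the signal: it suffices to convolve with the analytic $g$, $\dot{g}$ and $\ddot{g}$. The sketch below assumes $g(t) = e^{-\alpha^2 t^2}\cos(\omega_c t)$ truncated at $|t| \le 4/\alpha$, approximates continuous convolution by a sum scaled by $1/f_s$, and uses hypothetical helper names and parameters. For a tone at $\omega_c$ the result should equal $(A\,G(\omega_c))^2\omega_c^2$ with $G(\omega_c) \approx \sqrt{\pi}/(2\alpha)$, the Gaussian's Fourier transform evaluated at the filter center.

```python
import numpy as np

def gabor_kernels(fc, alpha, fs, span=4.0):
    """Sampled g, g', g'' for g(t) = exp(-alpha^2 t^2) cos(w_c t), |t| <= span/alpha."""
    wc = 2 * np.pi * fc
    half = int(np.ceil(span * fs / alpha))
    t = np.arange(-half, half + 1) / fs
    env = np.exp(-(alpha * t) ** 2)
    c, s = np.cos(wc * t), np.sin(wc * t)
    g = env * c
    g1 = env * (-2 * alpha**2 * t * c - wc * s)
    g2 = env * ((4 * alpha**4 * t**2 - 2 * alpha**2 - wc**2) * c + 4 * alpha**2 * wc * t * s)
    return g, g1, g2

def gabor_tkeo(x, fc, alpha, fs):
    """Psi[x * g] = (x * g')^2 - (x * g)(x * g'')."""
    g, g1, g2 = gabor_kernels(fc, alpha, fs)

    def conv(h):
        # Riemann-sum approximation of the continuous-time convolution
        return np.convolve(x, h, mode="same") / fs

    return conv(g1) ** 2 - conv(g) * conv(g2)

fs, A, f0 = 16000, 0.5, 1000.0              # hypothetical signal parameters
alpha = 2 * np.pi * 50                       # hypothetical Gabor bandwidth parameter
t = np.arange(fs) / fs
x = A * np.cos(2 * np.pi * f0 * t)
psi = gabor_tkeo(x, f0, alpha, fs)
expected = (A * np.sqrt(np.pi) / (2 * alpha)) ** 2 * (2 * np.pi * f0) ** 2
```

With a noisy or multicomponent input, the same code tracks the energy of only the component inside the Gabor passband, which is the separation and denoising effect the slide refers to.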

  22. 1/f Speech Modulation Model
• Model a resonance of a random speech phoneme as a phase-modulated 1/f signal: $S(t) = A\cos(\phi(t))$, $\quad \phi(t) = \omega_c t + P(t)$
• The nonlinear phase signal $P(t)$ is modeled as a 1/f random process.
• Useful model for the broad resonances often observed in voiced or unvoiced fricative sounds, probably caused by nonlinear phenomena during speech production. [Dimakis & Maragos, IEEE T-SP 2005]
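A minimal sketch of the model, assuming the 1/f phase is synthesized by FFT spectral shaping of white Gaussian noise (one common generator; the paper's method may differ). The resonance center, phase scale `depth`, seed and helper name `one_over_f_noise` are hypothetical.

```python
import numpy as np

def one_over_f_noise(n, rng, beta=1.0):
    """Zero-mean, unit-variance Gaussian noise with power spectrum ~ 1/f^beta."""
    spec = np.fft.rfft(rng.standard_normal(n))
    f = np.fft.rfftfreq(n)               # cycles/sample
    spec[1:] /= f[1:] ** (beta / 2)      # amplitude ~ f^(-beta/2) => power ~ f^(-beta)
    spec[0] = 0.0                        # remove DC
    p = np.fft.irfft(spec, n)
    return p / p.std()

rng = np.random.default_rng(0)
fs, n = 16000, 16384
t = np.arange(n) / fs
A, fc, depth = 1.0, 2500.0, 2.0          # hypothetical amplitude, center (Hz), phase scale (rad)
P = depth * one_over_f_noise(n, rng)     # nonlinear phase P(t)
S = A * np.cos(2 * np.pi * fc * t + P)   # S(t) = A cos(w_c t + P(t))
```

The random phase spreads the energy of the carrier around $f_c$, producing the broad resonance the model is meant to capture.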
