
Nonlinear Aspects of Speech Production: Fractals and Chaotic Dynamics



  1. Computer Vision, Speech Communication & Signal Processing Group, Intelligent Robotics and Automation Laboratory, National Technical University of Athens, Greece (NTUA)
     Robot Perception and Interaction Unit, Athena Research and Innovation Center (Athena RIC)
     Nonlinear Aspects of Speech Production: Fractals and Chaotic Dynamics
     Petros Maragos
     Summer School on Speech Signal Processing (S4P), DA-IICT, Gandhinagar, India, 9-11 Sept. 2018

  2. Outline
     • Nonlinear Speech Processing
     • Turbulence: Fractals, Chaotic Dynamics
     • Multiscale Fractal Dimensions of Speech Sounds
     • Fractal Modulations for Fricative Sounds
     • Chaotic Dynamics of Speech Sounds
     • Algorithms for Speech Fractal & Chaos Analysis
     • Application to Speech Recognition
     • Application to Music Recognition

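  3. Linear Source-Filter Model
     [Block diagram (Rabiner & Schafer, 1978): an impulse train generator (driven by the pitch period) feeds the glottal pulse model G(z), scaled by gain A_V; a random noise generator is scaled by gain A_N; a voiced/unvoiced switch selects the excitation u(n), which passes through the vocal tract model V(z) (set by the vocal tract parameters) and the radiation model R(z) to give the speech signal s(n).]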
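The source-filter chain in the diagram can be sketched in a few lines. This is a minimal illustration, not the model from the slide: the glottal pulse model G(z) is omitted, V(z) is assumed to be a single two-pole resonator, R(z) is assumed to be a first difference, and the function name `synthesize` and all parameter values are hypothetical.

```python
import random


def synthesize(n_samples, voiced, pitch_period=80, gain=1.0,
               a1=1.3, a2=-0.8, seed=0):
    """Toy source-filter synthesis (all parameter values are illustrative).

    voiced=True  -> impulse train excitation with the given pitch period
    voiced=False -> white Gaussian noise excitation
    """
    rng = random.Random(seed)

    # Excitation u(n): the voiced/unvoiced switch picks one of two sources.
    if voiced:
        u = [gain if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    else:
        u = [gain * rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    # Assumed vocal tract model V(z) = 1 / (1 - a1 z^-1 - a2 z^-2):
    # one stable resonance (pole radius sqrt(0.8) for the defaults).
    v = []
    for n in range(n_samples):
        prev1 = v[n - 1] if n >= 1 else 0.0
        prev2 = v[n - 2] if n >= 2 else 0.0
        v.append(u[n] + a1 * prev1 + a2 * prev2)

    # Assumed radiation model R(z) = 1 - z^-1 (first difference).
    return [v[n] - (v[n - 1] if n >= 1 else 0.0) for n in range(n_samples)]


voiced_frame = synthesize(400, voiced=True)
unvoiced_frame = synthesize(400, voiced=False)
```

Everything in this chain is linear and time-invariant within a frame, which is the limitation the following slides address.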

  4. Nonlinear Fluid Dynamics of the Vocal Tract (Kaiser 1993)

  5. Physics of Speech Airflow
     • airflow variables: $\rho$ = air density; $p$ = pressure; $\mathbf{u}$ = 3D air particle velocity
     • governing equations:
       mass conservation (continuity eqn): $\dfrac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{u}) = 0$
       momentum conservation (Navier-Stokes eqn): $\rho \left( \dfrac{\partial \mathbf{u}}{\partial t} + \mathbf{u} \cdot \nabla \mathbf{u} \right) = -\nabla p + \rho \mathbf{g} + \mu \nabla^2 \mathbf{u} + \dfrac{\mu}{3} \nabla (\nabla \cdot \mathbf{u})$
       state equation: $p \, \rho^{-1.4} = \text{const.}$
     • time-varying boundary conditions

  6. Speech Aerodynamics
     • Reynolds number: $Re = \dfrac{\rho \cdot (\text{velocity scale}) \cdot (\text{length scale})}{\mu}$
     • low viscosity $\mu$ ⇒ high Re ⇒ inertia forces ≫ viscous forces
     • “aerodynamic” phenomena (Re ≫ 1): air jet, rotational motion, separated airflow, boundary layers, vortices, turbulence
     • experimental & theoretical evidence for nonlinear phenomena: Teager (1970s–1980s), Kaiser (1983– ), Thomas (1986), McGowan (1988), Barney, Shadle & Davis (1999), ...
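As a rough order-of-magnitude check on the Reynolds number, the sketch below plugs in approximate room-temperature values for air. The velocity and length scales (a 10 m/s jet through a 1 cm constriction) are hypothetical illustration values, not figures from the slides.

```python
def reynolds_number(density, velocity_scale, length_scale, viscosity):
    """Re = density * velocity scale * length scale / dynamic viscosity."""
    return density * velocity_scale * length_scale / viscosity


RHO_AIR = 1.2     # kg/m^3, approximate air density at room temperature
MU_AIR = 1.8e-5   # Pa*s, approximate dynamic viscosity of air

# Hypothetical scales: 10 m/s jet velocity, 1 cm constriction.
re_example = reynolds_number(RHO_AIR, 10.0, 0.01, MU_AIR)
print(f"Re ~ {re_example:.0f}")
```

With these assumed scales Re is in the thousands, so inertia forces dominate viscous forces by a wide margin.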

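  7. Vortices
     • vorticity: $\boldsymbol{\omega} = \nabla \times \mathbf{u}$
     • a vortex is a flow region of similar vorticity $\boldsymbol{\omega}$
     • a vortex can be generated by:
       – velocity gradients in boundary layers
       – separated air flow
       – curved geometry of vocal tract
     • dynamics of vortex propagation: $\dfrac{\partial \boldsymbol{\omega}}{\partial t} = \boldsymbol{\omega} \cdot \nabla \mathbf{u} - \mathbf{u} \cdot \nabla \boldsymbol{\omega} + \nu \nabla^2 \boldsymbol{\omega}$, with kinematic viscosity $\nu = \mu / \rho$
       – $\boldsymbol{\omega} \cdot \nabla \mathbf{u}$: vorticity twisting & stretching
       – $\nu \nabla^2 \boldsymbol{\omega}$: diffusion of vorticity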

  8. Nonlinear Speech Processing • Modulations • Turbulence – Fractals – Chaos

  9. Turbulence
     • flow state with broad-spectrum, rapidly varying (in space and time) velocity and vorticity
     • transition to turbulence is easier for higher-Re flows
     • eddies: vortices of a characteristic size $\lambda$
     • Energy Cascade Theory (Richardson, 1922): multiscale hierarchy of eddies
     • 5/3 spectral law (Kolmogorov, 1941): $S(k, r) \propto r^{2/3} k^{-5/3}$, where
       – $k = 2\pi / \lambda$ = wavenumber
       – $r$ = energy dissipation rate
       – $S(k, r)$ = velocity wavenumber spectrum

  10. Turbulence, Fractals and Chaos
     • fractal geometry quantifies multiscale structures in turbulence
     • Kolmogorov’s 5/3 law ⇒ $\mathrm{Var}[u(x + \Delta x) - u(x)] \propto |\Delta x|^{2/3}$
     • we use fractal dimension to quantify the “amount” of turbulence in speech
     • chaos and turbulence
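One way to see why a 5/3 law suggests a specific fractal dimension is sketched below. It assumes the velocity signal behaves like a fractional Brownian motion with Hurst exponent $H$. That model, and the symbol $H$, are assumptions made here for illustration and are not stated on the slide.

```latex
\begin{align*}
\mathrm{Var}[u(x+\Delta x) - u(x)] &\propto |\Delta x|^{2H},
  \qquad 2H = \tfrac{2}{3} \;\Rightarrow\; H = \tfrac{1}{3}, \\
S(k) &\propto k^{-(2H+1)} = k^{-5/3}, \\
D &= 2 - H = \tfrac{5}{3} \approx 1.67 .
\end{align*}
```

Under this assumption the graph of the velocity signal has fractal dimension $D = 5/3$, between a smooth curve ($D = 1$) and a plane-filling one ($D = 2$).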

  11. Multiscale Fractal Dimension of Speech Sounds
     [Figure: speech signals /iy/, /f/ and /v/ (amplitude vs. time, 0 to 30 ms), each with its fractal dimension (1 to 2) plotted against scale (0 to 4 ms). P. Maragos & A. Potamianos, JASA 1999]

  12. Speech Attractors
     [Figure: waveforms X(t) vs. time and the corresponding attractors for four speech sounds: /ao/ (D_E = 6, #1846), /iy/ (D_E = 5, #1068), /k/ (D_E = 6, #816), /s/ (D_E = 5, #829). Pitsikalis & Maragos, Speech Commun 2009]
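The slide gives an embedding dimension D_E for each sound but does not say how the attractors were built. A standard choice is time-delay embedding, sketched below under that assumption. The function name `delay_embed` and the delay value are hypothetical.

```python
def delay_embed(x, dim, delay):
    """Time-delay embedding of a scalar signal.

    Returns vectors [x[i], x[i + delay], ..., x[i + (dim - 1) * delay]]
    for every start index i that keeps the last component inside the signal.
    """
    n_vectors = len(x) - (dim - 1) * delay
    if n_vectors <= 0:
        raise ValueError("signal too short for this dim/delay")
    return [[x[i + j * delay] for j in range(dim)] for i in range(n_vectors)]


# Tiny example: embed a ramp in 3 dimensions with a delay of 2 samples.
points = delay_embed([0, 1, 2, 3, 4, 5], dim=3, delay=2)
print(points)
```

Each returned vector is one point of the reconstructed trajectory; plotting three of its coordinates gives pictures of the kind shown on the slide.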

  13. Multiscale Fractal Dimensions for Speech Sounds
     Refs:
     • P. Maragos and A. Potamianos, “Fractal Dimensions of Speech Sounds: Computation and Application to Automatic Speech Recognition”, Journal of the Acoustical Society of America, March 1999.
     • P. Maragos, “Fractal Signal Analysis Using Mathematical Morphology”, in Advances in Electronics and Electron Physics, vol. 88, Academic Press, 1994.

  14. FRACTALS: Definitions
     • Mandelbrot’s definition: a set $S$ is fractal ⟺ $D_H(S) > D_T(S)$, where $D_H$ = Hausdorff dimension and $D_T$ = topological dimension
     • Examples:
       – $S$ = fractal dust: $D_T = 0 < D_H \le 1$
       – $S$ = fractal curve: $D_T = 1 < D_H \le 2$
       – $S$ = fractal surface: $D_T = 2 < D_H \le 3$
     • Signals: a function $f : \mathbb{R}^\nu \to \mathbb{R}$ is a fractal if its graph $\mathrm{Gr}(f)$ is a fractal set in $\mathbb{R}^{\nu + 1}$
       – $f$ is continuous ⇒ $\nu = D_T \le D_H[\mathrm{Gr}(f)] \le \nu + 1$

  15. ‘Fractal’ Dimensions (of sets in $\mathbb{R}^\nu$)
     • $D_H$ = Hausdorff dimension
     • $D_{MB}$ = Minkowski-Bouligand dimension
     • $D_{BC}$ = box counting dimension
     • $D_S$ = similarity dimension
     • $0 \le D_T \le D_H \le D_{MB} = D_{BC} \le \nu$
     • $D_H \le D_S$

  16. Morphological Measurement of Fractal Dimension
     • Minkowski cover of curve $G$: $C_B(r) = \bigcup_{z \in G} (rB + z)$
     • Fractal (Minkowski-Bouligand) dimension $D \in [1, 2]$:
       – $A_B(r) = \mathrm{area}[C_B(r)]$
       – length of $G$ at scale $r$: $A_B(r) / (2r) \propto r^{1 - D}$
     • Least-squares line fit to the data $\left( \log(1/r), \ \log[A_B(r) / r^2] \right)$ ⇒ slope $D$
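The cover-area measurement and line fit can be sketched for a sampled 1D signal as follows. This is a minimal sketch under stated assumptions, not necessarily the algorithm from the slides: the structuring element is a flat horizontal segment, scale r is counted in samples, the cover at scale r is obtained by applying a 3-sample flat dilation and erosion r times, and all function names are hypothetical.

```python
import math
import random


def cover_areas(signal, max_scale):
    """Area between flat dilation and flat erosion at scales 1..max_scale.

    Applying the 3-sample max (min) filter r times equals a flat
    dilation (erosion) by a window of 2r + 1 samples.
    """
    n = len(signal)
    upper, lower = list(signal), list(signal)
    areas = []
    for _ in range(max_scale):
        upper = [max(upper[max(i - 1, 0):i + 2]) for i in range(n)]
        lower = [min(lower[max(i - 1, 0):i + 2]) for i in range(n)]
        areas.append(sum(u - l for u, l in zip(upper, lower)))
    return areas


def ls_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def minkowski_dimension(signal, max_scale=10):
    """Fit log(A(r) / r^2) against log(1 / r); the slope estimates D.

    A constant signal has zero cover area and raises ValueError in log().
    """
    areas = cover_areas(signal, max_scale)
    xs = [math.log(1.0 / r) for r in range(1, max_scale + 1)]
    ys = [math.log(a / (r * r)) for r, a in enumerate(areas, start=1)]
    return ls_slope(xs, ys)


rng = random.Random(0)
line = [0.3 * n for n in range(1000)]             # smooth curve
noise = [rng.uniform(-1.0, 1.0) for _ in range(2000)]  # very rough signal
d_line = minkowski_dimension(line)
d_noise = minkowski_dimension(noise)
print(round(d_line, 2), round(d_noise, 2))
```

The straight line should come out close to 1, while white noise gives a clearly larger value that approaches 2 only as the scale grows.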

  17. Morphological (Flat & Weighted) Filters
     • Dilation (max-plus convolution): $(f \oplus g)(x) = \max_y \, [f(y) + g(x - y)]$
     • Erosion (min-plus correlation): $(f \ominus g)(x) = \min_y \, [f(y) - g(y - x)]$
     • Opening: $f \circ g = (f \ominus g) \oplus g$
     • Closing: $f \bullet g = (f \oplus g) \ominus g$
     [Figure: original signal; flat (pulse) and parabolic structuring elements (SE); erosion, dilation, opening and closing of the signal by the flat and parabolic SE, plotted against sample index.]
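The four operators translate directly into code for finite signals. In this sketch the structuring function g is stored as a list of odd length 2K + 1 whose middle entry is g(0), and windows are truncated at the signal borders. Both conventions, and the helper names, are assumptions made for illustration.

```python
def dilate(f, g):
    """(f dilated by g)(x) = max over y of f(y) + g(x - y)."""
    K, n = len(g) // 2, len(f)
    return [max(f[y] + g[(x - y) + K]
                for y in range(max(x - K, 0), min(x + K, n - 1) + 1))
            for x in range(n)]


def erode(f, g):
    """(f eroded by g)(x) = min over y of f(y) - g(y - x)."""
    K, n = len(g) // 2, len(f)
    return [min(f[y] - g[(y - x) + K]
                for y in range(max(x - K, 0), min(x + K, n - 1) + 1))
            for x in range(n)]


def opening(f, g):
    """Erosion followed by dilation."""
    return dilate(erode(f, g), g)


def closing(f, g):
    """Dilation followed by erosion."""
    return erode(dilate(f, g), g)


def flat_se(K):
    """Flat structuring element: g(k) = 0 for |k| <= K."""
    return [0.0] * (2 * K + 1)


def parabolic_se(K, a):
    """Parabolic structuring function: g(k) = -a * k^2 for |k| <= K."""
    return [-a * k * k for k in range(-K, K + 1)]


impulse = [0, 0, 5, 0, 0]
print(dilate(impulse, flat_se(1)))   # impulse spreads into a 3-sample pulse
print(opening(impulse, flat_se(1)))  # narrow peak is removed
```

With a flat g these reduce to moving max and min filters. The parabolic g weights neighbours by distance, which is what "weighted" refers to in the slide title.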

  18. Minkowski Fractal Dimension of 1D Curve and Morphological Algorithm for 1D Signals

  19. Short-Time (ST) Speech & Fractal Dimension
     [Figure: speech signal /soothing/ with its short-time zero-crossings, mean-square (MS) amplitude and fractal dimension (1 to 2), all plotted against time (0 to 1 s).]

  20. Multiscale Speech Fractal Dimension
     • short-time speech signal: $S(t)$, $0 \le t \le T$
     • signal graph: $G = \{ (t, S(t)) \in \mathbb{R}^2 : 0 \le t \le T \}$
     • fractal of constant $D$ ⇒ power law: $\mathrm{area}(G \oplus \varepsilon B) \approx C \, \varepsilon^{2 - D}$, as $\varepsilon \to 0$
     • variable power law: $\mathrm{area}(G \oplus \varepsilon B) \approx C \, \varepsilon^{2 - D(\varepsilon)}$
     • multiscale fractal “dimension” (speech fractogram): $\mathrm{MFD}(\varepsilon, t) = D(\varepsilon)$ of the short-time speech segment around time $t$
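A scale-dependent dimension can be estimated by fitting the power law locally, over a short sliding window of scales, instead of over all scales at once. The sketch below does this per frame to build a fractogram. It assumes a flat structuring element, scales counted in samples, and hypothetical function names and window sizes.

```python
import math


def cover_areas(signal, max_scale):
    """Area between flat dilation and erosion at scales 1..max_scale."""
    n = len(signal)
    upper, lower = list(signal), list(signal)
    areas = []
    for _ in range(max_scale):
        upper = [max(upper[max(i - 1, 0):i + 2]) for i in range(n)]
        lower = [min(lower[max(i - 1, 0):i + 2]) for i in range(n)]
        areas.append(sum(u - l for u, l in zip(upper, lower)))
    return areas


def ls_slope(xs, ys):
    """Slope of the least-squares line through the points (xs, ys)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def multiscale_fractal_dimension(frame, max_scale=20, window=5):
    """Local slope of log(A / s^2) vs log(1 / s) over sliding scale windows.

    Entry i covers scales i + 1 .. i + window. A constant frame has zero
    cover area and raises ValueError in log().
    """
    areas = cover_areas(frame, max_scale)
    xs = [math.log(1.0 / s) for s in range(1, max_scale + 1)]
    ys = [math.log(a / (s * s)) for s, a in enumerate(areas, start=1)]
    return [ls_slope(xs[i:i + window], ys[i:i + window])
            for i in range(max_scale - window + 1)]


def fractogram(signal, frame_len, hop, max_scale=20, window=5):
    """One multiscale dimension profile per short-time frame."""
    return [multiscale_fractal_dimension(signal[start:start + frame_len],
                                         max_scale, window)
            for start in range(0, len(signal) - frame_len + 1, hop)]


ramp = [0.5 * n for n in range(500)]
profile = multiscale_fractal_dimension(ramp)
frames = fractogram(ramp, frame_len=200, hop=100)
```

For a smooth ramp every entry of the profile stays near 1. A signal with turbulence-like structure would show values that change with scale.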

  21. Multiscale Fractal Dimension of Speech Sounds
     [Figure: speech signals /iy/, /f/ and /v/ (amplitude vs. time, 0 to 30 ms), each with its fractal dimension (1 to 2) plotted against scale (0 to 4 ms). P. Maragos & A. Potamianos, JASA 1999]
