
A System for Speech and 3D Facial Image Acquisition, Modeling and Analysis



  1. Elmar Nöth, Tobias Bocklet, Arnd Gebhard: A System for Speech and 3D Facial Image Acquisition, Modeling and Analysis (Wednesday, 30 May 2012)

  2. Outline
     - Motivation: long-term goal of the project
     - Patient groups:
       - Parkinson’s disease (PD) patients
       - Stroke patients and patients with facial paresis
     - Speech technology
     - Facial analysis technology
     - Results
     - Summary

  3. Motivation: Necessity of Evaluation
     - Diagnosis
       - How intelligible is the patient? (holistic impression)
       - How strongly does the patient nasalize? (distinct aspect)
     - Therapy control
       - Has the patient’s condition improved during therapy?
     - Comparison of therapy methods
       - Which therapy method leads to the best results for a group of patients?
     - Screening
       - Is the quality of a child’s speech appropriate for the child’s age?
     - Computer-assisted therapy
       - Did the patient perform the exercise correctly?

  4. Motivation: Necessity of Evaluation

  5. Motivation: Long-term Goal of the Project
     - Provide a telemedical rehabilitation unit for clinical and home use
       - Support speech analysis, analysis of facial gestures, and … (gait, cognitive abilities: an open, flexible platform)
     - Patient groups:
       - Parkinson’s disease (PD) patients
       - Stroke patients and patients with facial paresis
     - Instruct the patient what to do
     - Evaluate the exercises
     - Compare with previous sessions
     - Summarize the exercises for the therapist

  6. Outline

  7. Patient Groups: Parkinson’s Disease
     - Degenerative disorder of the central nervous system
     - Death of dopamine-containing cells in the substantia nigra
       - The cause of the cell death is unknown
     - Second most common neurodegenerative disorder (after Alzheimer's disease)
     - Prevalence ≈ 0.3% (whole population)
       - More common in the elderly: 1% of people over 60, 4% of people over 80
     - Incidence ≈ 8 to 18 per 100,000 people per year
     - Onset in most cases after age 50; mean age at onset ≈ 60 years

  8. Patient Groups: Speech-related Symptoms of PD
     - Hypophonia (soft speech)
     - Monotonic speech: speech tends to be soft, hoarse, and monotonous
     - Festinating speech: excessively rapid, soft, poorly intelligible speech
     - Drooling: most likely caused by weak, infrequent swallowing
     - Dysphagia (impaired ability to swallow)
     - Dysarthria

  9. Patient Groups: Dysarthria
     - A speech disorder affecting the coordination of the muscles of the vocal tract, face, larynx, and respiratory system (dysarthrophonia)
     - Mostly results from a neurological injury, such as a stroke or another kind of brain injury

  10. Patient Groups: Dysarthria

  11. Outline

  12. Speech Technology: Automatic Speech Processing Methods
     - Word and phoneme recognition
     - Acoustic speaker modeling
     - Prosodic analysis
     - Model of the excitation signal
     - Evaluation measures

  13. Speech Technology: Word and Phoneme Recognition
     [Figure: features → classification → word chain (words with the highest probability)]
     - Off-the-shelf technology
     - Semi-continuous HMMs
       - Easier to adapt with small amounts of data
       - Results comparable to those of continuous models
     - Features: 11 Mel-frequency cepstral coefficients + energy + their first derivatives
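A minimal sketch of the two ingredients named on slide 13, assuming details the slides do not give: the first derivatives are computed by linear regression over a hypothetical window of ±2 frames, and the shared codebook uses diagonal covariances. In a semi-continuous HMM every state draws on the same codebook of Gaussians and differs only in its mixture weights, which is one reason such models can be adapted with little data. All function names are illustrative, not the project's code.

```python
import math
from typing import List, Sequence, Tuple


def delta_features(frames: List[List[float]], window: int = 2) -> List[List[float]]:
    """First derivative of every coefficient by linear regression over +/- `window` frames.

    d_t = sum_{n=1..W} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..W} n^2);
    frame indices are clamped at the start and end of the utterance.
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(num_frames):
        d = [0.0] * len(frames[t])
        for n in range(1, window + 1):
            nxt = frames[min(t + n, num_frames - 1)]
            prv = frames[max(t - n, 0)]
            for i in range(len(d)):
                d[i] += n * (nxt[i] - prv[i])
        out.append([v / denom for v in d])
    return out


def stack_features(static: List[List[float]]) -> List[List[float]]:
    """Static features (e.g. 11 MFCCs + energy = 12) followed by their 12 deltas."""
    return [s + d for s, d in zip(static, delta_features(static))]


def diag_gauss_logpdf(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def schmm_log_emission(x: Sequence[float],
                       codebook: List[Tuple[List[float], List[float]]],
                       state_weights: Sequence[float]) -> float:
    """log b_j(x) = log sum_k c_jk * N(x; mu_k, var_k).

    `codebook` holds (mean, variance) pairs shared by ALL states; only the
    weights c_jk in `state_weights` belong to state j. Log-sum-exp avoids underflow.
    """
    terms = [math.log(c) + diag_gauss_logpdf(x, m, v)
             for c, (m, v) in zip(state_weights, codebook) if c > 0.0]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

Adapting such a model to a new speaker can re-estimate only the shared codebook and the weights instead of a separate Gaussian mixture per state.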

  14. Speech Technology: Acoustic Speaker Modeling
     - Idea:
       - The acoustic space of speakers can be modeled
       - This space represents the multidimensional voice characteristics of a speaker
       - The degree of pathology varies within the acoustic space
       - Goal: find the characteristics of the degree of the speech disorder
     - Approach:
       - Model the acoustics with Gaussian Mixture Models (GMMs)
       - Train a Universal Background Model (UBM) with normal speakers
       - Train a GMM for each pathological speaker and transform it into a vector
       - Perform a classification or regression (depending on the task)

  15. Speech Technology: Acoustic Speaker Modeling
     [Figure: Gaussian densities of the UBM and of a speaker model over feature dimensions 1 and 2, with features of healthy speakers and features of a pathological speaker]
     - Variations between speakers with different degrees of pathology
     - Can be modeled by adapting the UBM to a speaker-specific GMM

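  16. Speech Technology: Acoustic Speaker Modeling (Concatenation)
     [Figure: six Gaussian densities of a speaker model, with means m_1, …, m_6 and covariance matrices K_1, …, K_6, over feature dimensions 1 and 2]
     - The Gaussian densities (i = 1, …, N) of the speaker model are defined by their mean vectors $\mathbf{m}_i$ and covariance matrices $\mathbf{K}_i$
     - Concatenating the elements of all densities gives
       $\mathbf{m}_s = \begin{pmatrix}\mathbf{m}_1 \\ \vdots \\ \mathbf{m}_N\end{pmatrix}, \qquad \mathbf{K}_s = \begin{pmatrix}\mathbf{K}_1 \\ \vdots \\ \mathbf{K}_N\end{pmatrix}$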
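The adaptation and concatenation steps of slides 14 to 16 can be sketched as follows. The slides do not name the adaptation method; this sketch assumes relevance-MAP adaptation of the UBM means only (weights and variances stay those of the UBM), diagonal covariances, and a hypothetical relevance factor of 16. The function names are illustrative.

```python
import math
from typing import List, Sequence


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (a - m) ** 2 / v
                      for a, m, v in zip(x, mean, var))


def map_adapt_means(frames: List[List[float]], ubm_weights: List[float],
                    ubm_means: List[List[float]], ubm_vars: List[List[float]],
                    relevance: float = 16.0) -> List[List[float]]:
    """Relevance-MAP adaptation of the UBM means to one speaker (assumed method)."""
    num_comp, dim = len(ubm_weights), len(ubm_means[0])
    counts = [0.0] * num_comp                        # soft frame counts n_k
    sums = [[0.0] * dim for _ in range(num_comp)]   # posterior-weighted feature sums
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(ubm_weights, ubm_means, ubm_vars)]
        top = max(logs)
        post = [math.exp(lp - top) for lp in logs]
        total = sum(post)
        for k in range(num_comp):
            g = post[k] / total                      # posterior of component k
            counts[k] += g
            for d in range(dim):
                sums[k][d] += g * x[d]
    adapted = []
    for k in range(num_comp):
        alpha = counts[k] / (counts[k] + relevance)  # more data -> closer to the speaker
        if counts[k] > 0.0:
            expected = [s / counts[k] for s in sums[k]]
        else:
            expected = list(ubm_means[k])            # unseen component keeps the UBM mean
        adapted.append([alpha * e + (1.0 - alpha) * m
                        for e, m in zip(expected, ubm_means[k])])
    return adapted


def supervector(means: List[List[float]]) -> List[float]:
    """Concatenate the mean vectors m_1, ..., m_N into one supervector m_s."""
    return [value for mean in means for value in mean]
```

Because every speaker model is derived from the same UBM, component k means the same region of the acoustic space for all speakers, so the supervectors of different speakers can be compared element by element.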

  17. Speech Technology: Acoustic Speaker Modeling
     - Discriminate between different types of pathology:
       - Create supervectors (SVs) of the training speakers
       - Train a classifier on the labeled SVs
       - Create the SV of a test speaker
       - Classify the SV of the test speaker
     [Figure: points correspond to supervectors; groups of speakers with pathology type 1 and with pathology type 2]
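Slide 17 only says that a classifier is trained on the labeled supervectors. As a simple stand-in, the sketch below uses a nearest-centroid classifier; it is not necessarily the classifier used in the project, and the class labels are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence


def train_centroids(svs: List[List[float]], labels: List[Hashable]) -> Dict[Hashable, List[float]]:
    """Average the supervectors of each pathology type (one centroid per class)."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for sv, label in zip(svs, labels):
        if label not in sums:
            sums[label] = [0.0] * len(sv)
            counts[label] = 0
        for i, value in enumerate(sv):
            sums[label][i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in s] for label, s in sums.items()}


def classify(sv: Sequence[float], centroids: Dict[Hashable, List[float]]) -> Hashable:
    """Assign the test speaker's supervector to the class with the nearest centroid."""
    return min(centroids, key=lambda label: math.dist(sv, centroids[label]))
```

Any classifier that accepts fixed-length vectors can replace the centroid rule; this is the practical benefit of turning each speaker model into a single supervector.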

  18. Speech Technology: Acoustic Speaker Modeling
     - Estimate the degree of pathology:
       - Train a regression (linear regression or support vector regression, SVR)
       - Create the SV of a test speaker
       - Estimate the degree of pathology
     [Figure: degree of pathology over the supervector space]

  19. Speech Technology: Prosodic Analysis
     - Prosody: rhythm, intonation, stress, and related attributes
     - Prosodic features are computed on the word level, across several words, across syllable nuclei, or across voiced segments
       - Computation across several words requires automatic speech recognition (ASR)
       - Computation across syllable nuclei requires syllable detection
     - Local features:
       - Pauses before/after segments, signal energy, segment duration, and F0
       - Mean, maximum, minimum, and standard deviation of these values
     - Global features: jitter, shimmer, voiced/unvoiced characteristics
     - ≈ 100 to 200 features per test utterance
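A sketch of the local functionals and of two global features from slide 19. The jitter and shimmer definitions here are the common "local" variants (mean absolute difference of consecutive glottal periods or peak amplitudes, divided by their mean); whether the project uses exactly these definitions is an assumption, and extracting periods, amplitudes, or F0 from the signal is not shown.

```python
import statistics
from typing import Dict, Sequence


def functionals(values: Sequence[float]) -> Dict[str, float]:
    """Mean, maximum, minimum, and (population) standard deviation of a local feature."""
    return {
        "mean": statistics.fmean(values),
        "max": max(values),
        "min": min(values),
        "std": statistics.pstdev(values),
    }


def local_perturbation(seq: Sequence[float]) -> float:
    """Mean absolute difference of consecutive values divided by the mean value."""
    diffs = [abs(b - a) for a, b in zip(seq, seq[1:])]
    return statistics.fmean(diffs) / statistics.fmean(seq)


def local_jitter(periods: Sequence[float]) -> float:
    """Cycle-to-cycle variation of the glottal periods (assumed definition)."""
    return local_perturbation(periods)


def local_shimmer(peak_amplitudes: Sequence[float]) -> float:
    """Cycle-to-cycle variation of the peak amplitudes (assumed definition)."""
    return local_perturbation(peak_amplitudes)
```

Applying `functionals` to each local feature (e.g. the F0 values of one word) and concatenating the results is one way to arrive at a fixed-length prosodic feature vector per utterance.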

  20. Speech Technology: Two-Mass Model of the Vocal Folds

  21. Speech Technology: Two-Mass Model of the Vocal Folds

  22. Speech Technology: Evaluation
     - Word accuracy (WA) and word correctness (WC)
     - Calculated features:
       - Features of the acoustic speaker models
       - Features of the prosodic analysis
       - Features of the two-mass model
     - Correlation (Pearson and Spearman) of the calculated features, or of WA and WC, with the ratings of human listeners
     - Classification based on the calculated features
     - Interpretation of the relevant features after feature selection
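WA and WC are usually computed from a minimum edit-distance alignment of the recognized word chain against the reference text: WC = (N - S - D) / N and WA = (N - S - D - I) / N, with N reference words and S, D, I substitutions, deletions, and insertions. The sketch below assumes these standard definitions (slide 22 does not spell them out) and adds Pearson and Spearman correlation, using average ranks for ties, for comparing such scores with listener ratings. Function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """(substitutions, deletions, insertions) of a minimum edit-distance alignment."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]  # d[i][j]: distance of ref[:i] vs hyp[:j]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + int(ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    i, j, s, dels, ins = n, m, 0, 0, 0
    while i > 0 or j > 0:  # backtrack to count the error types
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + int(ref[i - 1] != hyp[j - 1]):
            s += int(ref[i - 1] != hyp[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            dels, i = dels + 1, i - 1
        else:
            ins, j = ins + 1, j - 1
    return s, dels, ins


def word_correctness(ref: List[str], hyp: List[str]) -> float:
    s, dels, _ = align_counts(ref, hyp)
    return (len(ref) - s - dels) / len(ref)


def word_accuracy(ref: List[str], hyp: List[str]) -> float:
    """Also penalizes insertions, so it can become negative."""
    s, dels, ins = align_counts(ref, hyp)
    return (len(ref) - s - dels - ins) / len(ref)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(v: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(v)), key=lambda k: v[k])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return pearson(ranks(x), ranks(y))
```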

  23. Outline

  24. Facial Analysis Technology: Analysis of Facial Gestures
     - PD: increasing inability to express emotions with facial gestures (important for communication)
     - Dysarthric speech is often accompanied by other physical impairments:
       - Facial paresis
       - Motor handicaps
     - → Analysis of facial gestures
     - Reduced mobility requires the therapist to come to the patient:
       - High costs
       - Waste of the therapist’s time
     - → Telemedical therapy

  25. Facial Analysis Technology: Anger vs. Joy

  26. Showing Emotions: Reduced Ability to Vary Facial Expressions with PD

  27. Dynamic Facial Expressions for Facial Paresis: Ability to Analyze Sequences of Movements
     [Figure: unstressed look, lip pursing, closing of the eyes, showing the teeth]

  28. Facial Analysis Technology: Grading of Facial Paresis
     - Different grading systems are in use
     - Most prominent: the grading system by House and Brackmann [J. House and D. Brackmann: Facial nerve grading system. Otolaryngology–Head and Neck Surgery, 1985]
     - 6 grades:
       - House I → healthy person
       - House VI → completely paralyzed half of the patient's face
     - Grading is based on (subjective) observations by an expert
     - Problem: objective tracking of the healing process
     - Solution: automatic system for diagnosis support

  29. Dynamic Analysis of Facial Gestures: 3D Camera Principle

  30. Dynamic Analysis of Facial Gestures: Time-of-Flight (ToF) 3D Camera
     - Up to 50 Hz
     - More than 25,000 3D points (176 × 144 pixels)
     - Eye-safe infrared light / no exposure
     - Precision for facial images:
       - At 40 cm: ±1 mm
       - At 80 cm: ±5 mm
       - At 120 cm: ±15 mm
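For reference, the principle of a continuous-wave ToF camera can be sketched as follows: the camera measures the phase shift φ of modulated infrared light, and since the light travels to the face and back, d = c·φ / (4π·f_mod). The slides do not state the camera's modulation scheme or frequency; the continuous-wave assumption, the 30 MHz example, and the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`) are hypothetical. The last function shows how each pixel's measured distance becomes one 3D point, which is how a 176 × 144 sensor yields more than 25,000 points per frame.

```python
import math
from typing import Tuple

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def phase_to_distance(phase_rad: float, f_mod_hz: float) -> float:
    """Continuous-wave ToF: round trip, so d = c * phi / (4 * pi * f_mod)."""
    return SPEED_OF_LIGHT * phase_rad / (4.0 * math.pi * f_mod_hz)


def unambiguous_range(f_mod_hz: float) -> float:
    """The phase wraps at 2*pi, so measured distances repeat every c / (2 * f_mod)."""
    return SPEED_OF_LIGHT / (2.0 * f_mod_hz)


def pixel_to_point(u: float, v: float, radial_dist: float,
                   fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Back-project one pixel with a pinhole model.

    The ToF distance is measured along the viewing ray, so the ray direction
    is normalized before scaling (this is not the same as the depth Z).
    """
    rx, ry = (u - cx) / fx, (v - cy) / fy
    norm = math.sqrt(rx * rx + ry * ry + 1.0)
    s = radial_dist / norm
    return (rx * s, ry * s, s)
```

With the hypothetical 30 MHz modulation, `unambiguous_range(30e6)` is about 5 m, far more than the 40 to 120 cm working distances listed above.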

  31. Dynamic Analysis of Facial Gestures: Principles of Kinect

  32. Dynamic Analysis of Facial Gestures: Principles of Kinect

  33. Dynamic Analysis of Facial Gestures: Prototype
     [Figure labels: illumination, stereo microphones, ToF camera, webcam, control image for the patient]

  34. Framework: Telemedical System

  35. Measuring the Precision
