Elmar Nöth, Tobias Bocklet, Arnd Gebhard: A System for Speech and 3D Facial Image Acquisition, Modeling and Analysis. Wednesday, 30 May 2012
Outline Motivation: Long-term goal of the project Patient groups: Parkinson’s disease (PD) patients Stroke patients and patients with facial paresis Speech technology Facial analysis technology Results Summary
Motivation Necessity of Evaluation Diagnosis: How intelligible is the patient? (holistic impression) How strongly does the patient nasalize? (distinct aspect) Therapy control: Has the situation of the patient improved during therapy? Comparison of therapy methods: Which therapy method leads to the best results for a group of patients? Screening: Is the quality of a child’s speech appropriate for its age? Computer-assisted therapy: Did the patient perform the exercise correctly?
Motivation Long-term Goal of the Project Provide a telemedical rehabilitation unit for clinical/home use Support speech analysis, analysis of facial gestures, and … (gait, cognitive abilities; open, flexible platform) Patient groups: Parkinson’s disease (PD) patients; stroke patients and patients with facial paresis Instruct the patient what to do Evaluate the exercises Compare with previous sessions Summarize exercises for therapist
Patient Groups Parkinson’s Disease Degenerative disorder of the central nervous system Death of dopamine-containing cells in the substantia nigra Cause of cell-death is unknown Second most common neurodegenerative disorder (after Alzheimer's disease) Prevalence ≈ 0.3% (whole population) More common in the elderly: 1% of > 60 years, 4% of > 80 years Incidence of PD ≈ 8 - 18 per 100,000 people Onset in most cases > 50 years, mean onset ≈ 60 years
Patient Groups Speech-related Symptoms of PD Hypophonia (soft speech) Monotonic speech: Speech quality tends to be soft, hoarse, and monotonous Festinating speech: excessively rapid, soft, poorly-intelligible speech Drooling: most likely caused by a weak, infrequent swallow Dysphagia (impaired ability to swallow) Dysarthria
Patient Groups Dysarthria A speech disorder affecting the coordination of muscles in the vocal tract, face, larynx, and respiratory system (dysarthrophonia) Mostly results from a neurological injury, such as a stroke or other kind of brain injury
Speech Technology Automatic speech processing methods Word and phoneme recognition Acoustic speaker modeling Prosodic analysis Model of excitation signal Evaluation measures
Speech Technology Word and Phoneme Recognition Pipeline: features → classification → word chain (words with highest probability) Off-the-shelf technology Semi-continuous HMMs: easier to adapt with small amounts of data; comparable results with continuous models Features: 11 Mel cepstrum coefficients + energy + first derivatives
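The feature layout named on this slide (11 Mel cepstrum coefficients plus energy, each with a first derivative) gives 24 values per frame. A minimal sketch of appending the derivatives is shown below. The slide does not say how the derivative is computed, so the regression window (`width`) and the edge padding are assumptions, and `add_deltas` is a hypothetical helper name.

```python
from typing import List


def add_deltas(frames: List[List[float]], width: int = 2) -> List[List[float]]:
    """Append first-derivative (delta) features to every frame.

    Assumption: deltas use the common regression formula over +/- `width`
    frames; the first/last frame is repeated at the edges.
    """
    n = len(frames)
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = []
    for t in range(n):
        delta = []
        for d in range(len(frames[t])):
            num = 0.0
            for k in range(1, width + 1):
                nxt = frames[min(t + k, n - 1)][d]  # clamp at last frame
                prv = frames[max(t - k, 0)][d]      # clamp at first frame
                num += k * (nxt - prv)
            delta.append(num / denom)
        out.append(frames[t] + delta)  # static features followed by deltas
    return out


# Toy input: 10 frames of 12 static features (11 cepstra + energy),
# each feature rising by 1.0 per frame.
static = [[float(t + d) for d in range(12)] for t in range(10)]
with_deltas = add_deltas(static)
```

For the toy ramp, interior frames get a delta of 1.0 per feature, matching the slope of the input.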
Speech Technology Acoustic Speaker Modeling Idea: Acoustic space of speakers can be modeled Space represents the multidimensional characteristics of the voice of a speaker Degree of pathology varies in acoustic space Find characteristics of the degree of speech disorder Approach: Acoustics modeled by Gaussian Mixture Models (GMMs) Train Universal Background Model (UBM) with normal speakers Train GMM of each pathological speaker and transform it into a vector Perform a classification/regression (depends on the task)
Speech Technology Acoustic Speaker Modeling Variations of speakers with different degrees of pathology can be modeled by adaptation from UBM to GMM [Figure: feature dimension 1 vs. feature dimension 2, showing features of healthy speakers with a Gaussian density of the UBM and features of a pathological speaker with a Gaussian density of the speaker model]
Speech Technology Acoustic Speaker Modeling Gaussian densities (i = 1, ..., N) of the speaker model are defined by mean values ($m_i$) and covariance matrices ($K_i$) Concatenation of the elements of the densities: $m_s = (m_1, m_2, \dots, m_6)^\top$, $K_s = (K_1, K_2, \dots, K_6)^\top$ [Figure: six Gaussian densities with means $m_1, \dots, m_6$ and covariances $K_1, \dots, K_6$ over feature dimension 1 and feature dimension 2]
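The step from an adapted speaker model to a supervector can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses diagonal covariances, adapts only the means with relevance MAP (the `relevance` value is an assumption), and concatenates only the means, so the covariance supervector is not built. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float],
                   var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def map_adapt_means(frames: List[Vector], weights: Vector,
                    means: List[Vector], variances: List[Vector],
                    relevance: float = 16.0) -> List[Vector]:
    """Shift each UBM mean towards the speaker's data (means-only MAP)."""
    n_comp, dim = len(weights), len(means[0])
    occupancy = [0.0] * n_comp
    first_order = [[0.0] * dim for _ in range(n_comp)]
    for x in frames:
        log_p = [math.log(w) + log_gauss_diag(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        top = max(log_p)  # subtract the maximum for numerical stability
        post = [math.exp(lp - top) for lp in log_p]
        norm = sum(post)
        for i in range(n_comp):
            g = post[i] / norm  # posterior of component i for this frame
            occupancy[i] += g
            for d in range(dim):
                first_order[i][d] += g * x[d]
    adapted = []
    for i in range(n_comp):
        if occupancy[i] > 0.0:
            data_mean = [s / occupancy[i] for s in first_order[i]]
        else:
            data_mean = list(means[i])  # component saw no data
        alpha = occupancy[i] / (occupancy[i] + relevance)
        adapted.append([alpha * dm + (1.0 - alpha) * m
                        for dm, m in zip(data_mean, means[i])])
    return adapted


def supervector(adapted_means: List[Vector]) -> Vector:
    """Concatenate the component means m_1, ..., m_N into one vector."""
    return [value for mean in adapted_means for value in mean]


# Toy UBM with two components and 16 identical speaker frames.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [10.0, 10.0]]
ubm_vars = [[1.0, 1.0], [1.0, 1.0]]
speaker_frames = [[2.0, 2.0] for _ in range(16)]
sv = supervector(map_adapt_means(speaker_frames, ubm_weights,
                                 ubm_means, ubm_vars))
```

In the toy example the first component absorbs all frames and moves halfway towards the data, while the second component stays essentially at its UBM mean, so the supervector has N × dim = 4 entries.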
Speech Technology Acoustic Speaker Modeling Discriminate between different types of pathology: Create SVs of speakers; train some classifier on labeled SVs; create SV of test speaker; classify SV of test speaker [Figure: points correspond to supervectors (SVs) of speakers with pathology type 1 and speakers with pathology type 2]
Speech Technology Acoustic Speaker Modeling Estimate degree of pathology: Train a regression (linear/SVR); create SV for a test speaker; estimate degree of pathology [Figure: degree of pathology plotted over the supervector space]
Speech Technology Prosodic Analysis Prosody: rhythm, intonation, stress, and related attributes Computation of prosodic features on word level, across several words or across syllable nuclei or across voiced segments Computation across several words requires ASR Computation across syllable nuclei requires syllable detection Local features: Pauses before/after segments, signal energy, segment duration, and F0 Calculation of mean, max., min., and std. dev. Global features: jitter, shimmer, voiced/unvoiced characteristics ≈ 100-200 features per test utterance
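Two of the feature types listed above can be illustrated with a short sketch: the per-segment statistics (mean, max., min., std. dev.) and the global perturbation measures jitter and shimmer. The slide gives no formulas, so the sketch assumes one common "local" definition (mean absolute difference of consecutive values divided by their mean), and the input values are made up for illustration.

```python
import statistics
from typing import Dict, Sequence


def relative_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference of consecutive values over their mean.

    Applied to pitch periods this is a local jitter measure, applied to
    peak amplitudes a local shimmer measure (assumed definition).
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return statistics.fmean(diffs) / statistics.fmean(values)


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Mean, max., min., and (population) std. dev. of a local feature."""
    return {
        "mean": statistics.fmean(values),
        "max": max(values),
        "min": min(values),
        "std": statistics.pstdev(values),
    }


# Illustrative values only: pitch periods in seconds, peak amplitudes,
# and F0 values in Hz for one voiced segment.
periods = [0.010, 0.011, 0.010, 0.011]
amplitudes = [0.80, 0.76, 0.82, 0.78]
jitter = relative_perturbation(periods)
shimmer = relative_perturbation(amplitudes)
f0_stats = summarize([95.0, 100.0, 105.0])
```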
Speech Technology Two-Mass Model of the Vocal Folds
Speech Technology Evaluation Word accuracy (WA) and word correctness (WC) Calculated features Features of acoustic speaker models Features of prosodic analysis Features of 2-mass model Correlation (Pearson & Spearman) based on calculated features or WA, WC with human listener Classification based on calculated features Interpretation of relevant features after feature selection
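Word correctness and word accuracy are usually computed from a minimum-edit-distance alignment between the reference text and the recognized word chain: WC = (N − S − D) / N and WA = (N − S − D − I) / N, with N reference words, S substitutions, D deletions, and I insertions. The sketch below assumes these standard definitions; the function names and the example sentences are hypothetical.

```python
from typing import List, Tuple


def align_counts(reference: List[str],
                 hypothesis: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of one optimal alignment."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j]: minimum edits turning reference[:i] into hypothesis[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = int(reference[i - 1] != hypothesis[j - 1])
            cost[i][j] = min(cost[i - 1][j - 1] + mismatch,
                             cost[i - 1][j] + 1,
                             cost[i][j - 1] + 1)
    # Trace back from the end to count the edit types.
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        mismatch = int(i > 0 and j > 0
                       and reference[i - 1] != hypothesis[j - 1])
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            subs += mismatch
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return subs, dels, ins


def word_correctness_and_accuracy(reference: List[str],
                                  hypothesis: List[str]) -> Tuple[float, float]:
    """Return (WC, WA) as fractions of the number of reference words."""
    subs, dels, ins = align_counts(reference, hypothesis)
    n = len(reference)
    return (n - subs - dels) / n, (n - subs - dels - ins) / n


ref = "the cat sat on the mat".split()
hyp = "the cat sat on on the hat".split()
wc, wa = word_correctness_and_accuracy(ref, hyp)
```

Because insertions are penalized only in WA, WA can become negative for very poor recognition results, while WC stays between 0 and 1.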
Facial Analysis Technology Analysis of Facial Gestures PD: Increasing inability to express emotions with facial gestures (important for communication) Dysarthric speech often accompanied by other physical impairments Facial paresis Motor handicaps Analysis of facial gestures Reduced mobility requires therapist to come to patient High costs Waste of therapist’s time Telemedical therapy
Facial Analysis Technology Anger vs. Joy
Reduced Ability to Vary Facial Expressions with PD Showing Emotions
Dynamic Facial Expressions for Facial Paresis Ability to Analyze Sequence of Movements Unstressed look Lip pursing Closing of eyes Showing the teeth
Facial Analysis Technology Grading of Facial Paresis Different grading systems are used Most prominent: grading system by House & Brackmann [J. House and D. Brackmann: Facial nerve grading system, Otolaryngology–Head and Neck Surgery, 1985] 6 grades: House I → healthy person; House VI → completely paralyzed half of the patient's face Grading is based on (subjective) observations by an expert Problem: objective tracking of the healing process Solution: automatic system for diagnosis support
Dynamic Analysis of Facial Gestures 3D Camera: Principle
Dynamic Analysis of Facial Gestures Time-of-Flight (ToF) 3D Camera Up to 50 Hz More than 25k 3D points (176 × 144 pixels) Eye-safe infrared light / no exposure Precision for facial images: 40 cm: ±1 mm; 80 cm: ±5 mm; 120 cm: ±15 mm
Dynamic Analysis of Facial Gestures Principles of Kinect
Dynamic Analysis of Facial Gestures Prototype Illumination Stereo microphones TOF camera Webcam Control image for the patient
Framework Telemedical System
Measuring the Precision