A System for Speech and 3D Facial Image Acquisition, Modeling and Analysis

Elmar Nöth, Tobias Bocklet, Arnd Gebhard
A System for Speech and 3D Facial Image Acquisition, Modeling and Analysis
Wednesday, 30 May 2012

Outline
- Motivation: Long-term goal of the project
- Patient groups:
  - Parkinson’s disease (PD) patients
  - Stroke patients and patients with facial paresis
- Speech technology
- Facial analysis technology
- Results
- Summary

Motivation: Necessity of Evaluation
- Diagnosis
  - How intelligible is the patient? (holistic impression)
  - How strongly does the patient nasalize? (distinct aspect)
- Therapy control
  - Has the situation of the patient improved during therapy?
- Comparison of therapy methods
  - Which therapy method leads to the best results for a group of patients?
- Screening
  - Is the quality of a child’s speech appropriate for its age?
- Computer-assisted therapy
  - Did the patient perform the exercise correctly?

Motivation: Long-term Goal of the Project
- Provide a telemedical rehabilitation unit for clinical/home use
- Support speech analysis and analysis of facial gestures and … (gait, cognitive abilities → open, flexible platform)
- Patient groups:
  - Parkinson’s disease (PD) patients
  - Stroke patients and patients with facial paresis
- Instruct the patient what to do
- Evaluate the exercises
- Compare with previous sessions
- Summarize exercises for therapist

Patient Groups: Parkinson’s Disease
- Degenerative disorder of the central nervous system
- Death of dopamine-containing cells in the substantia nigra
- Cause of cell death is unknown
- Second most common neurodegenerative disorder (after Alzheimer's disease)
- Prevalence ≈ 0.3% (whole population)
- More common in the elderly: 1% of those over 60 years, 4% of those over 80 years
- Incidence of PD ≈ 8 - 18 per 100,000 people
- Onset in most cases after 50 years of age, mean onset ≈ 60 years

Patient Groups: Speech-related Symptoms of PD
- Hypophonia (soft speech)
- Monotonic speech: speech quality tends to be soft, hoarse, and monotonous
- Festinating speech: excessively rapid, soft, poorly intelligible speech
- Drooling: most likely caused by a weak, infrequent swallow
- Dysphagia (impaired ability to swallow)
- Dysarthria

Patient Groups: Dysarthria
- A speech disorder affecting the coordination of muscles in the vocal tract, face, larynx, and respiratory system (dysarthrophonia)
- Mostly results from a neurological injury, such as a stroke or other kind of brain injury

Speech Technology
- Automatic speech processing methods
  - Word and phoneme recognition
  - Acoustic speaker modeling
  - Prosodic analysis
  - Model of excitation signal
- Evaluation measures

Speech Technology: Word and Phoneme Recognition
[Figure: features → classification → word chain (words with highest probability)]
- Off-the-shelf technology
- Semi-continuous HMMs
  - Easier to adapt with small amounts of data
  - Comparable results with continuous models
- 11 Mel cepstrum coefficients + energy + first derivatives
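
The feature vector on this slide (11 Mel cepstrum coefficients plus energy, each with a first derivative) would have 24 dimensions per frame. The sketch below shows one common way to append first derivatives to static features. The slides do not say how the derivative is computed, so the regression formula and the window `width=2` are assumptions, and `add_first_derivative` is a hypothetical helper name.

```python
import numpy as np

def add_first_derivative(static, width=2):
    """Append regression-based first derivatives (deltas) to static features.

    static: array of shape (frames, dims), e.g. 11 Mel cepstrum
            coefficients + energy = 12 dims per frame.
    width:  neighbouring frames used on each side (assumed value).
    """
    static = np.asarray(static, dtype=float)
    n_frames = static.shape[0]
    # Repeat the edge frames so every frame has `width` neighbours on both sides.
    padded = np.pad(static, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, width + 1))
    delta = np.zeros_like(static)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + n_frames]    # frame t + k
        behind = padded[width - k : width - k + n_frames]   # frame t - k
        delta += k * (ahead - behind)
    delta /= denom
    # Static and delta parts side by side: 12 + 12 = 24 dims in the example.
    return np.hstack([static, delta])
```

For a 12-dimensional static input the result has 24 columns; on frames away from the edges, a linearly increasing feature yields its slope as the derivative.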

Speech Technology: Acoustic Speaker Modeling
- Idea:
  - Acoustic space of speakers can be modeled
  - Space represents the multidimensional characteristics of the voice of a speaker
  - Degree of pathology varies in acoustic space
  - Find characteristics of degree of speech disorder
- Approach:
  - Acoustics modeled by Gaussian Mixture Models (GMMs)
  - Train Universal Background Model (UBM) with normal speakers
  - Train GMMs of pathological speakers and transform each into a vector
  - Perform a classification/regression (depends on the task)

Speech Technology: Acoustic Speaker Modeling
[Figure: features of healthy speakers and features of a pathological speaker plotted over feature dimension 1 and feature dimension 2, with a Gaussian density of the UBM and a Gaussian density of the speaker model]
- Variations of speakers with different degrees of pathology
- Can be modeled by adaptation from UBM to GMM

Speech Technology: Acoustic Speaker Modeling
[Figure: six Gaussian densities (m_1, K_1), ..., (m_6, K_6) plotted over feature dimension 1 and feature dimension 2]
- Gaussian densities (i = 1, ..., N) of the speaker model are defined by mean values (m_i) and covariance matrices (K_i)
- Concatenation of elements of densities:
  - m_s = (m_1, m_2, m_3, m_4, m_5, m_6)
  - K_s = (K_1, K_2, K_3, K_4, K_5, K_6)
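
The adaptation from UBM to speaker GMM and the concatenation into a supervector can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes diagonal covariances, relevance-MAP adaptation of the means only, and a relevance factor of 16, none of which is stated on the slides. Only the mean part m_s is built; the covariance part K_s would be stacked the same way. The function names are hypothetical.

```python
import numpy as np

def map_adapt_means(ubm_weights, ubm_means, ubm_vars, frames, relevance=16.0):
    """Adapt the UBM means to one speaker's frames (means-only relevance MAP).

    ubm_weights: (N,), ubm_means and ubm_vars: (N, dims), frames: (T, dims).
    """
    # Log density of every frame under every UBM component, shape (T, N).
    diff = frames[:, None, :] - ubm_means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / ubm_vars, axis=2)
                        + np.sum(np.log(2.0 * np.pi * ubm_vars), axis=1))
    log_post = np.log(ubm_weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)           # component posteriors

    counts = post.sum(axis=0)                         # soft frame count per density
    data_means = (post.T @ frames) / np.maximum(counts, 1e-10)[:, None]
    alpha = (counts / (counts + relevance))[:, None]  # data vs. prior weighting
    return alpha * data_means + (1.0 - alpha) * ubm_means

def supervector(means):
    """Concatenate the adapted means m_1, ..., m_N into one vector m_s."""
    return np.asarray(means).reshape(-1)
```

Densities that see little speaker data stay close to the UBM, so every speaker yields a supervector of the same length regardless of how much speech was recorded.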

Speech Technology: Acoustic Speaker Modeling
- Discriminate between different types of pathology
  - Create SVs of speakers
  - Train a classifier on labeled SVs
  - Create SV of test speaker
  - Classify SV of test speaker
[Figure: points correspond to supervectors (SVs); one cluster of speakers with pathology type 1, one cluster of speakers with pathology type 2]

Speech Technology: Acoustic Speaker Modeling
- Estimate degree of pathology
  - Train a regression (linear/SVR)
  - Create SV for a test speaker
  - Estimate degree of pathology
[Figure: degree of pathology plotted over supervector space]
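
The regression step can be illustrated with a small linear model on supervectors. The slide names linear regression or SVR; the sketch below swaps in ridge-regularised linear regression solved in its dual form, which is an assumption chosen because supervectors are usually much longer than the number of training speakers. `fit_ridge`, `predict`, and the `ridge` value are hypothetical.

```python
import numpy as np

def fit_ridge(supervectors, scores, ridge=1e-3):
    """Fit a ridge-regularised linear map from supervectors to a pathology score.

    supervectors: (speakers, sv_dims), scores: (speakers,), e.g. expert ratings.
    """
    X = np.asarray(supervectors, dtype=float)
    y = np.asarray(scores, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Dual form: solve a (speakers x speakers) system instead of (dims x dims).
    gram = Xc @ Xc.T
    dual = np.linalg.solve(gram + ridge * np.eye(len(y)), yc)
    weights = Xc.T @ dual
    bias = y_mean - x_mean @ weights
    return weights, bias

def predict(model, supervector):
    """Estimate the degree of pathology for one or more test supervectors."""
    weights, bias = model
    return np.asarray(supervector, dtype=float) @ weights + bias
```

In practice the estimate for a test speaker would be compared with the expert rating of that speaker, with the test speaker excluded from training.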

Speech Technology: Prosodic Analysis
- Prosody: rhythm, intonation, stress, and related attributes
- Computation of prosodic features on word level, across several words, across syllable nuclei, or across voiced segments
  - Computation across several words requires ASR
  - Computation across syllable nuclei requires syllable detection
- Local features:
  - Pauses before/after segments, signal energy, segment duration, and F0
  - Calculation of mean, max., min., and std. dev.
- Global features: jitter, shimmer, voiced/unvoiced characteristics
- ≈ 100-200 features per test utterance
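
A few of the listed features can be computed as below. This is a sketch under assumptions: the slides do not give formulas, so the widely used "local" definitions of jitter and shimmer are taken (mean absolute difference of consecutive periods or peak amplitudes, divided by their mean), and unvoiced frames are assumed to be marked with F0 = 0. All function names are hypothetical.

```python
import numpy as np

def f0_statistics(f0_hz):
    """Mean, max, min and standard deviation of F0 over voiced frames (F0 > 0)."""
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0[f0 > 0]
    return {"mean": voiced.mean(), "max": voiced.max(),
            "min": voiced.min(), "std": voiced.std()}

def local_jitter(periods_s):
    """Cycle-to-cycle variation of the pitch period, relative to the mean period."""
    periods = np.asarray(periods_s, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / periods.mean()

def local_shimmer(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitude, relative to the mean amplitude."""
    amps = np.asarray(peak_amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amps))) / amps.mean()
```

The same four statistics would be computed for energy and duration per segment, which is how a feature count in the range of 100 to 200 per utterance can arise.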

Speech Technology Two-Mass Model of the Vocal Folds

Speech Technology: Evaluation
- Word accuracy (WA) and word correctness (WC)
- Calculated features
  - Features of acoustic speaker models
  - Features of prosodic analysis
  - Features of 2-mass model
- Correlation (Pearson & Spearman) of calculated features or WA/WC with human listener ratings
- Classification based on calculated features
- Interpretation of relevant features after feature selection
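
Word accuracy and word correctness are commonly defined from a minimum-edit-distance alignment of the recognized word chain with the reference text: WC = (N - S - D) / N and WA = (N - S - D - I) / N, where N is the number of reference words and S, D, I are substitutions, deletions, and insertions. The slides do not spell the formulas out, so the sketch below assumes these standard definitions; `align_counts` and `word_scores` are hypothetical names.

```python
def align_counts(reference, hypothesis):
    """Return (substitutions, deletions, insertions) of a minimum-edit alignment."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (errors, subs, dels, ins) for reference[:i] vs. hypothesis[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)          # only deletions
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)          # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, ins = cost[i - 1][j - 1]
            if reference[i - 1] == hypothesis[j - 1]:
                diag = (e, s, d, ins)              # match, no error
            else:
                diag = (e + 1, s + 1, d, ins)      # substitution
            e, s, d, ins = cost[i - 1][j]
            delete = (e + 1, s, d + 1, ins)        # reference word missing
            e, s, d, ins = cost[i][j - 1]
            insert = (e + 1, s, d, ins + 1)        # extra recognized word
            cost[i][j] = min(diag, delete, insert) # fewest total errors wins
    _, subs, dels, inss = cost[n][m]
    return subs, dels, inss

def word_scores(reference, hypothesis):
    """Return (word accuracy, word correctness) in percent."""
    n = len(reference)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    subs, dels, inss = align_counts(reference, hypothesis)
    wc = 100.0 * (n - subs - dels) / n
    wa = 100.0 * (n - subs - dels - inss) / n
    return wa, wc
```

Because insertions are counted only in WA, WA can drop below WC and can even become negative for heavily distorted speech, while WC stays between 0 and 100.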

Facial Analysis Technology: Analysis of Facial Gestures
- PD: Increasing inability to express emotions with facial gestures (important for communication)
- Dysarthric speech often accompanied by other physical impairments
  - Facial paresis
  - Motor handicaps
- Analysis of facial gestures
- Reduced mobility requires therapist to come to patient
  - High costs
  - Waste of therapist’s time
- Telemedical therapy

Facial Analysis Technology Anger vs. Joy

Reduced Ability to Vary Facial Expressions with PD: Showing Emotions

Dynamic Facial Expressions for Facial Paresis: Ability to Analyze Sequence of Movements
[Figure: image sequence showing unstressed look, lip pursing, closing of eyes, showing the teeth]

Facial Analysis Technology: Grading of Facial Paresis
- Different grading systems are used
- Most prominent: grading system by House & Brackmann [J. House and D. Brackmann: Facial nerve grading system. Otolaryngology-Head and Neck Surgery, 1985]
- 6 grades:
  - House I → healthy person
  - House VI → completely paralyzed half of the patient's face
- Grading is based on (subjective) observations by an expert
- Problem: objective tracking of healing processes
- Solution: automatic system for diagnosis support

Dynamic Analysis of Facial Gestures 3D Camera: Principle

Dynamic Analysis of Facial Gestures: Time-of-Flight (ToF) 3D Camera
- Up to 50 Hz
- More than 25k 3D points (176 × 144 pixels)
- Eye-safe infrared light / no exposure
- Precision for facial images
  - 40 cm: ±1 mm
  - 80 cm: ±5 mm
  - 120 cm: ±15 mm
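
The numbers on this slide can be checked with a little arithmetic, and the measuring principle can be sketched for one common sensor type. The slides do not say which ToF variant is used; the code assumes a continuous-wave sensor, where distance follows from the phase shift of modulated light as d = c · φ / (4π · f). The 20 MHz modulation frequency in the usage note is a hypothetical value.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def phase_to_distance(phase_rad, modulation_hz):
    """Distance for a measured phase shift (continuous-wave ToF, assumed principle)."""
    # The light travels to the face and back, hence 4*pi instead of 2*pi.
    return SPEED_OF_LIGHT * phase_rad / (4.0 * math.pi * modulation_hz)

def unambiguous_range(modulation_hz):
    """Largest distance before the phase wraps around at 2*pi."""
    return SPEED_OF_LIGHT / (2.0 * modulation_hz)

# Figures taken from the slide: 176 x 144 pixels at up to 50 Hz.
points_per_frame = 176 * 144                 # 25,344, i.e. "more than 25k"
points_per_second = points_per_frame * 50    # upper bound at the full frame rate
```

With the assumed 20 MHz modulation, `unambiguous_range(20e6)` is about 7.5 m, comfortably beyond the 40 to 120 cm working distances listed for facial images.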

Dynamic Analysis of Facial Gestures Principles of Kinect

Dynamic Analysis of Facial Gestures: Prototype
[Figure: prototype with labeled components: illumination, stereo microphones, ToF camera, webcam, control image for the patient]

Framework Telemedical System

Measuring the Precision
