High Quality Facial Expression Recognition in Video Streams using Shape Related Information only
Laszlo A. Jeni (The University of Tokyo), Daniel Takacs (Realeyes Data Services Ltd), Andras Lorincz (Eotvos Lorand University)
Eötvös Loránd University, Faculty of Informatics
We are grateful to Jason Saragih for providing his CLM code for our studies.
Outline
- Introduction
- Theory
- Datasets
- Experiments
- Discussion
Introduction
Goal: recognize discrete facial emotions in video streams. We use precise Constrained Local Model based face tracking and shape related information for the emotion classification. High quality classification can be achieved.
Outline
- Introduction
- Theory
- Datasets
- Experiments
- Discussion
Overview of the System
Pipeline: Video Stream → Face Detection → 3D Facial Feature Point Registration → Normalization → Normalized 3D Shape → Support Vector Machine → Emotion Label
We register a 66 point 3D constrained local model (CLM) for the face. In 3D the CLM estimates the rigid parameters, so we can remove the rigid transformation. We use either AU0 normalization or personal mean shape normalization to remove the personal variation of the face. Finally, a multi-class SVM classifies the emotions.
Constrained Local Model
Point Distribution Model (PDM): where are the landmarks? A 3D model whose parameters are: scale, projection to 2D, rotation (yaw/pitch/roll), mean shape, non-rigid components (PCA), PCA coefficients, and translation.
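The model equation itself did not survive extraction; a minimal reconstruction in the standard 3D PDM form used in the CLM literature (e.g., Saragih et al.), built from exactly the parameters listed above, would be:

```latex
% 2D position of the i-th landmark under PDM parameters p:
%   s: scale, P: orthographic projection to 2D, R: rotation (yaw/pitch/roll),
%   \bar{x}_i: i-th point of the mean shape, \Phi_i: non-rigid PCA components,
%   q: PCA coefficients, t: 2D translation
x_i(\mathbf{p}) = s \, P \, R \, (\bar{x}_i + \Phi_i q) + t,
\qquad \mathbf{p} = (s, R, t, q)
```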
Constrained Local Model
Local: "local experts" to locate the landmarks (logit regressors).
Constrained: the relative positions of the landmarks are constrained by the PDM.
Optimization problem: l_i ∈ {-1, 1} indicates that the i-th marker is (not) in a correct position.
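The optimization problem on the slide was lost in extraction; a hedged reconstruction of the usual CLM MAP formulation (Saragih et al.), consistent with the l_i notation above:

```latex
% Maximize the posterior probability that all n landmarks are
% correctly placed in image I, under the PDM prior p(\mathbf{p}):
p(\mathbf{p} \mid \{l_i = 1\}_{i=1}^{n}, I)
  \;\propto\; p(\mathbf{p}) \prod_{i=1}^{n} p(l_i = 1 \mid x_i(\mathbf{p}), I)
```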
Constrained Local Model
Positive examples come from an annotated dataset; negative examples come from the neighborhood of the annotated positions. Each local expert yields probability estimates for one patch, from which the markers are found. [Figure: response map of the corner of the eye]
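A minimal sketch of how a logistic ("logit") patch expert could produce such a response map; the raw-pixel features and the learned weights w, b are placeholders, not the authors' implementation:

```python
import numpy as np

def response_map(image, center, w, b, patch=11, search=15):
    """Slide a logistic-regression patch expert over a search window
    around `center` and return P(landmark here) for each position.
    Assumes the window stays inside the image."""
    half = patch // 2
    cy, cx = center
    resp = np.zeros((2 * search + 1, 2 * search + 1))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = cy + dy, cx + dx
            # raw pixel patch as the feature vector (placeholder choice)
            feat = image[y - half:y + half + 1,
                         x - half:x + half + 1].ravel()
            # logit regressor: P(l_i = 1 | patch)
            resp[dy + search, dx + search] = 1.0 / (1.0 + np.exp(-(w @ feat + b)))
    return resp
```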
Normalization
AU0 normalization: the difference between the features of the actual shape and the features of the first (neutral) frame.
Personal mean shape normalization: AU0 normalization is crucial for facial expression recognition; however, it is person dependent and it is not available for a single frame. We assume that we have videos (frame series) of the subject, as in the case of the BU-4DFE, and we can compute the personal mean shape. We found that the mean shape is almost identical to the neutral shape, i.e., to AU0.
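A minimal sketch of the two normalizations, assuming `shapes` is a (frames × features) array of aligned shape vectors for one subject; the function names are ours, not the paper's:

```python
import numpy as np

def au0_normalize(shapes):
    """Subtract the first (neutral, AU0) frame from every frame."""
    return shapes - shapes[0]

def mean_shape_normalize(shapes):
    """Subtract the subject's personal mean shape instead; this needs
    a frame series but no labeled neutral frame."""
    return shapes - shapes.mean(axis=0)
```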
SVM Based Classification
The SVM seeks to minimize a cost function (reconstructed below).
Multi-class classification: decision surfaces are computed for all class pairs; for k classes one has k(k − 1)/2 decision surfaces voting for the decision.
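The cost function on the slide did not survive extraction; we assume the standard soft-margin form:

```latex
% Soft-margin SVM over m training pairs (x_i, y_i), y_i ∈ {-1, 1}:
\min_{w, b, \xi} \; \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0
```

A minimal sketch of the one-vs-one voting scheme with scikit-learn, whose SVC trains exactly k(k − 1)/2 pairwise classifiers and lets them vote; the data here is a toy stand-in for the normalized shape features:

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-in: 7 emotion classes, 66 points x 2 coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(70, 132))
y = np.repeat(np.arange(7), 10)    # emotion labels 0..6

# One-vs-one: 7*6/2 = 21 pairwise decision surfaces vote for the label.
clf = SVC(kernel="linear", decision_function_shape="ovo").fit(X, y)
print(clf.predict(X[:3]))
```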
Outline
- Introduction
- Theory
- Datasets
- Experiments
- Discussion
Datasets
Cohn-Kanade Extended (CK+):
- 2D images of 118 subjects annotated with the seven universal emotions
- Ground truth landmarks
- AU validated emotion labels
BU-4DFE:
- High-resolution 3D video sequences of 101 subjects
- Six prototypic facial expressions
- No ground truth landmarks (they were provided by the CLM)
- Posed expressions
Outline
- Introduction
- Theory
- Datasets
- Experiments
- Discussion
CK+ with original landmarks
We used the CK+ dataset with the original 68 2D landmarks. We:
- calculated the mean shape using Procrustes' method,
- normalized all shapes by minimizing the Procrustes distance between the individual shapes and the mean shape (a sketch follows below),
- compared AU0 normalization with personal mean shape normalization,
- trained a multi-class SVM using the leave-one-subject-out cross validation method.
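A minimal sketch of the alignment step, assuming `shapes` is an (N × 68 × 2) landmark array; this is a generic generalized-Procrustes loop, not the authors' implementation:

```python
import numpy as np

def align(shape, ref):
    """Similarity-align one centered 68x2 shape to the reference by
    the optimal rotation and scale (orthogonal Procrustes via SVD)."""
    shape = shape - shape.mean(axis=0)
    u, s, vt = np.linalg.svd(ref.T @ shape)
    r = (u @ vt).T                      # optimal rotation
    scale = s.sum() / (shape ** 2).sum()  # optimal scale
    return scale * shape @ r

def generalized_procrustes(shapes, iters=5):
    """Iteratively re-estimate the mean shape and re-align all shapes
    to it, minimizing the Procrustes distance to the mean."""
    ref = shapes[0] - shapes[0].mean(axis=0)
    for _ in range(iters):
        shapes = np.stack([align(s, ref) for s in shapes])
        ref = shapes.mean(axis=0)
        ref /= np.linalg.norm(ref)      # fix the scale of the mean
    return shapes, ref
```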
CK+ with original landmarks
Emotions with large distortions, such as disgust, happiness and surprise, gave rise to nearly 100% classification performance. Even for the worst case, performance was 92% (fear). [Figure: AU0 normalization]
Replacing AU0 normalization by the personal mean shape slightly decreases average performance: recognition on the CK+ database drops from 96% to 94.8%. [Figure: personal mean shape normalization]
CLM tracked CK+
We studied the performance of the multi-class SVM using the CLM method on the CK+ dataset. We tracked the facial expressions with the CLM tracker and annotated all image sequences starting from the neutral expression to the peak of the emotion. The 3D CLM estimates the rigid and non-rigid transformations: we removed the rigid ones from the faces and projected the frontal view to 2D.
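A minimal sketch of removing the rigid part of the CLM estimate, assuming the tracker returns scale `s`, rotation `R` (3×3), translation `t`, and the tracked 3D landmarks `X` (66×3); the names and the exact parameterization are illustrative assumptions:

```python
import numpy as np

def frontalize(X, R, s, t):
    """Undo the rigid transform X = s * (X0 @ R.T) + t estimated by
    the CLM, then drop the depth to get the frontal 2D shape."""
    X0 = (X - t) @ R / s    # right-multiplying by R applies R^-1 = R^T
    return X0[:, :2]        # orthographic frontal projection to 2D
```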
CLM tracked CK+
Classification performance is affected by the imprecision of the CLM tracking. Emotions with large distortions can still be recognized in about 90% of the cases, whereas more subtle emotions are sometimes confused with others. [Figure: AU0 normalization]
With the personal mean shape normalization the correct classification percentage rises from 77.57% to 86.82% for the CLM tracked CK+. [Figure: personal mean shape normalization]
Comparison of results on CK+
[Table not recoverable from the extracted text. Legend: T/S – Texture/Shape information]
CLM tracked BU-4DFE (frontal case)
We characterized the BU-4DFE database by using the CLM technique:
- We selected a frame with a neutral expression and an apex frame of the same frame series, and used these frames and all frames between them for the evaluations.
- We applied CLM tracking to the intermediate frames in order, since it is more robust than applying the CLM independently to each frame.
- We removed the rigid transformation after the fit and projected the frontal 3D shapes to 2D.
- We applied a 6-class multi-class SVM (this database does not contain contempt) and evaluated the classifiers by the leave-one-subject-out method (a sketch follows below).
- We compared the normalization using the CLM estimation of the AU0 values with the normalization based on the personal mean shape.
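A minimal sketch of leave-one-subject-out evaluation with scikit-learn; `X`, `y`, and `subject_ids` are toy stand-ins for the normalized shape features, emotion labels, and per-frame subject identifiers:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import SVC

# Toy stand-ins: 6 emotion classes, 10 subjects, 30 frames each.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 132))
y = rng.integers(0, 6, size=300)            # emotion labels
subject_ids = np.repeat(np.arange(10), 30)  # one group per subject

# Each fold holds out every frame of one subject, so the classifier
# is never tested on a person seen during training.
scores = cross_val_score(SVC(decision_function_shape="ovo"), X, y,
                         groups=subject_ids, cv=LeaveOneGroupOut())
print(scores.mean())
```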
CLM tracked BU-4DFE (frontal case)
[Figures: AU0 normalization vs. personal mean shape normalization]
We found an 8% improvement on average in favor of the mean shape method.
CLM tracked BU-4DFE (frontal case)
We executed cross evaluations, using the CK+ as the ground truth, since it seems more precise: the target expression for each sequence is fully FACS coded, the emotion labels have been revised and validated, and CK+ utilizes FACS coding based emotion evaluation, which is the preferred method in the literature considered. We note, however, that both the CK+ and the BU-4DFE facial expressions are posed and not spontaneous. [Figure: Cross Evaluation: CK – BU-4DFE]
Pose invariance on BU-4DFE
Question: the CLM's performance as a function of pose, i.e., pose invariant emotion recognition for situation analysis. We used the BU-4DFE dataset to render 3D faces with the six emotions available in the database (anger, disgust, fear, happiness, sadness, and surprise). We randomly selected 25 subjects and rendered rotated versions of every emotion, covering rotation angles between 0 and 44 degrees of anti-clockwise rotation around the yaw axis.
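A minimal sketch of the yaw-rotation step, assuming the 3D face is given as an (N × 3) vertex array; the angle range matches the slide, while the sign convention for "anti-clockwise" is our assumption:

```python
import numpy as np

def rotate_yaw(vertices, degrees):
    """Rotate 3D vertices around the vertical (yaw) axis."""
    a = np.radians(degrees)
    ry = np.array([[ np.cos(a), 0.0, np.sin(a)],
                   [ 0.0,       1.0, 0.0      ],
                   [-np.sin(a), 0.0, np.cos(a)]])
    return vertices @ ry.T

# Rendered views at 0..44 degrees, as on the slide:
# views = [rotate_yaw(mesh, d) for d in range(0, 45)]
```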