

  1. Towards the subjective-ness in facial expression analysis Jiabei Zeng, Ph.D. August 21, 2019 @ VALSE Webinar

  2. It is subjective for human beings to recognize facial expressions: different individuals understand the same facial expression differently.

  3. The six basic emotions: universal across cultures. Example image: “He was about to fight.” → angry

  4. The six basic emotions: universal across cultures. Example image: “His child had just died.” → sad

  5. Universal ≠ 100% consistent. Elfenbein H. A., Ambady N. On the universality and cultural specificity of emotion recognition: a meta-analysis. Psychological Bulletin, 2002, 128(2): 203.

  6. Humans’ annotations are subjective. How do we make the machines objective? Subjective-ness of humans: the training dataset has annotation bias. Subjective-ness of the machines: the trained system has recognition bias.

  7. Humans’ annotations are subjective. How do we make the machines objective? “兼听则明，偏信则暗” (“Listen to many sides and you will be enlightened; heed only one side and you will be kept in the dark”): learn the classifier from multiple datasets instead of only one. And describe facial expressions in a more objective way: the Facial Action Coding System (FACS).

  8. (The same outline, repeated.) Learn the classifier from multiple datasets instead of only one; describe facial expressions more objectively with FACS.

  9. Challenge. How to evaluate the machine? A consistent performance boost on diverse test datasets. How to train the machine? More data from merging multiple training datasets ≠ better performance of the trained system: training on AffectNet+RAF performs worse than training on RAF alone (A+R &lt; R) and worse than training on AffectNet alone (A+R &lt; A).

  10. Learn from datasets with annotation biases: the Inconsistent Pseudo Annotations to Latent Truth (IPA2LT) framework uses multiple inconsistent annotations plus unlabeled data. Step 1: train machine coders. Dataset A (labelA: happy, disgust, …) trains Model A, and Dataset B (labelB: sad, fear, …) trains Model B.

  11. Learn from datasets with annotation biases. Step 2: predict pseudo labels, so every sample carries an annotation from each coder. Samples in Data A keep their human label and get a prediction from Model B, e.g. (labelA: happy, predB: happy) or (labelA: disgust, predB: angry); samples in Data B keep labelB and get predA, e.g. (labelB: sad, predA: sad) or (labelB: fear, predA: angry); unlabeled samples in Data U get both predA and predB, e.g. (predA: sad, predB: disgust).

  12. Learn from datasets with annotation biases. Step 3: train the Latent Truth Net to estimate the latent truth (LT) behind the inconsistent pseudo annotations, e.g. (labelA: disgust, predB: angry) → LT: disgust; (labelB: fear, predA: angry) → LT: angry; (predA: sad, predB: disgust) → LT: sad.

  13. Learn from datasets with annotation biases. The full Inconsistent Pseudo Annotations to Latent Truth framework, given multiple inconsistent annotations and unlabeled data: Step 1: train machine coders; Step 2: predict pseudo labels; Step 3: train the Latent Truth Net.
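To make the three steps concrete, here is a minimal Python sketch of the pipeline. The helper names (`train_classifier`, `train_ltnet`, `model.predict`) and the data layout are illustrative assumptions, not the authors' code:

```python
# Hypothetical sketch of the IPA2LT three-step pipeline (names are illustrative).

def ipa2lt_pipeline(data_a, data_b, data_u, train_classifier, train_ltnet):
    # Step 1: train one "machine coder" per annotated dataset.
    model_a = train_classifier(data_a)   # absorbs Dataset A's annotation bias
    model_b = train_classifier(data_b)   # absorbs Dataset B's annotation bias

    # Step 2: predict pseudo labels, so every sample carries one
    # annotation per coder (human or machine).
    samples = []
    for img, label_a in data_a:
        samples.append((img, {"A": label_a, "B": model_b.predict(img)}))
    for img, label_b in data_b:
        samples.append((img, {"A": model_a.predict(img), "B": label_b}))
    for img in data_u:                    # the unlabeled pool (Data U)
        samples.append((img, {"A": model_a.predict(img), "B": model_b.predict(img)}))

    # Step 3: train LTNet on the inconsistent annotations to
    # estimate the latent truth behind them.
    return train_ltnet(samples)
```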

  14. Conventional architecture vs. Latent Truth Net. In the conventional architecture, p is the predicted probability of each facial expression and y is the ground-truth label.
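The slide does not write out the objective, but for this conventional setup the standard choice is the cross-entropy between the prediction p and the one-hot label y (assuming C expression classes):

```latex
\mathcal{L}_{\mathrm{CE}} \;=\; -\sum_{c=1}^{C} y_c \log p_c
```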

  15. Conventional architecture vs. Latent Truth Net. LTNet learns from samples with inconsistent annotations: it keeps a latent truth probability, and the predicted annotation for each coder is obtained by applying that coder's transition matrix to the latent truth.
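A minimal PyTorch sketch of that structure, assuming C expression classes and one C×C transition matrix per coder; layer names, sizes, and the initialization are my assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class LTNetHead(nn.Module):
    """Latent-truth head: one transition matrix per coder on top of a backbone."""

    def __init__(self, feat_dim, num_classes, num_coders):
        super().__init__()
        self.latent = nn.Linear(feat_dim, num_classes)
        # Per-coder logits of a C x C transition matrix, initialized near the
        # identity so every coder starts out trusted (assumed initialization).
        init = torch.eye(num_classes).repeat(num_coders, 1, 1) * 5.0
        self.transition_logits = nn.Parameter(init)

    def forward(self, features):
        p_latent = self.latent(features).softmax(dim=-1)     # (B, C) latent truth
        T = self.transition_logits.softmax(dim=-1)           # (K, C, C), rows sum to 1
        # Coder k's predicted annotation: p_k[d] = sum_c p_latent[c] * T[k, c, d]
        p_coders = torch.einsum("bc,kcd->bkd", p_latent, T)  # (B, K, C)
        return p_latent, p_coders
```

Training would then minimize the negative log-likelihood of each coder's observed annotation under its slice of `p_coders`, while `p_latent` is what you keep at test time.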

  16. Experiments on synthetic data. Setup: make 3 copies of the CIFAR-10 training set and randomly add 20%, 30%, and 40% label noise, respectively; evaluate the methods on the clean CIFAR-10 test set. Finding: LTNet can reveal the true labels (figure: LTNet-learned latent truth vs. ground truth).
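A sketch of that corruption protocol; the noise rates and the 3-copy setup come from the slide, everything else (array layout, seeding, flip scheme) is illustrative:

```python
import numpy as np

def corrupt_labels(labels, noise_rate, num_classes=10, seed=0):
    """Return a copy of `labels` with `noise_rate` of entries replaced
    by a random *different* class (synthetic annotation bias)."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip = rng.random(len(labels)) < noise_rate
    # Offset 1..num_classes-1 guarantees the new label differs from the old.
    offsets = rng.integers(1, num_classes, size=flip.sum())
    noisy[flip] = (noisy[flip] + offsets) % num_classes
    return noisy

# Three inconsistent "coders" over the same CIFAR-10 training labels:
# labels = ...  (the clean CIFAR-10 training labels)
# coders = [corrupt_labels(labels, r, seed=i) for i, r in enumerate([0.2, 0.3, 0.4])]
```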

  17. Experiments on synthetic data. Evaluations on synthetic data: LTNet's test accuracy is comparable to that of the CNN trained on clean data (figure: test accuracy of different methods).

  18. Experiments on FER datasets. Training data: Dataset A: AffectNet (training part); Dataset B: RAF (training part). Unlabeled data: the un-annotated part of AffectNet (~700,000 images) and unlabeled facial images downloaded from Bing (~500,000 images). Test data: in-the-lab: CK+, MMI, CFEE, Oulu-CASIA; in-the-wild: SFEW, AffectNet (validation part), RAF (test part).

  19. Experiments on FER datasets. Evaluation on the FER datasets: Table 1 reports the test accuracy of different methods (bold: best, underline: second best).

  20. Experiments on FER datasets. LTNet-learned transition matrices T for the 4 coders: the human coder (AffectNet), the human coder (RAF), the machine coder (AffectNet-trained model), and the machine coder (RAF-trained model). The human coder (RAF) is the most reliable, as labels in RAF are derived from ~40 annotations per image.
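One way to read coder reliability off a learned transition matrix (an illustrative convention, not the paper's metric): a reliable coder's matrix is close to the identity, so its mean diagonal mass is high.

```python
import numpy as np

def coder_reliability(T):
    """Average probability that a coder reports the latent-truth class itself,
    i.e. the mean of the transition matrix's diagonal."""
    return float(np.mean(np.diag(T)))
```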

  21. Experiments on FER datasets. Statistics of the samples: for the majority of samples, the latent truth agrees with the human annotation, the model prediction, or both; for a few samples, the latent truth differs from both the human annotation and the model prediction (cases 2 and 3).

  22. Experiments on FER datasets. Examples in the 5 cases show that the LTNet-learned latent truth is reasonable. Legend: H: human annotation; G: LTNet-learned latent truth; A: predictions by the AffectNet-trained model; R: predictions by the RAF-trained model.

  23. Humans’ annotations are subjective. How do we make the machines objective? Having learned the classifier from multiple datasets (“兼听则明，偏信则暗”), we now turn to describing facial expressions in a more objective way: the Facial Action Coding System (FACS).

  24. From subjective-ness to objective-ness: from emotional categories to the Facial Action Coding System.

  25. Facial Action Coding System (FACS): taxonomizes facial muscle movements by their appearance, using human-defined facial action units (AUs). AU1: inner brow raiser; AU2: outer brow raiser; AU4: brow lowerer; AU5: upper lid raiser; AU7: lid tightener. (Pictures are from the “Facial Action Coding System, Manual” by P. Ekman, W. V. Friesen, and J. C. Hager.)
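For quick reference, the AUs named on the slide as a small lookup table (only the five shown; FACS defines many more):

```python
# The five AUs listed on the slide (not the full FACS inventory).
ACTION_UNITS = {
    1: "Inner brow raiser",
    2: "Outer brow raiser",
    4: "Brow lowerer",
    5: "Upper lid raiser",
    7: "Lid tightener",
}
```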

  26. What did we usually do? Manually annotated data (BP4D, DISFA) + supervised learning with deep networks (AlexNet, VGGNet) → state of the art (e.g., JAA-Net, ECCV’18).
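AU detection is conventionally cast as multi-label binary classification (one sigmoid per AU). A minimal sketch of that supervised baseline; the backbone stand-in, sizes, and names are my assumptions, not any of the cited models:

```python
import torch
import torch.nn as nn

NUM_AUS = 12                        # assumed; depends on the dataset's AU set
backbone = nn.Sequential(           # toy stand-in for AlexNet/VGG-style features
    nn.Flatten(),
    nn.Linear(3 * 224 * 224, NUM_AUS),
)
criterion = nn.BCEWithLogitsLoss()  # independent binary target per AU

def training_step(images, au_labels, optimizer):
    """images: (B, 3, 224, 224); au_labels: (B, NUM_AUS) floats in {0, 1}."""
    logits = backbone(images)
    loss = criterion(logits, au_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```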

  27. Can we learn from the unlabeled videos? Learn from the changes! Facial actions appear as local changes of the face between frames, and such changes can be detected without manual annotations.

  28. Can we learn from the unlabeled videos? The change of the face between two frames decomposes into the change of facial actions and the change of head pose.
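Written as per-pixel displacement fields (my notation, and an assumed additive split; the slide only shows the decomposition schematically):

```latex
\Delta_{\mathrm{face}}(x, y) \;=\; \Delta_{\mathrm{AU}}(x, y) \;+\; \Delta_{\mathrm{pose}}(x, y)
```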

  29. Can we learn from the unlabeled videos? Supervisory task: change the facial actions or the head pose of the source frame to those of the target frame by predicting the respective movements.

  30. Self-supervised learning from videos: sample a source image and a target image (two frames of the same video).

  31. Self-supervised learning from videos: an AU feature captures the facial action changes between source and target, and is used to re-generate them; the result should approximate the target's facial actions (≈).

  32. Self-supervised learning from videos: likewise, a pose feature captures the head pose changes and is used to re-generate them (≈).

  33. Self-supervised learning from videos: the AU changes and the pose changes are re-generated jointly, and each re-generation should approximate the target image (≈).

  34. Twin-Cycle AutoEncoder (TCAE): feature disentanglement between source and target. The AU-related displacements are sparse (facial actions are local) and small in value (facial actions are subtle).

  35. Twin-Cycle AutoEncoder: the cycle with the AUs changed (source → AU-changed target → back to source), trained with a pixel consistency loss.

  36. Twin-Cycle AutoEncoder: the cycle with the pose changed (source → pose-changed target → back to source), also trained with a pixel consistency loss.

  37. Twin-Cycle AutoEncoder: target reconstruction; the target is re-generated from the source using both the AU and pose movements, again with a pixel consistency loss.
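Putting slides 34-37 together as a hedged sketch: an encoder predicts AU and pose displacement fields between two frames; warping the source with both should reconstruct the target, the twin cycles enforce pixel consistency, and the AU displacements are regularized to stay sparse and small. The function names, the additive warp, the negated-flow inverse, and the L1 losses are all my assumptions, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    """Backward-warp `image` (B,3,H,W) by a displacement field `flow` (B,2,H,W),
    with displacements expressed in normalized [-1, 1] coordinates."""
    B, _, H, W = image.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H, device=image.device),
        torch.linspace(-1, 1, W, device=image.device),
        indexing="ij")
    base = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2)
    grid = base + flow.permute(0, 2, 3, 1)
    return F.grid_sample(image, grid, align_corners=True)

def tcae_losses(encoder, source, target, lam_sparse=0.1):
    """Illustrative self-supervised losses for one source/target frame pair."""
    au_flow, pose_flow = encoder(source, target)        # each (B,2,H,W)

    # Target reconstruction: apply both movements to the source.
    loss_recon = F.l1_loss(warp(source, au_flow + pose_flow), target)

    # Twin cycles: changing only the AUs (or only the pose) and undoing the
    # same displacement should return (approximately) to the source.
    loss_cycle = (
        F.l1_loss(warp(warp(source, au_flow), -au_flow), source)
        + F.l1_loss(warp(warp(source, pose_flow), -pose_flow), source))

    # AU displacements are local and subtle: penalize their magnitude.
    loss_sparse = au_flow.abs().mean()

    return loss_recon + loss_cycle + lam_sparse * loss_sparse
```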
