Multi-Modal Emotion Estimation
Presenters: Dr. Mohammad Mavadati and Dr. Taniya Mishra
Our emotions influence how we live and experience life!
But we're also surrounded by high-IQ, no-EQ devices.
Affectiva mission: humanize technology with Human Perception AI
• Pioneers of Human Perception AI: AI software that understands all things human – nuanced human emotions, complex cognitive states, behaviors, activities, interactions and the objects people use.
• Only multi-modal in-cabin sensing AI: using deep learning, computer vision, voice analytics and massive amounts of data, Affectiva analyzes face and voice to understand the state of humans in a vehicle.
• Face: 7 emotions, indicators of attention, drowsiness, distraction, positive/negative valence, 20+ facial expressions and demographics.
• Voice: arousal, laughter, anger, gender.
Emotion AI detects emotion and cognitive states the way people do
• People communicate through multiple modalities: 55% facial expressions and gestures, 38% how the words are said, 7% the actual words (Source: Journal of Consulting Psychology).
• Affectiva's multi-modal Emotion AI – Face: 7 emotions, indicators of attention, drowsiness, distraction, positive/negative valence, 20+ facial expressions and demographics. Voice: arousal, laughter, anger, gender.
• Multi-modal: developing early and late fusion of modalities for deeper understanding of complex states (a fusion sketch follows this list); expanding beyond face and voice.
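The fusion point above lends itself to a short illustration. The following is a minimal, hypothetical sketch of early vs. late fusion of face and voice features in PyTorch; the feature dimensions, layer sizes, class count, and averaging weights are illustrative assumptions, not Affectiva's models.

```python
# Hypothetical sketch of early vs. late fusion for face + voice features.
# Dimensions, layers, and the 7-class output are illustrative assumptions.
import torch
import torch.nn as nn

NUM_CLASSES = 7  # e.g., 7 emotion categories


class EarlyFusion(nn.Module):
    """Concatenate modality features first, then classify them jointly."""
    def __init__(self, face_dim=128, voice_dim=64):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(face_dim + voice_dim, 128), nn.ReLU(),
            nn.Linear(128, NUM_CLASSES),
        )

    def forward(self, face_feat, voice_feat):
        return self.classifier(torch.cat([face_feat, voice_feat], dim=-1))


class LateFusion(nn.Module):
    """Classify each modality separately, then average the class scores."""
    def __init__(self, face_dim=128, voice_dim=64):
        super().__init__()
        self.face_head = nn.Linear(face_dim, NUM_CLASSES)
        self.voice_head = nn.Linear(voice_dim, NUM_CLASSES)

    def forward(self, face_feat, voice_feat):
        return 0.5 * (self.face_head(face_feat) + self.voice_head(voice_feat))
```

Early fusion lets the classifier learn cross-modal interactions directly, while late fusion keeps the unimodal models independent and degrades more gracefully when one modality is missing.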
Emotion AI is a multi-modal and multi-dimensional problem
• Multi-modal – human emotions manifest in a variety of ways, including your tone of voice and your face.
• Many expressions – facial muscles generate hundreds of facial actions; speech has many different dimensions, from pitch and resonance to melody and voice quality.
• Highly nuanced – emotional and cognitive states can be very nuanced and subtle, like an eye twitch or your pause patterns when speaking.
• Non-deterministic – changes in facial or vocal expressions can have different meanings depending on the person's context at that time.
• Temporal lapse – as an individual's state unfolds over time, algorithms need to measure moment-by-moment changes to accurately capture state of mind (a short smoothing sketch follows this list).
• Context – understanding a complex state of mind requires contextual knowledge of the surrounding environment and how an individual is interacting with it.
• Massive data – Emotion AI algorithms need to be trained with massive amounts of real-world data that is collected and annotated.
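To make the "moment by moment" point concrete, here is a minimal, illustrative sketch (not Affectiva's algorithm) that smooths noisy per-frame scores for one emotion with an exponential moving average; the smoothing factor `alpha` is a hypothetical parameter.

```python
# Illustrative sketch: smoothing noisy per-frame emotion scores over time.
# The smoothing factor `alpha` is a hypothetical parameter.
def smooth_scores(frame_scores, alpha=0.2):
    """frame_scores: per-frame probabilities for one emotion, in time order."""
    smoothed, state = [], None
    for score in frame_scores:
        state = score if state is None else alpha * score + (1 - alpha) * state
        smoothed.append(state)
    return smoothed
```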
Display and perception of emotion are not perfectly aligned
CREMA-D*: large-scale study of emotion display and perception
• 91 participants
• 6 emotions of varying intensities
• 7,442 emotion samples
• 2,443 observers
Human recognition of intended emotion based on:
• voice only: 40.9%
• face only: 58.2%
• face and voice: 63.6%
Difference in emotion perception from Face vs. Speech modalities
Confusion matrices showing emotions displayed by humans, as recognized by other human observers
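For reference, a confusion matrix of this kind can be assembled from paired (displayed, perceived) labels. Below is a small sketch using scikit-learn, assuming per-clip intended-emotion labels and observer responses; the six-category label set reflects CREMA-D's emotion categories and the variable names are hypothetical.

```python
# Sketch: building face-only vs. voice-only perception confusion matrices
# from (intended, perceived) emotion labels. Variable names are hypothetical.
from sklearn.metrics import confusion_matrix

EMOTIONS = ["anger", "disgust", "fear", "happy", "neutral", "sad"]  # CREMA-D categories


def perception_confusion(intended, perceived):
    """Rows: emotion displayed by the actor; columns: emotion reported by observers."""
    return confusion_matrix(intended, perceived, labels=EMOTIONS, normalize="true")


# face_cm = perception_confusion(intended_labels, face_only_responses)
# voice_cm = perception_confusion(intended_labels, voice_only_responses)
```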
Emotion AI at Affectiva: How it works
Data-driven approach to Emotion AI: Data → Algorithms → Evaluation
• Multi-modal data acquisition: large amounts of real-world video & audio data; different ethnicities and contexts.
• Data annotation infrastructure: manual and automated labeling of video and speech.
• Training & validation: parallelize deep learning experiments on a massive scale (a small split sketch follows this list).
• Output: multi-modal classifiers for machine perception, e.g., expressions, emotions, cognitive states and demographics.
• Product delivery: APIs and SDKs – the classifiers and run-time system are optimized for the cloud, on device, or embedded.
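As a small illustration of the training & validation stage, the sketch below splits annotated clips so that validation subjects never appear in training, a common practice for expression and emotion data; this is an assumption about methodology, not Affectiva's documented procedure, and all names are hypothetical.

```python
# Sketch: subject-disjoint train/validation split for annotated clips.
# GroupShuffleSplit keeps all clips of a given person on one side of the split.
from sklearn.model_selection import GroupShuffleSplit


def split_by_subject(features, labels, subject_ids, val_fraction=0.2):
    """Hold out whole subjects for validation to avoid identity leakage."""
    splitter = GroupShuffleSplit(n_splits=1, test_size=val_fraction, random_state=0)
    train_idx, val_idx = next(splitter.split(features, labels, groups=subject_ids))
    return train_idx, val_idx
```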
Data matters …
Massive proprietary data and annotations power our AI
• 4 Bn frames, 7.5 MM faces, 836 MM auto frames, 87 countries
✓ Foundation: large, diverse & real-world data built in the past 7 years
✓ Growing automotive in-cabin data with scalable data acquisition strategy
Deep learning
Affectiva's focus is on deep learning. It allows modeling of more complex problems with higher accuracy than other machine learning techniques.
• Allows for end-to-end learning of one or more complex tasks jointly
• Solves a variety of problems: classification, segmentation, temporal modeling
Example output (per-class scores): Anger 0.09133, Contempt 0.62842, Disgust 0.20128, Fear 0.00001, Happiness 0.00041
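The per-class scores above look like the output of a softmax classifier. The sketch below is an illustrative end-to-end CNN producing scores of that form; the architecture, input size, and class list are assumptions, not the production network.

```python
# Illustrative end-to-end CNN emotion classifier; architecture and sizes are
# assumptions chosen for brevity, not the production model.
import torch
import torch.nn as nn

EMOTIONS = ["anger", "contempt", "disgust", "fear", "happiness", "sadness", "surprise"]


class EmotionCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, len(EMOTIONS))

    def forward(self, face_crop):                 # face_crop: (N, 3, H, W)
        x = self.features(face_crop).flatten(1)
        return self.head(x).softmax(dim=-1)       # per-class probabilities


# probs = EmotionCNN()(torch.randn(1, 3, 96, 96))
# print(dict(zip(EMOTIONS, probs[0].tolist())))
```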
Vision pipeline
The current vision SDK consists of these steps:
• Face detection: given an image, detect faces (shared conv. layers + region proposal network → bounding boxes)
• Landmark localization: given an image + bounding box, detect and track landmarks (regression → landmark estimate + confidence, with landmark refinement)
• Facial analysis: detect facial expressions / emotions / attributes per face (multi-task CNN classification on the face image)
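A schematic sketch of how the three stages compose is shown below; the stage functions are placeholders standing in for the detector, landmark regressor, and multi-task CNN, and none of the names correspond to the actual SDK API.

```python
# Schematic composition of the vision pipeline stages; all functions are
# placeholders, not the actual SDK implementation.
from dataclasses import dataclass
from typing import Dict, List, Tuple
import numpy as np


@dataclass
class FaceResult:
    bounding_box: Tuple[int, int, int, int]   # (x, y, w, h) from the face detector
    landmarks: np.ndarray                      # refined facial landmarks
    scores: Dict[str, float]                   # expression / emotion / attribute scores


def detect_faces(image: np.ndarray) -> List[Tuple[int, int, int, int]]:
    """Stage 1 placeholder: a region-proposal detector would return face boxes."""
    return []


def localize_landmarks(image: np.ndarray, box: Tuple[int, int, int, int]) -> np.ndarray:
    """Stage 2 placeholder: regress landmark positions (with confidences) in the box."""
    return np.zeros((0, 2))


def analyze_face(face_crop: np.ndarray) -> Dict[str, float]:
    """Stage 3 placeholder: multi-task CNN scoring expressions, emotions, attributes."""
    return {}


def vision_pipeline(image: np.ndarray) -> List[FaceResult]:
    results = []
    for (x, y, w, h) in detect_faces(image):
        landmarks = localize_landmarks(image, (x, y, w, h))
        scores = analyze_face(image[y:y + h, x:x + w])
        results.append(FaceResult((x, y, w, h), landmarks, scores))
    return results
```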
Speech pipeline
The current speech pipeline consists of these steps:
• Speech detection: given single-channel audio, detect speech (VAD – voice activity detection: speech vs. noise)
• Speech enhancement: given a noisy speech segment, mask the noise (STFT → noise suppression model for stationary and non-stationary noise → inverse STFT → enhanced speech)
• Speech analysis: detect speech events / emotions / attributes per audio segment
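The speech stages can be sketched the same way. In the placeholder below, an energy threshold stands in for the learned VAD and a trivial mask stands in for the noise-suppression model; none of this reflects the actual models.

```python
# Schematic speech pipeline: VAD -> STFT-domain masking -> analysis.
# The energy-based VAD and the all-ones mask are placeholders for learned models.
import numpy as np
from scipy.signal import stft, istft


def detect_speech(audio: np.ndarray, sr: int) -> np.ndarray:
    """Placeholder VAD: per-10-ms-frame energy threshold instead of a trained model."""
    frame = sr // 100
    n = len(audio) // frame
    energy = (audio[: n * frame].reshape(n, frame) ** 2).mean(axis=1)
    return energy > energy.mean()


def enhance_speech(noisy_segment: np.ndarray, sr: int) -> np.ndarray:
    """Placeholder enhancement: apply a (here trivial) mask in the STFT domain."""
    _, _, spec = stft(noisy_segment, fs=sr)
    mask = np.ones_like(spec.real)      # a trained model would predict this mask
    _, clean = istft(spec * mask, fs=sr)
    return clean


def analyze_speech(segment: np.ndarray, sr: int) -> dict:
    """Placeholder for the speech-event / emotion / attribute classifier."""
    return {"speech_detected": bool(detect_speech(segment, sr).any())}
```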
Multi-Modal Applications
Automotive, Advertising, Media and entertainment, Human resources, Robotics, Gaming, Healthcare and quantified self, Devices, Video communication, Online education
Multimodal for Automotive
Affectiva Automotive AI
Human Perception AI fuels deep understanding of people in a vehicle
Delivering valuable services to vehicle occupants depends on a deep understanding of their current state.
Affectiva Automotive AI + Advanced Third-Party Solutions = Vehicle Services
• Affectiva Automotive AI: facial expressions, tone of voice, body posture, object detection; states such as anger, surprise, drowsiness, enjoyment, attention, stress, distraction, excitement, intoxication, discomfort, cognitive load, displeasure.
• Advanced third-party solutions: in-cab context (inanimate objects, cabin environment), external context (weather, traffic, signs, pedestrians), personal context (identity, likes/dislikes & preferences, occupant state history, calendar).
• Vehicle services: safety (next-generation driver monitoring, smart handoff & safety drivers, proactive intervention), occupant experience (individually customized baseline, adaptive environment, personalization across vehicles), infotainment content, monetization (differentiation among brands, premium content delivery, purchase recommendations).
Affectiva Automotive AI
Modular and extensible deep learning platform for in-cabin human perception AI
• Core technology is shared and reused across different modules
• Modular packaging enables light-weight deployment of capabilities for a specific use case (a configuration sketch follows this list)
• Extend existing capabilities by adding more modules
Modules:
• Driver Monitoring: drowsiness levels, distraction levels, cognitive load, engagement
• Occupant State: facial and vocal emotion, mood (valence), multimodal emotion (frustration)
• Occupant Activities: talking, texting, cellphone in hand
• Cabin State: occupant location and presence, objects left behind, child left behind
Core technology:
• Face & head tracking: 3D head pose
• Facial expression recognition: 20 facial expressions (e.g. smile, eyebrow raise); drowsiness markers (eye closure, yawn, blink)
• Object detection: object classes (e.g. mobile device, bags), object location
• Voice detection: voice activity detection
Flexible platform: support for near-IR sensors, ARM ECUs, and multiple camera positions
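As a rough illustration of modular packaging, a deployment might declare only the capabilities it needs; the module and feature names below simply mirror the slide, and the loader is hypothetical, not an actual configuration API.

```python
# Hypothetical module configuration mirroring the slide; not a real API.
ENABLED_MODULES = {
    "driver_monitoring": ["drowsiness", "distraction", "cognitive_load", "engagement"],
    "occupant_state": ["facial_emotion", "vocal_emotion", "valence"],
    "occupant_activities": [],                      # e.g., omit talking/texting detection
    "cabin_state": ["occupant_presence", "child_left_behind"],
}


def required_classifiers(config):
    """Collect only the classifiers needed for the enabled modules (placeholder)."""
    return [feature for features in config.values() for feature in features]
```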
Automotive data collection for multimodal analysis
Automotive Data Acquisition
To develop a deep understanding of the state of occupants in a car, one needs large amounts of data. With this data we can develop algorithms that can sense emotions and gather people analytics in real-world conditions.
In-car data acquisition (quarterly): 42,000 miles and 2,000+ hours driven, 200+ drivers on 3 continents.
Sources of the Auto Data Corpus:
• Spontaneous occupant data: using Affectiva Driver Kits and Affectiva Moving Labs to collect naturalistic driver and occupant data and develop metrics that are robust to real-world conditions
• Data partnerships: acquire 3rd-party natural in-cab data through academic and commercial partners (MIT AVT, fleet operators, ride-share companies)
• Simulated data: collect challenging data in a safe lab simulation environment to augment the spontaneous driver dataset and bootstrap algorithms (e.g. drowsiness, intoxication); multi-spectral & transfer learning
Automotive AI data
Automotive AI 1.0 tracks metrics for driver monitoring as well as emotion estimation:
• Driver drowsiness: detecting eye closure and yawning events (a drowsiness-indicator sketch follows this list)
• Emotion detection: detecting driver emotions including surprise and joy
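For illustration, frame-level eye-closure scores can be turned into a PERCLOS-style indicator, a standard drowsiness measure; this sketch and its thresholds are hypothetical and not Affectiva's method.

```python
# Illustrative PERCLOS-style drowsiness indicator from per-frame eye-closure
# scores; the thresholds and window length are hypothetical.
import numpy as np


def perclos(eye_closure_scores, closed_threshold=0.8, window=900):
    """Fraction of the last `window` frames (e.g., 30 s at 30 fps) with eyes closed."""
    recent = np.asarray(eye_closure_scores[-window:], dtype=float)
    return float((recent > closed_threshold).mean())


# drowsy = perclos(per_frame_closure) > 0.15   # hypothetical alert threshold
```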
Multimodal frustration: A case study