

1. The Future of the In-Car Experience. Abdelrahman Mahmoud, Product Manager; Ashutosh Sanan, Computer Vision Scientist. @affectiva

2. Affectiva Emotion AI
Emotion recognition from face and voice powers several industries: social robots, mood tracking, interviewing, drug efficacy, content management (video/audio), customer analytics, academic research, banking, telehealth, focus groups, connected devices / IoT, health & wellness, surveillance, education, market research, social robotics, recruiting, MOOCs, legal, mental health, security, telemedicine, healthcare, web conferencing, real-time student feedback, automotive, live streaming, video & photo organization, retail, fraud detection, virtual assistants, gaming, and online education.
• In-market products since 2011: 1/3 of the Fortune Global 100 and 1,400 brands; OEMs and Tier 1 suppliers.
• Recognized market / AI leader: spun out of the MIT Media Lab; selected for Startup Autobahn and the Partnership on AI.
• Built using real-world data: 6.5M face videos from 87 countries; 42,000 miles of driving data quarterly.
@affectiva

3. Affectiva Automotive AI
• Safety. The problem: transitions in control in semi-autonomous vehicles (e.g. the L3 handoff problem); current solutions based on steering wheel sensors are irrelevant in autonomous driving. The Affectiva solution: a next-generation AI-based system to monitor and manage driver capability for safe engagement.
• Occupant Experience. The problem: understanding occupant state and mood to enhance the overall in-cab experience (e.g. the L4 luxury challenge). The Affectiva solution: the first in-market solution for a differentiated and monetizable in-cab experience.
@affectiva

4. People Analytics
Context-aware people analytics, with Emotion AI as the foundational technology.
• Emotion AI: facial expressions, tone of voice, body posture; anger, enjoyment, surprise, attention, distraction, excitement, drowsiness, stress, intoxication, discomfort, cognitive load, displeasure.
• Safety: next-generation driver monitoring, smart handoff, proactive intervention.
• In-Cab Context: occupant relationships, infotainment content, inanimate objects, cabin environment.
• External Context: weather, traffic, signs, pedestrians.
• Personal Context: identity, likes/dislikes & preferences, occupant state history, calendar.
• Personalization: individually customized baseline, adaptive environment, personalization across vehicles.
• Monetization: differentiation among brands, premium content delivery, purchase recommendations.
@affectiva

5. Affectiva approach to addressing Emotion AI complexities
• Data: our robust and scalable data strategy enables us to acquire large and diverse data sets, and to annotate these data using manual and automated approaches.
• Algorithms: using a variety of deep learning, computer vision and speech processing approaches, we have developed algorithms to model complex and nuanced emotion and cognitive states.
• Infrastructure: deep learning infrastructure allows for rapid experimentation and tuning of models, as well as large-scale data processing and model evaluation.
• Team: our team of researchers and technologists has deep expertise in machine learning, deep learning, data science, data annotation, computer vision and speech processing.
@affectiva

6. World’s largest emotion data repository
87 countries, 6.5M faces analyzed, 3.8B facial frames. Includes people emoting on device, and while driving.
Top countries for emotion data: India 1,363K; USA 1,166K; China 562K; Indonesia 325K; United Kingdom 265K; Brazil 194K; Thailand 184K; Philippines 159K; Mexico 150K; Vietnam 148K; Germany 148K; Japan 61K.
@affectiva

7. Data Strategy
To develop a deep understanding of the state of occupants in a car, one needs large amounts of data. With this data we can develop algorithms that can sense emotions and gather people analytics in real-world conditions. Foundational proprietary data will drive value and accelerate the data partner ecosystem (the Auto Data Corpus):
• Spontaneous occupant data: using Affectiva Driver Kits and Affectiva Moving Labs to collect naturalistic driver and occupant data, to develop metrics that are robust to real-world conditions.
• Data partnerships: acquire 3rd-party natural in-cab data through academic and commercial partners (MIT AVT, fleet operators, ride-share companies).
• Simulated data: collect challenging data in a safe lab simulation environment to augment the spontaneous driver dataset and bootstrap algorithms (e.g. drowsiness, intoxication), plus multi-spectral data & transfer learning.
@affectiva

  8. @affectiva

9. Affectiva approach to addressing Emotion AI complexities (recap of slide 5): Data, Algorithms, Infrastructure, Team. @affectiva

  10. Algorithms @affectiva

11. Deep learning advancements driving the automotive roadmap
The current SDK consists of deep learning networks for three stages:
• Face detection: given an image, detect faces (region proposal network + bounding boxes).
• Landmark localization: given an image + a face bounding box, detect and track facial landmarks (regression with landmark refinement + a confidence estimate).
• Facial analysis: detect facial expressions/emotions/attributes per face (a multi-task CNN/RNN with shared convolutional layers and temporal classification, outputting expressions, attributes and emotions).
A minimal sketch of the multi-task facial-analysis stage follows.
@affectiva
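To make the multi-task facial-analysis stage concrete, here is a minimal Keras sketch of a network with shared convolutional layers feeding separate expression, attribute and emotion heads. The input shape, layer sizes and label counts are illustrative assumptions, not Affectiva's actual architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_multitask_face_model(input_shape=(96, 96, 1),
                               n_expressions=20, n_attributes=8, n_emotions=7):
    face = layers.Input(shape=input_shape, name="face_crop")

    # Shared convolutional trunk (the "Shared Conv." blocks on the slide).
    x = face
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D()(x)
    x = layers.GlobalAveragePooling2D()(x)

    # Task-specific heads: expressions/attributes are multi-label (sigmoid),
    # emotions are modeled here as one categorical output (softmax).
    expressions = layers.Dense(n_expressions, activation="sigmoid",
                               name="expressions")(x)
    attributes = layers.Dense(n_attributes, activation="sigmoid",
                              name="attributes")(x)
    emotions = layers.Dense(n_emotions, activation="softmax",
                            name="emotions")(x)
    return Model(face, [expressions, attributes, emotions])

model = build_multitask_face_model()
model.compile(optimizer="adam",
              loss={"expressions": "binary_crossentropy",
                    "attributes": "binary_crossentropy",
                    "emotions": "categorical_crossentropy"})
```

The sigmoid heads reflect the multi-attribute framing on the next slide: several expressions can be active in the same face at once, so each label gets an independent binary decision.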

12. Task: Facial Action/Emotion Recognition
• Given a face, classify the corresponding visual expression/emotion occurrence.
• Many expressions: facial muscles generate hundreds of facial expressions/emotions.
• Multi-attribute classification.
• Fast enough to run on mobile/embedded devices.
(Example metrics pictured: Joy, Yawn, Eye Brow Raise.) @affectiva

13. Is a single image always enough? @affectiva

14. Information in Time
[Chart: intensity of expression evolving over time.]
Emotional state is a continuously evolving process over time. Adding temporal information makes it easier to detect highly subtle changes in facial state. How to utilize temporal information (a sketch of the first option follows):
• Post-process the static classifier's output using previous predictions and images.
• Use recurrent architectures.
@affectiva
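A minimal sketch of the post-processing option: smoothing the static per-frame classifier's scores with an exponential moving average, so slowly evolving expressions stand out from frame-to-frame noise. The smoothing factor and the example scores are illustrative assumptions.

```python
import numpy as np

def smooth_scores(frame_scores, alpha=0.3):
    """Exponentially smooth per-frame expression scores.

    frame_scores: array of shape (n_frames, n_classes) holding the static
    classifier's per-frame probabilities; alpha weights the newest frame.
    """
    smoothed = np.empty_like(frame_scores, dtype=float)
    smoothed[0] = frame_scores[0]
    for t in range(1, len(frame_scores)):
        smoothed[t] = alpha * frame_scores[t] + (1 - alpha) * smoothed[t - 1]
    return smoothed

# Example: a noisy yawn score trending upward over nine frames.
raw = np.array([[0.1], [0.4], [0.2], [0.5], [0.6], [0.4], [0.7], [0.8], [0.9]])
print(smooth_scores(raw).round(2).ravel())
```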

15. Spatio-Temporal Action Recognition
[Diagram: a temporal sequence of frames passes through a CNN for spatial feature extraction; an LSTM learns the temporal structure and produces frame-level classifications (e.g. scores of 0, 0.5, 0.8 across frames).]
Yawn recognition using CNN + LSTM, sketched below. @affectiva
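A minimal Keras sketch of the CNN + LSTM design in the diagram: one small CNN extracts spatial features from every frame (applied across time with TimeDistributed), and an LSTM learns the temporal structure before per-frame classification. Clip length, layer sizes and the single yawn output are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

SEQ_LEN, H, W, C = 16, 96, 96, 1  # 16-frame clips of grayscale face crops

frames = layers.Input(shape=(SEQ_LEN, H, W, C), name="frame_sequence")

# Spatial feature extraction: the same CNN applied to every frame.
cnn = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.GlobalAveragePooling2D(),
])
features = layers.TimeDistributed(cnn)(frames)

# Learning temporal structure, with a per-frame yawn probability.
x = layers.LSTM(64, return_sequences=True)(features)
yawn_per_frame = layers.TimeDistributed(
    layers.Dense(1, activation="sigmoid"), name="yawn")(x)

model = Model(frames, yawn_per_frame)
model.compile(optimizer="adam", loss="binary_crossentropy")
```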

16. Training Challenges & Inference @affectiva

17. Data challenges: missing frames in a sequence
While training, RNNs expect a continuous temporal sequence.
• Missing facial frames: bad lighting, face out of view, face not visible.
• Missing human annotations: facial frames not labeled by humans.
Possible solutions (a masking sketch follows the list):
• Use shorter, fixed-length continuous sequences with no missing data.
• Copy the last state of the sequence (repeat the last tracked frame).
• Mask the missing frames.
@affectiva
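A minimal Keras sketch of the masking option: missing frames are represented as all-zero feature vectors, and a Masking layer makes the LSTM skip those timesteps instead of updating its state on padded data. The feature size and sequence length are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

SEQ_LEN, FEAT = 16, 64  # per-frame CNN feature vectors

seq = layers.Input(shape=(SEQ_LEN, FEAT))
x = layers.Masking(mask_value=0.0)(seq)        # all-zero timesteps are skipped
x = layers.LSTM(32)(x)
out = layers.Dense(1, activation="sigmoid")(x)
model = Model(seq, out)

# A batch where frames 5-7 are missing (e.g. face out of view): zero them out.
batch = np.random.rand(2, SEQ_LEN, FEAT).astype("float32")
batch[:, 5:8, :] = 0.0
print(model.predict(batch, verbose=0).shape)   # (2, 1)
```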

18. Masking vs copying the last state
Results indicate that masking works better than copying the last state.
[Chart: ROC-AUC and validation accuracy, roughly 0.94-0.97, for "using last state" vs "masking".]
@affectiva

19. How to train a spatio-temporal model?
Two approaches to train our model:
• Train both convolutional and recurrent filters jointly (Input A).
• Transfer learning: reuse previously learned convolutional filters as frozen feature extractors (Input B), as sketched below.
@affectiva
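A minimal Keras sketch of the transfer-learning option: a stand-in for the pretrained shared convolutional trunk is frozen and reused as a feature extractor, so only the recurrent part trains on the new temporal task. The trunk architecture and the commented-out weights file are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Stand-in for the shared convolutional trunk learned on the static task.
trunk = tf.keras.Sequential([
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.GlobalAveragePooling2D(),
])
# trunk.load_weights("static_face_model.h5")  # hypothetical pretrained weights
trunk.trainable = False  # freeze: the "Frozen Feature Extractors" of Input B

SEQ_LEN = 16
frames = layers.Input(shape=(SEQ_LEN, 96, 96, 1))
features = layers.TimeDistributed(trunk)(frames)
x = layers.LSTM(64)(features)                  # only these weights train
yawn = layers.Dense(1, activation="sigmoid")(x)

model = Model(frames, yawn)
model.compile(optimizer="adam", loss="binary_crossentropy")
```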

20. Transfer learning for runtime performance
Intelligent filter reuse (shared convolutional layers feeding temporal heads for expressions, attributes and emotions) helps runtime performance:
• Increased runtime performance, enabling the models to run on mobile.
• Minimal benefit from tuning filters from scratch.
• A large real-world dataset backs the pretrained filters.
[Chart: ROC-AUC and validation accuracy, roughly 0.961-0.968, for fixed weights vs fully trainable.]
@affectiva

21. Does temporal info always help?
[Charts: ROC-AUC performance, temporal vs static, for Yawn (axis 0.93-0.97), Outer Brow Raiser AU02 (axis 0.86-0.89), and Smile (axis 0.93-0.962); the gain from temporal modeling varies by metric.]
@affectiva

  22. Models in Action @affectiva

23. Key Takeaways
• Not all metrics benefit from adding complex temporal information.
• Using all the data (complete & partial sequences) definitely helps the model.
• Masking works better with partial sequences than copying last frames.
• Intelligent filter reuse makes it possible to deploy these models on mobile with real-time performance.
@affectiva
