DEEP UNCONSTRAINED GAZE ESTIMATION WITH SYNTHETIC DATA
Shalini De Mello, Rajeev Ranjan, Jan Kautz
NVIDIA AI CO-PILOT
APPLICATIONS: interface design, AR/VR, accessibility
TRADITIONAL GAZE TRACKERS
[Figure: eye model labeling the fovea, cornea, sclera, and pupil, the points C, E, and c, the angles Θ and Φ, a light source, the point of regard, and the X, Y, and Z axes]
REMOTE, CHEAP, UNCONSTRAINED GAZE TRACKING
CHALLENGES Unconstrained gaze tracking: resolution, lighting, subject variability, head rotation
APPEARANCE-BASED GAZE ESTIMATION* Gaze tracking
*Zhang et al., IEEE CVPR 2015; Krafka et al., IEEE CVPR 2016.
LABELED DATA COLLECTION Gaze tracking: indoors only, occlusion, cumbersome
LABELED DATA COLLECTION Gaze tracking: indoors only, limited head rotations and gazes
GPU TO THE RESCUE Gaze tracking: deep learning, computer graphics
SYNTHETIC DATA
SYNTHETIC IMAGES Head models
COMPUTER GRAPHICS EYE MODEL*
*Wood et al., IEEE ICCV 2015.
1 MILLION SYNTHETIC IMAGES
GAZE CNN ARCHITECTURE Zhang et al., 2015 (5 layers). Diagram labels: gaze pitch, gaze yaw, head pitch, head yaw
SYNTHETIC IMAGES Results on MPII data
AUTHOR               DATA               ERROR (°)
Wood et al., 2015    UT Multiview 1M    9.68
Wood et al., 2016    UnityEyes 1M       9.95
Wood et al., 2015    SynthesEyes 12K    8.94
Ours                 SynthesEyes 1M     7.74
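The errors in these results are angles in degrees between a predicted and a true gaze direction. A minimal sketch of one way such an angular error could be computed from gaze pitch and yaw follows. The function names and the axis convention (x right, y up, z forward) are illustrative assumptions; the slides do not state which convention the authors use.

```python
import numpy as np

def pitch_yaw_to_vector(pitch, yaw):
    """Convert pitch and yaw (radians) to a unit 3D direction.

    Assumed convention (not from the slides): pitch = 0 and yaw = 0
    point along +z, yaw rotates toward +x, pitch rotates toward +y.
    """
    return np.array([
        np.cos(pitch) * np.sin(yaw),
        np.sin(pitch),
        np.cos(pitch) * np.cos(yaw),
    ])

def angular_error_deg(pred_pitch, pred_yaw, true_pitch, true_yaw):
    """Angle in degrees between the predicted and true gaze directions."""
    pred = pitch_yaw_to_vector(pred_pitch, pred_yaw)
    true = pitch_yaw_to_vector(true_pitch, true_yaw)
    # Both vectors are unit length, so the dot product is the cosine of
    # the angle; clip guards against rounding slightly outside [-1, 1].
    cos_angle = np.clip(np.dot(pred, true), -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_angle)))
```

Averaging this value over a test set would give a single mean error figure of the kind reported in the table, under the stated assumptions.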
EYE POINTS CNN ARCHITECTURE Trained with 1M synthetic data. Point coordinates: (x_1, y_1), …, (x_n, y_n)
EYE FIDUCIAL POINTS Results on MPII gaze data
GAZE ESTIMATION NETWORK
GAZE CNN ARCHITECTURE Render for CNN (Su et al., 2015)
GAZE CNN ARCHITECTURE Zhang et al., 2015 (5 layers). Diagram labels: gaze pitch, gaze yaw, head pitch, head yaw
GAZE CNN ARCHITECTURE Ours (8 layers), Render for CNN. Diagram labels: gaze pitch, gaze yaw, head pitch, head yaw
GAZE CNN ARCHITECTURE Results on 1M synthetic data
NETWORK      INITIALIZATION                          ERROR (°)
LeNet        Random                                  5.57
AlexNet      ImageNet (object recognition)           5.03
ResNet-50    ImageNet (object recognition)           5.07
Ours         Render for CNN (viewpoint estimation)   4.4
GAZE CNN ARCHITECTURE Inputs and outputs. Diagram labels: Render for CNN, gaze pitch, gaze yaw, head pitch, head yaw
GAZE CNN ARCHITECTURE Results on 1M synthetic data
INPUT             OUTPUT         ERROR (°)
Eye               Eye-in-head    5.05
Eye               Gaze           5.66
Eye, head pose    Eye-in-head    4.4
Eye, head pose    Gaze           4.4
HEAD ROTATION Eye appearance: zero head yaw, negative head yaw, positive head yaw
HEAD ROTATION Gaze distribution (1M synthetic data)
[Figure: gaze pitch versus gaze yaw, with regions numbered 1 to 7]
HEAD ROTATION Gaze distribution (45k MPII data)
[Figure: gaze pitch versus gaze yaw and pose pitch versus pose yaw, axes from -1 to 1, with numbered regions]
HEAD ROTATION Head pose separation. Diagram labels: Render for CNN; head pitch, head yaw; cluster 1 … cluster n, each with gaze pitch and gaze yaw
HEAD ROTATION Results on 1M synthetic data
INPUT             CNN               ERROR (°)
Eye               single fc7-8      5.66
Eye               branched fc7-8    5.18
Eye, head pose    single fc7-8      4.4
Eye, head pose    branched fc7-8    4.26
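The branched variant routes each sample to a set of fc7-8 layers chosen by its head pose cluster. The slides do not say how the clusters are formed, so the sketch below assumes k-means over (head pitch, head yaw) pairs with a deterministic farthest-point initialization. The names `fit_pose_clusters` and `select_branch` are hypothetical, and the network branches themselves are not modeled.

```python
import numpy as np

def fit_pose_clusters(poses, k, iters=20):
    """Cluster (head pitch, head yaw) pairs with k-means.

    Assumption: k-means is a stand-in; the slides do not name the
    clustering method. Initialization is farthest-point, so the result
    is deterministic for a given input order.
    """
    poses = np.asarray(poses, dtype=float)
    centroids = [poses[0]]
    for _ in range(1, k):
        # Distance from every pose to its nearest already-chosen centroid.
        dists = np.linalg.norm(
            poses[:, None, :] - np.array(centroids)[None, :, :], axis=2
        )
        centroids.append(poses[dists.min(axis=1).argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        dists = np.linalg.norm(poses[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = poses[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return centroids

def select_branch(head_pose, centroids):
    """Return the index of the branch whose centroid is nearest the pose."""
    head_pose = np.asarray(head_pose, dtype=float)
    return int(np.linalg.norm(centroids - head_pose, axis=1).argmin())

# Toy poses: two groups with negative and positive head yaw.
toy_poses = [[-0.5, 0.0], [-0.4, 0.1], [0.5, 0.0], [0.6, -0.1]]
toy_centroids = fit_pose_clusters(toy_poses, k=2)
```

At inference time, `select_branch` would pick which branch's gaze pitch and gaze yaw outputs to use for a given head pose.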
GAZE CNN ARCHITECTURE Results on 1M synthetic data
[Figure: error (°), axis from 3.5 to 5, versus number of head pose clusters (1 to 7); legend entries: Single, Branched, Head pitch, Head yaw]
CNN ARCHITECTURE Skip connections. Diagram labels: Render for CNN; head pitch, head yaw; +; cluster 1 … cluster n, each with gaze pitch and gaze yaw
GAZE CNN ARCHITECTURE Results on 1M synthetic data
INPUT             CNN                                 ERROR (°)
Eye, head pose    single fc7-8                        4.4
Eye, head pose    branched fc7-8                      4.26
Eye, head pose    branched fc7-8, skip connections    4.15
REAL DATA Columbia. Diagram labels: Render for CNN; head pitch, head yaw; +; cluster 1 … cluster n, each with gaze pitch and gaze yaw
GAZE ERROR Columbia
[Bar chart: error (°) for the All and No Glasses subsets; methods: Wood et al., 2015; Our CNN; Our CNN with synthetic data; bar labels: 7.54, 6.68, 6.26, 5.65, 5.58]
REAL DATA MPII gaze. Diagram labels: Render for CNN; head pitch, head yaw; +; cluster 1 … cluster n, each with gaze pitch and gaze yaw
GAZE ERROR MPII gaze
[Bar chart: error (°); methods: Zhang et al., 2015; Our CNN; Our CNN with synthetic data; bar labels: 6.3, 5.85, 5.58]
CONCLUSION
CHALLENGES Unconstrained gaze tracking: resolution, lighting, subject variability, head rotation
GPU TO THE RESCUE Unconstrained gaze tracking: deep learning, computer graphics