Actions in the Eye: Dynamic Gaze Datasets and Learnt Saliency Models for Visual Recognition
Stefan Mathe, Cristian Sminchisescu
Presented by Mit Shah
Motivation…
● Current Computer Vision
  ○ Annotations are subjectively defined
  ○ What are the intermediate levels of computation?
Motivation…
● Lack of large-scale datasets that provide recordings of the workings of the human visual system
Previous Work...
Study of gaze patterns in humans
(example: a person browsing reddit exhibits the F-shaped pattern)
● Inter-observer consistency
● Bottom-up features
● Human fixations
● Models of saliency
● Uses of saliency maps
  ○ Action recognition
  ○ Object localization
  ○ Scene classification
● Previous datasets
  ○ At most a few hundred videos, recorded under free-viewing conditions
Contributions...
❏ (1) Extended the existing large-scale datasets Hollywood-2 and UCF Sports
Contributions...
❏ (2) Dynamic consistency and alignment measures
  ○ AOI Markov dynamics
  ○ Temporal AOI alignment
Contributions...
❏ (3) Training an end-to-end automatic visual action recognition system
Data Collection...
Hollywood-2 Movie Dataset
● Largest and most challenging dataset
● 12 classes, 69 movies, 823/884 train/test split, 487k frames, 20 hr
● Answering phone, driving a car, eating, fighting, etc.
Data Collection...
UCF Sports Action Dataset
● Broadcasts from television channels
● 150 videos covering 9 sports action classes
● Diving, golf swinging, kicking, etc.
Data Collection...
Extending the two datasets
● Recording environment: SMI iView X HiSpeed 1250 tower-mounted eye tracker
● Recording protocol: timings/durations, breaks, and many other specifications
● Tasks: 19 humans divided into 3 task groups
  ○ Action recognition
  ○ Context recognition
  ○ Free viewing
Static & Dynamic Consistency
Action Recognition by Humans
● Goal & importance
● Human errors
  ○ Co-occurring actions
  ○ False positives
  ○ Mislabeling videos
Static Consistency Among Subjects
● How well do the regions fixated by human subjects agree on a frame-by-frame basis?
● Evaluation protocol
The Influence of Task on Eye Movements
Hypothesis testing pipeline:
● Derive saliency maps from S_A \ {s}, predict the fixations of the held-out subject s, and evaluate prediction scores (repeated n_A times)
● Derive saliency maps from S_A and evaluate the average prediction score for each s' in S_B (repeated n_B times)
● Compare the two score populations with an independent 2-sample t-test with unequal variances: is the p-value >= 0.5?
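The comparison step above can be sketched with SciPy's unequal-variance (Welch's) t-test; the per-subject scores here are synthetic placeholders, since the real protocol derives them by predicting each held-out subject's fixations from the other subjects' saliency maps:

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical per-subject prediction scores under two task conditions
# (synthetic stand-ins for the scores produced by the real protocol).
rng = np.random.default_rng(0)
scores_task = rng.normal(0.62, 0.05, size=16)   # e.g. action recognition subjects
scores_free = rng.normal(0.60, 0.05, size=16)   # e.g. free-viewing subjects

# Independent 2-sample t-test with unequal variances (Welch's test)
t_stat, p_value = ttest_ind(scores_task, scores_free, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

A small p-value would indicate that the task measurably changes where subjects look.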
The Influence of Task on Eye Movements
● Results
Dynamic Consistency Among Subjects
● Spatial distribution: highly consistent
● Is there significant consistency in the temporal order as well?
● Automatic discovery of AOIs & 2 metrics
  ○ AOI Markov dynamics
  ○ Temporal AOI alignment
Scanpath representation
● Human fixations are tightly clustered
● Assign each fixation to the closest AOI
● Trace out the scan path
Automatically Finding AOIs
Clustering the fixations of all subjects in a frame:
● Start from K-means with 1 cluster
● Successively increase K until the sum of squared errors drops below a threshold
● Link centroids from successive frames into tracks
● Each resulting track becomes an AOI
● Each fixation is assigned to the closest AOI at the time of its creation
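A single-frame sketch of the K-growing step above (the `find_frame_aois` helper, the threshold, and the fixation coordinates are illustrative; the full method additionally links centroids across frames into AOI tracks):

```python
import numpy as np

def find_frame_aois(points, sse_threshold, max_k=10, n_iter=25, seed=0):
    """Grow K-means from 1 cluster until the sum of squared errors
    drops below the threshold; the surviving centroids serve as the
    frame's candidate AOIs (simplified single-frame sketch)."""
    rng = np.random.default_rng(seed)
    for k in range(1, min(max_k, len(points)) + 1):
        # initialize centroids from randomly chosen fixations
        centroids = points[rng.choice(len(points), size=k, replace=False)]
        for _ in range(n_iter):
            # assign each fixation to its closest centroid
            dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
            labels = dists.argmin(axis=1)
            centroids = np.stack([
                points[labels == c].mean(axis=0) if np.any(labels == c)
                else centroids[c]
                for c in range(k)
            ])
        sse = ((points - centroids[labels]) ** 2).sum()
        if sse < sse_threshold:
            return centroids, labels
    return centroids, labels

# two tight fixation clusters -> two AOIs expected
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
aois, labels = find_frame_aois(pts, sse_threshold=1.0)
```

Growing K until the SSE falls below a threshold avoids fixing the number of AOIs per frame in advance.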
AOI Markov Dynamics
● Model transitions of human visual attention between AOIs via each subject's fixation string f_i
● Estimate the probability of transitioning to AOI "b" at time t, given that the human fixated AOI "a" at time t-1
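These transition probabilities can be estimated from the fixation strings roughly as follows (a minimal sketch; the add-one smoothing is an assumed choice, not necessarily the paper's):

```python
import numpy as np

def aoi_transition_matrix(fixation_strings, n_aois):
    """Estimate P(AOI b at time t | AOI a at time t-1) from subjects'
    fixation strings (sequences of AOI indices). Add-one smoothing
    avoids empty rows for AOIs that are never left (illustrative)."""
    counts = np.ones((n_aois, n_aois))
    for s in fixation_strings:
        for a, b in zip(s[:-1], s[1:]):  # consecutive fixation pairs
            counts[a, b] += 1
    # normalize each row into a probability distribution
    return counts / counts.sum(axis=1, keepdims=True)

# two hypothetical scanpaths over 3 AOIs
P = aoi_transition_matrix([[0, 1, 1, 2], [0, 1, 2, 2]], n_aois=3)
```

Each row of `P` is a distribution over the next AOI, so consistent subjects produce sharply peaked rows.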
Temporal AOI Alignment
● Longest common subsequence
● Able to handle gaps and missing elements
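A minimal longest-common-subsequence sketch for comparing two AOI scanpaths (the normalization in `alignment_score` is an assumed choice for illustration):

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two AOI scanpaths;
    unlike exact string matching, it tolerates gaps and missing
    elements between matched fixations."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            # extend a match, or carry over the best prefix score
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def alignment_score(a, b):
    # normalize to [0, 1] by the longer scanpath (illustrative choice)
    return lcs_length(a, b) / max(len(a), len(b)) if (a or b) else 1.0
```

For example, scanpaths `[1, 2, 3]` and `[1, 3]` share the subsequence `[1, 3]` despite the missing middle fixation.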
Evaluation Pipeline
● Interest point operator: input a video, output a set of spatio-temporal coordinates
● Visual descriptors: spacetime HoG & MBH computed from optical flow
● Dictionary: cluster descriptors into 4000 visual words using K-means
● Classifiers: RBF-χ² kernel generalization within a Multiple Kernel Learning (MKL) framework
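The classifier stage can be illustrated with an RBF-χ² kernel between bag-of-words histograms, as commonly used in such MKL pipelines (the `gamma` bandwidth and the random histograms are placeholders; `gamma` is often set from the mean χ² distance on the training set):

```python
import numpy as np

def rbf_chi2_kernel(H1, H2, gamma=1.0):
    """RBF-χ² kernel between rows of two histogram matrices:
    K(x, y) = exp(-gamma * 0.5 * sum_i (x_i - y_i)^2 / (x_i + y_i))."""
    num = (H1[:, None] - H2[None]) ** 2
    den = H1[:, None] + H2[None] + 1e-12    # avoid division by zero
    d = 0.5 * (num / den).sum(axis=2)       # pairwise chi-squared distances
    return np.exp(-gamma * d)

# hypothetical 4000-word histograms for three videos
rng = np.random.default_rng(0)
H = rng.random((3, 4000))
H /= H.sum(axis=1, keepdims=True)           # normalize to sum 1
K = rbf_chi2_kernel(H, H)
```

The resulting Gram matrix `K` would then feed one channel of the MKL combination, alongside kernels from the other descriptor types.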
Human Fixation Studies
Human vs. Computer Vision Operators
● Fixations as an interest point detector
● Findings
  ○ Low correlation
  ○ Why?
Impact of Human Saliency Maps on Computer Visual Action Recognition
● Saliency maps encoding only the weak surface structure of fixations (no time ordering) can be used to boost the accuracy of contemporary methods
Saliency Map Prediction
● Static features
● Motion features
● Evaluation: AUC & spatial KL divergence
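The two evaluation metrics can be sketched as follows (`fixation_auc` and `spatial_kl` are hypothetical helpers; the paper's exact AUC protocol may differ, e.g. shuffled-AUC variants that control for center bias):

```python
import numpy as np

def fixation_auc(saliency, fixation_mask):
    """AUC: treat the saliency map as a classifier separating fixated
    from non-fixated pixels; ties count as half (simplified sketch)."""
    pos = saliency[fixation_mask].ravel()
    neg = saliency[~fixation_mask].ravel()
    greater = (pos[:, None] > neg[None]).mean()
    ties = (pos[:, None] == neg[None]).mean()
    return greater + 0.5 * ties

def spatial_kl(p, q, eps=1e-12):
    """KL divergence between two saliency maps, each normalized to a
    probability distribution over pixels."""
    p = p / p.sum()
    q = q / q.sum()
    return float((p * np.log((p + eps) / (q + eps))).sum())

# toy 2x2 map where fixated pixels received higher predicted saliency
sal = np.array([[0.9, 0.1], [0.8, 0.2]])
fix = np.array([[True, False], [True, False]])
```

A perfect predictor ranks every fixated pixel above every non-fixated one (AUC of 1), while KL measures how far the predicted distribution is from the empirical fixation map.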
Automatic Visual Action Recognition
Conclusions
● Combining human + computer vision
● Extending the datasets
● Evaluating static & dynamic consistency
● Human fixations -> saliency maps
● End-to-end action recognition system
Thanks!