Much Ado About Time: Exhaustive Annotation of Temporal Data
Gunnar A. Sigurdsson, Olga Russakovsky, Ali Farhadi, Ivan Laptev, Abhinav Gupta
Datasets drive computer vision progress
[Figure: computer vision capabilities plotted against dataset scale and complexity]
Datasets: Caltech 101 [Fei-Fei '04], PASCAL VOC [Everingham '07], ImageNet [Deng '09]
Algorithms: [Deng '10], [Sanchez '11], [Lin '11], [Krizhevsky '12], [Zeiler '13], [Wang '13], [Chum '07], [Felzenszwalb '08], [Sermanet '13], [Simonyan '14], [Wang '09], [Harzallah '09], [Lin '14], [Girshick '14], [Berg '05], [Grauman '05], [Bourdev '09], [Vedaldi '09], [Szegedy '14], [He '15], … [Zhang '06], [Lazebnik '06], [Lin '09], [Lampert '09], [Jain '08], [Boiman '08], [Carreira '10], [Wang '10], [Yang '09], [Maji '09], [Song '11], [vanDeSande '11], … [Wang '10], [Zhou '10], [Feng '11], [Jiang '11], …
Need: (1) dense, detailed, multi-label annotations; (2) large-scale annotated video datasets
Much Ado About Time: Exhaustive Annotation of Temporal Data, http://allenai.org/plato/charades/
Multi-label video annotation
Goal: 100-200 labels (columns) for 10,000 videos (rows); + means the label applies to the video, - means it does not.

Video  Opens book  Puts book on shelf  Walks  Turns on stove  Eats  Sits down  Sneezes
1      -           -                   -      +               -     -          -
2      -           +                   +      -               -     -          +
3      -           -                   +      -               -     +          -
4      -           -                   -      -               +     -          -
5      +           -                   +      -               -     +          -
Multi-label video annotation
One label at a time: a single label is annotated across all videos (? = still to be annotated).

Video  Opens book  Puts book on shelf  Walks  Turns on stove  Eats  Sits down  Sneezes
1      ?           -                   -      +               -     -          -
2      ?           +                   +      -               -     -          +
3      ?           -                   +      -               -     +          -
4      ?           -                   -      -               +     -          -
5      ?           -                   +      -               -     +          -
Multi-label video annotation
One video at a time: all labels are annotated for a single video (? = still to be annotated).

Video  Opens book  Puts book on shelf  Walks  Turns on stove  Eats  Sits down  Sneezes
1      ?           ?                   ?      ?               ?     ?          ?
2      -           +                   +      -               -     -          +
3      -           -                   +      -               -     +          -
4      -           -                   -      -               +     -          -
5      +           -                   +      -               -     +          -
Which interface is better?
One-label: ☐ Opens book. Repeat N times for N labels. Expect better annotation accuracy.
vs
All-labels: ☐ Opens book ☐ Puts book on shelf ☐ Walks ☐ Turns on stove ☐ Eats ☐ Sits down … Expect better annotation time.
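The trade-off between the two interfaces can be made concrete by counting annotation tasks. The following is a minimal sketch, not the authors' tooling: the function name `make_tasks`, the parameter `labels_per_task`, and the video ids are hypothetical, and the labels are the six shown on the slide.

```python
from itertools import product

def make_tasks(videos, labels, labels_per_task):
    """Split the label list into chunks and pair every chunk with every video."""
    chunks = [labels[i:i + labels_per_task]
              for i in range(0, len(labels), labels_per_task)]
    return [(video, chunk) for video, chunk in product(videos, chunks)]

labels = ["opens book", "puts book on shelf", "walks",
          "turns on stove", "eats", "sits down"]
videos = ["video_a", "video_b"]  # hypothetical video ids

# One-label interface: one question per task, repeated N times for N labels.
one_label_tasks = make_tasks(videos, labels, labels_per_task=1)

# All-labels interface: every label shown at once, one task per video.
all_labels_tasks = make_tasks(videos, labels, labels_per_task=len(labels))

print(len(one_label_tasks))   # 12 tasks: 2 videos x 6 labels
print(len(all_labels_tasks))  # 2 tasks: 2 videos x 1 chunk
```

Intermediate values of `labels_per_task` sit between the two extremes, which is the axis along which a few-labels and a many-labels interface differ.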
Which interface is better?
Interfaces: All-labels (☐ Opens book ☐ Puts book on shelf ☐ Walks ☐ Turns on stove …) vs One-label (☐ Opens book; repeat N times for N labels)
Experiment on Amazon Mechanical Turk
Data: 140 videos, each ~30 secs long
Labels: 52 human actions
Charades dataset of [Sigurdsson ECCV 2016]
Time: many-labels is better
Accuracy: few-labels is better [Miller PsychologyReview 1956]
Improving annotation time (many-labels is better)
Consistency in the few-labels setting: ask the same worker about the same actions for multiple videos => 13.6% reduction in annotation time
Worker 1: ☐ Opens book ☐ Opens book ☐ Opens book
vs
Worker 1: ☐ Opens book ☐ Walks ☐ Sits down
Play video at 2x speed [Lasecki UIST 2014]
Semantic hierarchy of labels [Deng CHI 2014]
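One way to read the consistency idea is as a batching rule: within a worker's batch, the questions stay fixed and only the video changes. The sketch below illustrates that reading under stated assumptions; `consistent_batches`, `videos_per_batch`, and the video ids are hypothetical and do not come from the slides.

```python
def consistent_batches(videos, label_chunks, videos_per_batch):
    """Build worker batches where the label chunk is fixed and only the video varies."""
    batches = []
    for chunk in label_chunks:
        for start in range(0, len(videos), videos_per_batch):
            batch_videos = videos[start:start + videos_per_batch]
            batches.append([(video, chunk) for video in batch_videos])
    return batches

videos = ["v1", "v2", "v3", "v4"]  # hypothetical video ids
label_chunks = [("opens book",), ("walks",), ("sits down",)]

batches = consistent_batches(videos, label_chunks, videos_per_batch=2)

# Every batch asks about a single label chunk across several videos.
for batch in batches:
    print(batch)
```

Each batch would go to one worker, so that worker answers the same question for every video in the batch instead of switching between actions.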
Improving recall (few-labels is better)
Many-labels interface: ☐ Opens book ☐ Puts book on shelf ☐ Walks ☐ Turns on stove ☐ Eats ☐ Sits down ☐ Sneezes ☐ Picks up a cup ☐ Holds a dish …
Video summary: request a 20-word description of the video => no effect on recall, 40% slower
Forced response: request a yes/no response for every label => actually drops recall! (annoys workers?)
Consensus annotation: rely on multiple rounds of annotation with different workers => recall improves from 58.0% to 83.3% with 3 rounds [Krishna CHI 2016]
Bringing it all together
Data: 1,815 videos, each ~30 secs long, 2x speed
Labels: 157 human actions, organized into a hierarchy with 52 high-level actions
Charades dataset of [Sigurdsson ECCV 2016]
Experiments on Amazon Mechanical Turk
Label is positive if >= 1 worker marks it as positive
[Figure: two plots, precision and recall versus cumulative time [min], comparing the many-label interface (26) and the few-label interface (5) after the 1st round, 3 rounds, and 7 rounds]
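The aggregation rule on this slide (a label is positive if at least one worker marks it) amounts to a set union over annotation rounds. The sketch below shows that rule together with precision and recall against a reference set; the ground truth and the per-round answers are made-up toy data, and the helper names are hypothetical.

```python
def aggregate(rounds):
    """Union rule: a label is positive if at least one round marked it positive."""
    positives = set()
    for marked in rounds:
        positives |= set(marked)
    return positives

def precision_recall(predicted, truth):
    """Precision and recall of a predicted label set against the reference set."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall

truth = {"opens book", "walks", "sits down"}  # toy ground truth for one video
rounds = [                                    # toy answers from three workers
    {"opens book"},
    {"walks", "eats"},
    {"opens book", "sits down"},
]

for n in range(1, len(rounds) + 1):
    p, r = precision_recall(aggregate(rounds[:n]), truth)
    print(n, round(p, 2), round(r, 2))
# 1 1.0 0.33
# 2 0.67 0.67
# 3 0.75 1.0
```

Under a union rule recall cannot decrease as rounds are added, whereas precision can move in either direction because every extra round may also add false positives.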
Conclusions
• Quantitative analysis of multi-label video annotation
• Many-labels interface is better than the few-labels interface
• Annotations of 157 human actions on 9,848 videos (incl. temporal extent)
Download dataset at http://allenai.org/plato/charades
[Video at 3x speed, shown with its annotated actions]