Learning from Unlabeled Video Carl Vondrick Columbia University
Survivor Bias of Video Data Large-scale Video Classification with Convolutional Neural Networks, CVPR 2014
Felix Warneken, Max Planck Institute
The Oops! dataset
Oops! Predicting Unintentional Action CVPR 2020 oops.cs.columbia.edu Epstein, Chen, Vondrick. CVPR 2020.
Learning from unlabeled video
Example Videos
Perceptual Clues
1) Predictability (Ranzato 2014, Han 2019, …)
2) Temporal order (Misra 2016, Wei 2018, …)
3) Video speed as a self-supervised clue (Epstein, Chen, Vondrick. CVPR 2020.)
Speed of Action Alters Perceptual Judgement
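The speed clue can be turned into a pretext task without any human labels: subsample frames at a randomly chosen stride and ask a model to predict which stride was used. The following is a minimal sketch of the sampling step only, not the authors' pipeline. The function name `sample_speed_clip`, the candidate speeds `(1, 2, 4, 8)`, and the clip length are all assumptions made for illustration.

```python
import random


def sample_speed_clip(num_frames, clip_len, speeds=(1, 2, 4, 8), rng=random):
    """Return (frame_indices, speed_label) for one self-supervised example.

    The label is the index of the chosen speed in `speeds`; it comes for
    free from the sampling procedure, so no annotation is needed.
    The speed set is a hypothetical choice, not taken from the slides.
    """
    # Keep only the speeds for which a full clip fits inside the video.
    valid = [s for s in speeds if (clip_len - 1) * s + 1 <= num_frames]
    if not valid:
        raise ValueError("video too short for the requested clip length")
    speed = rng.choice(valid)
    span = (clip_len - 1) * speed + 1            # frames covered by the clip
    start = rng.randrange(num_frames - span + 1)  # random temporal offset
    indices = [start + i * speed for i in range(clip_len)]
    return indices, speeds.index(speed)


if __name__ == "__main__":
    frame_ids, label = sample_speed_clip(num_frames=100, clip_len=8)
    print(frame_ids, label)
```

A network trained to recover `label` from the frames at `frame_ids` has to attend to motion, which is the intuition behind using speed as a training signal.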
Visualizing Features Epstein, Chen, Vondrick. CVPR 2020.
Fit a linear model to classify intentionality [Figure: examples marked + and -.]
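Fitting a linear model on top of fixed features is often called a linear probe: the representation stays frozen and only a weight vector and bias are learned. The sketch below uses plain logistic regression with batch gradient descent. It assumes binary labels (1 for intentional, 0 for unintentional) and made-up two-dimensional features. The slides do not specify the label set, the feature dimension, or the optimizer, so every one of those choices here is illustrative.

```python
import math


def fit_linear_probe(features, labels, lr=0.5, epochs=200):
    """Logistic regression on frozen features via batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted P(label = 1)
            err = p - y                      # gradient of the log loss w.r.t. z
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 if the linear score is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


if __name__ == "__main__":
    # Toy stand-ins for features from a frozen video network.
    feats = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.7]]
    labels = [1, 1, 0, 0]
    w, b = fit_linear_probe(feats, labels)
    print([predict(w, b, x) for x in feats])
```

Because only `w` and `b` are trained, the probe's accuracy reflects how much the frozen features already encode about intentionality.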
What’s missing? [Figure: bar chart of error (lower is better; axis 0 to 25) for Human, Ours (self-supervised), and Kinetics (supervised) across failure categories: Environmental, Unexpected, Multi-agent, Limited Skill, Planning Error, Single-agent, Execution Error, Limited Visibility, Limited Knowledge.]
oops.cs.columbia.edu Tuesday 10am PST Poster 93 Epstein, Chen, Vondrick. CVPR 2020.
Natural Synchronization Vision Speech
“I’m going to go in with the actual ackee I rinsed off earlier”
Ackee seems to be:
• edible
• white/yellow
• washable
• sticky
• larger than cherry tomato
Word Learning from Vision
VisualBERT, ViLBERT, VideoBERT, LXMERT, …
[Figure: a Transformer stack reads “I turn on the fire and then I [???] the pasta” and predicts “stir”.]
Learn what “stir” means → Learn how to learn what “stir” means
Learning to Learn Words Suris, Epstein, Ji, Chang, Vondrick. arXiv.
Transformers as Meta-Learners Implement with cross entropy loss Suris, Epstein, Ji, Chang, Vondrick. arXiv.
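The slide says only that training uses a cross entropy loss. One way to picture this in a word-learning setting is as a choice among candidate words: the model scores each candidate, and the loss is the negative log-probability of the correct one. The sketch below assumes the candidates are embeddings of words in a reference set and that scores are dot products with a query vector. Both of those are assumptions for illustration, and `pointing_cross_entropy` is a hypothetical name.

```python
import math


def pointing_cross_entropy(query, reference_embeddings, target_index):
    """Cross entropy over candidates scored by dot product with `query`.

    Returns (loss, probabilities). The dot-product scoring rule is an
    illustrative assumption, not something stated in the slides.
    """
    scores = [sum(q * r for q, r in zip(query, ref))
              for ref in reference_embeddings]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    log_probs = [s - log_norm for s in scores]
    probs = [math.exp(lp) for lp in log_probs]
    return -log_probs[target_index], probs


if __name__ == "__main__":
    query = [1.0, 0.0]
    refs = [[2.0, 0.0], [0.0, 2.0], [-1.0, 0.0]]
    loss, probs = pointing_cross_entropy(query, refs, target_index=0)
    print(round(loss, 4), [round(p, 4) for p in probs])
```

Minimizing this loss pushes the query toward the embedding of the correct candidate and away from the others.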
Meta-Learning Episodes New Words Episode … Composition Episode Suris, Epstein, Ji, Chang, Vondrick. arXiv.
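An episode pairs a target example containing a hidden word with a small reference set from which that word could be learned. The sketch below shows one plausible way to assemble such an episode from text captions alone. It does not distinguish the "New Words" and "Composition" episode types named on the slide, it ignores the visual input, and the function `make_episode`, the `[MASK]` token, and the toy captions are all assumptions for illustration.

```python
import random

MASK = "[MASK]"


def make_episode(captions, num_distractors=2, rng=random):
    """Build one episode: a masked target plus a reference set.

    Returns None when no other caption contains the masked word, since
    the word could not then be acquired from the reference set.
    """
    target = rng.choice(captions)
    words = target.split()
    pos = rng.randrange(len(words))
    answer = words[pos]
    masked = words[:pos] + [MASK] + words[pos + 1:]

    others = [c for c in captions if c != target]
    support = [c for c in others if answer in c.split()]
    if not support:
        return None
    distractors = [c for c in others if answer not in c.split()]
    k = min(num_distractors, len(distractors))
    reference = [rng.choice(support)] + rng.sample(distractors, k)
    rng.shuffle(reference)  # do not leak the supporting caption's position
    return {"target": " ".join(masked), "answer": answer,
            "reference": reference}


if __name__ == "__main__":
    toy_captions = [
        "stir rice into pan", "stir the pasta", "rinse container",
        "close food container", "put spoon",
    ]
    episode = None
    while episode is None:
        episode = make_episode(toy_captions)
    print(episode)
```

Training over many such episodes, each with different words hidden, is what makes the setup meta-learning: the model is rewarded for using the reference set rather than for memorizing any single word.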
Mode 1: Language Modeling Mode 2: Word Acquisition Suris, Epstein, Ji, Chang, Vondrick. arXiv.
Language Modeling [Figure: bar chart of accuracy (axis 0 to 75) for BERT pretrained, BERT + vision, and Meta-Learned, with bars for Seen, New, Seen Composition, and New Composition; annotated drops of 11%, 18%, and 19%.] Suris, Epstein, Ji, Chang, Vondrick. arXiv.
Word Acquisition [Figure: a training set and a test example containing a new word. Caption text, interleaved by extraction: “get avocado still taking skin off stir rice into pan avocado fish with a knife”.] Suris, Epstein, Ji, Chang, Vondrick. arXiv.
Word Acquisition [Figure: a training set and a test example containing a new word. Caption text, interleaved by extraction: “open the wash plates switch off oven on oven close oven cupboard with rag the bottom right”.] Suris, Epstein, Ji, Chang, Vondrick. arXiv.
Novel word acquisition Suris, Epstein, Ji, Chang, Vondrick. arXiv.
Visualizing Learned Process Suris, Epstein, Ji, Chang, Vondrick. arXiv.
Visualizing Attention: green boxes impact the green prediction the most. [Figure: training-set and test clips with captions. Caption text, interleaved by extraction: Training Set “cut cherry tomatoes put spoon close food container …”; Test “chop sun-dried rinse container spoon container put spoon tomatoes tomatoes”.] Suris, Epstein, Ji, Chang, Vondrick. arXiv.
expert.cs.columbia.edu Suris, Epstein, Ji, Chang, Vondrick. arXiv.