Where have we been? Where are we going? Fei-Fei Li

The Beginning: CVPR 2009
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, ImageNet: A Large-Scale Hierarchical Image Database. IEEE Computer Vision and Pattern Recognition (CVPR), 2009.

The Impact of ImageNet

ImageNet on Google Scholar
4,386 Citations
2,847 Citations
…and many more.

From Challenge Contestants to Startups

A Revolution in Deep Learning
• "Why Deep Learning is Suddenly Changing Your Life" By Roger Parloff
• "The Great Artificial Intelligence Awakening" By Gideon Lewis-Kraus
• "The data that transformed AI research—and possibly the world" By Dave Gershgorn

"The ImageNet of x"
• SpaceNet (DigitalGlobe, CosmiQ Works, NVIDIA)
• MusicNet (J. Thickstun et al, 2017)
• Medical ImageNet (Stanford Radiology, 2017)
• ShapeNet (A. Chang et al, 2015)
• EventNet (G. Ye et al, 2015)
• ActivityNet (F. Heilbron et al, 2015)

An Explosion of Datasets
• 1627 Hosted Datasets
• 276 Commercial Competitions
• 1919 Student Competitions
• 1MM Data Scientists
• 4MM ML Models Submitted

"Datasets—not algorithms—might be the key limiting factor to development of human-level artificial intelligence."
Alexander Wissner-Gross, Edge.org, 2016

The Untold History of ImageNet

Hardly the First Image Dataset
• Segmentation (2001): D. Martin, C. Fowlkes, D. Tal, J. Malik
• CMU/VASC Faces (1998): H. Rowley, S. Baluja, T. Kanade
• FERET Faces (1998): P. Phillips, H. Wechsler, J. Huang, P. Raus
• COIL Objects (1996): S. Nene, S. Nayar, H. Murase
• MNIST digits (1998-10): Y. LeCun & C. Cortes
• KTH human action (2004): I. Laptev & B. Caputo
• Sign Language (2008): P. Buehler, M. Everingham, A. Zisserman
• UIUC Cars (2004): S. Agarwal, A. Awan, D. Roth
• 3D Textures (2005): S. Lazebnik, C. Schmid, J. Ponce
• CUReT Textures (1999): K. Dana, B. Van Ginneken, S. Nayar, J. Koenderink
• ESP (2006): Ahn et al, 2006
• CAVIAR Tracking (2005): R. Fisher, J. Santos-Victor, J. Crowley
• Middlebury Stereo (2002): D. Scharstein, R. Szeliski
• CalTech 101/256 (2005): Fei-Fei et al, 2004; Griffin et al, 2007
• LabelMe (2005): Russell et al, 2005
• Lotus Hill (2007): Yao et al, 2007
• TinyImage (2008): Torralba et al. 2008
• PASCAL (2007): Everingham et al, 2009
• MSRC (2006): Shotton et al. 2006

A Profound Machine Learning Problem Within Visual Learning

Machine Learning 101: Complexity, Generalization, Overfitting
[Figure: error vs. capacity. Curves: training error and generalization error. Regions: underfitting zone, overfitting zone. Also marked: generalization gap, optimal capacity.]
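The capacity trade-off on this slide can be reproduced in a few lines. The sketch below is only an illustration under stated assumptions: the slide names no model, so k-nearest-neighbor regression stands in (smaller k means higher capacity), and the sine target, noise level, and sample sizes are arbitrary choices.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of the k-NN predictor on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

def make_data(n, rng):
    """Noisy samples of a sine curve on [0, 1) (an arbitrary toy target)."""
    return [(x, math.sin(2 * math.pi * x) + rng.gauss(0, 0.3))
            for x in (rng.random() for _ in range(n))]

rng = random.Random(0)
train = make_data(40, rng)
test = make_data(200, rng)

# Smaller k = higher capacity. k = 1 memorizes the training set,
# k = 40 predicts the same global average everywhere.
results = {k: (mse(train, train, k), mse(train, test, k)) for k in (1, 5, 40)}
for k, (train_err, test_err) in results.items():
    gap = test_err - train_err
    print(f"k={k:2d}  train={train_err:.3f}  test={test_err:.3f}  gap={gap:.3f}")
```

At k = 1 the training error is exactly zero while the held-out error is not, which is the generalization gap of the overfitting zone; at k = 40 both errors are high, which is the underfitting zone.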

One-Shot Learning Fei-Fei et al, 2003, 2004

Fei-Fei et al, 2003, 2004

How Children Learn to See

[Figure: error vs. capacity. Curves: training error and generalization error. Regions: underfitting zone, overfitting zone. Also marked: generalization gap, optimal capacity.]

A new way of thinking… To shift the focus of Machine Learning for visual recognition from modeling… to data. Lots of data.

Internet Data Growth 1990-2010
[Chart: Global Data Traffic (PB/month), vertical axis up to 15,000. Source: Cisco]

What is WordNet?
• Organizes over 150,000 words into 117,000 categories called synsets.
• Establishes ontological and lexical relationships in NLP and related tasks.
• Original paper by [George Miller, et al 1990] cited over 5,000 times

Christiane Fellbaum Senior Research Scholar Computer Science Department, Princeton President, Global WordNet Consortium

Individually Illustrated WordNet Nodes
A massive ontology of images to transform computer vision
• jacket: a short coat
• German shepherd: breed of large shepherd dogs used in police work and as a guide for the blind.
• microwave: kitchen appliance that cooks food by passing an electromagnetic wave through it.
• mountain: a land mass that projects well above its surroundings; higher than a hill.

Comrades
• Prof. Kai Li, Princeton
• Jia Deng, 1st Ph.D. student, Princeton

Step 1: Ontological structure based on WordNet
Entity - Mammal - Dog - German Shepherd

Step 2: Populate categories with thousands of images from the Internet
Dog - German Shepherd

Step 3: Clean results by hand
Dog - German Shepherd
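The three steps can be pictured with a toy data structure. This is a minimal sketch, not the actual ImageNet tooling: the is-a chain comes from the slide's example, while the file names, vote counts, and the `min_yes` threshold are hypothetical stand-ins.

```python
# Step 1 stand-in: is-a links (child -> parent) from the slide's example chain.
PARENT = {"German Shepherd": "Dog", "Dog": "Mammal", "Mammal": "Entity"}

def hypernym_path(node, parent=PARENT):
    """Walk the is-a links upward and return the path from the root to `node`."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]

def clean(candidates, votes, min_yes=2):
    """Step 3 stand-in: keep images approved by at least `min_yes` human verifiers.

    The threshold is an assumption made for illustration only.
    """
    return [img for img in candidates if votes.get(img, 0) >= min_yes]

# Step 2 stand-in: hypothetical candidate images collected for one category.
candidates = ["img_001.jpg", "img_002.jpg", "img_003.jpg"]
votes = {"img_001.jpg": 3, "img_002.jpg": 0, "img_003.jpg": 2}

path = hypernym_path("German Shepherd")
verified = clean(candidates, votes)
print(" - ".join(path))
print(verified)
```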

Three Attempts at Launching ImageNet

1st Attempt: The Psychophysics Experiment
ImageNet PhD Students, Miserable Undergrads

1st Attempt: The Psychophysics Experiment
• # of synsets: 40,000 (subject to: imageability analysis)
• # of candidate images to label per synset: 10,000
• # of people needed to verify: 2-5
• Speed of human labeling: 2 images/sec (one fixation: ~200 msec)
• Massive parallelism (N ~ 10^2-3)
• 40,000 × 10,000 × 3 / 2 = 600,000,000 sec ≈ 19 years of labeling time, to be divided among the N parallel labelers
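The estimate on this slide is plain arithmetic and can be checked directly. The sketch below uses the slide's figures (40,000 synsets, 10,000 candidates each, 3 verifiers, 2 images per second); the function name and the year length of 365.25 days are assumptions made for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length

def labeling_seconds(synsets, images_per_synset, verifiers, images_per_sec):
    """Total human viewing time needed to verify every candidate image."""
    return synsets * images_per_synset * verifiers / images_per_sec

total_sec = labeling_seconds(40_000, 10_000, 3, 2)
years_single = total_sec / SECONDS_PER_YEAR

print(f"total: {total_sec:,.0f} sec, about {years_single:.1f} years for one labeler")

# Wall-clock time if the work is split evenly across N parallel labelers.
for n in (100, 1000):
    days = total_sec / n / 86_400
    print(f"N={n}: about {days:.0f} days of nonstop labeling each")
```

The product works out to 6 × 10^8 seconds, which matches the slide's figure of roughly 19 years for a single labeler working without breaks.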

2nd Attempt: Human-in-the-Loop Solutions

2nd Attempt: Human-in-the-Loop Solutions
• Machine-generated datasets can only match the best algorithms of the time.
• Human-generated datasets transcend algorithmic limitations, leading to better machine perception.

3rd Attempt: A Godsend Emerges
ImageNet PhD Students, Crowdsourced Labor
49k Workers from 167 Countries, 2007-2010

The Result: ImageNet Goes Live in 2009

What We Did Right

While Others Targeted Detail…
• LabelMe: Per-Object Regions and Labels (Russell et al, 2005)
• Lotus Hill: Hand-Traced Parse Trees (Yao et al, 2007)

… We Targeted Scale
• SUN, 131K [Xiao et al. '10]
• LabelMe, 37K [Russell et al. '07]
• ImageNet, 15M [Deng et al. '09]
• PASCAL VOC, 30K [Everingham et al. '06-'12]
• Caltech101, 9K [Fei-Fei, Fergus, Perona, '03]

Additional Goals
Carnivore - Canine - Dog - Working Dog - Husky
• High Resolution: To better replicate human visual acuity
• High-Quality Annotation: To create a benchmarking dataset and advance the state of machine perception, not merely reflect it
• Free of Charge: To ensure immediate application and a sense of community

An Emphasis on Community and Achievement
Large Scale Visual Recognition Challenge (ILSVRC 2010-2017)

ILSVRC Contributors
• Alex Berg, UNC Chapel Hill
• Jia Deng, Univ. of Michigan
• Zhiheng Huang, Stanford
• Aditya Khosla, Stanford
• Jonathan Krause, Stanford
• Fei-Fei Li, Stanford
• Wei Liu, UNC Chapel Hill
• Sean Ma, Stanford
• Eunbyung Park, UNC Chapel Hill
• Olga Russakovsky, Stanford
• Sanjeev Satheesh, Stanford
• Hao Su, Stanford

Our Inspiration: PASCAL VOC 2005-2012

Our Inspiration: PASCAL VOC
Mark Everingham, 1973-2012
Mark Everingham Prize @ ECCV 2016: Alex Berg, Jia Deng, Fei-Fei Li, Wei Liu, Olga Russakovsky

Participation and Performance
[Chart: number of entries per year, 2010-2016. Bar labels: 172, 157, 123, 81, 35, 29.]

Participation and Performance
[Chart: number of entries per year, 2010-2016 (bar labels: 172, 157, 123, 81, 35, 29), overlaid with classification errors (top-5), endpoint labels 0.28 and 0.03.]

Participation and Performance
[Chart: number of entries per year, 2010-2016 (bar labels: 172, 157, 123, 81, 35, 29), overlaid with classification errors (top-5), endpoint labels 0.28 and 0.03, and average precision for object detection, endpoint labels 0.23 and 0.66.]
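The "top-5" classification error on these charts counts an image as correct when its true label appears among a model's five highest-ranked guesses. A minimal sketch of that metric follows; the function name and the toy predictions are illustrative assumptions, not the official ILSVRC evaluation code.

```python
def top5_error(ranked_predictions, true_labels):
    """Fraction of images whose true label is not among the first five guesses."""
    misses = sum(
        1
        for preds, truth in zip(ranked_predictions, true_labels)
        if truth not in preds[:5]
    )
    return misses / len(true_labels)

# Hypothetical ranked guesses (best first) for four images.
predictions = [
    ["husky", "malamute", "wolf", "collie", "corgi", "cat"],
    ["cat", "lion", "lynx", "fox", "dog"],
    ["jacket", "coat", "shirt", "vest", "scarf", "mountain"],
    ["wombat", "koala", "bear", "dog", "cat"],
]
labels = ["wolf", "dog", "mountain", "tiger"]

# Images 1 and 2 are hits; "mountain" is ranked sixth and "tiger" is absent.
error = top5_error(predictions, labels)
print(error)
```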

What we did to make ImageNet better

Lack of Details

Lack of Details… ILSVRC Detection Challenge
Statistics: PASCAL VOC 2012 vs. ILSVRC 2013
• Object classes: 20 vs. 200 (10x)
• Training images: 5.7K vs. 395K (70x)
• Training objects: 13.6K vs. 345K (25x)

Evaluation of ILSVRC Detection
Need to annotate the presence of all classes (to penalize false detections)
• # images: 400K
• # classes: 200
• # annotations = 80M!
[Grid: presence (+) or absence (-) of each class per image]
Table Chair Horse Dog Cat Bird
+ + - - - -
+ - - - + -
+ + - - - -

Evaluation of ILSVRC Detection Hierarchical annotation J. Deng, O. Russakovsky, J. Krause, M. Bernstein, A. Berg, & L. Fei-Fei. CHI, 2014
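Exhaustive labeling asks one question per image per class, which is where 400K × 200 = 80M comes from. The toy sketch below illustrates only the general idea behind a hierarchical scheme, namely that a "no" at a group level answers every class under it; the two-level hierarchy and all names are hypothetical and are not taken from the cited CHI 2014 paper.

```python
EXHAUSTIVE_TOTAL = 400_000 * 200  # one yes/no label per image per class

# Hypothetical two-level hierarchy: group question -> class questions.
HIERARCHY = {
    "furniture": ["table", "chair"],
    "animal": ["horse", "dog", "cat", "bird"],
}

def exhaustive_questions(hierarchy):
    """Questions per image when every class is asked about separately."""
    return sum(len(children) for children in hierarchy.values())

def hierarchical_annotate(present, hierarchy):
    """Return (labels, number_of_questions_asked) for one image."""
    labels, asked = {}, 0
    for group, children in hierarchy.items():
        asked += 1  # "Is there any <group> in the image?"
        group_present = any(child in present for child in children)
        for child in children:
            if group_present:
                asked += 1  # ask about this specific class
                labels[child] = child in present
            else:
                labels[child] = False  # implied by the "no" at the group level
    return labels, asked

labels, asked = hierarchical_annotate({"table", "chair"}, HIERARCHY)
print(EXHAUSTIVE_TOTAL, exhaustive_questions(HIERARCHY), asked)
```

For an image containing only a table and a chair, the toy scheme asks 4 questions instead of 6. The saving depends on most classes being absent from most images; an image containing every group would cost more questions than the exhaustive approach.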

What does classifying 10K+ classes tell us? J. Deng, A. Berg & L. Fei-Fei, ECCV, 2010

Fine-Grained Recognition
"Cardigan Welsh Corgi" vs. "Pembroke Welsh Corgi"

Fine-Grained Recognition: cars
2567 classes, 700k images [Gebru, Krause, Deng, Fei-Fei, CHI 2017]

Expected Outcomes
• ImageNet becomes a benchmark
• Breakthroughs in object recognition
• Machine learning advances and changes dramatically

Unexpected Outcomes

Neural Nets are Cool Again! 13,259 Citations Krizhevsky, Sutskever & Hinton, NIPS 2012

And Cooler and Cooler…
• "AlexNet" [Krizhevsky et al. NIPS 2012]
• "GoogLeNet" [Szegedy et al. CVPR 2015]
• "VGG Net" [Simonyan & Zisserman, ICLR 2015]
• "ResNet" [He et al. CVPR 2016]

A Deep Learning Revolution: Neural Nets, GPUs

Ontological Structure Not Used as Much

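[Figure: is-a hierarchy rooted at Thing, with "Wombat" highlighted. Root-to-leaf paths:]
• Thing - Animalia - Chordate - Mammal - Primate - Hominidae - Homo - Sapiens - Human
• Thing - Animalia - Chordate - Mammal - Primate - Pongidae - Pan - Troglodytes - Chimpanzee
• Thing - Animalia - Chordate - Mammal - Carnivora - Felidae - Felis - Domestica - House Cat
• Thing - Animalia - Chordate - Mammal - Carnivora - Felidae - Felis - Leo - Lion
• Thing - Animalia - Chordate - Mammal - Marsupial - Wombat
• Thing - Animalia - Arthropoda - Insect - Diptera - Muscidae - Musca - Domestica - Housefly
Deng, Krause, Berg & Fei-Fei, CVPR 2012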
