
Where have we been? Where are we going? Li Fei-Fei



  1. Where have we been? Where are we going? Li Fei-Fei

  2. The Beginning: CVPR 2009. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, ImageNet: A Large-Scale Hierarchical Image Database. IEEE Computer Vision and Pattern Recognition (CVPR), 2009.

  3. The Impact of ImageNet

  4. on Google Scholar: 4,386 Citations; 2,847 Citations; …and many more.

  5. From Challenge Contestants to Startups

  6. A Revolution in Deep Learning. “Why Deep Learning is Suddenly Changing Your Life” by Roger Parloff; “The Great Artificial Intelligence Awakening” by Gideon Lewis-Kraus; “The data that transformed AI research—and possibly the world” by Dave Gershgorn

  7. “The ImageNet of x”: SpaceNet (DigitalGlobe, CosmiQ Works, NVIDIA); MusicNet (J. Thickstun et al, 2017); Medical ImageNet (Stanford Radiology, 2017); ShapeNet (A. Chang et al, 2015); EventNet (G. Ye et al, 2015); ActivityNet (F. Heilbron et al, 2015)

  8. An Explosion of Datasets: 1627 Hosted Datasets; 276 Commercial Competitions; 1919 Student Competitions; 1MM Data Scientists; 4MM ML Models Submitted

  9. “Datasets—not algorithms—might be the key limiting factor to development of human-level artificial intelligence.” Alexander Wissner-Gross, Edge.org, 2016

  10. The Untold History of ImageNet

  11. Hardly the First Image Dataset: Segmentation (2001), D. Martin, C. Fowlkes, D. Tal, J. Malik; CMU/VASC Faces (1998), H. Rowley, S. Baluja, T. Kanade; FERET Faces (1998), P. Phillips, H. Wechsler, J. Huang, P. Raus; COIL Objects (1996), S. Nene, S. Nayar, H. Murase; MNIST digits (1998-10), Y. LeCun & C. Cortes; KTH human action (2004), I. Laptev & B. Caputo; Sign Language (2008), P. Buehler, M. Everingham, A. Zisserman; UIUC Cars (2004), S. Agarwal, A. Awan, D. Roth; 3D Textures (2005), S. Lazebnik, C. Schmid, J. Ponce; CUReT Textures (1999), K. Dana, B. Van Ginneken, S. Nayar, J. Koenderink; ESP (2006), Ahn et al, 2006; CAVIAR Tracking (2005), R. Fisher, J. Santos-Victor, J. Crowley; Middlebury Stereo (2002), D. Scharstein, R. Szeliski; CalTech 101/256 (2005), Fei-Fei et al, 2004, Griffin et al, 2007; LabelMe (2005), Russell et al, 2005; Lotus Hill (2007), Yao et al, 2007; TinyImage (2008), Torralba et al. 2008; PASCAL (2007), Everingham et al, 2009; MSRC (2006), Shotton et al. 2006

  12. A Profound Machine Learning Problem Within Visual Learning

  13. Machine Learning 101: Complexity, Generalization, Overfitting. [Figure: error vs. capacity, with training error and generalization error curves, the generalization gap between them, an underfitting zone, an overfitting zone, and the optimal capacity marked.]

  14. One-Shot Learning Fei-Fei et al, 2003, 2004

  15. Fei-Fei et al, 2003, 2004

  16. How Children Learn to See

  17. [Figure: error vs. capacity, with training error and generalization error curves, the generalization gap between them, an underfitting zone, an overfitting zone, and the optimal capacity marked.]

  18. A new way of thinking… To shift the focus of Machine Learning for visual recognition from modeling… to data. Lots of data.

  19. Internet Data Growth 1990-2010. [Chart: global data traffic in PB/month, y-axis up to 15,000. Source: Cisco]

  20. What is WordNet? Organizes over 150,000 words into 117,000 categories called synsets. Establishes ontological and lexical relationships in NLP and related tasks. Original paper by [George Miller, et al 1990] cited over 5,000 times.

  21. Christiane Fellbaum: Senior Research Scholar, Computer Science Department, Princeton; President, Global WordNet Consortium

  22. Individually Illustrated WordNet Nodes. A massive ontology of images to transform computer vision. jacket: a short coat. German shepherd: breed of large shepherd dogs used in police work and as a guide for the blind. microwave: kitchen appliance that cooks food by passing an electromagnetic wave through it. mountain: a land mass that projects well above its surroundings; higher than a hill.

  23. Comrades: Prof. Kai Li, Princeton; Jia Deng, 1st Ph.D. student, Princeton

  24. Step 1: Ontological structure based on WordNet. Entity - Mammal - Dog - German Shepherd

  25. Step 2: Populate categories with thousands of images from the Internet. Dog - German Shepherd

  26. Step 3: Clean results by hand. Dog - German Shepherd
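The ontological structure in Step 1 can be pictured as a tree in which every category points to its parent. The sketch below is only an illustration: the `PARENT` table is a hypothetical four-node excerpt taken from the chain on the slide (the real WordNet hierarchy has many more intermediate nodes), and `path_to_root` is a made-up helper name.

```python
# Hypothetical excerpt of the hierarchy shown on the slide:
# Entity - Mammal - Dog - German Shepherd. Real WordNet is far deeper.
PARENT = {
    "German Shepherd": "Dog",
    "Dog": "Mammal",
    "Mammal": "Entity",
}


def path_to_root(category, parent):
    """Return the chain of categories from the root down to `category`."""
    path = [category]
    # Follow parent links until we reach a node with no recorded parent.
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path[::-1]


chain = path_to_root("German Shepherd", PARENT)
print(" - ".join(chain))
```

Under this structure, an image filed under a leaf category also counts as an example of every category above it, which is what makes the dataset hierarchical rather than a flat list of labels.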

  27. Three Attempts at Launching

  28. 1st Attempt: The Psychophysics Experiment. ImageNet PhD Students; Miserable Undergrads

  29. 1st Attempt: The Psychophysics Experiment • # of synsets: 40,000 (subject to: imageability analysis) • # of candidate images to label per synset: 10,000 • # of people needed to verify: 2-5 • Speed of human labeling: 2 images/sec (one fixation: ~200 msec) • Massive parallelism (N ~ 10^2-3) • 40,000 × 10,000 × 3 / 2 = 600,000,000 sec ≈ 19 years of labeling time, to be divided across N parallel labelers
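The estimate on this slide is plain arithmetic and can be checked directly. The sketch below uses the slide's figures (3 verifiers per image, the value used in the slide's formula from the stated 2-5 range); the function names and the even split across N workers are assumptions made for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # ignores leap years


def total_labeling_seconds(synsets, images_per_synset, verifiers, images_per_second):
    """Total human time if every candidate image is checked by several people."""
    return synsets * images_per_synset * verifiers / images_per_second


def wall_clock_days(total_seconds, parallel_workers):
    """Elapsed days, assuming the work splits evenly across parallel workers."""
    return total_seconds / parallel_workers / SECONDS_PER_DAY


total = total_labeling_seconds(40_000, 10_000, 3, 2)
print(f"total: {total:,.0f} sec")
print(f"serial: {total / SECONDS_PER_YEAR:.1f} years")

# The slide's parallelism range, N ~ 10^2 to 10^3.
for n in (100, 1000):
    print(f"N={n}: {wall_clock_days(total, n):.1f} days")
```

Even at the optimistic end of the parallelism range, the figure assumes every worker labels continuously at two images per second, which is why the slide treats this attempt as impractical.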

  30. 2nd Attempt: Human-in-the-Loop Solutions

  31. 2nd Attempt: Human-in-the-Loop Solutions. Machine-generated datasets can only match the best algorithms of the time. Human-generated datasets transcend algorithmic limitations, leading to better machine perception.

  32. 3rd Attempt: A Godsend Emerges. ImageNet PhD Students; Crowdsourced Labor: 49k Workers from 167 Countries, 2007-2010

  33. The Result: ImageNet Goes Live in 2009

  34. What We Did Right

  35. While Others Targeted Detail… LabelMe: Per-Object Regions and Labels (Russell et al, 2005). Lotus Hill: Hand-Traced Parse Trees (Yao et al, 2007)

  36. … We Targeted Scale. ImageNet, 15M [Deng et al. ’09]; SUN, 131K [Xiao et al. ’10]; LabelMe, 37K [Russell et al. ’07]; PASCAL VOC, 30K [Everingham et al. ’06-’12]; Caltech101, 9K [Fei-Fei, Fergus, Perona, ’03]

  37. Additional Goals. Carnivore - Canine - Dog - Working Dog - Husky. High Resolution: To better replicate human visual acuity. High-Quality Annotation: To create a benchmarking dataset and advance the state of machine perception, not merely reflect it. Free of Charge: To ensure immediate application and a sense of community.

  38. An Emphasis on Community and Achievement: ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2010-2017)

  39. ILSVRC Contributors: Alex Berg (UNC Chapel Hill), Jia Deng (Univ. of Michigan), Zhiheng Huang (Stanford), Aditya Khosla (Stanford), Jonathan Krause (Stanford), Fei-Fei Li (Stanford), Sean Ma (Stanford), Eunbyung Park (UNC Chapel Hill), Olga Russakovsky (Stanford), Sanjeev Satheesh (Stanford), Hao Su (Stanford), Wei Liu (UNC Chapel Hill)

  40. Our Inspiration: PASCAL VOC 2005-2012

  41. Our Inspiration: PASCAL VOC. Mark Everingham, 1973-2012. Mark Everingham Prize @ ECCV 2016: Alex Berg, Jia Deng, Fei-Fei Li, Wei Liu, Olga Russakovsky

  42. Participation and Performance. [Chart: number of entries per year, 2010-2016; values shown: 172, 157, 123, 81, 35, 29.]

  43. Participation and Performance. [Chart: number of entries per year, 2010-2016 (172, 157, 123, 81, 35, 29), overlaid with classification errors (top-5) falling from 0.28 to 0.03.]

  44. Participation and Performance. [Chart: number of entries per year, 2010-2016 (172, 157, 123, 81, 35, 29), overlaid with classification errors (top-5) falling from 0.28 to 0.03 and average precision for object detection rising from 0.23 to 0.66.]
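The top-5 classification error shown on these slides counts a prediction as correct when the true label appears among a model's five highest-ranked guesses. A minimal sketch of that metric follows; the function name and the toy labels are made up for illustration and are not taken from the challenge's evaluation code.

```python
def top_k_error(ranked_predictions, true_labels, k=5):
    """Fraction of images whose true label is not among the first k guesses.

    ranked_predictions: one list per image, best guess first.
    true_labels: the ground-truth label for each image, in the same order.
    """
    misses = sum(
        1
        for guesses, truth in zip(ranked_predictions, true_labels)
        if truth not in guesses[:k]
    )
    return misses / len(true_labels)


# Toy example with two images (labels are invented).
predictions = [
    ["cat", "dog", "fox", "wolf", "lynx", "husky"],  # "husky" is ranked 6th: a miss
    ["car", "bus", "van", "truck", "tram"],          # "bus" is ranked 2nd: a hit
]
truths = ["husky", "bus"]
print(top_k_error(predictions, truths))
```

Allowing five guesses keeps a model from being penalized when an image plausibly contains several objects but carries only one label.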

  45. What we did to make ImageNet better

  46. Lack of Details

  47. Lack of Details… ILSVRC Detection Challenge. Statistics, PASCAL VOC 2012 vs. ILSVRC 2013: object classes, 20 vs. 200 (10x); training images, 5.7K vs. 395K (70x); training objects, 13.6K vs. 345K (25x)

  48. Evaluation of ILSVRC Detection. Need to annotate the presence of all classes (to penalize false detections). # images: 400K; # classes: 200; # annotations = 80M! [Figure: grid of +/- presence labels for Table, Chair, Horse, Dog, Cat, Bird across example images.]

  49. Evaluation of ILSVRC Detection Hierarchical annotation J. Deng, O. Russakovsky, J. Krause, M. Bernstein, A. Berg, & L. Fei-Fei. CHI, 2014
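The 80M figure is simply every image paired with every class. The sketch below reproduces that count and then illustrates, under stated assumptions, why a hierarchy can reduce it: one group-level question can rule out a whole set of classes at once. The two groups and the `count_questions` helper are hypothetical and are not the procedure from the CHI 2014 paper.

```python
# Exhaustive annotation: one yes/no question per (image, class) pair.
num_images = 400_000
num_classes = 200
exhaustive = num_images * num_classes
print(f"exhaustive questions: {exhaustive:,}")

# Hypothetical grouping of the six classes shown on the slide.
HIERARCHY = {
    "animal": ["Horse", "Dog", "Cat", "Bird"],
    "furniture": ["Table", "Chair"],
}


def count_questions(present, hierarchy):
    """Count questions for one image when asking group-level questions first.

    present: set of classes actually in the image (used to simulate answers).
    Returns (number of questions asked, per-class presence labels).
    """
    questions = 0
    labels = {}
    for group, classes in hierarchy.items():
        questions += 1  # "Is there any <group> in the image?"
        group_present = any(c in present for c in classes)
        for c in classes:
            if group_present:
                questions += 1  # group answered yes: ask about each class in it
            labels[c] = c in present
    return questions, labels


asked, labels = count_questions({"Table", "Chair"}, HIERARCHY)
print(f"questions asked: {asked} (vs. 6 when asking about every class)")
```

The saving depends on how sparse the labels are: an image that contains something from every group costs more questions than the exhaustive approach, while the typical image, which contains only a few classes, costs far fewer.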

  50. What does classifying 10K+ classes tell us? J. Deng, A. Berg & L. Fei-Fei, ECCV, 2010

  51. Fine-Grained Recognition: “Cardigan Welsh Corgi”, “Pembroke Welsh Corgi”

  52. Fine-Grained Recognition: cars, 2567 classes, 700k images [Gebru, Krause, Deng, Fei-Fei, CHI 2017]

  53. Expected Outcomes: ImageNet becomes a benchmark; Breakthroughs in object recognition; Machine learning advances and changes dramatically

  54. Unexpected Outcomes

  55. Neural Nets are Cool Again! 13,259 Citations Krizhevsky, Sutskever & Hinton, NIPS 2012

  56. And Cooler and Cooler… “AlexNet” [Krizhevsky et al. NIPS 2012]; “GoogLeNet” [Szegedy et al. CVPR 2015]; “VGG Net” [Simonyan & Zisserman, ICLR 2015]; “ResNet” [He et al. CVPR 2016]

  57. A Deep Learning Revolution: Neural Nets, GPUs

  58. Ontological Structure Not Used as Much

  59. [Figure: “is a” hierarchy rooted at Thing. Thing - Animalia - Chordate - Mammal, branching into Primate (Hominidae - Homo - Sapiens: Human; Pongidae - Pan - Troglodytes: Chimpanzee), Carnivora (Felidae - Felis - Domestica: House Cat; Felidae - Felis - Leo: Lion), and Marsupial (Wombat); Thing - Animalia - Arthropoda - Insect - Diptera - Muscidae - Musca - Domestica: Housefly.] Deng, Krause, Berg & Fei-Fei, CVPR 2012
