An Egocentric Perspective on Active Vision and Visual Object Learning

An Egocentric Perspective on Active Vision and Visual Object Learning in Toddlers. S. Bambach, D. Crandall, L. Smith, C. Yu. ICDL 2017. Experiment presenters: Arjun, Ginevra.


  1. An Egocentric Perspective on Active Vision and Visual Object Learning in Toddlers. S. Bambach, D. Crandall, L. Smith, C. Yu. ICDL 2017. Experiment presenters: Arjun, Ginevra

  2. Their Experiments Image source: paper

  3. Their Experiments Authors could not control training set Image source: paper

  4. Our Experiments
     • We generate images where
       – Labeled object occupies fixed percentage of view
       – Background objects do not move
     Image source: collages we made from Caltech 256 database

  5. Our Experiments
     • Simulate toddler bringing object to face
       – We control scale to measure its effect on testing accuracy
     Image source: collages we made from Caltech 256 database

  6. Our Dataset
     • 5 classes, 3633 images
     • Collages
       – Construct ‘scenes of toys’ using Caltech-256
       – 1 positive image amongst many negatives
       – Simulate toddler perspective
     Image source: Caltech 256 database

  7. Scene Generation
     • Scene dim: 224 x 224
       – Scale largest image dim to 70
       – Rotate randomly from -15° to 15°
     • 10 negatives
       – Select uniformly from Caltech-256 negatives
       – Place randomly within scene boundary
     • 1 positive
       – Scale 0 (1x), 1 (1.5x), 2 (2x), 3 (3x)
       – Place randomly within scene boundary (at scale 1)
     • 2 scenes per training instance
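
The scene-generation recipe on this slide can be sketched as a layout routine. This is a minimal sketch under stated assumptions, not the presenters' code: it only computes where each image would go and at what angle, it does not composite pixels, it ignores the extra room a rotated image needs, and it does not model the "(at scale 1)" qualifier on the positive's placement. The function names, the record format and the example image sizes are all hypothetical.

```python
import random

SCENE = 224       # scene width and height in pixels (from the slide)
BASE_DIM = 70     # largest image dimension after rescaling (from the slide)
SCALE_FACTORS = {0: 1.0, 1: 1.5, 2: 2.0, 3: 3.0}   # positive-object scales (from the slide)


def fit_largest_dim(width, height, target):
    """Resize (width, height) so the longer side equals `target`, keeping aspect ratio."""
    ratio = target / max(width, height)
    return max(1, round(width * ratio)), max(1, round(height * ratio))


def place(role, size, target, rng):
    """Return one placement record: a random angle and an unrotated box inside the scene."""
    w, h = fit_largest_dim(size[0], size[1], target)
    x = rng.randint(0, SCENE - w)        # randint is inclusive on both ends
    y = rng.randint(0, SCENE - h)
    return {"role": role, "box": (x, y, w, h), "angle": rng.uniform(-15.0, 15.0)}


def layout_scene(positive_size, negative_sizes, scale, rng):
    """Lay out the negatives at the base size, then one positive at the requested scale."""
    items = [place("negative", size, BASE_DIM, rng) for size in negative_sizes]
    target = round(BASE_DIM * SCALE_FACTORS[scale])
    items.append(place("positive", positive_size, target, rng))
    return items


rng = random.Random(0)                       # fixed seed for a repeatable example
negatives = [(120, 80)] * 10                 # hypothetical source image sizes
scenes = [layout_scene((300, 200), negatives, scale=3, rng=rng) for _ in range(2)]
```

Generating two layouts per positive image mirrors the slide's "2 scenes per training instance"; an image library would then resize, rotate and paste each record onto a 224 x 224 canvas.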

  8. VGG 16 Image source, and source of some code used in the experiments: https://www.cs.toronto.edu/~frossard/post/vgg16/

  9. VGG 16 for 5 classes Image source: https://www.cs.toronto.edu/~frossard/post/vgg16/, modified by us
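
The slide does not say exactly how the network was modified. Assuming the change is the usual one, replacing the 1000-way output layer of the standard VGG-16 with a 5-way layer and leaving everything else untouched, the effect on model size can be checked with a short calculation. The function below is a hypothetical helper that counts weights and biases for the standard layer layout; it does not build or train a network.

```python
# Standard VGG-16 layout: numbers are conv output channels, "M" is 2x2 max pooling.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_param_count(num_classes, input_size=224, in_channels=3):
    """Count weights and biases of VGG-16 with a `num_classes`-way output layer."""
    total, channels, size = 0, in_channels, input_size
    for layer in VGG16_CONV:
        if layer == "M":
            size //= 2                                   # pooling halves height and width
        else:
            total += 3 * 3 * channels * layer + layer    # 3x3 kernels plus biases
            channels = layer
    features = channels * size * size                    # 512 * 7 * 7 for a 224 input
    for out in (4096, 4096, num_classes):                # three fully connected layers
        total += features * out + out
        features = out
    return total


print(vgg16_param_count(1000))   # 138357544
print(vgg16_param_count(5))      # 134281029
```

Under that assumption only the last layer changes, from 4,097,000 parameters to 20,485, so almost all of the network can be reused as-is.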

  10. Experiment Setup
      • Experiment 1
        – Train on different scales, test on clean image
      • Experiment 2
        – Train on different scales and clean, test on different scales
      Example images: Scale 0 (10% of view), Scale 1 (20% of view), Scale 2 (30% of view), Scale 3 (60% of view), Clean image
      Image source: collages we made from Caltech 256 database
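
The slides do not say how the "% of view" labels were measured. As a rough cross-check, assuming the scene-generation numbers given earlier in the deck (224 x 224 scene, longest side 70 pixels at 1x, factors 1x, 1.5x, 2x, 3x), the share of the scene covered by an object's bounding box can be computed directly. The `aspect` parameter (short side divided by long side) is a hypothetical knob, since real objects are rarely square.

```python
SCENE = 224       # scene width and height in pixels
BASE_DIM = 70     # longest side of an object at scale factor 1x
SCALE_FACTORS = {0: 1.0, 1: 1.5, 2: 2.0, 3: 3.0}


def view_share(scale, aspect=1.0):
    """Fraction of the scene covered by the object's bounding box.

    `aspect` is short side / long side; 1.0 means a square box.
    """
    long_side = BASE_DIM * SCALE_FACTORS[scale]
    short_side = long_side * aspect
    return (long_side * short_side) / (SCENE * SCENE)


for scale in sorted(SCALE_FACTORS):
    print(scale, round(view_share(scale), 3), round(view_share(scale, aspect=0.7), 3))
```

A square box gives about 10% at scale 0, in line with the slide. At larger scales the square-box figure is higher than the slide's label while a narrower box comes closer, so the slide percentages are best read as approximate labels rather than exact areas.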

  12. Experiment 1 - objective
      • Test effect of ‘bringing object to face’ for isolated classification
      • Questions to consider
        – Effect of viewing at multiple scales?
        – Single ideal scale or result of multiple scales?
      Image source: https://en.wiktionary.org/wiki/question_mark

  13. Experiment 1 - data Train0 Image source: collages we made from Caltech 256 database

  14. Experiment 1 - data Train1 Image source: collages we made from Caltech 256 database

  15. Experiment 1 - data Train2 Image source: collages we made from Caltech 256 database

  16. Experiment 1 - data Train3 Image source: collages we made from Caltech 256 database

  17. Experiment 1 - data Train3only Image source: collages we made from Caltech 256 database

  18. Experiment 1 - data Correct number of epochs to compensate for more training examples Image source: collages we made from Caltech 256 database
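
The slide does not spell out how the number of epochs is corrected. One plausible reading, shown here as a sketch, is to scale the epoch count inversely with training-set size so every train set sees roughly the same number of image presentations. The helper name, the set sizes and the 40-epoch baseline are hypothetical, as is the assumption that Train1 to Train3 accumulate the lower scales.

```python
def corrected_epochs(base_epochs, base_size, set_size):
    """Epochs giving `set_size` images about as many presentations as
    `base_epochs` epochs over `base_size` images."""
    return max(1, round(base_epochs * base_size / set_size))


# Hypothetical sizes: each cumulative set adds one more scale's worth of images.
sizes = {"Train0": 1000, "Train1": 2000, "Train2": 3000,
         "Train3": 4000, "Train3only": 1000}
schedule = {name: corrected_epochs(40, 1000, n) for name, n in sizes.items()}
print(schedule)
# {'Train0': 40, 'Train1': 20, 'Train2': 13, 'Train3': 10, 'Train3only': 40}
```

With a correction like this, differences in test accuracy are less likely to come simply from one network having been trained on more gradient updates than another.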

  19. Experiment 1 - data Test Image source: collages we made from Caltech 256 database

  20. Experiment 1 - results
      Chart: testing accuracy on clean image (0 to 1) by train set (Train0, Train1, Train2, Train3, Train3only)

  22. Experiment 1 - results
      Chart: testing accuracy on clean image (0 to 1) by train set (Train0, Train1, Train2, Train3, Train3only)
      Training on larger-scale images only yields the best test accuracy.

  23. Experiment 1 - results
      • Images misclassified when network is trained on low scales benefit from training on higher scales
      Misclassified after train0, train1, train2
      Correctly classified after train3 and train3only
      (Category: bag)
      Image source: Caltech 256 database

  24. Experiment 1 - results
      • Images misclassified when network is trained on low scales benefit from training on higher scales
      Misclassified after train0, train1, train2, train3
      Correctly classified only after train3only
      (Category: plane)
      Image source: Caltech 256 database

  25. Experiment 1 - results
      • Images misclassified after train3only were misclassified after all other trainings
      Examples: Bag, Plane, Plane
      Image source: Caltech 256 database

  26. Experiment 1 - conclusions
      • Toddler’s data gives better training because object is closer, not because it is ‘brought to face’
      • Significant jump in accuracy if object occupies >30% of view in training
      • Training images where object occupies <30% of view do more harm than good
      Image source: collages we made from Caltech 256 database

  27. Experiment Setup
      • Experiment 1
        – Train on different scales, test on clean image
      • Experiment 2
        – Train on different scales and clean, test on different scales
      Example images: Scale 0 (10% of view), Scale 1 (20% of view), Scale 2 (30% of view), Scale 3 (60% of view), Clean image
      Image source: collages we made from Caltech 256 database

  28. Experiment 2 - objective
      • Effect of ‘bringing to face’ for object-in-scene detection
      • Questions to consider
        – Does ‘cleaning’ the scene decrease detection in cluttered environment?
      Image source: https://en.wiktionary.org/wiki/question_mark

  29. Experiment 2 - data Train0 Image source: collages we made from Caltech 256 database

  30. Experiment 2 - data Train1 Image source: collages we made from Caltech 256 database

  31. Experiment 2 - data Train2 Image source: collages we made from Caltech 256 database

  32. Experiment 2 - data Train3 Image source: collages we made from Caltech 256 database

  33. Experiment 2 - data TrainClean Image source: collages we made from Caltech 256 database

  34. Experiment 2 - data Correct number of epochs to compensate for more training examples Image source: collages we made from Caltech 256 database

  35. Experiment 2 - data Test0 On different images compared to train sets Image source: collages we made from Caltech 256 database

  36. Experiment 2 - data Test1only On different images compared to train sets Image source: collages we made from Caltech 256 database

  37. Experiment 2 - data Test2only On different images compared to train sets Image source: collages we made from Caltech 256 database

  38. Experiment 2 - data Test3only On different images compared to train sets Image source: collages we made from Caltech 256 database
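
The test collages are built "on different images compared to train sets". One simple way to guarantee this, sketched below, is to split each class's source images into disjoint train and test pools before any collage is generated. The function name, the 20% test fraction and the file names are hypothetical; the class names are taken from the result slides.

```python
import random


def split_pools(images_by_class, test_fraction=0.2, seed=0):
    """Split each class's image list into disjoint train and test pools."""
    rng = random.Random(seed)                 # fixed seed keeps the split reproducible
    train, test = {}, {}
    for label, images in images_by_class.items():
        shuffled = sorted(images)             # sort first so input order does not matter
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction))
        test[label] = shuffled[:n_test]
        train[label] = shuffled[n_test:]
    return train, test


images = {
    "bag": [f"bag_{i:03d}.jpg" for i in range(10)],
    "plane": [f"plane_{i:03d}.jpg" for i in range(10)],
}
train_pool, test_pool = split_pools(images)
```

Collages for Train0 to TrainClean would then draw positives only from `train_pool`, and collages for the test sets only from `test_pool`.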

  39. Experiment 2 - results
      Chart: testing accuracy (0 to 1) by train set (Train0, Train1, Train2, Train3, TrainClean), one series per test set (Test0, Test1only, Test2only, Test3only)

  42. Experiment 2 - results
      Chart: testing accuracy (0 to 1) by train set (Train0, Train1, Train2, Train3, TrainClean), one series per test set (Test0, Test1only, Test2only, Test3only)
      Training by ‘bringing to face’ yields the best accuracy.

  43. Experiment 2 - conclusions
      • Can learn more from different scales than from clean, as long as scale 3 is included
      • Learning from different scales gives better accuracies when tested on lower scales
      • Test on clean much better than test on scales
      Image source: collages we made from Caltech 256 database

  44. Conclusions
      • With our controlled datasets, we could verify that network learns better from larger scale
      • Testing needs to be done on clean images, no matter which scales were used in training
      • Training on scales >30% gives more robustness when testing on all scales
      • Training on scales <30% hurts accuracy
