
GoogLeNet: Deeper than deeper (some slides are from Christian Szegedy) - PowerPoint PPT Presentation

  1. GoogLeNet: Deeper than deeper. Some slides are from Christian Szegedy.

  2. GoogLeNet (diagram legend: Convolution, Pooling, Softmax, Other)

  3. GoogLeNet vs previous: Zeiler-Fergus architecture (1 tower) (diagram legend: Convolution, Pooling, Softmax, Other)

  4. Why is the deep learning revolution arriving just now?

  5. Why is the deep learning revolution arriving just now?

  6. Why is the deep learning revolution arriving just now? Rectified Linear Unit (ReLU). Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR W&CP Vol. 15, pp. 315-323.
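
For reference (a standard definition, not spelled out on the slide), the rectified linear unit is ReLU(x) = max(0, x): its gradient is 1 wherever the unit is active and 0 elsewhere, so deep stacks avoid the vanishing gradients of saturating nonlinearities like the sigmoid, and the activations come out sparse.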

  7. Theoretical breakthroughs: Arora, S., Bhaskara, A., Ge, R., & Ma, T. (2014). Provable bounds for learning some deep representations. ICML 2014.

  8. Hebbian Principle: "Cells that fire together, wire together." (diagram: Input)

  9. Cluster according to activation statistics (diagram: Layer 1, Input)

  10. Cluster according to correlation statistics (diagram: Layer 2, Layer 1, Input)

  11. Cluster according to correlation statistics (diagram: Layer 3, Layer 2, Layer 1, Input)

  12. In images, correlations tend to be local

  13. Cover very local clusters with 1x1 convolutions (diagram: number of 1x1 filters)
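
A 1x1 convolution is a learned linear map across channels, applied independently at every spatial position. A minimal PyTorch sketch (the channel sizes here are illustrative, not from the slides):

    import torch
    import torch.nn as nn

    # A 1x1 convolution mixes channels per pixel: a 64 -> 32 linear map
    # applied at every spatial position; it never looks at neighbours.
    conv1x1 = nn.Conv2d(in_channels=64, out_channels=32, kernel_size=1)

    x = torch.randn(1, 64, 28, 28)   # (batch, channels, height, width)
    y = conv1x1(x)                   # -> (1, 32, 28, 28), spatial size unchanged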

  14. Less spread-out correlations (diagram: number of 1x1 filters)

  15. Cover more spread-out clusters with 3x3 convolutions (diagram: 1x1 and 3x3 filters)

  16. Cover more spread-out clusters with 5x5 convolutions (diagram: 1x1 and 3x3 filters)

  17. Cover more spread-out clusters with 5x5 convolutions (diagram: 1x1, 3x3, and 5x5 filters)

  18. A heterogeneous set of convolutions: 1x1, 3x3, and 5x5 filters

  19. Schematic view (naive version): Previous layer -> [1x1 convolutions | 3x3 convolutions | 5x5 convolutions] -> Filter concatenation

  20. Naive idea: Previous layer -> [1x1 convolutions | 3x3 convolutions | 5x5 convolutions] -> Filter concatenation

  21. Naive idea (does not work!): Previous layer -> [1x1 convolutions | 3x3 convolutions | 5x5 convolutions | 3x3 max pooling] -> Filter concatenation
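
Why the naive version fails is a matter of arithmetic: the max-pooling branch passes through all of its input channels, so the concatenated output gets wider after every module, and the 3x3/5x5 convolutions on ever-wider inputs dominate the cost. A back-of-envelope sketch in Python (the branch widths are hypothetical, chosen only for illustration):

    # Multiply count for one naive module on a 28x28, 256-channel input.
    H = W = 28
    c_in = 256
    branches = {"1x1": (1, 128), "3x3": (3, 192), "5x5": (5, 96)}

    total = 0
    for name, (k, c_out) in branches.items():
        mults = H * W * c_in * c_out * k * k   # multiplies in this branch
        total += mults
        print(name, mults)
    print("total multiplies:", total)

    # The 3x3 max-pool branch adds no multiplies but keeps all 256 input
    # channels, so the output has 128 + 192 + 96 + 256 = 672 channels:
    # each module makes the next one strictly wider and more expensive.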

  22. Inception module: Previous layer -> [1x1 convolutions | 1x1 convolutions -> 3x3 convolutions | 1x1 convolutions -> 5x5 convolutions | 3x3 max pooling -> 1x1 convolutions] -> Filter concatenation
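
A minimal PyTorch sketch of the module above, with 1x1 reductions before the expensive 3x3 and 5x5 branches (ReLUs after the final convolutions are omitted for brevity; the example sizes follow the paper's inception (3a) configuration):

    import torch
    import torch.nn as nn

    class Inception(nn.Module):
        def __init__(self, c_in, c1, c3r, c3, c5r, c5, c_pool):
            super().__init__()
            self.b1 = nn.Conv2d(c_in, c1, kernel_size=1)     # 1x1 branch
            self.b2 = nn.Sequential(                         # 1x1 reduce -> 3x3
                nn.Conv2d(c_in, c3r, kernel_size=1), nn.ReLU(inplace=True),
                nn.Conv2d(c3r, c3, kernel_size=3, padding=1))
            self.b3 = nn.Sequential(                         # 1x1 reduce -> 5x5
                nn.Conv2d(c_in, c5r, kernel_size=1), nn.ReLU(inplace=True),
                nn.Conv2d(c5r, c5, kernel_size=5, padding=2))
            self.b4 = nn.Sequential(                         # pool -> 1x1 project
                nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
                nn.Conv2d(c_in, c_pool, kernel_size=1))

        def forward(self, x):
            # Concatenate the four branches along the channel dimension.
            return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

    # Inception (3a): 192 -> 64 + 128 + 32 + 32 = 256 output channels.
    m = Inception(192, 64, 96, 128, 16, 32, 32)
    y = m(torch.randn(1, 192, 28, 28))   # -> (1, 256, 28, 28)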

  23. Inception: why does it have so many layers??? (diagram legend: Convolution, Pooling, Softmax, Other)

  24. Inception: 9 Inception modules. Network in a network in a network... (diagram legend: Convolution, Pooling, Softmax, Other)

  25. Inception: width of inception modules ranges from 256 filters (in early modules) to 1024 in top inception modules (widths shown in the diagram: 256, 480, 480, 512, 512, 512, 832, 832, 1024).

  26. (as slide 25, adding) • Can remove fully connected layers on top completely

  27. (as slide 26, adding) • Number of parameters is reduced to 5 million

  28. (as slide 27, adding) • Computational cost is increased by less than 2X compared to Krizhevsky's network (<1.5Bn operations/evaluation)
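
Most of the parameter saving comes from replacing the fully connected head with global average pooling. A back-of-envelope comparison (the 7x7x1024 final feature map matches GoogLeNet; the 4096-unit dense layer is a hypothetical AlexNet-style head, for contrast):

    # One AlexNet-style FC layer on a 7x7x1024 feature map:
    fc_head = 7 * 7 * 1024 * 4096    # = 205,520,896 weights (~205M)
    # Global average pooling collapses 7x7x1024 to 1024 values, leaving
    # only the 1000-way linear classifier:
    gap_head = 1024 * 1000           # = 1,024,000 weights (~1M)
    print(fc_head, gap_head)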

  29. Efficient Gradient Propagation • A shallow network can already provide good performance • Auxiliary classifiers connected to intermediate layers
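
At training time the auxiliary classifiers contribute extra loss terms (weighted by 0.3 in the paper) that feed gradient directly into the intermediate layers; they are discarded at inference. A sketch, where the three logits tensors are hypothetical outputs of the main head and the two side branches:

    import torch.nn.functional as F

    def googlenet_loss(main_logits, aux1_logits, aux2_logits, target):
        # Auxiliary losses inject gradient into the middle of the network;
        # the 0.3 weight follows the GoogLeNet paper. Both aux heads are
        # removed at test time.
        loss_main = F.cross_entropy(main_logits, target)
        loss_aux = (F.cross_entropy(aux1_logits, target)
                    + F.cross_entropy(aux2_logits, target))
        return loss_main + 0.3 * loss_aux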

  30. Multiple Models and Crops: performance breakdown
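
For the ILSVRC submission the paper ensembles 7 models and evaluates 144 crops per image (4 scales x 3 square regions x 6 crops per square x 2 mirror flips), averaging softmax outputs over crops and then over models. A sketch, where models and crops are hypothetical stand-ins for the trained nets and the preprocessed crop batch of one image:

    import torch

    n_crops = 4 * 3 * 6 * 2   # scales x squares x crops x mirrors = 144

    def ensemble_predict(models, crops):
        # crops: tensor of shape (144, 3, 224, 224) for a single image.
        # Average softmax over crops per model, then over the ensemble.
        per_model = [m(crops).softmax(dim=1).mean(dim=0) for m in models]
        return torch.stack(per_model).mean(dim=0)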

  31. Classification performance

  32. Where Are We Now

  33. Where Are We Now • It is very hard for humans • Even if the number of choices is reduced to 1000

  34. Where Are We Now • It is very hard for humans • Even if the number of choices is reduced to 1000 • It is time consuming: 1 image per minute • Human performance: without training, 13-15% error; with training, 5.1% error • GoogLeNet: 6.7% error
