Going Deeper with Convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich
Presented by: Kaylee Yuhas and Kyle Coffey
About Neural Networks
• Neural networks can be used in many different capacities across AI applications:
• Object classification, such as with images: given images of two different wolves, identifying the subspecies
• Speech recognition
• Interactive media such as video games, where networks can model how people respond to different stimuli in various environments and situations
• This work requires a hefty amount of computational resources to run smoothly
• Traditional neural network architecture has remained mostly constant
How to improve on traditional neural network setups?
• Increasing the performance of a neural network by increasing its size, while seemingly sound, has severe drawbacks:
• An increased number of parameters makes the network prone to overfitting
• A larger network requires more computational resources
[Figure: illustration of overfitting (Chabacano, 2008); the green line shows an overfitted model]
How to improve on traditional neural network setups?
• Introducing sparsity into the architecture by replacing fully connected layers with sparse ones, even inside the convolutions, is key
• This mimics biological systems
• How to improve performance without more hardware?
• By approximating the optimal sparse structure with dense building blocks, since current hardware is highly efficient at dense matrix computation
• This sparse architecture is named Inception, after the 2010 film of the same name (and its "we need to go deeper" line)
Inception Architecture: Naïve Version
• In short: inputs come from the previous layer and pass through several parallel convolutional layers; the parallel pooling layer helps control overfitting by reducing spatial information
• The authors settled on 1x1, 3x3, and 5x5 filter sizes, "the decision based more on convenience than necessity"
• Modules like this can be stacked repeatedly for scaling (see the sketch below)
• This choice of filter sizes also avoids patch-alignment issues
• However, 5x5 convolutions quickly become prohibitively expensive on convolutional layers with a large number of filters
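Below is a minimal sketch of the naïve Inception module as described on this slide, written in PyTorch purely for illustration (the original work did not use PyTorch, and the channel counts here are arbitrary placeholders):

```python
import torch
import torch.nn as nn

class NaiveInception(nn.Module):
    """Naive Inception module: parallel 1x1, 3x3, 5x5 convs plus max pooling."""
    def __init__(self, in_channels, c1x1, c3x3, c5x5):
        super().__init__()
        # Padding keeps the spatial size identical across branches so the
        # outputs can be concatenated along the channel dimension.
        self.branch1 = nn.Conv2d(in_channels, c1x1, kernel_size=1)
        self.branch3 = nn.Conv2d(in_channels, c3x3, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_channels, c5x5, kernel_size=5, padding=2)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.pool(x)],
            dim=1,
        )

# Example: a 192-channel, 28x28 feature map keeps its spatial size, but the
# channel count grows with every stacked module (here 64 + 128 + 32 + 192 = 416).
x = torch.randn(1, 192, 28, 28)
print(NaiveInception(192, 64, 128, 32)(x).shape)  # torch.Size([1, 416, 28, 28])
```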
Inception Architecture: Dimensionality Reduction
• Computing reductions with 1x1 convolutions before reaching the more expensive 3x3 and 5x5 convolutions tremendously reduces the necessary processing power (see the sketch below)
• The use of dimensionality reductions allows for significant increases in the number of units at each stage without a sharp increase in the computational resources needed at later, more complex stages
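A sketch of the dimensionality-reduction variant, again in PyTorch as an illustrative assumption: 1x1 convolutions shrink the channel count before the expensive 3x3 and 5x5 convolutions and after the pooling branch. The closing comment gives a rough sense of the savings, using filter counts similar to the paper's inception(3a) module.

```python
import torch
import torch.nn as nn

class InceptionReduce(nn.Module):
    """Inception module with 1x1 dimensionality reductions."""
    def __init__(self, in_ch, c1x1, r3x3, c3x3, r5x5, c5x5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, c1x1, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, r3x3, kernel_size=1),           # 1x1 reduction
            nn.Conv2d(r3x3, c3x3, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, r5x5, kernel_size=1),           # 1x1 reduction
            nn.Conv2d(r5x5, c5x5, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1),       # 1x1 projection after pooling
        )

    def forward(self, x):
        return torch.cat(
            [self.branch1(x), self.branch3(x), self.branch5(x), self.branch_pool(x)],
            dim=1,
        )

# Rough cost intuition: a 5x5 conv from 192 to 32 channels needs
# 5*5*192*32 ≈ 154k weights; reducing to 16 channels first needs only
# 1*1*192*16 + 5*5*16*32 ≈ 16k weights.
```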
GoogLeNet
• The incarnation of Inception the paper's authors submitted to the 2014 ImageNet Large Scale Visual Recognition Competition (ILSVRC).
• The network was designed to be efficient enough to run with a low memory footprint on individual devices with limited computational resources.
• If CNNs are to gain a foothold in private industry, keeping overhead costs low is especially important.
• Here is a small sample of the GoogLeNet architecture; note the use of dimensionality reduction as opposed to the naïve version.
GoogLeNet
• Only a portion of the network is shown at a time, because the entirety of the architecture is far too large to fit legibly on one slide.
GoogLeNet
• The GoogLeNet incarnation of the Inception architecture, laid out layer by layer.
• The "#3x3 reduce" and "#5x5 reduce" columns stand for the number of 1x1 filters in the reduction layers used before the 3x3 and 5x5 convolutions.
• While there are many layers, the main goal is for the final "softmax" layer to give "scores" to the image classes (e.g. dog breeds, skin diseases, etc.).
• A loss function determines how good or bad each score is; a sketch of this step follows below.
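The following is a hedged sketch of that final classification step (PyTorch, for illustration only): the last layer produces one score (logit) per class, softmax turns the scores into probabilities, and a cross-entropy loss, the usual companion to softmax, measures how good the scores are.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 1000)           # 4 images, 1000 ImageNet classes
probs = F.softmax(logits, dim=1)        # per-class "scores" that sum to 1
target = torch.randint(0, 1000, (4,))   # ground-truth class indices
loss = F.cross_entropy(logits, target)  # low when the correct class scores high
print(probs.shape, loss.item())
```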
GoogLeNet
• GoogLeNet is 22 layers deep when counting only layers with parameters (27 if pooling layers are counted), with about 100 total layers.
• It could be trained to convergence on a few high-end GPUs within about a week, with memory usage being the main limitation.
• It was trained to classify images into one of 1000 leaf-node categories in the ImageNet hierarchy.
• ImageNet is a large visual database designed for use in visual object recognition software research.
• GoogLeNet performed quite well in this contest.
GoogLeNet
• Left: GoogLeNet's results at the 2014 ILSVRC, where it came in first place.
• Right: a breakdown of its classification performance.
• Using multiple different CNNs and averaging their scores to get a predicted class for an image yields better results than a single CNN; see the entry with 7 CNNs, and the averaging sketch below.
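A minimal sketch of that averaging idea (PyTorch, illustrative only; the model objects are placeholders, not the actual ILSVRC ensemble):

```python
import torch
import torch.nn.functional as F

def ensemble_predict(models, images):
    # models: list of trained CNNs; images: a batch of input tensors
    probs = [F.softmax(m(images), dim=1) for m in models]
    avg = torch.stack(probs).mean(dim=0)  # average softmax outputs over the ensemble
    return avg.argmax(dim=1)              # predicted class index per image
```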
Summary
• Convolutional neural networks remain among the top performers in neural network research.
• The Inception framework allows for large-scale networks while minimizing processing bottlenecks and the "choke points" where scaling past a certain point becomes inefficient.
• It also runs well on machines without powerful hardware.
• Reducing dimensionality with 1x1 convolutions before passing data to the 3x3 and 5x5 convolutions has proven efficient and effective.
• Further study: is mimicking actual biological systems universally the best approach for neural network architecture?

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2015). Going deeper with convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/CVPR.2015.7298594
Chabacano. (2008). Overfitting. Retrieved April 08, 2017, from https://en.wikipedia.org/wiki/Overfitting