Scene Classification with Inception-7
Christian Szegedy, with Julian Ibarz and Vincent Vanhoucke (PowerPoint presentation)


  1. Scene Classification with Inception-7 Christian Szegedy with Julian Ibarz and Vincent Vanhoucke

  2. Julian Ibarz Vincent Vanhoucke

  3. Task: classification of images into 10 different classes:
     ● Bedroom
     ● Bridge
     ● Church Outdoor
     ● Classroom
     ● Conference Room
     ● Dining Room
     ● Kitchen
     ● Living Room
     ● Restaurant
     ● Tower

  4. Training/validation/test set:
     ● ~9.87 million training images
     ● 10 thousand test images
     ● 3 thousand validation images

  5. Task (repeat of slide 3): classification of images into 10 different classes: Bedroom, Bridge, Church Outdoor, Classroom, Conference Room, Dining Room, Kitchen, Living Room, Restaurant, Tower

  6. Evolution of Inception: Inception 5 (GoogLeNet) [1] to Inception 7a
     [1] Going Deeper with Convolutions, C. Szegedy et al., CVPR 2015

  7. Structural changes from Inception 5 to 6
     [Diagram: Inception-5 module (1x1, 3x3, 5x5 conv and pooling branches feeding a filter concatenation) vs. Inception-6 module, where the 5x5 conv branch is replaced by two stacked 3x3 convs.]

  8. [Diagram: a 5x5 conv replaced by a mini-network of two stacked 3x3 convs, each followed by a ReLU.]
     ● Each mini-network has the same receptive field.
     ● Deeper: more expressive (ReLU on both layers).
     ● 25/18 times (~28%) cheaper (due to feature sharing).
     ● Computation savings can be used to increase the number of filters.
     Downside: needs more memory at training time.
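The 25/18 cost ratio above can be checked with a little arithmetic. A minimal sketch, assuming C input and C output channels per layer (the ratio is independent of the channel count):

```python
# Compare the multiply-accumulate cost per output position of a single
# 5x5 convolution vs. two stacked 3x3 convolutions with the same
# receptive field. C is an arbitrary channel count.
def conv_cost(kernel, channels):
    """MACs per output position for a square kernel, channels in == out."""
    return kernel * kernel * channels * channels

C = 64
cost_5x5 = conv_cost(5, C)           # 25 * C^2
cost_3x3_pair = 2 * conv_cost(3, C)  # 18 * C^2
print(cost_3x3_pair / cost_5x5)      # 0.72, i.e. ~28% cheaper
```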

  9. Grid size reduction, Inception 5 vs 6
     [Diagram: in Inception 5 the module's filter concatenation is followed by a separate stride-2 pooling layer; in Inception 6 the stride-2 reduction is done inside the module's parallel conv and pooling branches. Much cheaper!]

  10. Structural changes from Inception 6 to 7
      [Diagram: Inception-6 module vs. Inception-7 module, where each 3x3 conv branch is replaced by a 3x1 conv followed by a 1x3 conv.]

  11. [Diagram: a 3x3 conv replaced by a mini-network of a 3x1 conv followed by a 1x3 conv, each followed by a ReLU.]
      ● Each mini-network has the same receptive field.
      ● Deeper: more expressive (ReLU on both layers).
      ● 9/6 times (~33%) cheaper (due to feature sharing).
      ● Computation savings can be used to increase the number of filters.
      Downside: needs more memory at training time.
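The 9/6 ratio for the asymmetric factorization works out the same way. A sketch, again with an arbitrary channel count C in and out of each layer:

```python
# Cost of one 3x3 conv vs. a 3x1 conv followed by a 1x3 conv
# (same receptive field), in MACs per output position.
def conv_cost(kh, kw, channels):
    return kh * kw * channels * channels

C = 64
cost_3x3 = conv_cost(3, 3, C)                        # 9 * C^2
cost_asym = conv_cost(3, 1, C) + conv_cost(1, 3, C)  # 6 * C^2
print(cost_asym / cost_3x3)  # 2/3, i.e. ~33% cheaper
```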

  12. Inception-6 vs Inception-7: padding
      Inception 6: SAME padding throughout.

      SAME padding:
      Input grid size | Patch size | Stride | Output grid size
      8x8             | 3x3        | 1      | 8x8
      8x8             | 5x5        | 1      | 8x8
      8x8             | 3x3        | 2      | 4x4
      8x8             | 3x3        | 4      | 2x2
      ● Output size is independent of patch size
      ● Padding with zero values

      VALID padding:
      Input grid size | Patch size | Stride | Output grid size
      7x7             | 3x3        | 1      | 5x5
      7x7             | 5x5        | 1      | 3x3
      7x7             | 3x3        | 2      | 3x3
      7x7             | 3x3        | 4      | 2x2
      ● Output size depends on the patch size
      ● No padding: each patch is fully contained
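The two output-size rules can be written down directly; a minimal sketch that reproduces the rows of the tables above:

```python
import math

def output_size(n, k, stride, padding):
    """Spatial output size along one dimension of an n-wide input
    convolved with a k-wide patch at the given stride."""
    if padding == "SAME":
        return math.ceil(n / stride)      # zero-padded; independent of k
    if padding == "VALID":
        return (n - k) // stride + 1      # every patch fully contained
    raise ValueError(padding)

print(output_size(8, 3, 1, "SAME"))   # 8
print(output_size(8, 3, 2, "SAME"))   # 4
print(output_size(7, 3, 1, "VALID"))  # 5
print(output_size(7, 3, 2, "VALID"))  # 3
```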

  13. Inception-6 vs Inception-7: padding
      Advantages of the padding methods:
      SAME padding:
      ● More equal distribution of gradients
      ● Fewer boundary effects
      ● No tunnel vision (sensitivity drop at the border)
      VALID padding:
      ● More refined: higher grid sizes at the same computational cost

      Stride | Inception 6 padding | Inception 7 padding
      1      | SAME                | SAME (VALID on first few layers)
      2      | SAME                | VALID

  14. Inception-6 vs Inception-7: padding
      Stride | Inception 6 padding | Inception 7 padding
      1      | SAME                | SAME (VALID on first few layers)
      2      | SAME                | VALID

      Grid sizes per stage:
      Inception 6: 224 > 112 > 56 > 28 > 14 > 7
      Inception 7: 299 > 147 > 73 > 71 > 35 > 17 > 8

      30% reduction of computation compared to a 299x299 network with SAME padding throughout.

  15. Spending the computational savings
      Grid size                     | Inception 5 filters | Inception 6 filters | Inception 7 filters
      28x28 (35x35 for Inception 7) | 256                 | 320                 | 288
      14x14 (17x17 for Inception 7) | 528                 | 576                 | 1248
      7x7 (8x8 for Inception 7)     | 1024                | 1024                | 2048
      Note: the counts denote the maximum number of filters per grid cell for each grid size. The typical number of filters is lower, especially for Inception 7.

  16. LSUN-specific modification
      [Diagram, standard stem: 299x299 input > 7x7 conv (stride 2) > 147x147 > 3x3 max pooling (stride 2) > 73x73 > 1x1 conv (stride 2) > ...
      Modified stem: 151x151 input > 7x7 conv (stride 2) > 73x73 > 1x1 conv (stride 2) > ...]
      The max pooling layer is dropped to accommodate low-resolution images and image patches.

  17. Training
      - Stochastic gradient descent
      - Momentum (0.9)
      - Fixed learning rate decay of 0.94
      - Batch size: 32
      - Random patches:
        - Minimum sample area: 15% of the full image
        - Minimum aspect ratio: 3:4 (affine distortion)
        - Random contrast, brightness, hue and saturation
      - Batch normalization (Accelerating Deep Network Training by Reducing Internal Covariate Shift, S. Ioffe, C. Szegedy, ICML 2015)
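A sketch of what the "fixed learning rate decay of 0.94" amounts to, assuming an exponential schedule applied once per epoch (the slide only gives the factor, so the decay interval and the starting rate of 0.045 are assumptions for illustration):

```python
# Exponential learning-rate decay: multiply the rate by 0.94
# after every epoch (assumed interval).
def learning_rate(base_lr, decay, epoch):
    return base_lr * decay ** epoch

base_lr = 0.045  # assumed starting value, not stated on the slide
for epoch in (0, 1, 10):
    print(epoch, learning_rate(base_lr, 0.94, epoch))
```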

  18. Task (recap of slide 3): classification of images into 10 different classes: Bedroom, Bridge, Church Outdoor, Classroom, Conference Room, Dining Room, Kitchen, Living Room, Restaurant, Tower

  19. Manual score calibration
      ● Compute weights for each label that maximize the score on half of the validation set
      ● Cross-validate on the other half of the validation set
      ● Simplify the weights after error minimization to avoid overfitting to the validation set
      Final score multipliers:
      ● 4.0 for church outdoor
      ● 2.0 for conference room
      Probable reason: these classes are under-represented in the training set.
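At inference time the calibration is just a per-class rescaling before the argmax. A minimal sketch; the class order and the raw scores are illustrative, only the two multipliers come from the slide:

```python
# Multiply the classifier's per-class scores by fixed calibration
# weights, then predict the class with the highest calibrated score.
CLASSES = ["bedroom", "bridge", "church_outdoor", "classroom",
           "conference_room", "dining_room", "kitchen", "living_room",
           "restaurant", "tower"]
MULTIPLIERS = {"church_outdoor": 4.0, "conference_room": 2.0}

def calibrate(scores):
    return [s * MULTIPLIERS.get(c, 1.0) for c, s in zip(CLASSES, scores)]

def predict(scores):
    cal = calibrate(scores)
    return CLASSES[cal.index(max(cal))]

# A borderline image whose raw top score is another class:
raw = [0.05, 0.30, 0.25, 0.05, 0.05, 0.05, 0.05, 0.05, 0.10, 0.05]
print(predict(raw))  # church_outdoor (0.25 * 4.0 beats bridge's 0.30)
```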

  20. Evaluation
      Crop averaging at 3 different scales (Going Deeper with Convolutions, Szegedy et al., CVPR 2015): score averaging over 144 crops/image.

      Evaluation method        | Accuracy (on validation set)
      Single crop              | 89.2%
      Multi crop               | 89.7%
      Manual score calibration | 91.2%
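Multi-crop evaluation reduces to averaging the per-class score vectors over all crops of an image and taking the argmax. A sketch with made-up scores for three crops (crop generation itself, 144 crops at 3 scales, is omitted):

```python
# Average per-class scores across crops of the same image.
def average_scores(crop_scores):
    n = len(crop_scores)
    return [sum(col) / n for col in zip(*crop_scores)]

crops = [
    [0.6, 0.3, 0.1],  # scores from crop 1
    [0.2, 0.7, 0.1],  # scores from crop 2
    [0.5, 0.4, 0.1],  # scores from crop 3
]
avg = average_scores(crops)
prediction = avg.index(max(avg))
print(avg, prediction)
```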

  21. Releasing pretrained Inception and MultiBox
      Academic criticism: results are hard to reproduce.
      We will be releasing pretrained Caffe models for:
      ● GoogLeNet (Inception 5)
      ● BN-Inception (Inception 6)
      ● MultiBox-Inception proposal generator (based on Inception 6)
      Contact: Yangqing Jia

  22. Acknowledgments
      We would like to thank:
      ● the organizers of LSUN
      ● the DistBelief and Image Annotation teams at Google for their support of the machine learning and evaluation infrastructure
