
Convolutional Neural Networks - QSB 2018: Learning and Artificial Intelligence (PowerPoint presentation)



  1. Convolutional Neural Networks QSB 2018: Learning and Artificial intelligence – Tutorial session 3 Giulio Matteucci

  2. Neural network architectures for computer vision tasks. Images are high dimensional! An input x ∈ R^n, where n = nx × ny × nc, is large when images are represented with raw pixel intensity values (nx, ny = spatial dimensions, nc = number of channels). The number of parameters (weights) grows quadratically with resolution, so fully connected networks do not scale well to real-world computer vision problems! Can we exploit our prior knowledge about the visual world to design a better architecture for vision?
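To make the scaling argument concrete, here is a minimal Python sketch; the function name `fc_params` and the choice of 1000 hidden units are illustrative, not from the slides:

```python
# Parameter count of one fully connected hidden layer on raw pixels.
def fc_params(nx, ny, nc, n_hidden):
    """Weights plus biases of a dense layer mapping an nx*ny*nc image to n_hidden units."""
    n_input = nx * ny * nc
    return n_input * n_hidden + n_hidden

# Doubling the linear resolution roughly quadruples the parameter count:
small = fc_params(200, 200, 3, 1000)   # 200x200 RGB image
large = fc_params(400, 400, 3, 1000)   # 400x400 RGB image
print(small, large, large / small)     # ratio is ~4
```

Even a single modest hidden layer already costs over 10^8 weights at 200×200×3, which is the scaling problem the convolutional architecture is designed to avoid.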

  3. Start from two considerations about natural visual input:
  1 visual features are local, because natural images are made of sparse, local independent components;
  2 visual features can show up everywhere, because natural image statistics are (approximately) stationary across visual space: visual scenes are made of (often) repeated elements, and visual objects undergo identity-preserving transformations (e.g. translation). Hyvärinen et al., "Natural Image Statistics", 2009

  4. Neurons as filters: neurons search for the pattern stored in their weights, and the dot product measures similarity to the input. In the neuron analogy, inputs x_j arrive on "dendrites" through "synapses" with weights y_j, the "soma" computes g(Σ_j x_j y_j + b), and the output leaves along the "axon". When the input vector is similar enough to the weight vector, the response is high: the preferred feature is detected.
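A minimal sketch of this neuron-as-filter idea, using a sigmoid for the nonlinearity g (the weight values here are arbitrary illustrations):

```python
import math

def neuron(x, w, b):
    """Dot product of input x with stored weights w, plus bias, through a sigmoid."""
    s = sum(xj * wj for xj, wj in zip(x, w)) + b
    return 1.0 / (1.0 + math.exp(-s))

w = [1.0, -1.0, 0.5]                         # the neuron's preferred pattern
aligned = neuron([1.0, -1.0, 0.5], w, 0.0)   # input matches the weights
opposed = neuron([-1.0, 1.0, -0.5], w, 0.0)  # input anti-correlated with them
print(aligned, opposed)
```

An input aligned with the weight vector drives a response near 1, while an anti-correlated input stays near 0: the unit acts as a detector for its stored pattern.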

  5. 1 Learn small, localized filters. To do so, keep the spatial structure (i.e. do not flatten the input). Fully connected units learn global filters for local features: nx × ny parameters per hidden unit (e.g. nx = ny = 200 gives 40000 parameters per unit). Costly and inefficient! Locally connected units, each seeing only an h × w patch, learn local filters for local features: h × w parameters per hidden unit (e.g. h = w = 4 gives 16 parameters per unit). Cheap and efficient!

  6. 2 Reuse localized filters unaltered across different parts of the image: the convolution operation

  z(j,k) = Σ_{l=-L..L} Σ_{m=-M..M} y(j-l, k-m) · x(l,m)
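The double sum above can be sketched directly in Python; this is a plain "valid" 2D convolution over lists of lists (the helper name `conv2d_valid` is mine, and indices run over the output grid rather than the centered -L..L range, which is equivalent up to a shift):

```python
def conv2d_valid(image, kernel):
    """2D 'valid' convolution: slide the flipped kernel over the image.

    image, kernel: lists of lists (rows). Output side length is n - f + 1.
    """
    fh, fw = len(kernel), len(kernel[0])
    oh = len(image) - fh + 1
    ow = len(image[0]) - fw + 1
    out = [[0] * ow for _ in range(oh)]
    for j in range(oh):
        for k in range(ow):
            # True convolution flips the kernel; cross-correlation would not.
            out[j][k] = sum(
                image[j + l][k + m] * kernel[fh - 1 - l][fw - 1 - m]
                for l in range(fh) for m in range(fw)
            )
    return out

img = [[1, 2, 1], [0, 0, 0], [1, 2, 1]]
print(conv2d_valid(img, [[1, 1], [1, 1]]))  # 2x2 box filter: [[3, 3], [3, 3]]
```

The same kernel weights are applied at every image position: this is exactly the parameter re-use that the slide describes.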

  7. Applying convolution, the output naturally shrinks: with filter size f and input size nin, nout = (nin - f) + 1. We can avoid this by adding 0s at the input border (padding p): nout = (nin + 2p - f) + 1. With p = 0 (nin ≠ nout) it is called a "valid" convolution; with p chosen so that nin = nout it is called a "same" convolution.
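The output-size formula is easy to check numerically; a small sketch (the function name is mine):

```python
def conv_output_size(n_in, f, p=0):
    """Slide formula: nout = (nin + 2p - f) + 1."""
    return (n_in + 2 * p - f) + 1

n_in, f = 6, 3
print(conv_output_size(n_in, f, p=0))             # "valid": shrinks to 4
print(conv_output_size(n_in, f, p=(f - 1) // 2))  # "same" (odd f): stays 6
```

For odd filter sizes, p = (f - 1) / 2 makes the output match the input, which is why "same" padding with 3×3 filters uses p = 1.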

  8. When cascading multiple convolution operations it is useful to introduce the receptive field (RF): the region of the input space from which a given neuron receives information. The RF size m_l at layer l is equal to the filter size in the first layer and grows by f_l - 1 at each next layer; for the l-th layer, recursively: m_l = m_{l-1} + (f_l - 1), with m_1 = f_1.
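The recursion can be sketched in a few lines (function name mine; stride 1 assumed, as on this slide):

```python
def receptive_field(filter_sizes):
    """RF size per layer: m_1 = f_1, then m_l = m_{l-1} + (f_l - 1), stride 1."""
    m = []
    for f in filter_sizes:
        m.append(f if not m else m[-1] + f - 1)
    return m

print(receptive_field([3, 3, 3, 3]))  # four 3x3 conv layers: [3, 5, 7, 9]
```

With stride 1 the RF grows only linearly in depth, which motivates the strided convolutions of the next slide.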

  9. Modern CNNs use very small filters (e.g. 3x3), but to develop selectivity for meaningful patterns we need larger RFs, so we may want to make them grow faster: strided convolution changes the "step" s of the filter displacement. Strided convolution also acts as a downsampling, greatly reducing output size. In this way RF size can grow faster: m_l = m_{l-1} + (f_l - 1) · Π_{j=1..l-1} s_j. Considering stride (and padding), the output size becomes: nout = ⌊(nin + 2p - f) / s⌋ + 1, with s = stride, p = padding, f = filter dimension.
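The stride-aware RF recursion above can be sketched as follows (function name mine; `jump` tracks the product of strides of the layers below):

```python
def receptive_field_strided(filter_sizes, strides):
    """m_l = m_{l-1} + (f_l - 1) * prod(s_1..s_{l-1}); all-ones strides
    reduce this to the plain f - 1 growth per layer."""
    m, jump = 0, 1
    for f, s in zip(filter_sizes, strides):
        m = f if m == 0 else m + (f - 1) * jump
        jump *= s
    return m

print(receptive_field_strided([3, 3, 3], [2, 2, 1]))  # strided: RF = 15
print(receptive_field_strided([3, 3, 3], [1, 1, 1]))  # stride-1 baseline: RF = 7
```

Two stride-2 layers more than double the final RF relative to the stride-1 stack, at the price of a downsampled output.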

  10. We are learning multiple filters, each acting on all input channels together; each filter's output forms a "feature map". A convolutional layer stacks the different feature maps along the third dimension (as different channels). Size of the output volume: nxout = ⌊(nxin + 2p - fx) / s⌋ + 1, nyout = ⌊(nyin + 2p - fy) / s⌋ + 1, ncout = nf, with nf = number of filters, s = stride, fx, fy = filter size, p = padding, nxin, nyin = input size. Karpathy 2016
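Putting the three formulas together in one small helper (name and the example layer sizes are mine; the example mimics a LeNet-style 5×5 layer with 6 filters):

```python
def conv_layer_output_shape(nx_in, ny_in, fx, fy, nf, p=0, s=1):
    """Spatial dims follow the slide formula; the channel dim equals the
    number of filters, since each filter produces one feature map."""
    nx_out = (nx_in + 2 * p - fx) // s + 1
    ny_out = (ny_in + 2 * p - fy) // s + 1
    return nx_out, ny_out, nf

print(conv_layer_output_shape(32, 32, 5, 5, 6))  # (28, 28, 6)
```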

  11. Example of convolution with an edge-detecting filter, the Sobel filter: [[1, 2, 1], [0, 0, 0], [-1, -2, -1]] (from setosa.io)
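A tiny numerical check of the Sobel filter on a made-up 3×3 patch containing one horizontal edge (bright rows above dark rows); this computes the cross-correlation at the single valid position, so a vertically flipped kernel would just flip the sign:

```python
# Horizontal-edge response of the Sobel filter on a toy patch.
sobel = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
img = [[9, 9, 9],
       [9, 9, 9],
       [0, 0, 0]]  # bright-to-dark edge between the last two rows

resp = sum(sobel[i][j] * img[i][j] for i in range(3) for j in range(3))
print(resp)  # strongly positive response at the edge
```

On a uniform patch the response would be 0: the filter fires only where intensity changes vertically.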

  12. Solving FCNs' bad scaling: 1 sparsity of connections: each neuron is connected to a small region of the input only (localized receptive field); 2 parameter sharing: the whole input space is tiled with RFs re-using the same learned convolutional filters to enforce shared parameters (feature maps). This is reminiscent of how visual information is represented across the brain surface: retinotopic maps of localized feature detectors.

  13. Thinking ahead, we may want to hardwire some amount of translation tolerance into our network! 2 Nonlinear blur and downsampling: convolutionally apply a pooling operation, a "max" filter that "replaces" input subregions with their max value: z(j,k) = max(pool(j,k)), with pool(j,k) = { y(j-l, k-m) : l = 1..fx, m = 1..fy }. Usually done with fx = fy = 2 and s = 2, i.e. s = fx = fy, to have non-overlapping subregions. Example: max pooling [[1,1,2,4],[5,6,7,8],[3,2,1,0],[1,2,3,4]] with a 2x2 filter and stride 2 gives [[6,8],[3,4]].
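The non-overlapping max-pooling operation can be sketched directly (function name mine; defaults match the slide's f = s = 2 convention):

```python
def max_pool(fmap, f=2, s=2):
    """Non-overlapping max pooling over f x f subregions with stride s."""
    out = []
    for j in range(0, len(fmap) - f + 1, s):
        out.append([
            max(fmap[j + l][k + m] for l in range(f) for m in range(f))
            for k in range(0, len(fmap[0]) - f + 1, s)
        ])
    return out

fmap = [[1, 1, 2, 4],
        [5, 6, 7, 8],
        [3, 2, 1, 0],
        [1, 2, 3, 4]]
print(max_pool(fmap))  # [[6, 8], [3, 4]]
```

Shifting the input by one pixel often leaves the pooled maxima unchanged, which is the hardwired translation tolerance the slide refers to.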

  14. The pooling operation is applied to convolutional-layer volumes independently for each feature map. Dimensions of the output volume: nxout = ⌊(nxin + 2p - fx) / s⌋ + 1, nyout = ⌊(nyin + 2p - fy) / s⌋ + 1, ncout = nf. But since usually p = 0, s = 2 and fx = fy = 2: nxout = nxin / 2 and nyout = nyin / 2. The old formula still holds for RF size calculation. The representation size is reduced by 75%, so the network is less computationally expensive and less likely to overfit. Karpathy 2016

  15. Max-like pooling computations are thought to underlie the build-up of transformation tolerance observed through the primate shape-processing stream. A classical example: V1 simple and complex cells. Simple cells are position-selective oriented-edge-detector neurons; max pooling over simple cells yields complex cells, position-tolerant oriented-edge-detector neurons.

  16. Through the stack conv1, pool1, conv2, pool2, conv3, pool3, conv4, each layer combines simpler features to build more complex ones, becoming more and more abstract: from local, low-level, transformation-sensitive features to global, high-level, transformation-invariant ones; from non-categorical input-image features to a categorial output representation allowing read-out of task-relevant information. Lee et al. 2009

  17. We can consider stacks of convolutional layers as visual feature extractors. Features learned in solving one supervised task can frequently be useful in different contexts: no need to learn every feature from scratch for new tasks! Transfer learning: re-use the first N layers of a network with pre-trained weights (trained on a different task). How far in depth to push N depends on how distant the task domains involved are. Far domains (e.g. face recognition & satellite image classification): low N, only low-level features in common. Close domains (e.g. face recognition & emotion recognition): high N, common high-level features. This extends the applicability of deep learning to the small-data regime.

  18. Imagine starting with a trained face-recognition system: input image (face), features through conv1, pool1, conv2, pool2, conv3, pool3, conv4, then a softmax layer giving p(identity|face). Now you want a car-model-recognition one. The high-level features will be poorly transferable (too domain specific): strip away the last layers!

  19. You are left with a general-purpose middle-level feature extractor. On top of that, stick some new conv layers and a new softmax output giving p(model|car); with (much less) training you will build new car-specific high-level features and a working classifier.
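The recipe of slides 18-19 can be sketched with toy "layers" as plain functions; everything here (the layer names, the arithmetic stand-ins, the dict layout) is hypothetical, not a real framework API:

```python
# Pretrained face network: keep the early layers, discard the task-specific head.
pretrained = {
    "conv1": lambda x: x + 1,           # stand-ins for learned feature extractors
    "conv2": lambda x: x * 2,
    "softmax_faces": lambda x: x - 10,  # face-specific head: stripped away
}

# Re-used backbone: the first N layers act as a general feature extractor ...
backbone = [pretrained["conv1"], pretrained["conv2"]]
# ... and a fresh head is attached, to be trained on the new (car) task.
new_head = lambda x: x + 100

def forward(x):
    for layer in backbone + [new_head]:
        x = layer(x)
    return x

print(forward(3))  # (3 + 1) * 2 + 100 = 108
```

Only `new_head` (and any new conv layers) would receive gradient updates in the fine-tuning phase; the backbone weights stay frozen or change slowly.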

  20. The hierarchical structure of CNN layers (and features) may be interpreted as reflecting the compositionality of the visual world (objects are made of parts and subparts, etc.). It is reminiscent of the anatomical and functional hierarchy of the visual pathways, the ventral stream: V1, V2, V4, PIT, CIT, AIT, along which response latency, RF size, tuning complexity, transformation tolerance and linear decodability all increase. Huberman et al. 2011

  21. This kind of hierarchical brain processing of visual shape information has been modelled throughout the years (the '80s and '90s), from Fukushima's Neocognitron to Poggio's HMAX model: alternating S layers (S1, S2: shape-selectivity build-up, AND-like operations) and C layers (C1, C2: transformation-tolerance build-up, OR-like operations). Biologically-derived ideas instantiated by these models inspired the birth of modern CNN architectures. (Riesenhuber & Poggio 1999)

  22. The first successful convnet was Yann LeCun's LeNet ('98), for handwritten digit recognition, and the first to apply a stack of conv and pool layers followed by fully connected ones. Conv filter size 5x5 (p = 0, "valid", s = 1); pooling filter size 2x2 (p = 0, s = 2); shallow: 2 conv layers interleaved with pooling; roughly 6 · 10^4 parameters (small). Ng 2017
