Convolutional Neural Nets
EECS 442, David Fouhey
Fall 2019, University of Michigan
http://web.eecs.umich.edu/~fouhey/teaching/EECS442_F19/

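Previously: Backpropagation
$f(x) = (-x + 3)^2$
Computation graph: $x \to -x \to -x + 3 \to (-x + 3)^2$, built from the blocks $-n$, $n + 3$, and $n^2$.
Forward pass: compute function.
Backward pass: compute derivative of all parts of the function. Gradients flowing back from the output to the input: $1$, $-2x + 6$, $-2x + 6$, $2x - 6$.
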
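The forward and backward passes above can be traced in a few lines. This is a minimal sketch, not code from the lecture; the function name `forward_backward` and the intermediate names `a` and `b` are made up for illustration.

```python
def forward_backward(x):
    """Forward and backward pass for f(x) = (-x + 3)**2."""
    # Forward pass: one intermediate value per block in the graph.
    a = -x        # block "-n"
    b = a + 3     # block "n+3"
    f = b ** 2    # block "n^2"

    # Backward pass: start from df/df = 1 and apply the chain rule block by block.
    df = 1.0
    db = df * 2 * b    # d(n^2)/dn = 2n, gives -2x + 6
    da = db * 1.0      # d(n+3)/dn = 1,  gives -2x + 6
    dx = da * -1.0     # d(-n)/dn = -1,  gives  2x - 6
    return f, dx


value, grad = forward_backward(1.0)
print(value, grad)  # f(1) = 4 and f'(1) = 2*1 - 6 = -4
```

Each backward line multiplies the incoming gradient by the local derivative of one block, which is all backpropagation does on this graph.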
Setting Up A Neural Net
[Figure: network with Input (x1, x2), Hidden (h1, h2, h3, h4), and Output (y1, y2, y3) layers.]

Setting Up A Neural Net
[Figure: network with Input (x1, x2), Hidden 1 (a1, a2, a3, a4), Hidden 2 (h1, h2, h3, h4), and Output (y1, y2, y3) layers.]

Fully Connected Network
Each neuron connects to each neuron in the previous layer.
[Figure: network with inputs x1, x2; hidden layers a1..a4 and h1..h4; outputs y1..y3.]

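Fully Connected Network
$\mathbf{a}$: all layer a values
$\mathbf{w}_i, b_i$: neuron i weights, bias
$f$: activation function
$\mathbf{h} = f(\mathbf{W}\mathbf{a} + \mathbf{b})$
$$\begin{bmatrix} h_1 \\ h_2 \\ h_3 \\ h_4 \end{bmatrix} = f\left( \begin{bmatrix} \mathbf{w}_1^T \\ \mathbf{w}_2^T \\ \mathbf{w}_3^T \\ \mathbf{w}_4^T \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix} \right)$$
[Figure: network with inputs x1, x2; hidden layers a1..a4 and h1..h4; outputs y1..y3.]
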
Fully Connected Network
Define New Block: "Linear Layer" (OK, technically it's affine)
$L(\mathbf{n}) = \mathbf{W}\mathbf{n} + \mathbf{b}$
[Figure: block L taking input n, with parameters W and b.]
Can get gradient with respect to all the inputs (do on your own; useful trick: have to be able to do matrix multiply).

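As a check on the "do on your own" exercise, here is one way the linear block and its gradients could be written with plain Python lists. It is a sketch under the assumption that the upstream gradient `grad_out` (the derivative of the loss with respect to the block's output) is already known; the function names are illustrative only.

```python
def linear_forward(W, b, n):
    """L(n) = W n + b, with W as a list of rows and b, n as lists."""
    return [sum(w_ij * n_j for w_ij, n_j in zip(row, n)) + b_i
            for row, b_i in zip(W, b)]


def linear_backward(W, n, grad_out):
    """Given grad_out = dLoss/dL, return gradients w.r.t. n, W, and b."""
    # dLoss/dn = W^T grad_out (a matrix multiply with the transpose)
    grad_n = [sum(W[i][j] * grad_out[i] for i in range(len(W)))
              for j in range(len(n))]
    # dLoss/dW = outer product of grad_out and n
    grad_W = [[g_i * n_j for n_j in n] for g_i in grad_out]
    # dLoss/db = grad_out, since b is added directly to the output
    grad_b = list(grad_out)
    return grad_n, grad_W, grad_b


W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [0.5, -0.5, 0.0]
n = [1.0, -1.0]
out = linear_forward(W, b, n)
grad_n, grad_W, grad_b = linear_backward(W, n, [1.0, 0.0, 2.0])
print(out, grad_n)
```

All three gradients reduce to matrix products, which is the trick the slide hints at.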
Fully Connected Network
[Figure: the network drawn as a chain of blocks: x, then L (W1, b1), f(n), L (W2, b2), f(n), L (W3, b3), f(n).]

Fully Connected Network
Backpropagation lets us calculate the derivative of the output/error with respect to all the Ws at a given point x.
[Figure: chain of blocks: x, then L (W1, b1), f(n), L (W2, b2), f(n), L (W3, b3), f(n).]

Putting It All Together - 1
Function: NN(x; W_i, b_i)
Parameterized by W = {W_i, b_i}
[Figure: chain of blocks: x, then L (W1, b1), f(n), L (W2, b2), f(n), L (W3, b3), f(n).]

Putting It All Together
Function: NN(x; W_i, b_i)
Function: Loss(NN(x; W_i, b_i), y)
[Figure: the same chain of blocks, with its output and the label y feeding a Loss block.]

Putting It All Together

    W = initializeWeights()
    for i in range(numIterations):
        # sample a batch
        batch = random.subset(0, #datapoints, K)
        batchX, batchY = dataX[batch], dataY[batch]
        # compute gradient with batch
        gradW = backprop(Loss(NN(batchX, W), batchY))
        # update W with gradient step
        W += -stepsize * gradW
    return W

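The training loop above is pseudocode: `initializeWeights`, `backprop`, `Loss`, and `NN` are not defined in the slides. A runnable stand-in, assuming a single linear neuron with a squared-error loss and a hand-derived gradient in place of `backprop`, might look like this. The toy data and every name here are hypothetical.

```python
import random


def train(data_x, data_y, num_iterations=2000, batch_size=8, stepsize=0.05, seed=0):
    """Minibatch gradient descent for the toy model y = w * x + b."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # stands in for initializeWeights()
    for _ in range(num_iterations):
        # sample a batch of K distinct datapoint indices
        batch = rng.sample(range(len(data_x)), batch_size)
        # gradient of the mean squared error over the batch (hand-derived)
        grad_w = grad_b = 0.0
        for i in batch:
            err = (w * data_x[i] + b) - data_y[i]
            grad_w += 2 * err * data_x[i] / batch_size
            grad_b += 2 * err / batch_size
        # update the parameters with a gradient step
        w += -stepsize * grad_w
        b += -stepsize * grad_b
    return w, b


xs = [i / 10 for i in range(-20, 21)]
ys = [3.0 * x - 1.0 for x in xs]  # noiseless toy data generated from y = 3x - 1
w, b = train(xs, ys)
print(w, b)  # should end up close to 3 and -1
```

The structure mirrors the slide line for line: sample a batch, compute the gradient on that batch only, then step against the gradient.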
What Can We Represent?
$L(\mathbf{n}) = \mathbf{W}\mathbf{n} + \mathbf{b}$
[Figure: input x feeding block L (W, b) and then f(n); hidden units h1..h4 and output y1.]

What Can We Represent
• Recall: ax+by+z is
  • proportional to signed distance to line
  • equal to signed distance if you set it right (that is, when a^2 + b^2 = 1)
• Generalization to N-D: hyperplane $\mathbf{w}^T\mathbf{x} + b$

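To make "proportional to signed distance" concrete, the sketch below divides by the norm of (a, b) so the result is the signed distance itself. The function name is illustrative, and the parameter `offset` plays the role of the constant term written as z on the slide.

```python
import math


def signed_distance(a, b, offset, x, y):
    """Signed distance from the point (x, y) to the line a*x + b*y + offset = 0."""
    return (a * x + b * y + offset) / math.hypot(a, b)


# Line 3x + 4y - 5 = 0: the normal (3, 4) has length 5.
print(signed_distance(3.0, 4.0, -5.0, 3.0, 4.0))  # on the positive side of the line
print(signed_distance(3.0, 4.0, -5.0, 0.0, 0.0))  # on the negative side of the line
```

When a^2 + b^2 = 1 the division is by 1, so the raw value ax+by+z already equals the signed distance.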
Can We Train a Network To Do It?
[Figure: scatter plot of + and - points; network with inputs x1, x2 and output y1.]

Can We Train a Network To Do It?
[Figure: the same scatter plot; network with inputs x1, x2, hidden units h1..h4, and output y1.]

Can We Train a Network To Do It?
From x1, x2, one pair of hidden units per line:
$\max(\mathbf{w}_1^T\mathbf{x} + b, 0) + \max(-(\mathbf{w}_1^T\mathbf{x} + b), 0)$ = distance to line defined by $\mathbf{w}_1$
$\max(\mathbf{w}_2^T\mathbf{x} + b, 0) + \max(-(\mathbf{w}_2^T\mathbf{x} + b), 0)$ = distance to line defined by $\mathbf{w}_2$
[Figure: scatter plots of + and - points, one per hidden unit and one per summed distance.]

Can We Train a Network To Do It?
From x1, x2: distance to $\mathbf{w}_1$, distance to $\mathbf{w}_2$.
Next layer computes: (distance to $\mathbf{w}_1$) - (distance to $\mathbf{w}_2$) > 0
[Figure: scatter plots of + and - points for each distance and for the final decision.]

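The hand-built network described above can be written out directly. This is a sketch with made-up lines (the two coordinate axes) rather than the lines in the slide's figure; `relu`, `abs_distance`, and `classify` are illustrative names, and the weights are assumed to have unit norm so that the values are true distances.

```python
def relu(v):
    return max(v, 0.0)


def abs_distance(w, b, x):
    """|w.x + b| built from two ReLU hidden units: relu(z) + relu(-z)."""
    z = w[0] * x[0] + w[1] * x[1] + b
    return relu(z) + relu(-z)


def classify(x, line1, line2):
    """Next layer: True when x is farther from line 1 than from line 2."""
    d1 = abs_distance(*line1, x)
    d2 = abs_distance(*line2, x)
    return d1 - d2 > 0


line1 = ((1.0, 0.0), 0.0)  # the line x1 = 0
line2 = ((0.0, 1.0), 0.0)  # the line x2 = 0
for point in [(2.0, 0.5), (-2.0, 0.5), (0.5, 2.0), (0.5, -2.0)]:
    print(point, classify(point, line1, line2))
```

With these two lines the positive region is |x1| > |x2|, which no single line can separate, yet four ReLU units and one output unit handle it.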
Can We Train a Network To Do It?
Result: feedforward neural networks with a finite number of neurons in a hidden layer can approximate any reasonable* function.
Cybenko (1989) for neural networks with sigmoids; Hornik (1991) more generally.
In practice, this doesn't give a practical guarantee. Why?
*Continuous, with bounded domain.

Developing Intuitions
"There is no royal road to geometry." (Euclid)
• Best way: play with data, be skeptical of everything you do, be skeptical of everything you are told
• Remember: this is linear algebra, not magic
• Common technique: How would you set the weights by hand if you were forced to be a deep net?

Parameters
How many parameters does this network have?
[Figure: inputs x1, x2 connected directly to output y1.]
Weights: 1x2
Parameters: 3 (bias!)

Parameters
How many parameters does this network have?
[Figure: inputs x1, x2; hidden units h1..h4; output y1.]
Weights: 1x4+4x2 = 12
Parameters: 12+5 = 17

Parameters
How many parameters does this network have?
[Figure: inputs x1, x2; hidden layers a1..a4 and h1..h4; outputs y1..y3.]
Weights: 3x4+4x4+4x2 = 36
Parameters: 36+11 = 47

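The three counts above follow one rule: each pair of adjacent layers contributes (inputs x outputs) weights, and every non-input neuron contributes one bias. A small sketch, with an illustrative function name:

```python
def count_parameters(layer_sizes):
    """Return (weights, biases, total) for a fully connected network.

    layer_sizes lists the number of units per layer, input first.
    """
    weights = sum(n_in * n_out
                  for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])  # one bias per non-input neuron
    return weights, biases, weights + biases


print(count_parameters([2, 1]))        # 2 inputs, 1 output
print(count_parameters([2, 4, 1]))     # one hidden layer of 4
print(count_parameters([2, 4, 4, 3]))  # two hidden layers of 4, 3 outputs
```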
Parameters
[Figure: make Px1 vector x, then three layers of H neurons each, then a layer of O neurons.]
Parameters per layer: H*P + H, H*H + H, H*H + H, O*H + O
P: 285x350 picture (terrible!), H: 1000, O: 3
102 million parameters (400MB)

Parameters
[Figure: make Px1 vector x, feeding a layer of H neurons.]
• First layer converts all visual information into a single N dimensional vector.
• Suppose you want a neuron to represent dx/dy at each pixel. How many neurons do you need?
• 2P!

Parameters
[Figure: make Px1 vector x, then three layers of H neurons each, then a layer of O neurons.]
Parameters per layer: H*P + H, H*H + H, H*H + H, O*H + O
P: 285x350, H: 2P, O: 3
100 billion parameters (400GB)

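The two headline numbers can be reproduced from the per-layer formulas. The sketch below assumes 4 bytes per parameter (32-bit floats), which is consistent with the sizes quoted on the slides but is not stated there; the function name is illustrative.

```python
def fc_image_net_parameters(P, H, O):
    """Px1 input, three hidden layers of H neurons, O outputs."""
    return (H * P + H) + 2 * (H * H + H) + (O * H + O)


P = 285 * 350  # number of pixels in the picture
small = fc_image_net_parameters(P, 1000, 3)
large = fc_image_net_parameters(P, 2 * P, 3)

BYTES_PER_PARAMETER = 4  # assumption: 32-bit floats
print(small, small * BYTES_PER_PARAMETER / 1e6, "MB")  # about 102 million parameters
print(large, large * BYTES_PER_PARAMETER / 1e9, "GB")  # about 100 billion parameters
```

Almost all of the cost comes from the H*P and H*H terms, which is what motivates replacing the matrix multiply with something local.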
Convnets Keep Spatial Resolution Around
Neural net: Data: vector Fx1. Transform: matrix-multiply. (Make Px1 vector x.)
Convnet: Data: image HxWxF. Transform: convolution. (Keep image dims.)

Convnet
[Figure: image volumes with axes Height, Width, Depth. Examples: Height: 300, Width: 500, Depth: 3; Height: 32, Width: 32, Depth: 3.]

Convnet
Fully connected: connects to everything.
Convnet: connects locally.
[Figure: a neuron attached to a 32x32x3 volume, once connected to every entry and once to a local patch.]
Slide credit: Karpathy and Fei-Fei

Convnet
Neuron is the same: weighted linear average
$$\sum_{i=1}^{F_h} \sum_{j=1}^{F_w} \sum_{k=1}^{c} F_{i,j,k} \cdot I_{y+i,x+j,k}$$
[Figure: neuron attached to an F_h x F_w patch of a 32x32x3 volume.]
Slide credit: Karpathy and Fei-Fei

Convnet
Neuron is the same: weighted linear average
$$\sum_{i=1}^{F_h} \sum_{j=1}^{F_w} \sum_{k=1}^{c} F_{i,j,k} \cdot I_{y+i,x+j,k}$$
Filter is local in space: sum only over F_h x F_w pixels.
Filter is global over channels/depth: sum over all channels.
[Figure: neuron attached to an F_h x F_w patch of a 32x32x3 volume.]
Slide credit: Karpathy and Fei-Fei

Convnet
Get spatial output by sliding filter over image
$$\sum_{i=1}^{F_h} \sum_{j=1}^{F_w} \sum_{k=1}^{c} F_{i,j,k} \cdot I_{y+i,x+j,k}$$
[Figure: F_h x F_w filter sliding over a 32x32x3 volume.]
Slide credit: Karpathy and Fei-Fei

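The triple sum and the sliding step can be spelled out with nested loops. This is a deliberately slow sketch on nested lists, using 0-based indices instead of the formula's 1-based ones; the function names and the tiny all-ones example are made up for illustration.

```python
def filter_response(image, filt, y, x):
    """Sum of F[i][j][k] * I[y+i][x+j][k] over the filter window."""
    total = 0.0
    for i in range(len(filt)):                # local in space: F_h rows
        for j in range(len(filt[0])):         # local in space: F_w columns
            for k in range(len(filt[0][0])):  # global in depth: all c channels
                total += filt[i][j][k] * image[y + i][x + j][k]
    return total


def slide_filter(image, filt):
    """Evaluate the filter at every position where it fits inside the image."""
    H, W = len(image), len(image[0])
    Fh, Fw = len(filt), len(filt[0])
    return [[filter_response(image, filt, y, x) for x in range(W - Fw + 1)]
            for y in range(H - Fh + 1)]


image = [[[1.0, 1.0] for _ in range(5)] for _ in range(4)]  # 4x5 image, 2 channels
filt = [[[1.0, 1.0] for _ in range(3)] for _ in range(2)]   # 2x3 filter, 2 channels
out = slide_filter(image, filt)
print(len(out), len(out[0]), out[0][0])
```

The output is a single-channel map: one number per filter position, each a sum over the full depth of the input.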
Differences From Lecture 4 Filtering
(a) #input channels can be greater than one
(b) forget you learned the difference between convolution and cross-correlation
Output[1,2] = I[1,2]*F[1,1] + I[1,3]*F[1,2] + … + I[3,4]*F[3,3]
[Figure: 5x6 input grid I11..I56 with a 3x3 filter F11..F33 overlaid on rows 1-3.]

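Point (b) says the filter is applied without flipping. The sketch below evaluates the Output[1,2] expression (0-based in the code) and contrasts it with a true convolution, which up to an index shift is the same sum with the filter rotated by 180 degrees. The grid values are chosen so that entry Irc holds the number 10*r + c; all names are illustrative.

```python
def cross_correlate_at(I, F, r, c):
    """Output[r][c] = sum over i, j of I[r+i][c+j] * F[i][j], with no filter flip."""
    return sum(I[r + i][c + j] * F[i][j]
               for i in range(len(F)) for j in range(len(F[0])))


def flip(F):
    """Rotate the filter by 180 degrees."""
    return [row[::-1] for row in F[::-1]]


def convolve_at(I, F, r, c):
    """Convolution over the same window: cross-correlate with the flipped filter."""
    return cross_correlate_at(I, flip(F), r, c)


# 5x6 grid whose entries spell their own 1-based labels: I11 = 11, ..., I56 = 56.
I = [[10 * (r + 1) + (c + 1) for c in range(6)] for r in range(5)]
F = [[1, 0, 0],
     [0, 0, 0],
     [0, 0, 0]]  # picks out whichever input entry sits under F[1,1]

print(cross_correlate_at(I, F, 0, 1))  # slide's Output[1,2]: picks I12
print(convolve_at(I, F, 0, 1))         # flipped filter picks I34 instead
```

Because the filter weights are learned, the flip makes no practical difference, which is why the distinction can be dropped here.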
Convnet
How big is the output?
Height? 32-5+1=28
Width? 32-5+1=28
Channels? 1
One filter not very useful by itself.
[Figure: 5x5 filter on a 32x32x3 volume.]
Slide credit: Karpathy and Fei-Fei

Multiple Filters
You've already seen this before.
Input: 400x600x1
Output: 400x600x2

Convnet
Multiple out channels via multiple filters.
How big is the output?
Height? 32-5+1=28
Width? 32-5+1=28
Channels? 200
[Figure: 5x5 filters on a 32x32x3 volume; the 200 filter outputs are stacked along the depth dimension.]
Slide credit: Karpathy and Fei-Fei

Convnet
Multiple out channels via multiple filters.
How big is the output?
Height? 32-5+1=28
Width? 32-5+1=28
Channels? 200
[Figure: 5x5 filters on a 32x32x3 volume.]
Slide credit: Karpathy and Fei-Fei

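The output-size arithmetic generalizes to any input and filter size. The sketch below also counts parameters under the assumption of one bias per filter, which the slides do not spell out; both function names are illustrative.

```python
def conv_output_shape(height, width, filter_h, filter_w, num_filters):
    """Output (height, width, channels) with no padding and stride 1."""
    return (height - filter_h + 1, width - filter_w + 1, num_filters)


def conv_parameters(filter_h, filter_w, in_channels, num_filters):
    """Each filter spans the full input depth; assumes one bias per filter."""
    return num_filters * (filter_h * filter_w * in_channels + 1)


print(conv_output_shape(32, 32, 5, 5, 1))    # one filter
print(conv_output_shape(32, 32, 5, 5, 200))  # 200 filters
print(conv_parameters(5, 5, 3, 200))         # parameters in the 200-filter layer
```

The parameter count depends on the filter size and channel counts but not on the image size, in contrast to the fully connected H*P term.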
Convnet, Summarized
Neural net: series of matrix-multiplies parameterized by W, b + nonlinearity/activation. Fit by gradient descent.
Convnet: series of convolutions parameterized by F, b + nonlinearity/activation. Fit by gradient descent.

One Additional Subtlety: Stride
Warmup: how big is the output spatially?
Normal (Stride 1): 5x5 output
[Figure: 7x7 input grid I11..I77 with a 3x3 filter F11..F33 over rows 1-3, columns 1-3.]
Example credit: Karpathy and Fei-Fei

One Additional Subtlety: Stride
Stride: skip a few (here 2)
Normal (Stride 1): 5x5 output
[Figure: 7x7 input grid I11..I77 with a 3x3 filter F11..F33 over rows 1-3, columns 1-3.]
Example credit: Karpathy and Fei-Fei

One Additional Subtlety: Stride
Stride: skip a few (here 2)
Normal (Stride 1): 5x5 output
[Figure: 7x7 input grid I11..I77 with the 3x3 filter F11..F33 moved to rows 1-3, columns 3-5.]
Example credit: Karpathy and Fei-Fei

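The effect of stride on output size can be checked with a short calculation. This sketch assumes no padding and a square 7x7 input with a 3x3 filter, as in the figure; the function names are illustrative.

```python
def strided_output_size(input_size, filter_size, stride=1):
    """Number of filter positions along one dimension, with no padding."""
    return (input_size - filter_size) // stride + 1


def window_starts(input_size, filter_size, stride=1):
    """0-based index of the first row/column covered at each position."""
    return list(range(0, input_size - filter_size + 1, stride))


print(strided_output_size(7, 3, stride=1), window_starts(7, 3, stride=1))
print(strided_output_size(7, 3, stride=2), window_starts(7, 3, stride=2))
```

With stride 2 the filter lands on columns 1-3, 3-5, and 5-7 (1-based), so each dimension of the output shrinks from 5 to 3.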