CSE 802 Spring 2017
Deep Learning
Inci M. Baytas
Michigan State University
February 13-15, 2017
Deep Learning in Computer Vision
● Large-scale Video Classification with Convolutional Neural Networks, CVPR 2014

Deep Learning in Computer Vision
● Microsoft Deep Learning Semantic Image Segmentation
Deep Learning in Computer Vision
● NeuralTalk and Walk: object recognition and text description of the image while walking
Deep Learning in Robotics
● Self-driving cars
Deep Learning in Robotics
● Deep sensorimotor learning
Other Applications of Deep Learning
● Natural Language Processing (NLP)
● Speech recognition and machine translation

Why Should We Be Impressed?
● Automated vision (e.g., object recognition) is challenging: different viewpoints, scales, occlusions, illumination, ...
● Robotics (e.g., autonomous driving) in real-life environments (constantly changing, new tasks without guidance, unexpected factors) is challenging
● NLP (e.g., understanding human conversations) is an extremely complex task: noise, context, partial sentences, different accents, ...
Why Is Deep Learning So Popular Now?
• Better hardware
• Bigger data
• Regularization methods (dropout)
• Variety of optimization methods
  • SGD
  • Adagrad
  • Adadelta
  • ADAM
  • RMSProp
Criticism and Limitations of Deep Networks
• Large amount of data required for training
• High-performance computing a necessity
• No guarantee of optimality (training is non-convex)
• Learned models are task specific
• Lack of theoretical understanding
Common Deep Network Types
● Feed-forward networks
● Convolutional neural networks
● Recurrent neural networks
Components of Deep Learning
Loss functions
● Squared loss: (y - f(x))^2
● Logistic loss: log(1 + e^(-y f(x)))
● Hinge loss: max(0, 1 - y f(x))
● Squared hinge loss: max(0, 1 - y f(x))^2
Non-linear activation functions
● Linear
● Tanh
● Sigmoid
● Softmax
● ReLU
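As an illustration (not from the slides), the listed loss functions can be written out for a single example with label y in {-1, +1} (a real-valued target for the squared loss) and model score f(x); a minimal NumPy sketch:

```python
import numpy as np

def squared_loss(y, fx):
    return (y - fx) ** 2

def logistic_loss(y, fx):
    return np.log(1.0 + np.exp(-y * fx))

def hinge_loss(y, fx):
    # zero loss once the example is on the correct side of the margin
    return max(0.0, 1.0 - y * fx)

def squared_hinge_loss(y, fx):
    return max(0.0, 1.0 - y * fx) ** 2

print(hinge_loss(+1, 0.3))   # 0.7: correct sign, but still inside the margin
```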
Components of Deep Learning
Optimizers
● Gradient Descent
● Adagrad (Adaptive Gradient Algorithm)
● Adadelta (An Adaptive Learning Rate Method)
● ADAM (Adaptive Moment Estimation)
● RMSProp
Regularization Methods
● L2 norm
● L1 norm
● Dataset augmentation
● Noise robustness
● Early stopping
● Dropout [12]
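A minimal sketch of how these choices look in code, assuming TensorFlow 1.x (the version current when these slides were written); the weight matrix, input placeholder, and stand-in loss are hypothetical, not part of the course code:

```python
import tensorflow as tf

w = tf.Variable(tf.random_normal([100, 10]))   # hypothetical weight matrix
x = tf.placeholder(tf.float32, [None, 100])    # hypothetical input batch
h = tf.nn.relu(tf.matmul(x, w))                # hidden activations

h_drop = tf.nn.dropout(h, keep_prob=0.5)       # dropout regularization [12]
data_loss = tf.reduce_mean(tf.square(h_drop))  # stand-in for a real loss
loss = data_loss + 1e-4 * tf.nn.l2_loss(w)     # L2-norm penalty on the weights

# Any optimizer listed above can be swapped in here:
train_op = tf.train.AdamOptimizer(learning_rate=1e-3).minimize(loss)
# train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
# train_op = tf.train.RMSPropOptimizer(0.001).minimize(loss)
```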
Components of Deep Learning
Number of iterations
● Too few iterations: may underfit
● Too many iterations: use a stopping criterion
Step size
● Very large step size: may miss the optimal point
● Very small step size: takes longer to converge
Parameter initialization
● Initializing with zeros
● Random initialization
● Xavier initialization
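A minimal NumPy sketch of Xavier initialization for one dense layer; the layer sizes (784 inputs, 256 units) are illustrative, not from the slides:

```python
import numpy as np

def xavier_init(fan_in, fan_out):
    # Draw weights uniformly from [-limit, limit] so the variance of the
    # activations stays roughly constant from layer to layer.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return np.random.uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_init(784, 256)   # e.g., first hidden layer of a 784-input network
b = np.zeros(256)           # biases are typically initialized to zero
```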
Components of Deep Learning
Batch size
● Bigger batch size: might require fewer iterations
● Smaller batch size: will need more iterations
Number of layers
● More layers (more depth): more non-linearity, more complexity, more parameters
● Too many layers might cause overfitting
Number of hidden units
● More hidden units: more model complexity, can approximate a more complex classifier
● Too many parameters: overfitting, increased training time
Convolutional Neural Networks
• Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers [1].
Convolution:
• A linear operator
• Cross-correlation with a flipped kernel
• Convolution in the spatial domain corresponds to multiplication in the frequency domain
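A minimal NumPy sketch of this definition: flipping the kernel in both spatial dimensions and then sliding it over the image (cross-correlation) gives the "valid" 2-D convolution, which can be checked against scipy.signal.convolve2d:

```python
import numpy as np
from scipy.signal import convolve2d   # used only to check the hand-rolled version

def conv2d(image, kernel):
    """'Valid' 2-D convolution = cross-correlation with a flipped kernel."""
    flipped = kernel[::-1, ::-1]                       # flip rows and columns
    H, W = image.shape
    kH, kW = flipped.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * flipped)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)
k = np.array([[0., 1.], [2., 3.]])
print(np.allclose(conv2d(img, k), convolve2d(img, k, mode='valid')))   # True
```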
Convolutional Neural Networks (CNNs)
• Feed-forward networks that can extract topological features from images.
• Can provide invariance to geometric distortions such as translation, scaling, and rotation.
• Hierarchical and robust feature extraction was done before CNNs, but with predefined (hand-crafted) filters.
• CNN is data-driven:
  • Parameters of the filters are learned from the data instead of using predefined values.
  • At each iteration, the parameters are updated to minimize the loss.
Convolution Layer
• Local (sparse) connectivity
  • Reduces memory requirements
  • Fewer operations
• Parameter sharing
  • Same kernel used at every position of the input
  • Equivariance property
• How to choose the filter size?
  • Receptive field
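To see why local connectivity and parameter sharing matter, here is a back-of-the-envelope parameter count for an illustrative 32x32x3 input (the sizes are assumptions, not from the slides):

```python
# Parameter count: fully-connected layer vs. convolution layer on a 32x32x3 input.
H, W, C = 32, 32, 3
hidden_units = 1024

fully_connected = (H * W * C) * hidden_units   # every pixel connected to every unit
print(fully_connected)                         # 3,145,728 weights

k, num_filters = 5, 32                         # 5x5 kernels, shared across positions
conv_layer = (k * k * C) * num_filters
print(conv_layer)                              # 2,400 weights
```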
Pooling Layer (Subsampling)
• Convolution stage: several convolutions in parallel to produce a set of linear activations
• Followed by a non-linear activation
• Then the pooling layer:
  • Invariance to small translations
  • Dealing with variable-size inputs
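A minimal sketch of this convolution -> non-linearity -> pooling sequence, assuming TensorFlow 1.x; the input size (28x28 grayscale) and filter shape (32 filters of size 5x5) are illustrative:

```python
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 28, 28, 1])              # batch of images
W = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))
b = tf.Variable(tf.zeros([32]))

conv = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') + b   # convolution stage
act = tf.nn.relu(conv)                                                 # non-linear activation
pool = tf.nn.max_pool(act, ksize=[1, 2, 2, 1],                         # 2x2 max pooling
                      strides=[1, 2, 2, 1], padding='SAME')            # halves height and width
```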
Fully-Connected Layer
• Maps the latent representation of the input to the output
• Output:
  • One-hot representation of the class label
  • Predicted response
• Appropriate activation function, e.g., softmax for classification
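A minimal NumPy sketch of this output mapping: a fully-connected layer followed by softmax turns the latent feature vector into class probabilities (the 256-dimensional feature and 10 classes are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)                  # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e)

features = np.random.randn(256)        # latent representation from earlier layers
W = np.random.randn(256, 10) * 0.01    # fully-connected weights
b = np.zeros(10)

probs = softmax(features.dot(W) + b)   # class probabilities, compared to the one-hot label
print(probs.sum())                     # 1.0
```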
Feature Extraction with CNNs
Some Example CNN Architectures
● LeNet-5 [2]
● AlexNet (5 layers)
● VGG-16 [3]
● GoogLeNet (22 layers)
Tricks to Improve CNN Performance
• Data augmentation
  • Flipping (commonly used for faces)
  • Translation
  • Rotation
  • Stretching
• Normalizing, whitening (less redundancy)
• Cropping and alignment (especially for faces)
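A minimal NumPy sketch of two of these augmentations (horizontal flipping and random cropping, i.e., translation) on an H x W x C image array; the sizes are illustrative:

```python
import numpy as np

def random_flip(img):
    # Mirror the image left-right half of the time.
    return img[:, ::-1, :] if np.random.rand() < 0.5 else img

def random_crop(img, out_h, out_w):
    # Take a randomly positioned out_h x out_w window (acts as a small translation).
    h, w, _ = img.shape
    top = np.random.randint(0, h - out_h + 1)
    left = np.random.randint(0, w - out_w + 1)
    return img[top:top + out_h, left:left + out_w, :]

img = np.random.rand(110, 110, 3)              # stand-in for a cropped face image
augmented = random_crop(random_flip(img), 100, 100)
```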
Project
• You will implement the 11-layer CNN architecture proposed in [6] to extract features.
Project
• You can use a deep learning library to implement the network.
• The library will take care of convolution, pooling, dropout, and backpropagation.
• You need to define the cost function and the activation functions.
• The activation function of the output layer is softmax, since this is a classification problem.
• You can use TensorFlow.
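A minimal sketch of the cost function for the softmax output layer, assuming TensorFlow 1.x; the 320-dimensional feature vector is a hypothetical stand-in for the output of the 11-layer network in [6], and the 10,575 classes match the number of CASIA subjects:

```python
import tensorflow as tf

num_classes = 10575                                        # number of CASIA subjects
features = tf.placeholder(tf.float32, [None, 320])         # hypothetical final CNN features
labels = tf.placeholder(tf.float32, [None, num_classes])   # one-hot subject labels

W = tf.Variable(tf.zeros([320, num_classes]))
b = tf.Variable(tf.zeros([num_classes]))
logits = tf.matmul(features, W) + b                        # last fully-connected layer

# Softmax + cross-entropy cost, averaged over the batch.
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(cost)
```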
HPCC
• Data and evaluation protocol are on HPCC: /mnt/research/CSE_802_SPR_17
• To connect to HPCC: ssh msunetid@hpcc.msu.edu, with your MSU email password
• To run small examples, use developer mode: ssh dev-intel14
• Try to log in to HPCC and check the course research space.
• Try to use a Python IDE (PyCharm). Debug your code and understand how TensorFlow works (if you are not familiar with a deep learning library).
CASIA Dataset (Cropped Images)
• The database contains 494,414 images of 10,575 subjects in total.
• We provide cropped and original images under /mnt/research/CSE_802_SPR_17
Test Data and Evaluation Protocol
● Final evaluation on the Labeled Faces in the Wild (LFW) database [7]: 13,233 images, 5,749 subjects.
● Evaluation protocol: BLUFR protocol [8]; found under /mnt/research/CSE_802_SPR_17
References
1. http://www.deeplearningbook.org/
2. http://yann.lecun.com/exdb/lenet/
3. https://www.cs.toronto.edu/~frossard/post/vgg16/
4. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," NIPS 2012: Neural Information Processing Systems, Lake Tahoe, Nevada.
5. http://pubs.sciepub.com/ajme/2/7/9/
6. D. Yi, Z. Lei, S. Liao, and S. Z. Li, "Learning Face Representation from Scratch," arXiv:1411.7923v1 [cs.CV], 2014.
7. http://vis-www.cs.umass.edu/lfw/
8. http://www.cbsr.ia.ac.cn/users/scliao/projects/blufr/
9. http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html
10. https://www.nist.gov/programs-projects/face-recognition-grand-challenge-frgc
11. S. Liao, Z. Lei, D. Yi, and S. Z. Li, "A Benchmark Study of Large-scale Unconstrained Face Recognition," IAPR/IEEE International Joint Conference on Biometrics, Clearwater, Florida, USA, Sep. 29 - Oct. 2, 2014.
12. N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting," Journal of Machine Learning Research 15 (2014) 1929-1958.