 
              Learning Face Recognition from Limited Training Data using Deep Neural Networks Xi Peng * , Nalini Ratha ** , Sharathchandra Pankanti ** * Rutgers University, New Jersey, 08854 ** IBM T. J. Watson Research Center, New York 10598
Motivation Challenges of Face Recognition Occlusion Low resolution Pose Angelina Jolie Illumination Decoration Chronology Expression
Motivation Challenges of Face Verification: inter-class variations < intra-class variation Impostor pair (different person) Genuine pair (same person) 1 0 • Pose • Expression • Illumination • Occlusion 0 1 0 1
Motivation Limited Training Data Private Dataset Group Approach #subjects #images Accuracy on LFW SFC Facebook DeepFace/2014 4,030 4,400,000 97.35% WebFace Google FaceNet/2015 8,000,000 200,000,000 98.87% Public Dataset #subjects #images LFW 5,749 13,233 PubFig 200 58,797 DNNs are extremely data-thirsty. However, limited training data is available. FaceScrub 530 107,818 CACD 2,000 163,446 CASIA 10,575 494,414
Motivation Existing Solutions • Normalize viewpoints using 3D face models [Hassner et. al. CVPR 2015] • Sensitive to noise: incorrect landmark detection —> artifacts; • Computational expensive: external module —> inefficient;
Motivation Existing Solutions • Normalize viewpoints using 3D face models [Hassner et. al. CVPR 2015] • Sensitive to noise: incorrect landmark detection —> artifacts; • Computational expensive: external module —> inefficient; • Pose-aware networks [Masi et. al. CVPR 2016] Require multiple networks: • complicate system • training expensive • end-to-end training
Our Approach Overview 224x224x3 100x100x3 recognition face network identity affine alignment parameter network
Our Approach Alignment Network 224x224x3 100x100x3 recognition face network identity affine alignment parameter network
Our Approach Recognition Network 224x224x3 100x100x3 recognition face network identity affine alignment network parameter
Our Approach 224x224x3 Alignment Network Architecture conv 7x7x128 max pooling 2x2 conv 5x5x128 max pooling 2x2 conv 3x3x128 max pooling 2x2 fully connected 256 alignment fully connected 64 network 6-affine parameter parameter affine recognition network identity face
Our Approach 100x100x3 224x224x3 Alignment Network Architecture conv 7x7x128 max pooling 2x2 conv 5x5x128 max pooling 2x2 conv 3x3x128 max pooling 2x2 fully connected 256 alignment fully connected 64 network 6-affine parameter parameter affine recognition network identity face
Our Approach 100x100x3 convolution module 1 Recognition Network Architecture max pooling 2x2 convolution module 2 max pooling 2x2 inception module 3 max pooling 2x2 inception module 4 max pooling 2x2 alignment network inception module 5 parameter fully connected 256 affine contrastive loss softmax loss recognition network identity face
Our Approach conv 1a: 3x3 100x100x3 ReLU LRN conv 1b: 3x3 ReLU LRN Recognition Network: Convolution Module convolution module 1 max pooling 2x2 convolution module 2 conv 2a: 3x3 max pooling 2x2 ReLU LRN inception module 3 conv 2a: 3x3 max pooling 2x2 ReLU LRN inception module 4 max pooling 2x2 inception module 5 fully connected 256 contrastive loss softmax loss
Our Approach 100x100x3 convolution Recognition Network: Inception Module incept3a module 1 depth concat max pooling 2x2 incept3b convolution module 2 depth concat max pooling 2x2 inception module 3 incept4a max pooling 2x2 depth concat inception module 4 incept4b max pooling 2x2 depth concat inception module 5 fully connected 256 incept5a contrastive loss softmax loss depth concat incept5b depth concat
Our Approach 100x100x3 convolution Recognition Network: Inception Module incept3a module 1 depth concat max pooling 2x2 incept3b convolution conv 1x1 pool 3x3 module 2 depth concat max pooling 2x2 conv 5x5 conv 1x1 previous layer depth concat inception module 3 conv 3x3 conv 1x1 incept4a max pooling 2x2 conv 1x1 depth concat inception module 4 incept4b max pooling 2x2 depth concat inception module 5 fully connected 256 incept5a contrastive loss softmax loss depth concat incept5b depth concat
Our Approach Recognition Network: Training Loss softmax loss fully connected 256 max pooling 2x2 max pooling 2x2 max pooling 2x2 max pooling 2x2 100x100x3 convolution convolution inception inception inception module 1 module 2 module 3 module 4 module 5 contrastive loss identity label softmax loss contrastive loss fully connected 10575 fully connected 256 fully connected 256 all genuine pairs, top-k closest impostor pairs fully connected 256
Our Approach Training Strategy • Sufficient training images —> capable of end-to-end training;
Our Approach Training Strategy • Sufficient training images —> capable of end-to-end training; • Speed up the training using limited data —> two-step training:
Our Approach Training Strategy • Sufficient training images —> capable of end-to-end training; • Speed up the training using limited data —> two-step training: • Step 1, 100x100x3 2D aligned images —> pre-train the recognition network; 224x224x3 100x100x3 landmark detection landmark centers face detection 2D alignment mirroring
Our Approach Training Strategy • Sufficient training images —> capable of end-to-end training; • Speed up the training using limited data —> two-step training: • Step 1, 100x100x3 2D aligned images —> pre-train the recognition network; 224x224x3 100x100x3 landmark detection landmark centers face detection 2D alignment mirroring • Step 2, 224x224x3 unaligned images —> fine-tune both the alignment and recognition networks; recognition alignment loss network network
Experiments Comparison of Different Distance Metric
Experiments Comparison of Different Distance Metric • Cosine distance is more robust to scale and rotation than the other metrics
Experiments Comparison of Different Data Pre-processing • The external 3D alignment: • significant performance degradation on testing;
Experiments Comparison of Different Data Pre-processing • The external 3D alignment: • significant performance degradation on testing; • artifacts <— landmark localization fails in challenging cases;
Experiments Comparison of Different Data Pre-processing • The external 3D alignment: • significant performance degradation on testing; • artifacts <— landmark localization fails in challenging cases; • The proposed alignment network: • better generalization ability <— jointly optimized;
Experiments Comparison of Different Data Pre-processing • The external 3D alignment: • significant performance degradation on testing; • artifacts <— landmark localization fails in challenging cases; • The proposed alignment network: • better generalization ability <— jointly optimized; • 27 ms per image, 100x times faster <— end-to-end network;
Experiments Comparison with State-of-the-arts • Our training dataset is significantly smaller: • images: 0.4 million v.s. 200 million;
Experiments Comparison with State-of-the-arts • Our training dataset is significantly smaller: • images: 0.4 million v.s. 200 million; • subjects: 10,575 v.s. 8 million;
Experiments Comparison with State-of-the-arts • Our training dataset is significantly smaller: • images: 0.4 million v.s. 200 million; • subjects: 10,575 v.s. 8 million; • Strong performance: • comparable accuracy: 96.6% v.s. 97.3%; • more efficient: no external modules for alignment ;
Summary • An end-to-end framework to jointly learn alignment and recognition tasks; • Comparable verification accuracy using much less training images; • Highly efficient deploying in testing time;
Recommend
More recommend