Performance Evaluation of GANs in a semi-supervised OCR Use Case Florian Wilhelm London, 2018-10-11
Dr. Florian Wilhelm, Principal Data Scientist @ inovex
Special Interests:
• Mathematical Modelling
• Recommendation Systems
• Data Science in Production
• Python Data Stack
• Maintainer of PyScaffold
@FlorianWilhelm · FlorianWilhelm · florianwilhelm.info
Florian Tanten, Master Thesis @ inovex, October 2017 - May 2018
IT-project house for digital transformation:
‣ Agile Development & Management
‣ Web · UI/UX · Replatforming · Microservices
‣ Mobile · Apps · Smart Devices · Robotics
‣ Big Data & Business Intelligence Platforms
‣ Data Science · Data Products · Search · Deep Learning
‣ Data Center Automation · DevOps · Cloud · Hosting
‣ Trainings & Coachings
Using technology to inspire our clients. And ourselves.
inovex offices in Karlsruhe · Cologne · Munich · Pforzheim · Hamburg · Stuttgart. www.inovex.de
Agenda 1. Use Case 2. Text Spotting 3. Data and Pipeline 4. Generative Adversarial Networks 5. Semi-supervised Learning 6. Results
Vehicle Identification Number (VIN)
Unique identifier, like a fingerprint of a vehicle.
Encoded fields: country, manufacturer, details (e.g. flexible fuel vehicles), security code, model year, assembly plant, serial number
https://www.autocheck.com/vehiclehistory/autocheck/en/vinbasics
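The fields named on this slide can be illustrated with a small sketch. It assumes the 17-character North American layout (position 1 country, 2 manufacturer, 3 vehicle type, 4-8 details, 9 security code, 10 model year, 11 assembly plant, 12-17 serial number); other regions use positions 4-9 differently. The function name `split_vin` and the dictionary keys are hypothetical and not taken from the talk.

```python
import re

# Assumption: a VIN has 17 characters, digits and capital letters
# without I, O and Q (they are easily confused with 1 and 0).
VIN_PATTERN = re.compile(r"[A-HJ-NPR-Z0-9]{17}")


def split_vin(vin: str) -> dict:
    """Split a VIN into fields, assuming the North American layout."""
    vin = vin.strip().upper()
    if not VIN_PATTERN.fullmatch(vin):
        raise ValueError(f"not a well-formed VIN: {vin!r}")
    return {
        "country": vin[0],
        "manufacturer": vin[1],
        "vehicle_type": vin[2],
        "details": vin[3:8],
        "security_code": vin[8],
        "model_year": vin[9],
        "assembly_plant": vin[10],
        "serial_number": vin[11:17],
    }


fields = split_vin("WF0DXXGAKDEJ37385")
```

A format check like this is also a cheap way to reject obviously wrong OCR output before decoding it.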
Use Case
Spotting the vehicle identification number (VIN) in images of vehicle registration documents.
VIN: WF0DXXGAKDEJ37385
VIN-Decoder, information about the car:
Manufacturer: BMW
Model: X3
Year: 2013-03-21
Engine power: 143 PS
Equipment: - Xenon Lights ...
OCR Libraries
- Open source tools
- Commercial software
- PyOCR
OCR with Tesseract „VSSZZZGJZHR03G533“ + ???
Methodology in Text Spotting
Spotting = Detection + Recognition
Character detection & extraction:
- Computer Vision Tools: Connected components, Stroke width transform, Edge detection
- Sliding Window: SVM, Learning with HOG, CNN
- Others: Region proposal, Hypotheses CNN pooling, ...
Character recognition (character or word):
- SVM
- Nearest Neighbor
- CNN
- CNN + RNN
High performers in current studies are CNN-based.
Abbreviations: CNN = Convolutional Neural Network, SVM = Support Vector Machine, HOG = Histogram of Oriented Gradients, RNN = Recurrent Neural Network, RL = Reinforcement Learning
Girshick et al. (2014), „Region-Based Convolutional Networks for Accurate Object Detection and Segmentation“
Convolutional Neural Network
- Convolution with 3x3 kernel and stride = 1
- Max pooling with a 2x2 filter and stride = 2
https://en.wikipedia.org/wiki/Convolutional_neural_network; http://intellabs.github.io/ParallelJavaScript/
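The two operations on this slide can be sketched in plain Python. This is a minimal illustration on nested lists, not the implementation used in the talk; the function names `conv2d` and `max_pool2d` are hypothetical. As is usual in CNN libraries, the "convolution" does not flip the kernel.

```python
def conv2d(image, kernel, stride=1):
    """Valid (unpadded) 2D convolution as used in CNNs (no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(
                image[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size block."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A 6x6 input with values 0..35 and a 3x3 kernel of ones.
image = [[6 * r + c for c in range(6)] for r in range(6)]
kernel = [[1, 1, 1] for _ in range(3)]
features = conv2d(image, kernel, stride=1)        # 4x4 feature map
pooled = max_pool2d(features, size=2, stride=2)   # 2x2 after pooling
```

With stride 1 the 3x3 kernel shrinks each side by 2, and the 2x2 pooling with stride 2 then halves it.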
Objectives
Dataset: ~170 images of vehicle registration documents
Text Spotting: image → „XLG0H200NA0A10348“
1. Implementation of a prototype
2. Comparison of classifiers
   a) Supervised method
   b) Semi-supervised method
End-to-End Text Spotting Pipeline
1. Region of Interest Extractor: image depicting only the VIN
2. Sliding window: all windows
3. Character Detector (2 classes): all windows with characters
4. Non Maximum Suppression: only one window per character
5. Character Recognizer (36 classes): X L G 0 H 2 0 0 N A 0 4 1 0 3 4 8
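The sliding window and non-maximum suppression steps can be sketched as follows. This is a minimal sketch under assumptions: windows are `(x, y, width, height)` tuples, the detector scores are made up for illustration, and the greedy IoU-based suppression with a threshold of 0.3 is one common variant, not necessarily the one used in the talk. All function names are hypothetical.

```python
def sliding_windows(img_w, img_h, win_w, win_h, step):
    """Yield all (x, y, w, h) windows that fit into the image."""
    for y in range(0, img_h - win_h + 1, step):
        for x in range(0, img_w - win_w + 1, step):
            yield (x, y, win_w, win_h)


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def non_maximum_suppression(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: keep the best box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return [boxes[i] for i in kept]


windows = list(sliding_windows(img_w=10, img_h=4, win_w=4, win_h=4, step=2))

# Hypothetical detector output: two windows on the same character, one apart.
boxes = [(0, 0, 4, 4), (1, 0, 4, 4), (8, 0, 4, 4)]
scores = [0.9, 0.8, 0.7]
one_per_character = non_maximum_suppression(boxes, scores)
```

In the full pipeline the scores would come from the character detector, and each surviving window would be passed to the character recognizer.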
Small Dataset What to do about that? 1. Data Generation 2. Data Augmentation
Data Augmentation
Original image labeled manually as „0“.
Datasets produced by data augmentation:
- Character Detector (2 classes): labels „character“ and „no character“
- Character Recognizer (36 classes): label „0“
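The slide does not say which transformations were applied. As a purely illustrative sketch, the following assumes small pixel shifts as the augmentation; the function names `shift_image` and `augment` are hypothetical, and a real setup would likely add further transformations.

```python
def shift_image(image, dx, dy, fill=0):
    """Shift a 2D image by (dx, dy) pixels, padding with `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src_x, src_y = x - dx, y - dy
            if 0 <= src_x < w and 0 <= src_y < h:
                out[y][x] = image[src_y][src_x]
    return out


def augment(image, label, max_shift=1):
    """Return shifted copies of one labeled image (assumed augmentation)."""
    samples = []
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            samples.append((shift_image(image, dx, dy), label))
    return samples


original = [[1, 2], [3, 4]]           # stand-in for a character crop
samples = augment(original, label="0")  # 9 samples from 1 labeled image
```

Each manually labeled image yields several training samples with the same label, which is how a few hundred characters can grow into thousands of training images.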
Datasets
170 images of vehicle registration documents, split into a training set (85 images) and a testing set (85 images).
Training sets of classifiers (training set + data augmentation):
- Detector: ~ 42000 images, 2 classes
- Recognizer: ~ 8000 images, 36 classes
Testing sets of classifiers (testing set + data augmentation):
- Detector: ~ 42000 images, 2 classes
- Recognizer: ~ 8000 images, 36 classes
Testing set of pipeline: 85 images
Classifiers
1. Supervised: Convolutional Neural Network (Input, Feature extraction, Classification)
2. Semi-supervised: Generative Adversarial Network (Generator, Discriminator)
Yann LeCun, Director of Facebook AI Research, Prof at NYU: “... (GANs) and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion.”
Ian J. Goodfellow @ Google Brain
Generative Adversarial Network
Generator (G). Goal: Generate images which seem to be realistic.
Discriminator (D). Goal: Differentiate between fake and real images.
Generative Adversarial Network
[Diagram: real images (e.g. A, B, ..., 8, 9) and images from the Generator (G) → Discriminator (D) → „yes“ → Is D correct? Feedback to G: „D classified the generated image as 10% real“]
Mathematical formulation
Objective function: the discriminator calculates a likelihood in [0,1] for an image being real. The objective combines the discriminator output for real images and the discriminator output for fake images.
Training (alternating):
- Maximizing discriminator loss
- Minimizing generator loss
Goodfellow et al. (2014), Generative Adversarial Networks
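The objective function from Goodfellow et al. (2014), which this slide refers to, can be written as follows. Here $D(x) \in [0,1]$ is the discriminator output for a real image $x$, $G(z)$ is the image generated from noise $z$, and $D(G(z))$ is the discriminator output for that fake image.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \big[ \log D(x) \big]
  + \mathbb{E}_{z \sim p_z(z)} \big[ \log \big( 1 - D(G(z)) \big) \big]
```

In alternating training, the discriminator is updated to increase $V(D, G)$ while the generator is updated to decrease it.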
Example of generated images
[Figure: training images, and generated images during the learning process]
Semi-supervised Learning
• Makes use of unlabeled data
• Combines supervised and unsupervised learning
[Diagram: Unsupervised Learning + Supervised Learning = Semi-supervised Learning]
Semi-supervised GAN for Character Detection
[Diagram: real labeled images, real unlabeled images and images from the Generator are fed to the Discriminator]
Character Detector (2 classes): Pretraining of DCNN
Pretraining on manually generated images with CAPTCHA methods, labeled „Character“ and „No character“.
[Plot: accuracy (60% to 100%) vs. size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000). Series: DCNN, DCNN pretrained]
Character Detector (2 classes): Supervised GAN
[Diagram: real labeled images and images from the Generator are fed to the Discriminator; output labels shown: C, C, F]
[Plot: accuracy (60% to 100%) vs. size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000). Series: DCNN, DCNN pretrained, Supervised GAN]
Character Detector (2 classes): Semi-supervised GAN
[Diagram: real labeled images, real unlabeled images and images from the Generator are fed to the Discriminator; output labels shown: C, C, F]
[Plot: accuracy (60% to 100%) vs. size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000). Series: DCNN, DCNN pretrained, Supervised GAN, Semi-supervised GAN]
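How a discriminator can learn from labeled, unlabeled and generated images at once can be sketched as below. The sketch assumes the common formulation by Salimans et al. (2016), in which the discriminator outputs K real classes plus one extra "fake" class; the slides do not confirm that exactly this loss was used, and the function names are hypothetical.

```python
import math

EPS = 1e-12  # guards against log(0)


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_loss(logits, kind, label=None):
    """Loss for one image; logits = K real-class scores + 1 'fake' score.

    kind == "labeled":   real image with known class `label` (0..K-1)
    kind == "unlabeled": real image without label
    kind == "generated": image produced by the generator
    """
    p_fake = softmax(logits)[-1]
    if kind == "labeled":
        # Cross entropy over the K real classes only.
        p_classes = softmax(logits[:-1])
        return -math.log(max(p_classes[label], EPS))
    if kind == "unlabeled":
        # Any real class is fine, it just must not be 'fake'.
        return -math.log(max(1.0 - p_fake, EPS))
    if kind == "generated":
        return -math.log(max(p_fake, EPS))
    raise ValueError(f"unknown kind: {kind!r}")


# Two real classes (character, no character) plus 'fake', all scores equal.
logits = [0.0, 0.0, 0.0]
loss_labeled = discriminator_loss(logits, "labeled", label=0)
loss_unlabeled = discriminator_loss(logits, "unlabeled")
loss_generated = discriminator_loss(logits, "generated")
```

The unlabeled term is what lets real images without labels contribute to training: they only need to be recognized as "not fake".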
Character Recognizer (36 classes)
[Plot: accuracy (0% to 100%) vs. size of labeled training set (36, 72, 108, 200, 300, 400, 600, 800, 1000, 5000, 8000). Inset: the Character Detector plot for comparison. Legend: DCNN, DCNN pretrained, Supervised GAN]
End-to-End Text Spotting Pipeline
85 images → Region of Interest Extractor → Sliding window → Character Detector (2 classes) → Non Maximum Suppression → Character Recognizer (36 classes)
Accuracy = 99.94%
Google Cloud Vision API vs. Our Approach
Input: 85 images, 85 images of VINs
Metric: Levenshtein distance between classification and label, e.g. AYZ33 vs. XYZ321 = 3
- Google Cloud Vision API: ∅ (average) Levenshtein distance = 4.49
- Our approach (Region of Interest Extractor, Sliding window, Character Detector (2 classes), Non Maximum Suppression, Character Recognizer (36 classes)): ∅ (average) Levenshtein distance = 0.011
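The Levenshtein distance used in this comparison counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. A minimal sketch with the standard dynamic programming scheme follows; the function names are hypothetical, and the second pair in the usage example is made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(
                min(
                    prev[j] + 1,               # delete ca
                    cur[j - 1] + 1,            # insert cb
                    prev[j - 1] + (ca != cb),  # substitute or match
                )
            )
        prev = cur
    return prev[-1]


def mean_levenshtein(classifications, labels):
    """Average distance over all (classification, label) pairs."""
    pairs = list(zip(classifications, labels))
    return sum(levenshtein(c, l) for c, l in pairs) / len(pairs)


example = levenshtein("AYZ33", "XYZ321")
average = mean_levenshtein(["AYZ33", "ABC"], ["XYZ321", "ABC"])
```

Unlike plain accuracy, this metric still gives partial credit when only a few characters of a VIN are wrong.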
Key Learnings
• Custom solutions can tremendously outperform off-the-shelf software in a specific use case
• Semi-supervised GANs can be successfully applied in use cases with little data
• With simple data augmentation techniques, having only little data might be enough