
Performance Evaluation of GANs in a semi-supervised OCR Use Case

Florian Wilhelm (inovex), London, 2018-10-11


  1. Performance Evaluation of GANs in a semi-supervised OCR Use Case Florian Wilhelm London, 2018-10-11

  2. Dr. Florian Wilhelm, Principal Data Scientist @ inovex
     Contact: @FlorianWilhelm · FlorianWilhelm · florianwilhelm.info
     Special Interests:
     • Mathematical Modelling
     • Recommendation Systems
     • Data Science in Production
     • Python Data Stack
     • Maintainer of PyScaffold
     Florian Tanten, Master Thesis @ inovex, October 2017 - May 2018

  3. inovex: IT-project house for digital transformation:
     ‣ Agile Development & Management
     ‣ Web · UI/UX · Replatforming · Microservices
     ‣ Mobile · Apps · Smart Devices · Robotics
     ‣ Big Data & Business Intelligence Platforms
     ‣ Data Science · Data Products · Search · Deep Learning
     ‣ Data Center Automation · DevOps · Cloud · Hosting
     ‣ Trainings & Coachings
     Using technology to inspire our clients. And ourselves.
     inovex offices in Karlsruhe · Cologne · Munich · Pforzheim · Hamburg · Stuttgart. www.inovex.de

  4. Agenda 1. Use Case 2. Text Spotting 3. Data and Pipeline 4. Generative Adversarial Networks 5. Semi-supervised Learning 6. Results

  5. Vehicle Identification Number (VIN): Unique identifier, like a fingerprint of a vehicle.
     Parts of a VIN labeled in the diagram: country, manufacturer, details, flexible fuel vehicles, security code, model year, assembly plant, serial number.
     Source: https://www.autocheck.com/vehiclehistory/autocheck/en/vinbasics

  6. Use Case: Spotting the vehicle identification number (VIN) in images of vehicle registration documents.
     VIN: WF0DXXGAKDEJ37385
     VIN-Decoder: Information about the car:
     • Manufacturer: BMW
     • Model: X3
     • Year: 2013-03-21
     • Engine power: 143 PS
     • Equipment: Xenon Lights, ...

  7. OCR Libraries: Open source tools · Commercial software · PyOCR

  8. OCR with Tesseract: "VSSZZZGJZHR03G533" + ???

  9. Agenda 1. Use Case 2. Text Spotting 3. Data and Pipeline 4. Generative Adversarial Networks 5. Semi-supervised Learning 6. Results

  10. Methodology in Text Spotting: Spotting = Detection + Recognition
     Abbreviations: CNN = Convolutional Neural Network, SVM = Support Vector Machine, HOG = Histogram of Oriented Gradients, RNN = Recurrent Neural Network, RL = Reinforcement Learning
     Character detection & extraction:
     • Computer Vision Tools: connected components, stroke width transform, edge detection
     • Character or word Sliding Window: SVM learning with HOG, CNN
     • Others: region proposal, hypotheses pooling, ...
     Character recognition:
     • SVM
     • Nearest Neighbor
     • CNN
     • CNN + RNN (high performer in current studies)
     Source: Girshick et al. (2014), "Region-Based Convolutional Networks for Accurate Object Detection and Segmentation"

  11. Convolutional Neural Network: Convolution with a 3x3 kernel and stride = 1, followed by max pooling with a 2x2 filter and stride = 2.
     Sources: https://en.wikipedia.org/wiki/Convolutional_neural_network; http://intellabs.github.io/ParallelJavaScript/
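
The two operations named on this slide can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions (no padding, a single channel, nested lists instead of tensors); the function names `conv2d` and `max_pool2d` are hypothetical and not taken from the talk.

```python
def conv2d(image, kernel, stride=1):
    """Valid (unpadded) 2D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(
                image[i * stride + a][j * stride + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(image, size=2, stride=2):
    """Keep the maximum of every size x size block."""
    out_h = (len(image) - size) // stride + 1
    out_w = (len(image[0]) - size) // stride + 1
    return [
        [
            max(
                image[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 6x6 "image" with pixel value = row * 6 + column.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
# A 3x3 kernel that simply copies the centre pixel (chosen for easy checking).
kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

features = conv2d(image, kernel, stride=1)     # 6x6 -> 4x4
pooled = max_pool2d(features, size=2, stride=2)  # 4x4 -> 2x2
print(pooled)  # [[14, 16], [26, 28]]
```

A 3x3 kernel with stride 1 shrinks each side by 2 pixels, and 2x2 pooling with stride 2 halves each side, which is why the 6x6 input ends up as a 2x2 feature map.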

  12. Agenda 1. Use Case 2. Text Spotting 3. Data and Pipeline 4. Generative Adversarial Networks 5. Semi-supervised Learning 6. Results

  13. Objectives
     Dataset: ~170 images of vehicle registration documents
     1. Implementation of a prototype for Text Spotting (example output: "XLG0H200NA0A10348")
     2. Comparison of classifiers: a) Supervised method b) Semi-supervised method

  14. End-to-End Text Spotting Pipeline (each step with its output):
     • Region of Interest Extractor: image depicting only the VIN
     • Sliding window: all windows
     • Character Detector (2 classes): all windows with characters
     • Non Maximum Suppression: only one window per character
     • Character Recognizer (36 classes): X L G 0 H 2 0 0 N A 0 4 1 0 3 4 8
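
The sliding-window and non-maximum-suppression steps of this pipeline might look roughly like the following sketch. It assumes one-dimensional windows along the VIN strip, and the detector scores are invented placeholders for the output of the Character Detector; the thresholds and function names are assumptions, not values from the talk.

```python
def sliding_windows(image_width, window_width, step):
    """All (left, right) windows of fixed width along the VIN strip."""
    return [
        (x, x + window_width)
        for x in range(0, image_width - window_width + 1, step)
    ]


def overlap_ratio(a, b):
    """Intersection over union of two 1D intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union


def non_maximum_suppression(windows, scores, threshold=0.5, max_overlap=0.3):
    """Keep the best-scoring window per character, drop overlapping neighbours."""
    # Step 1: discard windows the detector considers "no character".
    candidates = sorted(
        ((s, w) for w, s in zip(windows, scores) if s >= threshold),
        reverse=True,
    )
    # Step 2: greedily keep windows that do not overlap an already kept one.
    kept = []
    for _, window in candidates:
        if all(overlap_ratio(window, k) <= max_overlap for k in kept):
            kept.append(window)
    return sorted(kept)


windows = sliding_windows(image_width=40, window_width=10, step=5)
# Hypothetical "character" probabilities, one per window.
scores = [0.9, 0.6, 0.1, 0.2, 0.8, 0.95, 0.4]
print(non_maximum_suppression(windows, scores))  # [(0, 10), (25, 35)]
```

Each surviving window would then be passed to the Character Recognizer, so that every character in the VIN is classified exactly once.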

  15. Small Dataset: What to do about that? 1. Data Generation 2. Data Augmentation

  16. Data Augmentation: Original image labeled manually as "0". Data augmentation produces two datasets from it:
     • Character Detector (2 classes): labels "character" and "no character"
     • Character Recognizer (36 classes): label "0"
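
The slides do not say which augmentation operations were used. As a purely illustrative sketch, assume that crops shifted slightly around a manually labeled character keep their label, while crops shifted by half a character width or more count as "no character". The function names, the shift limits, and the toy image are all assumptions.

```python
def crop(strip, left, width):
    """Cut a vertical slice of the given width out of a 2D image (list of rows)."""
    return [row[left:left + width] for row in strip]


def augment_character(strip, left, width, label, max_shift=2):
    """Build detector and recognizer samples from one labeled character.

    Small shifts (|shift| <= max_shift) still show the character.
    Large shifts (|shift| >= width // 2) are treated as "no character".
    Shifts in between are skipped as ambiguous.
    """
    detector, recognizer = [], []
    for shift in range(-width, width + 1):
        start = left + shift
        if start < 0 or start + width > len(strip[0]):
            continue  # window would leave the image
        patch = crop(strip, start, width)
        if abs(shift) <= max_shift:
            detector.append((patch, "character"))
            recognizer.append((patch, label))
        elif abs(shift) >= width // 2:
            detector.append((patch, "no character"))
    return detector, recognizer


# Toy 4x30 image; the character "0" is assumed to sit at columns 10..15.
strip = [[0] * 30 for _ in range(4)]
detector, recognizer = augment_character(strip, left=10, width=6, label="0")
print(len(detector), len(recognizer))  # 13 5
```

One labeled character thus yields several samples for both the 2-class detector dataset and the 36-class recognizer dataset, which is how a small set of documents can grow into thousands of training images.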

  17. Datasets: 170 images of vehicle registration documents, split into a training set (85 images) and a testing set (85 images). Data augmentation is applied to each half:
     • Training sets of classifiers: Detector ~42000 images (2 classes), Recognizer ~8000 images (36 classes)
     • Testing sets of classifiers: Detector ~42000 images (2 classes), Recognizer ~8000 images (36 classes)
     • Testing set of pipeline: 85 images

  18. Classifiers
     1. Supervised: Convolutional Neural Network (input, feature extraction, classification)
     2. Semi-supervised: Generative Adversarial Network (generator, discriminator)

  19. Agenda 1. Use Case 2. Text Spotting 3. Data and Pipeline 4. Generative Adversarial Networks 5. Semi-supervised Learning 6. Results

  20. "... (GANs) and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion." Yann LeCun, Director of Facebook AI Research, Prof at NYU
     Pictured: Ian J. Goodfellow @ Google Brain

  21. Generative Adversarial Network
     • Generator (G): Goal: Generate images which seem to be realistic
     • Discriminator (D): Goal: Differentiate between fake and real images

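  22. Generative Adversarial Network (diagram): Real images and images from the Generator (G) are passed to the Discriminator (D). Is D correct? ("yes"). Feedback to G: "D classified the generated image as 10% real".
     Other labels in the diagram: real labeled images; classes A, B, ..., 8, 9, F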

  23. Mathematical formulation
     Objective function: the Discriminator calculates a likelihood in [0,1] for an image being real. The formula is annotated with "Discriminator output for real images" and "Discriminator output for fake images".
     Training (alternating): Maximizing discriminator loss, minimizing generator loss
     Source: Goodfellow et al. (2014), Generative Adversarial Networks
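
The formula itself was an image on the slide and is not part of this transcript. For reference, the minimax objective from the cited paper (Goodfellow et al., 2014), which the annotations on this slide appear to describe, reads:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

Here $D(x)$ is the discriminator output for real images and $D(G(z))$ the discriminator output for generated (fake) images. In alternating training, D is updated to maximize $V$ and G is updated to minimize it.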

  24. Example of generated images (figures): training images, and generated images during the learning process

  25. Agenda 1. Use Case 2. Text Spotting 3. Data and Pipeline 4. Generative Adversarial Networks 5. Semi-supervised Learning 6. Results

  26. Semi-supervised Learning
     • Makes use of unlabeled data
     • Combines supervised and unsupervised learning
     Diagram: Unsupervised Learning, Supervised Learning, Semi-supervised Learning

  27. Semi-supervised GAN for Character Detection (diagram): the Discriminator receives real labeled images, real unlabeled images, and images from the Generator.
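
The slides do not state which semi-supervised GAN variant was used. One common formulation (Salimans et al., 2016, "Improved Techniques for Training GANs") gives the discriminator one extra "fake" output next to the real classes. The sketch below assumes that formulation for the 2-class detector; the class names and the function `discriminator_loss` are illustrative assumptions.

```python
import math

# Assumed output classes: the two real classes plus one extra "fake" class.
CLASSES = ["character", "no character", "fake"]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def discriminator_loss(logits, kind, label=None):
    """Loss of the discriminator for one image, depending on where it came from."""
    probs = softmax(logits)
    p_fake = probs[-1]
    if kind == "labeled":
        # Supervised part: ordinary cross-entropy on the true class.
        return -math.log(probs[CLASSES.index(label)])
    if kind == "unlabeled":
        # Real but unlabeled: any class is fine, as long as it is not "fake".
        return -math.log(1.0 - p_fake)
    if kind == "generated":
        # Generator output: should be recognized as "fake".
        return -math.log(p_fake)
    raise ValueError(f"unknown kind: {kind}")


# An undecided discriminator assigns probability 1/3 to every class.
undecided = [0.0, 0.0, 0.0]
print(discriminator_loss(undecided, "labeled", "character"))  # log(3)
print(discriminator_loss(undecided, "unlabeled"))             # log(1.5)
print(discriminator_loss(undecided, "generated"))             # log(3)
```

Under this formulation the unlabeled images contribute a training signal without needing labels, which is the reason such a model can be attractive when labeled data is scarce.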

  28. Agenda 1. Use Case 2. Text Spotting 3. Data and Pipeline 4. Generative Adversarial Networks 5. Semi-supervised Learning 6. Results

  29. Character Detector (2 classes): Pretraining of DCNN
     Pretraining data: manually generated images with CAPTCHA methods, labeled "Character" and "No character"
     Chart: accuracy (60% to 100%) vs. size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000). Series: DCNN, DCNN pretrained

  30. Character Detector (2 classes): Supervised GAN
     Diagram: the Discriminator receives real labeled images and images from the Generator; output labels C and F
     Chart: accuracy (60% to 100%) vs. size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000). Series: DCNN, DCNN pretrained, Supervised GAN

  31. Character Detector (2 classes): Semi-supervised GAN
     Diagram: the Discriminator receives real labeled images, real unlabeled images, and images from the Generator; output labels C and F
     Chart: accuracy (60% to 100%) vs. size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000). Series: DCNN, DCNN pretrained, Supervised GAN, Semi-supervised GAN

  32. Character Recognizer (36 classes)
     Chart "Character Recognizer": accuracy (0% to 100%) vs. size of labeled training set (36, 72, 108, 200, 300, 400, 600, 800, 1000, 5000, 8000)
     Chart "Character Detector" (for comparison): accuracy (60% to 100%) vs. size of labeled training set
     Series: DCNN, DCNN pretrained, Supervised GAN

  33. End-to-End Text Spotting Pipeline applied to 85 images (1., 2., ..., 85.): Region of Interest Extractor, Sliding window, Character Detector (2 classes), Non Maximum Suppression, Character Recognizer (36 classes).
     Accuracy = 99.94%

  34. Google Cloud Vision API vs. Our Approach, compared on 85 images of VINs
     • Google Cloud Vision API: mean Levenshtein distance = 4.49
     • Our approach (Region of Interest Extractor, Sliding window, Character Detector (2 classes), Non Maximum Suppression, Character Recognizer (36 classes)): mean Levenshtein distance = 0.011
     Levenshtein distance example: classification AYZ33, label XYZ321, distance = 3
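
The Levenshtein distance counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. A minimal dynamic-programming sketch, using the example from this slide, could look as follows; the helper `mean_levenshtein` is an assumed way of averaging over a test set and not necessarily how the reported numbers were computed.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,                        # delete char_a
                    current[j - 1] + 1,                     # insert char_b
                    previous[j - 1] + (char_a != char_b),   # substitute or match
                )
            )
        previous = current
    return previous[-1]


def mean_levenshtein(predictions, labels):
    """Average distance between predicted and true strings."""
    return sum(levenshtein(p, t) for p, t in zip(predictions, labels)) / len(labels)


print(levenshtein("AYZ33", "XYZ321"))  # 3
```

In the slide's example, A is substituted by X, the second 3 is substituted by 2, and a 1 is inserted, which gives a distance of 3.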

  35. Key Learnings
     • Custom solutions can tremendously outperform off-the-shelf software in a specific use case
     • Semi-supervised GANs can be successfully applied in use cases with little data
     • With simple data augmentation techniques, having only little data might be enough
