DEEP STRUCTURED OUTPUT LEARNING FOR UNCONSTRAINED TEXT RECOGNITION
Max Jaderberg, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman
Visual Geometry Group, Department of Engineering Science, University of Oxford, UK
TEXT RECOGNITION
Localized text image as input, character string as output.
Example outputs: DISTRIBUTED, COSTA, DENIM, FOCAL
TEXT RECOGNITION
State of the art: constrained text recognition
‣ word classification [Jaderberg, NIPS DLW 2014]
‣ static N-gram and word language model [Bissacco, ICCV 2013]
Example: APARTMENTS
But what about a random string? Or a new, unmodeled word?
TEXT RECOGNITION
Unconstrained text recognition
‣ e.g. for house numbers [Goodfellow, ICLR 2014], business names, phone numbers, emails, etc.
Random string: RGQGAN323
New, unmodeled word: TWERK
OVERVIEW
• Two models for text recognition [Jaderberg, NIPS DLW 2014]
‣ Character Sequence Model
‣ Bag-of-N-grams Model
• Joint formulation
‣ CRF to construct the graph
‣ Structured output loss
‣ Use back-propagation for joint optimization
• Experiments
‣ Generalize to perform zero-shot recognition
‣ Recover performance when constrained
CHARACTER SEQUENCE MODEL
Deep CNN to encode the image, with a per-character decoder.
5 convolutional layers and 2 fully connected layers, with ReLU and max-pooling.
Feature map sizes: 32⨉100⨉1 → 32⨉100⨉64 → 16⨉50⨉128 → 8⨉25⨉256 → 8⨉25⨉512 → 4⨉13⨉512 → 1⨉1⨉4096 → 1⨉1⨉4096.
23 output classifiers over 37 classes (0-9, a-z, null).
Fixed 32⨉100 input size (distorts aspect ratio).
CHARACTER SEQUENCE MODEL
The CHAR CNN maps the 32⨉100⨉1 input x through the encoding Φ(x) to 23 per-character outputs, char 1 through char 23, each a 1⨉1⨉37 distribution P(c_i | Φ(x)) over 0-9, a-z and null.
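This architecture could be sketched roughly in PyTorch as below. The kernel sizes, padding and pooling placement are assumptions chosen only to reproduce the feature-map sizes listed above, not the paper's exact configuration.

    import torch
    import torch.nn as nn

    class CharSequenceModel(nn.Module):
        """Encodes a 32x100 grayscale word image and applies 23 independent
        character classifiers, each over 37 classes (0-9, a-z, null)."""

        def __init__(self, max_chars=23, num_classes=37):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),     # 32x100x64
                nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),   # 16x50x128
                nn.MaxPool2d(2),
                nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),  # 8x25x256
                nn.Conv2d(256, 512, 3, padding=1), nn.ReLU(),  # 8x25x512
                nn.MaxPool2d(2, ceil_mode=True),
                nn.Conv2d(512, 512, 3, padding=1), nn.ReLU(),  # 4x13x512
            )
            self.fc = nn.Sequential(
                nn.Flatten(),
                nn.Linear(512 * 4 * 13, 4096), nn.ReLU(),      # 1x1x4096
                nn.Linear(4096, 4096), nn.ReLU(),              # 1x1x4096
            )
            # one independent 37-way classifier per character position
            self.char_heads = nn.ModuleList(
                [nn.Linear(4096, num_classes) for _ in range(max_chars)]
            )

        def forward(self, x):
            # x: (batch, 1, 32, 100), aspect ratio already squashed to the fixed size
            h = self.fc(self.features(x))
            # (batch, 23, 37) unnormalised scores; a softmax per position gives P(c_i | Phi(x))
            return torch.stack([head(h) for head in self.char_heads], dim=1)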
BAG-OF-N-GRAMS MODEL
Represent a string by the character N-grams contained within it, e.g. for "spires":
1-grams: s, p, i, r, e
2-grams: sp, pi, ir, re, es
3-grams: spi, pir, ire, res
4-grams: spir, pire, ires
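A small Python helper that enumerates the character N-grams of a word, matching the "spires" example above (the next slide restricts the model to a fixed subset of 10k such N-grams):

    def ngrams(word, max_n=4):
        """Set of character N-grams (1 <= N <= max_n) contained in word."""
        return {word[i:i + n]
                for n in range(1, max_n + 1)
                for i in range(len(word) - n + 1)}

    # ngrams("spires") == {'s', 'p', 'i', 'r', 'e',
    #                      'sp', 'pi', 'ir', 're', 'es',
    #                      'spi', 'pir', 'ire', 'res',
    #                      'spir', 'pire', 'ires'}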
BAG-OF-N-GRAMS MODEL
Deep CNN to encode the image, outputting an N-gram detection vector.
Limited set of 10k modeled N-grams (e.g. a, b, ..., ak, ke, ra, aba, ..., rake, raze).
Same encoder shape as the character model (32⨉100⨉1 → ... → 4⨉13⨉512 → 1⨉1⨉4096 → 1⨉1⨉4096), followed by a 1⨉1⨉10000 detection output.
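As a rough sketch, the detection output could be a single fully connected layer on top of the same 4096-d encoding, with an independent sigmoid per modeled N-gram (treating each N-gram as a separate detection output is an assumption of this sketch):

    import torch.nn as nn

    # Head on top of the 1x1x4096 CNN encoding from the previous sketch:
    # 10,000 independent N-gram detection scores in [0, 1].
    ngram_head = nn.Sequential(
        nn.Linear(4096, 10000),
        nn.Sigmoid(),
    )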
JOINT MODEL
Can we combine these two representations?
The CHAR CNN produces 23 per-character distributions (1⨉1⨉37 each); the NGRAM CNN produces a 1⨉1⨉10000 N-gram detection vector.
JOINT MODEL
Build a graph (CRF) over the character positions, up to the maximum number of characters: the character predictor f(x) provides the per-position terms, and the N-gram predictor g(x) provides higher-order terms on the graph's edges.
Recognition is inference in this graph: w* = arg max_w S(w, x), computed with beam search.
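A simplified beam-search sketch for this arg max is given below, assuming f is an array of 37 character scores per position and g is a dictionary of scores for the modeled N-grams. The pruning width and the handling of nulls are assumptions; a full implementation would also forbid non-null characters after a null.

    import string

    CHARS = string.digits + string.ascii_lowercase  # 36 symbols; index 36 = null
    NULL = 36

    def extension_score(prefix, c, pos, f, g, max_n=4):
        """Score gained by choosing character index c at position pos: the per-position
        term f[pos][c] plus g[.] for every modeled N-gram the string now ends with."""
        s = f[pos][c]
        if c != NULL:
            text = prefix + CHARS[c]
            for n in range(1, min(max_n, len(text)) + 1):
                s += g.get(text[-n:], 0.0)
        return s

    def beam_search(f, g, beam_width=5):
        """Approximate w* = arg max_w S(w, x) over the path graph."""
        beams = [("", 0.0)]  # (decoded prefix, accumulated score)
        for pos in range(len(f)):
            candidates = []
            for prefix, score in beams:
                for c in range(37):
                    ext = prefix if c == NULL else prefix + CHARS[c]
                    candidates.append((ext, score + extension_score(prefix, c, pos, f, g)))
            # keep only the top-scoring partial decodings
            beams = sorted(candidates, key=lambda t: t[1], reverse=True)[:beam_width]
        return max(beams, key=lambda t: t[1])[0]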
STRUCTURED OUTPUT LOSS
The score of the ground-truth word should be greater than or equal to the score of the highest-scoring incorrect word plus a margin:
S(w_gt, x) ≥ μ + max_{w ≠ w_gt} S(w, x),
where S(w, x) accumulates the character scores f(x) along the path for w and the N-gram scores g(x) for the N-grams contained in w.
Enforcing this as a soft constraint leads to a hinge loss:
L(x, w_gt) = max(0, μ + max_{w ≠ w_gt} S(w, x) - S(w_gt, x)).
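A minimal sketch of this hinge loss, assuming the scores of the ground-truth word and of the highest-scoring incorrect word (e.g. found by the beam search above) have already been computed; the margin value here is a placeholder:

    def structured_hinge_loss(score_gt, score_best_incorrect, margin=1.0):
        """max(0, margin + S(w_best_incorrect, x) - S(w_gt, x)); zero once the
        ground-truth word beats every other word by at least the margin."""
        return max(0.0, margin + score_best_incorrect - score_gt)

Because S(w, x) is a sum of CNN outputs selected by w, the loss is subdifferentiable in the outputs of both networks, so its gradient can be back-propagated to train the CHAR and NGRAM CNNs jointly.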
EXPERIMENTS
DATASETS
All models trained purely on synthetic data [Jaderberg, NIPS DLW 2014]:
‣ Font rendering
‣ Border/shadow & color
‣ Composition
‣ Projective distortion
‣ Natural image blending
Realistic enough to transfer to testing on real-world images.
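A toy sketch of such a rendering pipeline using PIL and NumPy is shown below; the default font, the shear standing in for a projective distortion, the noise background and the blending weight are all placeholders, and the actual generator in [Jaderberg, NIPS DLW 2014] is far more elaborate.

    import numpy as np
    from PIL import Image, ImageDraw, ImageFont

    def render_word(word, size=(100, 32)):
        """Crude synthetic word image: font rendering, geometric distortion,
        and blending with a 'natural image' background (noise stand-in)."""
        w, h = size
        canvas = Image.new("L", size, color=255)
        draw = ImageDraw.Draw(canvas)
        draw.text((5, 10), word, font=ImageFont.load_default(), fill=0)
        # affine shear as a stand-in for the projective distortion
        shear = np.random.uniform(-0.3, 0.3)
        canvas = canvas.transform(size, Image.AFFINE, (1, shear, 0, 0, 1, 0), fillcolor=255)
        # natural-image blending (random noise instead of real image crops)
        background = Image.fromarray(np.random.randint(100, 255, (h, w), dtype=np.uint8), "L")
        return Image.blend(canvas, background, alpha=0.3)

    # Example: render_word("apartments").save("sample.png")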
DATASETS
Synth90k: lexicon of 90k words, 9 million images, with training and test splits.
Download from http://www.robots.ox.ac.uk/~vgg/data/text/
DATASETS
ICDAR 2003, ICDAR 2013, Street View Text, IIIT 5k-word
TRAINING
Pre-train the CHAR and NGRAM models independently, then use them to initialize the joint model and continue training jointly.
EXPERIMENTS - JOINT IMPROVEMENT

Train Data   Test Data   CHAR   JOINT
Synth90k     Synth90k    87.3   91.0
Synth90k     IC03        85.9   89.6
Synth90k     SVT         68.0   71.7
Synth90k     IC13        79.5   81.8

The joint model outperforms the character sequence model alone.

Example corrections:
CHAR: grahaws    JOINT: grahams    GT: grahams
CHAR: mediaal    JOINT: medical    GT: medical
CHAR: chocoma_   JOINT: chocomel   GT: chocomel
CHAR: iustralia  JOINT: australia  GT: australia
JOINT MODEL CORRECTIONS
N-gram detections adjust the graph: some edges are down-weighted and others are up-weighted, correcting the character-sequence output.
EXPERIMENTS - ZERO-SHOT RECOGNITION

Train Data   Test Data      CHAR   JOINT
Synth90k     Synth90k       87.3   91.0
Synth90k     Synth72k-90k   87.3   -
Synth90k     Synth45k-90k   87.3   -
Synth90k     IC03           85.9   89.6
Synth90k     SVT            68.0   71.7
Synth90k     IC13           79.5   81.8
Synth1-72k   Synth72k-90k   82.4   89.7
Synth1-45k   Synth45k-90k   80.3   89.1

Large difference for the CHAR model when not trained on the test words; the joint model recovers performance.
EXPERIMENTS - COMPARISON

                                               No Lexicon            Fixed Lexicon
Model Type             Model                   IC03   SVT    IC13    IC03-Full  SVT-50  IIIT5k-50  IIIT5k-1k
Unconstrained          Baseline (ABBYY)        -      -      -       55.0       35.0    24.3       -
                       Wang, ICCV '11          -      -      -       62.0       57.0    -          -
                       Bissacco, ICCV '13      -      78.0   87.6    -          90.4    -          -
                       Yao, CVPR '14           -      -      -       80.3       75.9    80.2       69.3
Language Constrained   Jaderberg, ECCV '14     -      -      -       91.5       86.1    -          -
                       Gordo, arXiv '14        -      -      -       -          90.7    93.3       86.6
                       Jaderberg, NIPSDLW '14  98.6   80.7   90.8    98.6       95.4    97.1       92.7
Unconstrained          CHAR                    85.9   68.0   79.5    96.7       93.5    95.0       89.3
                       JOINT                   89.6   71.7   81.8    97.0       93.2    95.5       89.6
SUMMARY
• Two models for text recognition
• Joint formulation
‣ Structured output loss
‣ Use back-propagation for joint optimization
• Experiments
‣ Joint model improves accuracy on language-based data.
‣ Degrades gracefully when the text is not from a language (the N-gram model doesn't contribute much).
‣ Sets the benchmark for unconstrained accuracy, and competes with purely constrained models.
jaderberg@google.com