Fully Convolutional Networks for Handwriting Recognition
Felipe Petroski Such*, Dheeraj Peri*, Frank Brockler†, Paul Hutkowski†, Raymond Ptucha*
*Rochester Institute of Technology, †Kodak Alaris
Such et al. ICFHR’18

Background
• Offline handwriting recognition remains difficult because the same information can be written in a virtually unlimited number of ways.
• Convolutional Neural Networks (CNNs) have been applied to handwriting recognition with good success.
• Recurrent Neural Networks (RNNs) are useful for arbitrary-length sequences, and Connectionist Temporal Classification (CTC) works well as a post-correction step.
[Figure: handwritten note reading "I am truly touched by your kind contribution to my birthday presents & grateful for your good wishes. Winston Churchill"]
Note: Some believe the above letter is a forgery.

Workflow- Word Extraction
• Document Segmentation: SegNet or similar labels each pixel by type; regions can grow to orthogonal boundaries.
• Block Segmentation: Modified XY Tree or similar suggests rectilinear splits.
• Use both to define paragraphs, sentences and word blocks.

Workflow- Word Recognition
• Preprocessing
  – Fix skewing, rotation, contrast
• Prediction
  – CNNs, HMM, LSTMs used together
• Post-processing
  – Train & Test: CTC
  – Test: Language Model
[Figure: example word "for"]

Proposed Method
• Character classification without the need for:
  – Preprocessing: no deskewing
  – Predefined lexicon of words: can work on surnames, phone numbers, and street addresses
  – Post-processing: no RNN or CTC needed
• Utilizes Fully Convolutional Networks (FCNs) to handle sequences of arbitrary length.
  – FCNs are faster to train than RNNs and more robust
  – CTC can still be used, but we found it hard to get training to converge
• Single architecture works on arbitrary words as well as words from a lexicon

High Level
1. Vocabulary CNN: CNN predicts a word label for common words such as ‘his’, ‘her’, ‘the’. If confidence > g, then done!
2. Length CNN: CNN predicts the number of symbols, then the block is resampled to 32 × 16N, where N is the number of symbols.
3. Symbol CNN: FCN predicts 2N+1 symbols, where each symbol is separated by a blank space.
4. Language Model (optional step): When the block is known to come from a lexicon of words, use vocabulary matching by minimizing character error rate.
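
The four stages on the High Level slide can be read as a simple control flow. The sketch below is only an illustration of that flow, not the authors' implementation: every callable (`vocab_cnn`, `length_cnn`, `resample`, `symbol_fcn`, `lexicon_match`), the `threshold` parameter standing in for g, and the `"_"` blank marker are hypothetical placeholders.

```python
from typing import Callable, Optional, Sequence, Tuple


def recognize_block(
    block: object,
    vocab_cnn: Callable[[object], Tuple[str, float]],
    length_cnn: Callable[[object], int],
    resample: Callable[[object, int, int], object],
    symbol_fcn: Callable[[object], Sequence[str]],
    threshold: float,
    lexicon_match: Optional[Callable[[str], str]] = None,
    blank: str = "_",
) -> str:
    """Run the four-stage flow on one word block (all models are stand-ins)."""
    # Stage 1: vocabulary CNN. Accept a common-word label if it is confident enough.
    word, confidence = vocab_cnn(block)
    if confidence > threshold:
        return word

    # Stage 2: length CNN predicts N, then the block is resampled to 32 x 16N.
    n = length_cnn(block)
    resized = resample(block, 32, 16 * n)

    # Stage 3: symbol FCN emits 2N+1 symbols; blanks separate the real symbols.
    symbols = symbol_fcn(resized)
    text = "".join(s for s in symbols if s != blank)

    # Stage 4 (optional): snap the prediction to a lexicon entry.
    if lexicon_match is not None:
        text = lexicon_match(text)
    return text
```

Passing the models in as callables keeps the sketch independent of any particular framework; in practice each one would wrap a trained network.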

Vocabulary and Length CNNs
[Figure: layer diagram from input pixels through conv1a–c (64), conv2a–b (128), conv3a–b (256), conv4 (512), three pooling layers, and FC (V)]
• For vocabulary, V=~1000
• For length, V=32 (but can be any value or regression)
C(64,3,3)-C(64,3,3)-C(64,3,3)-P(2)-C(128,3,3)-C(128,3,3)-C(256,3,3)-P(2)-C(256,3,3)-C(512,3,3)-C(512,3,3)-P(2)-C(256,4,16)-FC(V)-SoftMax
where C(D,H,W) stands for a convolution with filter dimensions H × W and depth D. Each convolutional layer is followed by a batch norm and ReLU layer. P(2) represents a 2 × 2 pooling layer with stride 2.
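
To see why the last convolution uses a 4 × 16 filter, it can help to trace the feature-map size through the layer string. The sketch below makes several assumptions that the slide does not state: a 32 × 128 single-channel input (read off the figure labels), size-preserving padding for the 3 × 3 convolutions, and no padding for the 4 × 16 convolution. `trace_shapes` is an illustrative helper, not part of the presented work.

```python
import re

ARCH = (
    "C(64,3,3)-C(64,3,3)-C(64,3,3)-P(2)-C(128,3,3)-C(128,3,3)-C(256,3,3)-P(2)-"
    "C(256,3,3)-C(512,3,3)-C(512,3,3)-P(2)-C(256,4,16)-FC(V)-SoftMax"
)


def trace_shapes(arch, height=32, width=128, channels=1, num_classes=1000):
    """Return [(layer, (height, width, channels)), ...] after each layer."""
    shapes = []
    for token in arch.split("-"):
        name = token.split("(")[0]
        args = [int(a) for a in re.findall(r"\d+", token)]
        if name == "C":
            depth, kh, kw = args
            if (kh, kw) != (3, 3):
                # Assumed unpadded: the filter covers the whole remaining map.
                height, width = height - kh + 1, width - kw + 1
            # 3x3 convolutions are assumed to preserve the spatial size.
            channels = depth
        elif name == "P":
            # 2x2 pooling with stride 2 halves both spatial dimensions.
            height, width = height // args[0], width // args[0]
        elif name == "FC":
            height, width, channels = 1, 1, num_classes
        shapes.append((token, (height, width, channels)))
    return shapes


if __name__ == "__main__":
    for layer, shape in trace_shapes(ARCH):
        print(f"{layer:<14} {shape}")
```

Under these assumptions the three pooling layers reduce 32 × 128 to 4 × 16, so C(256,4,16) collapses the map to a single 256-dimensional vector before FC(V).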

Symbol FCN
[Figure: Symbol FCN diagram with a context path (three 3x3 conv blocks with pooling, FC, ReLU, Tile) and a symbol detail path (3x3 conv blocks with pooling, then a 4x4 conv with 1x2 pad); the two paths are combined by an Add before an FC layer over N_s symbols that produces 2N+1 predictions.]

Symbol FCN
[Figure: prediction head. 4 × 2N × 512 activation map → Conv 4x4x512 with 1x2 pad → 3 × (2N+1) × 1024 → FullyConv 3x1x1024 with softmax → 2N+1 predictions over N_s symbols. Examples for N=1 and N=3 show input symbols, activation maps 2N wide, a pad of 2 on left/right, a conv filter of width 4, and 2N+1 predicted symbols.]
• Vertical pad gives forgiveness for up/down shifts; it can be thought of as three estimates for each prediction.
• Horizontal pad gives 2N+1 outputs.
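
The 2N+1 output width and the three vertical estimates follow from the usual convolution output-size formula. The sketch below assumes stride 1 and reads "1x2 pad" as 1 pixel of vertical and 2 pixels of horizontal padding on each side; both function names are illustrative.

```python
def conv_output_size(size: int, kernel: int, pad: int, stride: int = 1) -> int:
    """Standard output length of a convolution along one axis."""
    return (size + 2 * pad - kernel) // stride + 1


def symbol_head_shape(n_symbols: int) -> tuple:
    """Spatial size after the 4x4 conv applied to a 4 x 2N activation map."""
    in_height, in_width = 4, 2 * n_symbols
    # Vertical: pad 1 with a height-4 filter leaves 3 rows (three estimates).
    out_height = conv_output_size(in_height, kernel=4, pad=1)
    # Horizontal: pad 2 with a width-4 filter turns 2N columns into 2N+1.
    out_width = conv_output_size(in_width, kernel=4, pad=2)
    return out_height, out_width


if __name__ == "__main__":
    for n in (1, 3, 9):
        print(n, symbol_head_shape(n))
```

For N=3, for example, the 4 × 6 map becomes 3 × 7, which matches the 2N+1 predictions on the slide.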

Symbol FCN
[Figure: the same prediction head, with examples for N=1, N=3, N=9 and N=4.]
• Softmax over N_s symbols.
• Each of the 2N+1 predictions is a linear combination of a 3x1024 activation map.
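
One plausible way to turn the 2N+1 softmax outputs into a word is to take the most likely symbol at each position and drop the blanks. The slides do not spell out the decoding rule, so the sketch below is an assumption; the three-symbol alphabet, the `"_"` blank marker, and the raw scores are made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one position's symbol scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode(position_scores, alphabet, blank="_"):
    """Pick the most likely symbol at each of the 2N+1 positions, then drop blanks."""
    symbols = []
    for scores in position_scores:
        probs = softmax(scores)
        best = max(range(len(probs)), key=probs.__getitem__)
        symbols.append(alphabet[best])
    return "".join(s for s in symbols if s != blank)


if __name__ == "__main__":
    alphabet = ["_", "a", "t"]  # hypothetical N_s = 3, blank included
    scores = [[5, 0, 0], [0, 4, 1], [3, 0, 0], [0, 1, 6], [2, 0, 0]]  # N = 2
    print(decode(scores, alphabet))
```

Here the five positions decode to blank, a, blank, t, blank, so the blanks fall away and the two symbols remain.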

                     Predicted word
                     t   y   m   m   e
                 0   1   2   3   4   5
Comparison   t   1   ?   -   -   -   -
word         i   2   -   -   -   -   -
             m   3   -   -   -   -   -
             e   4   -   -   -   -   -

\[
D_{i,j} = \min\left( D_{i-1,j} + 1,\; D_{i,j-1} + 1,\; \mathrm{diag} \right),
\qquad
\mathrm{diag} =
\begin{cases}
D_{i-1,j-1} & \text{if pred char} = \text{compare char} \\
D_{i-1,j-1} + 1 & \text{otherwise}
\end{cases}
\]
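
The recurrence can be run directly as a small dynamic program. The sketch below is a minimal version that assumes i indexes the comparison word (rows) and j the predicted word (columns), matching the table layout; `edit_distance_table` is an illustrative name.

```python
def edit_distance_table(predicted: str, comparison: str):
    """Fill the full table D, with rows for comparison and columns for predicted."""
    rows, cols = len(comparison), len(predicted)
    d = [[0] * (cols + 1) for _ in range(rows + 1)]
    # Border values: turning an empty prefix into a prefix of length k costs k.
    for j in range(cols + 1):
        d[0][j] = j
    for i in range(rows + 1):
        d[i][0] = i
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            match = predicted[j - 1] == comparison[i - 1]
            diag = d[i - 1][j - 1] + (0 if match else 1)
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, diag)
    return d


if __name__ == "__main__":
    for row in edit_distance_table("tymme", "time"):
        print(row)
```

For the example words, the bottom-right entry of the table is 2: replace y with i and delete one m.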

                     Predicted word
                     t   y   m   m   e
                 0   1   2   3   4   5
Comparison   t   1   0   -   -   -   -
word         i   2   -   -   -   -   -
             m   3   -   -   -   -   -
             e   4   -   -   -   -   -

Row t, column t = 0: Match! Pass along previous error.

                     Predicted word
                     t   y   m   m   e
                 0   1   2   3   4   5
Comparison   t   1   0   -   -   -   -
word         i   2   1   -   -   -   -
             m   3   2   -   -   -   -
             e   4   3   -   -   -   -

Row i, column t = 1: Miss! +1 to insert i.
Rows m and e, column t = 2 and 3: Miss! +1 to insert m, then e.

                     Predicted word
                     t   y   m   m   e
                 0   1   2   3   4   5
Comparison   t   1   0   1   -   -   -
word         i   2   1   1   -   -   -
             m   3   2   -   -   -   -
             e   4   3   -   -   -   -

Row t, column y = 1: Miss, +1 to delete y.
Row i, column y = 1: Miss, +1 to replace y with i.

                     Predicted word
                     t   y   m   m   e
                 0   1   2   3   4   5
Comparison   t   1   0   1   -   -   -
word         i   2   1   1   -   -   -
             m   3   2   2   -   -   -
             e   4   3   3   -   -   -

Row m, column y = 2: Miss, +1 to replace y with m, or +1 to insert m.
Row e, column y = 3: Miss, +1 to replace y with e, or +1 to insert e.

                     Predicted word
                     t   y   m   m   e
                 0   1   2   3   4   5
Comparison   t   1   0   1   2   -   -
word         i   2   1   1   2   -   -
             m   3   2   2   -   -   -
             e   4   3   3   -   -   -

Row t, first column m = 2: Miss, +1 to delete m.
Row i, first column m = 2: Miss, +1 to replace m with i, or +1 to delete y.
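
The table being filled in here is what the optional vocabulary-matching step would minimize. As a rough sketch, the prediction can be compared against every lexicon entry and the entry with the lowest character error rate kept. Normalizing the edit distance by the length of the lexicon word is an assumption, and the three-word lexicon is invented for the example.

```python
def edit_distance(predicted: str, comparison: str) -> int:
    """Edit distance using only two rows of the table at a time."""
    prev = list(range(len(comparison) + 1))
    for i, p_char in enumerate(predicted, start=1):
        curr = [i]
        for j, c_char in enumerate(comparison, start=1):
            diag = prev[j - 1] + (0 if p_char == c_char else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, diag))
        prev = curr
    return prev[-1]


def closest_word(predicted: str, lexicon):
    """Return the lexicon entry with the lowest character error rate."""
    def cer(word: str) -> float:
        # Assumed normalization: edits divided by the reference word length.
        return edit_distance(predicted, word) / max(len(word), 1)
    return min(lexicon, key=cer)


if __name__ == "__main__":
    print(closest_word("tymme", ["time", "house", "letter"]))
```

A linear scan like this is only practical for small lexicons; larger ones would need some form of indexing, which the slides do not discuss.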