  1. Parsimonious HMMs for Offline Handwritten Chinese Text Recognition. Wenchao Wang, Jun Du and Zi-Rui Wang, University of Science and Technology of China. ICFHR 2018, Niagara Falls, USA, Aug. 5-8, 2018

  2. Background
  • Offline handwritten Chinese text recognition (OHCTR) is challenging:
    – No trajectory information in comparison to the online case
    – Large vocabulary of Chinese characters
    – Sequential recognition with the potential segmentation problem
  • Approaches:
    – Oversegmentation approaches: character oversegmentation / classification
    – Segmentation-free approaches:
      – GMM-HMM: Gaussian mixture model – hidden Markov model
      – MDLSTM-RNN: multidimensional LSTM-RNN + CTC
      – DNN-HMM: deep neural network – hidden Markov model

  3. Review of HMM Approach for OHCTR
  • A left-to-right HMM is adopted to represent each Chinese character.
  • The character HMMs are concatenated to model the text line.
  [Figure: the observation sequence of sliding windows over the text-line image is aligned with the sequence of concatenated character HMMs (example characters: 映 反 得 到).]

  4. Review of DNN-HMM Approach for OHCTR
  • Recognition follows the Bayesian framework, with HMM-based character modeling and an output distribution per state.
  • A DNN is used to calculate the state posterior probability.
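
  A minimal sketch of the standard hybrid DNN-HMM formulation these labels refer to (the exact notation on the slide is not recoverable, so the symbols below are assumptions):

  ```latex
  % Bayesian decoding: the character sequence C maximizing the posterior given
  % the sliding-window observation sequence X = x_1, ..., x_T.
  \hat{C} = \arg\max_{C} P(C \mid X) = \arg\max_{C} \; p(X \mid C)\, P(C)
  % Character modeling: p(X | C) is computed by the concatenated character HMMs,
  % summing over state sequences S with transition probabilities a_{s_{t-1} s_t}.
  p(X \mid C) = \sum_{S} \prod_{t=1}^{T} a_{s_{t-1} s_t}\, p(x_t \mid s_t)
  % Output distribution: the DNN provides the state posterior P(s_t | x_t),
  % converted to a scaled likelihood with the state prior P(s_t).
  p(x_t \mid s_t) \propto \frac{P(s_t \mid x_t)}{P(s_t)}
  ```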

  5. Motivation
  • High demand on memory and computation from the DNN output layer
  • Model redundancy due to similarities among different characters
  • Parsimonious HMMs (PHMMs) are proposed to address these two problems
  • A decision-tree-based two-step approach generates the tied-state pool
  [Figure: the 5-state HMMs for characters 冻, 缴 and 练 draw their states from a shared tied-state pool.]

  6. Binary Decision Tree for State Tying
  • The parent set O_1 has a distribution P_1(x); the total log-likelihood of all observations in O_1 under P_1(x) is
    L(O_1) = \sum_{x \in O_1} \log P_1(x)
  • One question splits the parent set into two child sets O_2 and O_3, with O_1 = O_2 \cup O_3.
  • The child set O_2 has a distribution P_2(x); the total log-likelihood of all observations in O_2 under P_2(x) is
    L(O_2) = \sum_{x \in O_2} \log P_2(x)
  • The child set O_3 has a distribution P_3(x); the total log-likelihood of all observations in O_3 under P_3(x) is
    L(O_3) = \sum_{x \in O_3} \log P_3(x)
  • The total increase in set-conditioned log-likelihood of the observations due to the partitioning is
    \Delta L = L(O_2) + L(O_3) - L(O_1)
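
  As a concrete illustration of the split criterion, the sketch below computes L(O) and the gain ΔL in Python, assuming each set is modeled by a single diagonal-covariance Gaussian fit to its own observations; that modeling choice and all function names are illustrative assumptions, not taken from the paper.

  ```python
  import numpy as np

  def set_log_likelihood(obs):
      """Total log-likelihood of a set of observations under a diagonal Gaussian
      fit to the set itself (maximum-likelihood mean and variance)."""
      obs = np.asarray(obs, dtype=float)            # shape: (n_frames, dim)
      mean = obs.mean(axis=0)
      var = obs.var(axis=0) + 1e-6                  # variance floor to avoid log(0)
      d = obs.shape[1]
      log_norm = -0.5 * (d * np.log(2 * np.pi) + np.log(var).sum())
      quad = -0.5 * (((obs - mean) ** 2) / var).sum(axis=1)
      return (log_norm + quad).sum()

  def split_gain(parent_obs, yes_obs, no_obs):
      """Delta L = L(O2) + L(O3) - L(O1) for one candidate partition of the parent set."""
      return (set_log_likelihood(yes_obs) + set_log_likelihood(no_obs)
              - set_log_likelihood(parent_obs))
  ```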

  7. Step 1: Clustering Characters with Decision Tree
  • All states with the same HMM position are initially grouped together at the root node.
  • Each node is then recursively partitioned with the question set so as to maximize the increase in expected log-likelihood.
  • All states in the leaves of the decision tree are tied together.
  [Figure: a tree fragment for tying the first state of each HMM; internal nodes ask questions such as "Is the character in {愧 怀 怳 忧 快 忱 恍 恢 悦 惋 惯}?" and each leaf node holds one group of tied states.]
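
  A greedy top-down sketch of this step, reusing set_log_likelihood and split_gain from the sketch above; the representation of states as (character, frames) pairs and the character-set questions are illustrative assumptions:

  ```python
  import numpy as np

  def best_question(states, questions):
      """Return (gain, question) for the split that most increases the log-likelihood.
      `states` is a list of (character, frames) pairs; a question is a set of characters."""
      parent = np.concatenate([f for _, f in states])
      best = (-np.inf, None)
      for q in questions:
          yes = [f for ch, f in states if ch in q]
          no = [f for ch, f in states if ch not in q]
          if not yes or not no:
              continue                              # degenerate split, skip
          gain = split_gain(parent, np.concatenate(yes), np.concatenate(no))
          best = max(best, (gain, q), key=lambda t: t[0])
      return best

  def grow_tree(states, questions, min_gain=0.0):
      """Recursively split until no question improves the log-likelihood enough;
      returns the leaves, i.e. the groups of states to be tied together."""
      gain, q = best_question(states, questions)
      if q is None or gain <= min_gain:
          return [states]                           # this node becomes one tied state
      yes = [s for s in states if s[0] in q]
      no = [s for s in states if s[0] not in q]
      return grow_tree(yes, questions, min_gain) + grow_tree(no, questions, min_gain)
  ```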

  8. Step 2: Bottom-up Re-clustering
  • In the second step, the clusters in the leaf nodes obtained in the first step are re-clustered by a bottom-up procedure using sequential greedy optimization.
  • The expected log-likelihood decrease caused by merging every pair of clusters is calculated.
  • A minimum priority queue is maintained so that the two clusters with the minimum log-likelihood decrease are merged into a new cluster.
  [Figure: the decision-tree leaf nodes 1..n feed the procedure: (1) calculate the objective-function decrease for merging each pair of leaf nodes and push the pairs into the queue; (2) while #clusters > N, merge the pair with the minimum decrease into a new cluster; (3) generate the tied-state pool.]
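
  A sketch of the re-clustering loop with a minimum priority queue, again reusing set_log_likelihood from above; the merge cost based on the same single-Gaussian assumption, and all names, are illustrative:

  ```python
  import heapq
  import numpy as np

  def merge_cost(a, b):
      """Log-likelihood decrease when clusters a and b (frame arrays) are merged."""
      merged = np.concatenate([a, b])
      return set_log_likelihood(a) + set_log_likelihood(b) - set_log_likelihood(merged)

  def bottom_up_recluster(leaves, n_target):
      """Greedily merge the leaf clusters from step 1 down to n_target tied states,
      always taking the pair with the smallest log-likelihood decrease."""
      clusters = {i: np.concatenate([f for _, f in leaf]) for i, leaf in enumerate(leaves)}
      heap = [(merge_cost(clusters[i], clusters[j]), i, j)
              for i in clusters for j in clusters if i < j]
      heapq.heapify(heap)
      next_id = len(clusters)
      while len(clusters) > n_target and heap:
          cost, i, j = heapq.heappop(heap)
          if i not in clusters or j not in clusters:
              continue                              # stale pair: one side already merged
          merged = np.concatenate([clusters.pop(i), clusters.pop(j)])
          for k in clusters:                        # push costs against the new cluster
              heapq.heappush(heap, (merge_cost(merged, clusters[k]), next_id, k))
          clusters[next_id] = merged
          next_id += 1
      return list(clusters.values())                # the tied-state pool
  ```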

  9. Training Procedure for Parsimonious HMMs
  1. Train the conventional GMM-HMM system.
  2. Calculate the first-order and second-order statistics based on the state-level forced alignment.
  3. Run the two-step algorithm: first step, build the state-tying tree; second step, re-cluster the tied states based on the first step.
  4. Train the parsimonious GMM-HMMs based on the tied states.
  5. Train the parsimonious DNN-HMMs based on the tied states.

  10. Experiments
  • Training set: CASIA-HWDB database, including HWDB1.0, HWDB1.1 and HWDB2.0-HWDB2.2
  • Test set: ICDAR-2013 competition set
  • Vocabulary: 3980 character classes
  • GMM-HMM system
    – Each character modeled by a left-to-right HMM with a 40-component GMM per state
    – Gradient-based features followed by PCA to obtain a 50-dimensional vector
  • DNN-HMM system: 350-2048-2048-2048-2048-2048-2048-(3980*N)
  • DNN-PHMM system: 350-2048-2048-2048-2048-2048-2048-M
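
  A minimal PyTorch-style sketch of the two listed feed-forward topologies; the activation, the number of states per character N, and the tied-state pool size M are placeholders, not values taken from the paper:

  ```python
  import torch.nn as nn

  def make_dnn(input_dim, hidden_dim, num_hidden, output_dim):
      """Feed-forward acoustic model: input -> num_hidden hidden layers -> state outputs."""
      layers, prev = [], input_dim
      for _ in range(num_hidden):
          layers += [nn.Linear(prev, hidden_dim), nn.Sigmoid()]   # activation is an assumption
          prev = hidden_dim
      layers += [nn.Linear(prev, output_dim)]    # state posteriors via softmax in the loss
      return nn.Sequential(*layers)

  N = 5                                          # hypothetical states per character HMM
  M = 12000                                      # hypothetical tied-state pool size
  dnn_hmm = make_dnn(350, 2048, 6, 3980 * N)     # 350-2048x6-(3980*N)
  dnn_phmm = make_dnn(350, 2048, 6, M)           # 350-2048x6-M
  ```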

  11. HMM vs. PHMM
  • Performance saturates as the number of states per character increases.
  • PHMM outperforms HMM with the same setting of the tied-state number.
  • The best PHMM is much more parsimonious than the best HMM.
  • This demonstrates the reasonability of the proposed state-tying algorithm.

  12. HMM vs. PHMM
  • The model becomes much more compact by setting the number of tied states per character below 1.
  • DNN-PHMM (Ns = 0.5, 9.52%) outperforms DNN-HMM (Ns = 1, 11.09%).

  13. Memory and Computation Costs
  DNN-PHMM with the (1024, 4) setting achieved a CER comparable to DNN-HMM with the (2048, 6) setting, while reducing the model size by 75% and the run-time latency by 72%.
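
  To see where the saving comes from, the sketch below counts fully connected parameters for the two settings; the output-layer sizes (N states per character, M tied states) are hypothetical placeholders, so the printed reduction only illustrates how strongly the output layer dominates the model size:

  ```python
  def dnn_params(input_dim, hidden_dim, num_hidden, output_dim):
      """Weight + bias count of a fully connected net: input -> hidden x num_hidden -> output."""
      params = input_dim * hidden_dim + hidden_dim                        # first hidden layer
      params += (num_hidden - 1) * (hidden_dim * hidden_dim + hidden_dim) # remaining hidden layers
      params += hidden_dim * output_dim + output_dim                      # output layer
      return params

  N = 5        # hypothetical states per character HMM
  M = 12000    # hypothetical tied-state pool size for the PHMM
  hmm_size = dnn_params(350, 2048, 6, 3980 * N)    # DNN-HMM, (2048, 6) setting
  phmm_size = dnn_params(350, 1024, 4, M)          # DNN-PHMM, (1024, 4) setting
  print(f"DNN-HMM  params: {hmm_size:,}")
  print(f"DNN-PHMM params: {phmm_size:,}")
  print(f"reduction: {1 - phmm_size / hmm_size:.0%}")
  ```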

  14. State Tying Result Analysis
  Similar characters that were tied together, with their shared radical part and the structure in which the radical appears:
  • 口 (left-right structure): 喷 喻 嗅 嗡 吃 咆 哦 哨 嘈 嘲 噬 嚼
  • 宀 (top-bottom structure): 客 害 容 密 寇 蜜 穷 穿 突 窃 窍 窑
  • 口 (surround structure): 圃 圆 囚 囤 困 围 固
  • 匚 (left-surround structure): 巨 匝 匠 匡 匣 匪 匹 医 匿 臣
  • 辶 (bottom-left-surround structure): 诞 巡 边 逊 辽 达 谜 迁 迂 过 近 这
  • 门 (top-surround structure): 澜 阐 阑 鬲 闸 闻 闽 润
  • | (cross structure): 串 吊 甲 牢 帛 早 平
  • 气 (top-right-surround structure): 氛 氢 氦 氨
  Chinese characters with the same or similar radicals were easily tied by the proposed algorithm. This is why the proposed DNN-PHMM can still maintain high recognition performance despite its quite compact design.

  15. Thanks!
