

  1. Accelerating and Compressing LSTM-based Models for Online Handwritten Chinese Character Recognition. Reporter: Zecheng Xie, South China University of Technology. August 5th, 2018

  2. Outline
     - Motivation
     - Difficulties
     - Our approach
     - Experiments
     - Conclusion

  3. Motivation
     - Online handwritten Chinese character recognition (HCCR) is widely used in pen-input and touch-screen devices

  4. Motivation
     Our goal: build fast and compact models for on-device inference
     - The difficulties of online HCCR
       - Large number of character classes
       - Similarity between characters
       - Diversity of writing styles
     - Deep learning models are powerful but raise other problems
       - Models are too large -> require a large footprint and much memory
       - Computationally expensive -> consume much energy
     - The advantages of deploying models on mobile devices
       - Eases server pressure
       - Better service latency
       - Can work offline
       - Privacy protection
       - ...

  5. Difficulties of deploying LSTM-based online HCCR models on mobile devices
     - 3755 classes -> the model tends to be large
     - Dependencies between time steps -> make inference slow
       - This is in the nature of RNNs and unlikely to change
     (Figure: an unrolled RNN [1])
     [1] http://colah.github.io/posts/2015-08-Understanding-LSTMs/

  6. Our approach
     The proposed framework: the baseline model -> reconstruct the baseline with SVD -> prune redundant connections -> cluster the remaining connections

  7. Our approach
     - Data preprocessing and augmentation (see the sketch after this slide)
       - Randomly remove 30% of the points in each character
       - Perform coordinate normalization
       - Remove redundant points using the method proposed in [1]:
         - a point that is too close to the point before it
         - a middle point that lies nearly in line with the points before and after it
     - Data transform and feature extraction [1]:
       (x_i, y_i, s_i), i = 1, 2, 3, ... -> (x_i, y_i, Δx_i, Δy_i, I(s_i = s_{i+1}), I(s_i ≠ s_{i+1})), i = 1, 2, 3, ...
       where s_i is the stroke index and I(·) is the indicator function
     [1] X.-Y. Zhang et al., "Drawing and recognizing Chinese characters with recurrent neural network", TPAMI, 2017
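
  A minimal NumPy sketch of this preprocessing and feature-transform chain, assuming points arrive as (x, y, stroke_id) rows. The thresholds and the exact closeness/collinearity tests are illustrative stand-ins for the criteria of [1], not the paper's values:

  ```python
  import numpy as np

  def preprocess(points, dist_thresh=0.01, angle_thresh=0.99):
      """points: (N, 3) array of (x, y, stroke_id) rows; returns (N'-1, 6) features."""
      pts = np.array(points, dtype=np.float64)  # copy so the caller's data is untouched

      # Coordinate normalization: center the character and scale to unit size.
      xy = pts[:, :2]
      scale = (xy.max(axis=0) - xy.min(axis=0)).max()
      pts[:, :2] = (xy - xy.mean(axis=0)) / scale

      # Remove redundant points within a stroke.
      keep = [0]
      for j in range(1, len(pts) - 1):
          prev, cur, nxt = pts[keep[-1]], pts[j], pts[j + 1]
          if prev[2] == cur[2] == nxt[2]:
              # Too close to the previously kept point.
              if np.linalg.norm(cur[:2] - prev[:2]) < dist_thresh:
                  continue
              # Nearly collinear with its neighbours.
              a, b = cur[:2] - prev[:2], nxt[:2] - cur[:2]
              na, nb = np.linalg.norm(a), np.linalg.norm(b)
              if na > 0 and nb > 0 and a @ b / (na * nb) > angle_thresh:
                  continue
          keep.append(j)
      keep.append(len(pts) - 1)
      pts = pts[keep]

      # Feature transform: (x, y, dx, dy, same-stroke flag, stroke-change flag).
      d = np.diff(pts[:, :2], axis=0)
      same = (pts[:-1, 2] == pts[1:, 2]).astype(np.float64)
      return np.column_stack([pts[:-1, :2], d, same, 1.0 - same])
  ```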

  8. Our approach
     - Data preprocessing and augmentation (example figures omitted) [1]
     [1] X.-Y. Zhang et al., "Drawing and recognizing Chinese characters with recurrent neural network", TPAMI, 2017

  9. Our approach
     - Baseline model architecture: Input -> 100-unit LSTM -> 512-unit LSTM -> 512 FC -> 3755 FC -> Output
     (Figure: the network unrolled from t = 1 to t = T)
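
  The slide gives only the layer sizes, so the following PyTorch sketch fills in the unstated details as assumptions: a 6-dimensional input feature (from slide 7), ReLU after the first FC layer, and classification from the last time step's output.

  ```python
  import torch
  import torch.nn as nn

  class BaselineHCCR(nn.Module):
      """Input -> 100-unit LSTM -> 512-unit LSTM -> 512 FC -> 3755 FC."""

      def __init__(self, input_dim=6, num_classes=3755):
          super().__init__()
          self.lstm1 = nn.LSTM(input_dim, 100, batch_first=True)
          self.lstm2 = nn.LSTM(100, 512, batch_first=True)
          self.fc1 = nn.Linear(512, 512)
          self.fc2 = nn.Linear(512, num_classes)

      def forward(self, x):                   # x: (batch, T, input_dim)
          h, _ = self.lstm1(x)
          h, _ = self.lstm2(h)
          h = torch.relu(self.fc1(h[:, -1]))  # last time step; activation assumed
          return self.fc2(h)                  # class logits
  ```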

  10. Our approach
      - Reconstruct the network with singular value decomposition (SVD)
        i_t = σ(W_ii · x_t + W_hi · h_{t-1} + b_i)
        f_t = σ(W_if · x_t + W_hf · h_{t-1} + b_f)
        g_t = tanh(W_ig · x_t + W_hg · h_{t-1} + b_g)
        o_t = σ(W_io · x_t + W_ho · h_{t-1} + b_o)
        c_t = f_t * c_{t-1} + i_t * g_t
        h_t = o_t * tanh(c_t)
      - The main computation is the four gate pre-activations, which stack into
        [i_t; f_t; g_t; o_t] = [σ; σ; tanh; σ]([W_ii; W_if; W_ig; W_io] · x_t + [W_hi; W_hf; W_hg; W_ho] · h_{t-1} + [b_i; b_f; b_g; b_o])
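
  For concreteness, one step of these equations transcribes directly to NumPy; the stacked weight layout mirrors the concatenated form on this slide:

  ```python
  import numpy as np

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b):
      """One LSTM step. W_x: (4*n_h, n_x), W_h: (4*n_h, n_h), b: (4*n_h,).
      Rows of W_x and W_h stack the i, f, g, o blocks."""
      z = W_x @ x_t + W_h @ h_prev + b      # the main computation
      i, f, g, o = np.split(z, 4)
      i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
      g = np.tanh(g)
      c_t = f * c_prev + i * g
      h_t = o * np.tanh(c_t)
      return h_t, c_t
  ```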

  11. Our approach
      - Reconstruct the network with singular value decomposition (SVD)
        Writing W_x = [W_ii; W_if; W_ig; W_io] and W_h = [W_hi; W_hf; W_hg; W_ho], the stacked form becomes
        [i_t; f_t; g_t; o_t] = [σ; σ; tanh; σ](W_x · x_t + W_h · h_{t-1} + b)
      - Apply SVD to W_x and W_h
        - W_x: input connections
        - W_h: hidden-hidden connections

  12. Our approach
      - Efficiency analysis of the SVD method
        - Suppose W ∈ ℝ^{m×n}; by SVD we have W_{m×n} = U_{m×n} Σ_{n×n} V^T_{n×n}
        - By retaining a proper number of singular values, W_{m×n} ≈ U_{m×r} Σ_{r×r} V^T_{r×n} = U_{m×r} N_{r×n}
        - Replace W_{m×n} with U_{m×r} N_{r×n}: Wx -> U(Nx)
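
  A NumPy sketch of this truncated-SVD factorization; the rank r and the random test matrix are illustrative (for a random matrix the approximation error is large, while trained weights are typically much closer to low-rank, which is the premise of this method):

  ```python
  import numpy as np

  def svd_factorize(W, r):
      """Approximate W (m x n) as U_r @ N_r with U_r: (m, r), N_r: (r, n)."""
      U, S, Vt = np.linalg.svd(W, full_matrices=False)
      U_r = U[:, :r]                  # (m, r)
      N_r = S[:r, None] * Vt[:r]      # Sigma_r @ V_r^T, shape (r, n)
      return U_r, N_r

  # Replacing W @ x with U_r @ (N_r @ x) trades one m x n product
  # for an r x n product followed by an m x r product.
  rng = np.random.default_rng(0)
  W, x = rng.standard_normal((512, 128)), rng.standard_normal(128)
  U_r, N_r = svd_factorize(W, 32)
  err = np.linalg.norm(W @ x - U_r @ (N_r @ x)) / np.linalg.norm(W @ x)
  ```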

  13. Our approach
      - Efficiency analysis of the SVD method
        - For a matrix-vector multiplication Wx with W ∈ ℝ^{m×n} and x ∈ ℝ^{n×1}, the acceleration rate and compression rate with r singular values retained are
          S_a = S_c = mn / (mr + rn)
        - If m = 512, n = 128, r = 32, then S_a = S_c = 3.2
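
  The rate on this slide can be checked directly:

  ```python
  m, n, r = 512, 128, 32
  rate = (m * n) / (m * r + r * n)
  print(rate)  # 3.2: the dense product costs m*n multiply-adds, the factored one m*r + r*n
  ```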

  14. Our approach
      - Adaptive drop weight (ADW) [1]
        - An improvement on "Deep Compression" [2], in which a hard threshold is set
        - ADW gradually prunes away the redundant connections in each layer, i.e. those with small absolute values (found by sorting them during retraining)
        - After ADW the network becomes sparse; K-means-based quantization is then applied to each layer for further compression (a sketch follows below)
      [1] X. Xiao, L. Jin, et al., "Building fast and compact convolutional neural networks for offline handwritten Chinese character recognition", Pattern Recognition, 2017
      [2] S. Han, et al., "Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding", ICLR, 2016
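
  A simplified NumPy sketch of both steps on a single weight matrix. Real ADW interleaves pruning with retraining and adapts the pruning fraction per layer; this one-shot illustration omits that, and the codebook size k and quantile initialization are assumptions:

  ```python
  import numpy as np

  def prune_smallest(W, frac):
      """Zero out the fraction of weights with the smallest magnitudes.
      ADW raises `frac` gradually across retraining; here it is applied once."""
      thresh = np.quantile(np.abs(W), frac)
      mask = np.abs(W) >= thresh
      return W * mask, mask

  def kmeans_quantize(W, mask, k=16, iters=20):
      """Cluster the surviving weights into k shared values (1-D k-means)."""
      vals = W[mask]
      centers = np.quantile(vals, np.linspace(0, 1, k))  # quantile initialization
      for _ in range(iters):
          assign = np.argmin(np.abs(vals[:, None] - centers[None, :]), axis=1)
          for j in range(k):
              if np.any(assign == j):
                  centers[j] = vals[assign == j].mean()
      Wq = np.zeros_like(W)
      Wq[mask] = centers[assign]  # each surviving weight snaps to its cluster center
      return Wq, centers
  ```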

  15. Our approach
      The proposed framework (review): the baseline model -> reconstruct the baseline with SVD -> prune redundant connections -> cluster the remaining connections

  16. Experiments
      - Training set: CASIA OLHWDB1.0 & OLHWDB1.1
        - 720 writers, 2,693,183 samples, 3755 classes
      - Test set: ICDAR 2013 online competition dataset
        - 60 writers, 224,590 samples, 3755 classes
      - Data preprocessing and augmentation as described before

  17. Experiments
      - Details of the baseline model
        - Main storage cost: LSTM2, FC1, FC2
        - Main computation cost: LSTM2

  18. Experiments
      - Experimental settings
        - Considerations behind the settings:
          - In our experiments, we found that the LSTM is more sensitive to its input connections than to its hidden-hidden connections
          - Most of the computation latency is introduced by the hidden-hidden connections

  19. Experiments
      - Experimental results (Intel Core i7-4790, single thread)
        - After SVD, the model is 10× smaller, and FLOPs are also reduced by 10×
        - After ADW & quantization, the model is 31× smaller, and FLOPs are further reduced
        - Only a minor 0.5% drop in accuracy

  20. Experiments
      - Experimental results
        - Compared with [1], our model is 300× smaller and 4× faster on CPU
        - Compared with [2], our model is 52× smaller and 109× faster on CPU
      [1] W. Yang, L. Jin, et al., "DropSample: A new training method to enhance deep convolutional neural networks for large-scale unconstrained handwritten Chinese character recognition", Pattern Recognition, 2016
      [2] X.-Y. Zhang, et al., "Online and offline handwritten Chinese character recognition: A comprehensive study and new benchmark", Pattern Recognition, 2017

  21. Conclusion
      - SVD is effective for accelerating computation
      - ADW also works well for LSTMs
      - By combining SVD and ADW, we can build fast and compact LSTM-based models for online HCCR

  22. Thank you!
      Lianwen Jin (金连文), Ph.D., Professor: eelwjin@scut.edu.cn, lianwen.jin@gmail.com
      Zecheng Xie (谢泽澄), Ph.D. student
      Yafeng Yang (杨亚锋), Master's student
      http://www.hcii-lab.net/
