Deep Transfer Mapping for Unsupervised Writer Adaptation

  1. Deep Transfer Mapping for Unsupervised Writer Adaptation. Hong-Ming Yang [1,2], Xu-Yao Zhang [1,2], Fei Yin [1,2], Jun Sun [4], Cheng-Lin Liu [1,2,3]. [1] NLPR, Institute of Automation, Chinese Academy of Sciences; [2] University of Chinese Academy of Sciences; [3] CAS Center for Excellence in Brain Science and Intelligence Technology; [4] Fujitsu Research & Development Center. Aug. 8, 2018

  2. Outline
  • Introduction
  • Style Transfer Mapping
  • Motivation and the Proposed Method
  • Experiments and Analysis
  • Conclusions

  3. Introduction
  • A main challenge for handwriting recognition: the large variability of distributions across the training data and different test data
  – Different writing styles of different writers
  – Different writing tools (e.g., different pens or electronic writing devices)
  – Different writing environments (e.g., normal or emergency situations)
  – ...
  [Figure: written characters of two writers.]

  4. Introduction
  • Domain adaptation: a form of transfer learning. Adapt the base classifier to each domain in the test dataset.
  [Diagram: a base classifier is trained on the training datasets, then adapted separately to Writer 1, Writer 2, ..., Writer N.]
  • Recent methods: mainly based on deep learning
  – Fine-tuning with target-domain data
  – Learning domain-invariant representations (features)
  – Projecting source- or target-domain data to align the distributions

  5. Style Transfer Mapping
  • Style transfer mapping (STM). Main idea: project the target-domain (test) data to align the data distributions. Given p(x_s) \neq p(x_t), learn a projection x_t' = A_t x_t + b_t such that p(x_s) \approx p(x_t'). The classifier is learned on x_s, and the projected x_t' is fed to the base classifier.
  • Learning of the projection (A_t and b_t):

  \min_{A \in \mathbb{R}^{d \times d},\, b \in \mathbb{R}^{d}} \; \sum_{i=1}^{n} f_i \left\| A s_i + b - t_i \right\|_2^2 + \beta \left\| A - I \right\|_F^2 + \gamma \left\| b \right\|_2^2

  Source points s_i: features in the target domain, i.e., x_t. Target points t_i: the prototype (LVQ) or mean (MQDF) of class y_i, where y_i is the label of sample s_i. (A sketch of how these pairs can be built in the unsupervised case follows below.)
  [Xu-Yao Zhang et al., Writer adaptation with style transfer mapping, TPAMI'13]
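As a concrete illustration of how the (source, target, weight) triples might be assembled in the unsupervised setting, here is a minimal NumPy sketch. The helper name build_stm_pairs and the choice of the maximum softmax probability as the confidence weight f_i are illustrative assumptions, not necessarily the paper's exact scheme.

```python
import numpy as np

def build_stm_pairs(features, class_means, base_classifier_scores):
    """Build (source, target, weight) triples for unsupervised STM.

    features:               (n, d) deep features of the unlabeled
                            target-domain samples (the source points s_i).
    class_means:            (num_classes, d) per-class prototypes/means
                            from the training data (the target points t_i).
    base_classifier_scores: (n, num_classes) softmax outputs of the base
                            classifier on the target-domain samples.
    """
    pseudo_labels = base_classifier_scores.argmax(axis=1)  # unsupervised: pseudo labels
    s = features                                           # source points s_i
    t = class_means[pseudo_labels]                         # target points t_i
    f = base_classifier_scores.max(axis=1)                 # confidence weight f_i
    return s, t, f
```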

  6. Style Transfer Mapping
  Solution: a convex quadratic programming problem with a closed-form solution

  A = Q P^{-1}, \quad b = \frac{1}{\hat{f}} \left( \hat{t} - A \hat{s} \right)

  Q = \sum_{i=1}^{n} f_i t_i s_i^\top - \frac{1}{\hat{f}} \hat{t} \hat{s}^\top + \beta I, \quad P = \sum_{i=1}^{n} f_i s_i s_i^\top - \frac{1}{\hat{f}} \hat{s} \hat{s}^\top + \beta I

  \hat{f} = \sum_{i=1}^{n} f_i + \gamma, \quad \hat{s} = \sum_{i=1}^{n} f_i s_i, \quad \hat{t} = \sum_{i=1}^{n} f_i t_i

  • Dealing with unsupervised adaptation
  – Use pseudo labels predicted by the base classifier
  – Iterative method: base classifier → pseudo labels → adaptation → better pseudo labels → adaptation → ...
  • Extension to convolutional neural networks (CNNs). Main idea: perform the adaptation on the deep features, with f(x) the CNN feature extractor: \tilde{x}_s = f(x_s), \tilde{x}_t = f(x_t).
  [Xu-Yao Zhang et al., Writer adaptation with style transfer mapping, TPAMI'13]
  [Xu-Yao Zhang et al., Online and offline handwritten Chinese character recognition: a comprehensive study and new benchmark, PR'17]
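The closed-form solution above translates directly into a few lines of NumPy. This is a minimal sketch of the equations as written (solve_stm is a hypothetical helper name); it assumes weights f_i ≥ 0 and regularizers β, γ > 0 so that P is invertible.

```python
import numpy as np

def solve_stm(s, t, f, beta=1.0, gamma=1.0):
    """Closed-form STM: A = Q P^{-1}, b = (t_hat - A s_hat) / f_hat.

    s, t: (n, d) source and target points; f: (n,) nonnegative weights.
    beta, gamma: regularizers pulling A toward I and b toward 0.
    """
    d = s.shape[1]
    f_hat = f.sum() + gamma
    s_hat = f @ s                          # sum_i f_i s_i, shape (d,)
    t_hat = f @ t                          # sum_i f_i t_i, shape (d,)
    Q = (t * f[:, None]).T @ s - np.outer(t_hat, s_hat) / f_hat + beta * np.eye(d)
    P = (s * f[:, None]).T @ s - np.outer(s_hat, s_hat) / f_hat + beta * np.eye(d)
    A = Q @ np.linalg.inv(P)
    b = (t_hat - A @ s_hat) / f_hat
    return A, b
```

At test time, each target-domain feature x is mapped to A @ x + b before entering the rest of the base classifier.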

  7. Motivations & Methods
  • Traditional adaptation methods with CNNs
  – Consider only the fully connected layers
  – Perform adaptation on only one layer
  • Motivations
  – Adaptation on both fully connected layers and convolutional layers
  – Perform adaptation on multiple (or all) layers of the base CNN
  • Adaptation method for fully connected layers: STM on the deep features of the layer (unsupervised adaptation)
  • Adaptation methods for convolutional layers
  – Use a linear transformation to project the target-domain data so as to align the data distributions
  – Propose four variants of the linear transformation, based on different assumptions about the spatial relations within the feature maps

  8. Motivations & Methods
  • Fully associate adaptation (FAA)
  – Output of a convolutional layer for an input x_i: o_i = \{ o_i(c, j, k) \}_{c=1, j=1, k=1}^{c=C, j=H, k=W}, where c, j, k index the feature maps, rows, and columns of each feature map
  – Assumption: all positions (c, j, k) are related to each other
  – Method: flatten o_i into a long vector v_i of dimension CHW, and learn a transformation A \in \mathbb{R}^{CHW \times CHW}, b \in \mathbb{R}^{CHW} by STM: v_i' = A v_i + b
  – Since v_i'(j) = \sum_{k=1}^{CHW} A_{jk} v_i(k) + b(j), each position j of v_i' is related to all positions of v_i
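FAA thus reduces to ordinary STM on flattened feature maps. A minimal sketch, reusing the hypothetical solve_stm helper from the sketch above; note that the CHW × CHW matrix quickly becomes expensive for realistic map sizes.

```python
def adapt_faa(conv_outputs, conv_targets, f, beta=1.0, gamma=1.0):
    """Fully associate adaptation: one STM over flattened (C, H, W) maps.

    conv_outputs, conv_targets: (n, C, H, W) source/target feature maps.
    Returns A of shape (CHW, CHW) and b of shape (CHW,).
    """
    n = conv_outputs.shape[0]
    s = conv_outputs.reshape(n, -1)   # flatten each map stack to a CHW vector
    t = conv_targets.reshape(n, -1)
    return solve_stm(s, t, f, beta, gamma)  # CHW x CHW: costly for large maps
```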

  9. Motivations & Methods
  • Partly associate adaptation (PAA)
  – Assumption: positions within the same feature map are related to each other, but different feature maps are mutually independent
  – Method: flatten each feature map into a vector of dimension HW, and learn a transformation A_c \in \mathbb{R}^{HW \times HW}, b_c \in \mathbb{R}^{HW} for each feature map c separately by STM
  – The transformation (A_c, b_c) captures the relations of positions within a feature map; learning each (A_c, b_c) separately enforces independence between the feature maps
  [Figure: a separate STM for each feature map.]
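PAA is the same machinery applied once per feature map. A sketch under the same assumptions (adapt_paa is a hypothetical name; solve_stm is the helper sketched earlier):

```python
def adapt_paa(conv_outputs, conv_targets, f, beta=1.0, gamma=1.0):
    """Partly associate adaptation: one independent STM per feature map."""
    n, C, H, W = conv_outputs.shape
    transforms = []
    for c in range(C):                          # maps assumed mutually independent
        s = conv_outputs[:, c].reshape(n, -1)   # (n, HW) positions within map c
        t = conv_targets[:, c].reshape(n, -1)
        transforms.append(solve_stm(s, t, f, beta, gamma))  # (HW x HW, HW)
    return transforms
```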

  10. Motivations & Methods
  • Weakly independent adaptation (WIA)
  – Assumption: all positions (c, j, k) in o_i are independent of each other
  – Learn a scalar transformation a, b \in \mathbb{R} for each position (c_0, j_0, k_0) separately by STM:

  o_i'(c_0, j_0, k_0) = a \, o_i(c_0, j_0, k_0) + b

  \min_{a, b \in \mathbb{R}} \; \sum_{i=1}^{N_t} f_i \left( a \, o_i(c_0, j_0, k_0) + b - t_i(c_0, j_0, k_0) \right)^2 + \beta (a - 1)^2 + \gamma b^2
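Because each position has only a scalar (a, b), the objective above has a simple closed form: setting the two partial derivatives to zero gives a 2×2 linear system per position, solvable by Cramer's rule. A vectorized sketch (this closed form is derived here from the stated objective, not copied from the paper):

```python
def adapt_wia(conv_outputs, conv_targets, f, beta=1.0, gamma=1.0):
    """Weakly independent adaptation: scalar (a, b) per position, closed form.

    Solves, independently at every (c, j, k), the normal equations of
    min_{a,b} sum_i f_i (a s_i + b - t_i)^2 + beta (a - 1)^2 + gamma b^2.
    """
    s, t = conv_outputs, conv_targets          # (n, C, H, W)
    w = f[:, None, None, None]
    Sss = (w * s * s).sum(axis=0) + beta       # per-position weighted sums
    Ss  = (w * s).sum(axis=0)
    Sst = (w * s * t).sum(axis=0) + beta
    St  = (w * t).sum(axis=0)
    Sf  = f.sum() + gamma
    det = Sss * Sf - Ss * Ss                   # Cramer's rule on the 2x2 system
    a = (Sst * Sf - Ss * St) / det             # (C, H, W) scales
    b = (Sss * St - Ss * Sst) / det            # (C, H, W) offsets
    return a, b
```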

  11. Motivations & Methods
  • Strong independent adaptation (SIA)
  – Assumption: all positions are independent of each other, and the positions within the same feature map share the same linear transformation
  – Similar to the affine (linear) projection in a batch normalization (BN) layer
  – Learn a scalar transformation a, b \in \mathbb{R} for each feature map separately by STM:

  \min_{a, b \in \mathbb{R}} \; \sum_{i=1}^{N_t} \sum_{j=1}^{H} \sum_{k=1}^{W} f_i \left( a \, o_i(c_0, j, k) + b - t_i(c_0, j, k) \right)^2 + \beta (a - 1)^2 + \gamma b^2
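SIA can be obtained from the WIA sketch by folding the H·W positions of each map into the sample axis, so every position of a map shares one (a, b) pair, mirroring the per-channel affine parameters of a BN layer. A sketch assuming the adapt_wia helper above:

```python
import numpy as np

def adapt_sia(conv_outputs, conv_targets, f, beta=1.0, gamma=1.0):
    """Strong independent adaptation: one scalar (a, b) per feature map."""
    n, C, H, W = conv_outputs.shape
    # (n, C, H, W) -> (n*H*W, C, 1, 1): spatial positions become extra samples
    s = conv_outputs.transpose(0, 2, 3, 1).reshape(-1, C, 1, 1)
    t = conv_targets.transpose(0, 2, 3, 1).reshape(-1, C, 1, 1)
    w = np.repeat(f, H * W)                # each sample's weight covers H*W rows
    return adapt_wia(s, t, w, beta, gamma) # (C, 1, 1) scales and offsets
```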

  12. Motivations & Methods
  • Analysis and comparison

  Adaptation method          FAA                   PAA                        WIA                        SIA
  Assumption                 all positions related related within feature map all positions independent  independent + parameter sharing
  Feature dimension          CHW                   HW                         1                          1
  Matrix size                CHW × CHW             HW × HW                    1 × 1                      1 × 1
  Number of transformations  1                     C                          CHW                        C
  Total parameters           CHW(CHW + 1)          C · HW(HW + 1)             2CHW                       2C

  – Complexity & flexibility: FAA > PAA > WIA > SIA
  – Computation & memory efficiency: SIA > WIA > PAA > FAA
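To make the orders of magnitude concrete, a toy computation of the table's parameter counts for a hypothetical layer with C = 64 maps of size 8 × 8:

```python
# Parameter counts from the table, for an illustrative layer shape.
C, H, W = 64, 8, 8
CHW, HW = C * H * W, H * W
print("FAA:", CHW * (CHW + 1))    # 16,781,312
print("PAA:", C * HW * (HW + 1))  # 266,240
print("WIA:", 2 * CHW)            # 8,192
print("SIA:", 2 * C)              # 128
```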

  13. Motivations & Methods
  • Deep transfer mapping (DTM): perform adaptation on multiple layers in a deep manner
  • Algorithm (see the sketch below)
  1. Select a group of layers L on which to perform adaptation
  2. From the bottom to the top layer in L, perform adaptation on each layer with the proposed adaptation methods, keeping the other layers unchanged
  3. After adapting each layer, insert a linear layer after it and set the weights and bias of that linear layer to the solved A and b
  • Advantages of DTM
  – More powerful and flexible alignment of the distributions between the domains
  – Captures more comprehensive information and minimizes the discrepancy of the distributions at different levels of abstraction
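A minimal PyTorch sketch of the DTM loop: forward the unlabeled target batch up to each selected layer, solve for the affine mapping in closed form, and freeze it into the network as an extra layer. The SIA-style AffineAdapter and the solve_affine callable (which would wrap pseudo-labeling plus one of the STM variants above) are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class AffineAdapter(nn.Module):
    """Per-channel affine layer (SIA-style) holding a solved (a, b)."""
    def __init__(self, a, b):
        super().__init__()
        self.register_buffer("a", a)   # e.g. (C, 1, 1) scales
        self.register_buffer("b", b)   # e.g. (C, 1, 1) offsets

    def forward(self, x):
        return self.a * x + self.b

def deep_transfer_mapping(layers, adapt_indices, x_target, solve_affine):
    """DTM sketch: adapt the selected layers bottom-to-top, freezing the rest.

    layers:        ordered list of the base CNN's nn.Modules.
    adapt_indices: indices of the layers chosen for adaptation (the set L).
    x_target:      a batch of unlabeled target-domain inputs.
    solve_affine:  callable(features) -> (a, b); a closed-form STM variant
                   wrapped with pseudo-labeling (hypothetical glue code).
    """
    layers = list(layers)
    offset = 0                                 # adapters already inserted below
    for idx in sorted(adapt_indices):          # bottom-to-top: earlier mappings
        idx += offset                          # already affect deeper features
        with torch.no_grad():
            h = x_target
            for layer in layers[: idx + 1]:    # forward up to the adapted layer
                h = layer(h)
        a, b = solve_affine(h)                 # closed form: no backprop needed
        layers.insert(idx + 1, AffineAdapter(a, b))
        offset += 1
    return nn.Sequential(*layers)
```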

  14. Experiments & Analysis
  • Datasets

  Dataset          Info                        #Samples (3755-class)   #Writers (domains)
  Training set     CASIA OLHWDB 1.0-1.2        2,697,673               1,020
  Test set         On-ICDAR2013 competition    224,590                 60
  Adaptation set   Unlabeled samples from each domain (writer) in the test set

  Online handwritten Chinese characters; the samples of each writer are stored in a single file and can be viewed as one domain.

  15. Experiments & Analysis
  Base classifier: 11 layers, 97.55% accuracy on the test set
  • Different adaptation methods for convolutional layers
  [Figure: the four adaptation methods applied to the same convolutional layer (#8).]

  16. Experiments & Analysis
  • Adaptation properties of different layers in the CNN (adaptation method: WIA)
  – From the bottom layers to the top layers, the adaptation performance increases
  – Bottom layers extract general features that are applicable across different domains, so the improvement from adaptation is small
  – Top layers capture abstract features that are more domain-specific, so adaptation is most helpful for these layers

  17. Experiments & Analysis
  • Deep transfer mapping
  – DTM can further boost the performance of the base classifier
  – DTM still has limitations: the improvement becomes marginal when too many layers are adapted
