Dilated Convolutional Network with Iterative Optimization for Continuous Sign Language Recognition

  1. Dilated Convolutional Network with Iterative Optimization for Continuous Sign Language Recognition
     Junfu Pu, Wengang Zhou, Houqiang Li
     CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, EEIS Department, University of Science and Technology of China
     pjh@mail.ustc.edu.cn, zhwg@ustc.edu.cn, lihq@ustc.edu.cn
     July 2018

  2. Outline
     • Background
     • Contribution
     • Proposed Architecture
     • Iterative Optimization
     • Experimental Results
     • Conclusions

  4. Background
     • What is Sign Language?
       ◼ A language used primarily by deaf people to communicate
       ◼ Conveys meaning through several channels, such as the hands and the face
     • Why Sign Language?
       ◼ More than 20 million people have hearing impairments
       ◼ Recognition algorithms can also be applied to human-machine interaction
       ◼ Social impact: AI techniques can improve the quality of life of people with disabilities

  5. Background
     • Problem in the real world: communication difficulty caused by hearing and language impairment
     • Research topic: translation, i.e., a recognition (translation) system that takes a sign video as input and produces text as the result

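  6. Background
     • Problem Formulation (input: a sign video $\mathbf{V}$)
       ➢ Isolated SLR: recognize a single word, $c = \arg\max_{i} p(c_i \mid \mathbf{V}),\; i = 1, 2, \ldots, K$ (example output: Democracy)
       ➢ Continuous SLR: recognize a sentence $\mathbf{s} = \{ s_i \mid s_i \in \mathcal{V} \}$ of words from the vocabulary $\mathcal{V}$, $\hat{\mathbf{s}} = \arg\max_{\mathbf{s}} p(\mathbf{s} \mid \mathbf{V})$ (example output in German Sign Language glosses: MOEGLICH HEUTE NACHT FROST GLATT VORSICHT FLUSS MOEGLICH PLUS ACHT; word by word: possible, today, night, frost, slippery, caution, river, possible, plus, eight)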
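
For isolated SLR the argmax is one classification over $K$ classes. For continuous SLR an exact argmax over all sentences is expensive; with a CTC-trained model, a common approximation is best-path (greedy) decoding. The slides do not say which decoder the authors use, so the sketch below is an assumption, not their method; the blank index 0, the toy vocabulary, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence


def argmax(row: Sequence[float]) -> int:
    """Index of the largest value (isolated SLR: c = argmax_i p(c_i | V))."""
    return max(range(len(row)), key=row.__getitem__)


def best_path_decode(frame_probs: Sequence[Sequence[float]], blank: int = 0) -> List[int]:
    """Greedy CTC decoding (assumption: index 0 is the blank symbol).

    Takes the most likely symbol at every time step, merges consecutive
    repeats, then removes blanks. This approximates s_hat = argmax_s p(s | V).
    """
    decoded: List[int] = []
    prev: Optional[int] = None
    for row in frame_probs:
        k = argmax(row)
        if k != prev and k != blank:
            decoded.append(k)
        prev = k
    return decoded


if __name__ == "__main__":
    vocab = ["<blank>", "HEUTE", "NACHT"]  # hypothetical toy vocabulary
    probs = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.2, 0.7],
    ]
    print([vocab[k] for k in best_path_decode(probs)])  # ['HEUTE', 'NACHT']
```

Greedy decoding only picks the single most likely frame path, so it can miss a sentence whose probability is spread over many paths; beam search is the usual refinement.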

  8. Contribution
     • Develop a new framework based on a 3D residual network and dilated convolutions for continuous sign language recognition
     • Propose an iterative optimization strategy with Connectionist Temporal Classification (CTC) for our sign language recognition system
     • Outperform the state-of-the-art methods on the RWTH-PHOENIX-Weather dataset

  10. Proposed Architecture
     • Overall Framework
       ➢ Visual Feature Extractor: 3D-ResNet $\Phi_{\Theta}$, which maps the clips $\mathbf{V} = \{v_t\}_{t=1}^{N}$ of the input video $\mathbf{X} = \{x_t\}_{t=1}^{T}$ to features $\mathbf{F} = \{\Phi_{\Theta}(v_t)\}_{t=1}^{N}$
       ➢ Sequence Learning Model: Dilated Conv. Net with CTC, whose output at step $t$ sums the outputs $o_t^{i}$ of all dilated cells, $o_t = \sum_{\text{all-blocks}} \sum_{i} o_t^{i}$

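  11. Proposed Architecture
     • 3D ResNet: clip features $\mathbf{F} = \{\Phi_{\Theta}(v_t)\}_{t=1}^{N}$ extracted from the input video $\mathbf{X} = \{x_t\}_{t=1}^{T}$ and its clips $\mathbf{V} = \{v_t\}_{t=1}^{N}$
     • Dilated Cell (layer $i$; $\mathcal{C}_d$: convolution with dilation $d$; $\sigma$: sigmoid; $\odot$: element-wise product)
       ➢ $z = \tanh\big(\mathcal{C}_d(h_t^{i-1})\big) \odot \sigma\big(\mathcal{C}_d(h_t^{i-1})\big)$
       ➢ $o_t^{i} = \tanh\big(\mathcal{C}_{1 \times 1}(z)\big)$
       ➢ $h_t^{i} = h_t^{i-1} + o_t^{i}$
       ➢ $o_t = \sum_{\text{all-blocks}} \sum_{i} o_t^{i}$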
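
The dilated cell equations can be sketched in pure Python for a single feature channel. Assumptions not stated in the slides: kernel size 3, "same" zero padding so the sequence length is preserved, separate weights for the tanh and sigmoid branches (as in WaveNet-style gated units), $h^{0}$ taken to be the clip feature sequence, and random placeholder weights. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def dilated_conv1d(x: Sequence[float], w: Sequence[float], d: int) -> List[float]:
    """Single-channel dilated convolution with 'same' zero padding.

    Tap j reads x[t + (j - k // 2) * d], so a size-3 kernel with
    dilation d sees x[t - d], x[t], x[t + d].
    """
    k = len(w)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, wj in enumerate(w):
            idx = t + (j - k // 2) * d
            if 0 <= idx < len(x):  # positions outside the sequence count as zero
                acc += wj * x[idx]
        out.append(acc)
    return out


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def dilated_cell(h_prev: Sequence[float], w_filter: Sequence[float], w_gate: Sequence[float],
                 w_out: float, d: int) -> Tuple[List[float], List[float]]:
    """One dilated cell:
        z   = tanh(C_d(h^{i-1})) * sigmoid(C_d(h^{i-1}))   (element-wise gate)
        o^i = tanh(C_1x1(z))   (with one channel, a 1x1 conv is a single scalar weight)
        h^i = h^{i-1} + o^i    (residual connection)
    Returns (h^i, o^i).
    """
    f = dilated_conv1d(h_prev, w_filter, d)
    g = dilated_conv1d(h_prev, w_gate, d)
    z = [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]
    o = [math.tanh(w_out * zi) for zi in z]
    h_next = [h + oi for h, oi in zip(h_prev, o)]
    return h_next, o


def dilated_network(h0: Sequence[float], num_blocks: int,
                    dilations: Sequence[int] = (1, 2, 4, 8, 16), seed: int = 0) -> List[float]:
    """Stack num_blocks blocks of cells; return o_t = sum over all blocks and layers of o_t^i.

    Weights are random placeholders (hypothetical), not trained values.
    """
    rng = random.Random(seed)
    h = list(h0)
    skip = [0.0] * len(h)
    for _ in range(num_blocks):
        for d in dilations:
            w_f = [rng.uniform(-0.5, 0.5) for _ in range(3)]
            w_g = [rng.uniform(-0.5, 0.5) for _ in range(3)]
            w_o = rng.uniform(-0.5, 0.5)
            h, o = dilated_cell(h, w_f, w_g, w_o, d)
            skip = [s + oi for s, oi in zip(skip, o)]  # accumulate skip outputs
    return skip
```

Doubling the dilation at each layer lets the receptive field grow exponentially with depth while the number of weights per layer stays fixed.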

  13. Iterative Optimization
     ➢ Step 1: Optimize the dilated convolutional network with the CTC loss $\mathcal{L}_{\mathrm{CTC}} = -\ln p(\mathbf{s} \mid \mathbf{V})$, then generate pseudo labels $\ell_i = \arg\max_{j} P^{*}_{i,j}$.
     ➢ Step 2: Fine-tune the 3D-ResNet with a category (classification) loss using the pseudo labels.
     ➢ Step 3: Extract the improved C3D features for sequence learning.
     Alternate between Step 1 and Step 2 until convergence.
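
Step 1 can be sketched as follows: the CTC loss computed with the standard forward (alpha) recursion in log space, and pseudo labels taken as the per-clip argmax of a posterior matrix $P^{*}$. This is a minimal pure-Python sketch, assuming index 0 is the blank and that $P^{*}$ is a clips × classes list of probabilities; the slides do not specify how blank predictions are treated when forming pseudo labels, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_loss(probs: Sequence[Sequence[float]], target: Sequence[int], blank: int = 0) -> float:
    """L_CTC = -ln p(s | V), computed with the CTC forward (alpha) recursion.

    probs[t][k] is the softmax posterior of symbol k at time step t;
    target is the gloss sentence s as class indices (no blanks).
    """
    ext = [blank]  # target with blanks interleaved: _ s1 _ s2 _ ... _
    for gloss in target:
        ext += [gloss, blank]
    logp = [[math.log(p) if p > 0 else NEG_INF for p in row] for row in probs]

    alpha = [NEG_INF] * len(ext)
    alpha[0] = logp[0][ext[0]]  # a path may start with a blank ...
    if len(ext) > 1:
        alpha[1] = logp[0][ext[1]]  # ... or with the first gloss
    for t in range(1, len(probs)):
        new = [NEG_INF] * len(ext)
        for s in range(len(ext)):
            terms = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])  # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])  # skip the blank between two different glosses
            new[s] = logsumexp(*terms) + logp[t][ext[s]]
        alpha = new
    # valid paths end on the last gloss or on the trailing blank
    total = logsumexp(*alpha[-2:]) if len(ext) > 1 else alpha[-1]
    return -total


def pseudo_labels(P: Sequence[Sequence[float]]) -> List[int]:
    """ell_i = argmax_j P*[i][j]: one pseudo label per clip i."""
    return [max(range(len(row)), key=row.__getitem__) for row in P]
```

In practice a framework implementation such as PyTorch's `torch.nn.CTCLoss` would be used; the pure-Python version only makes the recursion explicit. A repeated gloss (e.g. target `[1, 1]`) needs a blank between the two occurrences, so with too few time steps its probability is zero and the loss is infinite.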

  15. Experiments
     • Dataset and Evaluation
       ◼ Continuous SLR dataset: RWTH-PHOENIX-Weather
       ◼ Evaluation metric: Word Error Rate (WER)
     • 3D-ResNet Setup and Initialization
       ◼ Image crops: 224 × 224
       ◼ Sliding window: length 8, step 4 (50% overlap)
       ◼ Pre-trained on an isolated Chinese SLR dataset
       ◼ Batch size 5, learning rate 0.001, weight decay $5 \times 10^{-5}$
       ◼ Pooling-5b activations used as the clip representation
     • Dilated Convolutional Network Setup
       ◼ Dilations of the layers: 1, 2, 4, 8, 16
       ◼ Size of blocks: 5
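
The setup can be made concrete with three small helpers: WER as word-level edit distance divided by the reference length, the clip boundaries produced by a length-8, step-4 sliding window, and the temporal receptive field of the dilation schedule 1, 2, 4, 8, 16. This is a sketch under assumptions: kernel size 3, dropping trailing frames that do not fill a whole window, and the function names are all hypothetical. Because "Size of blocks: 5" could mean either five layers per block or five stacked blocks, the number of blocks is left as a parameter.

```python
from typing import List, Sequence, Tuple


def wer(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / len(ref),
    i.e. the word-level Levenshtein distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution (cost 0 on a match)
        prev = cur
    return prev[-1] / len(ref)


def clip_windows(num_frames: int, length: int = 8, step: int = 4) -> List[Tuple[int, int]]:
    """[start, end) frame ranges of the sliding window (length 8, step 4 = 50% overlap).
    Assumption: trailing frames that do not fill a whole window are dropped."""
    return [(s, s + length) for s in range(0, num_frames - length + 1, step)]


def receptive_field(dilations: Sequence[int], num_blocks: int, kernel_size: int = 3) -> int:
    """Number of consecutive clip features that one output o_t can see.
    Assumption: kernel size 3 (not stated on the slide); each layer adds (k - 1) * d."""
    return 1 + num_blocks * sum((kernel_size - 1) * d for d in dilations)
```

Under these assumptions, one block of the five dilations gives each output a view of 63 consecutive clips; since clips advance by 4 frames and span 8, that covers (63 − 1) × 4 + 8 = 256 frames.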

  16. Experimental Results
     • Iterative Results
     • Comparison
     (results not preserved in this transcript)

  17. Experimental Results
     • An example of iterative optimization (figure not preserved)

  18. Conclusions
     • A novel framework with dilated convolutions for continuous sign language recognition
     • An iterative optimization strategy that trains the proposed architecture by generating pseudo labels
     • Good performance in both accuracy and speed
