


  1. 23rd International Conference on MultiMedia Modeling (MMM 2017)
  On the Exploration of Convolutional Fusion Networks for Visual Recognition
  Yu Liu, Yanming Guo, and Michael S. Lew
  Leiden Institute of Advanced Computer Science, Leiden University
  Presenter: Yu Liu
  Discover the world at Leiden University

  2. Outline
  • Introduction
    • Convolutional neural networks (CNN)
    • The usage of intermediate layers
    • Multi-layer fusion
  • Motivation
    • How to develop an efficient multi-layer fusion network
  • Our approach
    • Convolutional fusion networks (CFN)
  • Results
    • Image-level and pixel-level classification
  • Conclusions


  4. Introduction: CNNs
  • A plain CNN: Conv 1 → Pooling → Conv 2 → Pooling → … → Conv S-1 → Pooling → Conv S → 1×1 Conv → GAP → FC → Prediction
  • Conv: convolutional layer; Pooling: max or average pooling layer; 1×1 Conv: convolution with a 1×1 kernel; GAP: global average pooling; FC: fully-connected layer
  • This CNN pipeline has become widely used in recent work because it greatly reduces the number of parameters.
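The parameter savings in this head can be made concrete with a minimal NumPy sketch of the final stage (1×1 Conv → GAP → FC); all shapes here are hypothetical, not the ones used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    # x: (C_in, H, W); w: (C_out, C_in).
    # A 1x1 convolution is a per-pixel linear map over channels.
    c_in, h, wd = x.shape
    return (w @ x.reshape(c_in, -1)).reshape(w.shape[0], h, wd)

def gap(x):
    # Global average pooling: (C, H, W) -> (C,)
    return x.mean(axis=(1, 2))

x = rng.standard_normal((256, 7, 7))   # topmost conv feature map
w = rng.standard_normal((128, 256))    # 1x1 conv weights
w_fc = rng.standard_normal((10, 128))  # fully-connected classifier

feat = gap(conv1x1(x, w))              # 128-dim global feature
logits = w_fc @ feat                   # class scores
print(logits.shape)                    # (10,)
```

Note the saving: an FC classifier on the flattened 256×7×7 map would need 10 × 256 × 49 weights, whereas after GAP the FC layer needs only 10 × 128.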

  5. Introduction: CNNs
  • A plain CNN estimates its final prediction from the topmost layer alone.
  • What if useful information in the intermediate layers is lost during forward propagation?
  • Can we develop a fusion architecture that exploits the intermediate layers?

  6. Introduction: intermediate layers
  • Apart from fully-connected layers, intermediate convolutional layers can also offer discriminative representations.
  • Pipeline: input image → feature extractor (CNN) → feature encoder (e.g. BoW, VLAD, Fisher Vector) → output vector
  • Encoders and methods:
    • BoW: DeepIndex (ICMR 2015), BLCF (ICMR 2016), MSCE (IJCV 2016)
    • VLAD: MOP-CNN (ECCV 2014), NetVLAD (CVPR 2016), CCS (MM 2016)
    • Fisher Vector: DSP (ICCV 2015), MPP (CVPR 2015), FV-CNN (CVPR 2015)
    • Other encoders: SCFVC (NIPS 2014), SPoC (ICCV 2015), SPLeaP (ECCV 2016)

  7. Introduction: multi-layer fusion
  • To integrate the strengths of different layers, aggregate multi-layer activations and generate a richer representation.
  • References:
    1. Lingqiao Liu, Chunhua Shen, Anton van den Hengel: "The treasure beneath convolutional layers: cross convolutional layer pooling for image classification", CVPR 2015.
    2. Ying Li, Xiangwei Kong, Liang Zheng, Qi Tian: "Exploiting Hierarchical Activations of Neural Network for Image Retrieval", ACM Multimedia 2016.

  8. Introduction: multi-layer fusion
  • To integrate the strengths of different layers, aggregate multi-layer activations and generate a richer representation.
  • However, these works use a pre-trained model as-is, without improving the training procedure.

  9. Introduction: multi-layer fusion
  • DAG-CNNs: add new side branches and train them jointly with the full-depth main branch.
  • Figure from: Yang, S., Ramanan, D.: "Multi-scale recognition with DAG-CNNs", ICCV 2015.

  10. Introduction: multi-layer fusion
  • DAG-CNNs: add new side branches and train them jointly with the full-depth main branch (Yang, S., Ramanan, D.: "Multi-scale recognition with DAG-CNNs", ICCV 2015).
  • However, this approach requires a large number of additional parameters for the side branches (i.e. fully-connected layers).
  • Moreover, the summation operation ignores the differing importance of the side branches.


  12. Motivation
  • Question: how can we build an efficient multi-layer fusion network on top of CNNs?
  • Three key issues:
    • Efficiency: add few parameters in the side branches.
    • A better fusion module: learn adaptive weights for the different side branches.
    • Accuracy: considerable improvements over a plain CNN.


  14. Our approach: CFN
  • Overall architecture
  • Advantage I: efficient side branches

  15. Efficient side outputs
  • Main branch: Conv 1 → Pooling → Conv 2 → Pooling → … → Conv S-1 → Pooling → Conv S → 1×1 Conv → GAP; side branches 1 to S-1 split off from the pooling layers, alongside the main branch S.
  1. Create the side branches from the pooling layers.
  2. Employ a 1×1 convolution to "receive" each side-branch input.
  3. Perform an efficient global average pooling (GAP) to "send" each side-branch output.
  • CFN therefore has a minimal increase in parameters for the side branches.
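Steps 1–3 can be sketched in NumPy as follows (hypothetical shapes and branch count, chosen only for illustration): each branch takes a pooling output, projects it to a common dimension with a 1×1 convolution, and reduces it to a vector with GAP.

```python
import numpy as np

rng = np.random.default_rng(0)

def side_branch(pool_out, w):
    # pool_out: (C_in, H, W) activation taken from a pooling layer
    # w: (C_out, C_in) weights of the branch's 1x1 convolution
    c_in, h, wd = pool_out.shape
    proj = (w @ pool_out.reshape(c_in, -1)).reshape(-1, h, wd)  # 1x1 conv
    return proj.mean(axis=(1, 2))  # global average pooling -> (C_out,)

# Hypothetical pooling outputs from three stages of the main branch
pools = [rng.standard_normal(s) for s in [(64, 28, 28), (128, 14, 14), (256, 7, 7)]]
ws = [rng.standard_normal((128, c)) for c in (64, 128, 256)]

side_outputs = [side_branch(p, w) for p, w in zip(pools, ws)]
print([v.shape for v in side_outputs])  # all (128,): same dimension, ready to fuse
```

Each branch adds only the 1×1-conv weights (C_out × C_in values) and a parameter-free GAP, which is why the increase in parameters stays minimal.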

  16. Our approach: CFN
  • Overall architecture
  • Advantage I: efficient side branches
  • Advantage II: early fusion and late prediction

  17. Early fusion and late prediction
  • The outputs of side branches 1 to S-1 and the main branch S are passed into a fusion module.
  1. The fusion module integrates the side-branch outputs.
  2. The fused feature is fed to a fully-connected layer to make the final prediction.

  18. Comparison
  • Early fusion and late prediction (EFLP) vs. early prediction and late fusion (EPLF).
  • The advantages of EFLP:
    • performance competitive with EPLF;
    • fewer parameters than EPLF (it uses only one FC layer);
    • the fused feature can act as a richer image representation.
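The parameter argument can be checked with a small worked example, counting only the final FC layers (the fusion module itself is ignored, and the sizes are hypothetical): EFLP fuses first and then applies one FC layer, while EPLF needs one FC layer per branch.

```python
# Hypothetical sizes: S side branches, C-dim branch features, K classes.
S, C, K = 4, 128, 10

eflp_params = C * K      # EFLP: fuse first, then a single FC layer
eplf_params = S * C * K  # EPLF: one FC layer per branch, fuse the predictions
print(eflp_params, eplf_params)  # 1280 5120
```

Counted this way, EPLF needs S times as many FC weights as EFLP, matching the "fewer parameters" claim above.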

  19. Our approach: CFN
  • Overall architecture
  • Advantage I: efficient side branches
  • Advantage II: early fusion and late prediction
  • Advantage III: locally-connected fusion

  20. Locally-connected (LC) fusion
  • Fusion module: the outputs of branches 1 to S are stacked, passed through a locally-connected layer, and the result is fed to an FC layer for the prediction.
  • The side outputs are first stacked together.
  • A locally-connected layer with a 1×1 kernel is applied over the stacked maps.
  • The LC layer can learn adaptive weights for the different side outputs.
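A minimal NumPy sketch of this fusion, under hypothetical sizes: after GAP each side output is a C-dim vector, the S vectors are stacked, and a 1×1 locally-connected layer gives every (branch, channel) position its own weight, with no weight sharing across positions.

```python
import numpy as np

rng = np.random.default_rng(0)

S, C = 4, 128                             # hypothetical: 4 side outputs, 128 dims
stacked = rng.standard_normal((S, C))     # stacked side outputs
lc_weights = rng.standard_normal((S, C))  # one weight per (branch, channel)
                                          # position: nothing is shared

fused = (lc_weights * stacked).sum(axis=0)  # 1x1 locally-connected fusion
print(fused.shape)                          # (128,)
```

For contrast, a shared 1×1 convolution would use a single weight per branch for all C positions; the locally-connected layer instead learns a separate weighting at every position, which is what lets it weight the side outputs adaptively.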

  21. Comparison
  • Since a locally-connected layer does not share its weights over the spatial dimensions, it can learn a better fusion than other fusion modules.
  • To the best of our knowledge, this is the first attempt to apply a locally-connected layer in a fusion module.


  23. Our approach: CFN
  • Overall architecture: CFN integrates the intermediate layers through additional side branches, and delivers their effects on the final prediction explicitly and directly.

  24. Discussion (1): difference from DSN
  • Deeply-supervised nets (DSN) add extra supervision to guide the intermediate layers earlier: "loss fusion".
  • In contrast, CFN aims to generate a fused, richer feature and uses only one supervision signal, on the final prediction: "feature fusion".
  • Reference: Lee, C., Xie, S., Gallagher, P., Zhang, Z., Tu, Z.: "Deeply-supervised nets", AISTATS 2015.

  25. Discussion (2): difference from ResNet
  • ResNet makes use of "linear" shortcut connections to make much deeper neural networks work well: "depth that matters".
  • In contrast, CFN exploits the existing intermediate layers to improve the discriminative capability of CNNs: "fusion that matters".
  • Reference: He, K., Zhang, X., Ren, S., Sun, J.: "Deep residual learning for image recognition", CVPR 2016.

