
Inference Networks, Graph Convolutional Networks (Greg Mori)



  1. Inference Networks, Graph Convolutional Networks. Greg Mori, School of Computing Science, Simon Fraser University

  2. Outline • Image annotation with label hierarchies (Hu et al., CVPR 2016) • Message passing with deep structured networks (Deng et al., BMVC 2015, CVPR 2016) [Figures: a scene image annotated with a label hierarchy, from coarse scene labels (outdoor man-made, outdoor natural, indoor) down to object labels such as bat, ball, grass, and people; an activity example asking "Walking? Waiting?"]

  3. Image Classification • A natural image can be categorized with labels at different concept layers [Figure: label hierarchy for a baseball scene, from coarse scene labels (outdoor man-made, outdoor natural, indoor) through finer scene labels (e.g. sports field, houses, cabins) to object labels (e.g. bat, ball, person, grass)] Hu, Deng, Zhou, Liao, Learning Structured Inference Neural Networks with Label Relations, CVPR 2016

  4. Label Correlation Helps • Such categorization at different concept layers can be modeled with label graphs • It is natural and straightforward to leverage label correlation [Figure: the label hierarchy drawn as a graph whose edges mark positive correlation and negative correlation between labels]

  5. Goal: A generic label relation model • Infer the entire label space from visual input • Infer missing labels given a few fixed provided labels • An end-to-end trainable system [Figure: the visual architecture (CNN) produces a visual activation; an inference machine propagates information over the knowledge graph to produce the output activation and the prediction (e.g. sports field, batter box, baseball, bat, people, field); metadata or partial human labels (e.g. Outdoor Man-made) enter as activations through a reverse sigmoid; the gradient is back-propagated from the loss function] (Hu et al., CVPR 2016)

  6. Top-down Inference Neural Network • Refine activations for each label • Pass messages top-down and within each layer of the label graph • The visual architecture (CNN) produces the initial visual activation for concept layer $t$:
       $x^i_t = W_t \cdot \mathrm{CNN}(I^i) + b_t$
     • Top-down inference combines the activation at the previous concept layer with the visual activation at the current concept layer:
       $a^i_t = V_{t-1,t} \cdot a^i_{t-1} + H_t \cdot x^i_t + b_t$
     • The vertical weight $V_{t-1,t}$ propagates information across concept layers; the horizontal weight $H_t$ propagates information within a concept layer (Hu et al., CVPR 2016)
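The two update rules on this slide can be sketched in a few lines of dependency-free Python. This is only an illustration under stated assumptions: the function and argument names are hypothetical, plain lists stand in for tensors, and the two biases (both written $b_t$ on the slide) are kept as separate arguments `b_x` and `b_a`.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vectors)]


def top_down_inference(feat, W, b_x, V, H, b_a):
    """Compute a_t = V[t] a_{t-1} + H[t] x_t + b_a[t] from the coarsest layer down.

    feat         : CNN feature vector CNN(I) for one image
    W[t], b_x[t] : map the feature to the visual activation x_t of layer t
    V[t]         : vertical weights from layer t-1 to layer t (V[0] is unused)
    H[t], b_a[t] : horizontal weights and bias within layer t
    """
    activations = []
    for t in range(len(W)):
        x_t = add(matvec(W[t], feat), b_x[t])          # visual activation
        a_t = add(matvec(H[t], x_t), b_a[t])           # within-layer term
        if t > 0:                                      # message from the layer above
            a_t = add(a_t, matvec(V[t], activations[t - 1]))
        activations.append(a_t)
    return activations


# Toy hierarchy (assumed sizes): one coarse label, two fine labels.
acts = top_down_inference(
    feat=[1.0, 2.0],
    W=[[[1.0, 0.0]], [[0.0, 1.0], [1.0, 1.0]]],
    b_x=[[0.0], [0.0, 0.0]],
    V=[None, [[1.0], [-1.0]]],
    H=[[[1.0]], [[1.0, 0.0], [0.0, 1.0]]],
    b_a=[[0.0], [0.0, 0.0]],
)
```

In this toy run the coarse activation raises the first fine label and lowers the second, which is the role the slide assigns to the vertical weight.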

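  7. Bidirectional Inference Neural Network (BINN) • Bidirectional inference to make information propagate across the entire label structure • Run inference in each direction independently and blend the results:
       $\overrightarrow{a}^i_t = \overrightarrow{V}_{t-1,t} \cdot \overrightarrow{a}^i_{t-1} + \overrightarrow{H}_t \cdot x^i_t + \overrightarrow{b}_t,$
       $\overleftarrow{a}^i_t = \overleftarrow{V}_{t+1,t} \cdot \overleftarrow{a}^i_{t+1} + \overleftarrow{H}_t \cdot x^i_t + \overleftarrow{b}_t,$
       $a^i_t = \overrightarrow{U}_t \cdot \overrightarrow{a}^i_t + \overleftarrow{U}_t \cdot \overleftarrow{a}^i_t + b_t$
     • The visual activations $x^i_t$ come from the visual architecture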
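A minimal sketch of the bidirectional pass, assuming the visual activations $x^i_t$ are already computed. All names are hypothetical and plain Python lists stand in for tensors; the parameters of each direction are passed as a small dict for brevity.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vectors)]


def binn_inference(x, fwd_p, bwd_p, U_fwd, U_bwd, b):
    """Blend a top-down and a bottom-up pass over the concept layers.

    x            : x[t] is the visual activation of layer t
    fwd_p, bwd_p : dicts with keys "V", "H", "b", each a per-layer list;
                   fwd_p["V"][t] maps layer t-1 to t, bwd_p["V"][t] maps t+1 to t
    U_fwd, U_bwd : per-layer blending weights; b: per-layer output bias
    """
    T = len(x)
    fwd, bwd = [None] * T, [None] * T
    for t in range(T):                                   # top-down direction
        fwd[t] = add(matvec(fwd_p["H"][t], x[t]), fwd_p["b"][t])
        if t > 0:
            fwd[t] = add(fwd[t], matvec(fwd_p["V"][t], fwd[t - 1]))
    for t in reversed(range(T)):                         # bottom-up direction
        bwd[t] = add(matvec(bwd_p["H"][t], x[t]), bwd_p["b"][t])
        if t < T - 1:
            bwd[t] = add(bwd[t], matvec(bwd_p["V"][t], bwd[t + 1]))
    return [add(matvec(U_fwd[t], fwd[t]), matvec(U_bwd[t], bwd[t]), b[t])
            for t in range(T)]


# Toy example (assumed sizes): two layers with one label each.
one, zero = [[1.0]], [0.0]
out = binn_inference(
    x=[[1.0], [2.0]],
    fwd_p={"V": [None, one], "H": [one, one], "b": [zero, zero]},
    bwd_p={"V": [one, None], "H": [one, one], "b": [zero, zero]},
    U_fwd=[[[0.5]], [[0.5]]],
    U_bwd=[[[0.5]], [[0.5]]],
    b=[zero, zero],
)
```

Each layer's output mixes evidence that travelled down from coarser labels with evidence that travelled up from finer ones, so both layers end up influenced by both visual activations.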

  8. Structured Inference Neural Network (SINN) • BINN is hard to train • Regularize connections with prior knowledge about label correlations • Decompose connections into positive correlation + negative correlation [Figure: a class-attribute graph linking classes (Cat, Zebra, Leopard, Hound) to attributes (Domestic, Spotted, Striped, Fast) with positive-correlation and negative-correlation edges] (Hu et al., CVPR 2016)

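  9. Structured Inference Neural Network (SINN) • Evolve the BINN formulation with regularization in connections: each direction has a positive component and a negative component
       $\overrightarrow{a}^i_t = \gamma(\overrightarrow{V}^+_{t-1,t} \cdot \overrightarrow{a}^i_{t-1}) + \gamma(\overrightarrow{H}^+_t \cdot x^i_t) - \gamma(\overrightarrow{V}^-_{t-1,t} \cdot \overrightarrow{a}^i_{t-1}) - \gamma(\overrightarrow{H}^-_t \cdot x^i_t) + \overrightarrow{b}_t,$
       $\overleftarrow{a}^i_t = \gamma(\overleftarrow{V}^+_{t+1,t} \cdot \overleftarrow{a}^i_{t+1}) + \gamma(\overleftarrow{H}^+_t \cdot x^i_t) - \gamma(\overleftarrow{V}^-_{t+1,t} \cdot \overleftarrow{a}^i_{t+1}) - \gamma(\overleftarrow{H}^-_t \cdot x^i_t) + \overleftarrow{b}_t,$
       $a^i_t = \overrightarrow{U}_t \cdot \overrightarrow{a}^i_t + \overleftarrow{U}_t \cdot \overleftarrow{a}^i_t + b_t,$
       $\gamma(x) = \mathrm{ReLU}(x)$
     • The ReLU neuron is essential to keep the positive/negative contribution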
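One direction of the structured update can be sketched as below. The names are hypothetical, and the way the weights are filled in is an assumption for illustration: `V_pos` and `H_pos` carry entries only for label pairs treated as positively correlated, `V_neg` and `H_neg` only for negatively correlated pairs.

```python
def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def relu(v):
    """gamma(x) = ReLU(x), applied element-wise."""
    return [max(0.0, x) for x in v]


def sinn_step(a_prev, x_t, V_pos, V_neg, H_pos, H_neg, b_t):
    """One directional SINN update for concept layer t.

    Positive components are added and negative components subtracted.
    Each component passes through ReLU first, so it can only contribute
    with the sign assigned to it.
    """
    pos_v, pos_h = relu(matvec(V_pos, a_prev)), relu(matvec(H_pos, x_t))
    neg_v, neg_h = relu(matvec(V_neg, a_prev)), relu(matvec(H_neg, x_t))
    return [pv + ph - nv - nh + b
            for pv, ph, nv, nh, b in zip(pos_v, pos_h, neg_v, neg_h, b_t)]


# Toy example (assumed graph): the current label is positively correlated with
# the first label of the previous layer and negatively with the second.
weights = dict(V_pos=[[1.0, 0.0]], V_neg=[[0.0, 1.0]],
               H_pos=[[1.0]], H_neg=[[0.0]], b_t=[0.0])
supported = sinn_step(a_prev=[2.0, -1.0], x_t=[0.5], **weights)
suppressed = sinn_step(a_prev=[-1.0, 2.0], x_t=[0.5], **weights)
```

Without the ReLU, a negative activation flowing through a negative-correlation weight would flip sign and act as positive evidence; in the first call the inactive second label (activation -1.0) contributes nothing instead.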

  10. Prediction from Purely Visual Input • The visual architecture (e.g. a Convolutional Neural Network) produces the visual activation • SINN propagates information bidirectionally and produces the refined output activation [Figure: SINN prediction pipeline: CNN, visual activation, information propagation, output activation, prediction (e.g. outdoor manmade, sports field, batter box, bat, people, water)] (Hu et al., CVPR 2016)

  11. Prediction with Partially Observed Labels • A reverse sigmoid (logit) neuron produces activations from partial labels • SINN uses both the visual activation and the activation from partial labels to infer the remaining labels [Figure: SINN prediction with partial human labels: the label "Outdoor Man-made" passes through a reverse sigmoid and joins the CNN visual activation; information propagation yields the output activation and the prediction (e.g. sports field, batter box, baseball, bat, people, field)]

  12. Reverse sigmoid (logit): produce activation from label • Reverse the sigmoid function to recover the sigmoid input from a label • Sigmoid:
       $y = \sigma(x) = \frac{1}{1 + \exp(-x)}$
     • Inverse of sigmoid:
       $a(y) = \log \frac{g(y)}{1 - g(y)}$, where $g(y) = y + \epsilon$ if $y = 0$ and $g(y) = y - \epsilon$ if $y = 1$
     • Use a small epsilon ($\epsilon = 0.005$) to keep numerical stability
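As a small worked example, the reverse sigmoid can be written with the standard library alone. The function names are hypothetical, and rejecting labels other than 0 and 1 is an assumption, since the slide defines $g(y)$ only for those two values.

```python
import math

EPSILON = 0.005  # value quoted on the slide


def sigmoid(x):
    """y = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def reverse_sigmoid(y, eps=EPSILON):
    """Map a binary label y to a finite pre-sigmoid activation.

    A raw label of 0 or 1 would give log(0) or a division by zero,
    so the label is first nudged inside (0, 1) by eps.
    """
    if y == 0:
        g = y + eps
    elif y == 1:
        g = y - eps
    else:
        raise ValueError("this sketch only handles binary labels 0 and 1")
    return math.log(g / (1.0 - g))


a_on = reverse_sigmoid(1)    # activation for an observed positive label
a_off = reverse_sigmoid(0)   # activation for an observed negative label
```

Feeding either activation back through `sigmoid` recovers the nudged label (0.995 or 0.005), and the two activations are equal in magnitude with opposite signs.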

  13. Image Datasets • Evaluate with two types of experiments on three datasets
     • SUN 397 [Xiao et al. 2012]. Labels: 3 coarse, 16 general, 397 fine-grained. Task 1: predict the entire label set. Task 2: predict the fine-grained scene given the coarse scene category. Knowledge graph provided by the dataset.
     • Animals with Attributes [Lampert et al. 2009]. Labels: 28 taxonomy terms, 50 animal classes, 85 attributes. Task: predict the entire label set. Taxonomy terms are constructed from WordNet as in [Hwang et al. 2012]. Knowledge graph constructed by combining the class-attributes graph with the taxonomy graph.
     • NUS-WIDE [Chua et al. 2009]. Labels: 698 image groups, 81 concepts, 1000 tags. Task: predict the 81 concepts while observing tags/image groups. Knowledge graph produced by WordNet using semantic similarity. The 698 image groups are constructed from image metadata.

  14. Ex1: Inference from visual input • Produce predictions on the entire label space • Evaluate on each concept layer (measured by mAP per class) • Consistent improvement over baselines on different concept layers [Figure: bar charts of mAP per class comparing CNN + Logistic, CNN + BINN, and CNN + SINN. Animals with Attributes panel: 28 taxonomy terms, 50 animal classes, 85 attributes. SUN 397 panel: 3 coarse scene categories, 16 general scene categories, 397 fine-grained scene categories]
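For readers unfamiliar with the metric, here is a minimal sketch of mAP per class. The slides do not say which average-precision variant was used; this example assumes the common non-interpolated form (mean of the precision at each positive in the ranking), and the function names are hypothetical.

```python
def average_precision(scores, labels):
    """AP for one class: rank images by score, average precision at each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_ap_per_class(score_rows, label_rows):
    """Average the per-class AP; rows are images, columns are classes."""
    n_classes = len(score_rows[0])
    aps = [
        average_precision([row[c] for row in score_rows],
                          [row[c] for row in label_rows])
        for c in range(n_classes)
    ]
    return sum(aps) / len(aps)


# Toy data (made up for illustration): three images, two classes.
scores = [[0.9, 0.1], [0.8, 0.9], [0.1, 0.2]]
labels = [[1, 0], [0, 1], [1, 0]]
ap_first_class = average_precision([0.9, 0.8, 0.1], [1, 0, 1])
toy_map = mean_ap_per_class(scores, labels)
```

Evaluating one concept layer then amounts to calling `mean_ap_per_class` on the columns that belong to that layer.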

  15. Ex2: Inference from partial labels (NUS-WIDE) • Produce predictions given partial 1k tags and 698 image groups [Figure: four example images (ground-truth labels include railroad, food, water, animal, grass, and rainbow), each shown with its ground truth, the CNN + Logistic predictions, and our predictions; correct predictions are marked in blue while incorrect are marked in red]
