
BIL 722: Advanced Topics in Computer Vision - Mehmet Kerim Yücel - PowerPoint PPT Presentation



  1. BIL 722: Advanced Topics in Computer Vision - Mehmet Kerim Yücel - Deep Structured Models For Group Activity Recognition, Deng et al., BMVC 2015 - Simon Fraser University, SPORTLOGIQ, Canada

  2. Overview • Deep Structured Models For Group Activity Recognition - Deng, Zhiwei et al., BMVC 2015 - Simon Fraser University, SPORTLOGIQ, Canada • Useful links - Paper: http://arxiv.org/pdf/1506.04191.pdf - Code / BMVC presentation: not available

  3. Overview • Individual and group activity recognition problem • Combines atomic action information with the dependencies between labels • Deep CNNs are used to learn atomic actions and scene labels • These labels are then refined with a graphical model implemented as a neural network • State-of-the-art results on the Collective Activity and Nursing Home datasets

  4. Overview • Major contributions • First to combine CNNs and graphical models for group activity recognition • Message passing implemented with neural networks • Results comparable to the state of the art

  5. Literature Review • Event understanding is a notoriously hard problem • It requires accurate information about atomic actions • Such actions include walking, running, waving, etc. • Hand-crafted features (HOG, MBH, improved dense trajectories) in a BoW framework [Wang, Heng, and Cordelia Schmid] • These features are then fed into a discriminative or generative model • Such pipelines have since been overtaken by deep learning approaches [Karpathy, Andrej et al.] [Simonyan, Karen, and Andrew Zisserman]

  6. Literature Review • Action Recognition with Improved Trajectories [1] • Improves dense trajectories by estimating camera motion • Removes trajectories consistent with the estimated camera motion • Cancels out camera motion from the optical flow

  7. Literature Review • Large-scale Video Classification with Convolutional Neural Networks [2] • Several CNN variants are compared, each incorporating the time domain in a different way

  8. Literature Review • Two-stream Convolutional Networks for Action Recognition in Videos [3] • Spatial stream network trained on single frames • Temporal stream network trained on optical flow

  9. Literature Review • Event understanding is a notoriously hard problem • We also need the interactions between individuals and higher-level information • Such interactions and high-level activities lend themselves to hierarchical structures • Rich features to capture context and social cues [Lan, Tian, Leonid Sigal, and Greg Mori] • Hierarchical graphical models [Amer, Mohamed Rabie, Peng Lei, and Sinisa Todorovic] • Dynamic Bayesian networks [Zhu, Yingying, Nandita Nayak, and Amit Roy-Chowdhury]

  10. Literature Review • Combining convolutional neural networks with graphical models • Tompson, Jonathan J., et al.: one step of message passing implemented as a convolution operation, incorporating spatial relations between local responses, for human body pose estimation • Deng, Jia, et al.: relations between predicted labels captured by training a graphical model on top of a neural network, with joint training

  11. Problem Statement & Motivation • Motivation of this work • Advance the state of the art in group activity recognition • Accurately detect atomic actions and scene labels • Incorporate dependencies between action and activity labels • Perform label refinement through a hierarchical structure that encodes these dependencies ... using a CNN together with a hierarchical graphical model whose message passing is mimicked by a neural network

  12. Graphical Models in a Neural Network • Graphical models... • define a joint distribution over the states of a set of nodes • Take a factor graph; inference is performed by belief propagation • At each step of message passing, belief propagation collects the relevant information from the variable nodes connected to a factor node, then passes messages from the factor back to the variable nodes
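
  For reference, the standard sum-product message updates on a factor graph (textbook form, written in generic notation rather than the paper's) are:

    % Sum-product belief propagation on a factor graph (standard textbook form).
    % x_i: variable node, f: factor node, N(.): neighbours in the factor graph.
    \begin{align*}
      \mu_{x_i \to f}(x_i) &= \prod_{f' \in N(x_i) \setminus \{f\}} \mu_{f' \to x_i}(x_i) \\
      \mu_{f \to x_i}(x_i) &= \sum_{\mathbf{x}_{N(f) \setminus \{x_i\}}} f\big(\mathbf{x}_{N(f)}\big) \prod_{x_j \in N(f) \setminus \{x_i\}} \mu_{x_j \to f}(x_j)
    \end{align*}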

  13. Graphical Models in a Neural Network • Key point: mimic message passing using a neural network! • Represent each combination of states as a neuron (a factor neuron) • A factor neuron can learn the dependencies between states and pass messages • Various neuron types can be adopted (linear, ReLU, tanh, etc.) • Integrating the GM into the NN allows parameter sharing and reduces the number of free parameters
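
  As a rough formalization (my own notation, not taken from the paper): a factor neuron indexed by one particular combination of states applies a learned weighted sum to the incoming scores for that combination, followed by a nonlinearity:

    % Generic factor neuron for one combination of states (s_1, ..., s_n);
    % x_i(s_i) is the incoming score for variable i taking state s_i.
    h_{(s_1,\dots,s_n)} = \sigma\Big( \sum_{i=1}^{n} w_{i,(s_1,\dots,s_n)}\, x_i(s_i) \Big),
    \qquad \sigma \in \{\text{linear},\ \mathrm{ReLU},\ \tanh, \dots\}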

  14. Graphical Models in a Neural Network

  15. Message Passing CNN Architecture • Key point: a two-stage architecture • First stage: fine-tuned CNNs that produce scene scores for a frame, and action and pose scores for each person in that frame • Second stage: a message passing neural network that captures label dependencies

  16. Message Passing CNN Architecture

  17. Message Passing CNN Architecture • First stage: three separate CNNs, for scene, action and pose information • Each is an AlexNet architecture pre-trained on ImageNet and fine-tuned for its task • The architecture is nearly identical to AlexNet, except that pooling is done before normalization • Five convolutional layers and two FC layers, with a softmax output
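
  A minimal fine-tuning sketch, assuming a recent torchvision and placeholder class counts (my illustration, not the authors' code):

    # Fine-tune an ImageNet-pretrained AlexNet for a new label space.
    # Illustrative sketch only; the class counts below are placeholders.
    import torch.nn as nn
    from torchvision import models

    def make_finetuned_alexnet(num_classes):
        net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
        # Replace the final fully connected layer so the network outputs
        # scores over the new label space (softmax applied at the output/loss).
        net.classifier[6] = nn.Linear(net.classifier[6].in_features, num_classes)
        return net

    # One network per label space: scene labels for the whole frame,
    # action and pose labels for each person crop.
    scene_cnn  = make_finetuned_alexnet(num_classes=5)   # placeholder count
    action_cnn = make_finetuned_alexnet(num_classes=7)   # placeholder count
    pose_cnn   = make_finetuned_alexnet(num_classes=8)   # placeholder count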

  18. Message Passing CNN Architecture • Second stage: takes the outputs of the first stage as input • Can contain several steps of message passing • In each step, two types of passes occur (see the sketch below): • from the outputs of step k-1 to the factor layer • from the factor layer to the outputs of step k
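
  The two passes can be sketched as follows (PyTorch, with my own assumptions about layer sizes and the ReLU nonlinearity; not the authors' implementation):

    import torch
    import torch.nn as nn

    class MessagePassingStep(nn.Module):
        """One message-passing step as two passes through a factor layer (sketch)."""
        def __init__(self, n_scores, n_factor_neurons):
            super().__init__()
            self.to_factors = nn.Linear(n_scores, n_factor_neurons)  # pass 1: step k-1 outputs -> factor layer
            self.to_outputs = nn.Linear(n_factor_neurons, n_scores)  # pass 2: factor layer -> step k outputs
            self.act = nn.ReLU()

        def forward(self, prev_scores):
            factors = self.act(self.to_factors(prev_scores))  # pass 1
            refined = self.to_outputs(factors)                # pass 2
            # Unary component: previous scores are added back (cf. slide 24).
            return refined + prev_scores

    # Stacking K such steps performs K rounds of message passing.
    step = MessagePassingStep(n_scores=20, n_factor_neurons=64)
    out = step(torch.randn(2, 20))   # a batch of 2 concatenated score vectors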

  19. Message Passing CNN Architecture • Second stage: • In the k-th message passing step, the first pass computes the dependencies between the states • The inputs to this step are: • the scene score of image I for label g • the action score of person I_m for label h • the pose score of person I_m for label z
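
  In the notation used in the sketches below (introduced here for readability; the original slide equations are images and are not reproduced), these inputs are:

    % Inputs to message-passing step k (notation introduced here).
    \begin{align*}
      s^{(k-1)}_{g}(I)   &: \text{scene score of image } I \text{ for scene label } g,\\
      a^{(k-1)}_{h}(I_m) &: \text{action score of person } I_m \text{ for action label } h,\\
      p^{(k-1)}_{z}(I_m) &: \text{pose score of person } I_m \text{ for pose label } z.
    \end{align*}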

  20. Message Passing CNN Architecture • Second stage: • In the factor layer, the interactions of pose, action and scene are computed from the step k-1 scores • α_{g,h,z} is a 3-d parameter template for the combination of scene g, action h and pose z
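
  One plausible form of this computation, consistent with the bullet above but written in my own notation (the paper's exact equation is not reproduced in this transcript):

    % Factor neuron for the (scene g, action h, pose z) combination of person I_m;
    % alpha_{g,h,z} collects the three weights of the corresponding parameter template.
    \varphi^{(k)}_{g,h,z}(I_m) = \sigma\Big(
        \alpha^{s}_{g,h,z}\, s^{(k-1)}_{g}(I)
      + \alpha^{a}_{g,h,z}\, a^{(k-1)}_{h}(I_m)
      + \alpha^{p}_{g,h,z}\, p^{(k-1)}_{z}(I_m) \Big)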

  21. Message Passing CNN Architecture • Second stage: • Factor neurons over the poses/actions of all people in the scene are computed next; r ranges over the output nodes of all people, and t indexes the factor neurons for scene g • T latent neurons are used for each scene label g • The parameters β and α are shared among factors with the same semantic meaning (see the sketch below)
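
  A sketch of one plausible form for these factor neurons, again in my own notation rather than the paper's:

    % t-th of the T latent factor neurons for scene label g, pooling over the
    % output nodes r of all people; o^{(k-1)}_r is the r-th such output from
    % step k-1 and beta are the shared first-pass weights.
    \psi^{(k)}_{g,t} = \sigma\Big( \sum_{r} \beta_{g,t,r}\; o^{(k-1)}_{r} \Big),
    \qquad t = 1,\dots,T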

  22. Message Passing CNN Architecture • Second stage: the output of the k-th message passing step for scene label g combines the messages from the factor node connected with scene g in the scene-action-pose component and the message from the pose-global factor node

  23. Message Passing CNN Architecture • Second stage: the action score and the pose score output by the k-th message passing step are formed in the same way (a generic sketch is given below) • The model parameters are the weights on the edges of the NN • W is the concatenation of the weights from the factor layer to the outputs (2nd pass); β and α are the weights from the inputs to the factor layer (1st pass)
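
  The common pattern behind the scene, action and pose outputs, written as a sketch in my own notation (not the paper's equations): each step-k score is a weighted combination of the factor neurons connected to it, plus the corresponding unary score carried over from step k-1 (cf. slide 24):

    % Generic step-k output score y (a scene, action or pose score);
    % F(y) is the set of factor neurons connected to that output and w_{y,f}
    % are the corresponding second-pass weights collected in W.
    y^{(k)} = \sum_{f \in F(y)} w_{y,f}\, \varphi^{(k)}_{f} \;+\; y^{(k-1)}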

  24. Components in Factor Layers • Unary component • Group activity scores for an image I, and action and pose scores for each person I_m in frame I • These are taken from the previous message passing step and added to the output of the next step

  25. Components in Factor Layers • Group activity-action-pose layer ϕ • Measures the compatibility between individuals and the group • Captures the dependencies between a person's fine-grained action and the scene label
