  1. Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses. Maryam Daneshi, Konstantin Bayandin, May 28th, 2013 1

  2. Agenda • Introduction & Motivation • Dataset description • Model • Training • Inference • Results 2

  3. Context and Recognition. The human visual system uses context for recognition 3

  4. Human-Object Interaction (HOI) 4

  5. Human Poses and Objects. Human pose estimation is challenging: unusual part appearances, self occlusion, and patches that merely look like body parts 5

  6. Human Poses and Objects. Pose estimation becomes easier given that the object is detected. 6

  7. Human Poses and Objects. Object detection is challenging: small, low-resolution, partially occluded objects, and image regions similar to the detection target 7

  8. Human Poses and Objects. Object detection becomes easier given that the pose is estimated. 8

  9. Datasets - Sports Images of six sports activities 9

  10. Datasets - PPMI People interacting with 12 classes of musical instruments 10

  11. Atomic poses – pose dictionary 11

  12. Mutual Context Model • Goal: Estimate the human pose and detect the objects that the human interacts with – Occluded or small objects – Articulated human poses – Variation of poses within one class of activity • Conditional random field model • Human interacting with any number of objects 12

  13. Model. Ψ(A, O, H, I) = f_1(A, O, H) + f_2(O, H) + f_3(O, I) + f_4(H, I) + f_5(A, I), where f_1 captures the co-occurrence context and f_2 the spatial context, f_3 models the objects, f_4 the human pose, and f_5 the activity. The graphical model connects the activity A, the human pose H with its body parts P_1 … P_L, the objects O_1 … O_M, and the image I of the human-object interaction. 13
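
The decomposition above can be read as a single score that sums five potentials. Below is a minimal Python sketch of how the pieces combine; the function names are illustrative assumptions, not the authors' code.

```python
def model_score(A, O, H, I, f1, f2, f3, f4, f5):
    """Total CRF score Psi(A, O, H, I) as the sum of the five potentials.

    f1..f5 are assumed callables matching the decomposition on this slide.
    """
    return (f1(A, O, H)   # co-occurrence context
            + f2(O, H)    # spatial context
            + f3(O, I)    # object evidence
            + f4(H, I)    # human-pose evidence
            + f5(A, I))   # activity evidence
```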

  14. Model: Co-occurrence Context. Compatibility between activities, objects, and human poses: f_1(A, O, H) = Σ_{m=1}^{M} Σ_{i=1}^{N_h} Σ_{j=1}^{N_o} Σ_{k=1}^{N_a} ζ_{i,j,k} · 1(H = h_i) · 1(O_m = o_j) · 1(A = a_k). 14

  15. Model: Co-occurrence Context. f_1(A, O, H) = Σ_{m=1}^{M} Σ_{i=1}^{N_h} Σ_{j=1}^{N_o} Σ_{k=1}^{N_a} ζ_{i,j,k} · 1(H = h_i) · 1(O_m = o_j) · 1(A = a_k), where N_h is the total number of atomic poses and h_i the i-th atomic pose, N_o the total number of objects and o_j the j-th object, N_a the total number of activities and a_k the k-th activity, and ζ_{i,j,k} the strength of the co-occurrence interaction. 15
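
A small sketch of how f_1 could be evaluated: ζ is stored as a dense N_h × N_o × N_a array, and the three indicator functions reduce each summand to a single table lookup per object box. Variable names are assumptions for illustration.

```python
import numpy as np

def f1_cooccurrence(H, O, A, zeta):
    """Co-occurrence potential f1(A, O, H).

    H    : atomic-pose index in {0, ..., N_h - 1}
    O    : list of object-class indices, one per detected box O_m
    A    : activity index in {0, ..., N_a - 1}
    zeta : array of shape (N_h, N_o, N_a) holding the interaction strengths
    """
    # The indicator functions select exactly one zeta entry per box.
    return float(sum(zeta[H, o_m, A] for o_m in O))
```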

  16. Model: Spatial Context. Spatial relationship between the object and the different body parts of the human: f_2(O, H) = Σ_{m=1}^{M} Σ_{i=1}^{N_h} Σ_{j=1}^{N_o} Σ_{l=1}^{L} λ_{i,j,l}^T · b(x_I^l, O_m) · 1(H = h_i) · 1(O_m = o_j). 16

  17. Model: Spatial Context. f_2(O, H) = Σ_{m=1}^{M} Σ_{i=1}^{N_h} Σ_{j=1}^{N_o} Σ_{l=1}^{L} λ_{i,j,l}^T · b(x_I^l, O_m) · 1(H = h_i) · 1(O_m = o_j), where x_I^l is the location of the center of the human's l-th body part in image I, b(x_I^l, O_m) is the spatial relationship between x_I^l and the m-th object bounding box, encoded as a sparse binary vector with a single 1, and λ_{i,j,l} is the weight for that relationship. 17
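
A sketch of the spatial term. The angular binning below is only an assumed stand-in for the paper's sparse binary relation vector b(x_I^l, O_m); the array shapes are likewise assumptions.

```python
import numpy as np

def spatial_bin(x_part, box, n_bins=8):
    """Sparse binary vector for the part-center / object-box relation.

    Bins the angle between the part center and the box center; this is an
    illustrative discretization, not the exact one used in the paper.
    """
    cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
    angle = np.arctan2(x_part[1] - cy, x_part[0] - cx)          # in (-pi, pi]
    b = np.zeros(n_bins)
    b[int((angle + np.pi) / (2 * np.pi) * n_bins) % n_bins] = 1.0
    return b

def f2_spatial(H, O, part_centers, boxes, lam):
    """Spatial-context potential f2(O, H).

    lam has shape (N_h, N_o, L, n_bins); part_centers holds one (x, y)
    per body part; boxes are (x1, y1, x2, y2) per detected object.
    """
    score = 0.0
    for o_m, box in zip(O, boxes):
        for l, x_l in enumerate(part_centers):
            score += lam[H, o_m, l] @ spatial_bin(x_l, box)
    return score
```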

  18. Model: Objects. Objects are modeled using the detection scores in all object bounding boxes and the spatial relationships between these boxes: f_3(O, I) = Σ_{m=1}^{M} Σ_{j=1}^{N_o} 1(O_m = o_j) · γ_j^T · g(O_m) + Σ_{m=1}^{M} Σ_{m'=1}^{M} Σ_{j=1}^{N_o} Σ_{j'=1}^{N_o} 1(O_m = o_j) · 1(O_{m'} = o_{j'}) · γ_{j,j'}^T · b(O_m, O_{m'}). 18

  19. Model: Objects. f_3(O, I) = Σ_{m=1}^{M} Σ_{j=1}^{N_o} 1(O_m = o_j) · γ_j^T · g(O_m) + Σ_{m=1}^{M} Σ_{m'=1}^{M} Σ_{j=1}^{N_o} Σ_{j'=1}^{N_o} 1(O_m = o_j) · 1(O_{m'} = o_{j'}) · γ_{j,j'}^T · b(O_m, O_{m'}), where g(O_m) is the vector of scores of all object detectors in the m-th box, γ_j is the detection-score weight for the j-th object, b(O_m, O_{m'}) is a binary vector of the spatial relationship between a pair of boxes, and γ_{j,j'} is the weight for the geometric configuration between o_j and o_{j'} [Desai et al., 2009]. 19
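
A sketch of f_3 with assumed array shapes: the unary part scores each box against the detector outputs, and the pairwise part rewards compatible box layouts in the spirit of Desai et al. [2009]. None of this is the authors' code.

```python
def f3_objects(O, det_scores, boxes, gamma, gamma_pair, layout_feat):
    """Object potential f3(O, I) (illustrative shapes).

    det_scores  : (M, N_o) detector scores per box, i.e. g(O_m) row-wise
    gamma       : (N_o, N_o) weights; gamma[j] is dotted with g(O_m)
    gamma_pair  : (N_o, N_o, D) weights for the pairwise layout vector
    layout_feat : callable mapping (box_m, box_m') to a length-D binary vector
    """
    unary = sum(gamma[o_m] @ det_scores[m] for m, o_m in enumerate(O))
    pairwise = 0.0
    for m, o_m in enumerate(O):
        for mp, o_mp in enumerate(O):
            if m != mp:
                pairwise += gamma_pair[o_m, o_mp] @ layout_feat(boxes[m], boxes[mp])
    return unary + pairwise
```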

  20. Model: Human Pose. Likelihood of observing image I given the atomic pose h_i: f_4(H, I) = Σ_{i=1}^{N_h} Σ_{l=1}^{L} 1(H = h_i) · (α_{i,l}^T · p(x_I^l | x_{h_i}^l) + β_{i,l}^T · f^l(I)). 20

  21. Model: Human Pose. f_4(H, I) = Σ_{i=1}^{N_h} Σ_{l=1}^{L} 1(H = h_i) · (α_{i,l}^T · p(x_I^l | x_{h_i}^l) + β_{i,l}^T · f^l(I)), where p(x_I^l | x_{h_i}^l) is the Gaussian likelihood of observing x_I^l given the standard joint location of the l-th body part in pose h_i, f^l(I) is the l-th body-part detection output, and α_{i,l} and β_{i,l} are the location and appearance weights for the l-th body part in pose h_i. 21
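
A sketch of f_4. Scalar weights and an isotropic Gaussian are simplifying assumptions made here for brevity (the slide's α_{i,l} and β_{i,l} are weight vectors), and all names are illustrative.

```python
import numpy as np

def f4_pose(H, part_centers, part_scores, mu, alpha, beta, sigma=1.0):
    """Human-pose potential f4(H, I) (simplified sketch).

    mu[H][l]       : canonical location of part l in atomic pose H
    part_scores[l] : part-l detector output f^l(I)
    alpha, beta    : (N_h, L) weights for location likelihood and appearance
    """
    score = 0.0
    for l, x_l in enumerate(part_centers):
        # Isotropic Gaussian log-likelihood of the observed part location,
        # standing in for p(x_I^l | x_{h_i}^l).
        loglik = -np.sum((np.asarray(x_l) - np.asarray(mu[H][l])) ** 2) / (2 * sigma**2)
        score += alpha[H, l] * loglik + beta[H, l] * part_scores[l]
    return score
```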

  22. Model: Activities. An activity classifier models the HOI activity: f_5(A, I) = Σ_{k=1}^{N_a} 1(A = a_k) · η_k^T · s(I). 22

  23. Model: Activities. f_5(A, I) = Σ_{k=1}^{N_a} 1(A = a_k) · η_k^T · s(I), where η_k is the feature weight for activity a_k and s(I) is the output of a one-versus-all discriminative classifier. 23
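
A sketch of f_5: once the indicator is resolved, the term is just the row of η for the chosen activity dotted with the classifier scores s(I). Shapes are assumptions for illustration.

```python
import numpy as np

def f5_activity(A, svm_scores, eta):
    """Activity potential f5(A, I).

    svm_scores : s(I), one-versus-all classifier confidences (length N_a)
    eta        : (N_a, N_a) weights; eta[k] is the feature weight for a_k
    """
    return float(eta[A] @ np.asarray(svm_scores))
```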

  24. Training: Atomic Poses. Hierarchical clustering of the annotated poses in the training images: • Distance based on the position and orientation of the body parts, weighted per part, e.g. Σ_u w^u · |x_m^u − x_n^u| between poses m and n • Poses are normalized to the same position/size of the torso (sports) or the head (music) • Variations in position and orientation are normalized to [−1, 1] • Missing parts are filled in from the image's nearest neighbor • Atomic poses are shared by all activities 24
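
A sketch of building the atomic-pose dictionary by hierarchical clustering of the normalized training poses. The weighted L1 metric, average linkage, and the dictionary size below are assumptions chosen for illustration, not values from the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def atomic_poses(pose_vectors, part_weights, n_clusters=20):
    """Cluster normalized training poses into atomic poses (sketch).

    pose_vectors : (N, D) array; each row stacks the normalized positions
                   and orientations of the L body parts of one pose
    part_weights : (D,) per-dimension weights w^u for the distance
    n_clusters   : assumed dictionary size
    """
    weighted = pose_vectors * part_weights                        # apply w^u
    Z = linkage(weighted, method="average", metric="cityblock")   # L1 distance
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    # Represent each atomic pose by the mean configuration of its cluster.
    return np.array([pose_vectors[labels == c].mean(axis=0)
                     for c in np.unique(labels)])
```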

  25. Training: Objects and Part Detectors. Deformable Parts Model detectors (SVM on HOG features): • One mixture component per body part • Two mixture components per object class, or one when the aspect ratio does not vary • g(O_m): the object detection score divided by the detection threshold • f^l(I): the body-part detection score divided by the detection threshold 25

  26. Training: Activity Classifier. Spatial Pyramid Matching (SPM): • Sparse SIFT features on a three-layer pyramid • s(I): a vector of confidence scores obtained from an SVM classifier 26

  27. Training: Estimating Model Parameters. Conditional random field with no hidden variables: • Model parameters: the weights {ζ, λ, γ, α, β, η} of the five potentials • Maximum-likelihood estimation • Zero-mean Gaussian priors on the parameters 27
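
Since the CRF has no hidden variables, maximum-likelihood training with zero-mean Gaussian priors amounts to maximizing an L2-regularized log-likelihood. A minimal sketch of that objective, with the log-likelihood function itself assumed given:

```python
import numpy as np

def training_objective(theta, log_likelihood, sigma2=1.0):
    """L2-regularized log-likelihood to be maximized over the parameters.

    theta          : flat vector of all model weights (zeta, lambda, gamma, ...)
    log_likelihood : assumed callable returning sum_n log p(A_n, O_n, H_n | I_n; theta)
    sigma2         : variance of the zero-mean Gaussian prior
    """
    return log_likelihood(theta) - np.sum(theta ** 2) / (2.0 * sigma2)
```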

  28. Inference: Iterative Process. Initialization: • Action classification with the SPM classifier • Object bounding boxes from independent object detectors (scores > 0.9) • Initial pose from a pictorial structure model trained on all training images. Two iterations: • Update the layout of the human body parts, refreshing the Gaussian priors on part locations with the marginal probabilities of the poses • Update the object detection results with a greedy forward search • Update the activity and atomic-pose labels by maximizing the overall score, enumerating all possible values of activities and human poses 28
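
A skeleton of the alternating inference procedure described above. Every callable and attribute name here is an assumption standing in for the corresponding component (SPM classifier, independent detectors, pictorial-structure pose estimator, and the three update steps).

```python
def iterative_inference(image, spm_classifier, detect_objects,
                        pictorial_structure, update_pose, update_objects,
                        update_labels, n_iters=2):
    """Two rounds of alternating updates over pose, objects, and labels."""
    # Initialization
    activity_scores = spm_classifier(image)                  # SPM action scores
    objects = [d for d in detect_objects(image) if d.score > 0.9]
    pose = pictorial_structure(image)                         # generic initial pose

    for _ in range(n_iters):
        # 1) Re-estimate the body-part layout, using pose marginals to
        #    refresh the Gaussian priors on part locations.
        pose = update_pose(image, pose, objects, activity_scores)
        # 2) Greedy forward search over candidate object detections.
        objects = update_objects(image, pose, activity_scores)
        # 3) Enumerate (activity, atomic pose) pairs and keep the best.
        activity_scores, pose = update_labels(image, pose, objects)
    return activity_scores, pose, objects
```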

  29. Results: Examples for Testing Images 29

  30. Results: Sports – Object Detection • Better overall performance across all objects • Better discrimination of similar objects (cricket ball vs. croquet ball) 30

  31. Results: Sports – Human Pose Estimation • Better overall performance across all poses • Outperforms even Pictorial Structure models trained on separate classes! 31

  32. Results: Sports – Activity Classification • Better overall performance • Performance is better than just SPM by about 4% 32

  33. Results: Music – Object Detection • Better overall performance across all objects • Larger improvement in “playing instrument” situations, where context plays a more important role 33

  34. Results: Music – Object Detection • Demonstration of the importance of human poses for object detection 34

  35. Results: Music – Human Pose Estimation • Better performance for poses with “playing instrument” • Only marginally better for poses with “not playing instrument” • No significant improvement as compared to Pictorial Structure model 35

  36. Results: Music – Activity Classification • Better overall performance as compared to SPM and grouplet approach 36
