Learning Human Interaction by Interactive Phrases

Learning Human Interaction by Interactive Phrases - PowerPoint PPT Presentation



  1. Learning Human Interaction by Interactive Phrases. Yu Kong 1,3, Yunde Jia 1, and Yun Fu 2. 1 Beijing Institute of Technology, 2 Northeastern University, 3 University at Buffalo

  2. Activity landscape. Along the number of people involved: individual action (one person), interaction (a few people), group action (several people), crowd action (a crowd of people). Identifying each person is easy for individual actions and interactions, possible but not accurate for group actions, and very challenging (an open problem) for crowds. Our work targets interactions.

  3. Objective: recognizing human interactions from videos. Example interaction: boxing. Applications: group activity understanding, motion analysis, unusual behavior detection, automatic sports judging, scene analysis, smart surveillance, and video game interfaces.

  4. Motivation. An interaction is determined by individual actions, so one can recognize an interaction by action co-occurrence, e.g., attack-protect head, attack-dodge, attack-hit back (interaction: boxing). Problem: co-occurrence relationships are not expressive enough to deal with interactions with large variations.

  5. Motivation. We introduce interactive phrases to describe human interactions. Descriptions for the boxing example: interaction b/w still arms, NO; interaction b/w a chest-level moving arm and a tilting upward arm, YES; interaction b/w a still torso and a bending torso, YES; interaction b/w leaning forward torsos, NO. Interactive phrases are binary motion relationships between people (e.g., relationships between arms, legs, and torsos), are more expressive for describing complicated human interactions, and serve as mid-level features learned from data.

  6. Flowchart of our method: video, low-level features, motion attributes, interactive phrases, interaction class. Feature extraction builds the individual action representation; the attribute model detects individual motion attributes; the interaction model learns interactive phrases and recognizes the interaction.

  7. Individual action representation. Low-level local features are quantized against a learned dictionary [d1, d2, ..., dn], giving a histogram-of-words representation for each person (histogram figure on slide).
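A minimal sketch of that quantization step, assuming hard assignment to the nearest codeword (the slide does not specify the assignment or normalization scheme; function and variable names are illustrative):

```python
import numpy as np

def bow_histogram(local_features, dictionary):
    """Quantize local features against a learned dictionary
    [d1, ..., dn] and return a normalized histogram.

    local_features: (num_features, dim) array of descriptors
    dictionary:     (n_words, dim) array of codewords
    """
    # Assign each descriptor to its nearest codeword (hard quantization).
    dists = np.linalg.norm(
        local_features[:, None, :] - dictionary[None, :, :], axis=2)
    words = dists.argmin(axis=1)

    # Count word occurrences and normalize the histogram to sum to 1.
    hist = np.bincount(words, minlength=len(dictionary)).astype(float)
    return hist / max(hist.sum(), 1.0)
```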

  8. Attribute model. Objective: jointly detect individual motion attributes. Motion attributes describe individual motion, e.g., arm stretching out, leg stepping forward. The 17 attributes: 1 still arm; 2 hand stretching out motion; 3 arm chest-level motion; 4 two arms chest-level motion; 5 arm raising up motion; 6 arm embracing motion; 7 arm free swinging motion; 8 arm intense motion; 9 still leg; 10 leg stepping forward motion; 11 leg kicking motion; 12 leg stepping back motion; 13 still torso; 14 torso leaning back motion; 15 torso leaning forward motion; 16 torso bending motion; 17 friendly motion.

  9. Individual attribute detection. For each attribute j = 1, ..., M, an attribute detector takes the action representation and outputs a_j = +1 if the j-th attribute is present and a_j = -1 if it is absent (histogram figure on slide).
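A minimal sketch of this detection step, assuming linear (SVM-style) detectors; the slide only specifies that each of the M detectors outputs +1 (present) or -1 (absent), so the scoring function here is an assumption:

```python
import numpy as np

def detect_attributes(x, detectors):
    """Run M independent binary attribute detectors on one person's
    action representation x (e.g., the bag-of-words histogram).

    detectors: list of (w_j, b_j) linear detector parameters,
               one pair per motion attribute j = 1..M.
    Returns a vector of labels in {+1, -1}: +1 present, -1 absent.
    """
    labels = []
    for w_j, b_j in detectors:
        score = float(np.dot(w_j, x) + b_j)   # detector confidence
        labels.append(1 if score >= 0 else -1)
    return np.array(labels)
```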

  10. Jointly detect individual motion attributes: infer the optimal configuration of attributes (a_1, ..., a_M) over an attribute graph. The unary attribute potential scores an attribute label from the features; the pairwise attribute potential scores pairwise attribute relationships.
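Read as a standard pairwise model over the attribute graph G = (V, E), the joint inference can be sketched as follows; this is a schematic reading of the slide's unary and pairwise potentials, not necessarily the paper's exact formulation:

```latex
a^{*} = \arg\max_{a \in \{-1,+1\}^{M}}
        \sum_{j \in \mathcal{V}} \phi(a_j, x)
      + \sum_{(j,k) \in \mathcal{E}} \psi(a_j, a_k)
```

Here phi(a_j, x) scores the j-th attribute label from the low-level features (the unary potential), and psi(a_j, a_k) scores the compatibility of a pair of attribute labels (the pairwise potential).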

  11. Attribute model. Example detection: the 17 motion attributes from slide 8, each marked as present or absent for the observed person (table figure on slide).

  12. Interaction model. Objective: learn interactive phrases and infer the interaction class. Interactive phrases are motion relationships between people, e.g., relationships between arms, legs, and torsos.

  13. Interactive phrases. Each phrase p_j links an attribute of Person 1 with an attribute of Person 2 (attribute ids from slide 8 given as a_j(1), a_j(2)): 1 b/w still arms (1, 1); 2 b/w a chest-level moving arm and a free swinging arm (3, 7); 3 b/w outstretched hands (2, 2); 4 b/w raising up arms (5, 5); 5 b/w embracing arms (6, 6); 6 b/w a chest-level moving arm and a still arm (3, 1); 7 b/w two chest-level moving arms and a free swinging arm (4, 7); 8 b/w free swinging arms (7, 7); 9 b/w intense moving arms (8, 8); 10 b/w a chest-level moving arm and a leaning backward torso (3, 14); 11 b/w two chest-level moving arms and a leaning backward torso (4, 14); 12 b/w still legs (9, 9); 13 b/w a stepping forward leg and a stepping backward leg (10, 12); 14 b/w stepping forward legs (10, 10); 15 b/w a stepping forward leg and a still leg (10, 9); 16 b/w a kicking leg and a stepping backward leg (11, 12); 17 b/w a bending torso and a still torso (16, 13); 18 b/w a leaning forward torso and a leaning backward torso (15, 14); 19 b/w leaning forward torsos (15, 15); 20 b/w leaning backward torsos (14, 14); 21 b/w a leaning forward torso and a still torso (15, 13); 22 b/w still torsos (13, 13); 23 cooperative interaction (17, 17). Example: phrase 15 links Person 1's stepping forward leg with Person 2's still leg.
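In the paper the phrase values are latent variables inferred jointly with the interaction class; purely as an illustration of the table above, the sketch below hard-codes the phrase-to-attribute mapping and reads a phrase as active when both of its associated attributes were detected. The function name and dict layout are ours, not the paper's:

```python
# Phrase table from the slide: phrase id -> (attribute id for Person 1,
# attribute id for Person 2), using the attribute ids from slide 8.
PHRASE_TABLE = {
    1: (1, 1),    2: (3, 7),    3: (2, 2),    4: (5, 5),    5: (6, 6),
    6: (3, 1),    7: (4, 7),    8: (7, 7),    9: (8, 8),   10: (3, 14),
    11: (4, 14), 12: (9, 9),   13: (10, 12), 14: (10, 10), 15: (10, 9),
    16: (11, 12), 17: (16, 13), 18: (15, 14), 19: (15, 15), 20: (14, 14),
    21: (15, 13), 22: (13, 13), 23: (17, 17),
}

def active_phrases(attrs1, attrs2):
    """Return the phrase ids whose two associated attributes are both
    detected (+1); attrs given as dicts {attribute_id: +1 or -1}."""
    return {p for p, (a1, a2) in PHRASE_TABLE.items()
            if attrs1.get(a1) == 1 and attrs2.get(a2) == 1}
```

For example, if Person 1's detected attributes include 10 (leg stepping forward) and Person 2's include 9 (still leg), phrase 15 appears in the returned set.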

  14. Interactive phrases are latent variables, learned from data; they act as mid-level features used for inferring the interaction class.
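Schematically, treating the attributes a and phrases p as latent variables gives a latent structural scoring of this general form; the decomposition below is an assumption about the model's shape based on the flowchart, not the paper's exact objective:

```latex
y^{*} = \arg\max_{y} \; \max_{a,\, p} \;
        \Phi(x, a) + \Psi(a, p) + \Omega(p, y)
```

Here Phi ties attributes to the low-level features x, Psi ties interactive phrases to attribute pairs, and Omega ties phrases to the interaction class y.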

  15. Experiments. BIT-Interaction dataset: 8 classes, 400 videos (bow, boxing, handshake, high-five, hug, kick, pat, push). UT-Interaction dataset: 6 classes, 60 videos (handshake, hug, kick, point, punch, push).

  16. Results on the BIT-Interaction dataset: 8 interaction classes, 400 videos, 23 interactive phrases, 17 motion attributes. Confusion matrix and classification examples of our method shown on slide. Accuracy = 85.16%.

  17. Results on the BIT-Interaction dataset (continued; figures on slide).

  18. Results on the BIT-Interaction dataset: comparison of recognition accuracy (%) across methods (bar chart on slide): bag-of-words, no-phrase method, no-IPC method, no-AC method, and our method. No-phrase method: remove the phrase layer from the full model. No-IPC method: remove the phrase connection component from the full model. No-AC method: remove the attribute connection component from the full model.

  19. Results on the UT-Interaction dataset: 6 interaction classes, 60 videos, 23 interactive phrases, 16 motion attributes. Confusion matrix and classification examples of our method shown on slide. Accuracy = 88.33%.

  20. Results on the UT-Interaction dataset: recognition accuracy (%) of methods (bar chart on slide): bag-of-words, no-phrase method, no-AC method, no-IPC method, Ryoo & Aggarwal (ICCV 2009) [1], Yu et al. (BMVC 2010) [2], Ryoo (ICCV 2011) [3], and our method.

  21. Thank you! Please email yukong@ece.neu.edu if you have any questions.
