people tracking and pose estimation by graph decomposition

People Tracking and Pose Estimation by Graph Decomposition, Siyu Tang - PowerPoint PPT Presentation

People Tracking and Pose Estimation by Graph Decomposition. Siyu Tang, Holistic Vision Group, Max Planck Institute for Intelligent Systems, 04 July 2018. Overview: Graph Decomposition and Multicut; Multi-person Tracking; Minimum cost


  1. People Tracking and Pose Estimation by Graph Decomposition
     Siyu Tang
     Holistic Vision Group, Max Planck Institute for Intelligent Systems
     04 July 2018

  2. Overview
     • Graph Decomposition and Multicut
     • Multi-person Tracking
       ‣ Minimum cost lifted multicut problem
     • Multi-person Pose Estimation
       ‣ End-to-end Learning for Graph Decomposition

  3. Decomposition and Multicut
     • A decomposition of a graph is a partition of the node set into connected components.
     • The set of edges that straddle distinct components is precisely the multicut of the graph.
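
The definition above can be illustrated with a minimal sketch. The graph, the component labels, and the helper name `multicut_of` are made up for illustration and are not taken from the slides.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def multicut_of(edges: List[Edge], component: Dict[Hashable, int]) -> List[Edge]:
    """Return the edges whose endpoints lie in different components."""
    return [(u, v) for (u, v) in edges if component[u] != component[v]]


# Hypothetical 4-cycle a-b-c-d-a, decomposed into {a, b} and {c, d}.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]
component = {"a": 0, "b": 0, "c": 1, "d": 1}

cut = multicut_of(edges, component)
print(cut)
```

Each component here is connected through its own internal edge (a-b and c-d), so the labeling is a valid decomposition, and the two straddling edges form its multicut.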

  4. Decomposition and Multicut
     • Minimum Cost Multicut Problem [Groetschel et al. @Mathematical Programming’1989], with costs $d \in \mathbb{R}^E$ and labels $y \in \{0,1\}^E$:
       $$\min_{y \in \{0,1\}^E} \sum_{e \in E} d_e \, y_e$$
       subject to
       $$\forall C \in \mathrm{cycles}(G)\ \forall e \in C:\quad (1 - y_e) \le \sum_{e' \in C \setminus \{e\}} (1 - y_{e'})$$
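
As a rough illustration of the problem above, the following sketch enumerates all edge labelings of a tiny graph. It assumes the convention implied by the cycle constraint, namely that $y_e = 1$ joins the endpoints of $e$ and $y_e = 0$ cuts the edge. Under that reading, a labeling is feasible exactly when no cut edge has both endpoints in the same component of the joined edges. The function names and the example costs are hypothetical, and brute force is only practical for a handful of edges.

```python
from itertools import product


def components(nodes, joined_edges):
    """Label every node with the root of its component (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in joined_edges:
        parent[find(u)] = find(v)
    return {v: find(v) for v in nodes}


def is_feasible(nodes, edges, y):
    """True iff no cut edge (y = 0) lies inside a single component."""
    comp = components(nodes, [e for e in edges if y[e] == 1])
    return all(comp[u] != comp[v] for (u, v) in edges if y[(u, v)] == 0)


def solve_brute_force(nodes, edges, d):
    """Enumerate all 2^|E| labelings and keep the cheapest feasible one."""
    best_y, best_cost = None, float("inf")
    for bits in product((0, 1), repeat=len(edges)):
        y = dict(zip(edges, bits))
        if not is_feasible(nodes, edges, y):
            continue
        cost = sum(d[e] * y[e] for e in edges)
        if cost < best_cost:
            best_y, best_cost = y, cost
    return best_y, best_cost


# Hypothetical triangle: two edges favor joining, one mildly favors cutting.
nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c"), ("a", "c")]
d = {("a", "b"): -3.0, ("b", "c"): -3.0, ("a", "c"): 1.0}

best_y, best_cost = solve_brute_force(nodes, edges, d)
print(best_y, best_cost)
```

Joining a-b and b-c while cutting a-c would violate the cycle constraint, so the cheapest feasible labeling joins all three edges despite the positive cost on a-c.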

  5. Overview
     • Graph Decomposition and Multicut
     • Multi-person Tracking
       ‣ Minimum cost lifted multicut problem
     • Multi-person Pose Estimation
       ‣ End-to-end Learning for Graph Decomposition

  6. Multi-person Tracking
     [Video panels: input video, people detection, our tracking result]

  7. Multi-person Tracking
     • Tracking as a global data association problem
     • Typically addressed as finding disjoint paths in the graph
       ‣ Disjoint paths do not merge or branch
     [Andriluka et al. @CVPR’08; Zhang et al. @CVPR’08; Shitrit et al. @ICCV’11; Pirsiavash et al. @CVPR’11; Kuo et al. @CVPR’11; Chen et al. @CVPR’14; Wang et al. @ECCV’16; Schulter et al. @CVPR’17]

  8. Multi-person Tracking
     • Tracking as a global data association problem
     • Typically addressed as finding disjoint paths in the graph
       ‣ Disjoint paths do not merge or branch
       ‣ Pre-processing: spatio-NMS per frame
       ‣ Post-processing: merge tracks across frames

  9. Multi-person Tracking
     • Tracking as a global data association problem
     • Typically addressed as finding disjoint paths in the graph
       ‣ Disjoint paths do not merge or branch
       ‣ Pre-processing: spatio-NMS per frame
       ‣ Post-processing: merge tracks across frames

  10. Multi-person Tracking
      • Graph decomposition for multi-person tracking
        [Figure labels: within-frame edges, long-range edges]
      • Desired properties of “tracking by graph decomposition”
        ‣ Joint spatio-temporal association
        ‣ Short- and long-range edges complement each other
        ‣ The number of people is optimized

  11. Multi-person Tracking
      • Graph decomposition for multi-person tracking
      • Desired properties of “tracking by graph decomposition”
        ‣ Joint spatio-temporal association
        ‣ Short- and long-range edges complement each other
        ‣ The number of people is optimized

  12. The Underlying Graph in Space-Time Domain: Visualizing Disjoint Paths Solution
      • Red dots: detection hypotheses
      • Red lines: linking hypotheses
      [Figure: disjoint paths; axes x, y, time]

  13. The Underlying Graph in Space-Time Domain: Visualizing Multicut Solution
      [Figure: decompositions (clusters); axes x, y, time]

  14. Tracking Result by Graph Decomposition
      [Figure panels: Detections, Tracklets, Decomposition, Final tracks]

  15. Minimum Cost Multicut Problem
      $$\min_{x \in \{0,1\}^V,\ y \in \{0,1\}^E} \ \sum_{v \in V} c_v x_v + \sum_{e \in E} d_e y_e$$
      • Consistency
        $$\forall e = vw \in E:\quad y_{vw} \le x_v$$
        $$\forall e = vw \in E:\quad y_{vw} \le x_w$$
      • Transitivity
        $$\forall C \in \mathrm{cycles}(G)\ \forall e \in C:\quad (1 - y_e) \le \sum_{e' \in C \setminus \{e\}} (1 - y_{e'})$$

  16. Minimum Cost Multicut Problem
      $$\min_{x \in \{0,1\}^V,\ y \in \{0,1\}^E} \ \sum_{v \in V} c_v x_v + \sum_{e \in E} d_e y_e$$
      • Consistency
        $$\forall e = vw \in E:\quad y_{vw} \le x_v$$
        $$\forall e = vw \in E:\quad y_{vw} \le x_w$$
      • Transitivity
        $$\forall C \in \mathrm{cycles}(G)\ \forall e \in C:\quad (1 - y_e) \le \sum_{e' \in C \setminus \{e\}} (1 - y_{e'})$$
      • Optimization
        ‣ ILP solver (Branch-and-Cut)
        ‣ Heuristic solver (Kernighan-Lin heuristic) [Kernighan & Lin @Bell System Technical Journal’1970]

  17. Minimum Cost Multicut Problem
      $$\min_{x \in \{0,1\}^V,\ y \in \{0,1\}^E} \ \sum_{v \in V} c_v x_v + \sum_{e \in E} d_e y_e$$
      • Consistency
        $$\forall e = vw \in E:\quad y_{vw} \le x_v$$
        $$\forall e = vw \in E:\quad y_{vw} \le x_w$$
      • Transitivity
        $$\forall C \in \mathrm{cycles}(G)\ \forall e \in C:\quad (1 - y_e) \le \sum_{e' \in C \setminus \{e\}} (1 - y_{e'})$$
      • Pairwise feature $f(e)$, entering the edge cost as $d_e = -\langle \theta, f(e) \rangle$
        ‣ Spatio-temporal relation
        ‣ Local image patch matching (DeepMatching [Weinzaepfel et al. @ICCV’13])
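
One way to read a cost that is linear in the pairwise feature is as the log-odds of a logistic model. The sketch below assumes, beyond what the slide states, that $p_e = 1 / (1 + \exp(-\langle \theta, f(e) \rangle))$ is the probability that the two detections belong to the same person and that the cost is $\log((1 - p_e) / p_e)$, which simplifies to $-\langle \theta, f(e) \rangle$. The weights and feature values are hypothetical.

```python
import math


def join_probability(theta, f):
    """Logistic model (assumed): probability that both detections are one person."""
    score = sum(t * x for t, x in zip(theta, f))
    return 1.0 / (1.0 + math.exp(-score))


def edge_cost(theta, f):
    """Log-odds cost log((1 - p) / p), equal to minus the inner product."""
    p = join_probability(theta, f)
    return math.log((1.0 - p) / p)


# Hypothetical weights and a two-dimensional pairwise feature.
theta = [1.0, 2.0]
f_e = [0.5, 0.25]

d_e = edge_cost(theta, f_e)
print(d_e)
```

A negative cost rewards setting $y_e = 1$ in the minimization, so confident matches pull detections into the same component.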

  18. How do we model the long-term connections?
      [Figure; axis: time]

  19. How do we model the long-term connections?
      • Deep Person Re-identification Network
        ‣ Architectures: StackPose Net, Stack Net, Siamese Net
        ‣ Accuracy: 84.7%, 86.9%, 90.0%

  20. How do we model the long-term connections?
      • Deep Person Re-identification Network
        ‣ Compare with DeepMatching feature (DM) and spatio-temporal relation feature (ST)
      [Plot: accuracy (0.4 to 1) versus temporal distance (0 to 200 frames) for ST, DM, Re-ID, and Comb]

  21. How to cope with and expose the uncertainty?
      [Figure: Multicut versus Lifted Multicut on a four-node example ($v_1$ to $v_4$) with edge costs and lifted edges]
      • Minimum Cost Lifted Multicut
        $$\min_{y \in \{0,1\}^{E'}} \sum_{e \in E'} d_e \, y_e$$
        subject to
        ‣ Transitivity
          $$\forall C \in \mathrm{cycles}(G)\ \forall e \in C:\quad (1 - y_e) \le \sum_{e' \in C \setminus \{e\}} (1 - y_{e'})$$
        ‣ Lifted edges
          $$\forall vw \in E' \setminus E\ \ \forall P \in vw\text{-paths}(G):\quad (1 - y_{vw}) \le \sum_{e \in P} (1 - y_e)$$
          $$\forall vw \in E' \setminus E\ \ \forall C \in vw\text{-cuts}(G):\quad y_{vw} \le \sum_{e \in C} y_e$$
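
The two lifted-edge constraints can be read as follows, assuming $y = 1$ means "joined": a lifted edge $vw$ is joined exactly when $v$ and $w$ are connected by joined edges of the base graph $G$. The sketch below derives the lifted labels from a feasible base labeling under that reading. The chain graph, the labels, and the function names are hypothetical.

```python
def component_labels(nodes, joined_edges):
    """Label every node with the root of its component (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in joined_edges:
        parent[find(u)] = find(v)
    return {v: find(v) for v in nodes}


def lifted_labels(nodes, y_base, lifted_edges):
    """y_vw = 1 iff v and w are connected through joined base edges."""
    joined = [e for e, label in y_base.items() if label == 1]
    comp = component_labels(nodes, joined)
    return {(v, w): int(comp[v] == comp[w]) for (v, w) in lifted_edges}


# Hypothetical chain v1 - v2 - v3 - v4 with the last base edge cut.
nodes = ["v1", "v2", "v3", "v4"]
y_base = {("v1", "v2"): 1, ("v2", "v3"): 1, ("v3", "v4"): 0}

y_lifted = lifted_labels(nodes, y_base, [("v1", "v3"), ("v1", "v4")])
print(y_lifted)
```

Lifted edges therefore add long-range costs to the objective without changing the set of feasible decompositions of $G$.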

  22. Results on the MOT Benchmark
      [Results table image; entries labeled CVPR 17, CVPR 17, arXiv’16, ECCVW’16]

  23. Tracking Result
      [Video panels: Lifted Multicut (Ours), SenseTime]

  24. Overview
      • Graph Decomposition and Multicut
      • Multi-person Tracking
        ‣ Minimum cost lifted multicut problem [CVPR15, CVPR17]
      • Multi-person Pose Estimation
        ‣ End-to-end Learning for Graph Decomposition

  25. Multi-person Pose Estimation
      • Rapid progress in the last two years
        ‣ OpenPose from CMU / DeepCut from MPII

  26. DeepCut recap
      • A node labeling of a graph (node variable $x$)
        $$\min_{x} \sum_{d \in D} \sum_{c \in C} \alpha_{dc} \, x_{dc}$$
      • A multicut of a graph (edge variable $y$)
        $$\min_{y} \sum_{dd' \in E} \beta_{dd'} \, y_{dd'}$$
      • A joint multicut and node labeling problem

  27. DeepCut recap
      [Figure labels: body joint labels, pair of detections; constraints: Consistency, Transitivity, Uniqueness]

  28. Graph Decomposition Problems
      • Multi-person tracking [Tang et al. CVPR15, CVPR17]
      • Detection non-maximum suppression [Tang et al. CVPR15]
      • Multi-person pose estimation [Insafutdinov et al. CVPR17]
      • Instance segmentation [Kirillov et al. CVPR17]
      Patch-based CNN (image from Zhang et al. 2015):
      ● KITTI dataset, 375 x 1242
      ● Extract patches of different sizes: 270 x 432, 180 x 288, and 120 x 192
      ● Run the CNN on the extracted patches to obtain local instance predictions
      ● There are fewer instances in a patch, so it is easier for the CNN to assign instance labels.
      ● The instance ID is not guaranteed to be consistent across different patches.

  29. An end-to-end learning approach?
      • How to jointly optimize the model parameters and the weights of the front-end CNNs?
      • How to utilize the cycle consistency constraints as supervisory signals?

  30. A binary cubic program
      • The minimum cost multicut problem
        $$\min_{x \in \{0,1\}^E} \sum_{e \in E} c_e x_e \quad \text{subject to} \quad \forall C \in cc(G)\ \forall e \in C:\ x_e \le \sum_{e' \in C \setminus \{e\}} x_{e'}.$$
      • It can be equivalently stated as an unconstrained binary multilinear problem with a large enough $C$
        $$\min_{x \in \{0,1\}^E} \sum_{e \in E} c_e x_e + C \sum_{C \in cc(G)} \sum_{e \in C} x_e \prod_{e' \in C \setminus \{e\}} (1 - x_{e'}).$$
      • In the special case where the graph is complete, the above problem specializes to a binary cubic problem, with $\bar{x}_e = 1 - x_e$
        $$\min_{x \in \{0,1\}^E} \sum_{e \in E} c_e x_e + C \sum_{\{u,v,w\} \in \binom{V}{3}} \left( x_{uv} \bar{x}_{vw} \bar{x}_{uw} + \bar{x}_{uv} x_{vw} \bar{x}_{uw} + \bar{x}_{uv} \bar{x}_{vw} x_{uw} \right).$$
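
The cubic penalty can be checked numerically with a minimal sketch. Each triangle term equals 1 exactly when one of its three edges is cut ($x = 1$) and the other two are not, which is the only labeling of a triangle that violates the cycle constraint. The edge keys are sorted node pairs, and the costs and the penalty weight `big_c` are hypothetical.

```python
from itertools import combinations


def violated_triangles(nodes, x):
    """Count triangles in which exactly one edge is cut (x = 1)."""
    count = 0
    for u, v, w in combinations(sorted(nodes), 3):
        a, b, c = x[(u, v)], x[(v, w)], x[(u, w)]
        count += a * (1 - b) * (1 - c) + (1 - a) * b * (1 - c) + (1 - a) * (1 - b) * c
    return count


def penalized_cost(nodes, cost, x, big_c):
    """Linear multicut cost plus big_c times the number of violated triangles."""
    linear = sum(cost[e] * x[e] for e in x)
    return linear + big_c * violated_triangles(nodes, x)


# Hypothetical complete graph on three nodes.
nodes = [0, 1, 2]
cost = {(0, 1): -2.0, (1, 2): 1.0, (0, 2): 1.0}

invalid = {(0, 1): 1, (1, 2): 0, (0, 2): 0}  # a single cut edge in a triangle
valid = {(0, 1): 1, (1, 2): 0, (0, 2): 1}    # node 0 separated from {1, 2}

print(penalized_cost(nodes, cost, invalid, big_c=10.0))
print(penalized_cost(nodes, cost, valid, big_c=10.0))
```

With a large enough penalty weight, any labeling that violates a triangle costs more than every valid multicut, so the unconstrained minimizer is a valid multicut.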

  31. End-to-End Learning for Multicut
      • The new binary cubic problem.
        $$\min_{x \in \{0,1\}^E} \sum_{e \in E} c_e x_e + C \sum_{\{u,v,w\} \in \binom{V}{3}} \left( x_{uv} \bar{x}_{vw} \bar{x}_{uw} + \bar{x}_{uv} x_{vw} \bar{x}_{uw} + \bar{x}_{uv} \bar{x}_{vw} x_{uw} \right).$$
      • The corresponding CRF formulation.
        $$E(x) = \sum_{i} \psi^{U}_{i}(x_i) + \sum_{c} \psi^{\mathrm{Cycle}}_{c}(x_c)$$
      • Pattern-based potential [Vineet et al. @ECCV’12]
        $$\psi^{\mathrm{pat}}_{c}(x_c) = \begin{cases} \gamma_{x_c} & \text{if } x_c \in P_c \\ \gamma_{\max} & \text{otherwise} \end{cases}$$
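
A pattern-based potential amounts to a table lookup with a default value. The sketch below assumes, as an illustration only, that the pattern set $P_c$ of a triangle clique holds its five valid cut labelings, each with cost 0, and that every other labeling receives $\gamma_{\max}$. The actual pattern costs used in the talk are not given on the slide.

```python
def pattern_potential(x_c, pattern_costs, gamma_max):
    """Return the cost of labeling x_c if it is a listed pattern, else gamma_max."""
    return pattern_costs.get(tuple(x_c), gamma_max)


# Assumed pattern set for a triangle (x = 1 means cut): every labeling
# except those with exactly one cut edge. The zero costs are hypothetical.
triangle_patterns = {
    (0, 0, 0): 0.0,
    (1, 1, 0): 0.0,
    (1, 0, 1): 0.0,
    (0, 1, 1): 0.0,
    (1, 1, 1): 0.0,
}

ok = pattern_potential((1, 1, 0), triangle_patterns, gamma_max=100.0)
bad = pattern_potential((1, 0, 0), triangle_patterns, gamma_max=100.0)
print(ok, bad)
```

With this choice the clique potential reproduces the cubic penalty term: labelings with a single cut edge pay the large constant and all other labelings pay nothing.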

  32. End-to-End Learning for Multicut
      • A better feature map.
        [Figure panels: with CRF inference, without CRF inference]

  33. End-to-End Learning for Multicut
      • A better pose estimation.
        [Figure panels: without CRF inference, with CRF inference]
