
Scene semantics from long-term observation of people. Jacob Menashe, October 5, 2012

Introduction ◮ Function over form. ◮ Form can be unique; function can be descriptive.


  5. Inferring Probable Pose ◮ Objective: Choose a likely pose for a given area. ◮ Choose a pose cluster to maximize:

$$\hat{k} = \arg\max_{k} \sum_{j=1}^{J} \sum_{c=1}^{9} \sum_{\text{pixels } i \in B^{k}_{j,c}} w_{y_i}(k, j, c)$$

◮ $k$ is the pose ◮ $j$ is the joint ◮ $c$ is the joint cell ◮ $B^{k}_{j,c}$ is the bounding box $P(R)$. ◮ $w_{y_i}(k, j, c)$ are the learned SVM weights for $k$, $j$, $c$ in $\tilde{h}$
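The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenter's implementation: it assumes $y_i$ is the semantic label of pixel $i$ in a label map, that each box $B^{k}_{j,c}$ is given as pixel coordinates, and that the learned weights are stored in a dictionary keyed by $(k, j, c)$ with one weight per label. The names `pose_score` and `infer_pose`, and all toy data, are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout (not from the slides): a box is (x0, y0, x1, y1) with
# half-open pixel ranges, and boxes_per_pose[k][j][c] is the box B^k_{j,c}.
Box = Tuple[int, int, int, int]
# Assumed storage: weights[(k, j, c)][label] stands in for w_{y_i}(k, j, c)
# when label == y_i.
Weights = Dict[Tuple[int, int, int], Sequence[float]]


def pose_score(labels: Sequence[Sequence[int]], pose_boxes: List[List[Box]],
               weights: Weights, k: int) -> float:
    """Sum w_{y_i}(k, j, c) over joints j, cells c, and pixels i in B^k_{j,c}."""
    total = 0.0
    for j, cells in enumerate(pose_boxes):            # j: joint index
        for c, (x0, y0, x1, y1) in enumerate(cells):  # c: joint cell (9 on the slide)
            w = weights[(k, j, c)]
            for y in range(y0, y1):
                for x in range(x0, x1):
                    total += w[labels[y][x]]          # assumed: labels[y][x] is y_i
    return total


def infer_pose(labels: Sequence[Sequence[int]],
               boxes_per_pose: List[List[List[Box]]], weights: Weights) -> int:
    """Return k_hat, the pose cluster with the highest summed weight."""
    return max(range(len(boxes_per_pose)),
               key=lambda k: pose_score(labels, boxes_per_pose[k], weights, k))


# Toy usage: label 0 on the left half of a 4x4 map, label 1 on the right half.
labels = [[0, 0, 1, 1] for _ in range(4)]
boxes_per_pose = [
    [[(0, 0, 1, 4)]],  # pose 0: one joint, one cell covering column 0
    [[(2, 0, 4, 4)]],  # pose 1: one joint, one cell covering columns 2-3
]
weights = {(0, 0, 0): [1.0, -1.0], (1, 0, 0): [-0.5, 1.0]}
print(infer_pose(labels, boxes_per_pose, weights))  # prints 1
```

In the toy data, pose 0 collects 4.0 from four label-0 pixels and pose 1 collects 8.0 from eight label-1 pixels, so $\hat{k} = 1$. The sketch accepts any number of joints and cells per joint rather than hard-coding the slide's nine cells.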

  6. Introduction ◮ Background ◮ Approach ◮ Learning Through Video ◮ Candidate Object Detection ◮ Learning Object Model ◮ Inferring Probable Pose ◮ Experiments and Results ◮ Discussion and Conclusion

  7. Introduction ◮ Background ◮ Approach ◮ Learning Through Video ◮ Experiments and Results ◮ Annotated Video Datasets ◮ Semantic Labeling ◮ Functional Surface Estimation ◮ Pose-Region Relationships ◮ Pose Prediction ◮ Discussion and Conclusion

  11. Annotated Video Datasets ◮ ~150 time-lapse videos of indoor environments ◮ Stationary cameras ◮ Manual annotation of single frames ◮ http://www.youtube.com/watch?v=17HXRdVzsrM

  14. Semantic Labeling Labelings are evaluated by average precision (AP). ◮ Measured against two competing methods. ◮ (A + P) and (A + L + P) outperform in all cases except bed detection.

| Class | DPM¹ | Alternate² | (A + L) | (P) | (A + P) | (A + L + P) |
|---|---|---|---|---|---|---|
| Wall | - | 75 | 76 | 76 | 82 | 81 |
| Ceiling | - | 47 | 53 | 52 | 69 | 69 |
| Floor | - | 59 | 64 | 65 | 76 | 76 |
| Bed | 31 | 12 | 14 | 21 | 27 | 26 |
| Sofa/Armchair | 26 | 26 | 34 | 32 | 44 | 43 |
| Coffee Table | 11 | 11 | 11 | 12 | 17 | 17 |
| Chair | 9.5 | 6.3 | 8.3 | 5.8 | 11 | 12 |
| Table | 15 | 18 | 17 | 16 | 22 | 22 |
| Wardrobe/Cupboard | 27 | 27 | 28 | 22 | 36 | 36 |
| Christmas Tree | 50 | 55 | 72 | 20 | 76 | 77 |
| Other Object | 12 | 11 | 7.9 | 13 | 16 | 16 |
| Average | 23 | 31 | 35 | 30 | 43 | 43 |

¹ Felzenszwalb et al. [2010] ² Hedau et al. [2009]
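For reference, a per-class AP score can be computed as below. This is a sketch under an assumption: it uses the common non-interpolated definition over one ranked list of scored predictions for a single class, while the slides do not say which AP variant or ranking unit (pixels, regions) the authors used. The function name `average_precision` and the toy scores are hypothetical.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """Non-interpolated AP: mean of the precision at the rank of each positive."""
    if len(scores) != len(is_positive):
        raise ValueError("scores and is_positive must have the same length")
    num_pos = sum(1 for p in is_positive if p)
    if num_pos == 0:
        return 0.0
    # Rank predictions by decreasing confidence; sorted() is stable, so ties
    # keep their input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if is_positive[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / num_pos


# Toy usage with four scored predictions, two of them correct.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
print(round(ap, 3))  # prints 0.833
```

Here the positives sit at ranks 1 and 3, giving precisions 1 and 2/3, whose mean is 5/6. The table's Average row is consistent with a plain mean of the per-class values; for example, the (A + L + P) column sums to 475 over 11 classes, about 43.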

  15. Semantic Labeling Output (figure panels: Background, Ground Truth, (A + L + P), (P), (A + L))

  20. Functional Surface Estimation ◮ Measured with AP on functional labels ◮ Walkable: 76% ◮ Sittable: 25% ◮ Reachable: 44% ◮ Average gain of 13% over the baseline competitor, Fouhey et al. [2012]

  21. Pose-Region Relationships


  27. Pose Prediction


  35. Introduction ◮ Background ◮ Approach ◮ Learning Through Video ◮ Experiments and Results ◮ Discussion and Conclusion ◮ Extensions ◮ Criticisms ◮ Conclusion

  37. Extensions ◮ Using semantics as probabilistic information ◮ Learning new objects from observation

  38. Criticisms
