CS 4495 Computer Vision: Activity Recognition (Aaron Bobick)


  1. Activity Recognition 1 CS 4495 Computer Vision – A. Bobick CS 4495 Computer Vision Activity Recognition Aaron Bobick School of Interactive Computing

  2. Administrivia
     • PS6: should be working on it! Due Sunday, Nov 24th.
     • Exam: Tuesday, November 26th.
       • Short answer and multiple choice (mostly short answer)
       • Study guide is posted in the calendar.
     • PS7: we hope to have it out by 11/26. It will be a straightforward implementation of Motion History Images.

  3. Video
     • A video is a sequence of frames captured over time
     • Now our image data is a function of space (x, y) and time (t)

  4. Video as an “Image Stack”
     [Figure: image stack; axis labels 0 to 255 and time t]
     • We can look at video data as a spatio-temporal volume
     • If the camera is stationary, each line through time corresponds to a single ray in space
     Credit: Alyosha Efros, CMU
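
To make the "image stack" view concrete, here is a minimal sketch that stores a video as a NumPy array indexed as `(t, y, x)`. The array sizes, the random pixel data, and all variable names are assumptions chosen for illustration; they are not from the slides.

```python
import numpy as np

# Synthetic grayscale video (assumed sizes): T frames of H x W pixels, values 0..255.
T, H, W = 30, 48, 64
rng = np.random.default_rng(0)
video = rng.integers(0, 256, size=(T, H, W), dtype=np.uint8)

# Fixing t gives an ordinary image: a function of (x, y).
frame = video[10]

# Fixing (x, y) gives a line through time. With a stationary camera,
# every sample on this line comes from the same ray in space.
pixel_over_time = video[:, 20, 32]

# Fixing only the row y gives a 2D space-time slice with axes (t, x).
xt_slice = video[:, 20, :]

print(frame.shape, pixel_over_time.shape, xt_slice.shape)
```

Indexing time first keeps each frame contiguous in memory, which is convenient when frames arrive one at a time. Other axis orders work equally well.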

  5. Aside: Epipolar Plane (“EPI”) images

  6. Aside: Epipolar Plane (“EPI”) images

  7. EPI images and activity

  8. EPI images and activity

  9. Processing video: object detection
     • If the goal of “activity recognition” is to recognize the activity of the objects…
     • …you (may) have to find the objects…

  10. Background subtraction (Slide credit: Birgi Tamersoy)

  11. Background subtraction
     • Simple techniques can do OK with a static camera
     • …but it is hard to do perfectly
     • Widely used:
       • Traffic monitoring (counting vehicles, detecting & tracking vehicles, pedestrians)
       • Human action recognition (run, walk, jump, squat)
       • Human-computer interaction
       • Object tracking

  12. Simple approach: background subtraction (Slide credit: Birgi Tamersoy)

  13. Frame differencing (Slide credit: Birgi Tamersoy)

  14. Frame differencing (Slide credit: Birgi Tamersoy)

  15. Mean filtering (Slide credit: Birgi Tamersoy)

  16. Frame differences vs. background subtraction
     • Toyama et al. 1999

  17. Median filtering (Slide credit: Birgi Tamersoy)

  18. Average/Median Image (Credit: Alyosha Efros, CMU)

  19. Background Subtraction
     [Figure: image subtraction, shown as image − image = result]
     Credit: Alyosha Efros, CMU

  20. Pros and cons
     Advantages:
     • Extremely easy to implement and use!
     • All pretty fast.
     • Corresponding background models need not be constant; they can change over time.
     Disadvantages:
     • Accuracy of frame differencing depends on object speed and frame rate
     • Median background model: relatively high memory requirements.
     • Setting global threshold Th…
     When will this basic approach fail?
     Slide credit: Birgi Tamersoy
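
The three simple techniques named on the preceding slides (frame differencing, mean filtering, median filtering) can be sketched as follows. This is a minimal illustration under assumptions: the function names, the toy frames, and the threshold value of 25 are hypothetical, and a single global threshold is used as in the "Th" mentioned above.

```python
import numpy as np

def frame_difference_mask(prev_frame, curr_frame, threshold):
    """Foreground = pixels whose value changed by more than threshold since the last frame."""
    # Cast to a signed type first so uint8 subtraction cannot wrap around.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold

def background_model(frames, kind="median"):
    """Per-pixel mean or median over a buffer of recent frames."""
    stack = np.stack(frames).astype(np.float32)
    if kind == "mean":
        return stack.mean(axis=0)
    if kind == "median":
        # The whole buffer must be held in memory: the cost noted on the slide.
        return np.median(stack, axis=0)
    raise ValueError(f"unknown background model: {kind}")

def foreground_mask(frame, background, threshold):
    """Foreground = pixels that differ from the background model by more than threshold."""
    return np.abs(frame.astype(np.float32) - background) > threshold

# Toy data: five flat frames; a bright 2x2 object appears only in the last one.
frames = [np.full((4, 6), 50, dtype=np.uint8) for _ in range(5)]
frames[-1][1:3, 2:4] = 200

diff_mask = frame_difference_mask(frames[-2], frames[-1], threshold=25)
bg_mean = background_model(frames, kind="mean")
bg_median = background_model(frames, kind="median")
fg_median = foreground_mask(frames[-1], bg_median, threshold=25)
```

In this toy case the median model ignores the object entirely, while the mean model is pulled toward it at the object's pixels. This is one reason the choice of model and threshold matters.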

  21. Background mixture models
     Idea: model each background pixel with a mixture of Gaussians; update its parameters over time.
     • Adaptive Background Mixture Models for Real-Time Tracking, Chris Stauffer & W.E.L. Grimson

  22. Background subtraction with depth
     How can we select foreground pixels based on depth information?

  23. Human activity in video
     No universal terminology, but approximately:
     • “Event”: a detection at a single instant in time.
     • “Actions” or “Movements”: atomic motion patterns, often gesture-like, with a single clear-cut trajectory and a single nameable behavior (e.g., sit, wave arms)
     • “Activity”: a series or composition of actions (e.g., interactions between people)
     Adapted from Venu Govindaraju and A. Bobick

  24. Surveillance http://users.isr.ist.utl.pt/~etienne/mypubs/Auvinetal06PETS.pdf

  25. Human activity in video: basic approaches
     • Model-based action recognition:
       • Use human body tracking and pose estimation techniques, relate to action descriptions (or learn)
       • Major challenge: accurate tracks in spite of occlusion, ambiguity, low resolution
     • Model-based activity recognition:
       • Given some lower-level detection of actions (or events), recognize the activity by comparing to some structural representation of the activity
       • Needs to handle uncertainty.
     • Activity as motion, space-time appearance patterns
       • Describe overall patterns, but no explicit body tracking
       • Typically learn a classifier
     • Recently: “activity recognition” from a static image
       • Imagine a picture of a person holding a flute. What are they doing?

  26. Motion and perceptual organization
     • Even “impoverished” motion data can evoke a strong percept

  27. Motion and perceptual organization
     • Even “impoverished” motion data can evoke a strong percept

  28. Example
     • Even “impoverished” motion data can evoke a strong percept
     Video from Davis & Bobick

  29. Motion energy images
     • Spatial accumulation of motion.
     • Collapse over specific time window.
     • Motion measurement method not critical (e.g. motion differencing).
     [Figure axis: Time]

  30. Motion history images
     • Motion history images are a different function of temporal volume.
     • Pixel operator is replacement decay:
       $$I_\tau(x,y,t) = \begin{cases} \tau & \text{if moving} \\ \max\big(I_\tau(x,y,t-1) - 1,\ 0\big) & \text{otherwise} \end{cases}$$
     • Trivial to construct $I_{\tau-k}(x,y,t)$ from $I_\tau(x,y,t)$, so can process multiple time window lengths without more search.
     • MEI is thresholded MHI
     [Figure labels: Moved t-15; Moved t-1]
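
The replacement-decay operator above can be sketched in a few lines. This is an illustrative sketch, not the course's reference implementation: the function names, the 1×4 toy image, and the choice τ = 3 are assumptions. The `shorter_window` helper uses the relation I_{τ−k} = max(I_τ − k, 0), which follows from the recurrence.

```python
import numpy as np

def update_mhi(mhi, moving, tau):
    """One time step: moving pixels are set to tau, all others decay by 1 down to 0."""
    return np.where(moving, tau, np.maximum(mhi - 1, 0))

def mei_from_mhi(mhi):
    """The MEI is the thresholded MHI: any pixel that moved within the window."""
    return mhi > 0

def shorter_window(mhi, k):
    """MHI for window length tau - k, derived from the MHI for window length tau."""
    return np.maximum(mhi - k, 0)

tau = 3
mhi = np.zeros((1, 4), dtype=np.int32)

# A single "moving" pixel sweeps left to right, one column per time step.
for t in range(4):
    moving = np.zeros((1, 4), dtype=bool)
    moving[0, t] = True
    mhi = update_mhi(mhi, moving, tau)

print(mhi)                     # most recent motion holds the largest value
print(mei_from_mhi(mhi))
print(shorter_window(mhi, 1))
```

The `moving` mask can come from any motion detector, for example thresholded frame differences. The MHI only records where motion occurred and how recently.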

  31. Temporal templates
     • MEI + MHI = temporal template
     [Figure panels: motion energy image; motion history image]

  32. Aerobics examples

  33. Motion Energy Images
     Davis & Bobick 1999: The Representation and Recognition of Action Using Temporal Templates

  34. How to recognize these images?
     • These are gray-scale, blob-like images.
     • 100 years of computer vision for recognizing gray blobs (for small values of a hundred).
     • Old-style computer vision:
       1. compute some summarization statistics of the pattern
       2. construct generative model
       3. recognize based upon those statistics.

  35. Image moments
     Moments summarize a shape given image $I(x,y)$:
     $$M_{ij} = \sum_x \sum_y x^i y^j \, I(x,y)$$
     Central moments are translation invariant:
     $$\mu_{pq} = \sum_x \sum_y (x - \bar{x})^p (y - \bar{y})^q \, I(x,y), \qquad \bar{x} = \frac{M_{10}}{M_{00}}, \quad \bar{y} = \frac{M_{01}}{M_{00}}$$
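
As a quick check of the translation-invariance claim, the sketch below computes raw and central moments directly from the definitions above. The function names and the 8×8 toy image are assumptions for illustration; x is taken as the column index and y as the row index.

```python
import numpy as np

def raw_moment(img, i, j):
    """M_ij = sum over x, y of x^i * y^j * I(x, y)."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    return float(np.sum((x ** i) * (y ** j) * img))

def central_moment(img, p, q):
    """mu_pq = sum over x, y of (x - x_bar)^p * (y - y_bar)^q * I(x, y)."""
    m00 = raw_moment(img, 0, 0)
    x_bar = raw_moment(img, 1, 0) / m00
    y_bar = raw_moment(img, 0, 1) / m00
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]
    return float(np.sum(((x - x_bar) ** p) * ((y - y_bar) ** q) * img))

# A 2x3 blob, and the same blob shifted by 3 rows and 2 columns (no wrap-around).
img = np.zeros((8, 8))
img[1:3, 2:5] = 1.0
shifted = np.roll(img, shift=(3, 2), axis=(0, 1))

# Raw moments change with position; central moments do not.
print(raw_moment(img, 1, 0), raw_moment(shifted, 1, 0))
print(central_moment(img, 2, 0), central_moment(shifted, 2, 0))
```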

  36. Hu moments
     • Set of 7 moments
     • Apply to Motion History Image for global space-time “shape” descriptor
     • Translation, rotation, and scale invariant
     $$[h_1, h_2, h_3, h_4, h_5, h_6, h_7]$$

  37. Hu moments
     [Equations for $h_1$ through $h_6$]

  38. [Equation for $h_7$]
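
The slide equations for $h_1$ through $h_7$ did not survive in this transcript. For reference, the commonly cited form of Hu's seven invariants is given below, written in terms of the scale-normalized central moments $\eta_{pq}$. The symbol $\eta_{pq}$ is not used on the slides and is introduced here as an assumption; the slides' own notation may differ.

```latex
\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\,1 + (p+q)/2}}

\begin{aligned}
h_1 &= \eta_{20} + \eta_{02} \\
h_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \\
h_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \\
h_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \\
h_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})
       \left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
    &\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})
       \left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \\
h_6 &= (\eta_{20} - \eta_{02})
       \left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]
       + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \\
h_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})
       \left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \\
    &\quad - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})
       \left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]
\end{aligned}
```

Translation invariance comes from using central moments, scale invariance from the normalization by a power of $\mu_{00}$, and rotation invariance from the particular combinations above.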

  39. Build a classifier
     • Generative or discriminative?
       • Generative: builds a model of each class; compare all
       • Discriminative: builds a model of the boundary between classes
     • How would you build decent generative models of each class of action?
       • Use a Gaussian in Hu-moment feature space
       • Compare likelihoods $p(\text{data} \mid \text{model of action } i)$
       • If you have priors, use them by Bayes rule:
         $$p(\text{model}_i \mid \text{data}) \propto p(\text{data} \mid \text{model}_i)\, p(\text{model}_i)$$
       • Otherwise just use likelihood.
     • Or use NN? (Problem Set!)
     • More on classification on Dec 3
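
The generative recipe on this slide (one Gaussian per action, compare likelihoods, optionally weight by priors) might look like the sketch below. Everything concrete here is an assumption: the function names, the small regularization term added to the covariance, and the synthetic 2-D features, which stand in for 7-D Hu-moment vectors computed from real MHIs. The labels "sit" and "wave" are placeholders.

```python
import numpy as np

def fit_gaussian(features):
    """Fit a mean and covariance to the feature vectors (rows) of one action class."""
    mean = features.mean(axis=0)
    # A small diagonal term keeps the covariance invertible (an assumption, not from the slides).
    cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(features.shape[1])
    return mean, cov

def log_likelihood(x, mean, cov):
    """log p(data | model) for a multivariate Gaussian."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (maha + logdet + len(x) * np.log(2 * np.pi))

def classify(x, models, log_priors=None):
    """Pick the class with the highest likelihood, or posterior if priors are given."""
    scores = {}
    for name, (mean, cov) in models.items():
        score = log_likelihood(x, mean, cov)
        if log_priors is not None:
            score += log_priors[name]  # Bayes rule in log space, up to a constant
        scores[name] = score
    return max(scores, key=scores.get)

# Synthetic training features: two well-separated classes.
rng = np.random.default_rng(0)
train = {
    "sit": rng.normal([0.0, 0.0], 0.1, size=(50, 2)),
    "wave": rng.normal([1.0, 1.0], 0.1, size=(50, 2)),
}
models = {name: fit_gaussian(feats) for name, feats in train.items()}

label_a = classify(np.array([0.05, -0.02]), models)
label_b = classify(np.array([0.95, 1.03]), models,
                   log_priors={"sit": np.log(0.5), "wave": np.log(0.5)})
print(label_a, label_b)
```

Working in log space avoids underflow when likelihoods are very small. With equal priors, the posterior and likelihood rankings coincide.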
