Probabilistic Tracking and Reconstruction of 3D Human Motion in Monocular Video Sequences

Presentation of the thesis work of: Hedvig Sidenbladh, KTH
Thesis opponent: Prof. Bill Freeman, MIT

Thesis supervisors
• Prof. Jan-Olof Eklundh, KTH
• Prof. Michael Black, Brown University

Collaborators
• Dr. David Fleet, Xerox PARC
• Prof. Dirk Ormoneit, Stanford University

A vision of the future from the past.
[Figure: Elektro and Sparky, New York World's Fair, 1939 (Westinghouse Historical Collection)]

Applications of computers looking at people
• Human-machine interaction
  – Robots
  – Intelligent rooms
• Video search
• Entertainment: motion capture for games, animation, and film.
• Surveillance

Why is it Hard?
The appearance of people can vary dramatically.

Technical Goal
Tracking a human in 3D
Why is it hard?
Geometrically under-constrained.

Why is it hard?
People can appear in arbitrary poses.
Structure is unobservable: inference from visible parts.

One solution:
• Use markers
• Use multiple cameras
http://www.vicon.com/animation/

State of the Art.
Bregler and Malik '98
• Brightness constancy cue
  – Insensitive to appearance
• Full-body required multiple cameras
• Single hypothesis

2D vs. 3D tracking
• Artist's models...

State of the Art.
Cham and Rehg '99
• Single camera, multiple hypotheses
• 2D templates (no drift but view dependent)
$I(\mathbf{x}, t) = I(\mathbf{x} + \mathbf{u}, 0) + \eta$
State of the Art.
Deutscher, North, Bascle, & Blake '00
• Multiple hypotheses
• Multiple cameras
• Simplified clothing, lighting and background

1999 state of art
Pavlovic, Rehg, Cham, and Murphy, Intl. Conf. Computer Vision, 1999

Note: we can fake it with clever system design
M. Krueger, "Artificial Reality", Addison-Wesley, 1983.

Game videos...
Decathlete 100m hurdles
Black background
No other people in camera
Person at known distance and position.
Display tells person what motion to do.

Performance specifications
* No special clothing
* Monocular, grayscale, sequences (archival data)
* Unknown, cluttered, environment
Task: Infer 3D human motion from 2D image
Bayesian formulation
$p(\text{model} \mid \text{cues}) = \frac{p(\text{cues} \mid \text{model}) \, p(\text{model})}{p(\text{cues})}$
1. Need a constraining likelihood model that is also invariant to variations in human appearance.
2. Need a prior model of how people move.
3. Posterior probability: Need an effective way to explore the model space (very high dimensional) and represent ambiguities.

System components
• Representation for probabilistic analysis.
• Models for human appearance (likelihood term).
• Models for human motion (prior term).
  – Very general model
  – Very specific model
  – Example-based model

Simple Body Model
* Limbs are truncated cones
* Parameter vector of joint angles and angular velocities = $\phi$

Multiple Hypotheses
• Posterior distribution over model parameters often multi-modal (due to ambiguities)
• Represent whole distribution:
  – sampled representation
  – each sample is a pose
  – predict over time using a particle filtering approach

Particle Filter
Posterior $p(\phi_{t-1} \mid \vec{I}_{t-1})$ → sample → Temporal dynamics $p(\phi_t \mid \phi_{t-1})$ → sample → Likelihood $p(I_t \mid \phi_t)$ → normalize → Posterior $p(\phi_t \mid \vec{I}_t)$

Problem: Expensive representation of posterior!
Approaches to solve problem:
• Lower the number of samples. (Deutscher et al., CVPR00)
• Represent the space in other ways (Choo and Fleet, ICCV01)
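The particle filter loop on the slide (sample from the previous posterior, propagate through the temporal dynamics, weight by the likelihood, normalize) can be sketched as follows. This is a minimal illustration and not the thesis implementation: the function name `particle_filter_step`, the random-walk `dynamics`, and the Gaussian `likelihood` on a one-dimensional toy state are all assumptions made for the example.

```python
import numpy as np

def particle_filter_step(particles, weights, dynamics, likelihood, rng):
    """One sample / predict / weight / normalize cycle.

    particles:  (N, D) array, each row one pose hypothesis phi
    weights:    (N,) normalized weights representing p(phi_{t-1} | I_{t-1})
    dynamics:   hypothetical callable drawing phi_t ~ p(phi_t | phi_{t-1})
    likelihood: hypothetical callable returning p(I_t | phi_t) per particle
    """
    n = len(particles)
    # Sample from the previous posterior.
    idx = rng.choice(n, size=n, p=weights)
    # Sample from the temporal dynamics.
    predicted = dynamics(particles[idx], rng)
    # Weight by the likelihood and normalize to get the new posterior.
    w = likelihood(predicted)
    return predicted, w / w.sum()

# Toy setup (assumed): a 1D "pose" observed near 0.8 in every frame.
rng = np.random.default_rng(0)
particles = rng.normal(0.0, 1.0, size=(500, 1))
weights = np.full(500, 1.0 / 500)
observation = 0.8

def dynamics(p, rng):
    return p + rng.normal(0.0, 0.1, size=p.shape)

def likelihood(p):
    return np.exp(-0.5 * ((p[:, 0] - observation) / 0.2) ** 2)

for _ in range(5):
    particles, weights = particle_filter_step(
        particles, weights, dynamics, likelihood, rng)

posterior_mean = float(weights @ particles[:, 0])
```

Each row of `particles` plays the role of one pose sample, so a multi-modal posterior is represented simply by clusters of rows. The cost noted on the slide shows up here as one likelihood evaluation per particle per frame.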
What do people look like?
• Changing background
• Varying shadows
• Occlusion
• Deforming clothing
• Low contrast limb boundaries

What do non-people look like?

Edge Detection?
• Probabilistic model?
• Under/over-segmentation, thresholds, …

Key Idea #1 (Likelihood)
1. Use the 3D model to predict the location of limb boundaries (not necessarily features) in the scene.
2. Compute various filter responses steered to the predicted orientation of the limb.
3. Compute likelihood of filter responses using a statistical model learned from examples.

Example Training Images

Edge Filters
Normalized derivatives of Gaussians (Lindeberg, Granlund and Knutsson, Perona, Freeman & Adelson, …)
Edge filter response steered to limb orientation:
$f_e(\mathbf{x}, \theta, \sigma) = \sin\theta \, f_x(\mathbf{x}, \sigma) + \cos\theta \, f_y(\mathbf{x}, \sigma)$
[Figure: Filter responses steered to arm orientation.]
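As a rough illustration of the steered edge response $f_e = \sin\theta \, f_x + \cos\theta \, f_y$, the sketch below computes Gaussian derivative responses with NumPy only and combines them for a given orientation. The helper names (`gaussian_kernels`, `separable_filter`, `steered_edge_response`) and the synthetic step-edge image are assumptions for the example, and the contrast normalization mentioned on the slide is omitted.

```python
import numpy as np

def gaussian_kernels(sigma):
    """Sampled Gaussian and its first derivative, truncated at 3 sigma."""
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-t**2 / (2.0 * sigma**2))
    g /= g.sum()
    dg = -t / sigma**2 * g
    return g, dg

def separable_filter(image, kernel_y, kernel_x):
    """Convolve along x (axis 1) with kernel_x, then along y (axis 0)."""
    out = np.apply_along_axis(np.convolve, 1, image, kernel_x, mode="same")
    return np.apply_along_axis(np.convolve, 0, out, kernel_y, mode="same")

def steered_edge_response(image, theta, sigma):
    """f_e = sin(theta) * f_x + cos(theta) * f_y at every pixel."""
    g, dg = gaussian_kernels(sigma)
    f_x = separable_filter(image, g, dg)  # derivative along x
    f_y = separable_filter(image, dg, g)  # derivative along y
    return np.sin(theta) * f_x + np.cos(theta) * f_y

# Synthetic test image (assumed): a vertical step edge at column 20.
image = np.zeros((41, 41))
image[:, 20:] = 1.0

aligned = steered_edge_response(image, np.pi / 2, 2.0)[20, 20]
orthogonal = steered_edge_response(image, 0.0, 2.0)[20, 20]
```

With the filter steered to the edge (`theta = pi/2` under this formula) the response at the boundary is strong, while steering it a quarter turn away gives a response near zero. That contrast is what makes the steered response informative about a predicted limb orientation.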
Edge Distributions
Edge response steered to model edge:
$f_e(\mathbf{x}, \theta, \sigma) = \sin\theta \, f_x(\mathbf{x}, \sigma) + \cos\theta \, f_y(\mathbf{x}, \sigma)$

Edge Likelihood Ratio
[Figure panels: Edge response; Likelihood ratio]
Similar to Konishi et al., CVPR 99

Other Cues
• Ridges
• Motion: $I(\mathbf{x}, t)$, $I(\mathbf{x} + \mathbf{u}, t+1)$

Ridge Distributions
Ridge response steered to limb orientation
$f_r(\mathbf{x}, \theta, \sigma) = \left| \sin^2\theta \, f_{xx}(\mathbf{x}, \sigma) + \cos^2\theta \, f_{yy}(\mathbf{x}, \sigma) - 2 \sin\theta \cos\theta \, f_{xy}(\mathbf{x}, \sigma) \right| - \left| \cos^2\theta \, f_{xx}(\mathbf{x}, \sigma) + \sin^2\theta \, f_{yy}(\mathbf{x}, \sigma) + 2 \sin\theta \cos\theta \, f_{xy}(\mathbf{x}, \sigma) \right|$
Ridge response only on certain image scales!

Motion distributions
Different underlying motion models

Likelihood Formulation
• Independence assumptions:
  – Cues: $p(\text{image} \mid \text{model}) = p(\text{cue1} \mid \text{model}) \, p(\text{cue2} \mid \text{model})$
  – Spatial: $p(\text{image} \mid \text{model}) = \prod_{\mathbf{x} \in \text{image}} p(\text{image}(\mathbf{x}) \mid \text{model})$
  – Scales: $p(\text{image} \mid \text{model}) = \prod_{\sigma = 1, \ldots} p(\text{image}(\sigma) \mid \text{model})$
• Combines cues and scales!
• Simplification, in reality there are dependencies
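A learned likelihood ratio and the independence assumptions above can be illustrated together: histograms of filter responses on and off limbs give a per-response log ratio, and independent cues combine by summing those logs. The sketch below is only an assumption-laden toy: the training responses are synthetic Gaussians rather than real image statistics, and the names `fit_histogram`, `log_likelihood_ratio`, and the bin layout are made up for the example.

```python
import numpy as np

def fit_histogram(samples, bin_edges, eps=1e-6):
    """Normalized histogram; eps keeps empty bins from producing log(0)."""
    counts, _ = np.histogram(samples, bins=bin_edges)
    probs = counts + eps
    return probs / probs.sum()

def log_likelihood_ratio(responses, p_on, p_off, bin_edges):
    """log p_on(response) - log p_off(response), looked up per response."""
    idx = np.digitize(responses, bin_edges) - 1
    idx = np.clip(idx, 0, len(p_on) - 1)
    return np.log(p_on[idx]) - np.log(p_off[idx])

rng = np.random.default_rng(0)
bin_edges = np.linspace(-1.5, 1.5, 31)

# Synthetic "training" responses (assumed): strong on limbs, weak elsewhere.
p_on = fit_histogram(rng.normal(0.6, 0.2, 5000), bin_edges)
p_off = fit_histogram(rng.normal(0.0, 0.2, 5000), bin_edges)

strong = log_likelihood_ratio(np.array([0.75]), p_on, p_off, bin_edges)[0]
weak = log_likelihood_ratio(np.array([0.05]), p_on, p_off, bin_edges)[0]

# Independent cues and pixels: the product of ratios becomes a sum of logs.
edge_responses = np.array([0.75, 0.65, 0.55])
ridge_responses = np.array([0.45, 0.85])
total = (log_likelihood_ratio(edge_responses, p_on, p_off, bin_edges).sum()
         + log_likelihood_ratio(ridge_responses, p_on, p_off, bin_edges).sum())
```

Reusing one pair of histograms for both cues is a shortcut taken here for brevity; in practice each cue and scale would have its own learned distributions. Working in log space also avoids numerical underflow when many pixels are multiplied together.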
The power of cue combination
[Figure panels: Edge cues; Ridge cues; Flow cues]

Using edge cues alone

Using ridge cues alone

Using flow cue alone

Using edge, ridge, and motion cues together

Key Idea #2
$p(\text{image} \mid \text{foreground}, \text{background}) \propto \frac{p(\text{foreground part of image} \mid \text{foreground})}{p(\text{foreground part of image} \mid \text{background})}$
Do not look in parts of the image considered background
[Figure: Foreground part of image]
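Key Idea #2 can be checked numerically: if pixels are independent, the full image likelihood differs from the foreground-only ratio by a constant that does not depend on which pixels the model covers. The sketch below uses made-up per-pixel likelihood values (an assumption, not learned distributions) and two arbitrary foreground masks.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pixels = 200

# Hypothetical per-pixel likelihoods under the two appearance models.
p_fore = rng.uniform(0.05, 1.0, n_pixels)
p_back = rng.uniform(0.05, 1.0, n_pixels)

def full_log_likelihood(fore_mask):
    """log of prod_fore p(image | fore) * prod_back p(image | back)."""
    return np.log(p_fore[fore_mask]).sum() + np.log(p_back[~fore_mask]).sum()

def foreground_log_ratio(fore_mask):
    """log of prod_fore p(image | fore) / prod_fore p(image | back)."""
    return (np.log(p_fore[fore_mask]) - np.log(p_back[fore_mask])).sum()

# The constant: every pixel explained by the background model.
log_const = np.log(p_back).sum()

# Two different hypothesized foreground regions.
mask_a = rng.random(n_pixels) < 0.3
mask_b = rng.random(n_pixels) < 0.3

gap_a = full_log_likelihood(mask_a) - foreground_log_ratio(mask_a)
gap_b = full_log_likelihood(mask_b) - foreground_log_ratio(mask_b)
```

Both gaps equal `log_const`, so ranking pose hypotheses by the foreground ratio gives the same ordering as ranking by the full likelihood, while only touching the pixels under the projected model.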
Likelihood
$p(\text{image} \mid \text{fore}, \text{back}) = \prod_{\text{fore pixels}} p(\text{image} \mid \text{fore}) \prod_{\text{back pixels}} p(\text{image} \mid \text{back})$
$= \frac{\prod_{\text{all pixels}} p(\text{image} \mid \text{back}) \prod_{\text{fore pixels}} p(\text{image} \mid \text{fore})}{\prod_{\text{fore pixels}} p(\text{image} \mid \text{back})}$
$= \text{const} \cdot \frac{\prod_{\text{fore pixels}} p(\text{image} \mid \text{fore})}{\prod_{\text{fore pixels}} p(\text{image} \mid \text{back})}$
[Figure labels: Foreground pixels; Background pixels]

The Prior term
Bayesian formulation:
$p(\text{model} \mid \text{cue}) \propto p(\text{cue} \mid \text{model}) \, p(\text{model})$
  – Need a constraining likelihood model that is also invariant to variations in human appearance
  – Need a good model of how people move

Very general model
• Constant velocity motions
• Not constrained by how people tend to move.

Constant velocity model
• All DOF in the model parameter space, $\phi$, independent
• Angles are assumed to change with constant speed
• Speed and position changes are randomly sampled from normal distribution

Tracking an Arm
1500 samples
~2 min/frame
Moving camera, constant velocity model
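One way to read the constant velocity model is as a sampler for the temporal dynamics: each joint angle advances by its velocity, and independent Gaussian noise perturbs both. The sketch below is a guess at that structure, not the thesis code; the function name `sample_constant_velocity`, the exact placement of the noise terms, and the noise scales are assumptions.

```python
import numpy as np

def sample_constant_velocity(angles, velocities, sigma_angle, sigma_velocity, rng):
    """Draw the next joint angles and angular velocities.

    Every degree of freedom is treated independently: the velocity gets a
    small Gaussian perturbation, and the angle advances by that velocity
    plus its own Gaussian perturbation.
    """
    new_velocities = velocities + rng.normal(0.0, sigma_velocity, size=velocities.shape)
    new_angles = angles + new_velocities + rng.normal(0.0, sigma_angle, size=angles.shape)
    return new_angles, new_velocities

rng = np.random.default_rng(0)
angles = np.array([0.10, -0.40, 1.20])      # joint angles (radians, assumed)
velocities = np.array([0.05, 0.00, -0.02])  # change per frame (assumed)

# With zero noise the model reduces to pure constant velocity.
exact_angles, exact_velocities = sample_constant_velocity(
    angles, velocities, 0.0, 0.0, rng)

# With noise, each call yields a different plausible next pose.
noisy_angles, noisy_velocities = sample_constant_velocity(
    angles, velocities, 0.02, 0.01, rng)
```

Because nothing in this prior encodes how people tend to move, the noise scales have to be wide enough to cover any motion. That is consistent with the slide's figures of 1500 samples and about 2 minutes per frame for a single arm.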