Institute for Human-Machine Communication, Munich University of Technology

Face Tracking and Person Action Recognition

Martin Zobl, Frank Wallhoff

M4 meeting@Delft, 25-26.06.2003
Overview
• Recapitulation of methodology for action recognition
• Face tracking with particle filters
• Head orientation estimation
• Action segmentation with the Bayesian Information Criterion
• Recognition performance comparison on actions from the PETS-ICVS 2003 and the m4 dataset
• Outlook
Person Action Recognition
Processing pipeline:
• Background subtraction
• Extraction of person locations
• Face detection/tracking
• Feature calculation: global motion features
• Temporal segmentation: Bayesian Information Criterion
• Classification of segments: Hidden Markov Models
• Result: actions with timestamps
Computation of Global Motion Features
• Feature extraction is based on difference images I_d.
• Actions are represented by global motions in the hot-spot region R_i:

  Center of motion:
  m_{x,y}(t) = ( Σ_{(x,y)∈R_i} (x,y) · I_d(x,y,t) ) / ( Σ_{(x,y)∈R_i} I_d(x,y,t) )

  Variance of motion:
  σ_{x,y}(t) = ( Σ_{(x,y)∈R_i} |(x,y) − m_{x,y}(t)| · I_d(x,y,t) ) / ( Σ_{(x,y)∈R_i} I_d(x,y,t) )

  Intensity of motion:
  i(t) = (1 / |R_i|) · Σ_{(x,y)∈R_i} I_d(x,y,t)

• Person-location-normalized center of motion: m'_{x,y}(t) = m_{x,y}(t) − p_{x,y}(t)
• Temporal derivative of the center of motion: Δm_{x,y}(t) = m_{x,y}(t) − m_{x,y}(t−1)
• Composition of a 7-dimensional feature vector:
  x = ( m'_x, m'_y, Δm_x, Δm_y, σ_x, σ_y, i )^T
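To make the formulas concrete, here is a minimal NumPy sketch of how the three global motion features and the 7-dimensional feature vector could be computed from a difference image of one hot-spot. All function and variable names are illustrative, not from the original implementation, and the mean-absolute-deviation form of σ is an assumption.

```python
import numpy as np

def global_motion_features(I_d, person_center, prev_center=None):
    """Global motion features from a difference image I_d of one hot-spot R_i.

    I_d           : 2-D array, absolute frame difference inside the region
    person_center : (px, py), tracked person location used for normalization
    prev_center   : previous normalized center of motion, for the temporal derivative
    """
    ys, xs = np.nonzero(I_d)                 # pixels with motion energy
    w = I_d[ys, xs].astype(float)            # their intensities, used as weights
    total = w.sum()
    if total == 0:                           # no motion in this frame
        m = np.array(person_center, float)
    else:
        m = np.array([(xs * w).sum(), (ys * w).sum()]) / total   # center of motion
    # spread of motion around the center (weighted mean absolute deviation)
    sigma = (np.abs(np.stack([xs, ys], 1) - m) * w[:, None]).sum(0) / max(total, 1e-9)
    intensity = I_d.mean()                   # intensity of motion over the region
    m_norm = m - np.array(person_center)     # person-location-normalized center
    dm = m_norm - prev_center if prev_center is not None else np.zeros(2)
    # 7-dimensional feature vector: m'_x, m'_y, dm_x, dm_y, sigma_x, sigma_y, i
    return np.concatenate([m_norm, dm, sigma, [intensity]]), m_norm
```

The second return value is meant to be fed back as prev_center for the next frame.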
Visualized Motion Features
Example frames: the actual background difference image I_db and the difference image I_d; marked are the center of person p_{x,y}(t), the center of motion m'_{x,y}(t) and its derivative Δm_{x,y}(t).
Face Tracking
Markov state-space model
• Hidden system state x_t
• Observation y_t
• Recursive filtering distribution (prediction/update cycle):
  p(x_t | y_1,...,y_t) ∝ p(y_t | x_t) ∫ p(x_t | x_{t−1}) p(x_{t−1} | y_1,...,y_{t−1}) dx_{t−1}
  (likelihood · dynamic model · prior distribution)

Particle Filter
• N weighted particles {(x_t^(i), π_t^(i)), i = 1,...,N}
• Sampling the filtering distribution: p̂(x_t | y_1,...,y_t) = Σ_{i=1}^N π_t^(i) δ(x_t − x_t^(i))
• Updating the weights with the likelihood: π_t^(i) = π_{t−1}^(i) p(y_t | x_t^(i))
• Resampling to avoid degeneracy of the particle set
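As an illustration of this recursion, the following is a minimal sketch of one predict/update/resample cycle of a generic sampling-importance-resampling filter; the predict and likelihood functions are placeholders to be supplied by the concrete tracker (see the next slide), not the talk's implementation.

```python
import numpy as np

def particle_filter_step(particles, weights, predict, likelihood, rng):
    """One predict/update/resample cycle of a generic SIR particle filter.

    particles  : (N, D) array of state hypotheses x_t^(i)
    weights    : (N,) importance weights pi_t^(i)
    predict    : function drawing x_t ~ p(x_t | x_{t-1}) for all particles
    likelihood : function returning p(y_t | x_t^(i)) for all particles
    """
    particles = predict(particles, rng)          # propagate through the dynamic model
    weights = weights * likelihood(particles)    # update weights with the observation likelihood
    weights /= weights.sum()                     # normalize to a valid distribution
    # resample to counter degeneracy: draw N indices proportional to the weights
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```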
Face Tracking (2)
• Particle state (position T, scale s and their temporal changes):
  x_t^(i) = ( T_t^(i), ΔT_t^(i), s_t^(i), Δs_t^(i) )
• Prediction with a linear autoregressive model, trained with ADALINE:
  x_t^(i) = A x_{t−1}^(i) + B w_t
• Observations used for the likelihood p(y_t | x_t^(i)):
  - skin color ratio scr(x_t^(i)) of the face candidate region
  - face likelihood from an MLP, obtained after sample extraction, preprocessing (equalization, correction) and MLP classification
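The two plug-ins for the filter step above could look roughly as follows under the assumptions of this slide. A, B, crop, skin_mask and face_mlp are hypothetical placeholders (the trained ADALINE/MLP models are not reproduced here), and taking the product of skin color ratio and MLP score is just one simple way to combine the two observation cues.

```python
import numpy as np

# Hypothetical plug-ins for the particle filter step above. A, B, crop,
# skin_mask and face_mlp are placeholders, not the models from the talk.
def make_predict(A, B):
    """x_t = A x_{t-1} + B w_t with Gaussian process noise w_t."""
    def predict(particles, rng):
        noise = rng.standard_normal(particles.shape)
        return particles @ A.T + noise @ B.T
    return predict

def make_likelihood(frame, crop, skin_mask, face_mlp):
    """Combine skin color ratio and MLP face score into p(y_t | x_t^(i))."""
    def likelihood(particles):
        scores = np.empty(len(particles))
        for i, (tx, ty, dtx, dty, s, ds) in enumerate(particles):
            patch = crop(frame, tx, ty, s)          # candidate face region at position T, scale s
            scr = skin_mask(patch).mean()           # skin color ratio of the patch
            scores[i] = scr * face_mlp(patch)       # combined with the MLP face likelihood
        return np.maximum(scores, 1e-12)            # keep the weights strictly positive
    return likelihood
```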
Face Tracking (3)
• Automatic initialization by pyramid sampling and MLP classification
• Particle filtering
Head Orientation Estimation
• Each particle i is classified by a bank of MLPs (MLP 1 ... MLP 8), yielding scores p^(i)(face), p^(i)(left), p^(i)(half left), p^(i)(quarter left), p^(i)(frontal), p^(i)(quarter right), p^(i)(half right), p^(i)(right).
• Each orientation class is assigned an angle:
  φ(left) = 180°, φ(half left) = 135°, φ(quarter left) = 115°, φ(frontal) = 90°, φ(quarter right) = 65°, φ(half right) = 45°, φ(right) = 0°
• The head orientation estimate combines the best-scoring class of every particle over all N particles:
  φ = Σ_{i=1}^N φ^(i)(HO^(i)),   with HO^(i) = arg max_HO [ p^(i)(HO) ]
• Training data: FERET + mugshot database
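A small sketch of how the per-particle MLP scores could be fused into one angle, assuming the per-particle best classes are simply averaged (the exact combination rule is not fully spelled out on the slide); the class names and angles are taken from above.

```python
import numpy as np

# Orientation classes and their assigned angles (from the slide).
ANGLES = {"left": 180.0, "half left": 135.0, "quarter left": 115.0, "frontal": 90.0,
          "quarter right": 65.0, "half right": 45.0, "right": 0.0}

def estimate_head_orientation(mlp_outputs):
    """mlp_outputs: one dict per particle, mapping orientation class -> MLP score.

    For every particle the best-scoring class HO^(i) is selected; the
    corresponding angles are then averaged over all N particles (assumption).
    """
    best_angles = [ANGLES[max(scores, key=scores.get)] for scores in mlp_outputs]
    return float(np.mean(best_angles))
```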
Head Orientation Estimation
Action Segmentation with BIC
• Already successfully applied for speech segmentation, speaker turn detection and other clustering applications
• Split the window at position i and compute the ΔBIC_i value for this position:
  ΔBIC_i = −(n/2) log|Σ_w| + (i/2) log|Σ_f| + ((n−i)/2) log|Σ_s| + λ · (1/2) (d + d(d+1)/2) log n
• The segment boundary is placed at the most negative value of all ΔBIC_i.
• d = dimension of the feature vectors; Σ_w, Σ_f, Σ_s = covariance matrices of the entire window, the first and the second segment; λ is a penalty weight.
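A minimal sketch of the ΔBIC computation for a single analysis window, following the formula above; the small regularization of the covariance and the margin of a few frames at the window borders are implementation assumptions.

```python
import numpy as np

def logdet_cov(X):
    """log|Sigma| of the sample covariance, with a small ridge for numerical stability."""
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
    return np.linalg.slogdet(cov)[1]

def bic_boundary(X, lam=1.0, margin=2):
    """Most negative Delta-BIC split point of a feature window X (n x d), or None."""
    n, d = X.shape
    if n <= 2 * margin:                               # window too short to split
        return None
    penalty = lam * 0.5 * (d + d * (d + 1) / 2) * np.log(n)
    logdet_w = logdet_cov(X)
    deltas = []
    for i in range(margin, n - margin):               # keep a few frames on either side
        delta = (-0.5 * n * logdet_w
                 + 0.5 * i * logdet_cov(X[:i])
                 + 0.5 * (n - i) * logdet_cov(X[i:])
                 + penalty)
        deltas.append((delta, i))
    best_delta, best_i = min(deltas)
    return best_i if best_delta < 0 else None         # boundary only if Delta-BIC is negative
```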
Application of Automatic Stream Segmentation
• BIC segmentation based on feature vectors (n = 15, λ = 0.9 and n = 15, λ = 1.1)
• BIC segmentation based on energy vectors (n = 15, λ = 6.5 and n = 20, λ = 6.5)
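Applied to a complete recording, the criterion is evaluated repeatedly along the feature (or energy) stream; a sketch of such a loop, reusing bic_boundary() from above, follows. The growing-window strategy is an assumption; the slide only fixes the window size n and the penalty weight λ.

```python
def segment_stream(features, lam=6.5, n0=15):
    """Cut a feature stream into segments with the Delta-BIC criterion.

    features : (T, d) array of per-frame feature (or energy) vectors.
    Uses bic_boundary() from the sketch above. The growing-window strategy is
    an assumption; only window size and penalty weight come from the slide.
    """
    starts, pos, n = [0], 0, n0
    while pos + n <= len(features):
        cut = bic_boundary(features[pos:pos + n], lam=lam)
        if cut is None:
            n += 5                    # no boundary found: enlarge the analysis window
        else:
            pos += cut                # boundary found: start the next segment there
            starts.append(pos)
            n = n0                    # and reset the window size
    return starts
```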
Action Segmentation with BIC (n = 15, λ = 6.5)
Recognition Performance PETS
Artificial training data, HMMs (5 states, 2 mixtures)
Confusion matrix (rows: performed action, columns: recognized action):

              | Sit down | Get up | Raising hand | Nodding | Shaking head | Score
Sit down      |   50%    |  33%   |     17%      |   0%    |      0%      |  50%
Get up        |   17%    |  83%   |      0%      |   0%    |      0%      |  83%
Raising hand  |   21%    |   4%   |     63%      |   0%    |     15%      |  63%
Nodding       |    0%    |   0%   |      0%      |  42%    |     58%      |  42%
Shaking head  |    0%    |   0%   |      0%      |   8%    |     92%      |  92%
Overall       |          |        |              |         |              |  66%
Performance Discussion PETS
• Classification yields an acceptable recognition performance, considering:
  – the limited amount of available training examples
  – large variations between the artificial training material and the test material, e.g. in size and view direction
Recognition Performance m4
m4 training data (TRN 01-30), m4 test data (TST 01-30), HMMs (9 states, 3 mixtures)
Confusion matrix in counts (rows: performed action, columns: recognized action):

              | Sit down | Stand up | Nodding | Shaking head | Writing | Pointing | Score
Sit down      |    9     |    0     |    0    |      0       |    0    |    1     |  90%
Stand up      |    1     |   12     |    0    |      0       |    0    |    1     |  86%
Nodding       |    1     |    3     |   225   |     48       |    5    |    8     |  78%
Shaking head  |    0     |    0     |   30    |     18       |    4    |    1     |  42%
Writing       |    0     |    0     |   32    |     22       |   471   |   25     |  86%
Pointing      |    0     |    0     |    3    |      0       |    0    |   69     |  96%
Overall       |          |          |         |              |         |          |  82%
Performance Discussion m4
• Improved recognition performance due to real training data
• Dramatically varying action lengths
• Singular action region initialization is not sufficient
Outlook
• Head orientation tracking
• Improving the feature stream by smoothing with action-specialized Kalman filters
• Action detection on m4 data
• Connection to meeting segmentation / multimodal recognizer