Describing Changes in Human Appearance Over Time
Neva Cherniavsky, Ivan Laptev, Josef Sivic, Andrew Zisserman

Video Analysis for Sociology
• Charlie's Angels: 1976 and 2000
• Miami Vice: 1984 and 2006
• Dukes of Hazzard: 1979 and 2005

Sociology Research
• Typical data sets: 250 movies, 617 commercials, 195 television episodes
• Coders (usually students) view each video in its entirety twice and view each incidence multiple times; usually 10% overlap for inter-coder reliability
• Sources: Preventative Medicine Vol 34, 2002; Sex Roles Vol 35 Nos 3/4, 1996; Journal of Alcohol Studies Vol 51 No 5, 1990

Sociology Research (continued)
• Typical data sets: 250 movies, 617 commercials, 195 television episodes, 900 movies
• Raters (usually students) view each video in its entirety twice and view each incidence multiple times; usually 10% overlap for inter-rater reliability
• Source: Tobacco Control Vol 15, 2006

Goal: Video to Statistics
• Automatically find attributes, and their number of occurrences, in video data
• Minimize supervision (many different possible attributes)

Data
• Hollywood movies from different time periods
  – The Graduate, Roman Holiday, When Harry Met Sally, Love Actually
• Institut National de l'Audiovisuel
  – R&D: L. Laborelli and D. Teruggi
  – 1.5 million hours of annotated audiovisual archives, 50 years of TV

Currently: focus on facial attributes
Annotated training data
• Gender: Males (108): 86.2%; Females (19): 13.8%
• Facial hair: Mustache (11): 8.0%; None (115): 92.0%
• Expression: Smiling (29): 21.0%; Unsmiling (96): 79.0%
• Hair color: Blond (4): 2.9%; Not blond (124): 97.1%
• …
Face pipeline
• Detection
• Description
• Tracking
• Classification
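
As a rough illustration of the "video to statistics" goal, the sketch below tallies per-track attribute labels into counts and percentage shares, printed in the same "label (count): percent" layout as the annotated training data above. The function name `attribute_statistics`, the one-dict-per-track input format, and the sample labels are all hypothetical. The talk does not say how its own percentages were computed, so dividing each count by the attribute's total is an assumption.

```python
from collections import Counter, defaultdict


def attribute_statistics(track_labels):
    """Tally label counts and percentage shares per attribute.

    track_labels: iterable of dicts, one per face track, mapping
    attribute name -> label (hypothetical input format).
    Returns {attribute: {label: (count, percent)}}.
    """
    counts = defaultdict(Counter)
    for labels in track_labels:
        for attribute, label in labels.items():
            counts[attribute][label] += 1

    stats = {}
    for attribute, counter in counts.items():
        total = sum(counter.values())
        # Assumption: share = count / number of labeled tracks for this attribute.
        stats[attribute] = {
            label: (n, 100.0 * n / total) for label, n in counter.items()
        }
    return stats


# Hypothetical labels for four face tracks (not data from the talk).
tracks = [
    {"gender": "male", "expression": "smiling"},
    {"gender": "male", "expression": "unsmiling"},
    {"gender": "female", "expression": "unsmiling"},
    {"gender": "male", "expression": "unsmiling"},
]

stats = attribute_statistics(tracks)
for attribute, by_label in stats.items():
    for label, (n, pct) in by_label.items():
        print(f"{attribute}: {label} ({n}): {pct:.1f}%")
```

Tracks that lack a label for some attribute are simply left out of that attribute's total, which may or may not match how missed faces should be handled in practice.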

Face Pipeline: Detection
• Run face detection on each frame (Viola-Jones)

Face Pipeline: Description
• Face representation: local image descriptors at facial feature points
• Extended pictorial structure model [Everingham, Sivic, Zisserman, 2006]

Face Pipeline: Tracking
• Measure "connectedness" of a pair of faces by the point tracks intersecting both
• Doesn't require contiguous detections
• Independent evidence: no drift
• Group faces into tracks [Everingham et al. 2006]

Face Pipeline: Classification
• Classify tracks using SVM
• Distance between tracks is the minimum distance between facial features (not a kernel): $D(T_i, T_j) = \min\{\, d(x, y) \mid x \in T_i,\ y \in T_j \,\}$

Classification: Matching face

Training data sets
• Annotated training data for gender, facial hair, expression, hair color, …
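
The track distance from the Classification slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes each track is a list of per-face descriptor vectors and that d is Euclidean distance, neither of which the slides specify. The names `track_distance` and `distance_matrix` are hypothetical.

```python
import math
from itertools import product


def track_distance(track_a, track_b):
    """D(T_i, T_j): minimum distance over all cross-track descriptor pairs.

    Each track is a non-empty sequence of equal-length descriptor vectors,
    one per detected face. Euclidean distance stands in for d(x, y) here,
    which is an assumption.
    """
    return min(math.dist(x, y) for x, y in product(track_a, track_b))


def distance_matrix(tracks):
    """Symmetric matrix of pairwise track distances (diagonal left at 0)."""
    n = len(tracks)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = track_distance(tracks[i], tracks[j])
    return dist


# Hypothetical 2-D descriptors; real facial-feature descriptors would be
# much higher dimensional.
track_a = [(0.0, 0.0), (10.0, 0.0)]
track_b = [(3.0, 4.0), (20.0, 0.0)]
track_c = [(0.0, 1.0)]

d_ab = track_distance(track_a, track_b)
matrix = distance_matrix([track_a, track_b, track_c])
```

Because the minimum needs only one well-matched pair of faces, two tracks can come out close even when most of their frames differ in pose or expression.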

Training from still images vs video
• Still images:
  + Variation across people
  + Potentially labeled data from web for free
  + Higher quality (resolution, no motion blur)
  – Not much variation in expression
• Videos:
  + Variation across viewpoint/expression
  + Same domain as the testing set
  – Not much variation in people

Training data
• Need annotated training data
• Ideally we would train on a large number of attributes with limited supervision
• Looked at two sources: video or still images
• Mechanical Turk (Amazon)
  – Large scale coordination of manual tasks
  – Turks label one frame of the track or a single still image

Current results: gender

Automatically tagged video

Current work
• Preliminary conclusions: Better to train on videos
• Ongoing work: Study how to combine still images and videos to improve attribute labeling
• More attributes:
  – Race, age, hair color, eye wear
  – Use upper body detection to capture clothing, hairstyles
  – Dynamic attributes: smoking, drinking, smiling
• Video to Statistics
  – Understand where we fail so even when we miss faces, we can report statistics