
A Compact and Discriminative Face Track Descriptor
Omkar M Parkhi, Karen Simonyan, Andrea Vedaldi, Andrew Zisserman

Recognising and verifying faces in videos
▶ Recognition
▶ Verification (same / different)

VF²: a new compact face track descriptor
Face track: sequence of face detections in consecutive frames.
Face track descriptor:
▶ Discriminative
▶ Useful for different tasks (Recognition, Verification)
▶ Extremely compact

Large scale face retrieval
▶ Example of a typical target dataset: http://www.robots.ox.ac.uk/~vgg/research/on-the-fly/
▶ 5 years of evening news programs
▶ 10,000 hrs of broadcast
▶ 20 million frames
▶ 2.1 million face tracks
▶ 30 frames per track on average
▶ Typical 4000D descriptor → 1 TB
▶ Our descriptor → 270 MB
▶ Real time performance
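The two storage figures can be checked with simple arithmetic. A rough sketch, assuming 4-byte floats, one 4000D descriptor stored per frame of every track, and a 1024-bit (128-byte) track descriptor; none of these assumptions is stated on this slide:

```python
# Back-of-envelope storage estimate for the target dataset.
# Assumptions (not from the slide): 4-byte floats, and one 4000-D
# descriptor stored for every frame of every face track.
n_tracks = 2_100_000
frames_per_track = 30
bytes_per_float = 4

per_frame_bytes = 4000 * bytes_per_float
typical_total = n_tracks * frames_per_track * per_frame_bytes

track_descriptor_bytes = 1024 // 8  # 1024-bit binary code = 128 bytes
compact_total = n_tracks * track_descriptor_bytes

print(f"typical: {typical_total / 1e12:.2f} TB")
print(f"compact: {compact_total / 1e6:.0f} MB")
```

Under these assumptions the totals come out at roughly 1 TB and 270 MB, matching the order of magnitude quoted above.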

Outline
1. Dense feature computation
2. Fisher Vector encoding
3. Video and jittered pooling
4. Compression by metric learning: $d^2_W(x, y)$
5. Binarisation: [011001010]
6. Results

1. Dense feature computation
▶ Input: a face track
  ▶ Aligned or unaligned
  ▶ No facial landmarks required (eyes, nose, etc.)
▶ Output: a set of local features
  ▶ Extracted from all frames
  ▶ Dense RootSIFT at multiple scales
  ▶ 64-D PCA
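A minimal sketch of the last two steps, assuming the dense 128-D SIFT descriptors have already been extracted. RootSIFT is taken here in its usual form (L1-normalise, then element-wise square root); the random input and the helper names `root_sift`, `fit_pca` and `apply_pca` are illustrative, not the authors' code:

```python
import numpy as np

def root_sift(descs, eps=1e-12):
    """L1-normalise each SIFT descriptor, then take the element-wise square root."""
    descs = np.asarray(descs, dtype=np.float64)
    descs = descs / (np.abs(descs).sum(axis=1, keepdims=True) + eps)
    return np.sqrt(descs)

def fit_pca(X, n_components=64):
    """Return the mean and the top principal directions of X (rows are samples)."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def apply_pca(X, mean, components):
    """Project centred samples onto the principal directions."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
sift = rng.integers(0, 256, size=(500, 128))  # stand-in for real 128-D SIFT
rs = root_sift(sift)
mean, comps = fit_pca(rs, n_components=64)
local_features = apply_pca(rs, mean, comps)
print(local_features.shape)  # (500, 64)
```

In practice the PCA basis would be learnt once on training features and reused for every track.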

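2. Fisher Vector encoding
▶ Dense SIFT features $x_i$, $i = 1, \dots, M$, are assigned (hard assignment) to the Gaussians $(\mu_k, \Sigma_k)$ of a GMM with weights $\gamma_k(x_i)$
▶ First and second order statistics, with $\pi_k$ the mixture weight and $\sigma_k$ the standard deviation of Gaussian $k$:
$$v_k = \frac{1}{M\sqrt{\pi_k}} \sum_{i=1}^{M} \gamma_k(x_i)\,\frac{x_i - \mu_k}{\sigma_k}$$
$$u_k = \frac{1}{M\sqrt{2\pi_k}} \sum_{i=1}^{M} \gamma_k(x_i)\left[\left(\frac{x_i - \mu_k}{\sigma_k}\right)^2 - 1\right]$$
▶ FV encoding: $\Phi = [v_1, u_1, v_2, u_2, \dots, v_K, u_K]$ + sqrt and L2 normalisation
[Perronnin et al. ECCV 2012]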
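The encoding can be sketched in a few lines of NumPy. This assumes a diagonal-covariance GMM whose parameters are already trained, and takes "hard assignment" to mean that each feature contributes only to its most likely Gaussian; the function name and the toy parameters are illustrative:

```python
import numpy as np

def fisher_vector(X, pi, mu, sigma, hard=True):
    """Encode local features X (M x D) with a diagonal-covariance GMM.

    pi: (K,) mixture weights, mu: (K, D) means, sigma: (K, D) std deviations.
    Returns a 2*K*D vector [v_1, u_1, ..., v_K, u_K], sqrt- and L2-normalised.
    """
    M, D = X.shape
    z = (X[:, None, :] - mu[None]) / sigma[None]  # (M, K, D) whitened offsets
    # Log of pi_k * N(x_i; mu_k, sigma_k^2), dropping the shared constant.
    log_p = np.log(pi) - np.log(sigma).sum(axis=1) - 0.5 * (z ** 2).sum(axis=2)
    if hard:
        # Each feature votes only for its most likely Gaussian.
        gamma = np.zeros_like(log_p)
        gamma[np.arange(M), log_p.argmax(axis=1)] = 1.0
    else:
        # Soft assignment: posterior probabilities.
        log_p -= log_p.max(axis=1, keepdims=True)
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)
    g = gamma[:, :, None]
    v = (g * z).sum(axis=0) / (M * np.sqrt(pi)[:, None])                 # first order
    u = (g * (z ** 2 - 1)).sum(axis=0) / (M * np.sqrt(2 * pi)[:, None])  # second order
    phi = np.stack([v, u], axis=1).reshape(-1)  # v_1, u_1, v_2, u_2, ...
    phi = np.sign(phi) * np.sqrt(np.abs(phi))   # signed square root
    return phi / (np.linalg.norm(phi) + 1e-12)  # L2 normalisation

rng = np.random.default_rng(0)
K, D = 4, 6
X = rng.normal(size=(200, D))
pi = np.full(K, 1.0 / K)
mu = rng.normal(size=(K, D))
sigma = np.ones((K, D))
phi = fisher_vector(X, pi, mu, sigma)
print(phi.shape)  # (48,)
```

The output length is 2KD regardless of how many features M are pooled, which is what makes a single descriptor per track possible.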

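2. Fisher Vector encoding
▶ Gaussian components as part detectors
▶ Spatial (x, y) augmentation: each local feature is extended with its normalised position in a $W \times H$ face image,
$$\left[\,\cdots,\ \frac{x}{W} - \frac{1}{2},\ \frac{y}{H} - \frac{1}{2}\,\right]$$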
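A small sketch of the spatial augmentation, assuming each local feature comes with the pixel coordinates of its patch centre. The function name and the 160 × 160 image size are illustrative:

```python
import numpy as np

def augment_with_position(features, xs, ys, W, H):
    """Append normalised patch-centre coordinates to each local feature."""
    pos = np.stack([xs / W - 0.5, ys / H - 0.5], axis=1)
    return np.hstack([features, pos])

feats = np.zeros((3, 64))                 # stand-in for 64-D local features
xs = np.array([0.0, 80.0, 160.0])
ys = np.array([0.0, 80.0, 160.0])
aug = augment_with_position(feats, xs, ys, W=160, H=160)
print(aug.shape)    # (3, 66)
print(aug[:, -2:])  # coordinates now lie in [-0.5, 0.5]
```

Because position becomes part of the feature, each Gaussian covers a region of the face as well as an appearance, which is one way to read "part detectors".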

3. Video and jittered pooling
▶ Typically each frame is pooled independently
▶ Complex inference procedures combining multiple descriptors
▶ Large memory footprint
[Sivic et al. CVPR 2009, Everingham et al. IVC 2009, Wolf et al. CVPR 2011]

3. Video and jittered pooling
▶ Single descriptor per track
▶ Smaller memory footprint
▶ Easy to use
▶ Improved performance
[Application to Action Recognition: Oneata, Verbeek, Schmid ICCV 2013]

3. Video and jittered pooling
▶ Data augmentation without increasing the training set
▶ Improved performance
[Paulin et al. CVPR 2014]
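The pooling idea can be sketched as follows: local features from every frame, and from jittered copies of each frame, go into one set that is encoded once. The toy extractor `dense_patches`, the use of a horizontal flip as the only jitter, and the mean as a stand-in for the Fisher Vector encoding are all assumptions made for illustration:

```python
import numpy as np

def dense_patches(frame, size=8, stride=4):
    """Toy stand-in for a dense local feature extractor (hypothetical)."""
    H, W = frame.shape
    return np.array([frame[y:y + size, x:x + size].ravel()
                     for y in range(0, H - size + 1, stride)
                     for x in range(0, W - size + 1, stride)])

def pool_track(frames, jitter=True):
    """Collect local features from every frame (and its mirror image)
    into ONE set, so the track is encoded once rather than per frame."""
    feats = []
    for frame in frames:
        feats.append(dense_patches(frame))
        if jitter:
            feats.append(dense_patches(frame[:, ::-1]))  # horizontal flip
    return np.vstack(feats)

rng = np.random.default_rng(0)
track = [rng.random((32, 32)) for _ in range(30)]  # 30 synthetic face crops
pooled = pool_track(track)
descriptor = pooled.mean(axis=0)  # stand-in for a single Fisher Vector encoding
print(pooled.shape, descriptor.shape)
```

The jittered copies enlarge the pooled feature set but not the number of descriptors: the track still yields a single vector.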

4. Metric Learning
Learn to discriminate faces using a learnt projection $W$ of the Fisher Vector $x$, $z = Wx$:
$$d^2_W(x, y) = \|Wx - Wy\|^2$$
▶ Same person: $d^2_W(x, y) = \|Wx - Wy\|^2 < b$
▶ Different people: $d^2_W(u, v) = \|Wu - Wv\|^2 > b$
[Simonyan, Parkhi, Vedaldi, Zisserman BMVC 2013]
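The verification rule reduces to one projection and one threshold test. In this sketch the projection is a random stand-in rather than a learnt one, and the dimensions (1000-D input, 128-D output) are chosen only for illustration:

```python
import numpy as np

def d2_W(W, x, y):
    """Squared Euclidean distance between the projected descriptors Wx and Wy."""
    diff = W @ (x - y)
    return float(diff @ diff)

def same_person(W, x, y, b):
    """Verification decision: distance below the threshold b means 'same'."""
    return d2_W(W, x, y) < b

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 1000)) / np.sqrt(1000)  # random stand-in, not a learnt W
x = rng.normal(size=1000)
y = x + 0.01 * rng.normal(size=1000)  # near-duplicate of x
u = rng.normal(size=1000)             # unrelated vector
print(d2_W(W, x, y), d2_W(W, x, u))
```

Since $W$ has far fewer rows than columns, storing $z = Wx$ instead of $x$ is also what compresses the descriptor.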

5. Binarisation
▶ Parseval tight frame: $U$ is a $q \times m$ matrix formed from $m$ columns of a $q \times q$ random rotation matrix
▶ The real-valued descriptor $z$ is mapped to $\operatorname{sign}(Uz)$, which takes $q$ bits only
▶ Low-dimensional real-valued descriptor → high-dimensional binary
▶ 4x decrease in memory footprint (128D real → 1024D binary)
▶ Fast distance computation
▶ Alternative binarisation methods could be used
[Jégou et al. ICASSP 2012, Simonyan et al. PAMI 2014]
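A minimal sketch of this binarisation, assuming the random orthogonal matrix is obtained from the QR decomposition of a Gaussian matrix (one common construction; the slide does not say how it is generated). Function names are illustrative:

```python
import numpy as np

def parseval_frame(q, m, seed=0):
    """Return a q x m matrix U with U^T U = I (m columns of a random q x q orthogonal matrix)."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(q, q)))  # Q is orthogonal
    return Q[:, :m]

def binarise(U, z):
    """Map a real m-D descriptor z to q bits, packed 8 per byte."""
    bits = (U @ z > 0).astype(np.uint8)
    return np.packbits(bits)

def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

U = parseval_frame(q=1024, m=128)
rng = np.random.default_rng(1)
z1 = rng.normal(size=128)
z2 = z1 + 0.05 * rng.normal(size=128)  # slightly perturbed copy of z1
code1, code2 = binarise(U, z1), binarise(U, z2)
print(code1.nbytes, hamming(code1, code2))
```

A 128-D float descriptor takes 512 bytes at 4 bytes per value, while the packed 1024-bit code takes 128 bytes, which is the 4x saving quoted above. Distances become XOR-and-count operations on the packed bytes.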

YouTube Faces Dataset
Face verification (same / different)
▶ Face verification in videos
▶ 3,425 videos of 1,595 celebrities
▶ Videos collected from the internet
▶ Wide pose, expression and illumination variation
▶ 10 splits of 600 pairs of videos
▶ Restricted setting: use provided pairs
▶ Unrestricted setting: free to form own pairs
[Wolf, Hassner, Maoz CVPR 2011]

YouTube Faces Dataset
Face verification, error (%) by configuration:
▶ Image Pool (soft assignment FV): 17.3
▶ Video Pool (soft assignment FV): 15
▶ Video Pool (hard assignment FV): 16.2
▶ Video Pool + Jittered Pool: 14.2
▶ Video Pool + Binar. 1024 bit + jitt.: 13.4
▶ Video Pool + Joint sim. + jitt.: 12.3

YouTube Faces Dataset
Face verification, error (%) compared with other methods:
▶ MBGS & SVM-: 21.2
▶ APEM FUSION: 21.4
▶ STFRD & PMML: 19.9
▶ VSOF & OSS (Adaboost): 20
▶ DDML (Combined): 18.5
▶ VF² 1024D (binary): 13.4
▶ VF² 256D: 12.3
▶ Deep Face (facebook.com): 8.6 (requires additional training data)

Oxford Buffy Dataset
Weakly supervised face classification
▶ "Buffy The Vampire Slayer"
▶ Face tracks from 7 episodes of season 5
▶ Both frontal and profile detections
▶ Weak supervision from transcript and subtitles
▶ Multi-class classification for every episode
[Everingham et al. IVC 2009, Sivic et al. CVPR 2009]

Oxford Buffy Dataset
Weakly supervised classification, average AP:
▶ Sivic et al. (HOG RBF MKL): 0.81
▶ VF² (GMMs trained on Buffy): 0.81
▶ VF² (GMMs trained on YTF): 0.8
▶ VF² (GMMs trained on YTF) + Jitt. Pool 1024D: 0.86
▶ VF² (GMMs trained on YTF, 2048b): 0.82

Recap
Very simple yet powerful face track descriptor
▶ Track descriptor in 128 bytes
▶ Face landmarks and alignment not required
▶ One descriptor per track
▶ State of the art/comparable results on multiple tasks
  ▶ YouTube Faces Dataset
  ▶ Oxford Buffy Dataset
▶ Can be trained with very small amount of data
▶ Extremely easy to compute
▶ Code online soon.
Questions?
