Large-Scale Video Retrieval Using Image Queries André Filgueiras de Araujo Department of Electrical Engineering Stanford University Andre Araujo – Large-Scale Video Retrieval Using Image Queries 1
The “Dark Matter” of the Digital Age
• 400+ hours of video uploaded per minute
• 8+ billion video views per day
• 100+ hours of video uploaded per minute
• 85% of data in the form of multimedia
Key problem: How can we make sense of these data?
Automatic Visual Recognition
Image classification
• Is this an urban landscape?
Object detection
• Does this image contain a bus? Where?
Instance recognition (a.k.a. “visual search”)
• Does this image contain the “Wicked” billboard?
Visual Search
Diagram: an image query is sent to a retrieval system, which searches a database of images.
• Product recognition
• Location recognition
• Commercial applications
[Tsai et al., MM’08, MM’10] [Chen et al., CVPR’11]
Video Retrieval Using Image Queries
Diagram: an image query is sent to a retrieval system, which searches a database of video clips.
Applications:
• Brand monitoring: search YouTube using product images
• News videos: search event footage using photos
• Online education: search lectures using slides
Online Prototype
http://videosearch.stanford.edu
Simple Architecture
Pipeline: query image → query descriptor → query-to-frames comparison (frame index) → frame short-list → feature matching (feature index) → geometric verification → final result
Problem: too many frames → does not scale
Large-Scale Architecture
Pipeline: query image → query descriptor → query-to-clips comparison (clip index) → clip short-list → query-to-frames comparison (frame index) → frame short-list → feature matching (feature index) → geometric verification → final result
Focus of this work: the query-to-clips stage, which produces the clip short-list
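The two short-list stages of this architecture can be sketched as a coarse-to-fine search. This is a minimal illustration, not the system's implementation: the function name `two_stage_search`, the dictionary-based indexes, cosine similarity as the comparison at both stages, and the short-list sizes are all assumptions made for the example. Feature matching and geometric verification are omitted.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def two_stage_search(query, clip_index, frame_index, n_clips=2, n_frames=3):
    """Hypothetical coarse-to-fine search.

    clip_index:  {clip_id: clip descriptor}
    frame_index: {clip_id: {frame_id: frame descriptor}}
    """
    # Stage 1: compare the query to one descriptor per clip, keep the best clips.
    clip_scores = sorted(
        ((cosine(query, desc), cid) for cid, desc in clip_index.items()),
        reverse=True,
    )
    clip_shortlist = [cid for _, cid in clip_scores[:n_clips]]

    # Stage 2: compare the query only to frames of the short-listed clips.
    frame_scores = sorted(
        ((cosine(query, desc), (cid, fid))
         for cid in clip_shortlist
         for fid, desc in frame_index[cid].items()),
        reverse=True,
    )
    frame_shortlist = [key for _, key in frame_scores[:n_frames]]
    return clip_shortlist, frame_shortlist

# Toy data (made up for illustration only).
clip_index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
frame_index = {
    "a": {0: [1.0, 0.1], 1: [0.9, -0.2]},
    "b": {0: [0.0, 1.0]},
    "c": {0: [0.6, 0.8], 1: [0.8, 0.6]},
}
clips, frames = two_stage_search([1.0, 0.0], clip_index, frame_index)
```

With this layout, the number of frame comparisons depends on the size of the clip short-list rather than on the total number of frames in the database.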
Video Retrieval Using Image Queries
Diagram: query descriptor → query-to-clips comparison (clip index) → clip short-list
Main challenges:
• Asymmetry: how can we compare images to videos?
• Temporal aggregation: how can we describe a video clip for query-by-image retrieval?
Contributions
Fisher Vector Comparisons
• Asymmetric comparisons for Fisher vectors
• Cluttered query or database images
Fisher Vector Aggregation
• Fisher vector descriptors for video segments
• Compact database for large-scale retrieval
Bloom Filter Aggregation
• Bloom filter descriptors for video segments
• Fast and accurate large-scale retrieval
Related Work: Visual Search
Chart of query type versus database type:
• Video query, image database (Augmented Reality): TCD [Makar et al., ’12], Hybrid Vis. Search [Chen et al., ’14]
• Video query, video database (Content Tracking): Frame Mat. + ST [Douze et al., ’10], TRECVID-CCD [Over et al., ’12]
• Image query, image database (Traditional Visual Search): FV [Perronnin et al., ’07], BoW [Sivic et al., ’03], SIFT [Lowe, ’04]
• Image query, video database (Video Retrieval by Image): discussed on next slide
Related Work: Video Retrieval Using Images
• Early work
– BoW retrieval of movie frames [Sivic and Zisserman, ICCV’03]
– Object-level retrieval of movie shots [Sivic et al., ECCV’04]
• TRECVID Instance Search Challenge [Over et al., TRECVID’10-15]
– Frame-based BoW with Color SIFT [Le et al., ’10-11]
– Shot-based aggregation using BoW [Zhu et al., ’13] [Ballas et al., ’14]
– BoW query-adaptive asymmetrical dissimilarities [Zhu et al., ’13]
• Object localization in videos
– SURF-based matching per shot [Apostolidis et al., ICME’13]
– Optimal path using dynamic programming [Meng et al., ICIP’15]
Background: Pairwise Image Matching
Diagram: the query image and the database image are each described by a set of image features (Descriptor 1, Descriptor 2, …, Descriptor n).
Pipeline: Interest Point Detection → Local Descriptor Extraction → Descriptor Matching
Background: Fisher Vector (FV) [Perronnin and Dance, CVPR’07]
• State-of-the-art technique for large-scale retrieval
• Key property: represent a set of local descriptors by a compact fixed-length vector → two images can be compared by comparing their Fisher vectors
• Construction: describe an image with aggregated Fisher scores of its local descriptors
– Local descriptor distribution: Gaussian Mixture Model (GMM)
– Usually only Gaussian means are taken into account
• Extension of Bag-of-Words technique [Sivic and Zisserman, ICCV’03]
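The construction above can be sketched with a simplified Fisher vector that keeps only the gradient with respect to the Gaussian means. This is a minimal sketch under stated assumptions: the GMM is diagonal and already trained, its parameters (`weights`, `means`, `sigmas`) are passed in directly, and the only normalization applied is L2. The function names and the toy values are hypothetical, chosen for illustration.

```python
import math

def posteriors(x, weights, means, sigmas):
    """Soft-assignment of descriptor x to each Gaussian of a diagonal GMM."""
    logs = []
    for w, mu, sd in zip(weights, means, sigmas):
        ll = math.log(w)
        for xi, m, s in zip(x, mu, sd):
            ll += -0.5 * ((xi - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2 * math.pi)
        logs.append(ll)
    mx = max(logs)  # subtract the max for numerical stability
    exps = [math.exp(l - mx) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def fisher_vector(descriptors, weights, means, sigmas):
    """Mean-gradient Fisher vector: K Gaussians x D dimensions, L2-normalized."""
    if not descriptors:
        raise ValueError("need at least one local descriptor")
    K, D, N = len(weights), len(means[0]), len(descriptors)
    fv = [0.0] * (K * D)
    for x in descriptors:
        gamma = posteriors(x, weights, means, sigmas)
        for k in range(K):
            for d in range(D):
                # Accumulate the normalized residual, weighted by the soft-assignment.
                fv[k * D + d] += gamma[k] * (x[d] - means[k][d]) / sigmas[k][d]
    for k in range(K):
        scale = 1.0 / (N * math.sqrt(weights[k]))
        for d in range(D):
            fv[k * D + d] *= scale
    norm = math.sqrt(sum(v * v for v in fv))
    return [v / norm for v in fv] if norm > 0 else fv

# Toy GMM with two well-separated Gaussians in 2-D (made-up values).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [10.0, 10.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]
fv = fisher_vector([[0.5, 0.0], [1.0, 0.5]], weights, means, sigmas)
```

Both toy descriptors lie near the first Gaussian, so the block of entries belonging to the second Gaussian stays close to zero. The output length is K × D regardless of how many local descriptors the image has.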
Background: Fisher Vector (FV) [Perronnin and Dance, CVPR’07]
Figure: descriptor space for the query image, database image 1, and database image 2.
Query FV:    -0.2  0.2 -0.3 -0.3 -0.3  0.8 …
DB Im. 1 FV: -0.3  0.3  0.3 -0.6 -0.3  0.3 …
DB Im. 2 FV:  0.5 -0.2 -0.7  0.1 -0.6  0 …
Background: Binarized Fisher Vector (FV*) [Perronnin et al., CVPR’10]
Figure: descriptor space for the query image, database image 1, and database image 2.
Query FV*:    0 0 0 0 0 1 …
DB Im. 1 FV*: 0 1 1 0 0 1 …
DB Im. 2 FV*: 1 0 0 1 0 0 …
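A minimal sketch of binarization and comparison, assuming the binary code is obtained by taking the sign of each FV entry (1 for positive, 0 otherwise) and that binary codes are compared by Hamming distance. The sign rule reproduces the two database rows shown on the slides. The helper names `binarize` and `hamming` are hypothetical.

```python
def binarize(fv):
    """Sign binarization: 1 for strictly positive entries, 0 otherwise."""
    return [1 if v > 0 else 0 for v in fv]

def hamming(a, b):
    """Number of positions where two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

# Database image rows from the slides.
db1 = binarize([-0.3, 0.3, 0.3, -0.6, -0.3, 0.3])
db2 = binarize([0.5, -0.2, -0.7, 0.1, -0.6, 0])
distance = hamming(db1, db2)
```

Each entry takes one bit instead of a floating-point value, and the Hamming distance can be computed with bitwise operations when the codes are packed into machine words.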
Contribution 1: Fisher Vector Comparisons
• Asymmetric comparisons for Fisher vectors
• Cluttered query or database images
Asymmetric Image Comparison
Figure: example pairs of query image and database image for an object retrieval application and a video bookmarking application.
How can we incorporate asymmetry in FV comparisons?
Asymmetric Comparison for FV
Fisher vector = $[v_1, v_2, \ldots, v_K]$
The two image regions marked in the slide figure have different statistics → features from one region are usually not present in the other
Asymmetric Comparison for FV
• FV comparison metric: cosine similarity
• We want: $\theta_1 < \theta_2$
• Common failure case: $\theta_1 > \theta_2$, but $\theta_1' < \theta_2$
• Insight: compare query and database based on their projections to the x-y plane (i.e., using only Gaussians visited by query)
Figure: vectors $q$, $m$, $m'$, and $n$ drawn on x, y, z axes.
– $q$: query
– $m$: correct match in database
– $n$: incorrect match in database
– $\theta_1 = \mathrm{angle}(q, m)$
– $\theta_2 = \mathrm{angle}(q, n)$
– $\theta_1' = \mathrm{angle}(q, m')$
Asymmetric Comparison for FV
Figure: descriptor space of an image, with one Gaussian not visited by this image.
Original FV: 0.7 0.2 -0.5 0.2 -0.2 0.2
Modified FV: 0.8 0.3 -0.5 0.3 0 0
The entries of the Gaussian that is not visited are set to zero, and the remaining entries are re-normalized.
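The zero-and-re-normalize step, and the asymmetric comparison built on it, can be sketched as follows. This is an illustration under assumptions: the set of visited Gaussians is given as a boolean mask, each Gaussian owns a contiguous block of `dims` entries, and the function names and the toy vectors `q`, `m`, `n` are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale v to unit L2 norm (returns a copy unchanged if v is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def cosine(a, b):
    """Baseline FV comparison: cosine similarity."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

def project(fv, visited, dims):
    """Zero the blocks of non-visited Gaussians, then re-normalize."""
    out = []
    for k, keep in enumerate(visited):
        block = fv[k * dims:(k + 1) * dims]
        out.extend(block if keep else [0.0] * dims)
    return l2_normalize(out)

def asymmetric_similarity(query_fv, db_fv, query_visited, dims):
    """Compare query and database FV using only Gaussians visited by the query."""
    q = project(query_fv, query_visited, dims)
    d = project(db_fv, query_visited, dims)
    return sum(x * y for x, y in zip(q, d))

# Slide example: third Gaussian not visited, two dimensions per Gaussian.
modified = project([0.7, 0.2, -0.5, 0.2, -0.2, 0.2], [True, True, False], 2)

# Toy failure case (made-up values): the correct match m has clutter energy
# in a Gaussian the query never visits, so the baseline prefers n.
q, m, n = [1.0, 0.2, 0.0], [0.6, 0.1, 0.79], [0.5, 0.6, 0.0]
visited_by_q = [True, True, False]
```

In the toy case, plain cosine similarity ranks `n` above `m`, while the asymmetric comparison ranks `m` first because the clutter component of `m` is discarded before the comparison. The exact re-normalized values differ slightly from the rounded numbers shown on the slide.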
Asymmetric Comparison for FV
• Two retrieval problems
– Query contained in database: the query image defines the projection. All database images are compared to the query based on the same subspace.
– Database contained in query: the database image defines the projection.
Problem: each database image is compared to the query based on a different subspace.
Solution: introduce a weight to favor database images with more visited Gaussians.
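The database-contained-in-query case can be sketched as below. The slide does not give the weight formula, so the weight used here (the fraction of Gaussians visited by the database image) is a placeholder assumption, not the method from this work. The function names and toy vectors are also hypothetical.

```python
import math

def l2_normalize(v):
    """Scale v to unit L2 norm (returns a copy unchanged if v is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def project(fv, visited, dims):
    """Zero the blocks of non-visited Gaussians, then re-normalize."""
    out = []
    for k, keep in enumerate(visited):
        block = fv[k * dims:(k + 1) * dims]
        out.extend(block if keep else [0.0] * dims)
    return l2_normalize(out)

def weighted_similarity(query_fv, db_fv, db_visited, dims, use_weight=True):
    """Database image defines the projection; score is optionally weighted."""
    q = project(query_fv, db_visited, dims)
    d = project(db_fv, db_visited, dims)
    score = sum(x * y for x, y in zip(q, d))
    if not use_weight:
        return score
    # ASSUMED weight: fraction of Gaussians visited by the database image.
    weight = sum(1 for v in db_visited if v) / len(db_visited)
    return weight * score

# Toy data (made up): both database images match the query perfectly inside
# their own subspace, but db_b covers three Gaussians and db_a only one.
query = [0.5, 0.5, 0.5, 0.5]
db_a, visited_a = [1.0, 0.0, 0.0, 0.0], [True, False, False, False]
db_b, visited_b = [0.5, 0.5, 0.5, 0.0], [True, True, True, False]
```

Without the weight, the two database images tie even though `db_b` agrees with the query over a larger subspace. With the weight, `db_b` ranks first.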
Dataset: Query Contained in Database
Figure: each query image is searched against a database whose entries combine a reference or distractor image with clutter images.
• Query
• Database: Reference + Clutter (200), Distractor + Clutter (9,800)
• From 0 to 40 clutter images
Dataset: Database Contained in Query
Figure: each query combines a query image with clutter images and is searched against a database of reference and distractor images.
• Query: Query + Clutter
• Database: Reference (200), Distractor (9,800)
• From 0 to 40 clutter images
Experiments: Asymmetric FV Comparisons
Figure: two plots of mAP (%) versus number of clutter images (log scale, 10^0 to 10^1), each with 2048 Gaussians. Panels: “Query contained in database” and “Database contained in query”. Curves: FV Asym., FV⋆ Asym., FV Baseline, FV⋆ Baseline. Each panel carries a “25 %” annotation.
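The plots report mAP (%). A minimal sketch of how mean average precision is commonly computed from ranked result lists, assuming each query has a known set of relevant database items. The function names and the toy rankings are hypothetical, and the evaluation protocol used in these experiments may differ in its details.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of the precision values at the rank of each relevant item."""
    relevant = set(relevant_ids)
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(results):
    """Mean AP over (ranked_ids, relevant_ids) pairs, as a percentage."""
    aps = [average_precision(ranked, rel) for ranked, rel in results]
    return 100.0 * sum(aps) / len(aps) if aps else 0.0

# Toy rankings (made up): two queries with known relevant items.
results = [
    (["a", "x", "b"], ["a", "b"]),
    (["x", "a"], ["a"]),
]
map_percent = mean_average_precision(results)
```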
Contribution 2: Fisher Vector Aggregation
• Fisher vector descriptors for video segments
• Compact database for large-scale retrieval