JOANNEUM RESEARCH and Vienna University of Technology at the INS Task
Werner Bailer
DIGITAL – Institute for Information and Communication Technologies, JOANNEUM RESEARCH
TRECVID Workshop, Nov. 2010
Outline
• Approach
• Subsystems and features
• Fusion strategies
• Results
• Conclusion
Approach
• fully automatic
• set of independent subsystems, using different features
• query each sample of a topic independently
• each subsystem returns a ranked result list for each sample (see the sketch after this slide)
• research focus: fusion strategies
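A minimal sketch of this per-sample, per-subsystem query loop, assuming a hypothetical Subsystem interface whose query() returns a dict of shot ids and scores (the names and data layout are illustrative, not the actual JRS implementation):

```python
def run_topic(topic_samples, subsystems, fuse):
    """Query every subsystem independently with every sample of a topic,
    collect one ranked result list per (subsystem, sample) pair, then
    hand all lists to a fusion strategy (see the fusion slides)."""
    result_lists = {}
    for subsystem in subsystems:
        for i, sample in enumerate(topic_samples):
            # assumed interface: returns {shot_id: score}, higher = better
            result_lists[(subsystem.name, i)] = subsystem.query(sample)
    return fuse(result_lists)
```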
System Overview [system architecture diagram]
Subsystems (1)
• Gabor feature
  - perform face detection (Viola-Jones)
  - if a face is detected, extract a Gabor wavelet descriptor from the face region
  - match against the descriptors of all face regions in the database (k-NN search)
• Histogram of gradients (HoG) (see the sketch after this slide)
  - not used for person/character queries
  - descriptor with 36 bins (9 orientations, 4 cells)
  - cell layout is adapted to the aspect ratio of the query object: 2x2 or 1x4 cells
  - search window is shifted by ¼ of the cell size
  - 3 scales: 1x, 1.5x and 2x the initial size
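The 36-bin descriptor can be illustrated as follows: a minimal NumPy sketch for one search window with a 2x2 cell layout, not the actual JRS implementation (the gradient operator and the L2 normalisation are assumptions):

```python
import numpy as np

def hog_descriptor(window, n_orient=9, grid=(2, 2)):
    """Minimal HoG sketch: 9 orientation bins x 4 cells = 36 bins.
    `window` is a 2D grayscale array (one search window)."""
    gy, gx = np.gradient(window.astype(float))           # first-order derivatives
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)              # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_orient).astype(int), n_orient - 1)

    h, w = window.shape
    cells = []
    for cy in range(grid[0]):
        for cx in range(grid[1]):
            ys = slice(cy * h // grid[0], (cy + 1) * h // grid[0])
            xs = slice(cx * w // grid[1], (cx + 1) * w // grid[1])
            # magnitude-weighted orientation histogram of this cell
            hist = np.bincount(bins[ys, xs].ravel(),
                               weights=mag[ys, xs].ravel(),
                               minlength=n_orient)
            cells.append(hist)
    desc = np.concatenate(cells)
    return desc / (np.linalg.norm(desc) + 1e-8)
```

In the system this would be evaluated for windows shifted by a quarter of the cell size and at the three scales listed above.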
Subsystems (2)
• Region covariance (see the sketch after this slide)
  - covariance of a rectangular region (can be computed efficiently using integral images)
  - from RGB and first-order derivatives of intensity
  - same cell sizes/scales as for HoG
• SIFT
  - from DoG interest points
  - matching: voting in a position histogram (1/10 of the image size), report a match for bins with 5+ votes
• Bag of visual features (BoF)
  - SIFT descriptors from DoG points and global SIFT descriptors
  - codebook sizes of 100 and 1000 for both
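The region covariance descriptor can be sketched as below, again as a NumPy illustration of the descriptor itself, without the integral-image acceleration mentioned on the slide; the exact per-pixel feature vector is an assumption based on "RGB and first-order derivatives of intensity":

```python
import numpy as np

def region_covariance(rgb_patch):
    """Covariance descriptor of one rectangular region.
    `rgb_patch` is an (H, W, 3) array; per-pixel features are
    [R, G, B, Ix, Iy], giving a 5x5 covariance matrix."""
    rgb = rgb_patch.astype(float)
    intensity = rgb.mean(axis=2)
    iy, ix = np.gradient(intensity)                    # first-order intensity derivatives
    feats = np.stack([rgb[..., 0], rgb[..., 1], rgb[..., 2], ix, iy], axis=-1)
    feats = feats.reshape(-1, feats.shape[-1])         # one row of features per pixel
    return np.cov(feats, rowvar=False)
```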
Pre-computed features
• Pre-computed for the database
  - face detection + Gabor descriptor
  - global SIFT extraction
  - BoF codebook generation
• At query time
  - interest point detection + SIFT extraction
  - HoG
  - Region covariance
Fusion strategies (1)
• Two simple methods, not making use of the query samples (both sketched below)
• Max-max
  - for each shot in the results, take the maximum score over all samples and features
• Top-k
  - for each feature, take for each shot the maximum over all samples
  - re-rank per feature
  - take the top k per feature (k = 1000 / number of features used)
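Both simple strategies can be written down compactly. The dict layout {(feature, sample): {shot_id: score}} matches the query-loop sketch above and is an assumption, as is using the maximum score when pooling the per-feature top-k lists:

```python
from collections import defaultdict

def max_max(result_lists):
    """For each shot, keep the maximum score over all samples and features."""
    fused = defaultdict(float)
    for scores in result_lists.values():
        for shot, score in scores.items():
            fused[shot] = max(fused[shot], score)
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

def top_k(result_lists, features, total=1000):
    """Per feature: max over samples, re-rank, keep the top k = total / #features,
    then pool the kept shots into one result list."""
    k = total // len(features)
    pooled = defaultdict(float)
    for feat in features:
        per_shot = defaultdict(float)
        for (f, _sample), scores in result_lists.items():
            if f == feat:
                for shot, score in scores.items():
                    per_shot[shot] = max(per_shot[shot], score)
        for shot, score in sorted(per_shot.items(),
                                  key=lambda kv: kv[1], reverse=True)[:k]:
            pooled[shot] = max(pooled[shot], score)
    return sorted(pooled.items(), key=lambda kv: kv[1], reverse=True)
```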
Fusion strategies (2)
• Two methods using the query samples
  - idea: weight features by their relative performance (see the sketch after this slide)
• Best rank
  - for each sample, determine where the other samples of the topic would be ranked in the result list if they were in the database
  - determine the mean best rank over all samples for each feature
  - calculate the feature weight from this mean best rank
• Top 100
  - determine how many samples are in the top 100 results
  - calculate the feature weight from this count
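The weight formulas themselves did not survive the slide export, so the following is only an assumed illustration of the two weighting ideas (inverse mean best rank and a count-proportional weight); the actual formulas may differ:

```python
import numpy as np

def best_rank_weights(best_ranks):
    """best_ranks[f][s]: best rank achieved by the other samples of the topic
    for feature f and query sample s (lower = better)."""
    mean_best_rank = np.mean(np.asarray(best_ranks, dtype=float), axis=1)
    weights = 1.0 / mean_best_rank            # assumption: inverse of mean best rank
    return weights / weights.sum()

def top100_weights(hit_counts):
    """hit_counts[f]: number of query samples found in the top-100 results of
    feature f; assumed to translate proportionally into a weight."""
    counts = np.asarray(hit_counts, dtype=float)
    return counts / max(counts.sum(), 1.0)
```

The resulting weights would then scale each feature's scores before the per-shot maximum is taken.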
Results per topic/type [bar chart comparing the runs JRS rank max_max, JRS rank topK, JRS rank w_bestR and JRS rank w_t100 per topic/type; scores between 0 and 0.1]
Results per feature [bar chart of mean scores (all, person, character, object, location) for the features BOF100G, BOF100L, BOF1000G, BOF1000L, Gabor, HoG, RegCov and SIFT; scores between 0 and 0.025]
Conclusion (1)
• The task is difficult; results for the automatic system are poor
  - different sizes, lighting, perspectives, …
  - "needle in a haystack": very few relevant results in a large set with many similar objects (e.g. pedestrian crossings, blinds)
• Features
  - as expected, our features perform best for object queries
  - better results could be possible for some of the features, but would make the matching process more costly
Conclusion (2)
• Fusion methods
  - overall, the fusion methods using information from the query samples perform better
  - only a slight difference for object queries
• To fuse or not to fuse?
  - for person and object queries, a single feature outperforms the best fused results
  - few topics for the other query types, so it is difficult to say whether fusion is actually useful in these cases
The research leading to these results has received funding from the European Union's Seventh Framework Programme under the grant agreements no. FP7-215475, "2020 3D Media – Spatial Sound and Vision" (http://www.20203dmedia.eu/) and no. FP7-248138, "FascinatE – Format-Agnostic Script-based INterAcTive Experience" (http://www.fascinate-project.eu/), as well as from the Austrian FIT-IT project "IV-ART – Intelligent Video Annotation and Retrieval Techniques".