[PPT] - Telefonica Research @ Trecvid 2011 Xavier Anguera, Daru Xu PowerPoint Presentation

SLIDE 1

Telefonica Research @ TRECVID 2011

Xavier Anguera, Daru Xu¹ and Tomasz Adamek (with the collaboration of Juan Manuel Barrios, Prisma Group)

¹ Daru Xu is a graduate student at the Ming-Hsieh Department of Electrical Engineering, University of Southern California, USA

SLIDE 2

Outline of the talk

Telefonica 2011 video-copy detection system

– Overall system
– Video-copy detection
– Audio-copy detection
– Fusion algorithm
– Results

Multi-systems fusion experiment

SLIDE 3

Multimodal video-copy detection

SLIDE 4

Video-based System

[Figure: block diagram of the video-based system. Blocks: Video query; Key-frame extraction; DART* local features extraction; Key-frame matching; Ref. video indexing info.; Temporal consistency post-processing; Matched video segments. Further blocks: Inserted static text & banners filtering; Subtitle filtering; Temporal stability & scale filtering; DART* extraction.]

Differences from last year:

– Software refactoring
– Elimination of temporary files

* D. Marimon, A. Bonnin, T. Adamek, and R. Gimeno, “DARTs: Efficient scale-space extraction of daisy key-points”, CVPR 2009.

SLIDE 5

Audio-based System

SLIDE 6

MASK fingerprint extraction (I)

1) Audio track extraction using FFMPEG

SLIDE 7

Acoustic fingerprint extraction (I)

1) Audio track extraction using FFMPEG

2) FFT, bandwidth limited to 300 Hz to 3 kHz

10 ms, 100 ms window; 32 MEL-spectrum bands

SLIDE 8

Acoustic fingerprint extraction (I)

1) Audio track extraction using FFMPEG

2) FFT, bandwidth limited to 300 Hz to 3 kHz

10 ms, 100 ms window; 32 MEL-spectrum bands

3) Find spectrogram peaks.
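Step 3 can be illustrated with a minimal sketch of peak picking on a time-frequency matrix. The slides do not say how a peak is defined, so the rule used here (a cell strictly larger than its immediate neighbours in time and in frequency) and the function name `find_spectrogram_peaks` are assumptions made only for illustration.

```python
from typing import List, Tuple


def find_spectrogram_peaks(spec: List[List[float]]) -> List[Tuple[int, int]]:
    """Return (frame, band) cells that exceed their time/frequency neighbours.

    `spec[t][b]` is the energy of MEL band `b` at frame `t`.
    Assumption: a peak is strictly larger than its (up to four) direct
    neighbours; cells on the border are compared only with existing ones.
    """
    peaks: List[Tuple[int, int]] = []
    n_frames = len(spec)
    n_bands = len(spec[0]) if spec else 0
    for t in range(n_frames):
        for b in range(n_bands):
            neighbours = []
            if t > 0:
                neighbours.append(spec[t - 1][b])
            if t < n_frames - 1:
                neighbours.append(spec[t + 1][b])
            if b > 0:
                neighbours.append(spec[t][b - 1])
            if b < n_bands - 1:
                neighbours.append(spec[t][b + 1])
            if all(spec[t][b] > value for value in neighbours):
                peaks.append((t, b))
    return peaks


toy_spec = [[0.0, 1.0, 0.0], [1.0, 5.0, 1.0], [0.0, 1.0, 0.0]]
toy_peaks = find_spectrogram_peaks(toy_spec)
```

On the toy 3x3 matrix only the centre cell qualifies, so `toy_peaks` holds the single position `(1, 1)`.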

SLIDE 9

Acoustic fingerprint extraction (II)

4) Apply a mask in each maxima location

SLIDE 10

Acoustic fingerprint extraction (II)

5) Construct the fingerprint
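Steps 4 and 5 can be sketched as follows. The slides do not give the mask layout or the bit encoding, so everything specific here is hypothetical: the four regions in `REGIONS`, the region pairs in `PAIRS`, and the rule that each bit records which region of a pair has the higher mean energy. It is meant only to show the general idea of turning the neighbourhood of a peak into a binary code.

```python
from typing import Dict, List, Tuple

Offsets = List[Tuple[int, int]]  # (frame offset, band offset) from the peak

# Hypothetical mask: four small regions placed around the peak.
REGIONS: Dict[str, Offsets] = {
    "before": [(-2, 0), (-1, 0)],
    "after": [(1, 0), (2, 0)],
    "below": [(0, -2), (0, -1)],
    "above": [(0, 1), (0, 2)],
}
# Hypothetical comparisons: one fingerprint bit per region pair.
PAIRS = [("before", "after"), ("below", "above"),
         ("before", "above"), ("after", "below")]


def region_energy(spec: List[List[float]], t: int, b: int, offsets: Offsets) -> float:
    """Mean energy of the mask cells that fall inside the spectrogram."""
    values = [
        spec[t + dt][b + db]
        for dt, db in offsets
        if 0 <= t + dt < len(spec) and 0 <= b + db < len(spec[0])
    ]
    return sum(values) / len(values) if values else 0.0


def mask_fingerprint(spec: List[List[float]], peak: Tuple[int, int]) -> int:
    """Pack one bit per region pair: 1 if the first region has more energy."""
    t, b = peak
    energy = {name: region_energy(spec, t, b, offs) for name, offs in REGIONS.items()}
    bits = 0
    for first, second in PAIRS:
        bits = (bits << 1) | int(energy[first] > energy[second])
    return bits


# Toy spectrogram whose energy grows with time: spec[t][b] == t.
ramp = [[float(t)] * 5 for t in range(5)]
code = mask_fingerprint(ramp, (2, 2))
```

For the ramp, only the last comparison ("after" versus "below") is true, so `code` is `0b0001`.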

SLIDE 11

Multimodal Fusion Algorithm

Fusion of different modalities at decision level

– Agnostic of each system's internal behavior

No limit on the number of systems to be combined

– provided each system is better than random

To work optimally it needs N-best matches from each system. It returns the best fused matches (N=20)

– Makes use of the individual scores and the rank within each modality.

Paper at ACM MM 2011: “Multimodal Fusion for Video Copy Detection”, Xavier Anguera, Juan Manuel Barrios, Tomasz Adamek and Nuria Oliver

SLIDE 12

Data preprocessing

[Figure: three score histograms: local video scores, audio scores, global video scores.]

SLIDE 13

N-best flooring and L1 Normalization (I)

L1 normalization

\[ \mathrm{MScore}_i = \frac{\mathrm{MScore}_i}{\sum_{j=1}^{N_{best}} \mathrm{MScore}_j} \]

[Figure: MScores bar plots illustrating L1 normalization; values shown: 0.7, 0.3, 0.3, 0.6.]

SLIDE 14

N-best flooring and L1 Normalization (II)

N-best flooring and L1 normalization

\[ \mathrm{MScore}_i = \frac{\mathrm{MScore}_i}{\sum_{j=1}^{N_{best}} \mathrm{MScore}_j} \]

[Figure: MScores bar plots illustrating N-best flooring followed by L1 normalization; values shown: 0.7, 0.3, 0.3, and an N-best flooring value of 0.2.]
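The two preprocessing steps can be sketched as below. The slides show the L1 formula but do not spell out the flooring rule, so this sketch assumes one plausible reading: when a system returns fewer than N-best matches, the missing entries are counted in the denominator with a floor value, so that a lone match is not normalized to 1. The function name and the `floor` parameter are hypothetical.

```python
from typing import List, Sequence


def floor_and_l1_normalize(scores: Sequence[float], n_best: int, floor: float) -> List[float]:
    """L1-normalize up to `n_best` scores, padding missing ones with `floor`.

    Assumption: absent matches contribute `floor` to the sum only; the
    returned list contains the normalized scores of the real matches.
    """
    kept = list(scores[:n_best])
    total = sum(kept) + floor * (n_best - len(kept))
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [score / total for score in kept]


full = floor_and_l1_normalize([0.7, 0.3], n_best=2, floor=0.2)
lone = floor_and_l1_normalize([0.3], n_best=2, floor=0.2)
```

With `n_best=2` and a floor of 0.2, the lone score of 0.3 is normalized to 0.3 / (0.3 + 0.2) = 0.6 instead of 1.0, while a full list is left as a plain L1 normalization.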

SLIDE 15

Overlapping Segments Merge

Segment Q + Segment R = Merged segment

[Figure: segment Q with boundaries B_r^Q and E_r^Q, segment R with boundaries B_r^R and E_r^R, and the merged segment with boundaries B_r and E_r.]

Two segments are merged when

\[ \frac{\min\{E^Q_k(r), E^R_k(r)\} - \max\{B^Q_k(r), B^R_k(r)\}}{\max\{E^Q_k(r), E^R_k(r)\} - \min\{B^Q_k(r), B^R_k(r)\}} > 0.5 \]

Examples: multimodal overlap; missing modality; non-overlapping modalities
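The merge criterion above can be written as a short sketch. The ratio follows the slide's formula; taking the merged segment as the union of the two (earliest begin, latest end) is an assumption, as are the function names and the `(begin, end)` tuple representation.

```python
from typing import Optional, Tuple

Segment = Tuple[float, float]  # (begin, end), e.g. in seconds


def overlap_ratio(q: Segment, r: Segment) -> float:
    """Intersection length over union span; negative if the segments are disjoint."""
    intersection = min(q[1], r[1]) - max(q[0], r[0])
    span = max(q[1], r[1]) - min(q[0], r[0])
    if span <= 0:
        raise ValueError("segments must have positive total span")
    return intersection / span


def merge_if_overlapping(q: Segment, r: Segment, threshold: float = 0.5) -> Optional[Segment]:
    """Return the merged segment when the ratio exceeds `threshold`, else None.

    Assumption: the merged segment spans both inputs.
    """
    if overlap_ratio(q, r) > threshold:
        return (min(q[0], r[0]), max(q[1], r[1]))
    return None


merged = merge_if_overlapping((0.0, 10.0), (2.0, 12.0))   # ratio 8/12
rejected = merge_if_overlapping((0.0, 10.0), (8.0, 20.0))  # ratio 2/20
```

The first pair overlaps by 8 out of 12 seconds and is merged into `(0.0, 12.0)`; the second overlaps by only 2 out of 20 and is left unmerged.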

SLIDE 16

Output score computation

[Equation shown as an image in the original slide. Its annotated terms are:]

– Resulting score for fused match
– A-priori weight for each modality
– Rank [1 to Nk]
– Number of matches
– Normalized matching score at rank r
– Best normalized matching score for each modality

SLIDE 17

Official evaluation results

System         Profile    Min NDCR   FA count   Miss count   True positives   Opt F1 score
Audio system   BALANCED   0.662      0.66       54.75        54.78            0.729
Multimodal     BALANCED   0.610      0.80       11.73        63.69            0.947
Joint          BALANCED   0.268      0.23       4.71         101.4            0.957

Optimum scores, balanced profile:

System         Profile    Min NDCR   FA count   Miss count   True positives   Opt F1 score
Audio system   BALANCED   0.477      0.14       55.89        72.05            0.712

Choosing only 1st-best results:

SLIDE 18

Multi-systems fusion experiment

We tested the fusion algorithm with many system outputs

We asked participants in TRECVID 2011 for their submitted runs

– 10 teams contributed their results: PKU-IDM, CRIM, INRIA-TEXMEX/LEAR, FT, prisma, ATTLabs, kddi, iupr-dti, brno, Telefonica Research
– I used the “Balanced” runs: 17 runs

SLIDE 19

Status of the runs

The fusion algorithm works optimally when N-best results are available for each fused output.

– The submitted systems often provided only 1-best results, which is suboptimal for the fusion.

SLIDE 20

Individual results (Min NDCR)

Labeled from 1 to 17, to anonymize them.

[Bar chart: Min_NDCR (axis 0 to 1) for each system, 1 to 17; annotated values 0.053 and 0.991.]

SLIDE 21

Individual results (optimum F1)

[Bar chart: value per system, 1 to 17, on an axis from 0.6 to 1; the series label in the source reads Min_NDCR.]

SLIDE 22

Incremental fusion

We incrementally added systems and computed the fusion
Systems 5 and 15 are the only ones making the fusion worse
Final Min_NDCR = 0.0333

[Plot: Min NDCR (axis 0 to 0.16) versus number of systems (1 to 17); annotated values 0.0532 and 0.0333.]

SLIDE 23

Fusion of all minus 1

We obtain an order from worst to best in the fusion (the worst here is system 15)

[Bar chart: Min_NDCR (axis 0.02 to 0.05) per system label, 0 to 17, with a baseline for the fusion of all.]

SLIDE 24

Incremental elimination

With only 5 systems we achieve pretty decent results
The best result is 0.0195, although this is “cheating”

[Plot: Min_NDCR (axis 0 to 0.45); x-axis system labels in order: 0, 15, 16, 11, 7, 6, 14, 3, 4, 9, 10, 2, 17, 13, 5, 1, 8; annotated values 0.0195, 0.0333 and 0.0685.]
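The "all minus 1" ranking and the incremental elimination can be sketched together, under stated assumptions. The sketch assumes that systems are eliminated in the worst-to-best order obtained from the leave-one-out experiment, and that `evaluate` is a caller-supplied, hypothetical function returning the Min NDCR of the fusion of a given subset (lower is better). Neither the function names nor the toy cost model come from the slides.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

Evaluate = Callable[[FrozenSet[int]], float]  # subset of systems -> Min NDCR


def leave_one_out_order(systems: Sequence[int], evaluate: Evaluate) -> List[int]:
    """Rank systems worst to best: dropping the worst gives the lowest cost."""
    everyone = frozenset(systems)
    scored = [(evaluate(everyone - {s}), s) for s in systems]
    return [s for _, s in sorted(scored)]


def incremental_elimination(
    systems: Sequence[int], evaluate: Evaluate
) -> List[Tuple[Optional[int], float]]:
    """Drop systems one by one in leave-one-out order, keeping the last one.

    Returns (removed system, cost after removal); the first entry uses
    None for the baseline fusion of all systems.
    """
    remaining = set(systems)
    curve: List[Tuple[Optional[int], float]] = [(None, evaluate(frozenset(remaining)))]
    for s in leave_one_out_order(systems, evaluate)[:-1]:
        remaining.discard(s)
        curve.append((s, evaluate(frozenset(remaining))))
    return curve


# Toy stand-in for a real evaluation: the mean of made-up per-system costs.
toy_costs = {1: 0.1, 2: 0.5, 3: 0.3}


def toy_evaluate(subset: FrozenSet[int]) -> float:
    return sum(toy_costs[s] for s in subset) / len(subset)


order = leave_one_out_order([1, 2, 3], toy_evaluate)
curve = incremental_elimination([1, 2, 3], toy_evaluate)
```

In the toy example system 2 is ranked worst and system 1 best, so the curve removes 2, then 3, and ends with system 1 alone. As the slide notes, picking the best point on such a curve with evaluation data is "cheating", since the ordering is chosen using the ground truth.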

SLIDE 25

Conclusions

The fusion algorithm can extract knowledge and make results better

– Even when fusing systems that have weaker NDCR results, the fusion yields good scores.

FUTURE WORK: automatically identify which modalities bring novelty.