INRIA@TRECVID-CCD Jiangbo Cordelia Herv Yuan Jerome Jonathan - PowerPoint PPT Presentation

INRIA@TRECVID-CCD Jiangbo Cordelia Hervé Yuan Jerome Jonathan Schmid Jégou Revaud Delhumeau Matthijs Douze

Conclusions and questions from last year  What are the individual contributions of audio and video ?  Audio weaker than video, apparently ► But complementary to image ► Further improvement possible ?  Fusion step is critical ► Is early fusion an option ?  Scoring strategies to optimize NDCR looks critical ► Keep maximum 1 result per query ?

Our Runs at Trecvid  5 runs to measure the individual contributions of our system  2 runs designed for “best” search quality: the DODO runs

Video visual system: ingredients (same as in 2010)  Local descriptors: CS-LBP  Weak geometric consistency  Hamming Embedding  Burstiness strategy Improve bag-of-features ► + Multi-probe

Audio system: basic ingredients (same as in 2010)  Base descriptor: Filter  Compounding banks f 1 f 2 f 3 f N 500 Hz 3000 Hz Overall bandwidth 25ms dim. 40  Overlapping temporal analysis 85ms dim. 120 d m  Matching: product quantization d 4 d 3 d 2 d 1 0 10 20 10 30 Time (ms)

New ingredient 1: temporal shift DB descriptors 10ms Query misaligned: Not lucky! 5ms Query descriptors: query all shifts (5 * slower!) 2ms shift 4ms shift 6ms shift 8ms shift

New ingredient 2: reciprocal nearest neighbors  Audio matches: k-nearest neighbors  Pb: if X neighbor of Y, Y not necessarily neighbor of X  Weighted Reciprocal nearest neighbors

Audio: improvement over last year  6 times slower in total, for a limited improvement

New ingredient 3: early fusion audio/video  Early fusion: ► Input: image & audio raw Hough hypotheses ► Robust time warping to align query frames with DB frames query time Example DB time Audio frame matches of time Image frame matches warping Resulting optimal path matrix:

Early fusion  Input: image & audio raw Hough hypotheses  1. Robust time warping - align query frames with db frames  2. Description of matching segments  segment length, number of audio/image frame matches, …  surface of the image recognized on the database side  KL-divergence between db keypoints distribution / matches distribution  relative support of image & audio for the hypothesis  etc.  3. Classifier produces a score

Early fusion: training the classifier  Boosting scheme: ► Each iteration, addition of a new feature  Criterion: maximize AP on validation set ► Classifier: Logistic regression (better than SVM here)  40,000 positive samples  150,000 negative examples  Result: selected features (sorted) ► Detected area ► Nb of audio & image frame matches ► KL divergence between keypoints distribution ► Length of matching segment in seconds ► Etc…

2010 vs 2011 approach

Analysis: balanced profile (opt-NDCR) RUN AUDIO VIDEO FUSION NDCR (avg) DEAF no yes n/a 0.258 AUDIOONLY yes no n/a 0.406 THEMIS yes yes late 0.211 ZOZO yes yes late 0.194 DODO yes yes early 0.144  One surprise: ZOZO > THEMIS ► Keeping more than 1 result is better if scores are ties

Overview of results (opt-NDCR) Balanced profile NoFa profile RANK INRIA PKU CRIM NTT- RANK INRIA PKU CRIM NTT- CSL CSL 1 5 31 21 1 1 23 14 18 8 2 16 23 8 3 2 10 31 11 7 3 9 2 9 5 3 11 10 13 4 4 19 0 10 7 4 9 1 9 5 5 4 0 4 4 5 3 0 4 7 6 1 0 4 13 6 0 0 1 5 7 2 0 0 12 7 0 0 0 3  PKU and CRIM are much better with Actual-NDCR ► We don’t know how to set the threshold ► This problem may be inherent to our system

Introduction of Babaz audio matching system  Open source: http://babaz.gforge.inria.fr/  Well… PQ-codes replaced by k-means LSH (licensing issue) ► Requires more memory (40GB instead of 5GB) and slower ► But PQ-codes Matlab implementation available  All Trecvid queries: query times (16 cores), memory, mAP ► Pqcodes – heavy: 20H 5GB mAP = 80.7 % ► Pqcodes – light: 3H 5GB mAP = 78.9 % ► K-means LSH: 25H 40GB mAP = 78.8 %  Offline: Pqcodes-h: 69H, Pqcodes-l: 11H, KMLSH: 17H

Questions?

INRIA@TRECVID-CCD Jiangbo Cordelia Herv Yuan Jerome Jonathan - PowerPoint PPT Presentation

INRIA@TRECVID-CCD Jiangbo Cordelia Herv Yuan Jerome Jonathan Schmid Jgou Revaud Delhumeau Matthijs Douze Conclusions and questions from last year What are the individual contributions of audio and video ? Audio weaker than

Continuity of Care Document (CCD) Continuity of Care Document (CCD) USA USA Health Information

CCD Imaging and Processing 0 .0 2 Aschen 3.1 Alan Chen CFAS St ar Part y 20 0 2 1 0 .0 2 CCD

DAMIC and a kg-size CCD experiment Paolo Privitera for the DAMIC Collaboration (Photo image:

CCD Image Processing: CCD Image Processing: [ ] [ ] r x y , d x y , Raw File [ ]

CCD Image Processing: CCD Image Processing: Issues & Solutions Issues & Solutions 1

PKU-IDM@TRECVID-CCD 2010: Copy Detection with Visual-Audio Feature Fusion and Sequential Pyramid

PKU-IDM @ TRECVID 2011 CCD: Video Copy Detection using a Cascade of Multimodal Features &

Learning From Video Browse Behavior Learning From Video Browse Behavior TRECVID 2009 TRECVID

George Awad National Institute of Standards and Technology Dakota Consulting, Inc 2 TRECVID

CMU @ TRECVID Event Detection @ Ming-yu Chen & Alex Hauptmann School of Computer Science

Columbia HLF: TRECVID2006 TRECVID TRECVID TRECVID 2005 2005 2005 (development)

Event Detection in Airport Surveillance The TRECVid 2008 Evaluation The TRECVid 2008 Evaluation

TRECVID 2008 CBCD TRECVID 2008. CBCD MCG-ICT-CAS MCG-ICT-CAS Sheng Tang Yongdong Zhang Ke Gao

TRECVID 2010 K TRECVID 2010 Known item Search it S h by NUS by NUS Xiangyu Chen, Jin Yuan

Adaptive Feature Discovery for TRECVID Broadcast News Video Story Segmentation @TRECVID Workshop

INRIA LEAR-TEXMEX: Copy detection task INRIA TEXMEX (RENNES) INRIA LEAR Herv Jgou

Hardware Accelerated Application Integration: Challenges and Opportunities Active @

Motivation for Simulations NS-2 Tutorial Cheap -- does not require costly equipment

The relevance of quantitative theory for historical demography David de la Croix Universit

Cluster Based Routing (CBR) Protocol with Adaptive Scheduling for Mobility and Energy Awareness in

LOCAL NAVIGATION 2 FORCE-BASED BOOKKEEPING Social force models The forces are first-class

Dark Matter Seach with CCDs - DAMIC Juan Cruz Estrada - Fermilab TAUP , July 2009 Rome, Italy

Status of the Pellet Target preparation in ITEP M.Bscher, A.Gerasimov, V.Chernetsky,

EHEALTH COMMISSION MEETING OCTOBER 10TH, 2018 OCTOBER AGENDA Call to Order Roll Call and

INRIA@TRECVID-CCD Jiangbo Cordelia Herv Yuan Jerome Jonathan - PowerPoint PPT Presentation

INRIA@TRECVID-CCD Jiangbo Cordelia Herv Yuan Jerome Jonathan Schmid Jgou Revaud Delhumeau Matthijs Douze Conclusions and questions from last year What are the individual contributions of audio and video ? Audio weaker than

Continuity of Care Document (CCD) Continuity of Care Document (CCD) USA USA Health Information

CCD Imaging and Processing 0 .0 2 Aschen 3.1 Alan Chen CFAS St ar Part y 20 0 2 1 0 .0 2 CCD

DAMIC and a kg-size CCD experiment Paolo Privitera for the DAMIC Collaboration (Photo image:

CCD Image Processing: CCD Image Processing: [ ] [ ] r x y , d x y , Raw File [ ]

CCD Image Processing: CCD Image Processing: Issues &amp; Solutions Issues &amp; Solutions 1

PKU-IDM@TRECVID-CCD 2010: Copy Detection with Visual-Audio Feature Fusion and Sequential Pyramid

PKU-IDM @ TRECVID 2011 CCD: Video Copy Detection using a Cascade of Multimodal Features &amp;

Learning From Video Browse Behavior Learning From Video Browse Behavior TRECVID 2009 TRECVID

George Awad National Institute of Standards and Technology Dakota Consulting, Inc 2 TRECVID

CMU @ TRECVID Event Detection @ Ming-yu Chen &amp; Alex Hauptmann School of Computer Science

Columbia HLF: TRECVID2006 TRECVID TRECVID TRECVID 2005 2005 2005 (development)

Event Detection in Airport Surveillance The TRECVid 2008 Evaluation The TRECVid 2008 Evaluation

TRECVID 2008 CBCD TRECVID 2008. CBCD MCG-ICT-CAS MCG-ICT-CAS Sheng Tang Yongdong Zhang Ke Gao

TRECVID 2010 K TRECVID 2010 Known item Search it S h by NUS by NUS Xiangyu Chen, Jin Yuan

Adaptive Feature Discovery for TRECVID Broadcast News Video Story Segmentation @TRECVID Workshop

INRIA LEAR-TEXMEX: Copy detection task INRIA TEXMEX (RENNES) INRIA LEAR Herv Jgou

Hardware Accelerated Application Integration: Challenges and Opportunities Active @

Motivation for Simulations NS-2 Tutorial Cheap -- does not require costly equipment

The relevance of quantitative theory for historical demography David de la Croix Universit

Cluster Based Routing (CBR) Protocol with Adaptive Scheduling for Mobility and Energy Awareness in

LOCAL NAVIGATION 2 FORCE-BASED BOOKKEEPING Social force models The forces are first-class

Dark Matter Seach with CCDs - DAMIC Juan Cruz Estrada - Fermilab TAUP , July 2009 Rome, Italy

Status of the Pellet Target preparation in ITEP M.Bscher, A.Gerasimov, V.Chernetsky,

EHEALTH COMMISSION MEETING OCTOBER 10TH, 2018 OCTOBER AGENDA Call to Order Roll Call and

CCD Image Processing: CCD Image Processing: Issues & Solutions Issues & Solutions 1

PKU-IDM @ TRECVID 2011 CCD: Video Copy Detection using a Cascade of Multimodal Features &

CMU @ TRECVID Event Detection @ Ming-yu Chen & Alex Hauptmann School of Computer Science