Energy-Based Fast Event Retrieval in Video with Temporal Match Kernel

Shin’ichi Satoh (2, 3), Junfu Pu (1), Yusuke Matsui (2), Fan Yang (3, 2)

(1) University of Science and Technology of China
(2) National Institute of Informatics
(3) The University of Tokyo


  1. Energy-Based Fast Event Retrieval in Video with Temporal Match Kernel
     - Shin’ichi Satoh (2, 3), Junfu Pu (1), Yusuke Matsui (2), Fan Yang (3, 2)
     - (1) University of Science and Technology of China; (2) National Institute of Informatics; (3) The University of Tokyo

  2. Outline
     - Introduction
     - Background
     - Matching with Energy
     - Algorithm Speedup with PQ
     - Experiments
     - Conclusion

  3. Introduction
     - An approach for fast content-based search in large video databases
     - [Figure with panels labeled "Query" and "Database"]

  4. Introduction
     - Related work
       - Jerome Revaud et al., "Event retrieval in large video collections with circulant temporal encoding," CVPR, 2013
       - Matthijs Douze et al., "Stable hyper-pooling and query expansion for event detection," ICCV, 2013
       - Sebastien Poullot et al., "Temporal matching kernel with explicit feature maps," ACM MM, 2015
     - Contribution
       - Simplify the similarity metric by calculating the energy of the score function
       - Derive the energy formulation using Parseval’s theorem
       - Accelerate the computation with product quantization

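  5. Background
     - Sequences $\mathbf{x} = (\mathbf{x}_0, \dots, \mathbf{x}_t, \dots)$ and $\mathbf{y} = (\mathbf{y}_0, \dots, \mathbf{y}_t, \dots)$; time offset: $\Delta$
     - A kernel defined with $\mathbf{x}$, $\mathbf{y}$, and $\Delta$:

       $$\kappa_\Delta(\mathbf{x}, \mathbf{y}) \propto \sum_{t=0}^{\infty} \mathbf{x}_t^\top \mathbf{y}_{t+\Delta} = \psi_0(\mathbf{x})^\top \psi_\Delta(\mathbf{y})$$

       $$\psi_0(\mathbf{x}) = \sum_{t=0}^{\infty} \mathbf{x}_t \otimes \varphi(t), \qquad \psi_\Delta(\mathbf{y}) = \sum_{t'=0}^{\infty} \mathbf{y}_{t'} \otimes \varphi(t' + \Delta)$$

     - Structure of the descriptor and of the temporal feature map:

       $$\psi_0(\mathbf{x}) = \left[ \mathbf{V}_0^\top, \mathbf{V}_{1,c}^\top, \mathbf{V}_{1,s}^\top, \dots, \mathbf{V}_{m,c}^\top, \mathbf{V}_{m,s}^\top \right]^\top$$

       $$\varphi(t) = \left[ \sqrt{a_0},\ \sqrt{a_1}\cos\!\left(\tfrac{2\pi}{T} t\right),\ \sqrt{a_1}\sin\!\left(\tfrac{2\pi}{T} t\right),\ \dots,\ \sqrt{a_m}\cos\!\left(\tfrac{2\pi}{T} m t\right),\ \sqrt{a_m}\sin\!\left(\tfrac{2\pi}{T} m t\right) \right]^\top$$

     - Components, each in $\mathbb{R}^D$:

       $$\mathbf{V}_0 = \sqrt{a_0} \sum_{t=0}^{\infty} \mathbf{x}_t, \qquad \mathbf{V}_{i,c} = \sqrt{a_i} \sum_{t=0}^{\infty} \mathbf{x}_t \cos\!\left(\tfrac{2\pi}{T} i t\right), \qquad \mathbf{V}_{i,s} = \sqrt{a_i} \sum_{t=0}^{\infty} \mathbf{x}_t \sin\!\left(\tfrac{2\pi}{T} i t\right)$$

     - $a_i$: the Fourier coefficients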
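The encoding on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function name `temporal_descriptor`, the finite frame list, and the choice to weight each frequency by the square root of its Fourier coefficient are assumptions made for the example.

```python
import math


def temporal_descriptor(frames, coeffs, period):
    """Encode a frame sequence into (v0, v_cos, v_sin).

    frames: list of D-dimensional frame descriptors, one per time step t = 0, 1, ...
    coeffs: [a_0, ..., a_m], non-negative Fourier coefficients (assumed given)
    period: period T of the temporal modulation (assumed given)
    """
    dim = len(frames[0])
    steps = range(len(frames))

    def weighted_sum(weights):
        # Sum of frame descriptors, each scaled by its per-time-step weight.
        return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]

    # Zero-frequency component: a scaled sum of all frames.
    v0 = weighted_sum([math.sqrt(coeffs[0])] * len(frames))
    v_cos, v_sin = [], []
    for n in range(1, len(coeffs)):
        scale = math.sqrt(coeffs[n])  # assumption: square-root weighting
        angle = 2.0 * math.pi * n / period
        v_cos.append(weighted_sum([scale * math.cos(angle * t) for t in steps]))
        v_sin.append(weighted_sum([scale * math.sin(angle * t) for t in steps]))
    return v0, v_cos, v_sin
```

Each video is encoded once, independently of any query, so the per-frequency vectors can be stored in the database and reused for every comparison.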

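  6. Background
     - Final formulation

       $$\begin{aligned}
       \kappa_{\mathbf{x},\mathbf{y}}(\Delta) ={}& \langle \mathbf{V}_0(\mathbf{x}), \mathbf{V}_0(\mathbf{y}) \rangle \\
       &+ \sum_{n=1}^{m} \cos(n\Delta) \left( \langle \mathbf{V}_{n,c}(\mathbf{x}), \mathbf{V}_{n,c}(\mathbf{y}) \rangle + \langle \mathbf{V}_{n,s}(\mathbf{x}), \mathbf{V}_{n,s}(\mathbf{y}) \rangle \right) \\
       &+ \sum_{n=1}^{m} \sin(n\Delta) \left( -\langle \mathbf{V}_{n,c}(\mathbf{x}), \mathbf{V}_{n,s}(\mathbf{y}) \rangle + \langle \mathbf{V}_{n,s}(\mathbf{x}), \mathbf{V}_{n,c}(\mathbf{y}) \rangle \right)
       \end{aligned}$$

     - Similarity score

       $$S(\mathbf{x}, \mathbf{y}) = \max_{\Delta} \kappa_{\mathbf{x},\mathbf{y}}(\Delta), \qquad t_m = \arg\max_{\Delta} \kappa_{\mathbf{x},\mathbf{y}}(\Delta)$$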
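To make the maximum-based score concrete, the sketch below builds the cosine and sine coefficients of the score function from two encoded videos and scans a grid of offsets for the maximum. The descriptor layout `(v0, v_cos, v_sin)`, the function names, and the uniform grid over one period are assumptions for illustration; the slides do not say how the maximum is located.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def kernel_coefficients(desc_x, desc_y):
    """Return (c0, cos_coef, sin_coef) of the score function for two videos."""
    v0x, vcx, vsx = desc_x
    v0y, vcy, vsy = desc_y
    c0 = dot(v0x, v0y)
    # Coefficient multiplying cos(n * delta) for each frequency n.
    cos_coef = [dot(cx, cy) + dot(sx, sy) for cx, sx, cy, sy in zip(vcx, vsx, vcy, vsy)]
    # Coefficient multiplying sin(n * delta) for each frequency n.
    sin_coef = [-dot(cx, sy) + dot(sx, cy) for cx, sx, cy, sy in zip(vcx, vsx, vcy, vsy)]
    return c0, cos_coef, sin_coef


def kernel_value(coefs, delta):
    """Evaluate the score function at one offset delta (in radians)."""
    c0, cos_coef, sin_coef = coefs
    pairs = enumerate(zip(cos_coef, sin_coef), start=1)
    return c0 + sum(c * math.cos(n * delta) + s * math.sin(n * delta) for n, (c, s) in pairs)


def max_score(coefs, num_offsets=256):
    """Grid search over one period; returns (best value, best offset)."""
    grid = [2.0 * math.pi * k / num_offsets for k in range(num_offsets)]
    best = max(grid, key=lambda d: kernel_value(coefs, d))
    return kernel_value(coefs, best), best
```

The grid search costs one trigonometric sum per candidate offset for every database video, which is the per-comparison overhead the energy-based score avoids.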

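  7. Our Method
     - Matching with energy

       $$E(\kappa_{\mathbf{x},\mathbf{y}_1}) > E(\kappa_{\mathbf{x},\mathbf{y}_2}) \quad \text{if} \quad S(\mathbf{x}, \mathbf{y}_1) > S(\mathbf{x}, \mathbf{y}_2)$$

       $$S(\mathbf{x}, \mathbf{y}) = E(\kappa_{\mathbf{x},\mathbf{y}}(\Delta))$$

     - Denote the Fourier series of $f(x)$ as

       $$f(x) = \frac{1}{2} c_0 + \sum_{n=1}^{m} c_n \cos(nx) + \sum_{n=1}^{m} s_n \sin(nx)$$

     - The energy of $f(x)$, taken over one period, is

       $$E(f(x)) = \int_{-\pi}^{\pi} |f(x)|^2 \, dx$$

     - According to Parseval’s theorem,

       $$\frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|^2 \, dx = \frac{c_0^2}{4} + \frac{1}{2} \sum_{n=1}^{m} \left( c_n^2 + s_n^2 \right)$$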
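Parseval's identity for a finite Fourier series is easy to check numerically. The sketch below compares a sampled mean of the squared function over one period with the closed form in the coefficients. The function names are hypothetical, and integrating over the single period from -pi to pi is an assumption of this example.

```python
import math


def fourier_series(x, c0, cos_coef, sin_coef):
    """Evaluate c0/2 + sum_n (c_n cos(n x) + s_n sin(n x))."""
    pairs = enumerate(zip(cos_coef, sin_coef), start=1)
    return 0.5 * c0 + sum(c * math.cos(n * x) + s * math.sin(n * x) for n, (c, s) in pairs)


def mean_energy_numeric(c0, cos_coef, sin_coef, samples=4096):
    """Approximate (1 / 2pi) * integral of f(x)^2 over [-pi, pi] by uniform sampling."""
    step = 2.0 * math.pi / samples
    total = sum(
        fourier_series(-math.pi + k * step, c0, cos_coef, sin_coef) ** 2
        for k in range(samples)
    )
    return total * step / (2.0 * math.pi)


def mean_energy_parseval(c0, cos_coef, sin_coef):
    """Closed form: c0^2 / 4 + (1/2) * sum_n (c_n^2 + s_n^2)."""
    return c0 ** 2 / 4.0 + 0.5 * sum(c * c + s * s for c, s in zip(cos_coef, sin_coef))
```

The closed form needs only the coefficients, so no offset has to be sampled at all.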

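  8. Our Method
     - Matching with energy

       Write the cosine and sine coefficients of $\kappa_{\mathbf{x},\mathbf{y}}(\Delta)$ as

       $$c_n = \langle \mathbf{V}_{n,c}(\mathbf{x}), \mathbf{V}_{n,c}(\mathbf{y}) \rangle + \langle \mathbf{V}_{n,s}(\mathbf{x}), \mathbf{V}_{n,s}(\mathbf{y}) \rangle, \qquad s_n = -\langle \mathbf{V}_{n,c}(\mathbf{x}), \mathbf{V}_{n,s}(\mathbf{y}) \rangle + \langle \mathbf{V}_{n,s}(\mathbf{x}), \mathbf{V}_{n,c}(\mathbf{y}) \rangle$$

       The final form of the energy $S(\mathbf{x}, \mathbf{y})$ for $\kappa_{\mathbf{x},\mathbf{y}}(\Delta)$ is

       $$S(\mathbf{x}, \mathbf{y}) = E(\kappa_{\mathbf{x},\mathbf{y}}(\Delta)) = \sum_{n=1}^{m} \left( c_n^2 + s_n^2 \right)$$

     - Generalized formulation

       $$S_p(\mathbf{x}, \mathbf{y}) = \sqrt[p]{\sum_{i=1}^{m} \left( c_i^2 + s_i^2 \right)^p}$$

       $$S_\infty(\mathbf{x}, \mathbf{y}) = \lim_{p \to \infty} \sqrt[p]{\sum_{i=1}^{m} \left( c_i^2 + s_i^2 \right)^p} = \max_{n} \left( c_n^2 + s_n^2 \right)$$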
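A small sketch of the energy score and its generalized form, taking the per-frequency cosine and sine coefficients of the score function as inputs. The function names are hypothetical, and the exponent convention (each per-frequency energy raised to the power p, followed by a p-th root) is an assumption read from the slide, so treat this as an illustration only.

```python
import math


def energy_score(cos_coef, sin_coef):
    """Sum of per-frequency energies c_n^2 + s_n^2."""
    return sum(c * c + s * s for c, s in zip(cos_coef, sin_coef))


def generalized_score(cos_coef, sin_coef, p):
    """p-norm of the per-frequency energies; p = math.inf gives their maximum."""
    energies = [c * c + s * s for c, s in zip(cos_coef, sin_coef)]
    if p == math.inf:
        return max(energies)
    return sum(e ** p for e in energies) ** (1.0 / p)
```

With `p = 1` the generalized score reduces to the energy score, and as `p` grows it approaches the largest per-frequency energy. For large finite `p` the powers can overflow, so a practical version would normalize the energies first.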

  9. Our Method
     - Matching with energy
       - Given a query video, go through the candidates in the database
       - Calculate the energy score $S(\mathbf{x}, \mathbf{y})$ between the query and each candidate
       - Retrieve by ranking candidates on $S(\mathbf{x}, \mathbf{y})$
     - Advantages
       - More stable (the maximum of the score function is sensitive to noise)
       - Lower computational complexity
       - Can be further accelerated with an approximate nearest neighbor method such as product quantization (PQ)
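The retrieval procedure above can be sketched as an exhaustive scan. In this illustration each video is assumed to be stored as a pair `(v_cos, v_sin)` of per-frequency vectors, and the names `energy_similarity` and `rank_database` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def energy_similarity(query, candidate):
    """Energy score between two videos given as (v_cos, v_sin) descriptor lists."""
    q_cos, q_sin = query
    c_cos, c_sin = candidate
    score = 0.0
    for qc, qs, cc, cs in zip(q_cos, q_sin, c_cos, c_sin):
        cos_coef = dot(qc, cc) + dot(qs, cs)   # coefficient of cos(n * delta)
        sin_coef = -dot(qc, cs) + dot(qs, cc)  # coefficient of sin(n * delta)
        score += cos_coef ** 2 + sin_coef ** 2
    return score


def rank_database(query, database):
    """Return (video_id, score) pairs sorted by decreasing score."""
    scored = [(vid, energy_similarity(query, desc)) for vid, desc in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Each comparison costs four inner products per frequency and involves no search over time offsets.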

  10. Our Method
      - Algorithm speedup with PQ
        - The $j$th codebook $\mathbf{c}_{j*}$ is generated from $\{\mathbf{V}_{j,c}(\mathbf{x}_i) : i \in \{1, \dots, N\}\} \cup \{\mathbf{V}_{j,s}(\mathbf{x}_i) : i \in \{1, \dots, N\}\}$
      - Searching steps
        - Quantize the query $q$ to its $\omega$ nearest neighbors with $S(\mathbf{x}, \mathbf{y})$
        - Compute the squared distances and dot products for each subquantizer $j$ and each of its centroids $\mathbf{c}_{ji}$
        - Using the subvector-to-centroid distances, calculate the similarity score $S(\mathbf{x}, \mathbf{y})$
        - Order the candidates by decreasing $S(\mathbf{x}, \mathbf{y})$
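The sketch below shows generic product quantization with lookup tables for approximate inner products, the operation the energy score is built from. It is not the authors' exact scheme: the equal-size subvector split, the tiny k-means trainer, and all function names are assumptions for illustration, and the coarse quantization step from the slide is omitted.

```python
import random


def split(vec, num_sub):
    """Cut a vector into num_sub equal-length subvectors."""
    size = len(vec) // num_sub
    return [vec[k * size:(k + 1) * size] for k in range(num_sub)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd iterations; initial centroids are sampled from the points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if its cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def train_codebooks(vectors, num_sub, k):
    """One codebook of k centroids per subvector position."""
    return [kmeans([split(v, num_sub)[j] for v in vectors], k) for j in range(num_sub)]


def encode(vec, codebooks):
    """Replace each subvector by the index of its nearest centroid."""
    subs = split(vec, len(codebooks))
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(subs, codebooks)]


def approx_dot(query, code, codebooks):
    """Approximate <query, database vector> from the vector's PQ code."""
    subs = split(query, len(codebooks))
    # Per-subquantizer table of query-subvector-to-centroid dot products.
    tables = [[dot(sub, cen) for cen in cb] for sub, cb in zip(subs, codebooks)]
    return sum(table[idx] for table, idx in zip(tables, code))
```

In a real search the lookup tables would be built once per query and reused for every database code, so each approximate inner product costs only one table read per subquantizer.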

  11. Experiments
      - EVent VidEo (EVVE) dataset [CVPR’13]
        - 620 queries, 2375 database videos, 13 events
        - 1024-D multi-VLAD frame descriptor
      - Experimental results
        - [Figure: mAP plotted against $p$, with $S_p(\mathbf{x}, \mathbf{y}) = \sqrt[p]{\sum_{i=1}^{m} (c_i^2 + s_i^2)^p}$. Caption: The average mAP using $S_p(\mathbf{x}, \mathbf{y})$ for different $p$]

  12. Experiments
      - Results on EVVE and comparison
        - Baseline (temporal match kernel): MM’15
        - MMV (mean-multiVLAD): CVPR’13
        - CTE (circulant temporal encoding): CVPR’13
        - SHP (stable hyper-pooling): ICCV’13

  13. Conclusion
      - Propose a fast event retrieval method for video databases based on the temporal match kernel
      - Use the energy of the score function as the similarity metric
      - Derive the simplified energy formulation using Parseval’s theorem
      - With the energy formulation, use PQ to accelerate the computation
      - Achieve performance competitive with the state of the art

  14. Thank you! 
