Energy Based Fast Event Retrieval in Video with Temporal Match Kernel
  Shin’ichi Satoh (2, 3), Junfu Pu (1), Yusuke Matsui (2), Fan Yang (3, 2)
  1. University of Science and Technology of China
  2. National Institute of Informatics
  3. The University of Tokyo
Outline
  Introduction
  Background
  Matching with Energy
  Algorithm Speedup with PQ
  Experiments
  Conclusion
Introduction
  An approach for fast content-based search in large video databases
  [Figure: a query video and the video database]
Introduction
  Related work
    Jerome Revaud et al., Event retrieval in large video collections with circulant temporal encoding, CVPR, 2013
    Matthijs Douze et al., Stable hyper-pooling and query expansion for event detection, ICCV, 2013
    Sebastien Poullot et al., Temporal matching kernel with explicit feature maps, ACM MM, 2015
  Contribution
    Simplify the similarity metric by calculating the energy of the score function
    Derive the energy formulation with Parseval’s theorem
    Accelerate the computation with product quantization
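Background
  $\mathbf{x} = (\mathbf{x}_0, \dots, \mathbf{x}_t, \dots)$, $\mathbf{y} = (\mathbf{y}_0, \dots, \mathbf{y}_t, \dots)$, time offset: $\Delta$
  A kernel defined with $\mathbf{x}$, $\mathbf{y}$, and $\Delta$:
  $$\kappa_\Delta(\mathbf{x}, \mathbf{y}) \propto \sum_{t=0}^{\infty} \mathbf{x}_t^\top \mathbf{y}_{t+\Delta} = \Big(\sum_{t=0}^{\infty} \mathbf{x}_t \otimes \varphi(t)\Big)^{\top} \Big(\sum_{t'=0}^{\infty} \mathbf{y}_{t'} \otimes \varphi(t'+\Delta)\Big) = \psi_0(\mathbf{x})^\top \psi_\Delta(\mathbf{y})$$
  $$\psi_0(\mathbf{x}) = \big[\mathbf{V}_0^\top, \mathbf{V}_{1,c}^\top, \mathbf{V}_{1,s}^\top, \dots, \mathbf{V}_{m,c}^\top, \mathbf{V}_{m,s}^\top\big]^\top$$
  $$\varphi(t) = \big[a_0,\; a_1\cos(\tfrac{2\pi}{T}t),\; a_1\sin(\tfrac{2\pi}{T}t),\; \dots,\; a_m\cos(\tfrac{2\pi}{T}mt),\; a_m\sin(\tfrac{2\pi}{T}mt)\big]^\top$$
  $$\mathbf{V}_0 = a_0 \sum_{t=0}^{\infty} \mathbf{x}_t \in \mathbb{R}^D$$
  $$\mathbf{V}_{i,c} = a_i \sum_{t=0}^{\infty} \mathbf{x}_t \cos(\tfrac{2\pi}{T} i t) \in \mathbb{R}^D$$
  $$\mathbf{V}_{i,s} = a_i \sum_{t=0}^{\infty} \mathbf{x}_t \sin(\tfrac{2\pi}{T} i t) \in \mathbb{R}^D$$
  $a_i$: the Fourier coefficients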
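The descriptor on this slide (one plain sum of the frame descriptors plus a cosine-weighted and a sine-weighted sum per frequency) can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: the function name `tmk_descriptor`, the list-of-lists layout, and passing the coefficients in as a ready-made list `a` are all assumptions made for the example.

```python
import math

def tmk_descriptor(frames, a, T):
    """Temporal-match-kernel style descriptor of one video (illustrative sketch).

    frames: list of D-dimensional frame descriptors x_0, x_1, ...
    a:      coefficients [a_0, ..., a_m]; assumed to be given
    T:      period used in the cos/sin weights
    Returns (V0, Vc, Vs); Vc[i-1] and Vs[i-1] hold V_{i,c} and V_{i,s}.
    """
    D = len(frames[0])
    m = len(a) - 1

    def weighted_sum(weights):
        # sum_t weights[t] * x_t, computed per dimension
        return [sum(w * x[d] for w, x in zip(weights, frames)) for d in range(D)]

    V0 = [a[0] * v for v in weighted_sum([1.0] * len(frames))]
    Vc, Vs = [], []
    for i in range(1, m + 1):
        angles = [2.0 * math.pi * i * t / T for t in range(len(frames))]
        Vc.append([a[i] * v for v in weighted_sum([math.cos(g) for g in angles])])
        Vs.append([a[i] * v for v in weighted_sum([math.sin(g) for g in angles])])
    return V0, Vc, Vs
```

The descriptor has a fixed size of (2m + 1) * D values regardless of the number of frames, which is what makes per-video comparison cheap.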
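Background
  Final formulation
  $$\kappa_{\mathbf{x},\mathbf{y}}(\Delta) = \langle \mathbf{V}_0(\mathbf{x}), \mathbf{V}_0(\mathbf{y}) \rangle + \sum_{n=1}^{m} \cos(n\Delta) \big( \langle \mathbf{V}_{n,c}(\mathbf{x}), \mathbf{V}_{n,c}(\mathbf{y}) \rangle + \langle \mathbf{V}_{n,s}(\mathbf{x}), \mathbf{V}_{n,s}(\mathbf{y}) \rangle \big) + \sum_{n=1}^{m} \sin(n\Delta) \big( -\langle \mathbf{V}_{n,c}(\mathbf{x}), \mathbf{V}_{n,s}(\mathbf{y}) \rangle + \langle \mathbf{V}_{n,s}(\mathbf{x}), \mathbf{V}_{n,c}(\mathbf{y}) \rangle \big)$$
  Similarity score
  $$S(\mathbf{x}, \mathbf{y}) = \max_{\Delta} \kappa_{\mathbf{x},\mathbf{y}}(\Delta), \qquad t_m = \arg\max_{\Delta} \kappa_{\mathbf{x},\mathbf{y}}(\Delta)$$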
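To make the max-based score concrete, the sketch below evaluates the kernel as a trigonometric polynomial in the offset and searches a uniform grid of offsets for the maximum. It assumes the offset is expressed as an angle in radians, takes descriptors as `(V0, Vc, Vs)` tuples of plain lists, and uses a hypothetical grid size `steps`; none of these choices come from the slides.

```python
import math

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def kernel_coefficients(desc_x, desc_y):
    """Constant, cosine and sine coefficients of the kernel as a function of the offset."""
    V0x, Vcx, Vsx = desc_x
    V0y, Vcy, Vsy = desc_y
    c0 = dot(V0x, V0y)
    # cosine coefficient: <Vc(x), Vc(y)> + <Vs(x), Vs(y)>
    c = [dot(cx, cy) + dot(sx, sy) for cx, cy, sx, sy in zip(Vcx, Vcy, Vsx, Vsy)]
    # sine coefficient: -<Vc(x), Vs(y)> + <Vs(x), Vc(y)>
    s = [-dot(cx, sy) + dot(sx, cy) for cx, cy, sx, sy in zip(Vcx, Vcy, Vsx, Vsy)]
    return c0, c, s

def kappa(coeffs, delta):
    """Kernel value at offset delta (radians, an assumption of this sketch)."""
    c0, c, s = coeffs
    return c0 + sum(cn * math.cos(n * delta) + sn * math.sin(n * delta)
                    for n, (cn, sn) in enumerate(zip(c, s), start=1))

def max_score(coeffs, steps=360):
    """Grid search over one period; returns (best kernel value, best offset)."""
    grid = [2.0 * math.pi * k / steps for k in range(steps)]
    best = max(grid, key=lambda d: kappa(coeffs, d))
    return kappa(coeffs, best), best
```

The grid search costs `steps` kernel evaluations per candidate video, which is the cost the energy-based score is meant to avoid.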
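Our Method
  Matching with energy
  $E(\kappa_{\mathbf{x},\mathbf{y}_1}) > E(\kappa_{\mathbf{x},\mathbf{y}_2})$ if $S(\mathbf{x}, \mathbf{y}_1) > S(\mathbf{x}, \mathbf{y}_2)$
  $S(\mathbf{x}, \mathbf{y}) = E(\kappa_{\mathbf{x},\mathbf{y}}(\Delta))$
  Denote the Fourier series of $f(x)$ as
  $$f(x) = \tfrac{1}{2} c_0 + \sum_{n=1}^{m} c_n \cos(nx) + \sum_{n=1}^{m} s_n \sin(nx)$$
  The energy of $f(x)$ is
  $$E(f(x)) = \int_{-\infty}^{\infty} |f(x)|^2 \, dx$$
  According to Parseval’s theorem,
  $$\frac{1}{2\pi} \int_{-\infty}^{\infty} |f(x)|^2 \, dx = \frac{c_0^2}{2} + \sum_{i=1}^{m} \big( c_i^2 + s_i^2 \big)$$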
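As a sanity check on the Parseval step, the sketch below compares a numerical integral of the squared series with the closed-form sum of squared coefficients. Note the assumptions: it uses the standard one-period form of the identity, (1/pi) times the integral over [-pi, pi] equals c_0^2/2 plus the sum of c_n^2 + s_n^2, rather than the whole-line integral written on the slide, and all function names are illustrative.

```python
import math

def fourier_series(c0, c, s, x):
    """f(x) = c0/2 + sum_n c_n cos(n x) + s_n sin(n x), n starting at 1."""
    return 0.5 * c0 + sum(cn * math.cos(n * x) + sn * math.sin(n * x)
                          for n, (cn, sn) in enumerate(zip(c, s), start=1))

def energy_numeric(c0, c, s, samples=4096):
    """(1/pi) * integral of f(x)^2 over one period, by a uniform Riemann sum."""
    h = 2.0 * math.pi / samples
    total = sum(fourier_series(c0, c, s, -math.pi + k * h) ** 2 for k in range(samples))
    return total * h / math.pi

def energy_parseval(c0, c, s):
    """Closed form: c0^2 / 2 + sum_n (c_n^2 + s_n^2)."""
    return 0.5 * c0 ** 2 + sum(cn ** 2 + sn ** 2 for cn, sn in zip(c, s))
```

The closed form needs only the 2m + 1 coefficients, so no sampling over the offset is required.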
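Our Method
  Matching with energy
  The final form of the energy $S(\mathbf{x}, \mathbf{y})$ for $\kappa_{\mathbf{x},\mathbf{y}}(\Delta)$ is
  $$S(\mathbf{x}, \mathbf{y}) = E(\kappa_{\mathbf{x},\mathbf{y}}(\Delta)) = \sum_{n=1}^{m} \big( \langle \mathbf{V}_{n,c}(\mathbf{x}), \mathbf{V}_{n,c}(\mathbf{y}) \rangle + \langle \mathbf{V}_{n,s}(\mathbf{x}), \mathbf{V}_{n,s}(\mathbf{y}) \rangle \big)^2$$
  Generalized formulation
  $$S_p(\mathbf{x}, \mathbf{y}) = \sqrt[p]{\sum_{i=1}^{m} \big( c_i^2 + s_i^2 \big)^p}$$
  $$S_\infty(\mathbf{x}, \mathbf{y}) = \lim_{p \to \infty} \sqrt[p]{\sum_{i=1}^{m} \big( c_i^2 + s_i^2 \big)^p} = \max_{n} \big( c_n^2 + s_n^2 \big)$$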
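The generalized score can be sketched as a p-norm over the per-frequency energies c_i^2 + s_i^2. This reading (p-th root of the sum of p-th powers, with the largest term as the limiting case) is an assumption about the slide's formula, and the function name `generalized_score` is made up for the example.

```python
import math

def generalized_score(c, s, p=1.0):
    """p-norm of the per-frequency energies c_i^2 + s_i^2.

    p = 1 gives the plain sum of energies; p = math.inf gives the largest
    single-frequency energy. The root form is an assumed reading of the slide.
    """
    terms = [cn * cn + sn * sn for cn, sn in zip(c, s)]
    if p == math.inf:
        return max(terms)
    return sum(t ** p for t in terms) ** (1.0 / p)
```

For large p the value approaches the largest single-frequency energy, so p interpolates between a sum-like and a max-like score.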
Our Method
  Matching with energy
    Given a query video, go through the candidates in the database
    Calculate $S(\mathbf{x}, \mathbf{y})$ between the query and each candidate
    Retrieve by ranking the candidates on $S(\mathbf{x}, \mathbf{y})$
  Advantages
    More stable (the maximum-based score is sensitive to noise)
    Lower computational complexity
    The computation can be further accelerated with an approximate nearest neighbor method such as PQ
Our Method
  Algorithm speedup with PQ
  The $j$th codebook $\mathbf{c}_{j*}$ is generated from
  $$\{ \mathbf{V}_{j,c}(\mathbf{x}_i) : i \in \{1, \dots, N\} \} \cup \{ \mathbf{V}_{j,s}(\mathbf{x}_i) : i \in \{1, \dots, N\} \}$$
  Searching steps
    Quantize the query $q$ to its $\omega$ nearest neighbors with $S(\mathbf{x}, \mathbf{y})$
    Compute the squared distances and dot products for each subquantizer $j$ and each of its centroids $\mathbf{c}_{ji}$
    Using the subvector-to-centroid distances, calculate the similarity score $S(\mathbf{x}, \mathbf{y})$
    Order the candidates by decreasing $S(\mathbf{x}, \mathbf{y})$
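The lookup-table idea behind the searching steps can be sketched for a single inner product: database vectors are stored as centroid indices, and the query only computes one dot product per centroid. This is a generic product-quantization illustration with tiny hand-written codebooks, not the pipeline used in the talk; the helper names and the equal-length subvector split are assumptions.

```python
def split(v, n_sub):
    """Cut v into n_sub equal-length subvectors (len(v) assumed divisible)."""
    step = len(v) // n_sub
    return [v[k * step:(k + 1) * step] for k in range(n_sub)]

def encode(v, codebooks):
    """Replace each subvector by the index of its nearest centroid."""
    codes = []
    for sub, book in zip(split(v, len(codebooks)), codebooks):
        dists = [sum((p - q) ** 2 for p, q in zip(sub, cen)) for cen in book]
        codes.append(dists.index(min(dists)))
    return codes

def dot_table(query, codebooks):
    """Dot product of every query subvector with every centroid of its codebook."""
    return [[sum(p * q for p, q in zip(sub, cen)) for cen in book]
            for sub, book in zip(split(query, len(codebooks)), codebooks)]

def approx_dot(table, codes):
    """Approximate <query, v> by summing one table entry per subquantizer."""
    return sum(row[code] for row, code in zip(table, codes))
```

The table is built once per query, after which each database vector costs one lookup per subquantizer; the inner products that appear in the energy score could be approximated this way before squaring and summing.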
Experiments
  EVent VidEo (EVVE) dataset [CVPR’13]
    620 queries, 2375 database videos, 13 events
    1024-D multi-VLAD frame descriptor
  Experimental results
  $$S_p(\mathbf{x}, \mathbf{y}) = \sqrt[p]{\sum_{i=1}^{m} \big( c_i^2 + s_i^2 \big)^p}$$
  [Table: $p$ vs. mAP] The average mAP using $S_p(\mathbf{x}, \mathbf{y})$ for different $p$
Experiments
  Results on EVVE and comparison
    Baseline (temporal match kernel): MM’15
    MMV (mean-multiVLAD): CVPR’13
    CTE (circulant temporal encoding): CVPR’13
    SHP (stable hyper-pooling): ICCV’13
Conclusion
  Proposed a fast event retrieval method for video databases based on the temporal match kernel
  Used the energy of the score function as the similarity metric
  Derived the simplified energy formulation using Parseval’s theorem
  With the energy formulation, PQ is used to accelerate the computation
  Achieved performance competitive with the state of the art
Thank you!