They Are Not Equally Reliable: Semantic Event Search using Differentiated Concept Classifiers Inkyu An
Content
1. Motivation
2. Previous paper
3. Goal
4. Related Work
5. Approach
6. Result
MOTIVATION
Motivation | Semantic image retrieval
<Query> <Query sentence> Person interacting with panda
Is it better to use the meaning of the sentence?
PREVIOUS PAPER
Previous paper | Semantic image retrieval
[Figure: query image and result images]
Captions: Person holding animals; Person interacting with panda; Person feeding panda; Person feeding calf
Relations: Implied-by, Mutual-exclusive, Type-of
Previous paper | Semantic image retrieval
- Extract the image features and word vectors
Query image -> extracting image features (CNN) -> CNN feature
Query sentence [Girls doing handstand] -> Word2Vector (Skip-grams) -> Word Vector
Previous paper | Semantic image retrieval
Query: [Girls doing handstand] (CNN feature, Word Vector)
Similar: [Girl doing cartwheel] (CNN feature, Word Vector)
Nonsimilar: [Girl dancing on beach] (CNN feature, Word Vector)
System: measure scores of Mutually exclusive, Implied-by and Type-of
Training …
Previous paper | Semantic image retrieval
● There are scalability and time-consumption issues
● $C = C_{ac} + \alpha_r C_{rec} + \alpha_n C_{nlp} + \alpha_c C_{cons} + \lambda \lVert W \rVert_2^2$
CNN features dimension: 4096
Actions dimension: 27425, related images: 100
Embedding dimension: n(64)
These issues could be fatal, especially in video search algorithms
GOAL
Goal | Semantic event search from videos
Input sentence, without video: “Horse Riding Competition”
Test videos: 98,000 videos
Input sentence + test videos -> Video Search System -> Result videos
Goal | Semantic event search from videos
< Main Contribution >
1) Unsupervised learning
2) Solves the scalability issue
3) Faster than other methods
4) Differentiated Concept Classifiers
APPROACH
Related Work |
1. Skip-grams
- Weight vectors of actions (sentence)
2. Spectral meta-learning
- Unsupervised learning method
Approach | Proposed framework
Detected Videos
Approach | Proposed framework
Relevance Vector [Binary Vector]
Warped Spectral meta-learning [Unsupervised & Fast convergence]
Detected Videos
Approach | Unsupervised Learning
- Because this is unsupervised learning, we do not know whether a test video shows “Horse riding competition” or not.
Test videos, v    “Horse riding competition”: true or false?
1                 ???
2                 ???
3                 ???
v                 ???
Approach | Unsupervised Learning
- Because this is unsupervised learning, we do not know whether a test video shows “Horse riding competition” or not.
Word2Vector -> Proposed System -> predicted answer per test video
Test videos, v    “Horse riding competition”: true or false?
1                 true
2                 true
3                 false
v                 true
Approach | Word to Vector
- Apply the Skip-Gram method to both the event description and the concepts
Event description: “Horse riding competition”
Concept vocabulary (total m): Blowing Candle, Horse Riding, Bee, Biking
Skip-Gram Model -> word vectors: $V_{\text{Horse riding Competition}}$, $V_{\text{Blowing Candle}}$, $V_{\text{Horse Riding}}$, $V_{\text{Bee}}$, $V_{\text{Biking}}$
Approach | Relevance Score Vector
- Compute distances between the event and the concepts to build the Relevance Vector
- The Relevance Vector indicates how similar the event is to each concept
Event description vector: $V_{\text{Horse riding Competition}}$
Concept vocabulary (total m) vectors: $V_{\text{People}}$, $V_{\text{Horse}}$, $V_{\text{Show jumping}}$, $V_{\text{Field}}$
Relevance Score Vector: 0.8726 0.7647 0.7256 0.0624
Binary vector labels: Too High, Too Low
Relevance Vector “w”: 1 1
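The thresholding step on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: cosine similarity as the "distance", the function names, and the cutoffs `low=0.5` and `high=0.8` are all hypothetical. The band reading (scores labelled Too High or Too Low are dropped, the rest become 1) is one interpretation of the slide.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def binarize_relevance(scores, low, high):
    """Set an entry to 1 when its score lies in [low, high], else 0."""
    return [1 if low <= s <= high else 0 for s in scores]

# Relevance scores shown on the slide; the cutoffs are hypothetical.
scores = [0.8726, 0.7647, 0.7256, 0.0624]
w = binarize_relevance(scores, low=0.5, high=0.8)
print(w)
```

With these assumed cutoffs only the two middle scores are kept, which matches the two 1 entries shown for the Relevance Vector “w”.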
Approach | Proposed framework
Relevance Vector [Binary Vector]
Warped Spectral meta-learning [Unsupervised & Fast convergence]
Detected Videos
Approach | Differentiated Concept Classifier
- The Differentiated Concept Classifier measures the similarity between the test videos and the concepts
If the 1st video is similar to concept 1, $S_{1,1}$ is 1.
If the 1st video is not similar to concept 1, $S_{1,1}$ is -1.
$S_{i,j} \in \{-1, 1\}$
Test videos, v    Similarity to concepts 1..m
1                 $S_{1,1}, S_{1,2}, \dots, S_{1,m}$
2                 $S_{2,1}, S_{2,2}, \dots, S_{2,m}$
3                 $S_{3,1}, S_{3,2}, \dots, S_{3,m}$
v                 $S_{v,1}, S_{v,2}, \dots, S_{v,m}$
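A small sketch of how such a matrix of +1 / -1 decisions could be built from raw detector scores. The scores, the function name `to_sign_matrix`, and the 0.5 threshold are hypothetical; the slide only states that each entry is 1 for "similar" and -1 otherwise.

```python
def to_sign_matrix(detector_scores, threshold=0.5):
    """Map raw scores to +1 (similar) or -1 (not similar), one row per video."""
    return [[1 if s >= threshold else -1 for s in row] for row in detector_scores]

# Hypothetical detector outputs: 3 test videos x 4 concepts.
detector_scores = [
    [0.9, 0.2, 0.7, 0.1],
    [0.4, 0.8, 0.6, 0.3],
    [0.1, 0.1, 0.2, 0.9],
]
S = to_sign_matrix(detector_scores)
for row in S:
    print(row)
```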
Approach | Spectral meta-learning
- Because this is unsupervised learning, the answer to “Horse riding competition”: true or false? is unknown for every test video (1 ???, 2 ???, 3 ???, v ???)
$y^* = \mathrm{sign}\left(\sum_{i=1}^{m} S_i(v)\,(2\pi_i - 1)\right) \approx \mathrm{sign}\left(\sum_{i=1}^{m} S_i(v)\,u_i\right)$
$\pi_i$: accuracy of the i-th concept classifier
Estimate the eigenvector $u_i$ of the concept classifiers' covariance matrix to find the optimal solution
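To see why weighting each concept classifier by (2 x accuracy - 1) matters, here is a minimal sketch of the weighted vote for a single video. The three votes and accuracies are made-up numbers, and the accuracies are assumed known here purely for illustration; in the unsupervised setting they are not available, which is why the eigenvector estimate is used instead.

```python
def sign(x):
    """Return +1 for non-negative input and -1 otherwise."""
    return 1 if x >= 0 else -1

def weighted_vote(votes, accuracies):
    """Combine +1/-1 votes, weighting each classifier by (2 * accuracy - 1)."""
    return sign(sum(s * (2 * a - 1) for s, a in zip(votes, accuracies)))

# Hypothetical votes of three concept classifiers on one video.
votes = [1, -1, -1]
# Hypothetical accuracies: one reliable classifier, two near chance.
accuracies = [0.95, 0.55, 0.55]

print(weighted_vote(votes, accuracies))  # reliable classifier dominates
print(sign(sum(votes)))                  # plain majority vote
```

The weighted vote follows the single reliable classifier, while an unweighted majority vote follows the two near-chance classifiers.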
Approach | Generalized Conditional Gradient (GCG)
- Because this is unsupervised learning, the eigenvector $u$ of the covariance matrix has to be estimated; they used the Generalized Conditional Gradient (GCG) algorithm to find it.
- The GCG algorithm converges quickly.
Repeat until convergence … {
- Update the eigenvector $u$: $u \leftarrow$ leading eigenvector of $-G$
- Local minimizer: $\min_{u} \sum_{i \neq j} \left(Q_{i,j} - (uu^T)_{i,j}\right)^2 + \lambda \lVert u \rVert^2$
}
- Rank the test videos using the equation below:
$y^* \approx \mathrm{sign}\left(\sum_{i=1}^{m} S_i(v)\,u_i\right)$
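The estimate-then-rank idea can be sketched on toy data. Note the substitution: this sketch uses plain power iteration on the full covariance matrix (diagonal included) as a simple stand-in, not the GCG algorithm and not the off-diagonal rank-one fit from the slides. The toy votes, the function names, and the iteration count are all assumptions.

```python
import math

def covariance(S):
    """Population covariance between classifier columns of S (videos x classifiers)."""
    n, m = len(S), len(S[0])
    mu = [sum(row[i] for row in S) / n for i in range(m)]
    return [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in S) / n
             for j in range(m)] for i in range(m)]

def leading_eigenvector(Q, iterations=200):
    """Power iteration; a stand-in for GCG, not the paper's algorithm."""
    u = [1.0] * len(Q)
    for _ in range(iterations):
        u = [sum(q * x for q, x in zip(row, u)) for row in Q]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    return u

def score_videos(S, u):
    """Weighted sum of the +1/-1 votes for each video."""
    return [sum(s * w for s, w in zip(row, u)) for row in S]

# Hypothetical ground truth, used only to check the result afterwards.
truth = [1, 1, 1, 1, -1, -1, -1, -1]
# Toy votes: classifier 0 is always right, 1 and 2 err once each, 3 is noise.
S = [
    [1, 1, 1, 1],
    [1, 1, 1, -1],
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [-1, -1, 1, 1],
    [-1, -1, -1, -1],
    [-1, -1, -1, 1],
    [-1, -1, -1, -1],
]

u = leading_eigenvector(covariance(S))
scores = score_videos(S, u)
labels = [1 if s >= 0 else -1 for s in scores]
ranking = sorted(range(len(S)), key=lambda k: scores[k], reverse=True)
print(labels == truth, ranking)
```

On this toy input the most reliable classifier receives the largest weight and the noisy one the smallest, without any labels being used during estimation.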
RESULT
Result | Speed comparison on synthetic data
- It is faster than previous works
Result | Mean average precision result
The number of concepts: 3,135 (m)
Summary |
● Solves the scalability and time-consumption issues in an unsupervised setting.
● They used Skip-grams to convert words to vectors.
● They used the Spectral meta-learning method to handle the lack of labels.
● They used the Generalized Conditional Gradient (GCG) algorithm to improve the calculation speed.
Q & A |
● Thank you.
APPENDIX
Appendix | Spectral meta-learning
- The accuracy of the i-th concept classifier at video v:
$p_i = \Pr(S_i(v) = 1 \mid y = 1)$
$n_i = \Pr(S_i(v) = -1 \mid y = -1)$
$\pi_i = \frac{p_i + n_i}{2}$
- Find the $y$ that maximizes the likelihood:
$\mathcal{L}(S_1(v), \dots, S_m(v); y) = \prod_{i=1}^{m} \Pr(S_i(v) \mid y)$
$y^* = \arg\max_{y} \mathcal{L}(S_1(v), \dots, S_m(v); y)$
$y^* = \mathrm{sign}\left(\sum_{i=1}^{m} S_i(v)\,(2\pi_i - 1)\right) \approx \mathrm{sign}\left(\sum_{i=1}^{m} S_i(v)\,u_i\right)$
- Estimate the eigenvector $u_i$ by finding the optimal solution rather than $\pi_i$, because $u_i \propto (2\pi_i - 1)$
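The accuracy definitions on this slide (rate of correct positives, rate of correct negatives, and their average) can be illustrated with a short sketch. The labels and predictions below are made up, and labels are used here only to show the definitions; the method itself never observes them.

```python
def balanced_accuracy(predictions, labels):
    """Return (p, n, pi): true-positive rate, true-negative rate, their mean."""
    pos = [p for p, y in zip(predictions, labels) if y == 1]
    neg = [p for p, y in zip(predictions, labels) if y == -1]
    p_rate = sum(1 for p in pos if p == 1) / len(pos)
    n_rate = sum(1 for p in neg if p == -1) / len(neg)
    return p_rate, n_rate, (p_rate + n_rate) / 2

# Hypothetical labels and one classifier's +1/-1 predictions on six videos.
labels = [1, 1, 1, 1, -1, -1]
predictions = [1, 1, 1, -1, -1, 1]

p_rate, n_rate, pi = balanced_accuracy(predictions, labels)
vote_weight = 2 * pi - 1  # the weight this classifier gets in the vote
print(p_rate, n_rate, pi, vote_weight)
```

A classifier at chance level has an averaged accuracy of 0.5 and therefore a vote weight of 0, so it does not influence the combined decision.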
Appendix | Spectral meta-learning
- Covariance matrix $Q$ between concept i and concept j at video v:
$Q_{i,j} = E_v\left[(S_i(v) - \mu_i)(S_j(v) - \mu_j)\right] = \begin{cases} 1 - \mu_i^2, & i = j \\ (1 - b^2)(2\pi_i - 1)(2\pi_j - 1), & i \neq j \end{cases}$
- Mean prediction $\mu$ of concept i: $\mu_i = E_v[S_i(v)]$
- Rank-one matrix $R = \lambda u u^T$ ($\lambda$: eigenvalue, $u$: eigenvector):
$\min_{R \geq 0,\ \mathrm{rank}(R) = 1} \sum_{i \neq j} \left(Q_{i,j} - R_{i,j}\right)^2$
Ranking and combining multiple predictors without labeled data [PNAS, 2014]
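The diagonal case of the covariance matrix follows directly from the classifier outputs being +1 or -1. A short derivation, writing the classifier output as $S_i(v)$ and its mean over videos as $\mu_i$:

```latex
\begin{aligned}
Q_{i,i} &= E_v\big[(S_i(v) - \mu_i)^2\big] \\
        &= E_v\big[S_i(v)^2\big] - \mu_i^2 \\
        &= 1 - \mu_i^2
        \qquad \text{since } S_i(v) \in \{-1, 1\} \Rightarrow S_i(v)^2 = 1 .
\end{aligned}
```

The diagonal therefore carries no information about classifier accuracy, which is consistent with the rank-one fit summing only over the off-diagonal entries.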