Deep Learning Based Semantic Video Indexing and Retrieval




  1. Deep Learning Based Semantic Video Indexing and Retrieval
     Anna Podlesnaya, Sergey Podlesnyy
     Cinema and Photo Research Institute (NIKFI)
     This work was funded by Russian Federation Ministry of Culture Contract No. 2214-01-41/06-15

  2. Fast Track
     ● Contribution 1, Video Segmentation: the feature vector extracted by GoogLeNet contains enough semantic information to segment raw video into shots with 0.94 precision, compared to MPEG-4 I-frames.
     ● Contribution 2, Video Indexing: a graph-based database for indexing temporal, spatial and semantic properties is proposed. Cost-efficient pipeline.
     ● Contribution 3, Search by Examples: video retrieval by a sample video clip at 0.86 precision. Online learning of new concepts: video retrieval by sample photos at 0.64 precision.

  3. Relevance
     Archives are Huge
     ● Russian Documentary Archive: 250K items (dated from 1910)
     ● Russian TV Archive: 100K items
     ● YouTube: users uploading 100 hours of video every minute (as of 2013)
     Production Needs
     ● Everyday need for footage in TV production
     ● Non-fiction movie production relies on historical and cultural heritage content
     ● Education, research, art...
     MPEG-7 Query Format
     ● QueryByFreeText
     ● QueryByMedia
     ● SpatialQuery
     ● TemporalQuery
     ISO/IEC 15938-5:2003 Information technology -- Multimedia content description interface -- Part 5: Multimedia description schemes

  4. Video Segmentation
     ● Semantic feature extraction by a deep neural network
     ● Shots cut at spikes in the distance between the semantic feature vectors of successive frames
     ● Temporal pooling to summarize shot semantics
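The temporal-pooling step can be sketched in Python as below. This is a minimal illustration, not the authors' code: the slides do not name the pooling operator, so mean pooling is an assumption, feature vectors are plain lists of floats, and `pool_shots` is a hypothetical helper that receives the index of the first frame of each new shot.

```python
from typing import List, Sequence

Vector = List[float]


def pool_shots(frames: Sequence[Sequence[float]], cuts: Sequence[int]) -> List[Vector]:
    """Mean-pool per-frame feature vectors into one vector per shot.

    `cuts` lists the frame indices where a new shot starts (strictly
    increasing, each in 1..len(frames)-1). Mean pooling is an assumption,
    not taken from the slides.
    """
    starts = [0] + list(cuts)
    ends = list(cuts) + [len(frames)]
    shots: List[Vector] = []
    for start, end in zip(starts, ends):
        segment = frames[start:end]
        n = len(segment)
        # Element-wise average over all frames that belong to this shot.
        shots.append([sum(column) / n for column in zip(*segment)])
    return shots


if __name__ == "__main__":
    frames = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
    print(pool_shots(frames, [2]))  # [[2.0, 0.0], [0.0, 3.0]]
```

With GoogLeNet frame features each pooled vector would have 1024 components, which matches the per-scene vectors in R^1024 used for exhaustive search on slide 19.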

  5. Deep Neural Network

  6. Semantics Feature Vector

  7. Distance Between Frames’ Feature Vectors

  8. Segmenting Algorithm Details: $\{x_0, x_1, \ldots, x_n\}$ are the feature vectors of successive frames
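A minimal sketch of cutting the frame sequence x_0, ..., x_n into shots, under stated assumptions: the distance between successive frame vectors is taken to be cosine distance, and a "spike" is simplified to any distance above a fixed threshold. The slides give neither the segmentation metric nor the exact spike criterion, so `detect_cuts`, its `threshold` of 0.5 and `min_shot_len` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; a zero vector is treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def detect_cuts(frames: Sequence[Sequence[float]], threshold: float = 0.5,
                min_shot_len: int = 1) -> List[int]:
    """Return the indices i where a new shot starts (cut between frames i-1 and i).

    A cut is declared when the distance between successive frame vectors
    exceeds `threshold` (a hypothetical fixed value standing in for the
    authors' spike criterion) and the current shot already has at least
    `min_shot_len` frames.
    """
    cuts: List[int] = []
    shot_start = 0
    for i in range(1, len(frames)):
        if (cosine_distance(frames[i - 1], frames[i]) > threshold
                and i - shot_start >= min_shot_len):
            cuts.append(i)
            shot_start = i
    return cuts


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 1.0]]
    print(detect_cuts(frames))  # [2]
```

Raising `min_shot_len` suppresses cuts caused by single-frame flashes; an adaptive threshold (for example, relative to the local mean distance) would be closer to the word "spike" but is not described on the slides.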

  9. Robustness to Camera Movement (figure panels: Zoom; Pan/Rotate; Pan; Pan; Zoom/Pan)

  10. Video Indexing
     ● Apache Cassandra storage for feature vectors and thumbnails
     ● Neo4j graph database for the movie archive
     ● Structured queries over the graph database for keyword-based retrieval

  11. Digitizing → FV Extraction → Segmentation → Indexing → BK-Tree Building
     ● Digitizing: starting with film or tape
     ● FV Extraction: store per-frame timecodes and feature vectors in Cassandra
     ● Segmentation: store the per-scene data structure in Neo4j
     ● Indexing: may use additional classifiers for faces, places, salient objects, etc.
     ● BK-Tree Building: add edges to the Neo4j graph to speed up nearest-neighbor search

  12. Graph-Based Index

  13. Neo4j Graph

  14. Neo4j Graph

  15. Neo4j Query: Find Scenes with Zebra
     MATCH (s:Shot)-[c:Category]->(w:Wordnet {synset: "zebra"})
     WHERE c.weight > 0.1
     RETURN s ORDER BY s.duration DESC
     ASCII-art pattern: (s)-[c]->(w)

  16. Neo4j Query: Find Scenes with Lion to the Left of Zebra
     MATCH (s:Shot)-->(zebra_obj:Salient_obj)-->(w_zebra:Wordnet {synset: "zebra"})
     MATCH (s)-->(lion_obj:Salient_obj)-->(w_lion:Wordnet {synset: "lion"})
     MATCH (zebra_obj)-[:Left]->(lion_obj)
     RETURN DISTINCT s ORDER BY s.duration DESC
     ASCII-art patterns: (s)-->(zebra_obj)-->(w_zebra), (s)-->(lion_obj)-->(w_lion), (zebra_obj)-[:Left]->(lion_obj)

  17. Search by Example
     ● Find similar clip
     ● Find near-duplicates
     ● Online learning of new concepts
     One picture is better than 100 words

  18. Use Case 1: Keyword Search → Select Sample Clip → Find Similar Clips
     ● Keyword search: search for ELEPHANT
     ● Select a sample clip: need an elephant herd, forest, sky
     ● Find similar clips: found clips with the required characteristics

  19. Find Similar Clip
     Quick Search
     ● 31-bit random projection hash (RPH)
     ● BK-Tree on RPH
     ● Hamming distance from the sample clip
     ● Quick, incomplete search
     Exhaustive Search
     ● Feature vectors pooled by scene ($\mathbb{R}^{1024}$)
     ● Cosine distance between the sample clip and every other scene in the archive, sorted from most to least similar
     ● Well, slow
     The average precision of search by video sample was 0.86. Precision was evaluated by searching by a keyword, then searching by one of the resulting shots with a cosine distance threshold of 0.3; a human expert counted the true and false positives.
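The quick-search column can be sketched as sign random projection hashing plus a BK-tree keyed on Hamming distance. This is an in-memory illustration under assumptions: Gaussian random hyperplanes (the slide only says "random projection"), a hypothetical search `radius`, and hypothetical helper names; the slides store the tree as edges in Neo4j rather than in Python objects. With sign projections, two vectors at angle θ disagree on each bit with probability θ/π, so Hamming distance between hashes approximates angular (cosine) distance.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple

N_BITS = 31  # hash length given on the slide


def make_planes(dim: int, n_bits: int = N_BITS, seed: int = 0) -> List[List[float]]:
    """Draw n_bits random Gaussian hyperplane normals in R^dim (assumed distribution)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def rp_hash(vec: Sequence[float], planes: Sequence[Sequence[float]]) -> int:
    """Bit k is 1 when vec lies on the non-negative side of hyperplane k."""
    h = 0
    for k, plane in enumerate(planes):
        if sum(p * v for p, v in zip(plane, vec)) >= 0.0:
            h |= 1 << k
    return h


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


@dataclass
class _Node:
    code: int
    ids: List[str]
    children: Dict[int, "_Node"] = field(default_factory=dict)


class BKTree:
    """Burkhard-Keller tree over integer hashes with Hamming distance."""

    def __init__(self) -> None:
        self.root: Optional[_Node] = None

    def add(self, code: int, item_id: str) -> None:
        if self.root is None:
            self.root = _Node(code, [item_id])
            return
        node = self.root
        while True:
            d = hamming(code, node.code)
            if d == 0:  # identical hash: keep all ids on the same node
                node.ids.append(item_id)
                return
            if d not in node.children:
                node.children[d] = _Node(code, [item_id])
                return
            node = node.children[d]

    def search(self, code: int, radius: int) -> List[Tuple[int, str]]:
        """All ids whose hash is within `radius` bits of `code`, nearest first."""
        found: List[Tuple[int, str]] = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node = stack.pop()
            d = hamming(code, node.code)
            if d <= radius:
                found.extend((d, i) for i in node.ids)
            # Triangle inequality: only edges labelled within [d - r, d + r] can hold matches.
            for edge, child in node.children.items():
                if d - radius <= edge <= d + radius:
                    stack.append(child)
        return sorted(found)


if __name__ == "__main__":
    planes = make_planes(dim=4)
    tree = BKTree()
    scenes = {"scene-1": [1.0, 0.2, 0.0, 0.1], "scene-2": [0.0, 1.0, 0.5, -0.3]}
    for scene_id, vec in scenes.items():
        tree.add(rp_hash(vec, planes), scene_id)
    hits = tree.search(rp_hash(scenes["scene-1"], planes), radius=3)
    print(hits[0])  # (0, 'scene-1')
```

A small radius visits few nodes but misses neighbors whose hashes differ in more bits, which is one way to read "quick, incomplete search".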

  20. Use Case 2: Show Sample Clip → Extract Features → Exhaustive Search → Sort by Cos-distance
     ● Found near-duplicates, robust to resampling, vignetting, hue/saturation augmentation, etc.
     (figure: retrieved clips at cosine distances 0.0057, 0.0124, 0.0152, 0.0583)
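The exhaustive search behind this use case is a linear scan: compute the cosine distance from the sample clip's pooled vector to every scene vector and sort ascending. A minimal sketch, assuming an in-memory dict of scene vectors in place of the Cassandra store; `exhaustive_search` and the optional `max_distance` cut-off (0.3 was the threshold used in the slide 19 evaluation) are illustrative.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; a zero vector is treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def exhaustive_search(query: Sequence[float], archive: Dict[str, Sequence[float]],
                      max_distance: Optional[float] = None) -> List[Tuple[float, str]]:
    """Compare the query with every scene vector; return (distance, id), closest first."""
    scored = [(cosine_distance(query, vec), scene_id) for scene_id, vec in archive.items()]
    if max_distance is not None:
        scored = [pair for pair in scored if pair[0] <= max_distance]
    return sorted(scored)


if __name__ == "__main__":
    archive = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print([sid for _, sid in exhaustive_search([1.0, 0.1], archive)])  # ['a', 'c', 'b']
```

Cost grows linearly with the archive size, which is why the quick RPH/BK-tree path exists alongside it.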

  21. Use Case 3: Show Sample Images of Unknown Concept (Google Image Search) → Train Linear Classifier (Vowpal Wabbit) → Exhaustive Search
     ● Found video clips matching the classifier trained on the images' feature vectors
     ● AP 0.64
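The online-learning flow can be sketched as: train a linear classifier on feature vectors of sample images, then score every scene vector with it. The slides use Vowpal Wabbit; the code below swaps in plain SGD logistic regression written from scratch, so it illustrates the idea only and is not VW's algorithm. It also assumes negative examples (for instance, images of unrelated concepts), which the slides do not describe; `train_linear`, `rank_scenes` and all hyperparameters are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear(examples: Sequence[Tuple[Sequence[float], int]], epochs: int = 50,
                 lr: float = 0.1, seed: int = 0) -> Tuple[List[float], float]:
    """Logistic regression with plain SGD; label 1 = new concept, 0 = other (assumed negatives)."""
    dim = len(examples[0][0])
    w = [0.0] * dim
    b = 0.0
    data = list(examples)
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            # Gradient of the log loss with respect to the linear score z.
            g = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def rank_scenes(w: Sequence[float], b: float, archive: Dict[str, Sequence[float]],
                min_score: float = 0.0) -> List[Tuple[float, str]]:
    """Score every scene vector; keep scores above min_score, highest first."""
    scored = [(b + sum(wi * xi for wi, xi in zip(w, vec)), sid) for sid, vec in archive.items()]
    return sorted((pair for pair in scored if pair[0] > min_score), reverse=True)


if __name__ == "__main__":
    images = [([1.0, 0.1], 1), ([0.9, 0.0], 1), ([0.1, 1.0], 0), ([0.0, 0.9], 0)]
    w, b = train_linear(images)
    print([sid for _, sid in rank_scenes(w, b, {"cat": [1.0, 0.05], "dog": [0.05, 1.0]})])
```

A score threshold of 0 corresponds to a predicted probability of 0.5; ranking by score rather than thresholding alone is what lets the results be evaluated by average precision.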

  22. Future Work
     ● Faces
     ● Places
     ● Video to text annotations
     Thank you! Questions welcome
