Video Retrieval
Joemon Jose
Multimedia Information Retrieval Group
Department of Computing Science
2/19/2008

CBVR = CBIR + CBAR??
- Many believe content-based video retrieval (CBVR) is just an extension of content-based image retrieval (CBIR) plus content-based audio retrieval (CBAR)
- In reality, video has many additional factors, all stemming from its temporal data
- Temporal data introduces object motion
- If CBIR retrieves images containing object ‘A’, CBVR retrieves videos in which ‘A’ shows a certain behaviour
- Videos have a structured organisation that is more hierarchical in nature

Multimedia Genres
• Television programs
– News, sports, documentary, talk show, …
• Movies
– Drama, comedy, mystery, …
• Meeting records
– Conference, video teleconference, working group
• Others
– Surveillance cameras, personal camcorders, …

A Real life situation!
• Companies have miles of video tape, but if you don’t know what’s on it or how to find what you need, it is almost worthless
• Video retrieval tools aim to sort this problem out!
• Minutes after Princess Diana’s car accident, news organisations were scrambling to air timely retrospectives on the royal icon
• Material wasn’t the problem; the amount was!
• Production assistants spent hours combing through video tapes searching for the right segments
Corporate Video collections!
• Conference/board/staff meeting rooms are equipped with video cameras
• Formal meetings and presentations are videotaped, MPEG-encoded and made available via the company intranet
– On average 3 hours/week
• It is often difficult to find a video file, and the portion of it that is of interest
• Video is used more and more as a permanent record
– Hence it is important to find relevant passages

In general..
• Video collections are far larger than text databases
• Retrieval from terabyte and petabyte collections is a greater challenge
– Needs techniques from various disciplines
• Database, information retrieval, computer vision, human computer interaction, …
• Storing video in compressed formats calls for algorithms and techniques that work in the compressed domain

Video Structures
• Frame level: There is no (or little) temporal structure analysis at this level.
• Shot level: A shot is a set of contiguous frames all acquired through a continuous camera recording.
• Scene level: A scene is a set of contiguous shots having a common semantic significance.
• Video level: The complete video object is treated as a whole.
• Clips
– A set of frames with some meaning. Clip boundaries do not necessarily coincide with shot boundaries. A clip may correspond to several consecutive shots.

Video Structures
• Image
– Absolute positioning, relative positioning
• Object motion
– Translation, rotation
• Camera motion
– Pan, zoom, perspective change
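The frame, shot, scene and video levels, plus clips, map naturally onto a nested data structure. The Python sketch below is a minimal illustration under assumed conventions: the class names `Shot`, `Scene`, `Video` and `Clip` are hypothetical, and frame ranges are stored as half-open intervals. It shows how a clip, whose boundaries need not coincide with shot boundaries, can be resolved to the shots it overlaps.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Shot:
    # Contiguous frames from one continuous camera recording,
    # stored as a half-open frame range [start_frame, end_frame).
    start_frame: int
    end_frame: int

    def __len__(self) -> int:
        return self.end_frame - self.start_frame


@dataclass
class Scene:
    # Contiguous shots sharing a common semantic significance.
    label: str
    shots: List[Shot] = field(default_factory=list)

    def frame_span(self) -> Tuple[int, int]:
        return self.shots[0].start_frame, self.shots[-1].end_frame


@dataclass
class Video:
    # The complete video object, treated as a whole.
    title: str
    scenes: List[Scene] = field(default_factory=list)

    def all_shots(self) -> List[Shot]:
        return [shot for scene in self.scenes for shot in scene.shots]


@dataclass
class Clip:
    # A meaningful frame range whose boundaries need not match shot boundaries.
    start_frame: int
    end_frame: int

    def overlapping_shots(self, video: Video) -> List[Shot]:
        # A shot overlaps the clip when the two half-open ranges intersect.
        return [s for s in video.all_shots()
                if s.start_frame < self.end_frame and self.start_frame < s.end_frame]


video = Video("news", [
    Scene("intro", [Shot(0, 50), Shot(50, 120)]),
    Scene("report", [Shot(120, 300)]),
])
clip = Clip(100, 150)
print([(s.start_frame, s.end_frame) for s in clip.overlapping_shots(video)])
# prints [(50, 120), (120, 300)]
```

Here the clip spans the boundary between two shots in different scenes, which is exactly the case the slide warns about.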
Video Structures
• Cut: A sharp boundary between shots. This generally implies a peak in the difference between the colour or motion histograms of the two frames surrounding the cut.
• Dissolve: The content of the last images of the first shot is continuously mixed with that of the first images of the second shot.
• Fade-in and fade-out effects are special cases of dissolve transitions where the first or the second shot, respectively, is a dark frame.
• Wipe: The images of the second shot continuously cover or push out of the display (coming from a given direction) those of the first shot.

Semantic Level
• Episodes
– A set of shots that are characterised by a specific sequence of shot types.
• For example, a news episode can be composed of the anchorperson’s introduction shot, the news shot, the reporter’s shot, and so on.
• Scenes
– A collection of consecutive shots that share the three properties of similarity in space, time and action. Scenes are related to stories and can be dynamic or static, depending on whether characters move or not.
• For example, in movies, conversation scenes are static scenes.

[Figure: hierarchical structure of a video, with levels labelled Video, Episode, Scene, Clip and Shot]
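The transitions defined above can be viewed as ways of compositing the end of one shot with the start of the next. The following sketch assumes greyscale frames stored as nested lists of intensities, a linear mixing weight for the dissolve, and a left-to-right wipe; all function names are hypothetical. In this model a cut is simply the case where no transition frames are inserted.

```python
from typing import List

Frame = List[List[float]]  # greyscale frame: rows of pixel intensities


def blend(a: Frame, b: Frame, alpha: float) -> Frame:
    # Pixel-wise mix (1 - alpha) * a + alpha * b.
    return [[(1 - alpha) * pa + alpha * pb for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def dissolve(a: Frame, b: Frame, n: int) -> List[Frame]:
    # n intermediate frames; the weight of shot b grows linearly (an assumption).
    return [blend(a, b, (i + 1) / (n + 1)) for i in range(n)]


def fade_out(a: Frame, n: int) -> List[Frame]:
    # Special case of a dissolve: the second "shot" is a dark frame.
    black = [[0.0] * len(row) for row in a]
    return dissolve(a, black, n)


def fade_in(b: Frame, n: int) -> List[Frame]:
    # Special case of a dissolve: the first "shot" is a dark frame.
    black = [[0.0] * len(row) for row in b]
    return dissolve(black, b, n)


def wipe(a: Frame, b: Frame, n: int) -> List[Frame]:
    # Left-to-right wipe: columns of b progressively cover the columns of a.
    width = len(a[0])
    frames = []
    for i in range(n):
        boundary = round((i + 1) * width / (n + 1))
        frames.append([rb[:boundary] + ra[boundary:] for ra, rb in zip(a, b)])
    return frames
```

Because `fade_out` and `fade_in` are built from `dissolve`, the code mirrors the slide's point that fades are special cases of dissolves.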
Video Segmentation
• Breaking down video into its constituent elements, the shots, and their higher-level aggregates: scenes and episodes
• Traditional approach
– Preview the whole video
– Manually identify the segments and boundaries
– Annotate them with text
– Annotating a 1-hour video may take 10 hours

Video Segmentation...
• Alternative approach
– Edit decision lists created by video producers during post-production
• These lists record details of scene changes, shot changes, etc.
– Problems?
• Misalignment with the video stream
• A large part of existing video does not have such lists
• Applicable only to particular types of video!
• Automatic methods
– Use image analysis techniques to detect shot boundaries
– An active research area

Sharp Transition detection
• Cuts
– A cut is defined as a sharp transition between a shot and the one following it
– It is obtained by simply joining two different shots without inserting any photographic effect
– Cuts generally correspond to an abrupt change in the brightness pattern between two consecutive images

Cut detection
• Cuts can be detected by a sharp change of brightness
– Since two consecutive frames within a shot do not change significantly in their background and object content, their overall brightness distribution differs little
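As a minimal illustration of the brightness-based cut criterion, the sketch below reduces each frame to its mean intensity and flags a cut wherever that value jumps by more than a threshold. Mean brightness is a simplifying assumption standing in for the full brightness distribution, and `threshold` is a hypothetical tuning parameter that would have to be chosen per collection.

```python
from typing import List

Frame = List[List[int]]  # greyscale frame: rows of 8-bit intensities


def mean_brightness(frame: Frame) -> float:
    # Overall brightness summarised as the average pixel intensity.
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)


def detect_cuts_by_brightness(frames: List[Frame], threshold: float) -> List[int]:
    # Return indices t such that a cut is declared between frames t and t + 1.
    levels = [mean_brightness(f) for f in frames]
    return [t for t in range(len(levels) - 1)
            if abs(levels[t + 1] - levels[t]) > threshold]
```

Small frame-to-frame changes within a shot stay below the threshold, while the jump at a cut exceeds it.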
Cut detection - Problems
• In the presence of
– continuous object motion,
– camera movements,
– changes of illumination,
• the above approaches fail, because it is difficult to tell whether a brightness change is caused by a cut or by motion or lighting

Two techniques...
• Pairwise pixel comparison: sum the differences between corresponding pixels in two consecutive frames
$$D_{cut} = \sum_{x,y} \left| I_{xy}(f_t) - I_{xy}(f_{t+1}) \right|$$
– How do you extend it to colour frames?
• Histogram comparison method: sum the bin-wise differences between the n-bin histograms of two consecutive frames
$$D_{cut} = \sum_{j=1}^{n} \left| H(f_t, j) - H(f_{t+1}, j) \right|$$

Gradual Transition Detection
• Fades and dissolves
– Obtained in the laboratory through an optical process
– They spread the boundary between shots across a number of frames
• Fading
– Progressive darkening of a shot until the last frame becomes completely black (fade-out), or the opposite, a progressive brightening starting from a black frame (fade-in)
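The two $D_{cut}$ measures translate directly into code. The sketch assumes 8-bit greyscale frames stored as nested lists and a 16-bin histogram; summing absolute differences over the R, G and B channels is shown as one possible answer to the colour question, not the only one. The function names and the cut threshold are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

GrayFrame = List[List[int]]                     # intensities in [0, 255]
ColourFrame = List[List[Tuple[int, int, int]]]  # (R, G, B) per pixel


def pixel_difference(f_t: GrayFrame, f_next: GrayFrame) -> int:
    # D_cut = sum over (x, y) of |I_xy(f_t) - I_xy(f_{t+1})|
    return sum(abs(a - b) for ra, rb in zip(f_t, f_next) for a, b in zip(ra, rb))


def colour_pixel_difference(f_t: ColourFrame, f_next: ColourFrame) -> int:
    # One possible colour extension: also sum over the three channels.
    return sum(abs(ca - cb)
               for ra, rb in zip(f_t, f_next)
               for pa, pb in zip(ra, rb)
               for ca, cb in zip(pa, pb))


def histogram(frame: GrayFrame, n_bins: int = 16) -> List[int]:
    # Grey-level histogram H(f, j) with n_bins equal-width bins over [0, 255].
    h = [0] * n_bins
    for row in frame:
        for p in row:
            h[p * n_bins // 256] += 1
    return h


def histogram_difference(f_t: GrayFrame, f_next: GrayFrame, n_bins: int = 16) -> int:
    # D_cut = sum_{j=1}^{n} |H(f_t, j) - H(f_{t+1}, j)|
    return sum(abs(a - b)
               for a, b in zip(histogram(f_t, n_bins), histogram(f_next, n_bins)))


def detect_cuts(frames: Sequence[GrayFrame], threshold: int,
                metric: Callable[[GrayFrame, GrayFrame], int] = histogram_difference) -> List[int]:
    # Declare a cut between frames t and t + 1 whenever D_cut exceeds the threshold.
    return [t for t in range(len(frames) - 1)
            if metric(frames[t], frames[t + 1]) > threshold]


# Moving content within the frame: pixel difference is large, histogram difference is zero.
left, right = [[0, 255]], [[255, 0]]
print(pixel_difference(left, right), histogram_difference(left, right))  # prints 510 0
```

The final example hints at why histogram comparison is less sensitive to object motion than pairwise pixel comparison: rearranging pixels leaves the histogram unchanged.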
Dissolve
• A dissolve is a superimposition of a fade-out and a fade-in

Shot-to-Shot Structure Detection
• Create a color histogram for each image
• Segment at discontinuities (cuts)
– Cuts are easy; other transitions are also detectable
• Cluster representative histograms for each shot
– Identifies cuts back to a prior shot
• Build a time-labeled transition graph

Story Segmentation
• Video often lacks easily detected boundaries
– Between programs, news stories, etc.
• Accurate segmentation improves utility
– Segments that are too large hurt effectiveness; segments that are too small are unnatural
• Multiple segmentation cues are available
– Genre shift in shot-to-shot structure
– Vocabulary shift in closed captions
– Intrusive on-screen text
– Musical segues
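The shot-to-shot structure steps (cluster representative histograms per shot, then build a time-labeled transition graph) can be sketched as follows. Assumptions: each shot is already summarised by one normalised colour histogram; shots are grouped with simple greedy "leader" clustering under an L1 distance threshold, a stand-in chosen for brevity rather than necessarily the clustering intended here; and the graph is a dictionary from cluster pairs to the shot indices at which each transition occurs. A repeated cluster label is how a cut back to a prior shot shows up.

```python
from typing import Dict, List, Tuple

Histogram = List[float]


def l1(a: Histogram, b: Histogram) -> float:
    # L1 (city-block) distance between two histograms.
    return sum(abs(x - y) for x, y in zip(a, b))


def cluster_shots(shot_hists: List[Histogram], threshold: float) -> List[int]:
    # Greedy leader clustering: a shot joins the first cluster whose representative
    # lies within the threshold, otherwise it starts a new cluster.
    representatives: List[Histogram] = []
    labels: List[int] = []
    for h in shot_hists:
        for k, rep in enumerate(representatives):
            if l1(h, rep) <= threshold:
                labels.append(k)
                break
        else:
            representatives.append(h)
            labels.append(len(representatives) - 1)
    return labels


def transition_graph(labels: List[int]) -> Dict[Tuple[int, int], List[int]]:
    # Edge (from_cluster, to_cluster) labelled with the shot indices where it occurs.
    graph: Dict[Tuple[int, int], List[int]] = {}
    for t in range(1, len(labels)):
        edge = (labels[t - 1], labels[t])
        graph.setdefault(edge, []).append(t)
    return graph
```

For an interview that alternates between two camera set-ups, the shots collapse into two clusters and the graph shows the periodic back-and-forth cuts, one of the substructures used for shot classification.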
Shot Classification
• Shot-to-shot structure correlates with genre
– Reflects accepted editorial conventions
• Some substructures are informative
– Frequent cuts to and from announcers
– Periodic cuts between talk show participants
– Wide-narrow cuts in sports programming
• Simple image features can reinforce this
– Head-and-shoulders, object size, …

Key Frames?
• When searching collections of videos, users are often interested in an overview of each document
• Key frames can be used to distinguish videos from each other, to summarise videos and to provide access points into them
• To distinguish shots/scenes from each other
• To summarise shots
• To provide access points to them
• Well-chosen key frames
– Help video selection
– Are more visually appealing

Key frame selection
• The first frame of a shot is easy to select
– But it may not be the best choice
• Genre-specific cues may be helpful
– Minimum optical flow for director’s emphasis
– Face detection for interviews
– Presence of on-screen captions
• This may produce too many frames
– Color histogram clusters can reveal duplicates

A comparison of Key Frame Extraction methods
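The basic key-frame heuristic (take the first frame of each shot, then use colour histogram similarity to discard duplicates) can be sketched as below. The per-frame histograms, the L1 distance and the threshold are assumptions, and the genre-specific cues (optical flow, face detection, captions) are not modelled.

```python
from typing import List, Tuple

Histogram = List[float]


def l1(a: Histogram, b: Histogram) -> float:
    # L1 (city-block) distance between two histograms.
    return sum(abs(x - y) for x, y in zip(a, b))


def select_key_frames(shots: List[Tuple[int, int]], frame_hists: List[Histogram],
                      threshold: float) -> List[int]:
    # shots: half-open (start, end) frame ranges; frame_hists: one histogram per frame.
    # The first frame of each shot is the candidate key frame; a candidate is dropped
    # when its histogram lies within the threshold of an already kept key frame.
    kept: List[int] = []
    for start, _end in shots:
        h = frame_hists[start]
        if all(l1(h, frame_hists[k]) > threshold for k in kept):
            kept.append(start)
    return kept
```

Comparing each candidate against all kept frames, not just the previous one, is what lets the sketch catch a shot that returns to an earlier camera set-up.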
[Figure: key frames chosen by different extraction methods; labelled frames: 1, 39, 283, 302, 19, 161, 292, 509, 571]

Video retrieval
• At the frame level
– Lighting conditions/composition features
– Perceptual properties like colour and texture
– Object identification and location
• At the shot level
– Select a key frame from a shot
• Browse & navigate
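Shot-level retrieval via key frames can be illustrated with a query-by-example ranking: each shot is represented by the colour histogram of its key frame, and shots are sorted by L1 distance to the query histogram. This is a minimal sketch; the shot identifiers, the histogram representation (assumed normalised so that frame size does not matter) and the choice of L1 distance are all assumptions.

```python
from typing import Dict, List

Histogram = List[float]


def l1(a: Histogram, b: Histogram) -> float:
    # L1 (city-block) distance between two histograms.
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_shots(query_hist: Histogram, key_frame_hists: Dict[str, Histogram],
               top_k: int) -> List[str]:
    # Smaller distance means the shot's key frame is more similar to the query.
    ranked = sorted(key_frame_hists,
                    key=lambda shot_id: l1(query_hist, key_frame_hists[shot_id]))
    return ranked[:top_k]


index = {
    "shot_a": [1.0, 0.0, 0.0],
    "shot_b": [0.0, 1.0, 0.0],
    "shot_c": [0.8, 0.2, 0.0],
}
print(rank_shots([1.0, 0.0, 0.0], index, top_k=2))
```

Representing a whole shot by one key frame keeps the index small, at the cost of ignoring motion within the shot.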