7. Video databases
MMDB-7 J. Teuhola 2012

Video data representations
• Video = time-ordered sequence of correlated images (frames)
• Video signal representations originate from TV technology; different standards in USA (NTSC) and Europe (PAL, SECAM)
• 25-30 frames/sec
• Interlaced presentation of even/odd rows to avoid flickering.
• Frame size levels: 352 x 240, 768 x 576 (PAL), 720 x 576 (CCIR 601), 720 x 480 (NTSC), 1440 x 1152, 1920 x 1080 (HDTV)
• Aspect ratios: 4:3, 16:9 (widescreen)
• Color videos: Decomposition into luminance and chrominance.
• Typical sampling rates for SD video: 720 samples per line for luminance, 360 samples per line for each chrominance signal.
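The sampling rates above imply a large uncompressed data rate. The following is a rough back-of-the-envelope sketch; the function name `raw_bitrate`, the 8 bits per sample, and the choice of 576 lines at 25 frames/sec are assumptions made for illustration, not figures from the slides.

```python
def raw_bitrate(luma_per_line, chroma_per_line, lines, fps, bits_per_sample=8):
    """Uncompressed bit rate (bits/sec) for one luminance and two chrominance signals."""
    samples_per_line = luma_per_line + 2 * chroma_per_line
    return samples_per_line * lines * fps * bits_per_sample

# 720 luminance + 2 x 360 chrominance samples per line, 576 lines, 25 frames/sec.
# 8 bits per sample is an assumption.
rate = raw_bitrate(720, 360, lines=576, fps=25)
print(f"{rate / 1e6:.1f} Mbit/s")  # 165.9 Mbit/s
```

Under these assumptions the raw stream is roughly 166 Mbit/s, which is why compression is essential for storage and transmission.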

Video compression
• Not just coding of a sequence of independent images (as in Motion-JPEG), because subsequent images are correlated (temporal redundancy).
• Motion compensation: blocks (e.g. 8 x 8 pixels) in a frame are predicted by blocks in a previously reconstructed frame.
• Compression artifacts disturbing the human eye may be different from those in still images.
• Different techniques for different application areas (TV, DVD/BD, internet, videoconferencing)
• Important issues:
  - Speed of compression/decompression
  - Robustness (error sensitivity)
• Most of the standards are based on DCT (Discrete Cosine Transform)
• Typical compression ratios from 50:1 to 100:1; the decompressed video is almost indistinguishable from the original.
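Motion compensation can be illustrated with exhaustive block matching. This is a minimal sketch, not the method of any particular standard: the sum of absolute differences (SAD) cost, the search range of ±4 pixels, and the names `sad` and `best_motion_vector` are all assumptions chosen for the example. Frames are plain 2-D lists of gray values.

```python
def sad(cur, ref, cy, cx, ry, rx, n):
    """Sum of absolute differences between an n x n block in cur and one in ref."""
    return sum(abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
               for i in range(n) for j in range(n))

def best_motion_vector(cur, ref, by, bx, n=8, search=4):
    """Find the displacement (dy, dx) of the reference block that best predicts
    the block of cur whose top-left corner is (by, bx)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = by + dy, bx + dx
            if 0 <= ry <= h - n and 0 <= rx <= w - n:  # block must lie inside ref
                cost = sad(cur, ref, by, bx, ry, rx, n)
                if cost < best_cost:
                    best_mv, best_cost = (dy, dx), cost
    return best_mv, best_cost

# Synthetic 16 x 16 reference frame with a non-repeating pattern.
SIZE = 16
ref = [[y * y + 3 * x * x + x * y for x in range(SIZE)] for y in range(SIZE)]
# Current frame: the reference content moved 1 row down and 2 columns right.
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(SIZE)]
       for y in range(SIZE)]

mv, cost = best_motion_vector(cur, ref, by=4, bx=4)
print(mv, cost)  # (-1, -2) 0
```

An encoder would then store the motion vector and the (here zero) prediction residual instead of the block's pixels.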

Standardization of video compression
ISO/IEC MPEG (Moving Picture Experts Group)
• Standard includes both video and audio compression.
• Started 1988; steps:
  - MPEG-1: Rates up to 1.5 Mbits/sec (VHS quality)
  - MPEG-2: Rates up to 10 Mbits/sec (Digi-TV, DVD, HDTV)
  - MPEG-3: Planned but dropped (found to be unnecessary)
  - MPEG-4: Object-based (separation from scene, animation, 3D, face modelling, interactivity, etc.)
ITU-T (International Telecommunication Union):
• H.261: Low bit-rates (e.g. videoconferencing)
• H.262 = MPEG-2
• H.263: Low bit-rates (improved)
• H.264 = MPEG-4 Part 10, high compression power

Random access from compressed video
• Broadcasting or accessing video from storage: It should be possible to start from (almost) any frame.
• MPEG solution: Three kinds of frames:
  - I-frame: Coded without temporal correlation (prediction); gives the lowest compression gain.
  - P-frame: Motion-compensated prediction from the last (closest) I- or P-frame.
  - B-frame: Bidirectional prediction from the previous and/or the next I- or P-frame; gives the highest compression gain, gets over sudden changes, and errors do not propagate.
• GOP = Group Of Pictures = smallest random-access unit, must be decodable independently (usually starts with an I-frame).

Example of frame order in MPEG
[Figure: frames in display order, I B B B P B B B P B B B I, annotated with forward prediction and bidirectional prediction.]
• Two orders of frames:
  - Display order
  - Bitstream order
• Buffering is needed to convert from bitstream order into display order; a small delay is involved.
• The predictor and predicted frame need not be adjacent.
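The relation between the two frame orders can be sketched as follows. This is a simplified illustration, assuming that every B-frame is predicted from the nearest I- or P-frame on each side and must therefore follow both of them in the bitstream; the function names are hypothetical.

```python
def display_to_bitstream(display):
    """Reorder (index, type) pairs so that each B-frame comes after both of its
    reference frames (the surrounding I- or P-frames)."""
    out, pending_b = [], []
    for idx, ftype in enumerate(display):
        if ftype == "B":
            pending_b.append((idx, ftype))   # wait for the next reference frame
        else:
            out.append((idx, ftype))         # the reference frame goes first
            out.extend(pending_b)            # then the B-frames that needed it
            pending_b = []
    out.extend(pending_b)                    # trailing B-frames, if any
    return out

def bitstream_to_display(bitstream):
    """A decoder buffers frames and emits them by display index."""
    return sorted(bitstream)

display = "IBBBPBBBPBBBI"
stream = display_to_bitstream(display)
print("".join(t for _, t in stream))                         # IPBBBPBBBIBBB
print("".join(t for _, t in bitstream_to_display(stream)))   # IBBBPBBBPBBBI
```

The buffering delay mentioned above corresponds to holding a reference frame back until the B-frames that precede it in display order have been shown.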

Organizing and querying content of a video database
Questions to be answered:
• Which aspects of videos are likely to be of interest?
• How should these aspects be represented and stored?
• What kinds of query languages are suitable?
• Is the content extraction process manual or automatic?
Possible aspects of interest:
• Animate objects (people, etc.)
• Inanimate objects (houses, cars, etc.)
• Activities and events (walking, driving, etc.)
Properties of objects:
• Frame-dependent: valid in a subset of frames.
• Frame-independent: valid for the video as a whole.

Query types from a video database
(a) Retrieve a complete video by name.
(b) Find frame sequences ('clips', 'shots') containing certain objects or activities.
(c) Find all videos/sequences containing objects/activities with certain properties.
(d) Given a frame sequence, find all objects (of a certain type) occurring in some or all of the frames of the segment.
(e) Given a frame sequence, find all activities (of a certain type) occurring in it.
NOTE: Video is a multimedia tool: images + audio + possible text. The audio channel can be extremely important in detecting events. Textual components (e.g. subtitles) are invaluable keyword sources.

Indexing of video content
• Content descriptions are not usually built on a frame-by-frame basis, due to the high number of frames.
• Compact representations are needed.
• Concepts:
  - Frame sequence: A contiguous subset of frames (e.g. a 'shot')
  - Well-ordered set of frame sequences: Temporal order, no overlaps
  - Solid set of frame sequences: Well-ordered, non-empty gaps between sequences ('scene')
  - Frame sequence association map: For each object and activity, a solid set of frame sequences is attached, showing the frames in which they appear.
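The concepts above can be expressed compactly in code. The sketch below assumes that a frame sequence is stored as a half-open pair `(start, end)` of frame numbers; the function names and the sample association map are illustrative only.

```python
def is_well_ordered(seqs):
    """Temporal order, no overlaps: each sequence ends before or where the next starts."""
    return (all(s < e for s, e in seqs) and
            all(seqs[i][1] <= seqs[i + 1][0] for i in range(len(seqs) - 1)))

def is_solid(seqs):
    """Well-ordered, and consecutive sequences are separated by a non-empty gap."""
    return (all(s < e for s, e in seqs) and
            all(seqs[i][1] < seqs[i + 1][0] for i in range(len(seqs) - 1)))

def make_solid(seqs):
    """Sort the sequences and merge overlapping or touching ones."""
    merged = []
    for s, e in sorted(seqs):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged

# Hypothetical association map: object/activity name -> solid set of sequences.
association_map = {
    "obj. 1": make_solid([(3000, 4000), (500, 2000), (4500, 5000)]),
    "obj. 2": make_solid([(0, 2000), (2000, 2500), (3500, 4500)]),
}
print(association_map["obj. 2"])  # [(0, 2500), (3500, 4500)]
```

Storing only the merged sequences keeps the description size proportional to the number of appearances rather than the number of frames.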

Frame segment tree
• Binary tree
• Special (1-dimensional) case of the spatial clipping approach.
• Leaves represent basic intervals of the frame sequence:
  - Leaves are well ordered, and they cover the whole video.
  - Their endpoints include all endpoints of the sequences.
• An internal node represents the concatenation of its children.
• The root represents the whole video.
• Example of objects and activities:
[Figure: timeline of frame numbers (ticks at 1000, 2000, 3000, 4000, 5000) showing the frame sequences of obj. 1, obj. 2 and act. 1.]

Frame segment tree: example
Node number: frame interval (attached objects/activities; o = object, a = activity)
1: 0-5000
  2: 0-3000
    4: 0-2000 (o2)
      8: 0-500
      9: 500-2000 (o1, a1)
    5: 2000-3000
      10: 2000-2500 (o2, a1)
      11: 2500-3000
  3: 3000-5000
    6: 3000-4000 (o1)
      12: 3000-3500 (a1)
      13: 3500-4000 (o2)
    7: 4000-5000 (a1)
      14: 4000-4500 (o2)
      15: 4500-5000 (o1)
Indexing:
• Obj. 1 → 6, 9, 15
• Obj. 2 → 4, 10, 13, 14
• Act. 1 → 7, 9, 10, 12
Note: Actually the intervals are half-open, e.g. [0, 500) = 0..499
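The indexing lists of the example can be reproduced with a standard segment-tree insertion. This is a sketch under stated assumptions: the leaf boundaries and the frame sequences of each object/activity are inferred from the node lists of the example, the tree is stored in heap numbering (root = 1, children of node i are 2i and 2i+1), and the function names are hypothetical.

```python
# Boundaries of the eight leaf intervals (half-open), as in the example.
BOUNDS = [0, 500, 2000, 2500, 3000, 3500, 4000, 4500, 5000]

def build_intervals(bounds):
    """Return {node: (start, end)} for a complete binary tree in heap numbering.
    This sketch only handles a power-of-two number of leaves."""
    n_leaves = len(bounds) - 1
    assert n_leaves & (n_leaves - 1) == 0, "sketch handles 2^k leaves only"
    iv = {n_leaves + k: (bounds[k], bounds[k + 1]) for k in range(n_leaves)}
    for node in range(n_leaves - 1, 0, -1):   # parent = concatenation of children
        iv[node] = (iv[2 * node][0], iv[2 * node + 1][1])
    return iv

def covering_nodes(iv, s, e, node=1):
    """Highest nodes whose intervals lie completely inside [s, e)."""
    lo, hi = iv[node]
    if e <= lo or hi <= s:          # disjoint: nothing to attach here
        return []
    if s <= lo and hi <= e:         # fully covered: attach to this node
        return [node]
    if 2 * node not in iv:          # partially covered leaf (endpoint not in BOUNDS)
        return []
    return covering_nodes(iv, s, e, 2 * node) + covering_nodes(iv, s, e, 2 * node + 1)

# Frame sequences inferred from the example's indexing lists (an assumption).
OCCURRENCES = {
    "obj. 1": [(500, 2000), (3000, 4000), (4500, 5000)],
    "obj. 2": [(0, 2500), (3500, 4500)],
    "act. 1": [(500, 2500), (3000, 3500), (4000, 5000)],
}

iv = build_intervals(BOUNDS)
index = {name: sorted(n for s, e in seqs for n in covering_nodes(iv, s, e))
         for name, seqs in OCCURRENCES.items()}
print(index)
# {'obj. 1': [6, 9, 15], 'obj. 2': [4, 10, 13, 14], 'act. 1': [7, 9, 10, 12]}
```

Attaching each sequence to the highest fully covered nodes is what keeps the pointer lists short: a sequence spanning many leaves is stored once at an internal node instead of once per leaf.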

Indexing in the frame segment tree
• For each object and activity record, there is a list of pointers to the nodes of the frame segment tree.
• Objects and activities themselves may be indexed in traditional ways.
• Each node of the frame segment tree points to a linked list of pointers to the objects and activities that appear throughout the whole segment that this node represents (but only partially in the parent segment). In the previous example:
  node 4 → obj. 2
  node 6 → obj. 1
  node 7 → act. 1
  node 9 → obj. 1, act. 1
  node 10 → obj. 2, act. 1
  node 12 → act. 1
  node 13 → obj. 2
  node 14 → obj. 2
  node 15 → obj. 1
• This can be generalized to a set of videos (common frame segment tree, combined object/activity set, extended pointers).

Queries using a frame segment tree
(a) Find segments where a given object/activity occurs (trivial; just follow the pointers).
(b) Find objects occurring between frames s and e: Walk the tree in preorder, denote the current node interval by I.
  • If I ∩ [s, e) = ∅, then this subtree can be skipped.
  • If I ⊆ [s, e), then walk through the whole subtree (including the current node) and report all its objects.
  • Otherwise report the objects and activities of the current node, and continue the search to both subtrees.
(c) Find objects/activities occurring together with object x: Scan the segments where x occurs, and report the objects/activities occurring in these segments and their ancestors.
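Query (b) can be sketched directly from the three rules above. The node intervals and node labels below are hardcoded from the example tree (heap numbering, half-open intervals); the names `INTERVALS`, `LABELS`, `report_subtree` and `query` are assumptions for illustration.

```python
INTERVALS = {
    1: (0, 5000), 2: (0, 3000), 3: (3000, 5000),
    4: (0, 2000), 5: (2000, 3000), 6: (3000, 4000), 7: (4000, 5000),
    8: (0, 500), 9: (500, 2000), 10: (2000, 2500), 11: (2500, 3000),
    12: (3000, 3500), 13: (3500, 4000), 14: (4000, 4500), 15: (4500, 5000),
}
LABELS = {
    4: {"obj. 2"}, 6: {"obj. 1"}, 7: {"act. 1"},
    9: {"obj. 1", "act. 1"}, 10: {"obj. 2", "act. 1"}, 12: {"act. 1"},
    13: {"obj. 2"}, 14: {"obj. 2"}, 15: {"obj. 1"},
}

def report_subtree(node, out):
    """Collect the labels of a node and of all its descendants."""
    out |= LABELS.get(node, set())
    if 2 * node in INTERVALS:
        report_subtree(2 * node, out)
        report_subtree(2 * node + 1, out)

def query(s, e, node=1, out=None):
    """Objects/activities occurring somewhere in the frame range [s, e)."""
    if out is None:
        out = set()
    lo, hi = INTERVALS[node]
    if e <= lo or hi <= s:            # I ∩ [s, e) = ∅: skip the subtree
        return out
    if s <= lo and hi <= e:           # I ⊆ [s, e): report the whole subtree
        report_subtree(node, out)
        return out
    out |= LABELS.get(node, set())    # partial overlap: report this node ...
    if 2 * node in INTERVALS:         # ... and continue into both children
        query(s, e, 2 * node, out)
        query(s, e, 2 * node + 1, out)
    return out

print(sorted(query(3000, 3500)))  # ['act. 1', 'obj. 1']
print(sorted(query(2500, 3000)))  # []
```

In the partial-overlap case the node's own labels can be reported safely, because anything attached to a node is present throughout its whole interval and hence in the overlapping part too.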

R-segment tree (RS-tree)
• Special case of R-tree
• Two possible implementations:
  (a) 1-dimensional space (dimension = time)
  (b) 2-dimensional space, where the other dimension is just an enumeration of objects/activities (not a true spatial dimension):
[Figure: frame sequences of obj. 1, obj. 2 and act. 1 over the frame axis (ticks at 1000, 2000, 3000, 4000, 5000), with bounding rectangles R1, R2 and R3.]

Computer-assisted video analysis
Video segmentation:
• Division of videos into homogeneous sequences.
• Typical segments are so-called shots, filmed without interruption.
• Segmentation = detection of shot boundaries
• Sharp cuts are easier to detect than gradual transitions (e.g. crossfade)
• Features for automatic segmentation:
  - Similarity of color histograms of subsequent frames: simple and effective, but sensitive to varying illumination.
  - Edge features: similarity of shapes
  - Motion vectors: restricted vector lengths within a shot.
  - Corner points: similarity of landmark points in frames
• The actual segmentation can be based on thresholds for similarity, but machine learning techniques have also been used widely.
• Higher-level segmentation into scenes, also called story units.
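Threshold-based detection of sharp cuts from histogram similarity can be sketched as follows. For brevity the example uses gray-level histograms instead of color histograms; the number of bins, the distance measure (half the L1 distance between normalized histograms) and the threshold 0.5 are assumptions, and the function names are hypothetical.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalized gray-level histogram of a frame given as a 2-D list of values."""
    counts = [0] * bins
    for row in frame:
        for v in row:
            counts[v * bins // max_value] += 1
    total = sum(counts)
    return [c / total for c in counts]

def hist_distance(h1, h2):
    """Half the L1 distance: 0 for identical histograms, 1 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))

def detect_cuts(frames, threshold=0.5):
    """Indices i where a shot boundary is assumed between frames i-1 and i."""
    hists = [histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if hist_distance(hists[i - 1], hists[i]) > threshold]

# Synthetic video: three dark frames followed by two bright frames.
dark = [[20] * 4 for _ in range(4)]
bright = [[200] * 4 for _ in range(4)]
frames = [dark, dark, dark, bright, bright]
print(detect_cuts(frames))  # [3]
```

A gradual transition such as a crossfade spreads the change over many frames, so each consecutive distance may stay below the threshold; this is one reason why such transitions are harder to detect than sharp cuts.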

Computer-assisted video analysis (cont.)
Keyframes:
• Representative frames within shots, containing the essential elements for retrieval
• Scene-level segmentation often uses keyframe features, and operates e.g. in a top-down or bottom-up manner.
Choosing keyframes:
• Fuzzy task: no definite optimum
• Can be based on the same features as segmentation
• Various algorithmic approaches:
  - Sequential comparison
  - Clustering
  - Trajectory-based
  - Decision in the context of object/event detection