Center for Digital Video Processing
CDVP & TRECVID-2003 News Story Segmentation Task
Csaba Czirjek, Gareth J.F. Jones, Seán Marlow, Noel Murphy, Noel E. O’Connor, Neil O’Hare, Alan F. Smeaton
TREC-2003 (Neil O’Hare)
Contents
• Introduction
  – Structure of News Broadcast
  – System Overview
• Story Segmentation System
  – Feature Extraction Process
  – Combination of Features using Support Vector Machine
  – Submitted Runs
• Results
• Conclusions
Structure of a News Broadcast
• We assume stories are delimited by shots of the anchorperson
• Features of anchor shots:
  – All anchor shots within a broadcast are taken from the same camera setup
  – They are filmed with a static camera, with little object motion
  – Anchor shots in a single broadcast are visually similar to each other
[Figure: Structure of a News Broadcast. Labels: Commercial Break, Anchorperson Shots, News Report Shots]
System Overview
• We use the TRECVID 2003 common shot boundaries provided by CLIPS-IMAG
• Extracted features are combined to detect anchor shots
• Story boundaries are logged at the start of anchor shots
• The aim is to extract features that are robust to changes across broadcasters (e.g. faces, motion, shot length)
• This would give a generic news segmentation system
[Figure: System Overview diagram. A 30-minute news program undergoes shot boundary detection (donated by CLIPS-IMAG), followed by shot-level feature extraction: shot clustering, face detection, motion activity analysis (x 2), shot length, and text segmentation (donated by StreamSage). A support vector machine combines the features for news story detection, which outputs the news stories.]
Feature Extraction 1 - Shot Clustering
• Shots are clustered based on visual similarity (colour histogram)
• Anchor shots are grouped together
• Anchor clusters are identified using heuristics:
  – They tend to be dispersed throughout the broadcast
  – Their average shot length is longer than that of other clusters
  – Anchor shots are very similar to each other: they form ‘tighter’ clusters
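The three heuristics above can be turned into per-cluster statistics. The following is a minimal sketch, not the system's actual code: the slides do not name the clustering algorithm, the distance function, or any thresholds, so the L1 histogram distance, the dictionary keys, and the function names here are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean


def histogram_distance(h1, h2):
    """L1 distance between two normalised colour histograms (0 = identical).

    The choice of L1 distance is an assumption; the slides only say that
    similarity is based on colour histograms.
    """
    return sum(abs(a - b) for a, b in zip(h1, h2))


def cluster_statistics(shots, total_shots):
    """Summarise one cluster of shots with the three cues listed on the slide.

    Each shot is a dict with hypothetical keys:
      'index'     - position of the shot in the broadcast
      'length'    - shot length in frames
      'histogram' - normalised colour histogram (list of floats)
    """
    indices = [s["index"] for s in shots]
    # Dispersion: fraction of the broadcast spanned by the cluster.
    dispersion = (max(indices) - min(indices)) / max(total_shots - 1, 1)
    # Average shot length in frames.
    mean_length = mean(s["length"] for s in shots)
    # Tightness: mean pairwise histogram distance (smaller = tighter).
    pairs = list(combinations(shots, 2))
    spread = (
        mean(histogram_distance(a["histogram"], b["histogram"]) for a, b in pairs)
        if pairs
        else 0.0
    )
    return {"dispersion": dispersion, "mean_length": mean_length, "spread": spread}


# Toy data: an anchor-like cluster and a report-like cluster in a 10-shot broadcast.
anchor_like = [
    {"index": 0, "length": 400, "histogram": [0.5, 0.3, 0.2]},
    {"index": 9, "length": 500, "histogram": [0.5, 0.3, 0.2]},
]
report_like = [
    {"index": 3, "length": 100, "histogram": [0.9, 0.1, 0.0]},
    {"index": 4, "length": 120, "histogram": [0.1, 0.2, 0.7]},
]
stats_anchor = cluster_statistics(anchor_like, total_shots=10)
stats_report = cluster_statistics(report_like, total_shots=10)
```

In this toy data the anchor-like cluster is more dispersed, has longer shots, and has a smaller spread than the report-like cluster, which is the pattern the heuristics look for.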
Feature Extraction 2 - Face Detection
• Coarse-to-fine approach to extract candidate regions:
  – Skin-like pixels are identified based on colour
  – Morphological filtering is used to obtain smoothed areas of connected pixels
  – Shape and size heuristics are used to filter the candidate face regions
• Candidates are passed to a Principal Component Analysis (PCA) module for final classification
• Every 12th frame (I-frames) is used for processing
[Figure: Face Detection pipeline, applied to every 12th frame of the original video file: skin filtering + morphological adjustment, then size/shape heuristics, then PCA using a face database. Panels: filtered image after morphological adjustment; image after applying size/shape heuristics; detected faces with confidence scores (0.7, 0.8, 0.5, 0.2 in the example).]
Feature Extraction 3 - Activity Measure
• Motion activity analysis is based on MPEG-1 motion vectors
• Every P-frame is analysed
• We count the number of zero-length motion vectors in a P-frame (excluding I-blocks)
• Activity measure:
$$\text{activity measure} = \frac{\text{No. of zero-length vectors}}{\text{Total No. of macroblocks}}$$
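The per-frame ratio can be sketched in a few lines. This is an illustration under stated assumptions rather than the system's implementation: the input representation (a list of `(dx, dy)` tuples, with `None` for an intra-coded macroblock) and the function name `activity_measure` are hypothetical, and the sketch assumes that I-blocks are left out of the zero-vector count but still contribute to the total number of macroblocks.

```python
def activity_measure(macroblocks):
    """Fraction of macroblocks in a P-frame with a zero-length motion vector.

    `macroblocks` holds one entry per macroblock: a (dx, dy) tuple for an
    inter-coded block, or None for an intra-coded block (I-block).
    Assumption: I-blocks never count as zero-length vectors, but they do
    count towards the total number of macroblocks.
    """
    if not macroblocks:
        raise ValueError("frame has no macroblocks")
    zero_vectors = sum(1 for mv in macroblocks if mv is not None and mv == (0, 0))
    return zero_vectors / len(macroblocks)


# Toy P-frame: 8 macroblocks, 3 zero-length vectors, 2 I-blocks.
frame = [(0, 0), (0, 0), (3, -1), None, (0, 0), (1, 1), None, (0, 2)]
ratio = activity_measure(frame)
```

Under this definition a larger value means a more static frame, since more of its macroblocks did not move.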
Feature Extraction 3 - Activity Measure
• Two separate shot-level measures are used:
  – The least active P-frame is used to represent the shot
  – All motion vectors across a shot are added to form a cumulative motion vector. The activity measure is then calculated using the cumulative motion vector
• Cumulative frame:
$$
\underbrace{\begin{pmatrix} (0,-1) & (0,1) & (-3,5) \\ (0,0) & (0,0) & (4,3) \\ (-2,1) & (1,-1) & (1,0) \end{pmatrix}}_{\text{frame a}}
+
\underbrace{\begin{pmatrix} (0,1) & (1,0) & (-2,4) \\ (3,0) & (0,0) & (0,0) \\ (-2,1) & (0,1) & (0,1) \end{pmatrix}}_{\text{frame b}}
=
\underbrace{\begin{pmatrix} (0,0) & (1,1) & (-5,9) \\ (3,0) & (0,0) & (4,3) \\ (-4,2) & (1,0) & (1,1) \end{pmatrix}}_{\text{frame a + frame b}}
$$
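The two shot-level measures can be worked through on the slide's own 3x3 example. The sketch below is illustrative only: the function names are hypothetical, the frames are flattened row by row, and "least active" is read here as the P-frame with the highest proportion of zero-length vectors, which is an assumption about the slide's intent.

```python
def zero_vector_ratio(vectors):
    """Fraction of motion vectors that are zero-length."""
    return sum(1 for v in vectors if v == (0, 0)) / len(vectors)


def least_active_frame_measure(frames):
    """Measure 1: represent the shot by its least active P-frame.

    Assumption: 'least active' means the highest zero-vector ratio.
    """
    return max(zero_vector_ratio(f) for f in frames)


def cumulative_measure(frames):
    """Measure 2: add the vectors at each macroblock position across the shot,
    then compute the zero-vector ratio of the cumulative frame."""
    cumulative = [
        (sum(v[0] for v in position), sum(v[1] for v in position))
        for position in zip(*frames)
    ]
    return cumulative, zero_vector_ratio(cumulative)


# The two example frames, as read from the slide, flattened row by row.
frame_a = [(0, -1), (0, 1), (-3, 5), (0, 0), (0, 0), (4, 3), (-2, 1), (1, -1), (1, 0)]
frame_b = [(0, 1), (1, 0), (-2, 4), (3, 0), (0, 0), (0, 0), (-2, 1), (0, 1), (0, 1)]

cumulative, shot_ratio = cumulative_measure([frame_a, frame_b])
single_frame_ratio = least_active_frame_measure([frame_a, frame_b])
```

Note that opposing vectors cancel in the cumulative frame: the top-left macroblock moves by (0,-1) and then (0,1), so it counts as zero-length for the shot even though it is non-zero in both individual frames.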
Feature Extraction 4 - Shot Length
• Shot length is used as a feature
• Measured in frames
Feature Extraction 5 - Text Analysis
• To allow us to complete the required runs, we used text analysis provided by StreamSage
• The StreamSage text output is used as a binary feature
Combination of Features - SVM
• Extracted features are combined using a Support Vector Machine
• Trained on 10 hours of the TRECVID 2003 development set (5 CNN, 5 ABC)
• The resulting SVM classifier detects anchor shots
• Story boundaries are logged at the beginning of anchor shots
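The final step, logging a story boundary at the start of each detected anchor shot, can be sketched as follows. This is a hedged illustration: the `Shot` type, the `story_boundaries` function, and the `toy_classifier` stand-in are all hypothetical. In particular, `toy_classifier` is a hand-written rule used only to make the example runnable; it is not the trained SVM, and its two features and thresholds are invented.

```python
from typing import Callable, List, NamedTuple


class Shot(NamedTuple):
    start_frame: int  # first frame of the shot
    features: tuple   # per-shot feature vector fed to the classifier


def story_boundaries(shots: List[Shot], is_anchor: Callable[[tuple], bool]) -> List[int]:
    """Return the start frame of every shot that the classifier labels as an anchor shot."""
    return [shot.start_frame for shot in shots if is_anchor(shot.features)]


def toy_classifier(features: tuple) -> bool:
    """Stand-in for the trained SVM (illustration only, thresholds invented)."""
    face_score, shot_length = features
    return face_score > 0.5 and shot_length > 200


# Four shots described by (face confidence, shot length in frames).
shots = [
    Shot(0, (0.8, 450)),
    Shot(450, (0.1, 90)),
    Shot(540, (0.2, 130)),
    Shot(670, (0.7, 380)),
]
boundaries = story_boundaries(shots, toy_classifier)
```

Passing the classifier in as a callable keeps the boundary-logging step independent of how anchor shots are detected, so a trained model's decision function could be substituted for the toy rule.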
Submitted Runs
• 3 Required Runs
  – A/V only system: generic system for ABC and CNN (DCU03_REQ_AV)
  – A/V + text: generic system for ABC and CNN (DCU03_REQ_AV_TEXT)
  – Text only: text analysis provided by StreamSage (DCU03_REQ_TEXT_ONLY)
• 2 Additional Optional Runs
  – Specialised systems for ABC and CNN, with separate SVMs for each broadcaster (DCU03_OPT_AV)
  – Clustering algorithm in isolation (DCU03_OPT_CLUSTER)
DCU Results
System ID              Recall   Precision
DCU03_REQ_AV           0.328    0.409
DCU03_REQ_AV_TEXT      0.294    0.453
DCU03_REQ_TEXT_ONLY    0.049    0.208
DCU03_OPT_AV           0.313    0.453
DCU03_OPT_CLUSTER      0.364    0.304
Overall Results - All Groups
[Figure: precision (y-axis, 0 to 1) plotted against recall (x-axis, 0 to 1) for all groups: DCU, Fudan, IBM, kddi, NUS, StreamSage, UCF, Iowa]