
TRECVID Story Segmentation based on Content-Independent Audio-Video Features



  1. 2004 TRECVID Workshop
     TRECVID Story Segmentation based on Content-Independent Audio-Video Features
     Keiichiro Hoashi, Masaru Sugano, Masaki Naito, Kazunori Matsumoto, Fumiaki Sugaya, Yasuyuki Nakajima
     KDDI R&D Laboratories, Inc.
     TRECVID 2004 Presentation Slides (Nov 15, 2004)

  2. Outline
     - Introduction
     - System description
       - Baseline story segmentation method
         - SVM-based segmentation w/ low-level features
       - System components:
         - Section-specific segmentation
         - Anchor shot segmentation
         - Post-filtering
     - Experiment results
     - Conclusion

  3. Introduction
     - Motivation
       - Development of a generic story segmentation algorithm applicable to non-news video contents
     - Requirements
       - Utilize only low-level audio-video features which can be extracted from any video data
       - Restricted use of news-specific features (e.g., anchor shots)
       - Restricted use of text information (e.g., ASR results)
     Main focus: Story segmentation based on the “Audio+Video” experiment condition

  4. Introduction (cont’d)
     - However, content-specific features are necessary to achieve accurate segmentation
     - Content-specific components developed to complement weak points of baseline method
     - Highly accurate story segmentation achieved!

  5. Overview: Experiment results
     [Bar chart: Recall, Precision and F-Measure on a 0.0 to 1.0 axis. Runs shown: kddi_ss_all1_pfil, kddi_ss_all1nsp07_pfil, kddi_ss_all1, kddi_ss_c+k1, kddi_ss_all2nsp07_pfil, kddi_ss_all2_pfil, kddi_ss_base, A-1, A-2, B-1, B-2, B-3, C-1, C-2, C-3, D-1, D-2, E-1]
     Figure 1. Recall, precision and F-measure of all “Audio+Video” TRECVID submissions
     Outperformed all non-KDDI runs!
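The three measures plotted in Figure 1 can be computed from lists of boundary times. The following is a minimal sketch, not the official evaluation tool: the function name, the matching rule, and the 5-second `tolerance` default are assumptions made for illustration, since the slides do not state how detected boundaries are matched to reference boundaries.

```python
def evaluate_boundaries(detected, reference, tolerance=5.0):
    """Return (recall, precision, f_measure) for boundary times in seconds.

    Assumed matching rule: a reference boundary is "found" if any detected
    boundary lies within `tolerance` seconds of it, and a detected boundary
    is "correct" if any reference boundary lies within `tolerance` seconds.
    """
    found = sum(any(abs(r - d) <= tolerance for d in detected) for r in reference)
    correct = sum(any(abs(d - r) <= tolerance for r in reference) for d in detected)
    recall = found / len(reference) if reference else 0.0
    precision = correct / len(detected) if detected else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    # F-measure as the harmonic mean of recall and precision.
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# Hypothetical data: 3 detected boundaries against 4 reference boundaries.
recall, precision, f_measure = evaluate_boundaries(
    detected=[10.0, 62.0, 200.0],
    reference=[12.0, 60.0, 120.0, 300.0],
)
```

With this toy data two of the four reference boundaries are found and two of the three detections are correct, so raising N in the extended baseline would be expected to trade precision for recall.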

  6. System Description

  7. System outline
     [Block diagram; components and steps as labeled:]
     - Baseline: input video, shot segmentation, feature extraction, SVM-based story segmentation
     - Section-specialized segmentation: section extraction, section-specialized SVM
     - Anchor shot segmentation: anchor shot extraction, anchor shot segmentation based on “silence”, story boundary addition
     - Post-filter: filter candidates w/o silent segments and anchor shots

  8. “Baseline” component
     [System outline diagram repeated]

  9. Baseline story segmentation
     - Procedures:
       - Shot segmentation
         - Merged TRECVID common shot boundaries with shot segmentation results of IBM VideoAnnEx tool
         - Applied “curtain-type” wipe detection method
       - Feature extraction
         - Extracts low-level audio-video features from each shot, and generates “shot vectors”
       - SVM-based story segmentation
         - Discriminates shots which contain story boundaries
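The slides do not say how the two sets of shot boundaries were merged. One plausible reading is a union in which near-duplicate boundaries are collapsed. The sketch below assumes that reading; `merge_shot_boundaries` and its `tolerance` parameter are hypothetical names, not part of the TRECVID data or the IBM VideoAnnEx tool.

```python
def merge_shot_boundaries(primary, secondary, tolerance=0.5):
    """Union of two shot-boundary lists (times in seconds).

    A secondary boundary is added only if no boundary already kept lies
    within `tolerance` seconds of it, so near-duplicates keep the primary time.
    """
    merged = list(primary)
    for t in secondary:
        if all(abs(t - kept) > tolerance for kept in merged):
            merged.append(t)
    return sorted(merged)


# Hypothetical boundary lists from the two sources.
common = [0.0, 10.0, 25.0]
tool = [10.2, 18.0, 25.4]
shots = merge_shot_boundaries(common, tool)
```

Here only the boundary at 18.0 s is new; the other two tool boundaries fall within the tolerance of an existing one.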

  10. Extracted audio-video features
      - Audio
        - Average RMS
        - Avg RMS of first n frames
        - Frequency of audio class (silence, speech, music, noise)
          - Details in Reference [4]
      - Motion
        - Horizontal motion
        - Vertical motion
        - Total motion
        - Motion intensity
      - Color
        - Color layout of first, middle, and last frame (6*Y, 3*Cb, 3*Cr)
        - Color layout distance between first, middle and last frames
      - Temporal
        - Shot duration
        - Shot density
      Total number of elements: 51 (51-dimensional “shot vector”)
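One way to see how the listed features could add up to 51 elements is to assemble a shot vector explicitly. The breakdown below is an assumption: it counts 6 audio values, 36 color layout coefficients (3 frames x 12), 3 pairwise color layout distances, 2 temporal values and 4 motion values. The slide gives only the total, so the three pairwise distances in particular are a guess that happens to be consistent with it. All class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShotFeatures:
    avg_rms: float
    avg_rms_first_n: float
    audio_class_freq: List[float]     # silence, speech, music, noise
    color_layouts: List[List[float]]  # first, middle, last frame; 6 Y + 3 Cb + 3 Cr each
    color_layout_dists: List[float]   # assumed: 3 pairwise distances between the frames
    shot_duration: float
    shot_density: float
    motion: List[float]               # horizontal, vertical, total, intensity


def to_shot_vector(f: ShotFeatures) -> List[float]:
    """Concatenate all features of one shot into a flat vector."""
    if len(f.audio_class_freq) != 4:
        raise ValueError("expected 4 audio class frequencies")
    if len(f.color_layouts) != 3 or any(len(c) != 12 for c in f.color_layouts):
        raise ValueError("expected 3 color layouts of 12 coefficients each")
    if len(f.color_layout_dists) != 3:
        raise ValueError("expected 3 color layout distances")
    if len(f.motion) != 4:
        raise ValueError("expected 4 motion features")
    vec = [f.avg_rms, f.avg_rms_first_n, *f.audio_class_freq]
    for layout in f.color_layouts:
        vec.extend(layout)
    vec.extend(f.color_layout_dists)
    vec.extend([f.shot_duration, f.shot_density])
    vec.extend(f.motion)
    return vec


# Hypothetical feature values for a single shot.
example = ShotFeatures(
    avg_rms=0.12,
    avg_rms_first_n=0.03,
    audio_class_freq=[0.1, 0.7, 0.1, 0.1],
    color_layouts=[[0.0] * 12 for _ in range(3)],
    color_layout_dists=[0.0, 0.0, 0.0],
    shot_duration=8.4,
    shot_density=0.25,
    motion=[0.0, 0.0, 0.0, 0.0],
)
assert len(to_shot_vector(example)) == 51
```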

  11. SVM-based story segmentation
      - Apply SVM to discriminate shots w/ story boundary
      - Training phase
        - Shots which contain story boundary ⇒ Positive
        - All other shots ⇒ Negative
        [Timeline diagram: story boundaries marked along the time axis t]
      - Evaluation phase
        - Extract N shots based on distance from SVM hyperplane
          - N = Average number of stories in ABC, CNN (Baseline)
          - N = Average number of stories x 1.5 (Extended baseline)
        - Set story boundary at beginning of each extracted shot
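The evaluation phase can be sketched as a top-N selection over per-shot SVM scores. This is an illustration under assumptions, not the authors' implementation: it presumes each shot already has a signed distance from the hyperplane (larger meaning more likely to contain a story boundary), and the function names and story counts are hypothetical.

```python
def boundary_budget(story_counts, factor=1.0):
    """N = factor x average number of stories per broadcast.

    factor=1.0 corresponds to the baseline, factor=1.5 to the extended baseline.
    """
    average = sum(story_counts) / len(story_counts)
    return int(round(factor * average))


def select_story_boundaries(shots, n):
    """Pick the n shots with the largest SVM score.

    shots: list of (shot_start_time, svm_score) pairs.
    Returns the start times of the selected shots in temporal order, since
    the boundary is set at the beginning of each extracted shot.
    """
    ranked = sorted(shots, key=lambda shot: shot[1], reverse=True)
    return sorted(start for start, _ in ranked[:n])


# Hypothetical scores for five shots of one broadcast.
shots = [(0.0, -1.2), (35.0, 0.8), (50.0, -0.1), (92.0, 1.5), (130.0, 0.2)]
boundaries = select_story_boundaries(shots, n=3)
```

Because the number of extracted shots is fixed in advance rather than decided by the sign of the score, the extended baseline simply moves further down the same ranking.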

  12. Problems of baseline method
      - Although baseline results were satisfactory, several weak points were observed…
      - Poor recall in various “sections”
        - e.g., Top Stories, Headline Sports of CNN
        - Cause: Different characteristics compared to general content
          - No anchor shots, background music, etc.
          - SVM unable to adapt to various features
      - Impossible to detect multiple story boundaries that occur within a single shot
        - Baseline can only set one story boundary per shot

  13. Additional system components
      - Section-specialized segmentation
        - Objective: Improvement of recall in specific sections which have different characteristics
      - Anchor shot segmentation
        - Objective: Detection of multiple story boundaries which occur within a single shot
      - Post-filter
        - Objective: Improvement of precision

  14. Component 1: Section-specialized segmentation
      [System outline diagram repeated]

  15. Section-specialized segmentation
      - General approach:
        - Construct SVM specialized for story segmentation within specified sections
      - Procedures:
        - Section extraction
          - Extraction based on “jingles”, i.e., audio-video sequences which initiate sections
        - Section-specialized SVM
          - Construct SVM specialized to conduct story segmentation on extracted sections

  16. Section extraction
      - Automatic detection of “jingles” based on reference audio signals
        - Based on “Time-series active search” algorithm [Kashino]
      - Extract sections based on position of extracted jingles
        [Timeline diagram: jingle positions along the time axis t (Start: Top Stories; Start: Dollars and Sense; Start: Headline Sports; End: Headline Sports) delimit the Top Stories and Headline Sports sections]
      - Apply section-specialized SVM to set story boundaries within each extracted section
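Jingle detection against a reference signal can be illustrated with a simplified histogram-matching search. The sketch below is written in the spirit of histogram-based search with window skipping; it is not the cited algorithm's actual implementation. It assumes the audio has already been quantized into one discrete symbol per frame, and `find_jingle` and `min_overlap` are hypothetical names.

```python
from collections import Counter


def find_jingle(stream, reference, min_overlap):
    """Return window start indices whose symbol histogram matches the reference.

    stream, reference: sequences of hashable symbols (e.g. quantized audio frames).
    min_overlap: minimum histogram intersection (in frames) to report a match.
    Histograms ignore symbol order inside the window, so this is a coarse match.
    """
    window = len(reference)
    if window == 0:
        return []
    ref_hist = Counter(reference)
    hits = []
    pos = 0
    while pos + window <= len(stream):
        win_hist = Counter(stream[pos:pos + window])
        overlap = sum(min(count, win_hist[sym]) for sym, count in ref_hist.items())
        if overlap >= min_overlap:
            hits.append(pos)
            pos += 1
        else:
            # Shifting the window by one frame raises the intersection by at
            # most 1, so the next (min_overlap - overlap - 1) windows cannot
            # match and are skipped without being examined.
            pos += min_overlap - overlap
    return hits


# Hypothetical symbol stream: background frames "n" around a 4-frame jingle.
jingle = ["j1", "j2", "j3", "j4"]
stream = ["n"] * 10 + jingle + ["n"] * 6
positions = find_jingle(stream, jingle, min_overlap=4)
```

The skip step is what keeps the search cheap on long broadcasts: windows far from any jingle have a small intersection and are passed over in large jumps.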

  17. Component 2: Anchor shot segmentation
      [System outline diagram repeated]

  18. Anchor shot segmentation
      - General approach:
        - Extract shots which are expected to contain multiple stories (anchor shots), and insert additional boundaries
      - Procedures:
        - Anchor shot extraction
          - Construct SVM to discriminate anchor shots based on audio-video features
        - Extraction of “silent sections”
          - Two methods:
            • Audio classification results
            • HMM-based non-speech detector
        - Story boundary addition
          - Insert story boundaries at detected silence sections
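The boundary-addition step can be sketched as an interval test between anchor shots and silent sections. The code assumes both have already been detected and are given as (start, end) pairs in seconds. Placing the boundary at the midpoint of the silence and requiring a minimum silence duration are illustrative assumptions; the slides say only that boundaries are inserted at detected silence sections.

```python
def add_boundaries_in_anchor_shots(anchor_shots, silences, min_silence=0.5):
    """Return extra story boundary times found inside anchor shots.

    anchor_shots, silences: lists of (start, end) intervals in seconds.
    A silence contributes a boundary only if it lies strictly inside an
    anchor shot and lasts at least `min_silence` seconds (assumed rule).
    """
    boundaries = []
    for shot_start, shot_end in anchor_shots:
        for sil_start, sil_end in silences:
            inside = shot_start < sil_start and sil_end < shot_end
            if inside and sil_end - sil_start >= min_silence:
                # Assumed placement: midpoint of the silent section.
                boundaries.append((sil_start + sil_end) / 2)
    return sorted(boundaries)


# Hypothetical anchor shot containing one long and one very short silence.
anchor_shots = [(100.0, 160.0)]
silences = [(99.8, 100.3), (128.0, 129.0), (140.0, 140.2), (170.0, 171.0)]
extra = add_boundaries_in_anchor_shots(anchor_shots, silences)
```

Only the silence from 128.0 s to 129.0 s qualifies here, which addresses the baseline limit of one boundary per shot by adding a boundary in the middle of the anchor shot.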

  19. Component 3: Post-filter
      [System outline diagram repeated]
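The system diagram labels the post-filter only as “Filter candidates w/o silent segments and anchor shots”. One possible reading, assumed here, is that a candidate boundary is kept only when it falls near a silent segment or inside an anchor shot, and is discarded otherwise. The function name, the `margin` parameter and the either-or rule are all hypothetical.

```python
def post_filter(candidates, silences, anchor_shots, margin=1.0):
    """Drop candidate boundaries supported by neither silence nor an anchor shot.

    candidates: boundary times in seconds.
    silences, anchor_shots: lists of (start, end) intervals in seconds.
    margin: slack in seconds allowed around each interval (assumed value).
    """
    def near(t, intervals):
        return any(start - margin <= t <= end + margin for start, end in intervals)

    return [t for t in candidates if near(t, silences) or near(t, anchor_shots)]


# Hypothetical candidates: the one at 50.0 s has no supporting evidence.
kept = post_filter(
    candidates=[10.0, 50.0, 90.0],
    silences=[(9.5, 10.2)],
    anchor_shots=[(88.0, 120.0)],
)
```

Under this reading the filter can only remove boundaries, which matches the stated objective of improving precision.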
