Conference Paper · September 2006 · DOI: 10.1109/ISSPIT.2006.270818 · Source: IEEE Xplore

Hierarchical Segmentation of Presentation Videos through Visual and Text Analysis

Honglin Li and Aijuan Dong
Department of Computer Science, North Dakota State University
Fargo, ND 58105
{honglin.li, aijuan.dong}@ndsu.edu

Abstract - Presentation videos play an important role in information sharing and exchange. To effectively utilize these video assets, one of the important steps is to segment a long video stream into smaller, semantic units. In this paper, we investigate hierarchical segmentation of presentation videos by combining visual and text analysis. Slide-level segmentation employs visual information and computes a sequence of slide-level video segments so that the projected slide image of each such segment does not change. Topic-level segmentation makes use of extracted slide text and generates a sequence of topic-level video segments so that the topic of each such video segment does not change. The proposed segmentation procedure has been tested against various presentation videos, and experimental results are presented and discussed.

Keywords - hierarchical video segmentation, presentation video, visual information, text analysis, and topic words.

1. INTRODUCTION

With recent advances in multimedia processing and automatic presentation recording, a large number of presentation videos are produced from conferences, lectures, meetings, and corporate trainings. These presentation videos cover a wide spectrum of topics and play an important role in information sharing and exchange. However, because videos are unstructured and linear, people often have difficulty locating a specific piece of information in a presentation video. To ensure effective exploitation of these video assets, efficient and flexible access mechanisms must be provided.

Research has found that multimedia users strongly prefer hierarchical video access. With hierarchical presentation, video content is organized at different granularity levels, which allows a user to flexibly access the video segments of his/her particular interest. In a search scenario, instead of returning a whole video that contains a lot of irrelevant information, the most relevant video segment can be returned, which increases the relevancy of video retrieval. To provide hierarchical video access, the first and important step is to hierarchically segment a long video stream into smaller, semantic units.

A variety of techniques have been proposed to segment presentation videos. Earlier work from the Cornell Lecture Browser [1] uses a feature-based algorithm to segment a slide video stream. First, frames are clipped, filtered and adaptively thresholded to produce binary images. Then, feature differences between binary images are calculated and used to segment a slide video stream. Later on, Yamamoto et al. [2] propose topic segmentation of lecture videos by associating lecture speech with the lecture textbook. The association is performed by computing the similarity between topic vectors obtained from the lecture textbook and a sequence of lecture vectors obtained from lecture speech through spontaneous speech recognition. In another paper, a content density function is proposed to segment instructional videos [3]. The content density function draws guidance from the observation that topic boundaries coincide with the ebb and flow of the "density" of content shown in videos. Recently, Lin et al. [4] investigate a linguistics-based approach for lecture video segmentation. Multiple linguistic-based segmentation features from lecture speech, such as noun phrases and cue phrases, are extracted and explored.

In spite of these successes, most approaches described above focus on linearly segmenting video streams into smaller units. In our study, we noticed that a presentation usually consists of many topics, and each topic covers several slides (Figure 1). This structure enables hierarchical segmentation, indexing and access. This paper focuses on hierarchical segmentation of presentation videos. Specifically, two-level segmentation is investigated in our work: topic-level and slide-level. As in most video segmentation, visual information alone cannot reliably detect topic change, so segmentation at topic-level is usually based on related text analysis. In this paper, we study segmentation of presentation videos at topic-level through analysis of extracted slide text. Segmentation at slide-level employs visual information. To map segmentation results from slide text analysis back to video segmentation and achieve hierarchical segmentation, matching between extracted key frames and converted slide images is performed through image edge analysis.

[Figure 1. Hierarchical view of presentations. A presentation consists of Topic 1 … Topic n; each topic consists of Slide 1 … Slide m.]

The rest of the paper is organized as follows. We give an overview of the approach in Section 2, then discuss in detail slide-level segmentation and topic-level segmentation in Sections 3 and 4 respectively. Experimental results are given in Section 5. Section 6 concludes the paper and points out some future research.

2. OVERVIEW

Hierarchical segmentation of presentation videos discussed here (Figure 2) employs two types of data: slide video streams captured by a stationary camera and PowerPoint slide files. Slide-level segmentation operates on slide video streams, while topic-level segmentation makes use of extracted slide text. Slide-level segmentation creates a sequence of slide-level video segments, within each of which the projected slide image does not change. Topic-level segmentation generates a sequence of topic-level video segments, each of which covers one or more slides and within which the topic does not change.

As Figure 2 shows, the first step in topic-level segmentation is text-based segmentation through Topic Words Introduction (TWI), which generates a sequence of slide blocks, each of which discusses one topic. To associate each slide block with its corresponding topic-level video segments, the temporal relationship between a slide video stream and the slides must be established. This is accomplished by matching slide images converted from PowerPoint slides with key frames extracted from slide-level video segments. Based on the timing information of each slide, slide blocks can be mapped to topic-level video segments, thus achieving hierarchical video segmentation.

In the following subsections, we discuss in detail slide-level segmentation and topic-level segmentation.

[Figure 2. Hierarchical segmentation of presentation videos. Flowchart labels: PowerPoint Slides; Slide video stream; Digitizing and decompressing; Converting Slides To Images; Extracting Slide Text; Frames; Slide images; Slide text; Slide-Level Segmenting; Image Matching; Text-based Segmenting (TWI); Key frames; Time stamps of slides; Presentation slide blocks; Mapping Process; Slide-level boundaries; Topic-level boundaries; Two-level video segments.]

3. SLIDE-LEVEL SEGMENTATION

Slide-level segmentation divides a continuous slide video stream into video segments, each of which matches one slide. More formally, given a presentation video stream $v$ and a set of $n$ slides, compute a set of video segments $VS = \{vs_0, vs_1, \ldots, vs_m\}$ such that the projected slide image of each video segment $vs_i$ ($0 \le i \le m$) does not change.

Note that this definition only requires that each video segment $vs_i$ displays the same slide throughout; it does not require that two adjacent segments display different slides. Thus, extra segments (false positives) are acceptable. If the matching process detects that the same slide is shown in two consecutive video segments, these segments are combined. By allowing extra segments, it is less likely that slide transitions go undetected.

Slide-level segmentation discussed here employs local color histogram difference. We compare the local color histograms of successive frames. When the difference is large, a slide-level boundary is declared. This approach is simple, but works well for presentation videos since these videos do not have special effects such as fading, dissolve and wipe, and most slide transitions are abrupt cuts.

4. TOPIC-LEVEL SEGMENTATION

In our study, we observed that most presentations tend to follow a basic structure in spite of differences in contents and formats. A typical presentation, especially a conference presentation, starts with a title slide, then an outline/overview slide, followed by a number of content slides (Figure 3). The outline/overview slide of a presentation summarizes the major topics that will be covered in the content slides. Based on this observation of presentation structure, we proposed a text segmentation algorithm: Topic Words Introduction (TWI).

As discussed in the overview (Section 2), to map segmentation results of TWI back to video segmentation, image matching between extracted key frames and converted slide images is required. Therefore, in the following sub-sections, we discuss topic-level segmentation as a two-step process: Topic Words Introduction and image matching.
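As an illustration of the local color histogram comparison used for slide-level segmentation (Section 3), the following is a minimal sketch, not the authors' implementation. The paper does not state the block layout, the number of histogram bins, the distance measure or the threshold, so the 2x2 grid, 4 bins per channel, normalized L1 distance and threshold of 0.3 below are all assumptions, as are the function names.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]        # (R, G, B), each in 0..255
Frame = Sequence[Sequence[Pixel]]   # rows of pixels


def local_histograms(frame: Frame, grid: int = 2, bins: int = 4) -> List[Counter]:
    """One quantized color histogram per cell of a grid x grid partition.

    grid and bins are assumed values; the paper does not specify them.
    """
    height, width = len(frame), len(frame[0])
    step = 256 // bins
    hists = [Counter() for _ in range(grid * grid)]
    for y, row in enumerate(frame):
        cell_row = y * grid // height
        for x, (r, g, b) in enumerate(row):
            cell_col = x * grid // width
            hists[cell_row * grid + cell_col][(r // step, g // step, b // step)] += 1
    return hists


def histogram_difference(a: Frame, b: Frame, grid: int = 2, bins: int = 4) -> float:
    """Normalized L1 distance between the local histograms of two
    equally sized frames: 0.0 for identical histograms, 1.0 for disjoint ones."""
    total = 0
    for ha, hb in zip(local_histograms(a, grid, bins), local_histograms(b, grid, bins)):
        total += sum(abs(ha[k] - hb[k]) for k in ha.keys() | hb.keys())
    return total / (2 * len(a) * len(a[0]))


def slide_boundaries(frames: Sequence[Frame], threshold: float = 0.3) -> List[int]:
    """Indices i at which frame i starts a new slide-level segment.

    The threshold is a hypothetical value chosen for this example.
    """
    return [i for i in range(1, len(frames))
            if histogram_difference(frames[i - 1], frames[i]) > threshold]


def solid(color: Pixel, height: int = 4, width: int = 4) -> Frame:
    """Synthetic single-color frame, used only for the demonstration."""
    return [[color] * width for _ in range(height)]


white, blue = (255, 255, 255), (0, 0, 255)
frames = [solid(white), solid(white), solid(blue), solid(blue), solid(white)]
print(slide_boundaries(frames))  # [2, 4]
```

Comparing histograms per grid cell rather than over the whole frame means that two frames with the same overall color mix but a different spatial layout still register as different, which is presumably why a local histogram is preferred for slides.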

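The two bookkeeping steps described above, combining consecutive video segments that show the same slide (Section 3) and mapping slide blocks onto the video timeline using the timing of each slide (Section 2), can be sketched as follows. This is an illustrative sketch under stated assumptions: the `Segment` record, the function names and the sample data are hypothetical, slide numbers are assumed to come from the image matching step, and every slide is assumed to belong to exactly one slide block.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    start: float  # seconds from the beginning of the video
    end: float    # seconds from the beginning of the video
    label: int    # slide number (slide level) or topic index (topic level)


def merge_adjacent(segments: List[Segment]) -> List[Segment]:
    """Combine consecutive segments that carry the same label."""
    merged: List[Segment] = []
    for seg in segments:
        if merged and merged[-1].label == seg.label:
            merged[-1].end = seg.end  # extend the previous segment
        else:
            merged.append(Segment(seg.start, seg.end, seg.label))
    return merged


def topic_segments(slide_segments: List[Segment],
                   slide_blocks: List[List[int]]) -> List[Segment]:
    """Map slide blocks (one list of slide numbers per topic) to video time.

    Assumes every matched slide number appears in exactly one block.
    """
    topic_of: Dict[int, int] = {
        slide: topic for topic, block in enumerate(slide_blocks) for slide in block
    }
    relabeled = [Segment(s.start, s.end, topic_of[s.label])
                 for s in merge_adjacent(slide_segments)]
    return merge_adjacent(relabeled)


# Hypothetical matching result: slide 1 was split into two segments
# by a false-positive boundary at 10 seconds.
raw = [Segment(0.0, 10.0, 1), Segment(10.0, 12.0, 1), Segment(12.0, 30.0, 2),
       Segment(30.0, 55.0, 3), Segment(55.0, 80.0, 4)]
blocks = [[1, 2], [3, 4]]  # hypothetical TWI output: two topics

print(merge_adjacent(raw)[0])
# Segment(start=0.0, end=12.0, label=1)
print(topic_segments(raw, blocks))
# [Segment(start=0.0, end=30.0, label=0), Segment(start=30.0, end=80.0, label=1)]
```

Reusing the same merge step at both levels keeps the two-level result consistent: each topic-level segment is exactly the union of the slide-level segments it contains.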