
Conference Paper · January 2007 · Source: DBLP · https://www.researchgate.net/publication/221489599

INTERSPEECH 2007, August 27-31, Antwerp, Belgium

Dynamic Language Model Adaptation Using Presentation Slides for Lecture Speech Recognition

Hiroki Yamazaki, Koji Iwano, Koichi Shinoda, Sadaoki Furui and Haruo Yokota
Department of Computer Science, Tokyo Institute of Technology, Japan
yamazaki@ks.cs.titech.ac.jp, {iwano, shinoda, furui, yokota}@cs.titech.ac.jp

Abstract

We propose a dynamic language model adaptation method that uses the temporal information from lecture slides for lecture speech recognition. The proposed method consists of two steps. First, the language model is adapted with the text information extracted from all the slides of a given lecture. Next, the text information of a given slide is extracted based on temporal information and used for local adaptation. Hence, the language model used to recognize the speech associated with a given slide changes dynamically from one slide to the next. We evaluated the proposed method with speech data from four Japanese lecture courses. Our experiments show the effectiveness of the proposed method, especially for keyword detection. The F-measure error rate for lecture keywords was reduced by 2.4%.

Index Terms: language model adaptation, speech recognition, classroom lecture speech.

1. Introduction

Recent advancements in computer and storage technology enable the archiving of large multimedia databases. Databases of classroom lectures in universities and colleges are particularly useful knowledge resources, and they are expected to be used in education systems.

Recently, much effort has been made to construct educational systems that use the multimedia content of classroom lectures to support distance learning [1, 2, 3, 4, 5]. Among the various kinds of content related to lectures, the transcription of speech data is expected to be the most important for indexing and searching lecture contents [2, 6]. Therefore, a highly accurate speech recognition engine for lectures is required. Lecture speech recognition has been studied extensively. Many research projects for lecture transcription, such as the European project CHIL (Computers in the Human Interaction Loop) [8] and the American iCampus Spoken Lecture Processing project [9], have been conducted. Trancoso et al. [7] investigated the automatic transcription of classroom lectures in Portuguese.

Large databases of conference presentations, such as the Corpus of Spontaneous Japanese (CSJ) [10, 11] and the TED corpus [12], have been collected to improve speech recognition accuracy. With the use of these databases, state-of-the-art speech recognition systems for conference presentations achieve accuracies of 70-80%. Hence, the recognition results provided by these systems are good enough to be used for speech summarization and speech indexing [13]. The speaking style of classroom lectures is, however, quite different from that of lectures in meetings or conferences. Classroom lectures are not always practiced in advance, and the same phrases are repeated many times for emphasis. The lecture speaking style is closer to that of dialogue because lecturers are always ready to be interrupted by questions from students. The spontaneity of this kind of speech is much higher than that of other kinds of presentations; the lectures are characterized by strong coarticulation effects, non-grammatical constructions, hesitations, repetitions, and filled pauses. For these reasons, speech recognition for classroom lectures is generally more difficult than for speech at conferences or meetings; recognition accuracy is around 40-60%. Furthermore, no large database of classroom lecture speech is available for training acoustic and language models.

In classrooms, lecturers often use various materials, e.g., textbooks or slides, to help their students understand. Since those materials include many keywords that also appear in lecture speech, they are expected to be useful for language modeling in speech recognition. Several adaptation methods for language models using such content have already been proposed for lecture speech recognition. For example, Togashi et al. [14] proposed a method of using the text information in presentation slides.

If lecture speech is accompanied by slides, a strong correlation can be observed between slides and speech. In particular, the speech corresponding to a given slide contains most of the text information presented in the slide. We expect that this relation between the speech and the text information of the slide can improve model adaptation for lecture speech recognition. We propose a dynamic adaptation method for language modeling that applies text information from slides. In this method, a slide-dependent language model is constructed for each slide, and this model is then used to recognize the speech associated with the given slide. The language model is changed dynamically as the lecture progresses.

This paper is organized as follows. In Section 2, the base system applied in our studies is introduced. In Section 3, the proposed language model adaptation method is explained, and in Section 4, the effectiveness of the proposed method is discussed.

2. UPRISE: Unified Presentation Contents Retrieval by Impression Search Engine

UPRISE (Unified Presentation Contents Retrieval by Impression Search Engine) [1, 15] is a lecture presentation system to support distance learning. It stores many types of multimedia materials, such as texts, pictures, graphs, images, sounds, voices, and videos, and provides a unified presentation view (Figure 1) as a lecture video retrieval system. The retrieval system returns appropriate lecture video scenes to match given keywords. Since the speech information in lectures is used to narrow down the search candidates [6], a high level of speech recognition accuracy is strongly required.
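
As a rough illustration of the two-step adaptation summarized in the abstract and introduction (global adaptation on the text of all slides, then local adaptation on the slide shown at a given time), the following Python sketch may help. It is not the method evaluated in this paper: the unigram models, the linear interpolation, the weights 0.3, and all function names (`unigram`, `interpolate`, `slide_at`, `slide_dependent_models`) are assumptions chosen only to keep the example small.

```python
from bisect import bisect_right
from collections import Counter


def unigram(tokens):
    """Maximum-likelihood unigram distribution over a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(base, adapt, weight):
    """Linear interpolation (1 - weight) * base + weight * adapt.

    Assumption: if there is no adaptation text, the base model is kept as is.
    """
    if not adapt:
        return dict(base)
    vocab = set(base) | set(adapt)
    return {w: (1 - weight) * base.get(w, 0.0) + weight * adapt.get(w, 0.0)
            for w in vocab}


def slide_at(time_sec, slide_start_times):
    """Index of the slide on screen at time_sec (start times sorted ascending)."""
    return max(bisect_right(slide_start_times, time_sec) - 1, 0)


def slide_dependent_models(base_lm, slides, global_weight=0.3, local_weight=0.3):
    """Step 1: adapt to all slides of the lecture. Step 2: adapt to each slide."""
    all_tokens = [tok for slide in slides for tok in slide]
    global_lm = interpolate(base_lm, unigram(all_tokens), global_weight)
    return [interpolate(global_lm, unigram(slide), local_weight) for slide in slides]


# Hypothetical toy data: a tiny base model, two slides, and their start times.
base_lm = {"the": 0.5, "model": 0.25, "speech": 0.25}
slides = [["language", "model"], ["speech", "recognition"]]
starts = [0.0, 60.0]

models = slide_dependent_models(base_lm, slides)
current_lm = models[slide_at(75.0, starts)]  # model for speech at t = 75 s
```

In this sketch the slide start times play the role of the temporal information: the recognizer would switch to `models[i]` whenever slide `i` is on screen, so words printed on the current slide receive more probability mass than they do under the lecture-wide model.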
