KALAKA: A TV Broadcast Speech Database for the Evaluation of Language Recognition Systems
Luis J. Rodríguez-Fuentes, Mikel Penagarikano, Germán Bordel, Amparo Varona, Mireia Díez
Software Technologies Working Group (http://gtts.ehu.es)
Department of Electricity and Electronics, University of the Basque Country
Barrio Sarriena s/n, 48940 Leioa, Spain
email: luisjavier.rodriguez@ehu.es
LREC 2010, Valletta, Malta, May 20, 2010

Contents
1 Introduction
    Motivation
    Database features (in brief)
2 Design issues
3 Recording setup
4 Creating the database
    Classification of recordings
    Selection of speech segments
    Automatic extraction of 30-, 10- and 3-second segments
5 Using the database
    The Albayzin 2008 LRE
    Developing language recognition technology
6 Conclusions and future work

Motivation
To support the Albayzin 2008 Language Recognition Evaluation, organized by the Spanish Network on Speech Technologies from May to November 2008.
To address the lack of a multilingual speech database specifically designed for language recognition applications with the official languages of Spain as target languages.
To build a language recognition module for the backend of an audio indexing and retrieval system dealing with wide-band broadcast news in Spanish and Basque.
To measure the accuracy that state-of-the-art language recognition systems can attain when recognizing four target languages that have evolved (and continue to evolve) in close contact with each other.
Might this task be more challenging than expected?

Database features (in brief)
Four target languages: Spanish, Catalan, Basque and Galician.
Other (European) languages (to allow open-set tests): French, Portuguese, German and English.
Speech signals extracted from TV shows, including both planned and spontaneous speech in diverse environmental conditions involving a varying number of speakers.
Size: around 50 hours (3 DVDs)
    Train dataset: 36 hours (9 hours per target language)
    Development dataset: 7.7 hours (90 minutes per target language + 90 minutes of other languages combined)
    Evaluation dataset: 7.7 hours (90 minutes per target language + 90 minutes of other languages combined)
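
As a rough cross-check of the sizes listed above, the nominal per-language figures can be tallied in a few lines. This is only an illustrative sketch: the function and variable names are hypothetical and not part of the database. Note that 90 minutes for each of four target languages plus 90 minutes of other languages gives a nominal 7.5 hours, slightly below the 7.7 hours stated for each of the development and evaluation datasets; the slides do not explain the difference.

```python
# Nominal sizes as stated on the slide; all names are illustrative only.
TARGET_LANGUAGES = ["Spanish", "Catalan", "Basque", "Galician"]


def nominal_hours(hours_per_target, other_hours=0.0):
    """Nominal total: one share per target language plus the pooled 'other languages' share."""
    return hours_per_target * len(TARGET_LANGUAGES) + other_hours


train_hours = nominal_hours(9.0)      # 9 hours per target language
dev_hours = nominal_hours(1.5, 1.5)   # 90 min per target language + 90 min of other languages
eval_hours = nominal_hours(1.5, 1.5)  # same layout as the development dataset

total_nominal = train_hours + dev_hours + eval_hours
print(train_hours, dev_hours, eval_hours, total_nominal)
```

The nominal total is consistent with the "around 50 hours" quoted for the whole database.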

Design issues
Basic design criteria:
    1 Regarding recording setup (devices, connectors, audio conversions, etc.): the same for all the languages
    2 Regarding other sources of variability (environment, speaker, etc.): as much diversity as possible
Cable TV: easy access to audio in different languages
Disjoint subsets of TV shows assigned to train, development and evaluation
Regarding duration:
    Train dataset: no constraints
    Development and evaluation datasets: three subsets, containing segments of three nominal durations: 30, 10 and 3 seconds
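
The requirement that train, development and evaluation draw on disjoint sets of TV shows can be illustrated with a minimal sketch. The slides do not say how shows were actually assigned, so everything below is an assumption made for illustration: the (show, recording) input format, the split fractions, the random assignment and the name `split_by_show` are all hypothetical. The point is only that splitting at the level of shows, rather than individual recordings, guarantees that no show appears in more than one subset.

```python
import random
from collections import defaultdict


def split_by_show(recordings, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign each show to exactly one subset, so no show spans two subsets.

    recordings: list of (show_name, recording_id) pairs (hypothetical format).
    fractions: share of shows for train, development, evaluation (assumed values).
    """
    shows = sorted({show for show, _ in recordings})
    random.Random(seed).shuffle(shows)  # reproducible shuffle of the show list

    n_train = round(fractions[0] * len(shows))
    n_dev = round(fractions[1] * len(shows))

    # Decide the subset once per show, never per recording.
    subset_of = {}
    for i, show in enumerate(shows):
        if i < n_train:
            subset_of[show] = "train"
        elif i < n_train + n_dev:
            subset_of[show] = "development"
        else:
            subset_of[show] = "evaluation"

    # Every recording follows the subset of its show.
    split = defaultdict(list)
    for show, rec_id in recordings:
        split[subset_of[show]].append((show, rec_id))
    return dict(split)


# Usage with made-up show names: 10 shows, 2 recordings each.
example = [(f"show{i}", f"rec{i}_{j}") for i in range(10) for j in range(2)]
for subset, items in split_by_show(example).items():
    print(subset, sorted({show for show, _ in items}))
```

In a real setting this would presumably be applied per language, so that each language gets its own disjoint sets of shows.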

Recording setup
Roland Edirol R-09 ultra-light audio recorder
