Free English and Czech telephone speech corpus shared under the CC-BY-SA 3.0 license

Free English and Czech telephone speech corpus shared under the CC-BY-SA 3.0 license
Matěj Korvas, Ondřej Plátek, Ondřej Dušek, Lukáš Žilka, Filip Jurčíček
Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague
May 30th, 2014, LREC, Reykjavík, Iceland

Introduction

The Vystadial 2013 telephone speech corpus
• Two corpora of transcribed telephone speech, English and Czech
• Under a free license
• Distributed with scripts for ASR training

Outline
1. Acquiring the data using crowdsourcing
2. ASR training scripts
3. Evaluation

Motivation

ASR for a spoken dialogue system?
• Commercial (Nuance & others): costly, restrictive license
• Cloud-based (Google, Nuance): costly or unclear licensing
• Custom ASR model: data needed
  • Available for English
  • Restrictive license and/or costly for non-LDC members

The Vystadial 2013 Speech corpus
• English and Czech, telephone speech
• CC-BY-SA 3.0 license: for research and commercial use
• Training scripts for HTK and Kaldi ASR toolkits

English Data Collection
• Using crowdsourcing via Amazon Mechanical Turk
• Most speakers: American English
• Interaction with a spoken dialogue system: restaurant information domain

Transcription
• Also using Amazon Mechanical Turk
• Quality checks, restricted to experienced workers
• Orthographic, with non-speech events
  • __NOISE__, __LAUGH__
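A transcription in this style mixes ordinary words with non-speech event markers. The sketch below shows one way such a line could be split into the two kinds of token. It assumes that every event is an upper-case word wrapped in double underscores, as in the two examples on the slide; the function name `split_transcript` and the lower-casing of words are illustrative choices, not part of the corpus tooling.

```python
import re

# Assumption: events look like __NOISE__ or __LAUGH__ (upper-case letters
# between double underscores). Only these two are named on the slide.
NON_SPEECH = re.compile(r"^__[A-Z]+__$")


def split_transcript(line):
    """Split one transcription line into (words, non_speech_events)."""
    tokens = line.split()
    events = [t for t in tokens if NON_SPEECH.match(t)]
    # Lower-casing is an illustrative normalization, not a corpus rule.
    words = [t.lower() for t in tokens if not NON_SPEECH.match(t)]
    return words, events


words, events = split_transcript("__NOISE__ I want a cheap restaurant __LAUGH__")
print(words)
print(events)
```

Keeping the events as separate tokens lets an ASR pipeline either model them with dedicated units or drop them before scoring.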

Data Collection: Czech

Collection
• Using crowdsourcing, free Czech phone numbers (AMT unavailable)
  • Call-a-friend
  • Repeat-after-me
  • Spoken dialogue system: public transport information
• License agreement at the beginning of the call

Transcription
• Similar to English
• Hired transcribers
• Anonymization (personal information excluded)

Data

Size
• English: 41 hours, 47k sentences (178k words)
• Czech: 15 hours, 22k sentences (126k words)
• Plus 2k sentences dev and 2k sentences test in both languages (ca. 1.5 hours each)

Characteristics
• Different sources (no problem for a general acoustic model)
  • English: narrow domain
  • Czech: general domain (multiple domains)
• 16 kHz mono WAV files (X.wav) + matching plain text files with transcription (X.wav.trn)
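The X.wav / X.wav.trn layout can be read with the Python standard library alone. The following is a minimal sketch, not the loader shipped with the corpus: the function names `load_pairs` and `is_16khz_mono` are hypothetical, and UTF-8 is assumed for the transcription files. The demo at the end builds a throwaway directory with one silent recording so that the block runs without the corpus.

```python
import tempfile
import wave
from pathlib import Path


def load_pairs(corpus_dir):
    """Yield (wav_path, transcription) for each X.wav that has an X.wav.trn."""
    for wav_path in sorted(Path(corpus_dir).rglob("*.wav")):
        trn_path = wav_path.with_name(wav_path.name + ".trn")
        if not trn_path.is_file():
            continue  # skip recordings without a transcription
        # Assumption: transcriptions are stored as UTF-8 text.
        yield wav_path, trn_path.read_text(encoding="utf-8").strip()


def is_16khz_mono(wav_path):
    """Check the format stated on the slide: 16 kHz sampling rate, one channel."""
    with wave.open(str(wav_path), "rb") as w:
        return w.getframerate() == 16000 and w.getnchannels() == 1


# Demo on a temporary directory holding one second of silence.
with tempfile.TemporaryDirectory() as tmp:
    wav_file = Path(tmp) / "example.wav"
    with wave.open(str(wav_file), "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(16000)
        w.writeframes(b"\x00\x00" * 16000)
    (Path(tmp) / "example.wav.trn").write_text("hello __NOISE__\n", encoding="utf-8")
    pairs = [(p.name, text, is_16khz_mono(p)) for p, text in load_pairs(tmp)]

print(pairs)
```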

ASR Acoustic Modelling Scripts
• Scripts to create acoustic models for ASR
• Coding recordings into MFCCs + ∆ + ∆∆ features
• For both languages, for HTK and Kaldi
• Easily applicable to other data sets (and other languages):
  • Just need X.wav + X.wav.trn
• Language-specific parts:
  • List of phones in the language
  • Orthography-to-phonetics mapping (dictionary and/or rules)
  • “Phonetic questions” to group similar triphones (HTK only)
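The ∆ and ∆∆ features are time derivatives of the MFCC vectors, commonly estimated with a short regression window over neighbouring frames. The sketch below illustrates that idea in plain Python. It assumes a window of two frames on each side and edge frames repeated at the boundaries; the slides do not state the settings used by the corpus scripts, and the MFCC extraction itself is not shown (the input here is a placeholder matrix).

```python
def deltas(frames, window=2):
    """Regression-based time derivative of a list of feature vectors.

    d[t] = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2), n = 1..window,
    with the first and last frame repeated beyond the edges.
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))

    def frame(t):
        # Clamp the index so edge frames are repeated.
        return frames[min(max(t, 0), num_frames - 1)]

    result = []
    for t in range(num_frames):
        row = [
            sum(n * (frame(t + n)[k] - frame(t - n)[k])
                for n in range(1, window + 1)) / denom
            for k in range(len(frames[t]))
        ]
        result.append(row)
    return result


def add_deltas(mfcc):
    """Append first- and second-order deltas to every frame."""
    d = deltas(mfcc)
    dd = deltas(d)
    return [c + d_row + dd_row for c, d_row, dd_row in zip(mfcc, d, dd)]


# Placeholder input: 10 frames of 13 coefficients (not real MFCCs).
mfcc = [[0.1 * t * (k + 1) for k in range(13)] for t in range(10)]
features = add_deltas(mfcc)
print(len(features), len(features[0]))
```

With 13 static coefficients per frame, each output vector has 39 dimensions: the statics, their deltas, and the deltas of the deltas.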

HTK vs. Kaldi

HTK
• Hidden Markov models, Gaussian mixtures
• EM training: uniform → monophone → triphone model
• Triphones clustered using phonetic questions

Kaldi
• Finite state transducers
• Generative models parallel to HTK (but Viterbi training)
• Discriminative models:
  • Multiple methods and feature transformations available
  • Our models: non-speaker-adaptive
  • BMMI training (with unigram LM), LDA + MLLT transformations
