From phonetics to speech technology
Einar Meister
Laboratory of Phonetics and Speech Technology
Institute of Cybernetics, Tallinn University of Technology
Introduction
• Spoken language communication
• Progress in ASR
• Multi-disciplinary approach
• Projects at our lab
• Some future plans
• Co-operation
Complexity of spoken language processing
• Human speech communication is … "the most sophisticated behaviour of the most complex organism in the known universe"
  Prof R. Moore, University of Sheffield
Complexity of spoken language processing
• There is a huge and diverse literature describing human speech processing behaviour
• Many different disciplines are involved
• Most knowledge is based on indirect observation
• More is known about the peripheral auditory and articulatory systems than about the higher-level phonetic, linguistic and cognitive processes
• Research is fragmented across different levels of human spoken language processing (SLP)
• Models tend to address single aspects of human SLP behaviour
• There is little integration between models
• Many models are descriptive rather than computational
Progress in ASR
• Substantial progress has taken place in the past 20 years
• Large Vocabulary Continuous Speech Recognition (LVCSR) is available in 11 languages:
  • American English
  • Australian English
  • Southern Asian English
  • Indian English
  • UK English
  • Teen English
  • Dutch
  • French
  • German
  • Italian
  • Spanish
• Dragon "Naturally Speaking 10"
  • Up to 99% Accurate and Three Times Faster than Typing
  • Price: $99-$349
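Accuracy figures such as the one quoted above are commonly reported in terms of word error rate (WER), with accuracy taken as roughly 1 - WER. The slide does not say how the 99% figure was measured, so the following is only a minimal sketch of the standard WER calculation, assuming whitespace-separated words; the function name `word_error_rate` is a hypothetical choice for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference, which is one reason "accuracy" claims are best read together with the test conditions.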
Progress in ASR
• MS Windows Vista offers ASR in 8 languages:
  • English (United States)
  • English (United Kingdom)
  • German
  • French
  • Spanish
  • Japanese
  • Traditional Chinese
  • Simplified Chinese
• http://www.youtube.com/watch?v=2Y_Jp6PxsSQ (July 29, 2006)
• http://www.youtube.com/watch?v=KyLqUf4cdwc (February 10, 2007)
• http://www.microsoft.com/enable/demos/windowsvista/speechdemo.aspx
Progress in ASR
• Progress has NOT been achieved as a result of deep insights into human SLP
• Improvements have come from:
  • extensive use of statistical learning algorithms (data-driven approach)
  • availability of a number of large speech and text corpora
  • increase in computer power
Need for multi-disciplinary approach
[Diagram: SPOKEN LANGUAGE PROCESSING at the centre, surrounded by the contributing fields. Labels in the figure: Cognitive Informatics, Artificial Intelligence, Linguistics, Engineering, Psychology, Computational Linguistics, Natural Language Processing, Cognitive Science, Pattern Processing, Psycho-Linguistics, Information Retrieval, Dialogue, Human-Computer Interaction]
Need for multi-disciplinary approach
Chin-Hui Lee (Georgia Institute of Technology, Atlanta, USA): From Knowledge-Ignorant to Knowledge-Rich Modelling: A New Speech Research Paradigm for Next Generation Automatic Speech Recognition (ICSLP 2004)
• Knowledge-Ignorant Modelling: "there's no data like more data"
• Knowledge-Rich Modelling:
  • Sound-specific features: in addition to spectral (cepstral) features, other acoustic-phonetic features should be used, such as duration, loudness and F0
  • Keyword recognition and phrase verification
  • Human-like speech processing models
Laboratory of Phonetics and Speech Technology
• Speech research at IoC since the 1960s, Lab. of Phonetics and Speech Tech since 1990
• Mission: research on Estonian phonetics and speech technology
• Partner in:
  • eVikings 2 project (2002-2005)
  • NordForsk VISPP-network (2004-2005)
  • Doctoral School of Linguistics and Language Technology at the University of Tartu (2005-2008)
  • National Programme for Estonian Language Technology (2006-2010)
Laboratory of Phonetics and Speech Technology
Staff:
• Einar Meister:
  • head of the laboratory, senior researcher
  • MSc (1998) in system engineering, PhD (2003) in general linguistics
  • experimental phonetics, speech synthesis, speech databases
• Tanel Alumäe:
  • senior researcher, currently post-doc at LIMSI (France)
  • PhD (2006) in computer science
  • speech recognition, language modelling, spoken document retrieval, dialogue systems
• Toomas Kirt:
  • researcher
  • PhD (2007) in computer science
  • data processing, neural networks, pattern recognition
• Lya Meister:
  • researcher
  • MA (2004) in linguistics, doctoral student at Tartu University
  • experimental phonetics, foreign accent, speech corpora
• Temporary staff: 2-3 (1 doctoral student)
Projects funded by the National Programme for Estonian Language Technology
1. Research and development of methods for Estonian speech recognition (T. Alumäe)
• Main tasks:
  • determining optimal basic lexical units for Estonian LVCSR
  • developing statistical language modelling techniques
  • applying acoustic model adaptation techniques
  • delivering optimal solutions for the development of medium-vocabulary speech recognition systems
  • developing methods and algorithms for large/unlimited-vocabulary speech recognition systems
  • implementing speech recognition prototype systems
• Current results:
  • software for automatic segmentation of the speech signal
  • prototype for large-vocabulary speech recognition
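To illustrate what "statistical language modelling" means in the task list above, here is a minimal sketch of a bigram model with add-one smoothing. It is not the laboratory's method: the slides do not describe the actual techniques or lexical units used for Estonian, so the whitespace-separated word units, the toy corpus and the function names `train_bigram_model` and `bigram_probability` are all assumptions made for illustration.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count bigrams and their histories from whitespace-tokenised sentences.

    Assumption: the lexical unit is the whitespace-separated word; a real
    system might use sub-word units instead.
    """
    history_counts = Counter()
    bigram_counts = Counter()
    vocabulary = set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocabulary.update(tokens)
        history_counts.update(tokens[:-1])  # every token that has a successor
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return history_counts, bigram_counts, vocabulary


def bigram_probability(model, previous, word):
    """P(word | previous) with add-one (Laplace) smoothing."""
    history_counts, bigram_counts, vocabulary = model
    return (bigram_counts[(previous, word)] + 1) / (
        history_counts[previous] + len(vocabulary)
    )


# Toy corpus (hypothetical data, for illustration only).
model = train_bigram_model(["tere hommikust", "tere tulemast"])
p_seen = bigram_probability(model, "tere", "hommikust")
p_unseen = bigram_probability(model, "tere", "tere")
```

With this toy corpus a seen bigram receives a higher probability than an unseen one, and smoothing keeps the unseen one above zero. For a morphologically rich language such as Estonian the vocabulary of full word forms grows quickly, which is presumably why the choice of basic lexical units is listed as a task of its own.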
Projects funded by the National Programme for Estonian Language Technology
2. Speech analysis and speech variability modelling (E. Meister)
• Main tasks:
  • microprosody: acoustic and perceptual analysis of intrinsic durations and fundamental frequency
  • macroprosody: temporal organisation and acoustic, lexical and syntactic features of spontaneous (lecture) speech
  • acoustics and perception of foreign accent in Estonian
Projects funded by the National Programme for Estonian Language Technology
3. Speech resources and databases (E. Meister)
• Main tasks:
  • recording, segmentation and labelling of different speech corpora for acoustic studies and speech technology
  • development of infrastructure for speech data storage, access and management
• Under development:
  • Accent corpus: recordings of Estonian spoken as a foreign language
  • Corpus of lecture speech: recordings of academic lectures, public talks, conference presentations, etc.
  • News corpus: recordings of radio news
Past projects
• SpeechDat-like Estonian speech database (2000-2003)
• Estonian Text-to-Speech Synthesiser (2000-2002) in co-operation with:
  • Institute of the Estonian Language
  • Filosoft Ltd.
• Dialogue interface to a theatre information database (2002-2004) in co-operation with Tartu University
Future plans
• Audio-visual speech synthesis
  • Classification of Estonian visemes (E. Liba at UT)
  • A model of a talking head (M. Rei at TUT)
Co-operation with industry
• Several projects in the past:
  • EMT, ELION, Tele2: SpeechDat database recordings
  • Skype: speech quality assessment
• Industry is ready to buy an (almost) complete solution, but is not willing to invest in research
• Don't try to sell lab prototypes: there is a huge gap between a prototype and a real application
• Covering that gap requires a lot of funding for the development phase!
Thanks!