Audient: An Acoustic Search Engine
By Ted Leath
Supervisor: Prof. Paul Mc Kevitt
School of Computing and Intelligent Systems
Faculty of Engineering
University of Ulster, Magee
Food for Thought
Existing SDR Systems
• Involve the production of intermediate text for the purposes of indexing, searching and retrieval
• Require a high level of semantic processing for word recognition
• Have a limited vocabulary
• Have a high word recognition error rate
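The last bullet refers to word error rate (WER), conventionally computed as the word-level edit distance between a reference transcript and the recogniser output, divided by the number of reference words. The sketch below illustrates that standard calculation only; the function name and the example sentences are hypothetical and are not taken from any system discussed in these slides.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("the" -> "a") and one deletion ("ulster").
wer = word_error_rate("find the news about ulster", "find a news about")
print(f"WER = {wer:.2f}")
```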
Things can be done differently!
Non-word Representations of Speech
• Could be features of the audio signal
• Could be phonemes
Phonemic and Phonogrammic Streams
Phonogrammic streams are orthographical representations of phonemic streams. This abstraction is ancient, and partially inherent in the English alphabet.
Figure: Egyptian hieroglyphs with semantic and phonetic value. Ref. http://www.omniglot.com/writing/egyptian.htm
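As a rough illustration of the distinction, a phonemic stream can be thought of as a timed sequence of recognised phones, and a phonogrammic stream as its written form. The sketch below assumes ARPAbet-style labels as the orthography and invents the `Phone` record and the timings; the slides do not specify the notation Audient uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phone:
    symbol: str   # ARPAbet-style label (an assumed notation)
    start: float  # start time in seconds (invented for the example)
    end: float    # end time in seconds


def to_phonogrammic(phones):
    """Write a phonemic stream out as a space-separated symbol string."""
    return " ".join(p.symbol for p in phones)


# Hypothetical recogniser output for the word "search".
phones = [Phone("S", 0.00, 0.09), Phone("ER", 0.09, 0.21), Phone("CH", 0.21, 0.30)]
stream = to_phonogrammic(phones)
print(stream)
```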
Project Goals
• Create a unique alternative to existing word-based LVCSR speech retrieval systems along with potential tools for future cognitive and philosophical investigation
• Develop a speech-centric model which uses standards-based phonogrammic streams as primary internal data representation
• Allow both text and non-lexical phonemic audio queries of varying length
• Test against audio corpora used in the evaluation of other Information Retrieval (IR) systems
Previous Research/Systems
• TREC
  – The Informedia projects at Carnegie Mellon University
  – The Video Mail Retrieval and Multimedia Document Retrieval projects at Cambridge University
  – The SCAN system at AT&T Research
  – The THISL project at Sheffield University
• SpeechBot and NPR Online
  – Public Internet Search Sites
• The National Gallery of the Spoken Word
• BBN Rough ‘n’ Ready
• FastTalk
SDR System Comparison Chart
Audient System Architecture
[Figure: system architecture diagram. Recoverable labels: Audio Input; Speech; Non-speech; Non-speech Processing; Phonetic and temporal abstraction; Phonemic Stream Abstraction/Construction; Phonogrammic and Temporal Information; Indexing; Indexed Data; Database; Text Queries; Speech queries; Query construction; Formatted Query; Query response]
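One way to picture the indexing and query path in the diagram is an inverted index keyed on short runs of phone symbols, with each posting carrying a location and a time offset. This is only a sketch under stated assumptions: the n-gram scheme, the function names `build_index` and `search`, the clip identifiers and the timings are all hypothetical, and the slides do not say how Audient indexes its streams.

```python
from collections import defaultdict


def build_index(documents, n=3):
    """Map each phone n-gram to a list of (doc_id, start_time) postings."""
    index = defaultdict(list)
    for doc_id, phones in documents.items():
        for i in range(len(phones) - n + 1):
            gram = tuple(symbol for symbol, _ in phones[i:i + n])
            index[gram].append((doc_id, phones[i][1]))
    return index


def search(index, query_symbols, n=3):
    """Rank documents by how many distinct query n-grams they contain."""
    matched = defaultdict(set)  # doc_id -> distinct query n-grams found
    first_time = {}             # doc_id -> earliest matching start time
    for i in range(len(query_symbols) - n + 1):
        gram = tuple(query_symbols[i:i + n])
        for doc_id, start in index.get(gram, []):
            matched[doc_id].add(gram)
            first_time[doc_id] = min(start, first_time.get(doc_id, start))
    ranked = sorted(matched, key=lambda d: (-len(matched[d]), d))
    return [(d, len(matched[d]), first_time[d]) for d in ranked]


# Invented data: (symbol, start time in seconds) pairs per audio clip.
documents = {
    "clip_a": [("S", 0.0), ("ER", 0.1), ("CH", 0.2), ("EH", 0.3),
               ("N", 0.4), ("JH", 0.5), ("AH", 0.6), ("N", 0.7)],
    "clip_b": [("S", 0.0), ("ER", 0.1), ("V", 0.2), ("AH", 0.3), ("S", 0.4)],
}
index = build_index(documents)
results = search(index, ["ER", "CH", "EH", "N"])
print(results)
```

Each result is a (location, score, time offset) triple, mirroring the location and temporal information that the diagram attaches to the indexed data.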
Core Modules
[Figure: core module diagram. Modules: Phonemic Recognition and Abstraction; Stream to Speech; Text to Stream; Queries and Table Input; Create Translation Table; Audio Stream Replay. Recoverable data-flow labels: Digitised Audio Stream; Synthetic Speech; Location and Temporal Reference; Phonogrammic Stream; Phonogrammic Streams, Location, Temporal Information and Indexing; Translation Table; Phonogrammic Text; Match Request; Match Answer; Query Text; Converted Phonogrammic Stream; Query Result]
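The Text to Stream module in the diagram turns query text into a phonogrammic stream. A minimal sketch of that idea, assuming a simple pronunciation lookup, is given below. The lexicon `TOY_LEXICON`, the ARPAbet-style symbols and the function name `text_to_stream` are illustrative assumptions; the slides do not describe how the module is implemented.

```python
# Toy pronunciation lexicon (hypothetical): word -> phone symbols.
TOY_LEXICON = {
    "search": ["S", "ER", "CH"],
    "engine": ["EH", "N", "JH", "AH", "N"],
}


def text_to_stream(text, lexicon=TOY_LEXICON):
    """Convert query text to a flat list of phone symbols via lookup."""
    symbols = []
    for word in text.lower().split():
        if word not in lexicon:
            # A real module would need a fallback for unknown words;
            # this sketch simply reports the gap.
            raise KeyError(f"no pronunciation for {word!r}")
        symbols.extend(lexicon[word])
    return symbols


query_stream = text_to_stream("Search engine")
print(" ".join(query_stream))
```

A stream produced this way has the same form as one derived from audio, so text and speech queries could share a single matching step.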
Proposed Tools
• The Hidden Markov Model Toolkit (HTK)
• Linux and C++
• Festival
• VoiceXML and the SGML Family
• The Apache Web Server
Project Schedule
ID | Task Name | Start | End | Duration
1 | Literature Survey | 01/08/2002 | 01/08/2003 | 262d
2 | Write up literature review | 20/06/2003 | 19/02/2004 | 175d
3 | Selection, installation and integration of tools | 17/06/2003 | 18/12/2003 | 133d
4 | Construct Phonemic Recognition and Abstraction module | 18/12/2003 | 18/03/2004 | 66d
5 | Construct Stream to Speech module | 18/03/2004 | 17/06/2004 | 66d
6 | Test and refine modules | 17/06/2004 | 16/07/2004 | 22d
7 | Construct Text to Stream module | 16/07/2004 | 18/10/2004 | 67d
8 | Test and refine modules | 18/10/2004 | 17/11/2004 | 23d
9 | Construct Queries and Table Input module | 17/11/2004 | 15/02/2005 | 65d
10 | Construct Create Translation Table module | 15/02/2005 | 18/05/2005 | 67d
11 | Construct Audio Stream Replay module | 18/05/2005 | 18/08/2005 | 67d
12 | Integrate and test core modules | 19/07/2004 | 16/12/2005 | 370d
13 | Test core modules against other IR systems using corpora and optimise | 18/08/2005 | 17/03/2006 | 152d
14 | Populate index and demonstrate | 17/03/2006 | 22/06/2006 | 70d
15 | Incorporate search engine elements | 22/06/2006 | 25/10/2006 | 90d
16 | Finish thesis | 14/06/2006 | 29/05/2007 | 250d
Conclusion
• Create a unique alternative to existing word-based LVCSR speech retrieval systems along with potential tools for future cognitive and philosophical investigation
• Develop a speech-centric model which uses standards-based phonogrammic streams as primary internal data representation
• Allow both text and non-lexical phonemic audio queries of varying length
• Test against audio corpora used in the evaluation of other Information Retrieval (IR) systems
Applications
• Searching, indexing and retrieval of Internet audio and video files
• Searching, indexing and retrieval of broadcast media
• Services for the blind
• Library services
• Surveillance and intelligence gathering
• Voice mail
• Audio mining
• Trend analysis (topic detection and tracking)