Building Ubiquitous and Robust Speech and Natural Language Interfaces

Gary Geunbae Lee, Ph.D., Professor, Dept. of CSE, POSTECH

Contents, PART-I: Statistical Speech/Language Processing (60 min): Natural Language Processing short intro


1. Semantic Representation
• Semantic frame (frame and slot/value structure) [Gildea and Jurafsky, 2002]
– An intermediate semantic representation that serves as the interface between the user and the dialog system
– Each frame contains several typed components called slots; the type of a slot specifies what kind of fillers it expects.
• Example: "Show me flights from Seattle to Boston"

  <frame name='ShowFlight' type='void'>
    <slot type='Subject'>FLIGHT</slot>
    <slot type='Flight'>
      <slot type='DCity'>SEA</slot>
      <slot type='ACity'>BOS</slot>
    </slot>
  </frame>

• Hierarchical view: ShowFlight → Subject: FLIGHT; Flight → Departure_City: SEA, Arrival_City: BOS
• Semantic representation on the ATIS task: XML format and hierarchical representation [Wang et al., 2005]
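As a rough aside (not from the slides themselves), the ShowFlight frame above maps naturally onto a nested data structure; the Python sketch below simply mirrors the slot names in the XML, and the accessor helper is a hypothetical convenience:

  # Minimal sketch: the ATIS ShowFlight frame as a nested Python dict.
  show_flight = {
      "name": "ShowFlight",
      "slots": {
          "Subject": "FLIGHT",
          "Flight": {"DCity": "SEA",    # departure city
                     "ACity": "BOS"},   # arrival city
      },
  }

  def slot(frame, *path):
      """Follow a path of slot names, e.g. slot(show_flight, "Flight", "DCity")."""
      node = frame["slots"]
      for name in path[:-1]:
          node = node[name]
      return node[path[-1]]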

2. Knowledge-based Systems
• Knowledge-based systems:
– Developers write a syntactic/semantic grammar
– A robust parser analyzes the input text with the grammar
– No large amount of training data is required
• Previous works
– MIT: TINA (natural language understanding) [Seneff, 1992]
– CMU: PHOENIX [Pellom et al., 2000]
– SRI: GEMINI [Dowding et al., 1993]
• Disadvantages
1) Grammar development is an error-prone process
2) It takes multiple rounds to fine-tune a grammar
3) Combined linguistic and engineering expertise is required to construct a grammar with good coverage and optimized performance
4) Such a grammar is difficult and expensive to maintain

3. Statistical Systems
• Statistical SLU approaches:
– The system can automatically learn from example sentences with their corresponding semantics
– The annotations are much easier to create and do not require specialized knowledge
• Previous works
– Microsoft: HMM/CFG composite model [Wang et al., 2005]
– AT&T: CHRONUS (finite-state transducers) [Levin and Pieraccini, 1995]
– Cambridge Univ.: Hidden Vector State model [He and Young, 2005]
– POSTECH: Semantic frame extraction using statistical classifiers [Eun et al., 2004; Eun et al., 2005; Jeong and Lee, 2006]
• Disadvantages
1) Data-sparseness problem: the system requires a large corpus
2) Lack of domain knowledge

4. Reducing the Effort of Human Annotation
• Active + semi-supervised learning for SLU [Tur et al., 2005]
– Start from a small set of labeled samples and a pool of raw data, dividing the raw data into two sets: S_raw = S_active + S_semi
– Train a model on the labeled data, predict labels for the raw data, and estimate a confidence for each prediction
– Samples below a confidence threshold go to active learning (human annotation); samples above it pass the filter with their machine labels, augmenting the labeled data (see the sketch below)
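A hedged sketch of that confidence-based split; the classifier, bag-of-words features, and threshold below are illustrative stand-ins rather than the exact choices of [Tur et al., 2005]:

  from sklearn.feature_extraction.text import CountVectorizer
  from sklearn.linear_model import LogisticRegression

  def split_raw(labeled_texts, labels, raw_texts, threshold=0.9):
      """Route raw utterances to S_active (human labels) or S_semi (machine labels)."""
      vec = CountVectorizer()
      model = LogisticRegression(max_iter=1000).fit(
          vec.fit_transform(labeled_texts), labels)
      s_active, s_semi = [], []
      for text in raw_texts:
          probs = model.predict_proba(vec.transform([text]))[0]
          if probs.max() < threshold:
              s_active.append(text)    # low confidence: send to a human annotator
          else:                        # high confidence: keep the machine label
              s_semi.append((text, model.classes_[probs.argmax()]))
      return s_active, s_semi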

5. Semantic Frame Extraction
• Semantic frame extraction (~ an information extraction approach)
1) Dialog act / main action identification ~ classification
2) Frame-slot object extraction ~ named entity recognition
3) Object-attribute attachment ~ relation extraction
– 1) + 2) + 3) ~ unification
• Example frames:
– "How to get to DisneyWorld?" → Domain: Navigation; Dialog Act: WH-question; Main Action: Search; Object.Location.Destination = DisneyWorld
– "I like DisneyWorld." → Domain: Chat; Dialog Act: Statement; Main Action: Like; Object.Location = DisneyWorld
• Overall architecture of the semantic analyzer: feature extraction/selection feeds information-source identification, dialog act identification, frame-slot extraction, and relation extraction, whose outputs are combined by unification

6. Frame-Slot Object Extraction
• Frame-slot extraction ~ NER = a sequence labeling problem
• A probabilistic model for sequence labeling inference: Conditional Random Fields [Lafferty et al., 2001]
– A CRF is an undirected graphical model: a chain of labels y_{t-1}, y_t, y_{t+1} conditioned on the observations x_{t-1}, x_t, x_{t+1}
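The slide leaves the model definition implicit; for reference, the linear-chain CRF of [Lafferty et al., 2001] defines the conditional probability of a label sequence $y$ given an observation sequence $x$ as

$$p(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_{t=1}^{T} \sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, x, t)\Big)$$

where the $f_k$ are feature functions, the $\lambda_k$ their learned weights, and $Z(x)$ normalizes over all label sequences.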

7. Long-distance Dependency in NER
• Compare:
– "fly from denver to chicago on dec. 10th 1999" → "dec." is DEPART.MONTH
– "return from denver to chicago on dec. 10th 1999" → "dec." is RETURN.MONTH
– The correct label of "dec." depends on a word ("fly" vs. "return") far from it in the sentence.
• A solution: trigger-induced CRF [Jeong and Lee, 2006]
– The basic idea is to add only the bundle of (trigger) features that increases the log-likelihood of the training data
– The gain used to evaluate candidate (trigger) features is measured with the Kullback-Leibler divergence (a standard formulation is sketched below)
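The gain formula itself is not reproduced on the slide. In feature-induction methods of this family, the gain of a candidate trigger feature $g$ with weight $\mu$ is typically the best achievable improvement in training log-likelihood, which equals the reduction in KL divergence between the empirical distribution $\tilde{p}$ and the model $p$:

$$\mathrm{Gain}(g) \,=\, \max_{\mu}\, L(p_{g,\mu}) - L(p) \,=\, D(\tilde{p} \,\|\, p) - \min_{\mu} D(\tilde{p} \,\|\, p_{g,\mu})$$

This is the standard formulation; the exact criterion used in [Jeong and Lee, 2006] may differ in its details.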

8. References (1/2)
• J. Dowding, J. M. Gawron, D. Appelt, J. Bear, L. Cherny, R. Moore, and D. Moran. 1993. Gemini: A natural language system for spoken language understanding. ACL, 54-61.
• J. Eun, C. Lee, and G. G. Lee. 2004. An information extraction approach for spoken language understanding. ICSLP.
• J. Eun, M. Jeong, and G. G. Lee. 2005. A multiple classifier-based concept-spotting approach for robust spoken language understanding. Interspeech 2005-Eurospeech.
• D. Gildea and D. Jurafsky. 2002. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245-288.
• Y. He and S. Young. 2005. Semantic processing using the Hidden Vector State model. Computer Speech and Language, 19(1):85-106.
• M. Jeong and G. G. Lee. 2006. Exploiting non-local features for spoken language understanding. COLING/ACL.
• J. Lafferty, A. McCallum, and F. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. ICML.

9. References (2/2)
• E. Levin and R. Pieraccini. 1995. CHRONUS, the next generation. In Proceedings of the 1995 ARPA Spoken Language Systems Technical Workshop, 269-271, Austin, Texas.
• B. Pellom, W. Ward, and S. Pradhan. 2000. The CU Communicator: An architecture for dialogue systems. ICSLP.
• R. E. Schapire, M. Rochery, M. Rahim, and N. Gupta. 2002. Incorporating prior knowledge into boosting. ICML, 538-545.
• S. Seneff. 1992. TINA: A natural language system for spoken language applications. Computational Linguistics, 18(1):61-86.
• G. Tur, D. Hakkani-Tur, and R. E. Schapire. 2005. Combining active and semi-supervised learning for spoken language understanding. Speech Communication, 45:171-186.
• Y. Wang, L. Deng, and A. Acero. 2005. Spoken language understanding: An introduction to the statistical framework. IEEE Signal Processing Magazine, 27(5).

10. Contents
• PART-I: Statistical Speech/Language Processing
– Natural Language Processing: short intro
– Automatic Speech Recognition
– (Spoken) Language Understanding
• PART-II: Technology of Spoken Dialog Systems
– Spoken Dialog Systems
– Dialog Management
– Dialog Studio
– Information Access Dialog
– Emotional & Context-sensitive Chatbot
– Multi-modal Dialog
– Conversational Text-to-Speech
• PART-III: Statistical Machine Translation
– Statistical Machine Translation
– Phrase-based SMT
– Speech Translation

11. Demo: Dialog for EPG (POSTECH); Unified Chatting and Goal-oriented Dialog (POSTECH)

12. Spoken Dialog System
• The pipeline, on a flight-reservation example (a code sketch follows):
– User speech: "I need a flight from Washington DC to Denver roundtrip"
– Automatic Speech Recognition (ASR) → recognized sentence
– Spoken Language Understanding (SLU) → semantic meaning: ORIGIN_CITY: WASHINGTON; DESTINATION_CITY: DENVER; FLIGHT_TYPE: ROUNDTRIP
– Dialog Management (DM) → system action: GET DEPARTURE_DATE
– Response Generation (RG), driven by models and rules → system speech: "Which date do you want to fly from Washington to Denver?"
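As illustrative glue code only (the four component classes are hypothetical stand-ins, not a real toolkit API), one turn of the pictured loop could look like:

  class DialogSystem:
      def __init__(self, asr, slu, dm, rg):
          self.asr, self.slu, self.dm, self.rg = asr, slu, dm, rg

      def turn(self, user_audio):
          text = self.asr.recognize(user_audio)   # "I need a flight from Washington DC to Denver roundtrip"
          frame = self.slu.understand(text)       # {ORIGIN_CITY: WASHINGTON, DESTINATION_CITY: DENVER, ...}
          action = self.dm.next_action(frame)     # GET DEPARTURE_DATE
          return self.rg.generate(action)         # "Which date do you want to fly from Washington to Denver?"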

13. VoiceXML-based System
• What is VoiceXML?
– The HTML (XML) of the voice web [W3C, working draft]
– The open standard markup language for voice applications
• Can do
– Rapid implementation and management
– Integration with the World Wide Web
– Mixed-initiative dialog
– Touch-tone (push-button) input over the telephone
– A simple dialog implementation solution
• VoiceXML dialogs are built from
– <menu> and <form> (similar to a slot-filling system)
• Limiting the user's responses
– Verification, and help for invalid responses
– Good speech recognition accuracy

14. Example: <form>
Browser: Please say your complete phone number
User: 800-555-1212
Browser: Please say your PIN code
User: 1 2 3 4

  <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
    <form id="login">
      <field name="phone_number" type="phone">
        <prompt> Please say your complete phone number </prompt>
      </field>
      <field name="pin_code" type="digits">
        <prompt> Please say your PIN code </prompt>
      </field>
      <block>
        <submit next="http://www.example.com/servlet/login"
                namelist="phone_number pin_code"/>
      </block>
    </form>
  </vxml>

15. Frame-based Approach
• Frame-based system [McTear, 2004]
– Asks the user questions to fill slots in a template in order to perform a task (a form-filling task)
– Permits the user to respond more flexibly to the system's prompts (as in Example 2)
– Recognizes the main concepts in the user's utterance
Example 1)
• System: What is your destination?
• User: London.
• System: What day do you want to travel?
• User: Friday.
Example 2)
• System: What is your destination?
• User: London on Friday around 10 in the morning.
• System: I have the following connection ...
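A minimal sketch of the form-filling control loop this describes, assuming a hypothetical understand() SLU call that returns a {slot: value} dict; because every slot found in a reply is accepted, an over-informative answer like Example 2 fills several slots in one turn:

  PROMPTS = {"destination": "What is your destination?",
             "day": "What day do you want to travel?",
             "time": "What time do you want to travel?"}

  def form_filling_dialog(understand, ask):
      slots = {name: None for name in PROMPTS}
      while any(v is None for v in slots.values()):
          # Prompt for the first unfilled slot ...
          empty = next(name for name, v in slots.items() if v is None)
          reply = ask(PROMPTS[empty])
          # ... but accept any slots the SLU finds in the reply.
          slots.update({k: v for k, v in understand(reply).items() if k in slots})
      return slots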

16. Agent-Based Approach
• Properties [Allen et al., 1996]
– Complex communication using unrestricted natural language
– Mixed initiative
– Co-operative problem solving
– Theorem proving, planning, distributed architectures
– Conversational agents
• An example
User: I'm looking for a job in the Calais area. Are there any servers?
System: No, there aren't any employment servers for Calais. However, there is an employment server for Pas-de-Calais and an employment server for Lille. Are you interested in one of these?
• The system attempts to provide a more co-operative response that might address the user's needs.

17. Galaxy Communicator Framework
• The Galaxy Communicator software infrastructure is a distributed, message-based, hub-and-spoke infrastructure optimized for constructing spoken dialog systems [Bayer et al., 2001]
• An open-source architecture for constructing dialog systems
– History: MIT Galaxy system → developed and maintained by MITRE
• Message-passing protocol; hub-and-clients architecture

18. References (1/2)
• J. F. Allen, B. Miller, E. Ringger, and T. Sikorski. 1996. A robust system for natural spoken dialogue. ACL.
• S. Bayer, C. Doran, and B. George. 2001. Dialogue interaction with the DARPA Communicator infrastructure: The development of useful software. HLT Research.
• R. Cole, editor. 1997. Survey of the State of the Art in Human Language Technology. Cambridge University Press, New York, NY, USA.
• G. Ferguson and J. F. Allen. 1998. TRIPS: An integrated intelligent problem-solving assistant. AAAI, 26-30.
• K. Komatani, F. Adachi, S. Ueno, T. Kawahara, and H. Okuno. 2003. Flexible spoken dialogue system based on user models and dynamic generation of VoiceXML scripts. SIGDIAL.
• S. Larsson and D. Traum. 2000. Information state and dialogue management in the TRINDI Dialogue Move Engine Toolkit. Natural Language Engineering, 6(3-4).
• S. Lang, M. Kleinehagenbrock, S. Hohenner, J. Fritsch, G. A. Fink, and G. Sagerer. 2003. Providing the basis for human-robot interaction: A multi-modal attention system for a mobile robot. ICMI, 28-35.

19. References (2/2)
• E. Levin, R. Pieraccini, and W. Eckert. 2000. A stochastic model of human-machine interaction for learning dialog strategies. IEEE Transactions on Speech and Audio Processing, 8(1):11-23.
• C. Lee, S. Jung, J. Eun, M. Jeong, and G. G. Lee. 2006. A situation-based dialogue management using dialogue examples. ICASSP.
• M. Walker, L. Hirschman, and J. Aberdeen. 2000. Evaluation for DARPA Communicator spoken dialogue systems. LREC.
• M. F. McTear. 2004. Spoken Dialogue Technology. Springer.
• I. O'Neill, P. Hanna, X. Liu, D. Greer, and M. McTear. 2005. Implementing advanced spoken dialogue management in Java. Science of Computer Programming, 54(1):99-124.
• B. Pellom, W. Ward, and S. Pradhan. 2000. The CU Communicator: An architecture for dialogue systems. ICSLP.
• A. Rudnicky, E. Thayer, P. Constantinides, C. Tchou, R. Shern, K. Lenzo, W. Xu, and A. Oh. 1999. Creating natural dialogs in the Carnegie Mellon Communicator system. Eurospeech, 4, 1531-1534.
• W3C. Voice Extensible Markup Language (VoiceXML) Version 2.0, Working Draft. http://www.w3c.org/TR/voicexml20/

20. Contents
• PART-I: Statistical Speech/Language Processing
– Natural Language Processing: short intro
– Automatic Speech Recognition
– (Spoken) Language Understanding
• PART-II: Technology of Spoken Dialog Systems
– Spoken Dialog Systems
– Dialog Management
– Dialog Studio
– Information Access Dialog
– Emotional & Context-sensitive Chatbot
– Multi-modal Dialog
– Conversational Text-to-Speech
• PART-III: Statistical Machine Translation
– Statistical Machine Translation
– Phrase-based SMT
– Speech Translation

21. The Role of Dialog Management
• For example, in the flight reservation system:
– System: Welcome to the Flight Information Service. Where would you like to travel to?
– Caller: I would like to fly to London on Friday, arriving around 9 in the morning.
– System: There is a flight that departs at 7:45 a.m. and arrives at 8:50 a.m.
• In order to process this utterance, the system has to engage in the following processes:
1) Recognize the words that the caller said (speech recognition)
2) Assign a meaning to these words (language understanding)
3) Determine how the utterance fits into the dialog so far and decide what to do next (dialog management)

22. Information State Update Approach
• Rule-based DM (Larsson and Traum, 2000)
• A method of specifying a dialogue theory that makes it straightforward to implement
• Consists of the following five constituents:
– Information components
– Including aspects of common context
– (e.g., participants, common ground, linguistic and intentional structure, obligations and commitments, beliefs, intentions, user models, etc.)
– Formal representations
– How to model the information components
– (e.g., as lists, sets, typed feature structures, records, etc.)

23. Information State Approach (continued)
– Dialogue moves
– Trigger the update of the information state
– Are correlated with externally performed actions
– Update rules
– Govern the updating of the information state
– Update strategy
– Decides which rules to apply at a given point from the set of applicable ones (a toy sketch follows)
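A toy sketch of these constituents (every structure here is invented for illustration): the information state is a dict, dialogue moves trigger update rules, and the update strategy simply applies the first rule whose precondition holds:

  state = {"common_ground": [], "obligations": [], "last_move": None}

  def rule_answer_obligation(state, move):
      """If the user asked something, record an obligation to answer."""
      if move["type"] == "ask":
          state["obligations"].append(("answer", move["content"]))
          return True
      return False

  UPDATE_RULES = [rule_answer_obligation]   # ordered list of update rules

  def apply_move(state, move):
      state["last_move"] = move
      for rule in UPDATE_RULES:
          if rule(state, move):   # update strategy: first applicable rule wins
              break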

24. Example Dialogue (figure)

25. Example Dialogue (figure, continued)

26. The Hand-crafted Dialog Model is Not Domain Portable
• A tree branches for every possible situation, so it can become very complex. For example, from a Start state the flight-information tree fans out through every combination of filled slots: Information + Origin; Information + Destination; Information + Date; Information + Origin + Dest.; Information + Origin + Date; Information + Dest. + Date; Information + Origin + Dest. + Date; then Flight #, Flight # + Date, and so on up to Reservation.

27. An Optimization Problem
• Dialog management as an optimization problem
– Optimization goal: achieve the application goal while minimizing a cost function (objective function)
– In general: minimize the number of user-system turns and DB accesses until all slots are filled
– Simple example, the month-and-day problem: design a dialog system that gets a correct date (month and day) from a user through the shortest possible interaction
– Objective function: $C_D = \omega_i \cdot \#\text{interactions} + \omega_e \cdot \#\text{errors} + \omega_f \cdot \#\text{unfilled slots}$
• How to formalize this mathematically? The Markov Decision Process (MDP)

28. Mathematical Formalization
• Markov Decision Process (MDP) (Levin et al., 2000)
– Problems with a cost (or reward) objective function are well modeled as a Markov Decision Process
– The specification of a sequential decision problem for a fully observable environment that satisfies the Markov assumption and yields additive rewards
– The dialog manager observes the dialog state, emits a dialog action (prompts, queries, etc.) toward the environment (user, external DB, or other servers), and incurs a cost (turns, errors, DB accesses, etc.)
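In standard MDP notation (not spelled out on the slide), the dialog manager seeks a strategy $\pi$ mapping dialog states to actions that minimizes the expected cumulative cost:

$$\pi^{*} = \arg\min_{\pi}\; \mathbb{E}\Big[\sum_{t} c\big(s_t, \pi(s_t)\big)\Big]$$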

29. Month and Day Example
• Strategy 1: say "Good bye" immediately.
  $C_1 = \omega_i \cdot 1 + \omega_f \cdot 2$
• Strategy 2: ask "Which date?" (month and day at once, per-slot error probability $P_1$), then "Good bye".
  $C_2 = \omega_i \cdot 2 + \omega_e \cdot 2 P_1 + \omega_f \cdot 0$
• Strategy 3: ask "Which day?" and "Which month?" separately (per-slot error probability $P_2$), then "Good bye".
  $C_3 = \omega_i \cdot 3 + \omega_e \cdot 2 P_2 + \omega_f \cdot 0$
• The optimal strategy is the one that minimizes the cost:
– Strategy 1 is optimal if $\omega_i + P_2\, \omega_e - \omega_f > 0$ → the recognition error rate is too high
– Strategy 3 is optimal if $2 (P_1 - P_2)\, \omega_e - \omega_i > 0$ → $P_1$ is much higher than $P_2$, which justifies the cost of a longer interaction

30. POMDP (Young, 2002)
• Partially Observable Markov Decision Process (POMDP)
– A POMDP extends the Markov Decision Process by removing the requirement that the system knows its current state precisely.
– Instead, the system makes observations about the outside world which give incomplete information about the true current state.
– Belief state: a distribution $b(s)$ over MDP states, maintained in the absence of knowing the state exactly.
• The MDP reward $r(s,a)$ becomes an expected reward over the belief, and the belief is updated after action $a$ and observation $o$:

$$\rho(b, a) = \sum_{s \in S} b(s)\, r(s, a)$$

$$b'(s') = p(s' \mid o, a, b) = \frac{p(o \mid s', a) \sum_{s \in S} p(s' \mid a, s)\, b(s)}{p(o \mid a, b)}$$
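The belief update transcribes directly into code; this sketch assumes the transition and observation probabilities for the chosen action and observation are given as plain dicts:

  def update_belief(belief, transition, observe):
      """belief[s] = b(s); transition[s][s2] = p(s2|s,a); observe[s2] = p(o|s2,a)."""
      states2 = transition[next(iter(transition))].keys()
      new_belief = {s2: observe[s2] * sum(transition[s][s2] * belief[s]
                                          for s in belief)
                    for s2 in states2}
      norm = sum(new_belief.values())      # = p(o | a, b)
      return {s2: v / norm for s2, v in new_belief.items()}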

31. Example-based Dialog Model Learning (Lee et al., 2006)
• Example-based dialog modeling
– Automatically modeled from a dialog corpus
– Example-based techniques using a dialog example database (DEDB)
– This model is simple and domain portable
• DEDB indexing and searching
– Query keys: user intention, semantic frames, discourse history
• Tie-breaking by utterance similarity measures (sketched below)
– Lexico-semantic similarity: normalized edit distance
– Discourse history similarity: cosine similarity
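A sketch of the two tie-breaking measures named above, over token lists and binary vectors respectively (generic implementations, not taken from the paper):

  import math

  def normalized_edit_distance(a, b):
      """Levenshtein distance over token lists, divided by the longer length."""
      d = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)]
           for i in range(len(a) + 1)]
      for i in range(1, len(a) + 1):
          for j in range(1, len(b) + 1):
              d[i][j] = min(d[i-1][j] + 1, d[i][j-1] + 1,
                            d[i-1][j-1] + (a[i-1] != b[j-1]))
      return d[len(a)][len(b)] / max(len(a), len(b), 1)

  def cosine_similarity(u, v):
      """Cosine similarity between two slot-filling / history vectors."""
      dot = sum(x * y for x, y in zip(u, v))
      norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
      return dot / norm if norm else 0.0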

32. Example-based Dialog Modeling
• Indexing and querying
– Semantic-based indexing for the dialog example database; a lexical-based example database would need many more examples
– The SLU result is the most important index key
– Indexed automatically from the dialog corpus
• Example, input user utterance: 그럼 SBS 드라마는 언제 하지? (When is the SBS drama showing?)
– Index keys: Dialog Act = Wh-question; Main Action = Search_start_time; Component Slots = [channel = SBS, genre = drama]; Discourse History = [1,0,1,0,0,0,0,0,0]
– Output system concept: System Action = Inform(date, start_time, program)

33. Example-based Dialog Modeling
• Tie-breaking: lexico-semantic representation
– User utterance: 그럼 SBS 드라마는 언제 하지? (When is the SBS drama showing?)
– Component slots: [channel = SBS, genre = 드라마 (drama)]
– Lexico-semantic representation: 그럼 [channel] [genre] 는 언제 하 지
• Utterance similarity measure
– Current user utterance: 그럼 [channel] [genre] 는 언제 하 지, slot-filling vector [1,0,1,0,0,0,0,0,0]
– Retrieved example: [date] [genre] 는 몇 시에 하 니, slot-filling vector [1,0,0,1,0,0,0,0,0]
– Compared by lexico-semantic similarity and discourse history similarity

34. Strategy of Example-based Dialog Modeling
• Offline: a domain expert's dialogue corpus is automatically indexed into the dialogue example DB by user intention, semantic frame, and discourse history.
• Online: the user's utterance generates a query; retrieval from the example DB is ranked by utterance similarity (lexico-semantic similarity and discourse history similarity); tie-breaking selects the best dialogue example, from which the system response is produced.

35. Multi-domain/genre Dialog Expert
• USER: "What is on TV now?"
– Dialog act identification → Dialog Act = Wh-question
– Agent spotter → Agent = Task; domain spotter → Domain = EPG
– Frame-slot extraction (EPG) → Main Action = Search_Program; Start_Time = now
• The EPG expert queries its DEDB (indexed from the EPG dialog corpus) through an inference manager, using the discourse history stack (previous user utterance, previous dialog act and semantic frame, previous slot-filling vector) and calculating utterance similarity over the retrieved dialog examples.
• Content comes from the EPG database (TV schedule contents gathered from the web by an XML rule parser); when no example is retrieved, EPG meta-rules are used.
• SYSTEM: "XXX" is on SBS, ...

36. References
• S. Larsson and D. Traum. 2000. Information state and dialogue management in the TRINDI Dialogue Move Engine Toolkit. Natural Language Engineering, 6(3-4):323-340.
• E. Levin, R. Pieraccini, and W. Eckert. 2000. A stochastic model of human-machine interaction for learning dialog strategies. IEEE Transactions on Speech and Audio Processing, 8(1):11-23.
• S. Young. 2002. Talking to machines (statistically speaking). ICSLP, Denver.
• I. Lane and T. Kawahara. 2006. Verification of speech recognition results incorporating in-domain confidence and discourse coherence measures. IEICE Transactions on Information and Systems, 89(3):931-938.
• C. Lee, S. Jung, J. Eun, M. Jeong, and G. G. Lee. 2006. A situation-based dialogue management using dialogue examples. ICASSP.
• C. Lee, S. Jung, M. Jeong, and G. G. Lee. 2006. Chat and goal-oriented dialog together: A unified example-based architecture for multi-domain dialog management. Proceedings of the IEEE/ACL 2006 Workshop on Spoken Language Technology (SLT), Aruba.
• D. Litman and S. Pan. 1999. Empirically evaluating an adaptable spoken dialogue system. International Conference on User Modeling, 55-64.
• M. F. McTear. 2004. Spoken Dialogue Technology. Springer.
• I. O'Neill, P. Hanna, X. Liu, D. Greer, and M. McTear. 2005. Implementing advanced spoken dialogue management in Java. Science of Computer Programming, 54(1):99-124.
• M. Walker, D. Litman, C. Kamm, and A. Abella. 1997. PARADISE: A general framework for evaluating spoken dialogue agents. ACL/EACL, 271-280.

37. Contents
• PART-I: Statistical Speech/Language Processing
– Natural Language Processing: short intro
– Automatic Speech Recognition
– (Spoken) Language Understanding
• PART-II: Technology of Spoken Dialog Systems
– Spoken Dialog Systems
– Dialog Management
– Dialog Studio
– Information Access Dialog
– Emotional & Context-sensitive Chatbot
– Multi-modal Dialog
– Conversational Text-to-Speech
• PART-III: Statistical Machine Translation
– Statistical Machine Translation
– Phrase-based SMT
– Speech Translation

38. Dialog Workbench/Studio
• Motivation
– The biggest obstacle to using dialog systems in the field is that system maintenance is difficult
– Practical dialog systems need:
– Easy and fast dialog modeling to handle new patterns of dialog
– Easy build-up of new information sources (a TV-guide domain needs a new TV schedule every day)
– Reduced human effort for maintenance: all dialog components should stay synchronized
– Easy tutoring of the system: semi-automatic learning ability is necessary, since humans can't teach everything
• Previous work
– Rapid application development: CSLU Toolkit [CSLU Toolkit]
– Schema design & management: SGStudio [Wang and Acero, 2005]
– Helping non-experts develop a user interface: SUEDE [Anoop et al., 2001]

39. Dialog Workbench
• Dialog Studio [Jung et al., 2006]
– A dialog workbench system for example-based spoken dialog systems
– Can do:
– Tutor the dialog system by adding and editing dialog examples
– Synchronize all dialog components: ASR + SLU + DM + information access
– Provide semi-automatic learning ability
– Reduce the human effort for building up or maintaining dialog systems
– Key idea:
– Generate possible dialog candidates from the corpus
– Predict the possible dialog tagging information using the current model
– A human approves or disapproves

40. Issue: "Human Effort Reduction"
• Tagging a new dialog example can be supported by the system using old models:
– The old dialog manager tries to handle the new dialog utterance and displays the result
– The dialog utterance pool (DUP) automatically generates candidate instances
– The administrator audits the DUP and modifies the instances
– The ASR and SLU models are then automatically retrained
• Flow: new corpus → dialog utterance pool (automatically generated example candidates) → recommendation → human audit & modification → dialog example editing → example-DB generation and indexing → ASR model, SLU model, example-based DM model

41. POSTECH Dialog Studio Demo

42. References (1/2)
• S. J. Cox and S. Dasmahapatra. 2000. A semantically-based confidence measure for speech recognition. ICSLP, Beijing.
• J. Eun, C. Lee, and G. G. Lee. 2004. An information extraction approach for spoken language understanding. ICSLP, Jeju, Korea.
• T. J. Hazen, J. Polifroni, and S. Seneff. 2002. Recognition confidence scoring and its use in speech language understanding systems. Computer Speech and Language, 16(1):49-67.
• T. J. Hazen, T. Burianek, J. Polifroni, and S. Seneff. 2000. Recognition confidence scoring for use in speech understanding systems. ISCA ASR2000 Tutorial and Research Workshop, Paris.
• H. Jiang. 2005. Confidence measures for speech recognition. Speech Communication, 45(4):455-470.
• S. Jung, C. Lee, and G. G. Lee. 2006. Three phase verification for spoken dialog system. IUI.

43. References (2/2)
• M. McTear, I. O'Neill, P. Hanna, and X. Liu. 2005. Handling errors and determining confirmation strategies: An object-based approach. Speech Communication, 45(3):249-269.
• I. O'Neill, P. Hanna, X. Liu, D. Greer, and M. McTear. 2005. Implementing advanced spoken dialogue management in Java. Science of Computer Programming, 54(1):99-124.
• T. Paek and E. Horvitz. 2000. Conversation as action under uncertainty. Sixteenth Conference on Uncertainty in Artificial Intelligence, 455-464.
• A. Ratnaparkhi. 1998. Maximum Entropy Models for Natural Language Ambiguity Resolution. Ph.D. dissertation, University of Pennsylvania.
• F. Torres, L. F. Hurtado, F. García, E. Sanchis, and E. Segarra. 2005. Error handling in a stochastic dialog system through confidence measures. Speech Communication, 45(3):211-229.

44. References
• K. S. Anoop, R. K. Scott, J. Chen, A. Landay, and C. Chen. 2001. SUEDE: Iterative, informal prototyping for speech interfaces. Video poster in Extended Abstracts of Human Factors in Computing Systems: CHI, Seattle, WA, 203-204.
• S. Jung, C. Lee, and G. G. Lee. 2006. Dialog Studio: An example-based spoken dialog system development workbench. Dialogs on Dialogs: Multidisciplinary Evaluation of Advanced Speech-based Interactive Systems, Interspeech 2006-ICSLP satellite workshop.
• Y. Wang and A. Acero. 2005. SGStudio: Rapid semantic grammar development for spoken language understanding. Eurospeech, Lisbon, Portugal.
• CSLU Toolkit. http://cslu.cse.ogi.edu/toolkit/

45. Contents
• PART-I: Statistical Speech/Language Processing
– Natural Language Processing: short intro
– Automatic Speech Recognition
– (Spoken) Language Understanding
• PART-II: Technology of Spoken Dialog Systems
– Spoken Dialog Systems
– Dialog Management
– Dialog Studio
– Information Access Dialog
– Emotional & Context-sensitive Chatbot
– Multi-modal Dialog
– Conversational Text-to-Speech
• PART-III: Statistical Machine Translation
– Statistical Machine Translation
– Phrase-based SMT
– Speech Translation

46. Information Access Dialog
• The dialog manager turns the user's question into a query against the information sources, and turns the returned result into an answer.

47. Information Access Agent
• Two modules: an RDB access module over a relational database, and a question answering module over the web.

48. Building a Relational DB from Unstructured Data
• A relational DB model is equivalent to an entity-relationship (ER) model
• We can build an ER model with the information extraction approach:
– Named-entity recognition (NER)
– Relation extraction
• This turns web text into rows of a relational database (toy illustration below)
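A toy illustration of the idea, using the entity and relation examples from the next two slides (the table and column names are invented for this sketch):

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE entity (id INTEGER PRIMARY KEY, text TEXT, type TEXT)")
  conn.execute("CREATE TABLE relation (head INTEGER, tail INTEGER, type TEXT)")
  # Rows an NER + relation-extraction pipeline might emit:
  conn.execute("INSERT INTO entity VALUES (1, 'Hillary Clinton', 'Person')")
  conn.execute("INSERT INTO entity VALUES (2, 'New York', 'Geo-Political Entity')")
  conn.execute("INSERT INTO relation VALUES (1, 2, 'AT.Residence')")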

49. Named-Entity Recognition
• Named-entity recognition (NER)
– A task that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, etc. [Chinchor, 1998]
• Example: "Hillary Clinton moved to New York last year." → Hillary Clinton = Person; New York = Geo-Political Entity

50. Relation Extraction
• Relation extraction
– A task that detects and classifies relations between named entities
• Example: "Hillary Clinton moved to New York last year." → Hillary Clinton (Person) and New York (Geo-Political Entity) stand in the AT.Residence relation

51. Question Answering
• A question answering system for information access dialog systems
– SiteQ [Lee et al., 2001; Lee and Lee, 2002]
– Searches for answers, not documents
• Pipeline: the question is POS-tagged, then feeds query formation and answer type identification; query formation drives document retrieval and dynamic passage selection; answer finding uses the identified answer type, and answer justification produces the final answer.

52. References (1/2)
• C. Blaschke, L. Hirschman, and A. Yeh. 2004. BioCreative Workshop.
• N. Chinchor. 1998. Overview of MUC-7/MET-2. MUC-7.
• N. Kambhatla. 2004. Combining lexical, syntactic and semantic features with Maximum Entropy models for extracting relations. ACL.
• E. Kim, Y. Song, C. Lee, K. Kim, G. G. Lee, B. Yi, and J. Cha. 2006. Two-phase learning for biological event extraction and verification. ACM TALIP, 5(1):61-73.
• J. Kim, T. Ohta, Y. Tsuruoka, and Y. Tateisi. 2003. GENIA corpus: A semantically annotated corpus for bio-textmining. Bioinformatics, 19(Suppl. 1):180-182.
• J. Lafferty, A. McCallum, and F. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labelling sequence data. ICML.
• G. G. Lee, J. Seo, S. Lee, H. Jung, B. H. Cho, C. Lee, B. Kwak, J. Cha, D. Kim, J. An, H. Kim, and K. Kim. 2001. SiteQ: Engineering high performance QA system using lexico-semantic pattern matching and shallow NLP. TREC-10.

53. References (2/2)
• S. Lee and G. G. Lee. 2002. SiteQ/J: A question answering system for Japanese. NTCIR Workshop 3 Meeting: Evaluation of Information Retrieval, Automatic Text Summarization and Question Answering, QA tasks.
• A. McCallum and W. Li. 2003. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. CoNLL.
• S. Soderland. 1999. Learning information extraction rules for semi-structured and free text. Machine Learning, 34:233-272.
• Y. Song, E. Kim, G. G. Lee, and B. Yi. 2005. POSBIOTM-NER: A trainable biomedical named-entity recognition system. Bioinformatics, 21(11):2794-2796.
• G. Zhou, J. Su, J. Zhang, and M. Zhang. 2005. Exploring various knowledge in relation extraction. ACL.

54. Contents
• PART-I: Statistical Speech/Language Processing
– Natural Language Processing: short intro
– Automatic Speech Recognition
– (Spoken) Language Understanding
• PART-II: Technology of Spoken Dialog Systems
– Spoken Dialog Systems
– Dialog Management
– Dialog Studio
– Information Access Dialog
– Emotional & Context-sensitive Chatbot
– Multi-modal Dialog
– Conversational Text-to-Speech
• PART-III: Statistical Machine Translation
– Statistical Machine Translation
– Phrase-based SMT
– Speech Translation

55. POSTECH Chatbot Demo

56. Emotion Recognition
• Emotion recognition
– USER: "I feel blue today."
– SYSTEM: "Do you need a cheer-up music?" / "What's up?"
• Why is emotion recognition important in dialog systems?
– Emotion is a part of the user context
– It has been recognized as one of the most significant factors in how people communicate with each other [T. Polzin, 2000]
– Applications: affective HCI (human-computer interfaces) such as home networking, intelligent robots, chatbots, ...

57. Traditional Emotion Recognition
• USER: "I am very happy."
• Three input channels are analyzed in parallel (facial expression analysis, speech analysis, and linguistic/text analysis), and a classifier combines them for the final emotion decision, producing the emotion hypothesis.

58. Emotional Categories
• Emotional categories by system type:
– Emotional speech DB: Positive (confident, encouraging, friendly, happy, interested); Negative (angry, anxious, bored, frustrated, sad, fear); Neutral. E.g., EPSaT (Emotional Prosody Speech and Transcription), SiTEC DB
– Call center: Positive vs. Non-Positive; or Anger, Fear, Satisfaction, Excuse, Neutral. E.g., HMIHY, stock exchange customer service center
– Tutoring system: Positive, Negative, Neutral. E.g., ITSpoke
– Chat messenger: Neutral, Happy, Sad, Surprise, Afraid, Disgusted, Bored, ...

59. Emotional Features
• Speech-to-emotion
– Acoustic correlates related to the prosody of speech, such as the pitch, energy, and speech rate of the utterance, have been used for recognizing emotions.
– Feature sets:
– Acoustic-prosodic: fundamental frequency (F0): max, min, mean, standard deviation; energy: max, min, mean, standard deviation; speaking rate: voiced frames / total frames
– Pitch contour: ToBI contour, nuclear pitch accent, phrase and boundary tones
– Voice quality: spectral tilt
– In general, the features extracted from speech play a significant role in recognizing emotion.

60. Emotional Features
• Text-to-emotion
– Basic idea: people tend to use specific words to express their emotions in spoken dialogs, because they have learned how some words relate to the corresponding emotions.
– Psychologists have tried to identify the language of emotions by asking people to list the English words that describe specific emotions; they identified emotional keywords in spoken language.
– It is highly domain dependent.
– Feature sets:
– Lexical: n-grams (unigram, bigram, trigram); non-speech human noise (laughter and sighs); filled pauses ("oh"); emotional keywords
– Pragmatic: dialog act, user identifier
– Context: past observations in the previous user turns

61. Classifier
• Basic algorithm within an emotion recognizer
– Very similar to text categorization and topic detection; most emotion detection systems use the same basic algorithms.
• Emotional keyword extraction
– Emotional salience (mutual information between a word and the emotion classes):

$$\mathrm{sal}(w_n) = I(E; W = w_n) = \sum_{j=1}^{k} P(e_j \mid w_n)\, i(w_n, e_j)$$

– TF*IDF with cosine normalization:

$$w_{i,j} = \frac{tf_{i,j} \times \log(N / n_i)}{\sqrt{\sum_{i=1}^{t} \big[\, tf_{i,j} \log(N / n_i) \big]^2}}$$

• Emotion classifier
– Feature-based classification combining several features, using statistical classification methodologies: linear discriminant classifier (LDC), k-nearest neighbor (k-NN), support vector machine (SVM)

62. Advanced ChatBot
• Context-sensitive chatbot
– Most chatbots cannot reflect the discourse context of previous user turns [ALICE, 2002; B. Shawar, 2002]
• Context-free chatbot (obscure answering):
USER: Did you watch a Korean team's game yesterday?
SYSTEM: What game?
USER: How come you didn't know that?
SYSTEM: Well, I don't know. What's better?
• Context-sensitive chatbot (more intelligent answering):
USER: Did you watch a Korean team's game yesterday?
SYSTEM: What game?
USER: How come you didn't know that?
SYSTEM: I am not interested in sports.

63. Discourse History in ChatBot
• Utterance 1: USER: "I had a date last week"; SYSTEM: "Did you like her?"
– Context-free index keys: DA = statement-non-opinion; MA = perform; Agent = chat; Domain = friend; slots: date = last week, event = date
– Context-sensitive index keys add an abstraction of the previous user turn: previous semantics = "<s>,<s>"; previous keyword = "date"
• Utterance 2: USER: "Well, I think so, but."; SYSTEM: "But, what?"
– Context-free index keys: DA = statement-opinion; MA = judge; Agent = chat; Domain = friend
– Context-sensitive index keys: previous semantics = "statement-non-opinion, perform"; previous keyword = "date"; scenario session = "2"
• Utterance 3: USER: "I don't know her real feeling."; SYSTEM: "Why? What happened?"
– Context-free index keys: DA = statement-opinion; MA = know; Agent = chat; Domain = friend; theme = real feeling
– Context-sensitive index keys: previous semantics = "statement-opinion, judge"; previous keyword = "NULL"; scenario session = "2"
• Discourse coherence is scored as $P(DA_t, MA_t \mid DA_{t-1}, MA_{t-1})$, e.g., $P(\text{statement-opinion, judge} \mid \text{statement-non-opinion, perform}) > \theta$

64. References
• ALICE. 2002. A.L.I.C.E. A.I. Foundation. http://www.alicebot.org/
• L. Holzman and W. Pottenger. 2003. Classification of emotions in internet chat: An application of machine learning using speech phonemes. Technical Report LU-CSE-03-002, Lehigh University.
• J. Liscombe. 2006. Detecting and responding to emotion in speech: Experiments in three domains. Ph.D. thesis proposal, Columbia University.
• D. Litman and K. Forbes-Riley. 2005. Recognizing student emotions and attitudes on the basis of utterances in spoken tutoring dialogues with both human and computer tutors. Speech Communication, 48(5):559-590.
• C. M. Lee and S. S. Narayanan. 2005. Toward detecting emotions in spoken dialogs. IEEE Transactions on Speech and Audio Processing, 13(2):293-303.
• T. Polzin and A. Waibel. 2000. Emotion-sensitive human-computer interfaces. ISCA Workshop on Speech and Emotion.
• B. Shawar and E. Atwell. 2002. A comparison between Alice and Elizabeth chatbot systems. School of Computing Research Report, University of Leeds.
• X. Zhe and A. Boucouvalas. 2002. Text-to-emotion engine for real time internet communication. CSNDSP.

65. Contents
• PART-I: Statistical Speech/Language Processing
– Natural Language Processing: short intro
– Automatic Speech Recognition
– (Spoken) Language Understanding
• PART-II: Technology of Spoken Dialog Systems
– Spoken Dialog Systems
– Dialog Management
– Dialog Studio
– Information Access Dialog
– Emotional & Context-sensitive Chatbot
– Multi-modal Dialog
– Conversational Text-to-Speech
• PART-III: Statistical Machine Translation
– Statistical Machine Translation
– Phrase-based SMT
– Speech Translation

66. POSTECH Multimodal Dialog System Demo

67. Multi-Modal Dialog
• Task performance and user preference for multi-modal over speech-only interfaces [Oviatt et al., 1997]:
– 10% faster task completion
– 23% fewer words
– 35% fewer task errors
– 35% fewer spoken disfluencies
• "What is a decent Japanese restaurant near here?" is hard to represent using only a single modality!

68. Multi-Modal Dialog
• Components of a multi-modal dialog system [Chai et al., 2002]:
– Uni-modal understanding: spoken language understanding (speech), gesture understanding (gesture), and face expression understanding, each producing a uni-modal interpretation frame
– Multi-modal understanding: a multimodal integrator performs discourse understanding and reference analysis over the uni-modal frames, producing a multi-modal interpretation frame for the dialog manager

69. References (1/2)
• R. A. Bolt. 1980. "Put-that-there": Voice and gesture at the graphics interface. Computer Graphics, 14(3):262-270.
• J. Chai, S. Pan, M. Zhou, and K. Houck. 2002. Context-based multimodal understanding in conversational systems. Fourth International Conference on Multimodal Interfaces (ICMI).
• J. Chai, P. Hong, and M. Zhou. 2004. A probabilistic approach to reference resolution in multimodal user interfaces. 9th International Conference on Intelligent User Interfaces (IUI-04), 70-77.
• J. Chai, Z. Prasov, J. Blaim, and R. Jin. 2005. Linguistic theories in efficient multimodal reference resolution: An empirical investigation. 10th International Conference on Intelligent User Interfaces (IUI-05), 43-50.
• P. R. Cohen, M. Johnston, D. R. McGee, S. L. Oviatt, J. A. Pittman, I. Smith, L. Chen, and J. Clow. 1997. QuickSet: Multimodal interaction for distributed applications. Intl. Multimedia Conference, 31-40.

70. References (2/2)
• H. Holzapfel, K. Nickel, and R. Stiefelhagen. 2004. Implementation and evaluation of a constraint-based multimodal fusion system for speech and 3D pointing gestures. International Conference on Multimodal Interfaces (ICMI).
• M. Johnston. 1998. Unification-based multimodal parsing. COLING/ACL, 624-630.
• M. Johnston and S. Bangalore. 2000. Finite-state multimodal parsing and understanding. COLING-2000.
• M. Johnston, S. Bangalore, G. Vasireddy, A. Stent, P. Ehlen, M. Walker, S. Whittaker, and P. Maloor. 2002. MATCH: An architecture for multimodal dialogue systems. ACL-2002.
• S. L. Oviatt, A. DeAngeli, and K. Kuhn. 1997. Integration and synchronization of input modes during multimodal human-computer interaction. Conference on Human Factors in Computing Systems: CHI '97.

71. Contents
• PART-I: Statistical Speech/Language Processing
– Natural Language Processing: short intro
– Automatic Speech Recognition
– (Spoken) Language Understanding
• PART-II: Technology of Spoken Dialog Systems
– Spoken Dialog Systems
– Dialog Management
– Dialog Studio
– Information Access Dialog
– Emotional & Context-sensitive Chatbot
– Multi-modal Dialog
– Conversational Text-to-Speech
• PART-III: Statistical Machine Translation
– Statistical Machine Translation
– Phrase-based SMT
– Speech Translation

72. POSTECH Conversational TTS Demo: Korean (Dialog)

73. Conversational Text-to-Speech
• Text-to-speech system [M. Beutnagel et al., 1999; J. Schroeter, 2005]
– Front end
– Text normalization: take raw text and convert things like numbers and abbreviations into their written-out word equivalents
– Linguistic analysis: POS tagging, grapheme-to-phoneme conversion
– Prosody generation: pitch, duration, intensity, pauses
– Back end
– Unit selection: select the most similar units in the speech DB to produce the actual sound output
• Pipeline: text → text normalization → linguistic analysis → prosody generation (symbolic linguistic representation) → unit selection → synthesis → speech

74. Multilingual Grapheme-to-Phoneme Conversion
• Given an alphabet of spelling symbols (graphemes) and an alphabet of phonetic symbols (phonemes), a mapping should be achieved transliterating strings of graphemes into strings of phonemes [W. Daelemans et al., 1996]
• Rule generation: alignment → rule extraction → rule pruning → rule association (with a dictionary)
• G2P conversion: input text → text normalizer → canonical form of graphemes → phonemes
• Alignment example (Korean):
– Graphemes: ㅎ ㅏ ㄱ ㄱ ㅛ ㅇ ㅔ _ _
– Phonemes:  h a g gg yo _ _ e _

75. Break Index Prediction
• Predicting break indices from a POS-tagged / syntax-analyzed sentence
• Break index [J. Lee et al., 2002]
– No break: a phrase-internal word boundary, or a juncture smaller than a word boundary
– Minor break: a minimal phrasal juncture such as an AP (accentual phrase) boundary
– Major break: a strong phrasal juncture such as an IP (intonational phrase) boundary
• Pipeline: POS tag sequence → probabilistic break index prediction using a trigram model over (word tag, word tag, break, word tag) → break-index-tagged POS tag sequence → decision tree (C4.5) for error correction → corrected break-index-tagged POS tag sequence

76. Pitch Prediction using K-ToBI
• Uses C4.5 (decision tree)
• Assumes linguistic and lexical information influence the tone of a syllable
• IP tone label prediction [K. E. Dusterhoff et al., 1999]
– Assign one tone among "L%", "H%", "LH%", "HL%", "LHL%", and "HLH%" to the last syllable of the IP
– Features: POS, punctuation type, length of phrase, onset, nucleus, coda
• AP tone label prediction
– Assign one tone among "L" and "H" to each syllable of the AP
– Features: POS, length of phrase, location in the prosodic phrase

77. Unit Selection
• Index of units: pitch, duration, position in syllable, neighboring phones
• Half-diphone synthesis [A. J. Hunt, 1996; A. Conkie, 1999]
– The diphone cuts the units at points of relative stability (the center of a phonetic realization), rather than at the volatile phone-phone transition, where so-called coarticulatory effects appear.
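For reference, the selection criterion of Hunt and Black [A. J. Hunt, 1996] chooses the unit sequence $u_1^n$ for the target specification $t_1^n$ that minimizes the sum of target costs and concatenation costs:

$$C(t_1^n, u_1^n) = \sum_{i=1}^{n} C^{t}(t_i, u_i) + \sum_{i=2}^{n} C^{c}(u_{i-1}, u_i)$$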

78. References (1/2)
• M. Beutnagel, A. Conkie, J. Schroeter, Y. Stylianou, and A. Syrdal. 1999. The AT&T Next-Gen TTS system. Joint Meeting of ASA, EAA, and DAGA.
• A. Conkie. 1999. Robust unit selection system for speech synthesis. Joint Meeting of ASA, EAA, and DAGA.
• W. Daelemans. 1996. Language-independent data-oriented grapheme-to-phoneme conversion. Progress in Speech Synthesis, Springer Verlag, 77-90.
• K. E. Dusterhoff, A. W. Black, and P. Taylor. 1999. Using decision trees within the tilt intonation model to predict F0 contours. Eurospeech-99.
• A. J. Hunt and A. W. Black. 1996. Unit selection in a concatenative speech synthesis system using a large speech database. ICASSP-96, vol. 1, 373-376.

79. References (2/2)
• S. Kim. 2000. K-ToBI (Korean ToBI) labelling conventions. UCLA Working Papers in Phonetics 99.
• S. Kim, J. Lee, B. Kim, and G. G. Lee. 2006. Incorporating second-order information into two-step major phrase break prediction for Korean. ICSLP-06.
• J. Lee, B. Kim, and G. G. Lee. 2002. Automatic corpus-based tone and break-index prediction using K-ToBI representation. ACM Transactions on Asian Language Information Processing (TALIP), 1(3):207-224.
• J. Lee, S. Kim, and G. G. Lee. 2006. Grapheme-to-phoneme conversion using automatically extracted associative rules for Korean TTS system. ICSLP-06.
• J. Schroeter. 2005. Electrical Engineering Handbook, 16(1)-16(12).

80. Contents
• PART-I: Statistical Speech/Language Processing
– Natural Language Processing: short intro
– Automatic Speech Recognition
– (Spoken) Language Understanding
• PART-II: Technology of Spoken Dialog Systems
– Spoken Dialog Systems
– Dialog Management
– Dialog Studio
– Information Access Dialog
– Emotional & Context-sensitive Chatbot
– Multi-modal Dialog
– Conversational Text-to-Speech
• PART-III: Statistical Machine Translation
– Statistical Machine Translation
– Phrase-based SMT
– Speech Translation

81. Statistical Machine Translation
• POSTECH Statistical MT System Demo: Korean-English, Japanese-Korean, Speech-to-Speech

82. SMT Task
• SMT: Statistical Machine Translation
• Task: translate a sentence in one language into another language using statistical features of the data.
• Example: 나는 생각한다, 고로 나는 존재한다. → "I think, thus I am."
– P(I | 나는) = 0.7, P(me | 나는) = 0.2, ...
– P(think | 생각하다) = 0.5, P(think | 생각) = 0.4, ...
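Formally, such systems rest on the standard source-channel decomposition (implicit in the slide's lexical probabilities): pick the target sentence $e$ that maximizes the posterior given the source sentence $f$, factored into a translation model and a language model:

$$\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)$$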
