
W3C Workshop on Conversational Applications, June 2010. NICT Use Cases and Requirements for New Models of Human Language to Support Mobile Conversational Systems. Chiori Hori and Teruhisa Misu, Spoken Language Communication Group, NICT, Japan.


  1. W3C Workshop on Conversational Applications, June 2010. NICT Use Cases and Requirements for New Models of Human Language to Support Mobile Conversational Systems. Chiori Hori and Teruhisa Misu, Spoken Language Communication Group, NICT, Japan.

  2. NICT Spoken Language Communication Group. [Timeline figure: ATR (Advanced Telecommunications Research Institute International), 1986; NICT, 2006.] Speech-to-Speech Translation + Spoken Dialog System

  3. Multi-Party Communication Between Asian Language Speakers

  4. Kyoto Tour Guide System: Human-to-Human Dialog

  5. Tour Guide Dialog Discourse [Figure: state-transition diagram of the tour guide dialog. Phases: Check user’s preference; Make a recommendation list; Recommend each spot in the list; Confirm users’ final decision. Arcs are labeled input : output, with eps for an empty symbol, using tags such as Grt(start), OQ(DST), Extrct_kywd, Mk_rcmdlist(kwd), Set_tgt, Rcmd(tgt), Expln(tgt), Rqst(rcmd), Set_imprs(Positive), Set_imprs(Neutral/Bad), Set_imprs(Decided), Set_imprs(Experienced), Cnfrm(dcst), Chck_rcmdlist, Rqst(dcsn4rcmdlist), Grt(end). List updates per spot: Neutral: Keep(tgt, rcmdlist); Decided: Mv(tgt, rcmdlist, dcsnlist); Negative: Remove(tgt, rcmdlist); Experienced: Remove(tgt, rcmdlist). Loop transitions: Trnst(if_forloop_not_end), Trnst(if_forloop_done), Trnst(if_data_in_rcmdlist), Trnst(if_nodata_in rcmdlist).]

  6. Goal of Spoken Dialog System: accept users’ spontaneous dialog behavior; mimic guides’ dialog behavior as in the data. Issues: 1. Spontaneous speech recognition 2. Robust user concept understanding 3. Flexible dialog management (DM) 4. Expandable DM platform. (Corpus-based DM)

  7. Corpus-based Dialog Management: a human-to-human dialog corpus is annotated with tags representing “user concept” and “system actions”, and statistical models of humans’ dialog behaviors are built from the annotated data.
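The corpus-based idea above can be illustrated with a minimal sketch, assuming the annotated corpus has already been reduced to (user concept tag, system action tag) pairs. The function name `estimate_action_model` and the toy pairs are hypothetical; the tag strings are borrowed from the dialog discourse slide, but their pairing here is invented for illustration and is not taken from the NICT corpus. The model is a plain relative-frequency estimate, which may differ from the statistical model the authors actually used.

```python
from collections import Counter, defaultdict


def estimate_action_model(turn_pairs):
    """Estimate P(system action | user concept) by relative frequency.

    turn_pairs: iterable of (user_concept_tag, system_action_tag) tuples,
    assumed to come from an annotated human-to-human dialog corpus.
    """
    counts = defaultdict(Counter)
    for concept, action in turn_pairs:
        counts[concept][action] += 1

    model = {}
    for concept, action_counts in counts.items():
        total = sum(action_counts.values())
        model[concept] = {a: n / total for a, n in action_counts.items()}
    return model


# Hypothetical toy annotation; real data would come from the tagged corpus.
TOY_CORPUS = [
    ("Rqst(rcmd)", "Rcmd(tgt)"),
    ("Rqst(rcmd)", "Rcmd(tgt)"),
    ("Rqst(rcmd)", "Expln(tgt)"),
    ("Accept", "Cnfrm(dcst)"),
]

TOY_MODEL = estimate_action_model(TOY_CORPUS)
```

With the toy data, `TOY_MODEL["Rqst(rcmd)"]` assigns two thirds of the probability mass to `Rcmd(tgt)` and one third to `Expln(tgt)`.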

  8. Advantage of WFST-based DM: a general description for dialog scenarios. Scenarios written in different styles (IF-THEN rules, finite state automata, and statistical dialog management) can be converted to a WFST description.
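As a rough sketch of what such a conversion could look like, the code below maps two scenario styles onto one arc format. Everything here is an assumption made for illustration: the `Arc` tuple, the function names, the rule layout, and the choice of zero weights for deterministic rules and negative log probabilities for statistical ones (a common WFST convention, not something stated on the slide).

```python
import math
from typing import NamedTuple


class Arc(NamedTuple):
    """One WFST arc: src --input:output/weight--> dst (hypothetical layout)."""
    src: str
    inp: str
    out: str
    weight: float
    dst: str


def rules_to_arcs(rules):
    """Convert IF-THEN rules to arcs.

    Each rule is (state, user_concept, system_action, next_state), read as
    "IF in state and user_concept is observed THEN emit system_action".
    Deterministic rules get weight 0.
    """
    return [Arc(src, concept, action, 0.0, dst)
            for src, concept, action, dst in rules]


def probs_to_arcs(state, model):
    """Convert P(action | concept) tables to self-loop arcs on one state.

    Weights are negative log probabilities, so likelier actions cost less.
    """
    return [Arc(state, concept, action, -math.log(p), state)
            for concept, actions in model.items()
            for action, p in actions.items()]
```

For example, `rules_to_arcs([("ask_org", "From_<city>", "Fill_ORG", "ask_dst")])` yields a single zero-weight arc, and `probs_to_arcs("s", {"Rqst(rcmd)": {"Rcmd(tgt)": 0.5}})` yields one arc whose weight is `-log(0.5)`. Once both styles share the arc format, they can be stored and searched by the same machinery.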

  9. Scenario WFST (Weighted Finite State Transducer): Slot-Filling for Origin and Destination
     1. States and arcs
     2. Each arc carries a pair of input and output symbols with a weight
     3. The transition is determined by the weights
     User input    | Input concept tag | Action tag | System response
     ε             | ε                 | Ask_ORG    | From where?
     From Osaka.   | From_<city>       | Fill_ORG   | ε
     ε             | ε                 | Ask_DST    | To where?
     To Tokyo      | To_<city>         | Fill_DST   | ε
     ε             | ε                 | exit       | ε
     Arcs (input : output / weight): ε : ε / 0, ε : Ask_ORG / 0, From_<city> : Fill_ORG / 0, ε : Ask_DST / 1, To_<city> : Fill_DST / 0, ε : exit / 2
     * Slot handling
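The slot-filling scenario can be simulated with a small sketch. The arc labels and weights follow the slide; the rest is assumed: the slide's state topology is not recoverable from the text, so this version uses a single state with guarded self-loops, where the guards are one possible reading of "slot handling". The names `Arc`, `step`, and `run_dialog` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

EPS = "eps"  # empty input symbol (epsilon)
Slots = Dict[str, Optional[str]]


@dataclass(frozen=True)
class Arc:
    inp: str                              # input concept tag
    out: str                              # output action tag
    weight: float                         # lower weight wins
    applicable: Callable[[Slots], bool]   # assumed slot guard


ARCS = [
    Arc(EPS, "Ask_ORG", 0.0, lambda s: s["ORG"] is None),
    Arc("From_<city>", "Fill_ORG", 0.0, lambda s: True),
    Arc(EPS, "Ask_DST", 1.0, lambda s: s["DST"] is None),
    Arc("To_<city>", "Fill_DST", 0.0, lambda s: True),
    Arc(EPS, "exit", 2.0,
        lambda s: s["ORG"] is not None and s["DST"] is not None),
]


def step(slots, concept=EPS):
    """Return the action of the lowest-weight applicable arc, or None."""
    candidates = [a for a in ARCS if a.inp == concept and a.applicable(slots)]
    if not candidates:
        return None
    return min(candidates, key=lambda a: a.weight).out


def run_dialog(user_turns):
    """Alternate system (epsilon-input) and user turns until exit.

    user_turns: list of (concept_tag, city) pairs.
    """
    slots = {"ORG": None, "DST": None}
    actions = []
    turns = iter(user_turns)
    while True:
        action = step(slots)              # system turn
        actions.append(action)
        if action == "exit":
            break
        turn = next(turns, None)          # user turn
        if turn is None:
            break                         # user stopped answering
        concept, city = turn
        fill = step(slots, concept)
        actions.append(fill)
        if fill == "Fill_ORG":
            slots["ORG"] = city
        elif fill == "Fill_DST":
            slots["DST"] = city
    return actions, slots
```

Calling `run_dialog([("From_<city>", "Osaka"), ("To_<city>", "Tokyo")])` walks through the same sequence as the table: ask for the origin, fill it, ask for the destination, fill it, then exit. Because `Ask_ORG` has a lower weight than `Ask_DST`, the origin is requested first whenever both slots are empty.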

  10. Spoken Language Understanding WFST
      <word-class label="station"> Tokyo Kyoto </word-class>
      <keyword-class label="time"> six seven eight nine ten eleven twelve </keyword-class>
      <keyword-class label="origin"> (station) </keyword-class>
      <keyword-class label="destination"> (station) </keyword-class>
      <plan repeat="true"> from,(origin) to,(destination) </plan>
      <depart> at,(time) </depart>
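To show what this kind of description is meant to extract, here is a minimal sketch of keyword-driven concept tagging. It does not parse the XML or build a WFST; it hard-codes the same word classes and marker words ("from", "to", "at") in a hypothetical `PATTERNS` table, and the function name `understand` is likewise an assumption.

```python
# Word classes copied from the slide's description.
STATIONS = {"Tokyo", "Kyoto"}
TIMES = {"six", "seven", "eight", "nine", "ten", "eleven", "twelve"}

# Marker word -> (concept label, allowed values); hypothetical encoding of
# "from,(origin)", "to,(destination)" and "at,(time)".
PATTERNS = {
    "from": ("origin", STATIONS),
    "to": ("destination", STATIONS),
    "at": ("time", TIMES),
}


def understand(utterance):
    """Extract concept labels from adjacent (marker, value) word pairs.

    Words that match no pattern are skipped, which gives a small amount of
    robustness to fillers in spontaneous speech.
    """
    words = utterance.split()
    concepts = {}
    for marker, value in zip(words, words[1:]):
        if marker in PATTERNS:
            label, vocabulary = PATTERNS[marker]
            if value in vocabulary:
                concepts[label] = value
    return concepts
```

For instance, `understand("from Kyoto to Tokyo at six")` yields an origin, a destination, and a time, and `understand("well from Kyoto I want to go to Tokyo")` still finds both stations because "to go" matches no station.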

  11. Kyoto Tour Guide System using WFST-based Dialog Management

  12. Problems in implementing with SRGS/SISR
      1. Context-sensitive ASR: statistical language models for ASR need to be tuned to the current dialogue context, which is determined by the previous system prompt and the dialogue situation.
      2. Separation of ASR and natural language understanding: we need speech recognition systems that are more robust to natural language expressions. N-gram language models can be a solution; consequently, we need a framework for attaching semantic annotations to ASR results afterward.
      3. Spoken language understanding using WFSTs: to realize context-sensitive semantic annotation for SLU, we need a description format for WFSTs.
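The first problem, context-sensitive ASR, can be sketched as choosing a language model according to the previous system prompt and using it to rescore recognition hypotheses. This is only an illustration under stated assumptions: the probabilities are made up, the models are unigram for brevity (the slide speaks of N-gram models in general), and `CONTEXT_LMS`, `log_prob`, and `pick_hypothesis` are hypothetical names rather than part of SRGS, SISR, or any ASR toolkit.

```python
import math

# One toy unigram language model per dialogue context, keyed by the last
# system action. All probabilities are invented for illustration.
CONTEXT_LMS = {
    "Ask_ORG": {"from": 0.4, "kyoto": 0.25, "tokyo": 0.25, "to": 0.1},
    "Ask_DST": {"to": 0.4, "kyoto": 0.25, "tokyo": 0.25, "from": 0.1},
}

UNKNOWN_PROB = 1e-4  # floor for out-of-vocabulary words


def log_prob(words, lm):
    """Unigram log-probability of a word sequence under one model."""
    return sum(math.log(lm.get(w, UNKNOWN_PROB)) for w in words)


def pick_hypothesis(nbest, last_system_action):
    """Pick the n-best hypothesis favored by the context's language model."""
    lm = CONTEXT_LMS[last_system_action]
    return max(nbest, key=lambda hyp: log_prob(hyp.split(), lm))
```

Given the acoustically confusable pair `["from tokyo", "to tokyo"]`, the sketch prefers "from tokyo" after `Ask_ORG` and "to tokyo" after `Ask_DST`. Semantic labels would then be attached to the chosen word string in a separate step, which is the separation the second problem describes.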
