
The SmartKom Multimodal Corpus - Data Collection and End-to-End Evaluation



  1. The SmartKom Multimodal Corpus - Data Collection and End-to-End Evaluation
     Nicole Beringer, Institut für Phonetik und Sprachliche Kommunikation (IPSK), LMU München

  2. Where can the IPSK (LMU) be found within the project?
     [Diagram labels: modules; implementation of problem-solving strategies; feedback about user reactions; user behaviour?; improved prototype; data collection, evaluation, annotation]

  3. Overview: Responsibilities of the IPSK group in SmartKom
     - Data Collection
       - WOZ design
       - WOZ experiments
       - some useful results
     - Annotation
       - Transliteration of the audio data
       - Prosodic annotation
       - Annotation of the gestures
       - Annotation of facial expression
       - Annotation of user states
     - End-to-End Evaluation
       - Problems with multimodality
       - Evaluation framework

  4. Responsibility Network
     [Diagram labels: Evaluation; Data Collection; Providing Data for Recognition; User modelling; MODULES]
     - WOZ system - studio
     - Recordings
     - Annotation of audio, gesture, emotion
     - Distribution

  5. Data Collection
     Creating and publishing data for:
     - Software
       - the training of recognizers (speech, prosodic feature, gesture, facial expression, emotion)
       - dialogue creation
       - generation of information (speech)
     - Research
       - user modelling
       - evaluation (usability and technical evaluation)

  6. Training of recognizers, user modelling
     The BIG problem: how to convince users of a nonexistent system by simulation alone?
     Wizard-of-Oz
     - different users
     - Instruction: "market research"
     - 2 recordings (4.5 minutes each)
     - Recording of audio (different characteristics)
     - Recording of video (face, profile, display, gestures)
     - Interview

  7. Wizard-of-Oz setup
     - realistic prototype
     - created by partners and LMU
     - influence on development
     - playback of atmosphere
     - creation of the studio
     - Reliability
     - Quality of speech output
     - Experiment design
     - WOZ system with technical defects
     - Evocation of behaviour (trial and error, gestures, emotion)
     - Instruction
     - Provoking of different behaviour (new gestures, anger, new input facilities)
     - Design of the display
     - few associations to existing systems
     - Dialogue with an intelligent machine, no ordinary input facilities

  8. Reliability: the deception should not be noticed
     - good preparation
     - intensive training of the wizards
     - System makes mistakes
     Perception of the SmartKom system
     [Chart categories: the system is a machine; the system is a person; the system is something in between]
     "That's a telephone box, I wouldn't expect to talk to a human. I do not have illusions!"

  9. Only a few associations with existing systems allowed
     - Simulation of a personal assistant
     - existing dialogue partner
     - Assistant has "personality"
     - Assistant leads through the dialogue, makes proposals
     [Chart: "Polite Users"; axis: percent (0 to 1); series: subjects used polite expressions, subjects used greetings, subjects used thanks, subjects used sorry]

  10. Positive aspects
      - verbal interaction!
      - Multimodality is only noticed by a few users
      [Chart: positive aspects; axis: N (0 to 20); categories: verbal interaction with the assistant works well; individual applications or pages; positive rating of the persona; fast; a good idea overall; clearly laid out; practical; fun to use; multimodality; other]
      Negative aspects
      - too slow
      - too few possibilities
      - more help needed
      - Persona not often criticized!
      [Chart: negative aspects; axis: N (0 to 20); categories: criticism of the speech output; too slow; too limited in scope; too little support; criticism of the speech input; not good overall; street noise is disturbing; criticism of the persona; gesture input not good; display]

  11. What characterizes a comfortable system?
      [Chart categories: easy operation; speech recognition; hardware/equipment; display layout; speed; range of services; multimodality; synthesis; other]

  12. SmartKom WOZ Recordings and Processing of the Data at the LMU
      [Flow diagram labels: WOZ recordings (DV video front view; DV video side view; beamer output; SIVIT; 11 audio streams; coordinates of graph tablet); cutting of streams; transliteration (TRL); holistic US labeling (USH); prosodic US labeling (TRP); US labeling of facial expression (USM); preparation of gesture label stream; gesture labeling (GES); delivery of files to DFKI server; recording of DVD]

  13. Annotation of emotions
      [Images: subjects during a recording, front view and side view]
      - System is simulated
      - Subjects are recorded (audio and video)
      - 4.5 min interaction, e.g. "find a movie for this evening"
      - emotions are partly provoked by the wizards

  14. Orthographical Annotation
      - Marking of repetitions, hesitations, noise, speech disfluencies etc.
      Example:
        w001_pkw_003_SMA: <Ger"ausch> @1hier @1sehen <:<#> Sie:> <:<#> eine:> "Ubersicht "uber das Programm der ~Heidelberger Kinos .
        w001_pkd_004_AAA: mhm [PA] [B3 cont] . <Ger"ausch> oh<Z> [B2] , ~F<Z>ight+Club<ROT> <!1 Flight-Club> [NA] [B2] , ~Das+f"unfte+Element<Z><ROT> [NA] [B2] , ~Drum%<ROT> , ~Jakob+der+L"ugner<ROT> [NA] [B3 cont] . <A> ah<OOT> [PA] [B2] , ich w"urde gerne [NA] ~Aimee+_ <"ah> _und+Jaguar [PA] sehen [B3 fall] . <Ger"ausch> wo [PA] wird das gespielt<Z> [NA] [B3 rise] ? <PP>
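
The markers in the transliteration example can be stripped mechanically to recover plain words. The following is a minimal sketch, not the project's own tooling. It assumes that `<...>` spans are annotation tags, that `[...]` spans are prosodic labels, that `<:<tag> words:>` wraps words affected by a tag, that `"a`, `"o`, `"u` encode German umlauts, and that a leading `~` or `@1` is a word-level marker. The slide does not define these conventions, and the function name `plain_words` is hypothetical.

```python
import re

def plain_words(turn: str) -> tuple[str, list[str]]:
    """Split one transliteration line into (turn id, list of plain words)."""
    turn_id, _, body = turn.partition(":")
    # Unwrap scoped markers such as '<:<#> Sie:>' but keep the wrapped words.
    body = re.sub(r"<:<[^<>]*>\s*(.*?):>", r"\1", body)
    # Drop remaining angle-bracket tags, e.g. '<Ger"ausch>' or '<Z>'.
    body = re.sub(r"<[^<>]*>", "", body)
    # Drop square-bracket labels, e.g. '[PA]' or '[B3 cont]'.
    body = re.sub(r"\[[^\[\]]*\]", "", body)
    # Decode the assumed ASCII umlaut notation ("a -> ä, "U -> Ü, ...).
    umlauts = {"a": "ä", "o": "ö", "u": "ü", "A": "Ä", "O": "Ö", "U": "Ü"}
    body = re.sub(r'"([aouAOU])', lambda m: umlauts[m.group(1)], body)
    words = []
    for token in body.split():
        token = re.sub(r"^[~@]\d*", "", token)  # leading '~' or '@1'
        if any(ch.isalnum() for ch in token):   # skip bare punctuation
            words.append(token)
    return turn_id.strip(), words

sample = ('w001_pkw_003_SMA: <Ger"ausch> @1hier @1sehen <:<#> Sie:> '
          '<:<#> eine:> "Ubersicht "uber das Programm der ~Heidelberger Kinos .')
print(plain_words(sample))
```

The sketch leaves multiword names joined by `+` and fragments marked with `_` untouched, since the slide does not say how they should be resolved.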

  15. Annotation of gestures in 3 categories:
      - Interactional gestures: pointing (long and short), free gestures
      - Supporting gestures: reading, searching, counting
      - Residual gestures: emotional gestures, not identifiable gestures
      [Image labels: R-Emotional (+ cubus); I-Point (short -)]
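
The three gesture categories can be written down as a small lookup table. This is only an illustrative sketch: the prefixes `I` and `R` are inferred from the image labels "I-Point" and "R-Emotional", the prefix `S` for supporting gestures is an assumption by analogy, and the names `GESTURE_CATEGORIES` and `category_of` are hypothetical.

```python
# Categories and gesture types as listed on the slide.
# Prefix letters: I and R are inferred from the slide's labels; S is assumed.
GESTURE_CATEGORIES = {
    "I": ("interactional", ["pointing (long)", "pointing (short)", "free gesture"]),
    "S": ("supporting", ["reading", "searching", "counting"]),
    "R": ("residual", ["emotional", "not identifiable"]),
}

def category_of(label: str) -> str:
    """Return the category name for a label such as 'I-Point'."""
    prefix = label.split("-", 1)[0].strip().upper()
    if prefix not in GESTURE_CATEGORIES:
        raise ValueError(f"unknown gesture label prefix: {label!r}")
    return GESTURE_CATEGORIES[prefix][0]

print(category_of("I-Point"))
print(category_of("R-Emotional"))
```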

  16. Annotation of emotions
      3 steps:
      - Prosodic annotation: audio only, formal labelling system
      - Holistic labelling: facial expression, audio, context
      - Facial expression: labelling without audio
      Notes:
      - Holistic labelling includes context information, which is not relevant for the facial expression recognizer.
      - Therefore we included a "facial expression only" labelling step (no audio).
      - For the analysis of the prosody, the speech had to be labelled.
      - The functional approach did not seem to work with speech.
      - Therefore we adopted, for the prosody, a formal coding step that was used in Verbmobil (Fischer, 1999).
      - The holistic and the formal step for the speech can be combined to get ecologically valid data.

  17. Annotation of emotions
      Labelling with defined categories
      - Subjective categories step:
        - "anger/irritation"
        - "joy/gratification (being successful)"
        - "helplessness"
        - "pondering/reflecting"
        - "surprise"
        - "neutral"
        - "unidentifiable episode"
      - Categories for the prosody:
        - Pauses between phrases
        - Pauses between words
        - Pauses between syllables
        - Irregular length of syllables
        - Emphasized words
        - Strongly emphasized words
        - Clearly articulated words
        - Hyperarticulated words
        - Words overlapped by laughing

  18. Conclusion (WOZ)
      - WOZ: realistic data for man-machine interaction
      - Training of recognizers
      - Observation of user behaviour
      - The WOZ technique is time-consuming and expensive
      - BUT: results from user observations and questionnaires can influence the development of the system early on

  19. Further information
      - Website: http://www.smartkom.org/
      - http://www.phonetik.uni-muenchen.de/Forschung/Publications/index.html
      - Corpus overview: Schiel, F. et al. (2002): Integration of multi-modal data and annotations into a simple extendable form: the extension of the BAS Partitur Format. LREC Conference.
      - Steininger, S. et al. (2002b): User-State Labeling Procedures For The Multimodal Data Collection Of SmartKom. LREC Conference.
      - Beringer, N. (2001): Evoking Gestures in SmartKom - Design of the Graphical User Interface. Gesture Workshop 2001, London, UK. To appear in: Springer "Gesture Workshop 2001, London".
      - Labeling of gestures: Steininger, S. et al. (2001): Labeling of Gestures in SmartKom - The Coding System. Gesture Workshop 2001, London.
      - Transliteration: Oppermann, D. et al.: Transliterationskonventionen.
