  1. Combining Modalities in Multimodal Interfaces: Focus on speech and gestures. Gabriel Skantze, gabriel@speech.kth.se

  2. Common misconceptions (Oviatt, "Ten myths about multimodal interaction")
     1. If you build a multimodal system, users will interact multimodally.
     2. Speech and pointing is the dominant multimodal integration pattern.
     3. Multimodal input involves simultaneous signals.
     4. Speech is the primary input mode in any multimodal system that includes it.
     5. Multimodal language does not differ linguistically from unimodal language.

  3. Common misconceptions (Oviatt, "Ten myths about multimodal interaction")
     6. Multimodal integration involves redundancy of content between modes.
     7. Individual error-prone recognition technologies combine multimodally to produce even greater unreliability.
     8. All users' multimodal commands are integrated in a uniform way.
     9. Different input modes are capable of transmitting comparable content.
     10. Enhanced efficiency is the main advantage of multimodal systems.

  4. Multimodal interface = Multimodal interaction?
     • Video: BTSLogic provides Directory Assistance and Information Services solutions to telecommunications carriers and operator services companies worldwide.
     • Almost all users (95% to 100%) prefer to interact multimodally if they are given the choice.
     • But this does not mean that all interaction is multimodal; rather, the best option is used for every task. About 20% of the interaction with multimodal interfaces has been observed to be multimodal.

  5. Depends on the type of task

  6. …and complexity of task

  7. Common misconceptions: Oviatt's myths 1–5 (overview slide repeated; see slide 2).

  8. Put That There [Bolt, 1980]
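
Bolt's "Put That There" combined speech with pointing on a large display: deictic words such as "that" and "there" were resolved against whatever the user was pointing at when the word was uttered. Below is a minimal sketch of that idea, assuming hypothetical timestamped event records; the names and data format are illustrative, not taken from Bolt's system.

```python
from dataclasses import dataclass

@dataclass
class PointingEvent:
    time: float   # seconds into the interaction
    target: str   # object id or map location under the pointing ray

def resolve_deictics(words: list[tuple[str, float]],
                     gestures: list[PointingEvent]) -> dict:
    """Bind each deictic word ('that', 'there') to the pointing event
    closest to it in time, Put-That-There style."""
    bindings = {}
    for word, t in words:
        if word in ("that", "there") and gestures:
            nearest = min(gestures, key=lambda g: abs(g.time - t))
            bindings[(word, t)] = nearest.target
    return bindings

# "put that there" with two pointing gestures
words = [("put", 0.0), ("that", 0.4), ("there", 1.1)]
gestures = [PointingEvent(0.5, "ship_3"), PointingEvent(1.2, "grid_B4")]
print(resolve_deictics(words, gestures))
# {('that', 0.4): 'ship_3', ('there', 1.1): 'grid_B4'}
```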

  9. More than put that there
     • Combinations of written input, manual gesturing, and facial expressions can generate symbolic information that is more richly expressive than simple object selection.
     • The speak-and-point pattern comprises only 14% of all spontaneous multimodal utterances.
       – Pen input is used to create graphics, symbols and signs, gestural marks, digits and lexical content.
     • In interpersonal multimodal communication, pointing gestures account for less than 20% of all gestures.
     • Conclusion: multimodal systems should handle other input than speak-and-point.

  10. Common misconceptions: Oviatt's myths 1–5 (overview slide repeated; see slide 2).

  11. Simultaneous or Sequential

  12. Common misconceptions: Oviatt's myths 1–5 (overview slide repeated; see slide 2).

  13. Speech is not everything
     • Traditionally, speech has been viewed as the primary modality, with writing, gestures and haptics as merely supporting modalities.
     • However, the other modalities can give information that is not present in the speech signal, e.g., spatial information.
     • Pen input precedes speech in 99% of sequentially integrated multimodal commands, and in most simultaneously integrated ones (a fusion-window sketch follows below).
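
Because pen input usually comes first, a fusion component can buffer a pen event and attach the speech segment that starts within an integration window around it (which also covers overlapping, simultaneous input). A rough sketch of such window-based pairing; the window length and the event format are assumptions for illustration.

```python
def integrate(pen_events, speech_events, window=4.0):
    """Pair each pen event with the first speech segment whose onset falls
    within `window` seconds of the pen event's end (before or after).
    Events are dicts with 'start', 'end' and 'content' keys (assumed format)."""
    pairs = []
    for pen in pen_events:
        for speech in speech_events:
            if pen["end"] - window <= speech["start"] <= pen["end"] + window:
                pairs.append((pen["content"], speech["content"]))
                break
    return pairs

pen = [{"start": 0.0, "end": 0.8, "content": "rectangle drawn on lake"}]
speech = [{"start": 1.5, "end": 2.0, "content": "add dock"}]
print(integrate(pen, speech))   # [('rectangle drawn on lake', 'add dock')]
```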

  14. Common misconceptions: Oviatt's myths 1–5 (overview slide repeated; see slide 2).

  15. Speech in multimodality
     • Multimodal speech is briefer, syntactically simpler, and less disfluent than users' unimodal speech.
     • Speech only: "Place a boat dock on the east, no, west end of Reward Lake."
     • Speech + pen: [drawing rectangle] "Add dock."

  16. Common misconceptions: Oviatt's myths 6–10 (overview slide repeated; see slide 3).

  17. Complementary, not redundant
     • Multimodal input is actually mostly complementary, not redundant.
     • Speech and pen give different semantic information:
       – subject, verb and object are spoken,
       – location is given with the pen.
     • Even during multimodal correction of errors, redundant information is given less than 1% of the time.
     • During human communication, spontaneous speech and gesturing do not involve duplicate information.
     • Designers of multimodal systems should therefore not expect to rely on duplicated information when processing multimodal language (a frame-merging sketch follows below).
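
Since the modes contribute complementary slots rather than duplicates, fusion can be modeled as merging partial semantic frames: speech fills the action and object slots, the pen fills the location slot, and a genuine conflict is the rare exception. A minimal sketch under that assumption; the slot names are illustrative.

```python
def merge_frames(speech_frame: dict, pen_frame: dict) -> dict:
    """Merge complementary semantic frames; only flag a conflict if both
    modes fill the same slot with different values (the rare redundant case)."""
    merged = dict(speech_frame)
    for slot, value in pen_frame.items():
        if slot in merged and merged[slot] != value:
            raise ValueError(f"conflicting values for slot '{slot}'")
        merged[slot] = value
    return merged

speech_frame = {"action": "add", "object": "dock"}   # verb and object from speech
pen_frame = {"location": (59.3, 18.1)}               # location from the pen gesture
print(merge_frames(speech_frame, pen_frame))
# {'action': 'add', 'object': 'dock', 'location': (59.3, 18.1)}
```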

  18. Common misconceptions: Oviatt's myths 6–10 (overview slide repeated; see slide 3).

  19. Unimodal errors are corrected
     1. The user may select the least error-prone modality.
     2. The user may switch modality.
     3. Mutual disambiguation (see the sketch below).
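
Mutual disambiguation works because the recognizers' n-best lists constrain each other: a joint interpretation is kept only if the speech and gesture hypotheses are semantically compatible, so a lower-ranked but compatible hypothesis can beat the top-ranked incompatible one. A toy sketch of that re-ranking; the hypotheses, scores and compatibility table are invented for illustration.

```python
# n-best lists: (hypothesis, recognizer confidence)
speech_nbest = [("create ship", 0.6), ("delete ship", 0.5)]
gesture_nbest = [("point_at:ship_3", 0.7), ("area:harbor", 0.2)]

# which commands make sense with which gestures (assumed domain knowledge)
compatible = {
    ("delete ship", "point_at:ship_3"): True,
    ("create ship", "area:harbor"): True,
}

def fuse(speech_nbest, gesture_nbest):
    """Return the highest-scoring pair of mutually compatible hypotheses."""
    best, best_score = None, float("-inf")
    for s, s_score in speech_nbest:
        for g, g_score in gesture_nbest:
            if compatible.get((s, g), False) and s_score + g_score > best_score:
                best, best_score = (s, g), s_score + g_score
    return best

print(fuse(speech_nbest, gesture_nbest))
# ('delete ship', 'point_at:ship_3') -- the clear pointing gesture pulls up
# the second-ranked speech hypothesis, disambiguating the weak speech result
```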

  20. Common misconceptions: Oviatt's myths 6–10 (overview slide repeated; see slide 3).

  21. Individual patterns
     • There are large individual differences in interaction patterns.
     • A user keeps using the same pattern from the beginning to the end.
     • Hence: multimodal systems that can detect and adapt to a user's dominant interaction type can considerably improve recognition rates (an adaptation sketch follows below).
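
One way to exploit this consistency is to classify each user as a simultaneous or a sequential integrator from the lag between their pen and speech input, and then set the fusion wait time accordingly. A sketch of that adaptation; the threshold and window values are arbitrary assumptions, not figures from the slide.

```python
def classify_integrator(lags: list[float], overlap_threshold: float = 0.0) -> str:
    """lags[i] = speech onset minus pen offset for command i (seconds).
    A lag at or below the threshold means the two signals overlapped."""
    simultaneous = sum(1 for lag in lags if lag <= overlap_threshold)
    return "simultaneous" if simultaneous > len(lags) / 2 else "sequential"

def fusion_window(user_type: str) -> float:
    # a sequential integrator needs a longer wait before fusing (assumed values)
    return 1.0 if user_type == "simultaneous" else 4.0

observed_lags = [2.1, 1.8, 2.5]             # speech consistently follows the pen
user_type = classify_integrator(observed_lags)
print(user_type, fusion_window(user_type))  # sequential 4.0
```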

  22. Common misconceptions: Oviatt's myths 6–10 (overview slide repeated; see slide 3).

  23. Strict Multimodality
     • Strict modality redundancy:
       – All user actions should be possible to express using each modality.
       – All system information should be possible to present in each modality.
     • Motivation:
       – Flexibility, predictability
       – "Design for all"

  24. Coupling content & modality
     • All modalities are not equal for all messages.
     • Speech/writing can convey much information, but complex spatial shapes, relations among graphic objects, or precise location information are difficult…
       – …but trivial to sketch using a pen.
     • Speech delivers information directly and intentionally,
       – but gaze reflects the speaker's focus of interest more passively and unintentionally.
     • Hence, adapt the input modality to the task (a simple content-to-modality mapping is sketched below).
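
The advice amounts to a simple dispatch from content type to the modality that carries it best. A hypothetical mapping for illustration only; the categories and choices below are assumptions extrapolated from the slide's examples.

```python
# assumed content categories and preferred modalities, for illustration
PREFERRED_MODALITY = {
    "spatial_shape": "pen",       # complex shapes are trivial to sketch
    "precise_location": "pen",
    "object_relation": "pen",
    "command": "speech",          # subjects, verbs, objects
    "description": "speech",
    "focus_of_interest": "gaze",  # passive, unintentional signal
}

def suggest_modality(content_type: str) -> str:
    return PREFERRED_MODALITY.get(content_type, "speech")  # default to speech

print(suggest_modality("precise_location"))  # pen
print(suggest_modality("command"))           # speech
```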
