Seeing, Hearing, and Touching: Moving to multimodality

Notes: The morning talks gave a perspective on how vision science can be used to inform the design of visually complex interfaces such as those used in information visualization systems. The second half of the course looks at intersensory interactions and how they can inform the move to multimodal environments. These environments combine visual, auditory, and haptic displays with a richer set of inputs from users, including speech, gesture, and biopotentials.

Putting It All Together
• Seeing and Hearing Events (Fisher)
• Touching, Seeing, and Hearing (MacLean)
• Integrating Applications: Tight Coupling & Physical Metaphors (MacLean)
• Integrating Applications: Designing for Intimacy (Fels)
[Figure: "Sensory Integration Module" diagram linking Vision, Hearing, and Force & tactile channels through a virtual feedback interaction model; caption: the psychophysics of vision, sound, and touch will change when the environment is multimodal.]

Notes: Force display technology works by using mechanical actuators to apply forces to the user. By simulating the physics of the user's virtual world, we can compute these forces in real time and then send them to the actuators so that the user feels them. (A minimal code sketch of such a haptic rendering loop appears below, after the next two slides.)

Intersensory Interactions
• Intro and metacognitive gap
• Integrating Cognitive Science in design
• Cognitive Architecture
  – Modularity and multimodal interaction
• Information hiding -- conflict resolution
• Cognitive impenetrability
• Recalibration
  – Spatial indexes in complex environments

Notes: We begin with a justification for an increased role for theory in the design of these more complex interfaces. I will argue that the combination of a large design space and the structural inability of humans to introspect at the level of sensory and attentional processes makes conventional design techniques inadequate in these situations. This is followed by a brief discussion of the challenges of taking information from Psychology, Kinesiology, and other disciplines that may fall under the broad banner of Cognitive Science into account in designing interactive applications.

Vision systems to multimodality
• Ron: Vision systems and subsystems
  – Pre-attentive vision (gist, layout, events)
  – Attention (grab ~5 objects for processing)
  – Combine for "virtual representation"
• Extend system concept to modalities
• Performance differences between modules
  – Some are similar across modalities
  – Some are multimodal
  – Some are task-dependent
• Multimodal cue matching within modules

Notes: Extending the visual perception studies described by Ron and applied by Tamara to multimodal interaction is conceptually simple, since vision is composed of separate channels that can be thought of as modalities. The move to multimodality is similar to the move to multiple channels, or systems, in vision.
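The force-display note above describes the basic loop: read the user's position, compute forces from a simulated physical model in real time, and command the actuators. The sketch below is only an illustration of that loop under simple assumptions (a 1-D virtual wall, a spring model, an invented stiffness value); the device object and its methods are hypothetical stand-ins, not any real haptic API.

    # Illustrative 1 kHz haptic rendering loop: spring force from penetration depth.
    # "device", read_position(), and send_force() are hypothetical stand-ins for a
    # real haptic device API; the wall at x = 0 and the stiffness are assumed values.
    import time

    STIFFNESS_N_PER_M = 800.0   # assumed stiffness of the virtual wall
    LOOP_RATE_HZ = 1000         # haptic loops typically run near 1 kHz

    def render_wall(device):
        period = 1.0 / LOOP_RATE_HZ
        while True:
            x = device.read_position()               # probe position along one axis (m)
            penetration = max(0.0, -x)               # how far the probe is inside the wall
            force = STIFFNESS_N_PER_M * penetration  # spring pushes the probe back out
            device.send_force(force)                 # actuator makes the wall feel stiff
            time.sleep(period)                       # real systems use a hard real-time timer

How stiff the wall feels depends not only on the stiffness constant but also on how fast and regularly this loop runs, which is why haptic update rates are far higher than graphics frame rates.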
Extending to complex worlds
• Lab studies: few events, visual or auditory
• In contrast to multimodal interfaces
  – Virtual worlds
  – Augmented reality
  – Ubiquitous computing
• How are multiple multimodal events dealt with in the brain?

Notes: The previous studies looked at relatively simple environments by our standards (but complex from the standpoint of psychophysics!). How can we extend these methods to more complex environments?

Some systems are multimodal
Example: Cross-modal speech system
• Reduces cognitive load
  – Fast, effortless information processing
• Near-optimal information integration between cues and sensory modalities
  – Fuzzy logic cue integration
  – Bayesian categorization

Notes: Modularity of processing has some advantages -- fast, effortless processing of multiple sensory channels. It comes at the cost of a lack of cognitive control and a lack of access to early-stage representations. (A short code sketch of reliability-weighted cue integration appears below, after the next two slides.)

Illusory conjunctions occur in artificial multimodal environments
• The McGurk effect (face influences sound)
• The ventriloquist effect (vision captures sound location)
Example: Movie theatre
  – Dubbed movie
  – Sound seems to come from actor

Notes: Another area where we have conducted research deals with a basic question in multimodal perception -- how do the different sensory channels decide what stimuli get "matched" between vision, hearing, and touch? And, once matched, how are they integrated and resolved into a multimodal or transmodal percept? If, as we have maintained, much of our perceptual processing takes place in systems, stimulus matching and fusion have to take place in multiple systems in parallel -- so how do they come up with the same answer?

[Figure: credited to Rensink.]

Notes: Feedback from higher-level areas allows a small number of proto-objects to be stabilized. [Note: These links may be related to (or even the same as?) the FINSTs in Brian Fisher's talk.]
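The "near-optimal information integration" bullet above, and the visual capture behind the ventriloquist effect, are commonly formalized as reliability-weighted (maximum-likelihood) cue combination: each modality's estimate is weighted by the inverse of its variance, so the more reliable cue dominates. The sketch below is a generic illustration of that idea with invented numbers, not a model taken from the course material.

    # Reliability-weighted (maximum-likelihood) cue combination, assuming each
    # cue's estimate carries independent Gaussian noise. All values are invented.
    def integrate_cues(estimates, variances):
        weights = [1.0 / v for v in variances]             # more reliable -> larger weight
        total = sum(weights)
        fused = sum(w * e for w, e in zip(weights, estimates)) / total
        fused_variance = 1.0 / total                       # fused estimate beats either cue alone
        return fused, fused_variance

    # Vision locates the source precisely, audition less so, so the fused location
    # is pulled strongly toward the visual event ("visual capture").
    location, variance = integrate_cues(estimates=[0.0, 10.0], variances=[1.0, 16.0])
    # location ~= 0.59 (near the visual position), variance ~= 0.94

On this view, the ventriloquist effect looks less like a failure of the system and more like the signature of weighting the more reliable spatial cue.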
Study: Pointing to sounds
• Cognitive location good
• Pointing shows visual capture
  – Aware of visual and auditory locations, but point to visual
• No effect of phoneme/viseme fit on ventriloquism
• Vision can recalibrate the spatial sound "map"
[Figure: trial display with a "Ba" stimulus and the response prompts "What was the sound?" and "Where was the source?" (point or describe).]

Our interpretation: 2 systems at work
• Different multimodal systems solve the feature assignment problem differently
  – Motor system: high visual dominance
  – Cognitive system: low visual dominance
  – Phoneme/viseme mismatch doesn't help
• Slow recalibration of auditory space if the offset is constant

Notes: Looking at the impact of this on perception in complex display environments with multiple visual and auditory events that contain errors in location and category fit generates some counter-intuitive findings. Motor performance is typically found to be less sensitive to illusions than cognitive processes; however, it seems that this is not the case for auditory localization. It seems that the dorsal system has a greater drive to combine visual and auditory events, leading to a greater tolerance for location errors for reaching than for voice measures. In pointing to visual targets in the presence of a visual distractor, we found few illusory errors when users were not allowed to see their hands (such as when using a head-mounted display), but closed-loop pointing had errors similar to those observed with vocal interactions. Giving them a visible cursor on the screen hurt performance, and adding a lag to the response of the cursor actually aided performance.

Attentional pointers in systems?
[Figure: diagram with boxes for action (motor space), cognitive processing, visual localization, and auditory localization; credited to Rensink.]

Notes: Feedback from higher-level areas allows a small number of proto-objects to be stabilized. [Note: These links may be related to (or even the same as?) the FINSTs in Brian Fisher's talk.]

Notes: If we look at the impact of these attentional tokens, or FINSTs, in multimodal perception in rich sensory environments, we can see two views of how they might work. The naïve view is that the information from two events is "tagged" by a FINST and is reassembled in cognition after the sensory processes have done their work.
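As a purely illustrative sketch of that naïve view, the structure below tags each sensory event with a shared token, lets the modality-specific processes run independently, and then reassembles whatever shares a tag. The names, fields, and binding rule are assumptions made for illustration, not a model from the talk.

    # Illustrative "naive view" of FINST-style binding: events that share a tag are
    # reassembled into one multimodal percept after modality-specific processing.
    from collections import defaultdict

    def process_visual(event):
        return {"location": event["location"], "identity": event["shape"]}

    def process_auditory(event):
        return {"location": event["location"], "identity": event["phoneme"]}

    def bind_by_tag(visual_events, auditory_events):
        percepts = defaultdict(dict)
        for ev in visual_events:
            percepts[ev["tag"]]["visual"] = process_visual(ev)
        for ev in auditory_events:
            percepts[ev["tag"]]["auditory"] = process_auditory(ev)
        return dict(percepts)   # tag -> reassembled multimodal percept

    percepts = bind_by_tag(
        visual_events=[{"tag": 1, "location": (0.2, 0.5), "shape": "face"}],
        auditory_events=[{"tag": 1, "location": (0.3, 0.5), "phoneme": "ba"}],
    )

The studies above suggest the real picture is messier: matching and weighting happen within multiple systems in parallel (motor versus cognitive), and those systems do not always agree.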