

  1. Multiple-Level Models for Multi-Modal Interaction
     Martin Russell (1), Antje S. Meyer (2), Stephen Cox (3), Alan Wing (2)
     (1) School of Engineering, University of Birmingham
     (2) School of Psychology, University of Birmingham
     (3) School of Computing, University of East Anglia
     Spoken Language and HCI Grand Challenge

  2. Outline of talk
     • Motivation for multi-modal interaction
     • Multiple-level representations to explain variability
     • Multiple-level representations to integrate modalities
     • Issues in combining modalities
     • Example: speech and gaze
     • Proposed research
     • Conclusions

  3. Motivation
     • Linguistic utterances are rarely unambiguous, but communication succeeds because of:
       – Shared world knowledge
       – A common discourse model
       – Speech augmented with eye gaze and gesture

  4. Psycholinguistic perspective
     • In psycholinguistic theories, the processes of retrieving and combining words are far better described than the processes of using world and discourse knowledge, eye gaze or gestures

  5. Computational perspective
     • Automatic spoken language processing lacks the knowledge and theory to explain ambiguity
       – Assumes a direct relationship between word sequences and acoustic signals
       – Treats variability as noise
     • No established framework to accommodate complementary modalities

  6. Challenges
     • Psycholinguistics needs:
       – A better understanding of how speakers and listeners use eye gaze and gesture to augment the speech signal
     • Computational spoken language processing needs:
       – Better treatment of variability in spoken language
       – Better frameworks for augmenting speech with other modalities
     • Both need fruitful interaction between psycholinguistics and computational spoken language processing

  7. Example: acoustic variability
     • Sources of acoustic variability that are not naturally characterised in the acoustic domain:
       – Speech dynamics
       – Individual speaker differences
       – Speaking styles
       – …

  8. A model of acoustic variability
     • Introduce an intermediate, 'articulatory' layer
     • Speech dynamics modelled as a trajectory in this layer
     • Trajectory mapped into acoustic space via an articulatory-to-acoustic mapping
     • Probabilities calculated in acoustic space
     [Figure: a word W generates a 'modelled' articulatory trajectory, which the articulatory-to-acoustic mapping turns into a synthetic acoustic trajectory for comparison with the observed acoustics]
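A minimal sketch of the idea on this slide, under illustrative assumptions: a straight-line trajectory in a toy 3-dimensional 'articulatory' layer, a fixed linear articulatory-to-acoustic mapping, and a spherical-Gaussian likelihood evaluated in a 12-dimensional acoustic space. The dimensions, the linear mapping and the noise model are placeholders, not the specific model referred to in the talk.

```python
import numpy as np

def articulatory_trajectory(start, end, n_frames):
    """Straight-line trajectory between two articulatory targets (one point per frame)."""
    alphas = np.linspace(0.0, 1.0, n_frames)[:, None]
    return (1.0 - alphas) * start + alphas * end            # shape (n_frames, art_dim)

def articulatory_to_acoustic(art_traj, mapping):
    """Map the articulatory trajectory into acoustic space (here: a fixed linear map)."""
    return art_traj @ mapping.T                              # shape (n_frames, ac_dim)

def log_likelihood(observed, synthetic, noise_var):
    """Spherical-Gaussian log-likelihood of the observed acoustics given the
    synthetic (mapped) trajectory, evaluated in the acoustic domain."""
    diff = observed - synthetic
    d = observed.shape[1]
    return float(np.sum(-0.5 * (np.sum(diff ** 2, axis=1) / noise_var
                                + d * np.log(2 * np.pi * noise_var))))

# Toy usage: 3-D 'articulatory' layer, 12-D acoustic layer, 20 frames.
rng = np.random.default_rng(0)
A = rng.standard_normal((12, 3))                             # assumed articulatory-to-acoustic map
traj = articulatory_trajectory(np.zeros(3), np.ones(3), 20)
synth = articulatory_to_acoustic(traj, A)
obs = synth + 0.1 * rng.standard_normal(synth.shape)         # simulated acoustic observations
print(log_likelihood(obs, synth, noise_var=0.01))
```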

  9. Combining modalities
     • Examples:
       – Lip shape correlates with speech at the acoustic level…
       – … but this is not the case in general
       – Correlation between speech and eye movement (when it exists) is likely to be at the conceptual level

  10. Multiple-level models
      • Different levels of representation are needed:
        – To model the causes of variability in speech
        – To capture the relationship between speech and other modalities
      • Candidate formalisms already exist:
        – Graphical models
        – Bayesian networks
        – Layered HMMs
        – …
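As one deliberately tiny instance of the formalisms listed above, the sketch below implements a two-stream hidden Markov model: a hidden 'conceptual' state sequence jointly generates a speech-level and a gaze-level observation at each time step, and the forward algorithm scores the combined streams. The state set, probability tables and conditional-independence assumption are illustrative, not taken from the talk.

```python
import numpy as np

# Hidden 'conceptual' states and their transition structure (all values illustrative).
states = ["refer_to_object", "other_talk"]
trans = np.array([[0.8, 0.2],                  # P(state_t | state_{t-1})
                  [0.3, 0.7]])
init = np.array([0.5, 0.5])

# Discrete emissions per stream, conditionally independent given the state.
p_speech = np.array([[0.7, 0.3],               # P(speech_obs | state): [names object, other speech]
                     [0.2, 0.8]])
p_gaze = np.array([[0.8, 0.2],                 # P(gaze_obs | state): [on object, elsewhere]
                   [0.3, 0.7]])

def joint_log_likelihood(speech_obs, gaze_obs):
    """Forward algorithm over the combined speech and gaze observation streams."""
    log_l = 0.0
    alpha = init.copy()
    for t, (s, g) in enumerate(zip(speech_obs, gaze_obs)):
        if t > 0:
            alpha = alpha @ trans
        alpha = alpha * p_speech[:, s] * p_gaze[:, g]
        scale = alpha.sum()
        log_l += np.log(scale)
        alpha /= scale
    return log_l

# Toy usage: gaze on the object while the object is named scores higher than mismatched streams.
print(joint_log_likelihood([0, 0, 1], [0, 0, 1]))
print(joint_log_likelihood([0, 0, 1], [1, 1, 1]))
```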

  11. Example: speech and gaze

  12. Results from 'map task' experiment
      [Figure: giver and follower gaze data; image not preserved]

  13. Results from map task
      [Figure: giver and follower gaze data; image not preserved]

  14. Results from map task
      [Figure: giver and follower gaze data; image not preserved]

  15. Results from map task
      [Figure: giver and follower gaze data; image not preserved]

  16. Object naming
      [Figure: timeline from gaze onset to speech onset; the gaze-to-speech lag spans planning to the phonological level plus phonetic/articulatory planning, and advance planning for the next object. From ESRC project, Meyer & Wheeldon]

  17. Object naming

  18. Lessons from psychology
      • Gaze-to-speech lags
      [Figure, panel a): speech-to-gaze lags (ms, axis 0-300) for monosyllabic vs. disyllabic object names, plotted against repetition 1-16]

  19. More lessons…
      • Gaze duration
      [Figure, panel d): viewing times (ms, axis 200-600), plotted against repetition 1-16]

  20. Speech and gaze
      • In general, a speaker who looks at an object might:
        a) Name the object
        b) Say something about the object
        c) Say something about a different topic altogether
        d) Say nothing at all
      • There will be a delay (200-300 ms for object naming) between finishing looking at an object and talking about it
      • The delay will be smaller if the object was discussed previously (see the sketch below)
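The toy sketch below turns the timing regularity on this slide into a rule of thumb: given the time a speaker stops fixating an object, predict a speech-onset window using the 200-300 ms lag, shrunk when the object has already been discussed. The lag range for previously discussed objects is an assumption for illustration.

```python
def predicted_speech_onset_window(gaze_offset_ms, previously_discussed):
    """Return an (earliest, latest) expected speech-onset time in ms,
    measured from the end of the fixation on the object."""
    lag_low, lag_high = 200, 300                  # typical gaze-to-speech lag for new objects
    if previously_discussed:
        lag_low, lag_high = 100, 200              # assumed shorter lag for already-discussed objects
    return gaze_offset_ms + lag_low, gaze_offset_ms + lag_high

# e.g. the speaker looks away from a new object at t = 1500 ms:
print(predicted_speech_onset_window(1500, previously_discussed=False))   # (1700, 1800)
```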

  21. Speech and gaze (continued)
      • Alternatively, gaze might provide an important cue for classifying the 'state' of a communication (e.g. a meeting), as sketched below:
        – Monologue (all eyes on one subject)
        – Discussion (eyes move between subjects)
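A minimal sketch of this classification idea, assuming gaze-target labels collected over a time window and an arbitrary entropy threshold: if gaze is concentrated on one person the window is called a monologue, if it is spread across people it is called a discussion. The threshold and windowed formulation are assumptions, not part of the proposal.

```python
from collections import Counter
import math

def communication_state(gaze_targets, threshold_bits=1.0):
    """Classify a time window as 'monologue' or 'discussion' from the entropy
    of the gaze-target distribution (gaze_targets: one label per observation)."""
    counts = Counter(gaze_targets)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return "discussion" if entropy > threshold_bits else "monologue"

# All eyes mostly on speaker A -> monologue; gaze spread over A, B, C -> discussion.
print(communication_state(["A"] * 18 + ["B"] * 2))               # monologue
print(communication_state(["A"] * 7 + ["B"] * 7 + ["C"] * 6))    # discussion
```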

  22. Proposed research
      • Goal: improved understanding of user goals and communication states through integration of speech, gaze and gesture
      • An integrated, multi-disciplinary project involving psycholinguistics, speech and language processing, and mathematical modelling

  23. Proposed research (1)
      • Experimental study of speech, gaze and gesture in referential communication and matching tasks, to determine:
        – How speakers' and listeners' gaze are coordinated spatially and in time
        – The functional significance of eye gaze and gesture information (by allowing or preventing mutual eye contact between the interlocutors)
        – The importance of temporal co-ordination of speaker and listener gaze

  24. Proposed research (2)
      • Development of multiple-level computer models for the integration of speech, gaze and gesture, for:
        – Improved understanding of user goals
        – Improved classification of communication states (meeting actions)

  25. Summary
      • Speech in multi-modal interfaces
      • Multiple-level models for:
        – Characterising variability within a modality
        – Characterising relationships between modalities
      • A proposal for collaborative research in psycholinguistics and speech technology

  26. CETaDL meeting room
