Creating and exploiting multimodal annotated corpora
Philippe Blache, Roxane Bertrand & Gaëlle Ferré
Laboratoire Parole et Langage, CNRS & Université de Provence
LREC 2008

Introduction
  Multimodality
    Information comes from different sources
    Interaction between modalities
      Each source is partial, incomplete
      They have to be synchronized
  Multimodal annotation
    Goals
      Usually focus on gesture description
      Mainly in the perspective of communication
    Conventions and schemes
    Tools (Praat, Anvil, Elan, etc.)
  Our project
    Linguistic description
    Study of interaction: annotation of all domains
    Unrestricted data (natural situations)

Outline
  The project
    The CID corpus
    The annotation process
  Results
    Backchannels
    Reinforcing gestures
  Perspectives

The corpus
  Corpus of Interactional Data (CID): 8 dialogs, 1 hour each [Bertrand et al. 07]
  Transcribed (orthographic, phonetic)
  Aligned
  Annotated
    Prosody (intonation, units, contours, etc.)
    Morphosyntax, syntax
    Discourse (markers, speech turns, etc.)
    Gestures

The annotation architecture

Signal segmentation
  Interpausal unit (IPU) segmentation
  Syntactic unit detection (pattern method)
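The slide does not say how the IPU segmentation is implemented. As a minimal sketch only, assuming the signal has already been reduced to one energy value per frame, interpausal units can be found by splitting at every silent stretch that lasts at least a minimum pause duration. The function name, the energy threshold and the 200 ms pause length are illustrative assumptions, not values from the project.

```python
from typing import List, Sequence, Tuple


def segment_ipus(
    energies: Sequence[float],
    frame_s: float = 0.01,           # assumed frame duration in seconds
    silence_threshold: float = 0.1,  # assumed energy level below which a frame is silent
    min_pause_s: float = 0.2,        # assumed minimum pause separating two IPUs
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of speech stretches between pauses."""
    min_pause_frames = max(1, round(min_pause_s / frame_s))
    ipus: List[Tuple[float, float]] = []
    start = None      # index of the first frame of the current IPU, if any
    silent_run = 0    # number of consecutive silent frames seen inside the IPU

    for i, energy in enumerate(energies):
        if energy >= silence_threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                # The pause is long enough: close the IPU at its first silent frame.
                end = i - silent_run + 1
                ipus.append((start * frame_s, end * frame_s))
                start = None
                silent_run = 0

    if start is not None:
        # Close an IPU still open at the end, dropping trailing silence.
        end = len(energies) - silent_run
        ipus.append((start * frame_s, end * frame_s))
    return ipus
```

Short silences inside an IPU (shorter than the minimum pause) are kept as part of the unit, which is the usual motivation for a duration threshold rather than a frame-by-frame cut.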

Transcription
  Precise transcription convention
  Transcription by 2 experts
  Enriched orthographic transcription (EOT), needed for annotating and aligning different phenomena (elisions, schwa, etc.)
  Generation of 2 transcription versions
    Orthographic (for the NLP module)
    Phonetic (for speech analysis)

Alignment

Alignment
  Identifying the phoneme sequence
    Tokenisation
    Grapheme-phoneme conversion
  Alignment tool
    Input: list of phonemes + audio signal
    Temporal localization of the phonemes in the signal
  Manual correction
    Wrong boundaries
    Overgeneration (false units)
  Tokens and phonemes are primary levels, used for anchoring other levels
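One way to read "primary levels used for anchoring" is that higher levels store no times of their own and instead point at tokens, which in turn point at aligned phonemes. The sketch below illustrates that idea under stated assumptions: the `Phoneme` class, the `token_spans` and `anchor` helpers and the toy timings are all hypothetical, and the project's actual data model is not described on the slide.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Phoneme:
    label: str    # phoneme symbol (SAMPA-like labels, for illustration)
    start: float  # start time in seconds, as produced by the aligner
    end: float    # end time in seconds


def token_spans(
    phonemes: Sequence[Phoneme],
    tokens: Sequence[Tuple[str, int]],
) -> List[Tuple[str, float, float]]:
    """Derive (word, start, end) for each token from its run of phonemes.

    Each token is given as (word, number_of_phonemes), in signal order.
    """
    spans = []
    i = 0
    for word, n in tokens:
        chunk = phonemes[i:i + n]
        if n <= 0 or len(chunk) != n:
            raise ValueError(f"token {word!r} does not match the phoneme list")
        spans.append((word, chunk[0].start, chunk[-1].end))
        i += n
    if i != len(phonemes):
        raise ValueError("some phonemes are not covered by any token")
    return spans


def anchor(
    spans: Sequence[Tuple[str, float, float]],
    first_token: int,
    last_token: int,
) -> Tuple[float, float]:
    """Time span of a higher-level unit defined by token indices (inclusive)."""
    return spans[first_token][1], spans[last_token][2]


# Toy alignment for two tokens (timings invented for the example).
phonemes = [
    Phoneme("w", 0.00, 0.08), Phoneme("i", 0.08, 0.20),
    Phoneme("m", 0.20, 0.27), Phoneme("E", 0.27, 0.40),
]
spans = token_spans(phonemes, [("oui", 2), ("mais", 2)])
unit_start, unit_end = anchor(spans, 0, 1)
```

With this layout, a manually corrected phoneme boundary propagates to every level anchored on it once the spans are recomputed.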

Intonation: INTSINT

Discourse

Gestures

Summary of the tools
  Fully automatic
    IPU segmentation
    Phoneme alignment
    Intonation
    POS tagging
  Semi-automatic
    Intonational units
    Shallow parsing (still needs a segmentation tool)
  Manual
    Transcription (we are experimenting with speech recognition as a helping tool)
    Other annotations
  Tools and resources available from the CRDO (http://crdo.fr/)

First study: Backchannels
  Backchannels (BCs): minimal signals produced by the hearer
  Vocal and gestural BCs (head movements, smiles and laughter, eyebrow movements, etc.) have different functions
  Example:
  Questions:
    Do vocal and gestural BCs behave similarly?
    In what prosodic and morphological contexts do they appear?

Backchannels
  Vocal and gestural BCs show similar behavior, but gestural BCs appear later than vocal ones
  Morphological and discursive context
    After nouns, verbs and adverbs (words with semantic function)
    Not after connectors (linking words between conversational units)
  Prosodic context
    Gestural BCs: after accentual phrases (APs) and intonational phrases (IPs)
    Vocal BCs: after IPs
    Encouraged by specific contours (esp. rising) and by the speaker's gaze
  Conclusion: BCs occur at the end of some units, but not with possible turn change. They also play a role in the elaboration of discourse.
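A timing comparison of this kind can be run directly on time-aligned annotations. The sketch below assumes two hypothetical inputs, a sorted list of prosodic unit end times and a list of (modality, onset) backchannel annotations, and measures how long after the most recent unit end each BC starts. The numbers are toy values chosen for the example, not CID results, and this is not necessarily how the study itself was computed.

```python
from bisect import bisect_right
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple


def delay_after_unit(bc_onset: float, unit_ends: Sequence[float]) -> Optional[float]:
    """Delay between a BC onset and the latest unit end at or before it.

    `unit_ends` must be sorted in ascending order. Returns None when the BC
    starts before the first unit has ended.
    """
    i = bisect_right(unit_ends, bc_onset)
    if i == 0:
        return None
    return bc_onset - unit_ends[i - 1]


def mean_delay_by_modality(
    bcs: Sequence[Tuple[str, float]],
    unit_ends: Sequence[float],
) -> Dict[str, float]:
    """Average delay per modality label, skipping BCs with no preceding unit."""
    delays: Dict[str, List[float]] = {}
    for modality, onset in bcs:
        d = delay_after_unit(onset, unit_ends)
        if d is not None:
            delays.setdefault(modality, []).append(d)
    return {modality: mean(values) for modality, values in delays.items()}


# Toy annotations (invented): intonational phrase ends and BC onsets, in seconds.
ip_ends = [1.0, 2.5, 4.0]
backchannels = [("vocal", 1.25), ("gestural", 1.5), ("vocal", 2.75), ("gestural", 3.5)]
averages = mean_delay_by_modality(backchannels, ip_ends)
```

A larger average for one modality would correspond to that modality "appearing later" relative to unit boundaries.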

Second study: Reinforcing gestures
  Reinforcing gestures: eyebrow movements, gaze direction and head movements that highlight discourse elements
  Example:
  Questions:
    What do gestures reinforce?
    Are they equivalent to known focalization phenomena?

Reinforcing gestures: results
  No correlation with prosodic focalization: no gesture is associated with a specific stress or contour
  Correlation with adverbs and connectors at the beginning of speech turns
  Correlation for metaphorics, no correlation for eyebrow movements
  Conclusion
    Reinforcing gestures do not serve to express focus
    Their role is more discursive than expressive

Conclusion
  CID: large corpus, richly annotated
  Interest of multimodal annotated corpora
    Study of natural language, in context
    Study of interaction
  Problems
    Standardisation: coding schemes
    Synchronization of the different domains (+/- temporal)
    Interfacing the different tools
  Perspectives
    Information structure study
    Description in terms of constructions (CxG)
    Multimodal interaction for virtual reality
