Discourse markers and other signals: annotation and analysis - PowerPoint PPT Presentation

Discourse markers and other signals: annotation and analysis
Ludivine CRIBLE
Bucharest, 15-16 Oct 2019

Overview
1. Domains and functions: operational definitions
2. EXMARaLDA suite: general functionalities
3. Hands-on demo: creating an annotated TED Talk transcription
4. Extracting and analyzing data
5. The next step: signalling analysis

The taxonomy in practice: definitions

Key principles: reminders
– Two independent layers of functional information:
  – Function (15 values): what relation or function is expressed by (the semantics of) the DM?
  – Domain (4 values): which type of content, element or layer does the DM target?
– Each function can combine with each domain (theoretically)
– Only one value per level (no double tags)
– You can start annotating at either level

Domains
– Ideational (IDE): objective relations between external facts; low degree of involvement; incompatible with expressions of opinion
– Rhetorical (RHE): subjective relations between thoughts or speech acts; speaker’s attitude, beliefs, reasoning; distance from facts (“I think that…”, “I can say that…”)
– Sequential (SEQ): segment management: structuring topics, speaker turns, digressions, hesitations, stalling; makes the steps and flow of speech more explicit
– Interpersonal (INT): addressee management: phatic function, manifests the relationship with the hearer; explicit call or answer to the addressee

Functions I: discourse relations
Conjunction
– Addition (ADD): S2 provides discourse-new information related to S1
– Specification (SPE): S2 elaborates on S1 with more details or an example
– Temporal (TMP): the two segments are chronologically ordered
Contingency
– Cause (CAU): S2 explains the situation in S1
– Consequence (CSQ): S2 is the result of the situation in S1
– Condition (CND): S2 is the condition for the truth/relevance of S1
Comparison
– Concession (CCS): S2 denies expectations related to S1
– Contrast (CTR): the two segments differ with respect to a shared property
– Alternative (ALT): the segments can replace each other

Functions II: speech-specific
– Hedging (HDG): the DM signals some approximation
– Monitoring (MNT): the DM signals the speaker’s intent to control the flow
– Agreeing (AGR): the DM signals agreement
– Disagreeing (DIS): the DM signals disagreement
Domain-specific
– Topic (TOP): the DM signals a start, change or return to topic
– Quoting (QUO): the DM introduces (pseudo-)reported speech
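The two-level scheme can be made concrete as a small validation routine. This is a minimal Python sketch, not part of EXMARaLDA or of the original annotation materials: it assumes each DM annotation is stored as one domain code plus one function code (the abbreviations used on these slides), and that a forbidden double tag would appear as two codes joined by a separator such as "+" or "/". Since each function can in principle combine with each domain, it deliberately does not restrict domain-function pairs.

```python
# Minimal sketch of the two-level DM taxonomy (domain x function).
# Assumption: one DM annotation = one domain code + one function code,
# using the abbreviations from the slides; the storage format is hypothetical.

DOMAINS = {"IDE", "RHE", "SEQ", "INT"}
FUNCTIONS = {
    # Functions I: discourse relations
    "ADD", "SPE", "TMP", "CAU", "CSQ", "CND", "CCS", "CTR", "ALT",
    # Functions II: speech-specific
    "HDG", "MNT", "AGR", "DIS", "TOP", "QUO",
}
# Assumed separators that would reveal a double tag such as "CAU+CSQ"
SEPARATORS = "+/,; "


def validate(domain: str, function: str) -> list[str]:
    """Return a list of problems; an empty list means the pair is valid."""
    problems = []
    for level, value, allowed in (("domain", domain, DOMAINS),
                                  ("function", function, FUNCTIONS)):
        if any(sep in value for sep in SEPARATORS):
            # Only one value per level: double tags are rejected
            problems.append(f"double tag at {level} level: {value!r}")
        elif value not in allowed:
            problems.append(f"unknown {level}: {value!r}")
    return problems


# Any domain may combine with any function, so no pair-level restriction here.
print(validate("RHE", "CCS"))      # []
print(validate("IDE", "CAU+CSQ"))  # ["double tag at function level: 'CAU+CSQ'"]
print(validate("IDEA", "ADD"))     # ["unknown domain: 'IDEA'"]
```

The sets mirror the counts given in the key principles (4 domains, 15 functions), so a typo in a label is caught the same way as a double tag.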

Examples (columns: IDE, RHE, SEQ, INT)
Addition
le grand frère avait un non je marchais pas ah non Pacs avait fait une intendance <spk1> tu dis euh cheese pour le rôle de papa et en plus non j'ai pas couru (0.180) et aux baladins (0.780) et euh cliché et genre euh un peu pour d’être papa il avait un j'ai fait encore un détour Camille lui dit euh tu se cacher rôle de d’essayer les oublieras pas de payer <spk2> et un peu pour se cacher choses avant nous aussi ouai
Alternative
– IDE: on est plusieurs ou tu me vouvoies ?
– RHE: c’est pas pour ça qu’on fait de la musique mais c’est enfin c’est pas pour être reconnu dans la rue
– SEQ: euh ben j'ai fait euh deux ans enfin ma première et ma deuxième euh d'institutrice euh primaire
– INT: <spk1> j’avais repris euh des études en gestion des ressources humaines […] <spk2> directement après? <spk1> ben euh enfin j’ai arrêté euh l’année passée euh avril et euh […] l’année scolaire suivante
Concession
– IDE: elle devait partir le lendemain mais elle n’est jamais partie
– RHE: si la démocratie est un mot ancien, ici et maintenant la démocratie signifie la prospérité pour tous
– SEQ: c’était assez comique de les entendre parler comme ça euh des filles (0.690) mais euh ouais puis après euh voilà quoi
– INT: cet auditeur euh vigilant il va vous dire tiens euh encore Jean d’Ormesson mais on entend Jean d’Ormesson à chaque automne

Tips and notes
– Domains form a relative cline and allow for “more” or “less” interpretations
– Domains might not mean exactly the same thing for every function: be flexible
– In case of doubt about the function, default to the “dictionary” meaning of the DM
– A test phase and discussion with a second annotator are necessary
– Practice makes perfect

EXMARaLDA suite: general functionalities

Generalities
– Developed by Thomas Schmidt’s team in Hamburg (CLARIN-D)
– Open-source annotation software
– Designed specifically for spoken text:
  – transcription
  – text-to-sound alignment
  – annotation
– Download and documentation available at: http://exmaralda.org/en/

EXMARaLDA suite (Schmidt & Wörner 2012)
1. Corpus Manager (CoMa) for corpus metadata
2. Partitur Editor for transcription and annotation
3. EXAKT for extraction (concordancer)

Pros and cons
Pros:
– Open-source
– All-in-one
– User-friendly, intuitive (vs. Praat)
– Few constraints (vs. ELAN)
– Interoperable format
Cons:
– Cannot handle heavy files
– Several steps for extraction
– Separate annotation tiers for each speaker

Input formats
– ELAN (.eaf)
– Praat (.TextGrid)
– Transcriber (.trs)
– Folker (.flk)
– CHAT (.cha)
– Anvil (.anvil)
– Annotation Graph file (.xml)
– Plain text (.txt)
– Treetagger (.txt)
– TEI (.xml)
The same formats are available for export.

Annotation panel
– View > Annotation panel
– Open: choose your .xml file in its folder
– You can edit the annotation panel with any text editor, e.g. Notepad++
– The file provided follows Crible & Degand (in press); you can change it or create a new one (cf. EXMARaLDA documentation)
– The name of the « category » must be exactly the same as the name of the tier
– The panel automatically displays the list of available values, plus any description you want
– Double-click on a value to add it to the cell (this avoids spelling mistakes)

Tips for DM annotation
– Use word-level segmentation
– Either merge the transcription tiers or duplicate the annotation tiers (one set per speaker)
– Load the list of labels as an « Annotation panel » for easy use
– Annotate in chronological order rather than DM by DM, to understand the context
– Don’t do 5 hours in a row
– Keep calm

Hands-on demo: creating an annotated TED Talk

Exercise 1
1. Use the transcript provided or download any transcript from https://www.ted.com/
2. Import it into the Partitur Editor
3. Select a segmentation rule
4. Create annotation tiers
5. Open the annotation panel
6. Identify 5 DRDs and annotate their functions
7. Save as an .exb file

Extraction and analysis: from EXMARaLDA to Excel

CorpusManager file (1)
– Group all your annotated files (.exb) in the same folder
– Open « CorpusManager » (CoMa)
– File > Create corpus from transcriptions
– Name the corpus
– Click on « Browse » and go into the folder where all the .exb files are stored
– DO NOT CLICK ON ONE OF THE FILES: click anywhere else in the folder, otherwise the corpus file will overwrite that .exb file
– Make sure that you can read: File name > *YourCorpusName*.coma
– It will show how many transcription files you have in this folder
– « Next »

CorpusManager file (2)
– « Select transcriptions »: just click on « Next »
– « Segmentation »:
  – Tick the box « Segment transcriptions »
  – Select « …use default segmentation » and click on « Next »
– « Metadata assignment »: click on « Next »
– « Speakers »: click on « Finish »
→ You have created a .coma file

EXAKT
– Open EXAKT
– File > Open corpus (or shortcut) and find the .coma file you just created
– Select « RegEx(A) »
– Annotation: select your tier name, e.g. « DM »
– In the « RegEx » box, type your search string, e.g. « well »
– Typing the dollar sign $ will give you all the annotations, i.e. everything you typed in the « DM » tier
– Then click on the binoculars on the right
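Why does $ return everything? In regular expressions, $ matches the end of a string, and every annotation has an end, so a search that reports any annotation containing a match returns the whole tier. The Python sketch below illustrates this with the standard re module. It assumes EXAKT reports an annotation whenever the pattern matches anywhere inside it (which is what the « well » and $ behaviour described above suggests); the tier contents are invented for illustration.

```python
import re

# Hypothetical content of a « DM » annotation tier, one string per annotation
dm_tier = ["well", "so", "you know", "well I mean", "but"]


def search(pattern: str, annotations: list[str]) -> list[str]:
    """Keep annotations in which the pattern matches anywhere (assumed find-style search)."""
    rx = re.compile(pattern)
    return [a for a in annotations if rx.search(a)]


print(search("well", dm_tier))     # ['well', 'well I mean']
print(search("$", dm_tier))        # all five: '$' matches at the end of any string
print(search(r"^well$", dm_tier))  # ['well']: anchored at both ends
```

Under the same assumption, anchoring a pattern at both ends (^well$) is the way to retrieve only annotations that consist of exactly that string.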

Visualizing the annotations
– You will see a concordancer with all your DMs
– To add the annotations from other tiers:
  – Columns (top-left corner) > Add annotation
  – Select the Annotation Category you want (e.g. start with « DM », then « DOMAIN »…)
  – The « Exact » option is fine
  – « OK »
– To add metadata, such as the name of the transcription file:
  – Columns > Metadata
  – Select « Filename* », click on the « + » sign, then « OK »
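Adding columns amounts to pairing each DM with the annotations on other tiers that cover the same stretch of the timeline. The following Python sketch shows that idea on a hand-made, heavily simplified .exb file, using only the standard library. It assumes tiers are <tier> elements identified by a category attribute and containing <event start="…" end="…"> elements that point to timeline ids; the DM, DOMAIN and FUNCTION tier names and the sample content are hypothetical. It only pairs annotations with identical start and end points, assumes a single speaker, and is not how EXAKT itself is implemented.

```python
import xml.etree.ElementTree as ET

# Hand-made, heavily simplified .exb content. Assumed layout: <tier> elements
# identified by a "category" attribute, each holding <event start end> elements
# that point to timeline ids. Tier names and values are hypothetical.
SAMPLE = """<basic-transcription>
  <basic-body>
    <common-timeline>
      <tli id="T0"/><tli id="T1"/><tli id="T2"/><tli id="T3"/>
    </common-timeline>
    <tier id="TIE1" category="DM" type="a">
      <event start="T0" end="T1">well</event>
      <event start="T2" end="T3">but</event>
    </tier>
    <tier id="TIE2" category="DOMAIN" type="a">
      <event start="T0" end="T1">SEQ</event>
      <event start="T2" end="T3">RHE</event>
    </tier>
    <tier id="TIE3" category="FUNCTION" type="a">
      <event start="T0" end="T1">TOP</event>
      <event start="T2" end="T3">CCS</event>
    </tier>
  </basic-body>
</basic-transcription>"""


def tier_events(root: ET.Element, category: str) -> dict:
    """Map each (start, end) pair of timeline ids to the event text of one tier category."""
    events = {}
    for tier in root.iter("tier"):
        if tier.get("category") == category:
            for event in tier.iter("event"):
                events[(event.get("start"), event.get("end"))] = (event.text or "").strip()
    return events


def concordance(xml_text, main: str = "DM",
                extra: tuple = ("DOMAIN", "FUNCTION")) -> list:
    """One row per main-tier event, plus the extra-tier events with the exact same span."""
    root = ET.fromstring(xml_text)
    others = {cat: tier_events(root, cat) for cat in extra}
    return [[text] + [others[cat].get(span, "") for cat in extra]
            for span, text in tier_events(root, main).items()]


rows = concordance(SAMPLE)
print(rows)  # [['well', 'SEQ', 'TOP'], ['but', 'RHE', 'CCS']]
```

For a real file, passing its raw bytes (e.g. open(path, "rb").read()) to concordance should work the same way, since ET.fromstring accepts bytes; files with several speakers would need one lookup per speaker tier.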

Visualizing the annotations (cont.)
The result should look like this: [screenshot of the EXAKT concordance]

Exploring the annotations
– You can add more characters to the left and right context by clicking on the magnifying glass on the right
– By double-clicking on an item, you can visualize it in transcription format at the bottom
– You can also play it!
– Once you revise your annotations, you can add a « Comment » column if you want: Columns > Add analysis > Analysis name: « Comment » > OK

Extracting the annotations
– Click anywhere on the concordancer
– Ctrl + A (select everything)
– Ctrl + C (copy)
– Go to Excel and press Ctrl + V (paste into a new Excel sheet)

Working in Excel
– You can now filter your data, create pivot tables and graphs, look at frequencies…
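For readers who prefer scripting to Excel, the same frequency counts and a domain-by-function pivot can be obtained with Python's standard library. This sketch assumes the pasted concordance was saved as tab-separated text with columns named DM, DOMAIN and FUNCTION; the column names and rows are invented for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical concordance excerpt pasted from EXAKT and saved as tab-separated
# text; the column names DM, DOMAIN and FUNCTION are assumptions.
DATA = (
    "DM\tDOMAIN\tFUNCTION\n"
    "well\tSEQ\tTOP\n"
    "but\tRHE\tCCS\n"
    "but\tIDE\tCCS\n"
    "so\tRHE\tCSQ\n"
    "but\tRHE\tCTR\n"
)

rows = list(csv.DictReader(io.StringIO(DATA), delimiter="\t"))

# Frequency of each DM form (a COUNT pivot on the DM column)
dm_freq = Counter(row["DM"] for row in rows)
print(dm_freq.most_common())  # [('but', 3), ('well', 1), ('so', 1)]

# Domain x function cross-table (pivot with DOMAIN as rows, FUNCTION as columns)
crosstab = Counter((row["DOMAIN"], row["FUNCTION"]) for row in rows)
domains = sorted({d for d, _ in crosstab})
functions = sorted({f for _, f in crosstab})
print("\t".join(["", *functions]))
for d in domains:
    # Counter returns 0 for combinations that never occur
    print("\t".join([d, *(str(crosstab[(d, f)]) for f in functions)]))
```

With a real export, io.StringIO(DATA) would be replaced by an opened file, e.g. open("concordance.tsv", newline="", encoding="utf-8"), where the file name is hypothetical.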

Inter- and intra-annotator reliability
– Purpose: to assess the reliability and replicability of your analysis
– Intra = repeat your annotations after a while and compare
– Inter = compare with another annotator
– % agreement measured in Excel, with the two sets of labels for the same items side by side (e.g. columns A and B): =IF(A2=B2;"same";"diff") (use commas instead of semicolons if your Excel locale requires it)
– Kappa scores measured in R or online: https://nlp-ml.io/jg/software/ira/
– Aim for κ = 0.7, see Spooren & Degand (2010)
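Both measures can also be computed directly. Percent agreement is the proportion of items on which the two annotation passes give the same label (what the Excel IF column tallies); Cohen's kappa corrects it for chance agreement, κ = (p_o - p_e) / (1 - p_e), where p_e is estimated from each annotator's own label distribution. This is a minimal sketch for two annotators on hypothetical function labels, independent of the R packages or the online tool mentioned above.

```python
from collections import Counter


def percent_agreement(a: list, b: list) -> float:
    """Proportion of items that received the same label in both annotation passes."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: list, b: list) -> float:
    """Cohen's kappa for two annotators: (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(a, b)
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's own label distribution
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1:
        # Both used one and the same label throughout; treated as perfect
        # agreement here (a convention chosen for this sketch)
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical function labels for the same 8 DMs from two annotators
ann1 = ["CAU", "CCS", "ADD", "ADD", "CTR", "CAU", "TOP", "ADD"]
ann2 = ["CAU", "CCS", "ADD", "SPE", "CCS", "CAU", "TOP", "ADD"]

print(percent_agreement(ann1, ann2))      # 0.75
print(round(cohen_kappa(ann1, ann2), 3))  # 0.686
```

On these invented labels, 75% raw agreement shrinks to κ ≈ 0.686 once chance agreement (p_e = 13/64) is removed, just below the 0.7 target.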
