Lecture 2: Annotation tools & Segmentation

Center for Reflected Text Analytics
Lecture 2: Annotation tools & Segmentation

Summary of Part 1
• Annotation theory
  • Guidelines
  • Inter-annotator agreement
  • Inter-subjective annotations
• Annotation exercise
  • Discuss disagreements with your neighbor
  • Improve annotation guidelines
University of Stuttgart

Annotation Tool Support
• Tools can support the annotation process at various stages
• Managing multiple annotators
  • Assign documents to annotate
  • Supervise their progress
• Analyse disagreements
  • Display disagreements (only)
  • Calculate quantitative IAA (κ)
• Create a gold standard
  • Make decisions on disagreements
  • Record final decisions
• Usable tools: see handout
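The quantitative agreement score κ mentioned above can be illustrated with a minimal sketch. It assumes Cohen's kappa for two annotators (the slide does not say which κ variant is meant), and the label lists are invented for illustration.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:  # both annotators always used the same single label
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical annotations of six tokens (assumed example data).
annotator_1 = ["PER", "PER", "LOC", "O", "O", "PER"]
annotator_2 = ["PER", "LOC", "LOC", "O", "O", "O"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

The two annotators agree on four of six items, but part of that agreement is expected by chance, so κ is lower than the raw agreement rate.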

Segmentation

Segmentation Tool Download
http://tinyurl.com/cretanetworker
= http://www2.ims.uni-stuttgart.de/gcl/reiterns/creta/CRETANetworker.jar

Segmentation
• Abstract definition
  • No meaning of a segment implied
  • The task of separating a text into multiple parts (“segments”)
• Segmentation according to various criteria based on
  • Structure (chapters, acts, letters, speeches)
  • Linguistics (sentences, paragraphs)
  • Narrative content (scenes, time, place)
  • Content (topics under discussion)
• No generic criterion covering multiple research questions

Segmentation Viewpoints
• Focus on segments
  • Spans of text
• Focus on segment boundaries
  • Positions in a text
• Views are equivalent; we will switch between them when appropriate
[Figure: the same text shown as a sequence of segments (Segment 1 to Segment 4) and as segments separated by three boundaries]
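The equivalence of the two viewpoints can be sketched as a pair of conversion functions. This is an illustration only: it assumes that positions are token offsets and that segments are contiguous and cover the whole text; the function names are made up for this example.

```python
def boundaries_to_segments(boundaries, text_length):
    """Turn sorted boundary positions into (start, end) spans, end exclusive."""
    cuts = [0] + list(boundaries) + [text_length]
    return list(zip(cuts, cuts[1:]))

def segments_to_boundaries(segments):
    """Recover the interior boundary positions from contiguous spans."""
    return [start for start, _ in segments[1:]]

tokens = "a b c d e f g h i j".split()  # placeholder text of ten tokens
boundaries = [3, 5, 8]                  # three boundaries ...
segments = boundaries_to_segments(boundaries, len(tokens))  # ... give four segments
print(segments)
print(segments_to_boundaries(segments) == boundaries)
```

Converting in one direction and back returns the original boundaries, which is what makes switching between the views safe.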

Entities + Segments = Networks
[Figure: co-occurrence network with the nodes Mary, Peter and Paul]

Entities + Segments = Networks
Slightly more abstract description
• Segmented text with the appearing entities
  ⟨ {A, B}, {A, B, B, B, A}, {A, C} ⟩
• Convert into a (square) adjacency matrix
  • Diagonal is typically uninteresting
  • Matrix is symmetric

        A   B   C
    A   -   2   1
    B   2   -   0
    C   1   0   -

• Create network
  • A node is created for each row (or column)
  • An edge is created for each cell with a non-zero value, weighted according to the cell value
[Figure: resulting network with the nodes A, B and C]
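The conversion from segments to an adjacency matrix can be sketched as follows. The sketch assumes that a pair of entities is counted once per segment in which both occur, which reproduces the values 2, 1 and 0 from the example; it is not taken from the tool's implementation.

```python
from collections import Counter
from itertools import combinations

# The segmented example: each inner list holds the entity mentions of one segment.
segments = [["A", "B"], ["A", "B", "B", "B", "A"], ["A", "C"]]

def cooccurrence_counts(segments):
    """Count, for each unordered entity pair, the segments containing both."""
    counts = Counter()
    for segment in segments:
        for pair in combinations(sorted(set(segment)), 2):
            counts[pair] += 1
    return counts

counts = cooccurrence_counts(segments)
entities = sorted({entity for segment in segments for entity in segment})

# Symmetric matrix; the diagonal is left at 0 because it is not of interest.
matrix = [
    [0 if row == col else counts[tuple(sorted((row, col)))] for col in entities]
    for row in entities
]

# One weighted edge per non-zero cell above the diagonal.
edges = [(a, b, weight) for (a, b), weight in sorted(counts.items())]
print(matrix)
print(edges)
```

Counting repeated mentions inside a segment (B occurs three times in the second segment) would be an alternative weighting; the set-based count is the one that matches the matrix shown.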

Segmentation Annotation
• Theoretically
  • Segments can be annotated just like entity references
    • Both cover sequences of words
  • Appropriate annotation guidelines would define when to annotate segments
• Practically
  • Segmentation criterion closely tied to research question
  • No reasonable generic abstraction layer
    • That works for multiple research questions and/or text corpora
  • Single texts only contain a few segments
    • Many more annotated texts needed for any kind of automation

Segment Annotation Tool
• Web-based UI
• Beta software
• Automatic annotation through rules and tools
• Entity annotation
  • Stanford Named Entity Recognizer (Finkel et al., 2005)
    • Only proper names, no descriptive noun phrases
  • Rules (regular expressions) to specify the entity references
• Segment annotation
  • Rules (regular expressions) to specify the segment boundaries
  • Unsupervised segmentation algorithm (TextTiling; Hearst, 1994)
• Network export → Gephi

Gephi Network Tool
• Free and open source
  • https://gephi.org
• Wide range of metric, filter and layout algorithms
• Network editing (e.g., merge nodes)
• Plugins
• Export into static images

Demo

Regular Expressions
Useful text processing skills 101
• A powerful way to describe sets of character sequences
• Many search tools support REs, and so do most programming languages
• Looks cryptic, but is quite systematic
• REs on slides/handout are enclosed in forward slashes / / for readability
  • The slashes don’t need to be typed in the tool
• Basics
  • Many regular characters stand for themselves
    • The RE /a/ finds occurrences of the character “a”
  • Sequences of characters stand for sequences of themselves
    • The RE /the/ finds occurrences of the string “the”

Regular Expressions
Basics
• Meta characters (“quantifiers”) are applied to the previous character
  • ?: previous character optional (0-1 times)
    • /them?/ finds both “the” and “them”
  • +: previous character one or more times
    • /ab+/ finds “ab”, “abb”, “abbb”, …
  • *: the Kleene star finds the previous character zero or more times
    • /ab*/ finds “a”, “ab”, “abb”, “abbb”, …
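The quantifiers can be tried out in any regex engine; the following sketch uses Python's `re` module as one possibility (the segmentation tool itself may use a different engine). The helper `matches` is a made-up name, and it uses `re.fullmatch` so that a candidate counts only if the whole string fits the pattern.

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

# ?  makes the previous character optional
optional = matches(r"them?", ["the", "them", "they"])
# +  requires the previous character one or more times
one_or_more = matches(r"ab+", ["a", "ab", "abb", "abbb"])
# *  allows the previous character zero or more times
zero_or_more = matches(r"ab*", ["a", "ab", "abb", "ba"])

print(optional, one_or_more, zero_or_more)
```

Note that a search (as opposed to a full match) for /ab*/ would also report a hit inside “ba”, because the “a” alone already satisfies the pattern.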

Regular Expressions
Alternations and Character Classes
• /(re1|re2)/ matches everything that either re1 or re2 matches
  • /(good|better|best)/ finds the base, comparative and superlative forms of the adjective “good”
  • /great(er|est)?/ finds “great” and its comparative and superlative forms
    • The question mark makes the suffixes optional
• We can mark alternatives on character level in square brackets: […]
  • /[Tt]he/ finds upper and lower case forms of “the”
• Square brackets support ranges of characters
  • /[A-Z]/ finds upper-case characters (beware: locale)
  • /[0-9]/ finds digits
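A short sketch of alternations and character classes, again using Python's `re` module as an assumed engine. Two details go beyond the slide: the groups are written as non-capturing `(?:…)` so that `re.findall` returns whole matches, and `\b` word boundaries are added to avoid hits inside longer words. The sample sentence is invented.

```python
import re

text = "The best of the greatest: good, better, great, greater."

# Alternation between complete words.
good_forms = re.findall(r"\b(?:good|better|best)\b", text)
# Optional suffix alternatives after a common stem.
great_forms = re.findall(r"\bgreat(?:er|est)?\b", text)
# Character class: either upper- or lower-case first letter.
the_forms = re.findall(r"\b[Tt]he\b", text)
# Character range: each digit is found separately.
digits = re.findall(r"[0-9]", "Chapter 10")

print(good_forms, great_forms, the_forms, digits)
```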

Regular Expressions
Special cases and exceptions
• The dot . matches any single character (in most tools, except a line break)
  • /a.*b/ finds everything that begins with a and ends with b
• Escape character: backslash
  • In order to find a literal dot, we need to prevent its special meaning
  • /.*\.doc/ finds everything that ends in “.doc” (e.g., filenames)
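The effect of the dot and of escaping it can be checked with a small sketch in Python's `re` module (an assumed engine; the file names are invented). It also shows that `.*` is greedy, i.e. it extends to the last possible “b”.

```python
import re

# Greedy: from the first "a" up to the LAST "b" on the line.
greedy = re.search(r"a.*b", "xx a1b2b yy").group()

names = ["notes.doc", "notes_doc", "a.docx", "readme.txt"]

# Escaped dot: only a literal "." before "doc" is accepted.
escaped = [n for n in names if re.fullmatch(r".*\.doc", n)]
# Unescaped dot: any character before "doc" is accepted, so "notes_doc" slips in.
unescaped = [n for n in names if re.fullmatch(r".*.doc", n)]

print(greedy, escaped, unescaped)
```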

Regular Expressions
Real examples
• Chapter 10.
  • /Chapter [0-9]+\./
• Chapter V. (Roman numerals)
  • /Chapter [IVXLCDM]+\./
  • Beware: possible over-matching
• Dates: MAY 22., AUGUST 23.
  • /[A-Z]+ [0-9]+\./
  • Beware: possible over-matching
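How such patterns turn into segment boundaries can be sketched in a few lines of Python. The sample text is invented, the character class for Roman numerals here includes L and D as well, and cutting the text at the start of each heading is an assumption about how a rule-based segmenter might work, not a description of the tool.

```python
import re

# Invented sample text with three chapter headings and two date lines.
text = (
    "Chapter I. The journey begins. "
    "Chapter II. MAY 22. A train ride. "
    "Chapter III. AUGUST 23. Arrival."
)

chapter_heading = re.compile(r"Chapter [IVXLCDM]+\.")

# Every heading match is a boundary; a segment runs to the next boundary.
starts = [m.start() for m in chapter_heading.finditer(text)]
ends = starts[1:] + [len(text)]
segments = [text[s:e].strip() for s, e in zip(starts, ends)]

# The date pattern finds the dates ...
dates = re.findall(r"[A-Z]+ [0-9]+\.", text)
# ... but over-matches on anything shaped like "CAPITALS number."
over_match = re.findall(r"[A-Z]+ [0-9]+\.", "See PAGE 7. for details")

print(segments)
print(dates, over_match)
```

The last line illustrates the over-matching warning: the date pattern has no notion of month names, so “PAGE 7.” is reported as well.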

TextTiling
Hearst (1994)
• Unsupervised segmentation algorithm, developed for expository texts
• Compares lexicon in a window left and right of a target sentence gap
[Figure: windows (window size = 2) to the left and right of the sentence boundary between sentences n and n+1, moved with step size = 3; each window is represented as a word vector (example values 2 3 1 0 and 7 2 0 9), and the gap is scored as d_n = dist(v_1, v_2)]

TextTiling
Hearst (1994)
[Figure: distances d_n and d_n+3 computed at successive target sentence boundaries]
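The window comparison can be sketched as follows. This is a strongly simplified illustration of the idea on the slides, not Hearst's full algorithm (which, among other things, smooths the scores and derives boundaries from depth scores). Word counts as vector values, cosine distance, the default step of 1 and the sample sentences are all assumptions made for this example.

```python
import math
import re
from collections import Counter

def cosine_distance(left, right):
    """1 - cosine similarity of two word-count vectors (Counters)."""
    dot = sum(left[w] * right[w] for w in left)
    norm = math.sqrt(sum(v * v for v in left.values())) * math.sqrt(
        sum(v * v for v in right.values())
    )
    return 1.0 - dot / norm if norm else 1.0

def gap_distances(sentences, window_size=2, step=1):
    """Distance between the windows left and right of each sentence gap.

    Gap g lies between sentences[g - 1] and sentences[g].
    """
    bags = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    distances = {}
    for gap in range(window_size, len(bags) - window_size + 1, step):
        left = sum(bags[gap - window_size:gap], Counter())
        right = sum(bags[gap:gap + window_size], Counter())
        distances[gap] = cosine_distance(left, right)
    return distances

sentences = [
    "The cat sat on the mat.",
    "The cat chased the mouse.",
    "The cat slept on the mat.",
    "Stock markets fell sharply.",
    "Investors sold stock quickly.",
    "Markets recovered by evening.",
]
distances = gap_distances(sentences)
best_gap = max(distances, key=distances.get)
print(distances, best_gap)
```

The largest distance falls at the gap where the vocabulary changes completely, which is where a segment boundary would be proposed.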

TextTiling
Hearst (1994)
• More powerful algorithms are available
  • E.g., topic segmentation
• Clear adaptation possibilities
  • How to create word vectors?
    • Which words are included (function/content words)?
    • Which value is represented in the vector (frequency, tf*idf, information, …)?
  • How to calculate similarity/distance?
    • Cosine, Manhattan, …
• But: evaluation is hard
  • No gold standard available
  • Different expectations

Hands-On Session 2
• Go to …
• Load a text of your choice (preferably one you are familiar with)
• Add entity references by applying the Stanford NER system
  • Briefly check whether the important entities are included (“Passepartout”, for instance, is not)
  • You can add specific names by specifying regular expressions
• Add reasonable segment annotations
• Export a GEXF file and load it into Gephi
• Play with various options and see how the network changes
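For orientation, a GEXF file is an XML description of nodes and weighted edges. The following sketch writes a minimal one with Python's standard library; it assumes the GEXF 1.2 draft layout and is not the export routine of the segmentation tool, whose files may contain additional attributes. The function name `to_gexf` and the edge list are made up for this example.

```python
import xml.etree.ElementTree as ET

def to_gexf(edges):
    """Serialise undirected (source, target, weight) edges as a GEXF string."""
    labels = sorted({name for s, t, _ in edges for name in (s, t)})
    ids = {label: str(i) for i, label in enumerate(labels)}
    root = ET.Element(
        "gexf", {"xmlns": "http://www.gexf.net/1.2draft", "version": "1.2"}
    )
    graph = ET.SubElement(
        root, "graph", {"mode": "static", "defaultedgetype": "undirected"}
    )
    nodes = ET.SubElement(graph, "nodes")
    for label in labels:
        ET.SubElement(nodes, "node", {"id": ids[label], "label": label})
    edge_list = ET.SubElement(graph, "edges")
    for i, (s, t, w) in enumerate(edges):
        ET.SubElement(
            edge_list,
            "edge",
            {"id": str(i), "source": ids[s], "target": ids[t], "weight": str(w)},
        )
    return ET.tostring(root, encoding="unicode")

# Co-occurrence edges for three entities (assumed example data).
gexf_text = to_gexf([("A", "B", 2), ("A", "C", 1)])
print(gexf_text)
```

Saving the string to a file with the extension .gexf should give something Gephi can open, with the edge weights available for layout and filtering.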
