The NITE XML Toolkit Jonathan Kilgour and Jean Carletta University of Edinburgh Dialogue Interest Group Dec 2009 Kilgour&Carletta NXT
A toy example of linguistic data
NITE XML Toolkit
- Open source toolkit for handling annotations with temporal ordering and full structural relations
- Data storage format designed to support distributed corpus development
- Libraries for data handling, query, and writing graphical user interfaces
- Configurable end-user browsing and annotation tools for common tasks
- Command line utilities for analysis and feature extraction
[Figure: multi-layer annotation of the utterance "the (sil) the government doesn't have to deal with it" against a time axis t (s) running from 47.0 to 49.0. Layers shown include dialogue act (statement), disfluency (reparandum, repair), movement (source, target), syntax (S, VP, NP, PP, EDITED, trace), markables, kontrast, words with part-of-speech tags, phonwords, syllables, phones, prosodic phrases, and accents.]
Community support
- stand-off annotation using multiple files under version control
- dependency structure for keeping track of which annotations rely on which versions of which other annotations
- multiple competing annotations for the same thing (different humans for a reliability assessment, different automatic processes for a competition)
- logical query language, because this is the only way to analyse this kind of data
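The idea of stand-off annotation can be sketched in a few lines of Python. This is only an illustration of the general technique, not NXT's actual file format or API: the element names (`w`, `np`, `vp`, `child`), the `ref` attribute, the timings, and the helper functions `load_words` and `resolve` are all hypothetical. The base layer holds timed words, and a separate layer points at those words by ID, so each layer could live in its own file under version control.

```python
import xml.etree.ElementTree as ET

# Hypothetical base layer: timed words (timings are invented for illustration).
WORDS_XML = """
<words>
  <w id="w1" start="0.10" end="0.25">the</w>
  <w id="w2" start="0.25" end="0.80">government</w>
  <w id="w3" start="0.80" end="1.10">doesn't</w>
  <w id="w4" start="1.10" end="1.30">have</w>
</words>
"""

# Hypothetical stand-off layer: phrases that only reference word IDs.
PHRASES_XML = """
<phrases>
  <np id="np1"><child ref="w1"/><child ref="w2"/></np>
  <vp id="vp1"><child ref="w3"/><child ref="w4"/></vp>
</phrases>
"""


def load_words(xml_text):
    """Index the base layer by word ID."""
    root = ET.fromstring(xml_text)
    return {
        w.get("id"): {
            "text": w.text,
            "start": float(w.get("start")),
            "end": float(w.get("end")),
        }
        for w in root.iter("w")
    }


def resolve(phrases_xml, words):
    """Follow the ID references and derive each phrase's text and timing."""
    root = ET.fromstring(phrases_xml)
    resolved = []
    for phrase in root:
        kids = [words[c.get("ref")] for c in phrase.findall("child")]
        resolved.append({
            "id": phrase.get("id"),
            "label": phrase.tag,
            "text": " ".join(k["text"] for k in kids),
            # A phrase inherits its time span from the words it points at.
            "start": min(k["start"] for k in kids),
            "end": max(k["end"] for k in kids),
        })
    return resolved


phrases = resolve(PHRASES_XML, load_words(WORDS_XML))

# A simple stand-in for a logical query: noun phrases starting before 0.5 s.
early_nps = [p for p in phrases if p["label"] == "np" and p["start"] < 0.5]
print(early_nps)
```

Because the phrase layer never copies the words, a second annotator's competing phrase layer could point at the same word IDs without touching the base file.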
What’s wrong with NXT
- Flexibility makes it harder to just start using it
- need to formally describe corpus structure
- some users struggle with logic
- no indexing locations within still images or video frames
- Not enough packaging (connection to automatic tools, authoring corpus structure description)
- Not "sold" enough, not known very well in America
Butterflies: deixis
Butterflies: Bible studies
Butterflies: movie review analysis
Butterflies: dialogue system strategy
Butterflies: eyetracking
Flock of birds
Google Earth mashup