DH2019, Utrecht, 2019-07-11
Linking the TEI: Approaches, Limitations, Use Cases
Christian Chiarcos & Maxim Ionov, {chiarcos|ionov}@cs.uni-frankfurt.de
Applied Computational Linguistics, Goethe Universität Frankfurt, Germany
Linking the TEI
The (TEI-)native way
• URIs, but no properties
• User-provided ad hoc converters
The inline way (inline XML solutions?)
• TEI-endorsed, but not LOD-compliant
• LOD-compliant, but not TEI-endorsed
The „proper“ way
• standard-conformant standoff annotations: TEI/XML + WebAnnotation (JSON-LD)
• TEI-compliant and LOD-compliant
• restricted to static TEI documents: breaks if we have dynamic TEI content
TEI and TEI customizations
very rich vocabulary
• TEI P5: 569 elements, 505 attributes
TEI customization with ODD
• high-level specification for customizing the TEI
  select modules
  refine vocabulary elements
  generate (textual) documentation
  generate actual schemas
• any TEI project should start with such a customization
TEI and TEI customizations
TEI occupies a very distinctive position with respect to the idea of standardization:
• strictly speaking, it is not a standard,
• but is poised between a standard and a consensus,
• possessing some characteristics of each, in ways that have very interesting consequences for extension and interchange.
(Bauman & Flanders 2004, bullet points by us)
TEI compliance does not entail interoperability
• Documents following the same customization are interoperable.
• Beyond that, the TEI provides only an orientation for their interpretation.
• For problems not documented in the TEI documentation, different customizations will not be interoperable, e.g., when trying to encode RDF triples in the TEI ;)
Resource Description Framework
RDF 1.1: general data model for the web of data
• W3C recommendation 2014
directed labeled multi-graph
• multiple edges between the same nodes
for every edge:
• source node („RDF subject“) defined by URI
• edge type („RDF property / relation“) defined by URI
• target node („RDF object“) defined by URI
  alternatively, target can also be an atomic (literal) value
⇒ „triple“ / „RDF statement“
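The data model on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an RDF library: the class names `URI`, `Literal` and `Triple` and all `http://example.org/` identifiers are hypothetical. It shows a graph as a set of triples, with two differently labeled edges between the same pair of nodes and a literal as target.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    """A node or edge label identified by a URI (hypothetical helper class)."""
    value: str


@dataclass(frozen=True)
class Literal:
    """An atomic value; only allowed in target (object) position."""
    value: str


@dataclass(frozen=True)
class Triple:
    subject: URI                  # source node
    predicate: URI                # edge type
    object: Union[URI, Literal]   # target node or literal value


# Hypothetical example identifiers, not taken from the slides.
a = URI("http://example.org/a")
b = URI("http://example.org/b")
knows = URI("http://example.org/knows")
cites = URI("http://example.org/cites")
label = URI("http://www.w3.org/2000/01/rdf-schema#label")

graph = set()
graph.add(Triple(a, knows, b))
graph.add(Triple(a, cites, b))                 # second edge between the same nodes
graph.add(Triple(a, label, Literal("node a")))  # literal as target
graph.add(Triple(a, knows, b))                 # same statement again: the set keeps one copy

# All edge types that connect a to b.
edges_a_b = {t.predicate.value for t in graph if t.subject == a and t.object == b}
print(sorted(edges_a_b))
```

Frozen dataclasses are used so that a `URI` and a `Literal` with the same string never compare equal, which keeps the two kinds of targets distinct inside the set.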
Triples and graphs: NT Mark 9:35 variants
prefix declarations:
  PREFIX perseus-nt: <http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0031.>
  PREFIX saws-nt: <http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsGrc01:divedition.>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  PREFIX saws: <http://purl.org/saws/ontology#>
RDFS URL: machine-readable representation of a particular vocabulary
graph (RDF):
  saws-nt:divsection1.o14.a107 --saws:isVariantOf--> perseus-nt:tlg002.perseus-grc1:9.35
  perseus-nt:tlg002.perseus-grc1:9.35 --rdfs:label--> "... πάντων ἔσχατος ..."
  saws-nt:divsection1.o14.a107 --rdfs:label--> "ἔσχατος πάντων"
Turtle:
  perseus-nt:tlg002.perseus-grc1:9.35 rdfs:label "... πάντων ἔσχατος ..." .
  saws-nt:divsection1.o14.a107 rdfs:label "ἔσχατος πάντων" .
  saws-nt:divsection1.o14.a107 saws:isVariantOf perseus-nt:tlg002.perseus-grc1:9.35 .
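To make the role of the prefix declarations concrete, the following minimal sketch expands the prefixed names from this slide into full URIs and prints the three statements as N-Triples lines. The prefixes and triples are those shown on the slide; the helper names `expand`, `term` and `to_ntriples` are assumptions introduced for illustration, and the code is not a Turtle parser.

```python
# Prefix table as declared on the slide.
PREFIXES = {
    "perseus-nt": "http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0031.",
    "saws-nt": "http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsGrc01:divedition.",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "saws": "http://purl.org/saws/ontology#",
}

# The three statements from the slide; literals are written with their quotes.
TRIPLES = [
    ("perseus-nt:tlg002.perseus-grc1:9.35", "rdfs:label", '"... πάντων ἔσχατος ..."'),
    ("saws-nt:divsection1.o14.a107", "rdfs:label", '"ἔσχατος πάντων"'),
    ("saws-nt:divsection1.o14.a107", "saws:isVariantOf", "perseus-nt:tlg002.perseus-grc1:9.35"),
]


def expand(pname):
    """Replace the prefix (text before the first colon) by its namespace URI."""
    prefix, local = pname.split(":", 1)
    return PREFIXES[prefix] + local


def term(value):
    """Render a literal unchanged and a prefixed name as <full URI>."""
    if value.startswith('"'):
        return value
    return "<%s>" % expand(value)


def to_ntriples(triples):
    """One 'subject predicate object .' line per statement."""
    return ["%s %s %s ." % (term(s), term(p), term(o)) for s, p, o in triples]


lines = to_ntriples(TRIPLES)
for line in lines:
    print(line)
```

Splitting only at the first colon matters here, because the local names themselves contain colons (for example `tlg002.perseus-grc1:9.35`).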
Why RDF for philological data?
flexible and generic mechanism for creating cross-references
integration of / linking with linked open data (LOD)
facilitate re-usability, sustainability and replicability
build on existing LOD technology
enforce explicit, machine-readable semantics
• URIs for data structures (say, rdfs:label) can resolve against a machine-readable, formal definition
• facilitates semantic validation instead of syntactic validation
  SHACL, OWL2/DL
Linked Open Data (LOD)
a best practice for publishing data on the web
• Use URIs as names for things
• Use HTTP URIs so that people can look up those names
• Provide useful information, using RDF-based standards
• Include links to other URIs
https://www.w3.org/DesignIssues/LinkedData.html
Linguistic Linked Open Data (LLOD) http://linguistic-lod.org/ (version of July 2017)
Linking the TEI
Both TEI and LOD are important topics in DH
• They do not converge ...
It is not possible to integrate RDF triples in a single TEI document in a way that is both TEI- and W3C-compliant
• You can always create a customized TEI with extensions for your personal approach to encoding RDF triples, but there is no recommended way of doing so
Linking the TEI: Options
The (TEI-)native way
• URIs, but no properties
• User-provided ad hoc converters
Inline XML solutions
• TEI-endorsed, but not LOD-compliant
• LOD-compliant, but not TEI-endorsed
What to do and how to choose
Linking the TEI The (TEI-)native way
The (TEI-)native way: @ref*
Various elements can take a @ref attribute, which refers to a URI
• this URI corresponds to the RDF target
• no explicit representation of RDF source or RDF predicate
  these must be extrapolated from definitions or data snippets
* other URI-bearing attributes do exist, too
The (TEI-)native way: @ref
Text Database and Dictionary of Classic Mayan (TWKM, University of Bonn, Germany, 2014-2029)
subject: @xml:id
property: <g> „is instance of glyph type“
target: @ref
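The mapping stated on this slide (subject from @xml:id, property implied by the <g> element, target from @ref) could be turned into an ad hoc converter along the following lines. This is a sketch under assumptions, not the project's actual tooling: the sample XML, the document base URI, the sign URIs and the property URI `isInstanceOfGlyphType` are all hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical values: neither the property URI nor the base URI comes from the slides.
INSTANCE_OF = "http://example.org/vocab#isInstanceOfGlyphType"
DOC_BASE = "http://example.org/doc.xml#"

# Hypothetical TEI snippet with three glyphs, one of them without @xml:id.
SAMPLE = """<ab xmlns="http://www.tei-c.org/ns/1.0">
  <g xml:id="g1" ref="http://example.org/signs#sign-0001"/>
  <g xml:id="g2" ref="http://example.org/signs#sign-0002"/>
  <g ref="http://example.org/signs#sign-0003"/>
</ab>"""


def glyph_triples(tei_xml, base=DOC_BASE, prop=INSTANCE_OF):
    """Return (subject, property, target) URI tuples for every <g> with @xml:id and @ref."""
    root = ET.fromstring(tei_xml)
    triples = []
    for g in root.iter("{%s}g" % TEI_NS):
        gid = g.get(XML_ID)
        ref = g.get("ref")
        if gid is None or ref is None:
            continue  # without an identifier there is no subject URI to build
        triples.append((base + gid, prop, ref))
    return triples


triples = glyph_triples(SAMPLE)
for s, p, o in triples:
    print("<%s> <%s> <%s> ." % (s, p, o))
```

Note that the property has to be supplied by the converter: nothing in the XML itself names it, which is the "URIs, but no properties" limitation of the native approach.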
The (TEI-)native way: @ref
It's easy to build a converter, BUT
• if we have different hypotheses regarding the reading of a sign (physical damage or different interpretation), there is no easy way to express provenance, uncertainty, etc. of multiple alternative readings
The colleagues will certainly invent something, but ... this would be a natural application of reification and established RDF vocabularies!
subject: @xml:id
property: <g> „is instance of glyph type“
target: @ref
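What reification could look like for two competing readings of the same sign is sketched below. The terms `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` are the standard RDF reification vocabulary, and `prov:wasAttributedTo` is taken from the W3C PROV-O vocabulary; its use on a reified statement is an assumption made for illustration. All `http://example.org/` URIs, including the glyph, the sign types and the two scholars, are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/"  # hypothetical namespace for all example resources


def reify(statement, subject, prop, obj, attributed_to):
    """Describe one triple as a resource of its own and attach provenance to it."""
    return [
        (statement, RDF + "type", RDF + "Statement"),
        (statement, RDF + "subject", subject),
        (statement, RDF + "predicate", prop),
        (statement, RDF + "object", obj),
        (statement, PROV + "wasAttributedTo", attributed_to),
    ]


GLYPH = EX + "doc.xml#g1"
INSTANCE_OF = EX + "vocab#isInstanceOfGlyphType"

# Two alternative readings of the same glyph, each attributed to a different scholar.
readings = (
    reify(EX + "reading/1", GLYPH, INSTANCE_OF, EX + "signs#sign-0001", EX + "people#scholarA")
    + reify(EX + "reading/2", GLYPH, INSTANCE_OF, EX + "signs#sign-0002", EX + "people#scholarB")
)

for s, p, o in readings:
    print("<%s> <%s> <%s> ." % (s, p, o))
```

Because each reading is a resource with its own URI, further properties (for instance a confidence value) could be attached in the same way without touching the TEI source.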
Inline XML I TEI-endorsed (not LOD-compliant)
Inline XML: TEI-endorsed
several TEI elements are candidates for representing generic RDF triples
<graph>, <fs>, <link>, <relation>
• all of these already have a different interpretation
  their application to RDF triples thus falls under tag abuse
• TEI P5 features examples of RDF triples using <relation>
  this use is not mentioned in the definition
Inline XML: <relation> example
problems:
• @active and @passive originate from functions in social (!) networks; their application to directed edges (here between geonames!) is inadequate and confusing
• @name is a string attribute; the URI reference cannot be resolved without a specialized converter
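The second problem can be illustrated with a sketch of such a specialized converter. It assumes that @name holds a prefixed name and that the prefix table is supplied from outside the document, since a generic RDF tool has no way of knowing it. The sample XML is a hypothetical reconstruction that reuses URIs from the earlier Mark 9:35 slides; it is not the example shown on this slide.

```python
import itertools
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Assumption: the converter must be told how to resolve the prefix used in @name.
PREFIXES = {"saws": "http://purl.org/saws/ontology#"}

# Hypothetical <relation> encoding of the variant link between the two passages.
SAMPLE = """<listRelation xmlns="http://www.tei-c.org/ns/1.0">
  <relation name="saws:isVariantOf"
            active="http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsGrc01:divedition.divsection1.o14.a107"
            passive="http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0031.tlg002.perseus-grc1:9.35"/>
</listRelation>"""


def expand(name, prefixes):
    """Turn the string in @name into a URI, or fail if the prefix is unknown."""
    prefix, _, local = name.partition(":")
    if not local or prefix not in prefixes:
        raise ValueError("cannot resolve @name %r to a URI" % name)
    return prefixes[prefix] + local


def relation_triples(tei_xml, prefixes=PREFIXES):
    """Map @active to subject, @name to property and @passive to target."""
    root = ET.fromstring(tei_xml)
    triples = []
    for rel in root.iter(TEI + "relation"):
        name = rel.get("name")
        if name is None:
            continue
        prop = expand(name, prefixes)
        # Both attributes may hold several whitespace-separated pointers.
        actives = rel.get("active", "").split()
        passives = rel.get("passive", "").split()
        for s, o in itertools.product(actives, passives):
            triples.append((s, prop, o))
    return triples


triples = relation_triples(SAMPLE)
for s, p, o in triples:
    print("<%s> <%s> <%s> ." % (s, p, o))
```

The `ValueError` branch marks exactly the point the slide makes: without the externally supplied prefix table, the value of @name stays an opaque string.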
Inline XML: <relation> example II
problems:
• @active and @passive (as before)
• <relation> is syntactically constrained to environments which are reserved for named entities (e.g., <namesList>)
  Is an arbitrary text passage really a named entity?
  In the original proposal (SAWS), <relation> was made a child of <seg> and <ab>
Inline XML II W3C-compliant (not TEI-endorsed)