Linking the TEI: Approaches, Limitations, Use Cases

DH2019, Utrecht, 2019-07-11. Linking the TEI: Approaches, Limitations, Use Cases. Christian Chiarcos & Maxim Ionov, {chiarcos|ionov}@cs.uni-frankfurt.de, Applied Computational Linguistics, Goethe Universität Frankfurt, Germany


  1. DH2019, Utrecht, 2019-07-11 Linking the TEI Approaches, Limitations, Use Cases Christian Chiarcos & Maxim Ionov {chiarcos|ionov}@cs.uni-frankfurt.de Applied Computational Linguistics Goethe Universität Frankfurt, Germany

  2. Linking the TEI  The (TEI-)native way • URIs, but no properties • User-provided ad hoc converters  The inline way • TEI-endorsed, but not LOD-compliant • LOD-compliant, but not TEI-endorsed  The „proper“ way • standard-conformant standoff annotations

  3. Linking the TEI  The (TEI-)native way • URIs, but no properties • User-provided ad hoc converters  The inline way • TEI-endorsed, but not LOD-compliant • LOD-compliant, but not TEI-endorsed  The „proper“ way • standard-conformant standoff annotations: TEI/XML + WebAnnotation (JSON-LD); TEI-compliant and LOD-compliant; restricted to static TEI documents

  4. Linking the TEI  The (TEI-)native way • URIs, but no properties • User-provided ad hoc converters  The inline way • TEI-endorsed, but not LOD-compliant • LOD-compliant, but not TEI-endorsed (inline XML solutions?)  The „proper“ way • standard-conformant standoff annotations: TEI/XML + WebAnnotation (JSON-LD); TEI-compliant and LOD-compliant; restricted to static TEI documents (breaks if we have dynamic TEI content)
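
The standoff option (TEI/XML + WebAnnotation in JSON-LD) could look roughly like the following. This is a minimal sketch, not the presenters' implementation: the TEI document URL, the XPath and the helper name `make_annotation` are hypothetical; only the keys `@context`, `type`, `body`, `target`, `source`, `selector` and the selector type `XPathSelector` are taken from the W3C Web Annotation model.

```python
import json

# JSON-LD context published by the W3C for Web Annotations.
ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def make_annotation(tei_url, xpath, body_uri):
    """Build a standoff annotation linking one node of a TEI file to a URI."""
    return {
        "@context": ANNO_CONTEXT,
        "type": "Annotation",
        "body": body_uri,                # the external (LOD) resource
        "target": {
            "source": tei_url,           # the untouched TEI/XML document
            "selector": {"type": "XPathSelector", "value": xpath},
        },
    }


annotation = make_annotation(
    "http://example.org/edition.xml",    # hypothetical TEI document
    "/TEI/text/body/div[1]/ab[3]",       # hypothetical path into it
    "http://www.perseus.tufts.edu/hopper/text?doc="
    "urn:cts:greekLit:tlg0031.tlg002.perseus-grc1:9.35",
)
print(json.dumps(annotation, indent=2))
```

The XPath only stays valid while the TEI file does not change, which is why this route is restricted to static TEI documents.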

  5. TEI and TEI customizations  very rich vocabulary • TEI P5: 569 elements, 505 attributes  TEI customization with ODD • high-level specification for customizing the TEI  select modules  refine vocabulary elements  generate (textual) documentation  generate actual schemas • any TEI project should start with such a customization

  6. TEI and TEI customizations  TEI occupies a very distinctive position with respect to the idea of standardization: • strictly speaking, it is not a standard, • but is poised between a standard and a consensus, • possessing some characteristics of each, in ways that have very interesting consequences for extension and interchange. (Bauman & Flanders 2004, bullet points by us)  TEI compliance does not entail interoperability • Documents following the same customization are interoperable. • Beyond that, the TEI provides only an orientation for their interpretation.  For problems not documented in the TEI documentation, different customizations will not be interoperable. E.g., when trying to encode RDF triples in the TEI ;)

  7. Resource Description Framework  RDF 1.1: general data model for the web of data • W3C recommendation 2014  directed labeled multi-graph • multiple edges between the same nodes  for every edge: • source node („RDF subject“) defined by URI • edge type („RDF property / relation“) defined by URI • target node („RDF object“) defined by URI  alternatively, target can also be an atomic (literal) value ⇒ „triple“ / „RDF statement“
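
The data model on this slide can be sketched in a few lines of Python. This is an illustrative toy model only, not an RDF library: the class names and the `http://example.org/` namespace are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Triple:
    subject: URI                  # source node
    predicate: URI                # edge type
    object: Union[URI, Literal]   # target node or atomic value


EX = "http://example.org/"        # hypothetical namespace

a, b = URI(EX + "a"), URI(EX + "b")
graph = {
    Triple(a, URI(EX + "cites"), b),
    Triple(a, URI(EX + "isVariantOf"), b),   # second edge, same two nodes
    Triple(a, URI(EX + "label"), Literal("some text")),
}


def edges_between(graph, source, target):
    """Return the sorted edge labels (predicates) from source to target."""
    return sorted(
        t.predicate.value
        for t in graph
        if t.subject == source and t.object == target
    )


print(edges_between(graph, a, b))
```

Because the graph is a set of triples, two nodes can be connected by several differently labeled edges, which is what makes it a multi-graph.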

  8. Triples and graphs: NT Mark 9:35 variants
     Prefix declarations:
       PREFIX perseus-nt: <http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0031.>
     Graph: a single node, perseus-nt:tlg002.perseus-grc1:9.35
     (RDF) text:
       perseus-nt:tlg002.perseus-grc1:9.35 (URI)

  9. Triples and graphs: NT Mark 9:35 variants
     Prefix declarations (RDFS URL: machine-readable representation of a particular vocabulary):
       PREFIX perseus-nt: <http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0031.>
       PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
     Graph: perseus-nt:tlg002.perseus-grc1:9.35 --rdfs:label--> "... πάντων ἔσχατος ..."
     (RDF) Turtle:
       perseus-nt:tlg002.perseus-grc1:9.35 rdfs:label "... πάντων ἔσχατος ..." .

  10. Triples and graphs: NT Mark 9:35 variants
     Prefix declarations:
       PREFIX perseus-nt: <http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0031.>
       PREFIX saws-nt: <http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsGrc01:divedition.>
       PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
       PREFIX saws: <http://purl.org/saws/ontology#>
     Graph:
       saws-nt:divsection1.o14.a107 --saws:isVariantOf--> perseus-nt:tlg002.perseus-grc1:9.35
       perseus-nt:tlg002.perseus-grc1:9.35 --rdfs:label--> "... πάντων ἔσχατος ..."
     (RDF) Turtle:
       perseus-nt:tlg002.perseus-grc1:9.35 rdfs:label "... πάντων ἔσχατος ..." .
       saws-nt:divsection1.o14.a107 saws:isVariantOf perseus-nt:tlg002.perseus-grc1:9.35 .

  11. Triples and graphs: NT Mark 9:35 variants
     Prefix declarations:
       PREFIX perseus-nt: <http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0031.>
       PREFIX saws-nt: <http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsGrc01:divedition.>
       PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
       PREFIX saws: <http://purl.org/saws/ontology#>
     Graph:
       saws-nt:divsection1.o14.a107 --saws:isVariantOf--> perseus-nt:tlg002.perseus-grc1:9.35
       perseus-nt:tlg002.perseus-grc1:9.35 --rdfs:label--> "... πάντων ἔσχατος ..."
       saws-nt:divsection1.o14.a107 --rdfs:label--> "ἔσχατος πάντων"
     (RDF) Turtle:
       perseus-nt:tlg002.perseus-grc1:9.35 rdfs:label "... πάντων ἔσχατος ..." .
       saws-nt:divsection1.o14.a107 rdfs:label "ἔσχατος πάντων" .
       saws-nt:divsection1.o14.a107 saws:isVariantOf perseus-nt:tlg002.perseus-grc1:9.35 .
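
To make the role of the prefix declarations concrete, the three triples of this slide can be expanded to full URIs by plain string concatenation. The following is a minimal sketch using only the standard library; the helper names `expand`, `term` and `to_ntriples` and the `Lit` marker class are made up for the illustration, and the output follows the general shape of N-Triples lines.

```python
# Prefix declarations as given on the slide.
PREFIXES = {
    "perseus-nt": "http://www.perseus.tufts.edu/hopper/text?doc=urn:cts:greekLit:tlg0031.",
    "saws-nt": "http://www.ancientwisdoms.ac.uk/cts/urn:cts:greekLit:tlg3017.Syno298.sawsGrc01:divedition.",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "saws": "http://purl.org/saws/ontology#",
}


class Lit(str):
    """Marks a plain string literal, as opposed to a prefixed name."""


TRIPLES = [
    ("perseus-nt:tlg002.perseus-grc1:9.35", "rdfs:label", Lit("... πάντων ἔσχατος ...")),
    ("saws-nt:divsection1.o14.a107", "rdfs:label", Lit("ἔσχατος πάντων")),
    ("saws-nt:divsection1.o14.a107", "saws:isVariantOf", "perseus-nt:tlg002.perseus-grc1:9.35"),
]


def expand(pname):
    """Replace the prefix (text before the first colon) by its namespace."""
    prefix, local = pname.split(":", 1)
    return PREFIXES[prefix] + local


def term(t):
    """Serialize a literal in quotes, anything else as <full URI>."""
    if isinstance(t, Lit):
        return '"' + t.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return "<" + expand(t) + ">"


def to_ntriples(triples):
    return ["%s %s %s ." % (term(s), term(p), term(o)) for s, p, o in triples]


lines = to_ntriples(TRIPLES)
print(lines[2])  # the saws:isVariantOf link between the two passages
```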

  12. Why RDF for philological data?  flexible and generic mechanism for creating cross-references  integration of / linking with linked open data (LOD)  facilitate re-usability, sustainability and replicability  build on existing LOD technology  enforce explicit, machine-readable semantics • URIs for data structures (say, rdfs:label) can resolve against a machine-readable, formal definition • facilitates semantic validation instead of syntactic validation  SHACL, OWL2/DL

  13. Linked Open Data (LOD): a best practice for publishing data on the web  Use URIs as names for things  Use HTTP URIs so that people can look up those names  Provide useful information, using RDF-based standards  Include links to other URIs  https://www.w3.org/DesignIssues/LinkedData.html

  14. Linguistic Linked Open Data (LLOD) http://linguistic-lod.org/ (version of July 2017)

  15. Linking the TEI  Both TEI and LOD are important topics in DH • They do not converge ...  It is not possible to integrate RDF triples in a single TEI document in a way that is both TEI- and W3C-compliant • You can always create customized TEI with extensions for your personal approach to encode RDF triples, but there is no recommended way for doing so

  16. Linking the TEI: Options  The (TEI-)native way • URIs, but no properties • User-provided ad hoc converters  Inline XML solutions • TEI-endorsed, but not LOD-compliant • LOD-compliant, but not TEI-endorsed  What to do and how to choose

  17. Linking the TEI The (TEI-)native way

  18. The (TEI-)native way: @ref*  Various elements can take @ref attributes, which refer to URIs • these correspond to RDF targets • no explicit representation of RDF source or RDF predicate  these must be extrapolated from definitions or data snippets  * other URI-bearing attributes exist, too

  19. The (TEI-)native way: @ref  Text Database and Dictionary of Classic Mayan (TWKM, University of Bonn, Germany, 2014-2029)  subject: @xml:id  property: <g> „is instance of glyph type“  target: @ref
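
The mapping on this slide (subject from @xml:id, property implied by the `<g>` element, target from @ref) can be turned into an ad hoc converter along the following lines. This is a sketch under assumptions: the sample XML, the sign URIs, the document base URI and the predicate URI are all invented for the illustration and are not taken from the TWKM data.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"   # how ElementTree names xml:id

# Hypothetical values, chosen only for this example.
BASE = "http://example.org/inscription.xml"
PREDICATE = "http://example.org/vocab#isInstanceOfGlyphType"

SAMPLE = """<ab xmlns="http://www.tei-c.org/ns/1.0">
  <g xml:id="g1" ref="http://example.org/signs/0001"/>
  <g xml:id="g2" ref="http://example.org/signs/0002"/>
  <g>no identifier, no reference: skipped</g>
</ab>"""


def g_to_triples(xml_text, base, predicate):
    """Emit one (subject, predicate, object) triple per <g> with @xml:id and @ref."""
    root = ET.fromstring(xml_text)
    triples = []
    for g in root.iter("{%s}g" % TEI_NS):
        ident, ref = g.get(XML_ID), g.get("ref")
        if ident and ref:
            triples.append((base + "#" + ident, predicate, ref))
    return triples


triples = g_to_triples(SAMPLE, BASE, PREDICATE)
for triple in triples:
    print(triple)
```

Note that the predicate is nowhere in the XML: the converter has to supply it, which is exactly the "no properties" limitation of this approach.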

  20. The (TEI-)native way: @ref  It's easy to build a converter, BUT • if we have different hypotheses regarding the reading of a sign (physical damage or different interpretation), there is no easy way to express  provenance  uncertainty  etc. of multiple alternative readings  subject: @xml:id  property: <g> „is instance of glyph type“  target: @ref

  21. The (TEI-)native way: @ref  It's easy to build a converter, BUT • if we have different hypotheses regarding the reading of a sign (physical damage or different interpretation), there is no easy way to express  provenance  uncertainty  etc. of multiple alternative readings  The colleagues will certainly invent something, but this would be a natural application of reification and established RDF vocabularies!  subject: @xml:id  property: <g> „is instance of glyph type“  target: @ref
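
As an illustration of what reification could look like here, the sketch below records two alternative readings of the same sign as reified RDF statements, each with its own provenance and confidence. The terms `rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object` and `prov:wasAttributedTo` come from the RDF and PROV-O vocabularies; everything under `http://example.org/`, including the `confidence` property, is a hypothetical placeholder and not something proposed in the talk.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/vocab#"          # hypothetical vocabulary


def reify(stmt_uri, subject, predicate, obj, attributed_to, confidence):
    """Describe one triple as a resource, so metadata can be attached to it."""
    return [
        (stmt_uri, RDF + "type", RDF + "Statement"),
        (stmt_uri, RDF + "subject", subject),
        (stmt_uri, RDF + "predicate", predicate),
        (stmt_uri, RDF + "object", obj),
        (stmt_uri, PROV + "wasAttributedTo", attributed_to),  # provenance
        (stmt_uri, EX + "confidence", confidence),            # uncertainty
    ]


SIGN = "http://example.org/inscription.xml#g1"                # hypothetical
IS_GLYPH = EX + "isInstanceOfGlyphType"

readings = (
    reify("http://example.org/readings/1", SIGN, IS_GLYPH,
          "http://example.org/signs/0001", "http://example.org/people/A", 0.7)
    + reify("http://example.org/readings/2", SIGN, IS_GLYPH,
            "http://example.org/signs/0002", "http://example.org/people/B", 0.3)
)

for triple in readings:
    print(triple)
```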

  22. Inline XML I TEI-endorsed (not LOD-compliant)

  23. Inline XML: TEI-endorsed  several TEI elements are possible representatives for generic RDF triples: <graph>, <fs>, <link>, <relation> • all of these already have a different interpretation  application for RDF triples thus falls under tag abuse • TEI P5 features examples of RDF triples  using <relation>  not mentioned in the definition

  24. Inline XML: <relation> example  problems: • @active and @passive originate from functions in social (!) networks, their application to directed edges (here between geonames!) is inadequate and confusing • @name is a string attribute, the URI reference cannot be resolved without a specialized converter
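
The need for a specialized converter can be illustrated as follows. The sketch reads `<relation>` elements, treats @active as subject and @passive as object, and resolves the string in @name against a prefix table that the converter itself has to supply. The sample XML and its `http://example.org/` URIs are hypothetical, as are the helper names; only the element and attribute names and the `saws` namespace come from the slides.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# The prefix table lives in the converter, not in the TEI document.
PREFIXES = {"saws": "http://purl.org/saws/ontology#"}

SAMPLE = """<listRelation xmlns="http://www.tei-c.org/ns/1.0">
  <relation name="saws:isVariantOf"
            active="http://example.org/passage/a"
            passive="http://example.org/passage/b http://example.org/passage/c"/>
</listRelation>"""


def expand_name(name):
    """Turn a 'prefix:local' string from @name into a full URI, or fail loudly."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError("cannot resolve @name %r to a URI" % name)
    return PREFIXES[prefix] + local


def relation_to_triples(xml_text):
    """One triple per (active, passive) pair; both may hold several pointers."""
    root = ET.fromstring(xml_text)
    triples = []
    for rel in root.iter("{%s}relation" % TEI_NS):
        predicate = expand_name(rel.get("name", ""))
        for subject in rel.get("active", "").split():
            for obj in rel.get("passive", "").split():
                triples.append((subject, predicate, obj))
    return triples


triples = relation_to_triples(SAMPLE)
for triple in triples:
    print(triple)
```

A generic RDF tool cannot do this step, because nothing in the document declares that `saws:` abbreviates a namespace.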

  25. Inline XML: <relation> example II  problems: • @active and @passive (as before) • <relation> is syntactically constrained to environments which are reserved for named entities (e.g., <namesList> )  Is an arbitrary text passage really a named entity?  In the original proposal (SAWS), <relation> was made child of <seg> and <ab>

  26. Inline XML II W3C-compliant (not TEI-endorsed)
