Interlinking source text collections – a Norwegian example
Christian-Emil Ore
Charter by King Hákon Hákonsson, 1225
Collection 1 – more recent transcripts
• Transcripts from manuscripts (facsimiles)
• Written in Old Norwegian, 1170 – 1405
• Approx. 5 000 transcripts
• Transcribed on paper 1960 – 1990
• Digitized with simple markup in the 1990s
• Part of the Menotec project 2010 – 2012:
  – Linguistic text corpus, transcripts checked
  – XML encoding at the TEI-MENOTA diplomatic level
  – See www.menota.org
Collection 2 – Diplomatarium Norvegicum
[Figure: sample entry with labelled parts: Summary, Source info, Text number, Date, Place, Edited text]
Collection 2 – Diplomatarium Norvegicum
• 22 volumes, approx. 19 000 entries
• Published 1846 – 1995
• Each entry identified by volume and text number
• Digitized in the 1990s with home-made XML encoding
• Online since 1996
  – www.dokpro.uio.no/dipl_norv/diplom_felt.html
• TEI P5 encoding 2011
Collection 3 – Regesta Norvegica
• 9 volumes, approx. 10 500 entries
• Published 1978 – present
• Typesetting files and word processor files
• Home-made XML encoding
• Online since 2004
  – www.dokpro.uio.no/dipl_norv/regesta_felt.html
  – Links to Dipl. Norv. on the web
• TEI P5 encoding 2011
Collection 3 – Regesta Norvegica
[Figure: sample entry with labelled parts: Text witnesses; Where it is published (among others, Diplomatarium Norvegicum)]
Linking data – 1997

“Norwegian farm names”
139, Jaaberg. Pron.: jåbber. References: i Jabærghi RB. 31, 56. Jabergh DN II 657, 1471. Iaberg NRJ. IV 127. Jabere DN III 836, 1539. [...]

Diplomatarium Norvegicum
Vol. II, p. 657, No. 882. Date: 26 August 1471. Place: [Hyppestad]
[...] Jtem swor oc Stenulff Leidulfson sinsz fadhurs ordh at han gek med sin fadhur aff Jabergh som ligger i Sanda Hered deghi effther sancte Johannes dagh [...]

Archaeological acquisition catalogue
23447. Grave find from the Roman Iron Age from the stone circle at Jåberg (farm no. 139), Sandar parish, Vestfold county. A) Bronze fibula from the older Roman period, of the main type [...]
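The farm-name entry cites the Diplomatarium by siglum, Roman-numeral volume and page ("DN II 657"), which is what makes it linkable to the DN text. A minimal sketch of how such citations might be picked out of an entry is given here; the regular expression and the function names are illustrative assumptions, not the method used in the 1997 work.

```python
import re

# Assumed citation shape: "DN", a Roman-numeral volume, then a page number,
# as in "DN II 657". Other sigla (RB., NRJ.) are deliberately ignored.
DN_REF = re.compile(r"\bDN\s+([IVX]+)\s+(\d+)")

ROMAN = {"I": 1, "V": 5, "X": 10}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral built from I, V and X to an integer."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN[ch]
        # Subtractive notation: a smaller value before a larger one is subtracted.
        if i + 1 < len(numeral) and ROMAN[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


def find_dn_references(text: str) -> list[tuple[int, int]]:
    """Return (volume, page) pairs for every DN citation found in the text."""
    return [(roman_to_int(vol), int(page)) for vol, page in DN_REF.findall(text)]


entry = ("i Jabærghi RB. 31, 56. Jabergh DN II 657, 1471 . "
         "Iaberg NRJ. IV 127. Jabere DN III 836, 1539.")
print(find_dn_references(entry))  # [(2, 657), (3, 836)]
```

Note that the citation gives a page, while DN entries are identified by volume and text number, so a page-to-entry lookup would still be needed to reach entry No. 882.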
Linked Data – TEI XML documents

Part 1, the text:

<TEI ...>
  <teiHeader>
    <!-- All kinds of metadata -->
    <!-- Persons, places, events, bibl. ref. -->
    <!-- Text witnesses etc. -->
  </teiHeader>
  <text>
    <!-- XML-encoded text goes here -->
    ...
  </text>
  ...
</TEI>

Part 2, additional structure with extracted assertions:
• Data for and metadata from the document, expressed as Linked Data in RDF/XML compliant with CIDOC-CRM (semantic web)
• RDF triples
• Extraction done by XSLT, cf. the CLAROS project
The CIDOC CRM (ISO 21127): top-level classes relevant for integration
[Diagram: E55 Types refer to / refine the other classes; E41 Appellations refer to / identify them. E39 Actors (persons, institutions), E28 Conceptual Objects and E18 Physical Things participate in, affect or refer to E2 Temporal Entities (events). Temporal entities fall within E52 Time-Spans and take place at E53 Places; physical things have location at E53 Places.]
The four levels of FRBR (Functional Requirements for Bibliographic Records)
(E28) Work – Expression – Manifestation – Item
Mapping between TEI and CIDOC-CRM

Actors, places and events:
TEI element | CIDOC-CRM class
person      | E21 Person
org         | E74 Group
place       | E53 Place
event       | E5 Event

Names:
TEI element | CIDOC-CRM class
name        | E41 Appellation
placeName   | E44 Place Appellation
…           | …

Detailed info at http://www.tei-c.org/SIG/Ontologies/guidelines/guidelinesTeiMappableCrm.xml
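The element-to-class mapping for actors, places and events can drive a simple extraction of typing assertions. The sketch below is written in Python as a stand-in for the XSLT used in the project; the sample header, the `http://example.org/entity/` base URI and the function names are hypothetical, and the sample is not a schema-valid TEI header. The CRM class URIs are assumed to follow the form `http://www.cidoc-crm.org/cidoc-crm/E21_Person`.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
CRM = "http://www.cidoc-crm.org/cidoc-crm/"   # assumed class URI prefix
BASE = "http://example.org/entity/"           # hypothetical base URI

# The four actor/place/event rows of the mapping table.
TEI_TO_CRM = {
    "person": "E21_Person",
    "org": "E74_Group",
    "place": "E53_Place",
    "event": "E5_Event",
}

# Hypothetical, deliberately minimal header (no fileDesc).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <particDesc>
        <listPerson>
          <person xml:id="p1"><persName>Stenulff Leidulfson</persName></person>
        </listPerson>
      </particDesc>
      <settingDesc>
        <listPlace>
          <place xml:id="pl1"><placeName>Jabergh</placeName></place>
        </listPlace>
      </settingDesc>
    </profileDesc>
  </teiHeader>
  <text><body><p>...</p></body></text>
</TEI>"""


def type_triples(tei_xml: str) -> list[tuple[str, str, str]]:
    """Emit one rdf:type triple per mapped TEI element that carries an xml:id."""
    root = ET.fromstring(tei_xml)
    triples = []
    for el in root.iter():
        if not el.tag.startswith(f"{{{TEI_NS}}}"):
            continue
        local_name = el.tag.split("}", 1)[1]
        crm_class = TEI_TO_CRM.get(local_name)
        ident = el.get(XML_ID)
        if crm_class and ident:
            triples.append((BASE + ident, RDF_TYPE, CRM + crm_class))
    return triples


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialise URI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(type_triples(SAMPLE)))
```

Only typing assertions are produced here; names, dates and relations between entities would need further rules.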
Other projects exploring interlinking 1
• CLAROS: Classical Art Research Online Services
  – Partners in France, Germany, Greece and the UK
  – Aims to combine discrete databases of information about the ancient world in an RDF triplestore of assertions expressed in CIDOC CRM
  – Currently includes art objects, archaeological sites, antiquarian photographs and onomastics; the Lexicon of Greek Personal Names contributes via a representation in TEI XML
Other projects exploring interlinking 2
• WissKI – Wissenschaftliche Kommunikations-Infrastruktur (2009 – 2011)
• Common work space, CIDOC-CRM, RDF triplestore
• Partners
  – Germanisches Nationalmuseum, Nürnberg
    • Art history
  – Zoologisches Forschungsmuseum Alexander Koenig, Bonn
    • Biological specimens, expedition diaries
  – Friedrich-Alexander-Universität Erlangen-Nürnberg
    • IT expertise
Information architecture
The WissKI and CLAROS projects are based on
• A common database (RDF triplestore)
• A data model based on the CIDOC-CRM
• XML-encoded texts compliant with TEI P5
[Diagram: user interface(s) on top of a common RDF triplestore based on the CIDOC-CRM ontology, with the partner databases below]
Collection 2 & 3: DN & RN
• An entry comprises
  – A summary: who, what, when, to whom
  – Information about the text witness(es)
  – Date and place of creation
  – An edited text based on the text witness(es) (DN)
• An entry can be seen as a FRBR expression for the texts found on the text witness(es)
• Registers with additional information
Collection 1 – modern transcripts
• Follows the “new philology” tradition
  – One text witness per transcript
  – The diplomatic transcript is in principle unedited
• Information to identify the physical text witness
• Added information: word, lemma, part of speech, punctuation, syntactic information
The texts as TEI-XML documents
• Diplomatarium Norvegicum and Regesta Norvegica
  – Each volume represents a complete printed work
  – One TEI XML file per volume
• The transcripts:
  – Each transcript is a separate work
  – Separate TEI XML files for each transcript
  – Metadata taken from the other sources
• Persistent identifiers (URIs)?
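The question of persistent identifiers is left open in the slide. One possible answer, shown purely as an illustration, is to derive a URI from the volume and text number that already identify each DN entry. The base URI, the path layout and the zero-padding below are hypothetical choices and not a scheme adopted by the project.

```python
from dataclasses import dataclass

BASE = "http://example.org/id"  # hypothetical base URI


@dataclass(frozen=True)
class EntryId:
    """Identifier of one entry: collection code, volume, text number."""
    collection: str  # e.g. "dn" (the collection code is an assumption)
    volume: int
    number: int

    def uri(self) -> str:
        # Zero-padding keeps URIs of one collection sortable as plain strings.
        return f"{BASE}/{self.collection}/{self.volume:02d}/{self.number:04d}"


def parse_uri(uri: str) -> EntryId:
    """Inverse of EntryId.uri(); raises ValueError for foreign URIs."""
    prefix = BASE + "/"
    if not uri.startswith(prefix):
        raise ValueError(f"not an entry URI: {uri}")
    collection, volume, number = uri[len(prefix):].split("/")
    return EntryId(collection, int(volume), int(number))


# DN vol. II, No. 882 (the 1471 document that mentions Jabergh)
entry = EntryId("dn", 2, 882)
print(entry.uri())  # http://example.org/id/dn/02/0882
```

Because the identifier is built only from data already printed in the edition, the same URI can be regenerated by anyone who holds a citation of volume and text number.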
Possible points for external links
• RN/DN registers
  – Persons, places, onomastic information
• Transcripts
  – Linguistic information
• RN/DN entries
  – Creation date, place
  – Text witnesses, archival signature
  – Cross references for copies (vidimus) etc.
  – Published, mentioned, bibl. references
Information architecture in the documents
• The real-world / meta information is placed in the TEI header, with pointers to the corresponding parts of the TEI document.
• An XSLT stylesheet extracts the information from the header into a set of RDF triples, which can be used in a Linked Data environment as in the CLAROS and WissKI projects.
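How the header points into the text is not specified in the slides. The sketch below assumes, for illustration only, that a header element carries a `corresp` attribute listing `#id` pointers to elements in the text, and shows how such pointers could be resolved before triples are generated. It uses Python rather than XSLT, and the sample markup is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical document: the header describes the place once, and the
# corresp attribute points at the name as it is written in the text.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <profileDesc>
      <settingDesc>
        <listPlace>
          <place xml:id="pl1" corresp="#n1"><placeName>Jåberg</placeName></place>
        </listPlace>
      </settingDesc>
    </profileDesc>
  </teiHeader>
  <text><body>
    <p>han gek med sin fadhur aff <name xml:id="n1">Jabergh</name> som ligger i Sanda Hered</p>
  </body></text>
</TEI>"""


def resolve_header_pointers(tei_xml: str) -> dict[str, list[str]]:
    """Map each header element's xml:id to the text its pointers resolve to."""
    root = ET.fromstring(tei_xml)
    # Index every element in the document that has an xml:id.
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    header = root.find(f"{{{TEI_NS}}}teiHeader")
    links = {}
    for el in header.iter():
        pointers = el.get("corresp")
        if not pointers:
            continue
        target_ids = [p.lstrip("#") for p in pointers.split()]
        links[el.get(XML_ID)] = [
            "".join(by_id[t].itertext()) for t in target_ids if t in by_id
        ]
    return links


print(resolve_header_pointers(SAMPLE))  # {'pl1': ['Jabergh']}
```

Keeping the normalised form in the header and the attested spelling in the text means that an extracted triple can state both which place is meant and how the document names it.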
Outcome
• State-of-the-art TEI XML encoded files of the Diplomatarium Norvegicum and Regesta Norvegica
• The XML transcript files will have a richer metadata header with information from the other sources
• A better interlinked web site for medieval documents pertaining to Norway, with free download of all the XML texts
• Opening the material to other projects using Linked (Open) Data
• Hopefully an invitation to other archives to at least discuss linking of their collections (especially relevant for the Nordic countries and Hanseatic archives)
Encoding of the Diplomatarium Norvegicum texts

[...] <lb xml:id="pb_dn_02_005_001"/> H[akon] konongr sun H[akonar] konongs sendir bondom oc
<lb xml:id="pb_dn_02_005_002"/> buþeignum. ollum guðs vinnum. oc sinum þeim er þetta bref sea eða [...]

Encoding of the transcripts

[...] <pb ed="ms"/><lb ed="ms" idref="pb_dn_02_005_001"/><lb n="1" ed="DN"/>
<w xml:id="w000001"><me:dipl>H<ex>akon</ex></me:dipl></w>
<w xml:id="w000002"><me:dipl>kono<ex>n</ex>gR</me:dipl></w>
<w xml:id="w000003"><me:dipl>sun</me:dipl></w>
<me:punct>.</me:punct>
<w xml:id="w000004"><me:dipl>H<ex>akonar</ex></me:dipl></w>
<w xml:id="w000005"><me:dipl>k<ex>onongs</ex></me:dipl></w>
[...]
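In the transcript encoding, each word keeps the letters written in the manuscript and marks editorial expansions with <ex>. The following sketch reads the five words shown above and derives both the abbreviated and the expanded form of each. The wrapping <ab> element and the namespace URIs bound to the default and `me:` prefixes are assumptions added to make the fragment parseable; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ME_NS = "http://www.menota.org/ns/1.0"  # assumed binding of the me: prefix
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# The words from the transcript example, wrapped in an assumed <ab> root.
SAMPLE = f"""<ab xmlns="{TEI_NS}" xmlns:me="{ME_NS}">
<lb n="1" ed="DN"/>
<w xml:id="w000001"><me:dipl>H<ex>akon</ex></me:dipl></w>
<w xml:id="w000002"><me:dipl>kono<ex>n</ex>gR</me:dipl></w>
<w xml:id="w000003"><me:dipl>sun</me:dipl></w>
<me:punct>.</me:punct>
<w xml:id="w000004"><me:dipl>H<ex>akonar</ex></me:dipl></w>
<w xml:id="w000005"><me:dipl>k<ex>onongs</ex></me:dipl></w>
</ab>"""


def word_forms(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (xml:id, letters as written, expanded form) for every <w>."""
    root = ET.fromstring(xml_text)
    forms = []
    for w in root.iter(f"{{{TEI_NS}}}w"):
        dipl = w.find(f"{{{ME_NS}}}dipl")
        # Expanded form: all text, including the content of <ex>.
        expanded = "".join(dipl.itertext())
        # As written: text before the first child plus the text following each
        # direct child, which skips <ex> content (nested markup is not handled).
        written = (dipl.text or "") + "".join(child.tail or "" for child in dipl)
        forms.append((w.get(XML_ID), written, expanded))
    return forms


forms = word_forms(SAMPLE)
print(" ".join(expanded for _, _, expanded in forms))
# Hakon konongR sun Hakonar konongs
```

The expanded reading can then be compared with the edited DN line "H[akon] konongr sun H[akonar] konongs", which is the line the transcript's `idref` points to.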