Paolo Monella
An ontology for digital graphematics and philology
Die (hyper-)diplomatische Transkription und ihre Erkenntnispotentiale
Bergische Universität Wuppertal (BUW), 6 February 2020
Outline
● Interoperability of digital scholarly editions (DSEs) based on diplomatic transcriptions
● Digital modelling (ontology) of pre-modern writing systems
  – Graphemes / allographs
  – Allographs: capitals, ligatures, positional variants, emphasis etc.
● In practice: how can grapheme/allograph modelling make my DSE more interoperable?
● Open issues
Interoperability: the issue
● uenenū (diplomatic)
  – Historical documentation
  – Visualization
  – Processing
  – (Erkenntnispotentiale)
● Processing
  – Search
  – Collation
  – NLP (lemma, PoS etc.)
  – Statistics (distant reading)
● uenenū
● venenum
● My focus: European medieval handwriting
  – ...and early print (imitating handwriting)
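The tension between the diplomatic form (uenenū) and the processable form (venenum) can be illustrated with a minimal sketch. It assumes a hypothetical two-layer `Token` type in which the editor supplies both forms by hand; the class name, field names and the `search` helper are illustrative and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical two-layer token: both layers are supplied by the editor."""
    diplomatic: str  # what the witness shows
    normalized: str  # regularized form used for processing


def search(tokens, query):
    """Return the tokens whose normalized layer equals the query."""
    return [t for t in tokens if t.normalized == query]


tokens = [Token("uenenū", "venenum"), Token("uł", "vel")]

# Searching the diplomatic layer with a normalized query finds nothing.
naive_hits = [t for t in tokens if t.diplomatic == "venenum"]

# Searching the normalized layer finds the token, which still carries
# its diplomatic form for display.
layered_hits = search(tokens, "venenum")

print(naive_hits)
print([t.diplomatic for t in layered_hits])
```

The sketch only shows why search, collation and NLP need a layer other than the diplomatic one. It says nothing about how the normalized layer should be produced.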
Interoperability: current solutions
Unicode (TEI’s recommendation)
● Solution for new digital texts
● Not enough for pre-modern writing systems
  – Allographs: ſ (U+017F) / s (U+0073; ASCII 115)
  – Have I encoded that they correspond to each other (variants of grapheme <s>)?
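A short sketch of the ſ / s question, using only Python's standard `unicodedata` module. The table `ALLOGRAPH_TO_GRAPHEME` and the helper `same_grapheme` are hypothetical names introduced here to show what an explicit, project-level statement of the correspondence might look like.

```python
import unicodedata

long_s, round_s = "\u017f", "\u0073"

# Two distinct code points with distinct names.
print(unicodedata.name(long_s))   # LATIN SMALL LETTER LONG S
print(unicodedata.name(round_s))  # LATIN SMALL LETTER S
print(long_s == round_s)          # False

# Hypothetical project table stating that both are variants of grapheme <s>.
ALLOGRAPH_TO_GRAPHEME = {long_s: "s", round_s: "s"}


def same_grapheme(a, b):
    """True if two characters map to the same grapheme in the project table."""
    return ALLOGRAPH_TO_GRAPHEME.get(a, a) == ALLOGRAPH_TO_GRAPHEME.get(b, b)


print(same_grapheme(long_s, round_s))  # True

# Unicode compatibility normalization happens to fold this particular pair.
print(unicodedata.normalize("NFKC", long_s) == round_s)  # True
```

The NFKC mapping exists for ſ, but it is a general-purpose compatibility rule and not a graphematic model of a given writing system. Allographs with no such mapping still need a table like the one sketched above.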
Project-specific solutions
● Disposable home-made solutions
● Normalization software and strategies
● TEI: theory-agnostic
Interoperability through modelling
● Scholarly discussion on modelling
● Documenting project-specific modelling and normalization practices
  – prose
  – formal (software code, tables)
● Shared models
● Reusable software libraries
An ontology for digital graphematics and philology
Ontology
[Diagram. Nodes: Lemma, Token (inflected word), Alphabeme, Logograph, Alphabetic Grapheme, Abbreviation Mark, Brevigraph, Ligature, Grapheme, Diacritic, Abbreviation, Space, Punctuation, Metamark, Allograph. Relation shown: is_a]
[Diagram: is_a hierarchy under Grapheme. Nodes: Linguistic Grapheme, Textual Grapheme, Punctuation, Metamark, Space, Logograph, Intra-verbal Grapheme, Diacritic, Alphabetic Grapheme, Abbreviation Mark, Brevigraph. Feature labels: {+alphabetic}, {-alphabetic}]
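An is_a hierarchy of this kind can be held in a simple child-to-parent table. The sketch below is an assumption throughout: the slide's diagram did not survive extraction, so the parent assigned to each node is only one plausible reading of the layout, and the names `IS_A`, `ancestors` and `is_a` are illustrative.

```python
# Child -> parent ("is_a") links.
# ASSUMPTION: one plausible reading of the flattened diagram; the source
# does not confirm which parent each node belongs to.
IS_A = {
    "Linguistic Grapheme": "Grapheme",
    "Textual Grapheme": "Grapheme",
    "Punctuation": "Textual Grapheme",
    "Metamark": "Textual Grapheme",
    "Space": "Textual Grapheme",
    "Logograph": "Linguistic Grapheme",
    "Intra-verbal Grapheme": "Linguistic Grapheme",
    "Diacritic": "Intra-verbal Grapheme",
    "Alphabetic Grapheme": "Intra-verbal Grapheme",
    "Abbreviation Mark": "Intra-verbal Grapheme",
    "Brevigraph": "Intra-verbal Grapheme",
}


def ancestors(node):
    """Walk the is_a links upward and return the chain of parents."""
    chain = []
    while node in IS_A:
        node = IS_A[node]
        chain.append(node)
    return chain


def is_a(node, kind):
    """True if node is kind, or inherits from it through is_a links."""
    return node == kind or kind in ancestors(node)


print(ancestors("Brevigraph"))
print(is_a("Space", "Grapheme"))
```

Keeping the hierarchy as data makes it easy to correct individual links once the actual diagram is consulted.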
Graphemes/allographs
Graphemes/allographs: the commutation test
● Text: co̊paraƐur uł adſe uładalium
  – Comparatur vel ad se vel ad alium
  – He is compared to himself or to another
● System
  – Graphemes: <s> <t> <x> <y> <z>
  – Allographs: «τ» «√»
● Commutation (graphemes) → change in “denotative meaning”
● Substitution (allographs) → no change in “denotative meaning”

Grapheme   Allographs
t          τ | Ɛ | √
u          u | v
z          z
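The grapheme/allograph table for t, u and z can be expressed as data and used to tell a substitution from a commutation. This is a minimal sketch: the table content is copied from the slide, while the function names `graphemic` and `substitution_only` are illustrative.

```python
# Grapheme -> allographs, as listed on the slide.
ALLOGRAPHS = {
    "t": ["τ", "Ɛ", "√"],
    "u": ["u", "v"],
    "z": ["z"],
}

# Inverted table: allograph -> grapheme.
TO_GRAPHEME = {a: g for g, forms in ALLOGRAPHS.items() for a in forms}


def graphemic(text):
    """Replace every known allograph with its grapheme; leave the rest as is."""
    return "".join(TO_GRAPHEME.get(ch, ch) for ch in text)


def substitution_only(a, b):
    """True if a and b differ at most by allographs (no commutation)."""
    return graphemic(a) == graphemic(b)


# Two different allographs of <t> and of <u>: substitution, same graphemes.
print(substitution_only("τu", "√v"))  # True

# <t> against <z>: commutation, the graphemic layer differs.
print(substitution_only("τu", "zu"))  # False
```

Whether two strings that differ only by substitution really share their denotative meaning is a philological judgement made when the table is drawn up. The code only applies that judgement consistently.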
Graphemes / allographs: what to transcribe?
● What the project wants!
  – based on its scientific interests
  – (and on time / money)
● But: framed in a larger model
Can allographs have a distinctive value?
Allographs
1. «τ»
2. «Ɛ»
3. «√»
Capitals: allographs or graphemes?
● Cool (CA) is a cool town (geographical name)
● Smith is a good smith (proper name)
● ODD files are odd files (acronym)
● OK for contemporary Western writing systems
● Not for classical/medieval handwriting (see later)
R. Mordenti          F. Neuber           P. Monella
Grapheme <D>         Archi-grapheme D    Alphabeme D
Allographs «d» «D»   Graphemes <d> <D>   Graphemes <d> <D>
Sentence segmentation: distinctive value for meaning of the whole text
● I go because I have to. Stay here!
● I go because I have to stay here!
● Punctuation, capitals
Word segmentation: distinctive value for meaning of the whole text
● σαῦρος, ſucceſs, daſs (daß)
● Paulus suſtinet me (Paolo holds me up)
● Paulus ſus tinet me (Paolo the pig holds me)
● Space, positional allograph
Connotators
● 𝖝𝖎𝖕 ≠ WHO
  – 𝖝𝖎𝖕: connotator “Gothic” (marked)
  – WHO: connotator “Gaul” (not marked)
● Pertinence
Connotators, pertinent for the writer
● graphemes as entities (emphasis)
● the Evangelist wrote (respect)
(Non-)pertinent allographs: positional variants
● Ligatures
● Non-pertinent for the writer
● Connotators, pertinent for (some) readers
  – editors, paleographers, codicologists, historians studying a MS / book
  – (Beneventan vs Caroline script, print font, ſ / s)
● Allographs: «τ» «Ɛ» «√»
Distinctive value (pertinence) of allographs?
● Graphemes change denotative meaning
  – fame vs name
  – Hjelmslev: denotative semiotics
● Allographs can have other forms of distinctive value (pertinence)
  – For the writer
    ● 𝖝𝖎𝖕 vs WHO
    ● Hjelmslev: connotative semiotics
  – For the reader (digital editor)
    ● Digital editors can set their own pertinence (transcription) criteria, based on their scientific interests
    ● E.g.: Fraktur font → political connotation in WW1
In practice: how can grapheme/allograph modelling make my DSE more interoperable?
● Witness A: manual (selective) transcription
  – Allographic transcription: Vnτer <hi>dem</hi> ſchloſs
● Witness B: OCR/HTR transcription
  – unter dem schloss
● Unicode characters
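How a documented allograph table might make the two witnesses comparable can be sketched as follows. The table `WITNESS_A_TABLE` is hypothetical (in particular, treating capital V as a variant of grapheme u is an assumption made for this example), and `strip_markup` and `graphemic` are illustrative helper names.

```python
import re

# Hypothetical table for witness A: allograph -> grapheme.
# ASSUMPTION: capital V is treated here as a variant of grapheme <u>.
WITNESS_A_TABLE = {"V": "u", "τ": "t", "ſ": "s"}


def strip_markup(text):
    """Drop XML-like tags such as <hi>...</hi>, keeping their text content."""
    return re.sub(r"<[^>]+>", "", text)


def graphemic(text, table):
    """Project an allographic transcription onto its graphemic layer."""
    return "".join(table.get(ch, ch) for ch in strip_markup(text))


witness_a = "Vnτer <hi>dem</hi> ſchloſs"   # manual allographic transcription
witness_b = "unter dem schloss"            # OCR/HTR output

norm_a = graphemic(witness_a, WITNESS_A_TABLE)

print(witness_a == witness_b)  # False: the raw strings cannot be collated
print(norm_a == witness_b)     # True: the graphemic layers match
```

The allographic transcription is not altered. The table is what gets shared and documented, so another project can reproduce the same graphemic layer or substitute its own criteria.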