An ontology for digital graphematics and philology

Paolo Monella, An ontology for digital graphematics and philology. Die (hyper-)diplomatische Transkription und ihre Erkenntnispotentiale, Bergische Universität Wuppertal (BUW), 6 February 2020.


  1. Paolo Monella An ontology for digital graphematics and philology Die (hyper-)diplomatische Transkription und ihre Erkenntnispotentiale Bergische Universität Wuppertal (BUW), 6 February 2020

  2. Outline

  3. Outline ● Interoperability of digital scholarly editions (DSEs) based on diplomatic transcriptions ● Digital modelling (ontology) of pre-modern writing systems – Graphemes / allographs – Allographs: capitals, ligatures, positional variants, emphasis etc. ● In practice: how can grapheme/allograph modelling make my DSE more interoperable? ● Open issues

  4. Interoperability: the issue

  6. Interoperability: the issue ● uenenū

  7. Interoperability: the issue ● uenenū (diplomatic) ● Historical documentation ● Visualization ● Processing ● (Erkenntnispotentiale)

  9. Interoperability: the issue ● uenenū ● venenum

  10. Interoperability: the issue ● Processing ● Search ● Collation ● NLP (lemma, PoS etc.) ● Statistics (distant reading) ● uenenū ● venenum

  11. Interoperability: the issue ● Processing ● Search ● Collation ● NLP (lemma, PoS etc.) ● Statistics (distant reading) ● uenenū ● venenum venenum
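
The uenenū / venenum pair can be made comparable for search or collation by normalizing the diplomatic form. What follows is a minimal Python sketch, not the presenter's method. It assumes two toy rules that are not stated on the slides: a macron over a vowel abbreviates a following m, and word-initial u before a vowel is read as v. The function name `normalize` is hypothetical.

```python
import re
import unicodedata


def normalize(diplomatic: str) -> str:
    """Toy normalization of a diplomatic form (assumed rules, see lead-in)."""
    # Decompose so that a precomposed "ū" becomes "u" + COMBINING MACRON (U+0304).
    text = unicodedata.normalize("NFD", diplomatic)
    # Assumed rule 1: a macron abbreviates a following "m".
    text = text.replace("\u0304", "m")
    # Assumed rule 2: word-initial "u" before a vowel is normalized to "v".
    text = re.sub(r"\bu(?=[aeiou])", "v", text)
    return unicodedata.normalize("NFC", text)


print(normalize("uenenū"))   # diplomatic form from the slides
print(normalize("venenum"))  # form that is already normalized
```

Both calls yield the same string, which is what lets search, collation and NLP tools treat the two witnesses as the same token. A real project would need a far richer, documented rule set.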

  15. Interoperability: the issue ● My focus: European Medieval handwriting – ...and early print (imitating handwriting)

  16. Interoperability: current solutions

  17. Unicode (TEI’s recommendation) ● Solution for new digital texts ● Not enough for pre-modern writing systems – Allographs: ſ (U+017F) / s (U+0073; ASCII 115) – Have I encoded that they correspond to each other (variants of grapheme <s>)?
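
The ſ / s question can be illustrated with a short Python sketch. The table `GRAPHEME_OF` and the helper `same_grapheme` are hypothetical names introduced here; they stand for whatever explicit grapheme/allograph record a project keeps alongside its Unicode text.

```python
import unicodedata

LONG_S = "\u017f"   # ſ  LATIN SMALL LETTER LONG S
ROUND_S = "\u0073"  # s  LATIN SMALL LETTER S

# Hypothetical project-level table: allograph -> grapheme.
GRAPHEME_OF = {LONG_S: "s", ROUND_S: "s"}


def same_grapheme(a: str, b: str) -> bool:
    """True if both characters are recorded as variants of one grapheme."""
    return GRAPHEME_OF.get(a, a) == GRAPHEME_OF.get(b, b)


print(LONG_S == ROUND_S)               # False: two distinct code points
print(same_grapheme(LONG_S, ROUND_S))  # True: the table records the correspondence
# Unicode compatibility normalization happens to fold this particular pair,
# but it is a lossy fold, not an explicit grapheme/allograph statement.
print(unicodedata.normalize("NFKC", LONG_S) == ROUND_S)  # True
```

The explicit table keeps both pieces of information: which glyph was written, and which grapheme it realizes.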

  18. Project-specific solutions ● Disposable home-made solutions ● Normalization software and strategies ● TEI: theory-agnostic

  19. Interoperability through modelling

  20. Interoperability through modelling ● Scholarly discussion on modelling ● Documenting project-specific modelling and normalization practices – prose – formal (software code, tables) ● Shared models ● Reusable software libraries

  21. An ontology for digital graphematics and philology

  22. Ontology (diagram; node labels as extracted): Lemma, Token (inflected word), Alphabeme, Logograph, Alphabetic Grapheme, Abbreviation Mark, is_a, Brevigraph, Ligature, Grapheme, Diacritic, Abbreviation, Space, Punctuation, Metamark, Allograph

  23. Ontology (is_a diagram; node labels as extracted): Grapheme, Linguistic Gr., Textual Gr., Punctuation, Metamark, Space, Logograph, Intra-verbal Gr., {+alphabetic}, {-alphabetic}, Diacritic, Alphabetic Grapheme, Abbreviation Mark, Brevigraph
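
The is_a relation on this slide maps naturally onto class inheritance. The Python sketch below is deliberately flat: the transcript does not preserve the diagram's intermediate levels (Linguistic Gr., Textual Gr., Intra-verbal Gr.), so every leaf type is attached directly to `Grapheme`. That flattening is an assumption of this sketch, not the presenter's model, and all class names are illustrative.

```python
class Grapheme:
    """Root of the is_a hierarchy named on the slide."""

    def __init__(self, label: str):
        self.label = label


# Leaf types named on the slide. Their direct parents in the original
# diagram are not recoverable, so each one subclasses Grapheme directly.
class AlphabeticGrapheme(Grapheme):
    """A letter of the alphabet used by the writing system."""


class AbbreviationMark(Grapheme):
    """A mark signalling an abbreviation."""


class Brevigraph(Grapheme):
    """A single sign standing for a sequence of letters."""


class Diacritic(Grapheme):
    """A mark attached to another grapheme."""


class Logograph(Grapheme):
    """A sign standing for a whole word."""


class Space(Grapheme):
    """Word separation treated as a grapheme."""


class Punctuation(Grapheme):
    """A punctuation sign."""


class Metamark(Grapheme):
    """A mark about the text rather than part of it."""


signs = [AlphabeticGrapheme("s"), Space(" "), Punctuation(".")]
print(all(isinstance(sign, Grapheme) for sign in signs))  # True
```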

  26. Graphemes/allographs

  27. Graphemes/allographs: the commutation test ● Text: co̊paraƐur uł adſe uładalium (Comparatur vel ad se vel ad alium, "He is compared to himself or to another") ● System: <s> <t> <x> <y> <z>

  28. Graphemes/allographs: the commutation test ● Text: co̊paraƐur uł adſe uładalium ● System: <s> <t> <x> <y> <z>; «τ» «√»

  29. Graphemes/allographs: the commutation test ● co̊paraƐur uł adſe uładalium ● <s> <t> <x> <y> <z>; «τ» «√» ● Commutation → Change in “denotative meaning” ● Substitution → No change in “denotative meaning”

  30. Graphemes/allographs: the commutation test ● co̊paraƐur uł adſe uładalium ● Graphemes: <s> <t> <x> <y> <z> ● Allographs: «τ» «√» ● Commutation → Change in “denotative meaning” ● Substitution → No change in “denotative meaning”

  31. Graphemes/allographs: the commutation test ● co̊paraƐur uł adſe uładalium ● Graphemes: <s> <t> <x> <y> <z> ● Allographs: «τ» «√» ● Commutation → Change in “denotative meaning” ● Substitution → No change in “denotative meaning” ● Table (grapheme: allographs) – t: τ | Ɛ | √ – u: u | v – z: z
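
The grapheme/allograph table on this slide (t: τ | Ɛ | √, u: u | v, z: z) can be written down as data and used to tell substitution from commutation. This is a minimal Python sketch with hypothetical function names. It also assumes that any glyph missing from the table counts as its own grapheme.

```python
# Grapheme -> allographs, copied from the table on the slide.
ALLOGRAPHS = {
    "t": ["τ", "Ɛ", "√"],
    "u": ["u", "v"],
    "z": ["z"],
}

# Inverted lookup: allograph -> grapheme.
GRAPHEME_OF = {allo: g for g, allos in ALLOGRAPHS.items() for allo in allos}


def grapheme_of(glyph: str) -> str:
    """Grapheme realized by a glyph (assumption: unknown glyphs map to themselves)."""
    return GRAPHEME_OF.get(glyph, glyph)


def is_substitution(a: str, b: str) -> bool:
    """True if swapping a for b stays within one grapheme (no change in denotative meaning)."""
    return grapheme_of(a) == grapheme_of(b)


def to_graphemes(text: str) -> str:
    """Rewrite an allographic transcription at the graphemic level."""
    return "".join(grapheme_of(ch) for ch in text)


print(is_substitution("τ", "Ɛ"))  # True: two allographs of <t>
print(is_substitution("τ", "z"))  # False: a commutation between graphemes
print(to_graphemes("paraƐur"))    # paratur
```

Keeping the table as data rather than burying it in code is one way to document a project's transcription criteria formally.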

  32. Graphemes / allographs: what to transcribe? ● What the project wants! – based on its scientific interests – (and on time / money) ● But: framed in a larger model

  33. Can allographs have a distinctive value?

  34. Allographs τ τ τ τ τ Ɛ Ɛ √ √ √ √

  35. Allographs ● τ τ τ τ τ: 1. «τ» ● Ɛ Ɛ: 2. «Ɛ» ● √ √ √ √: 3. «√»

  36. Capitals: allographs or graphemes? ● Cool (CA) is a cool town (geographical name) ● Smith is a good smith (proper name) ● ODD files are odd files (acronym) ● OK for contemporary Western writing systems ● Not for classical/medieval handwriting (see later)

  37. Capitals: allographs or graphemes? ● Cool (CA) is a cool town (geographical name) ● Smith is a good smith (proper name) ● ODD files are odd files (acronym) ● R. Mordenti: Grapheme <D>; Allographs «d», «D» ● F. Neuber: Archi-grapheme D; Graphemes <d>, <D> ● P. Monella: Alphabeme D; Graphemes <d>, <D>

  38. Sentence segmentation: distinctive value for meaning of the whole text ● I go because I have to. Stay here! ● I go because I have to stay here! ● Capitals

  39. Sentence segmentation: distinctive value for meaning of the whole text ● I go because I have to. Stay here! ● I go because I have to stay here! ● Punctuation ● Capitals

  40. Word segmentation: distinctive value for meaning of the whole text ● σαῦρος, ſucceſs, daſs (daß)

  41. Word segmentation: distinctive value for meaning of the whole text ● σαῦρος, ſucceſs, daſs (daß) ● Paulus suſtinet me (Paolo holds me up) ● Paulus ſus tinet me (Paolo the pig holds me) ● Positional allograph

  42. Word segmentation: distinctive value for meaning of the whole text ● σαῦρος, ſucceſs, daſs (daß) ● Paulus suſtinet me (Paolo holds me up) ● Paulus ſus tinet me (Paolo the pig holds me) ● Space ● Positional allograph

  43. Connotators

  45. Connotators ● 𝖝𝖎𝖕 ≠ WHO ● Pertinence ● Connotator “Gothic” (marked) ● Connotator “Gaul” (not marked)

  46. Connotators ● Connotators, pertinent for the writer – Emphasis: graphemes as entities – Respect: the Evangelist wrote

  47. (Non-)pertinent allographs ● Positional variants ● Ligatures ● Non-pertinent for the writer ● Connotators, pertinent for (some) readers – editors, paleographers, codicologists, historians studying a MS / book – (Beneventan vs Caroline script, print font, ſ / s) ● Allographs: «τ» «Ɛ» «√»

  48. Distinctive value (pertinence) of allographs? ● Graphemes change denotative meaning – fame vs name – Hjelmslev: denotative semiotics ● Allographs can have other forms of distinctive value (pertinence) – For the writer: 𝖝𝖎𝖕 vs WHO (Hjelmslev: connotative semiotics) – For the reader (digital editor): digital editors can set their own pertinence (transcription) criteria based on their scientific interests, e.g. Fraktur font → political connotation in WW1

  49. In practice: how can grapheme/allograph modelling make my DSE more interoperable?

  50. In practice: how can grapheme/allograph modelling make my DSE more interoperable? ● Manual (selective) transcription (witness A) ● OCR/HTR transcription (witness B)

  51. In practice: how can grapheme/allograph modelling make my DSE more interoperable? ● Allographic transcription ● Manual (selective) transcription (witness A): Vnτer <hi>dem</hi> ſchloſs ● OCR/HTR transcription (witness B): unter dem schloss

  53. In practice: how can grapheme/allograph modelling make my DSE more interoperable? ● Unicode characters ● Allographic transcription ● Manual (selective) transcription (witness A): Vnτer <hi>dem</hi> ſchloſs ● OCR/HTR transcription (witness B): unter dem schloss
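
The two witnesses on this slide can be compared once both are projected onto a shared graphemic layer. The Python sketch below is hypothetical and rests on several assumptions not stated on the slides:

- The spaces around τ in the transcript are treated as layout residue, so witness A reads "Vnτer".
- Markup is simply stripped.
- Capital V, τ and ſ are treated as allographs of u, t and s for this particular comparison.

The mapping table and the function name `graphemic` are illustrative only.

```python
import re

# Hypothetical allograph -> grapheme table for this comparison.
ALLOGRAPH_TO_GRAPHEME = {
    "τ": "t",  # allograph of <t> used on the slides
    "ſ": "s",  # long s
    "V": "u",  # assumption: capital V realizes <u> here
    "v": "u",
}


def graphemic(transcription: str) -> str:
    """Project a transcription onto the graphemic layer (toy rules, see lead-in)."""
    # Drop inline markup such as <hi>...</hi>, keeping the text content.
    text = re.sub(r"<[^>]+>", "", transcription)
    # Replace each allograph with its grapheme, then ignore remaining case.
    return "".join(ALLOGRAPH_TO_GRAPHEME.get(ch, ch) for ch in text).lower()


WITNESS_A = "Vnτer <hi>dem</hi> ſchloſs"  # manual, allographic transcription
WITNESS_B = "unter dem schloss"           # OCR/HTR output

print(graphemic(WITNESS_A))                          # unter dem schloss
print(graphemic(WITNESS_A) == graphemic(WITNESS_B))  # True
```

The allographic detail of witness A is not lost: the original string is kept, and the graphemic projection is computed only when the two witnesses need to be searched or collated together.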
