An ontology for digital graphematics and philology

Paolo Monella. Die (hyper-)diplomatische Transkription und ihre Erkenntnispotentiale, Bergische Universität Wuppertal (BUW), 6 February 2020.


  1. Paolo Monella: An ontology for digital graphematics and philology. Die (hyper-)diplomatische Transkription und ihre Erkenntnispotentiale, Bergische Universität Wuppertal (BUW), 6 February 2020

  2. Outline

  3. Outline
     ● Interoperability of digital scholarly editions (DSEs) based on diplomatic transcriptions
     ● Digital modelling (ontology) of pre-modern writing systems
       – Graphemes / allographs
       – Allographs: capitals, ligatures, positional variants, emphasis etc.
     ● In practice: how can grapheme/allograph modelling make my DSE more interoperable?
     ● Open issues

  4. Interoperability: the issue

  6. Interoperability: the issue ● uenenū

  7. Interoperability: the issue
     ● uenenū (Diplomatic)
       ● Historical documentation
       ● Visualization
       ● Processing
       ● (Erkenntnispotentiale: potential for insight)

  9. Interoperability: the issue ● uenenū ● venenum

  10. Interoperability: the issue
      ● uenenū
      ● venenum
      ● Processing
        ● Search
        ● Collation
        ● NLP (lemma, PoS etc.)
        ● Statistics (distant reading)

  11. Interoperability: the issue
      ● uenenū
      ● venenum → venenum
      ● Processing
        ● Search
        ● Collation
        ● NLP (lemma, PoS etc.)
        ● Statistics (distant reading)

  15. Interoperability: the issue
      ● My focus: European medieval handwriting
        – ...and early print (imitating handwriting)

  16. Interoperability: the issue
      ● My focus: European medieval handwriting
        – ...and early print (imitating handwriting)
        – Pre-Gutenberg (and shortly after)
      ● Alphabetic writing systems (so far)
        – Latin script (Italian, English...), Greek, Cyrillic...
        – No non-alphabetic (Cuneiform, Arabic, Chinese etc.)

  17. Interoperability: current solutions

  18. Unicode (TEI’s recommendation)
      ● Solution for new digital texts
      ● Not enough for pre-modern writing systems
        – Allographs
          ● ſ (U+017F) / s (U+0073; ASCII 115)
          ● Have I encoded that they correspond to each other (variants of grapheme <s>)?

  19. Unicode (TEI’s recommendation)
      ● Solution for new digital texts
      ● Not enough for pre-modern writing systems
        – Allographs
          ● ſ (U+017F) / s (U+0073; ASCII 115)
          ● Have I encoded that they correspond to each other (variants of grapheme <s>)?
        – Ligatures
          ● & (U+0026; ASCII 38)
          ● Have I encoded that it is equivalent to “e + t” in that MS?
        – Grapheme set
          ● u (U+0075; ASCII 117)
          ● Have I encoded whether it “covers” (or not) <u> and <v>?
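
The questions on this slide can be made concrete with a minimal Python sketch. It shows that code points alone carry no statement of graphemic equivalence, and that Unicode's built-in compatibility folding is fixed rather than manuscript-specific. The `GRAPHEME_OF` table and the `same_grapheme` helper are hypothetical names introduced for illustration; they are not part of Unicode or TEI.

```python
import unicodedata

# Long s and round s are distinct code points; a plain string does not link them.
long_s, round_s = "\u017f", "s"
print(hex(ord(long_s)), hex(ord(round_s)))    # 0x17f 0x73
print(long_s == round_s)                       # False

# NFKC compatibility normalization does fold long s into s, but it is a fixed,
# general-purpose mapping: it knows nothing about "&" standing for "et" in one MS.
print(unicodedata.normalize("NFKC", long_s))   # s
print(unicodedata.normalize("NFKC", "&"))      # &

# Hypothetical per-manuscript declaration (an assumption, for illustration only).
GRAPHEME_OF = {
    "\u017f": "s",   # long s realizes grapheme <s>
    "s": "s",
    "&": "et",       # ligature equivalent to e + t in this MS
}

def same_grapheme(a: str, b: str) -> bool:
    """True if the project has declared both characters variants of one grapheme."""
    return GRAPHEME_OF.get(a, a) == GRAPHEME_OF.get(b, b)

print(same_grapheme(long_s, round_s))          # True
print(same_grapheme("&", "e"))                 # False
```

The point of the explicit table is that the equivalence becomes a documented, queryable part of the edition instead of an implicit convention.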

  20. Diplomatic/normalized: the surrender?
      ● uenenū (Diplomatic)
        ● Historical documentation
        ● Visualization
        ● Processing
        ● (Erkenntnispotentiale: potential for insight)
      ● venenum (Normalized)
        ● Processing
        ● Search
        ● Collation
        ● NLP (lemma, PoS etc.)
        ● Statistics (distant reading)...
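
One way to avoid choosing between the two layers is to keep the diplomatic form and derive the normalized one from documented rules. The following is a minimal sketch under stated assumptions: the abbreviation table, the u/v rule, and the names `normalize`, `Token`, `make_token` and `search` are all hypothetical and only illustrate the idea with the slide's example.

```python
import re
import unicodedata
from dataclasses import dataclass

# Hypothetical normalization rules; a real project would document its own.
ABBREVIATIONS = {"\u016b": "um"}   # u with macron expands to "um"
VOWELS = "aeiou"

def normalize(diplomatic: str) -> str:
    """Derive a normalized form from a diplomatic one."""
    text = unicodedata.normalize("NFC", diplomatic)   # compose u + macron first
    for mark, expansion in ABBREVIATIONS.items():
        text = text.replace(mark, expansion)
    # Assumed rule: word-initial u before a vowel is read as v.
    return re.sub(rf"\bu(?=[{VOWELS}])", "v", text)

@dataclass(frozen=True)
class Token:
    diplomatic: str   # kept for historical documentation and visualization
    normalized: str   # derived, used for search, collation, NLP

def make_token(diplomatic: str) -> Token:
    return Token(diplomatic, normalize(diplomatic))

def search(tokens, query):
    """Match on the normalized layer, return tokens with both layers."""
    return [t for t in tokens if t.normalized == query]

tokens = [make_token(w) for w in ["uenen\u016b", "ad", "alium"]]
hits = search(tokens, "venenum")
print(hits[0].diplomatic, "->", hits[0].normalized)
```

Because the normalized layer is computed, the rules can be revised and rerun without touching the transcription.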

  21. Project-specific solutions
      ● Disposable home-made solutions
      ● Normalization software and strategies
      ● TEI: theory-agnostic

  22. Interoperability through modelling

  23. Interoperability through modelling
      ● Scholarly discussion on modelling
      ● Documenting project-specific modelling and normalization practices
        – prose
        – formal (software code, tables)
      ● Shared models
      ● Reusable software libraries

  24. An ontology for digital graphematics and philology

  25. Ontology (diagram). Node labels: Lemma; Token (inflected word); Alphabeme; Logograph; Alphabetic Grapheme; Abbreviation Mark; Brevigraph; Ligature; Grapheme; Diacritic; Abbreviation; Space; Punctuation; Metamark; Allograph. Relations shown: is_a, +

  26. Ontology (is_a tree diagram). Node labels: Grapheme; Linguistic Gr.; Textual Gr.; Punctuation; Metamark; Space; Logograph; Intra-verbal Gr.; {+alphabetic}; {-alphabetic}; Diacritic; Alphabetic Grapheme; Abbreviation Mark; Brevigraph
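
An is_a tree of this kind maps naturally onto a class hierarchy. The sketch below is only an illustration: it attaches each kind directly to `Grapheme` and deliberately leaves out the intermediate levels (Linguistic Gr., Textual Gr., Intra-verbal Gr.), since which node hangs under which is an assumption this sketch does not make. The class names and the `is_a` helper are hypothetical.

```python
class Grapheme:
    """Root of the is_a hierarchy."""

# Assumption: a flat hierarchy. Every kind below is (at least transitively)
# a Grapheme in the tree; intermediate nodes are intentionally omitted.
class AlphabeticGrapheme(Grapheme): ...
class Diacritic(Grapheme): ...
class AbbreviationMark(Grapheme): ...
class Brevigraph(Grapheme): ...
class Logograph(Grapheme): ...
class Punctuation(Grapheme): ...
class Metamark(Grapheme): ...
class Space(Grapheme): ...

def is_a(kind: type, ancestor: type) -> bool:
    """Check the is_a relation between two kinds."""
    return issubclass(kind, ancestor)

print(is_a(Brevigraph, Grapheme))      # True
print(is_a(Brevigraph, Punctuation))   # False
```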

  29. Digital modelling for pre-modern writing systems

  30. Digital modelling

  31. Digital modelling
      ● Comparatur vel ad se vel ad alium (“He is compared to himself or to another”)
      ● co̊paraƐur uł adſe uładalium

  33. Digital modelling
      ● Comparatur vel ad se vel ad alium (“He is compared to himself or to another”)
      ● co̊paraƐur uł adſe uładalium
      ● (diagram label: Digital modelling)

  34. Digital modelling ● co̊paraƐur uł adſe uładalium

  35. A structural approach to digital modelling
      ● Text: co̊paraƐur uł adſe uładalium
      ● Analysis
      ● System
      ● Entities: <s> <t> <x> <y> <z>

  36. A structural approach to digital modelling
      ● Text: co̊paraƐur uł adſe uładalium
      ● Analysis
      ● System
      ● Entities: <s> <t> <x> <y> <z>
      ● (diagram label: Digital modelling)

  37. Graphemes/allographs

  38. Graphemes/allographs: the commutation test
      ● Comparatur vel ad se vel ad alium (“He is compared to himself or to another”)
      ● Text: co̊paraƐur uł adſe uładalium
      ● System: <s> <t> <x> <y> <z>

  39. Graphemes/allographs: the commutation test
      ● Text: co̊paraƐur uł adſe uładalium
      ● System: <s> <t> <x> <y> <z>
      ● System: « τ » «√»

  40. Graphemes/allographs: the commutation test
      ● Text: co̊paraƐur uł adſe uładalium
      ● <s> <t> <x> <y> <z>
      ● « τ » «√»
      ● Commutation: → Change in “denotative meaning”
      ● Substitution: → No change in “denotative meaning”

  41. Graphemes/allographs: the commutation test
      ● Text: co̊paraƐur uł adſe uładalium
      ● Graphemes: <s> <t> <x> <y> <z>
      ● Allographs: « τ » «√»
      ● Commutation: → Change in “denotative meaning”
      ● Substitution: → No change in “denotative meaning”
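
The test can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: “denotative meaning” is approximated by a lookup in a tiny, hypothetical lexicon (`MEANING`), and the forms and glosses are invented for the example rather than taken from the talk.

```python
# Hypothetical mini-lexicon: written form -> denotative meaning.
MEANING = {
    "se": "himself",
    "\u017fe": "himself",   # the same word written with long s
    "te": "you",
}

def classify(form: str, old: str, new: str) -> str:
    """Swap one glyph for another and compare denotative meanings."""
    swapped = form.replace(old, new, 1)
    if form not in MEANING or swapped not in MEANING:
        return "undecided"        # the lexicon offers no evidence
    if MEANING[form] == MEANING[swapped]:
        return "substitution"     # no change: allographs of one grapheme
    return "commutation"          # change: two distinct graphemes

print(classify("se", "s", "\u017f"))   # substitution
print(classify("se", "s", "t"))        # commutation
print(classify("se", "s", "x"))        # undecided
```

A real test would of course rest on the editor's philological judgement, not on dictionary lookup; the sketch only fixes the logic of the two outcomes.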

  42. Graphemes/allographs: the commutation test
      ● Text: co̊paraƐur uł adſe uładalium
      ● Graphemes: <s> <t> <x> <y> <z>
      ● Allographs: « τ » «√»
      ● Commutation: → Change in “denotative meaning”
      ● Substitution: → No change in “denotative meaning”
      ● Table:
        Gr   Allogr
        t:   τ | Ɛ | √
        u:   u | v
        z:   z
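
The grapheme/allograph table on this slide can be written down as data and applied to a transcription. A minimal sketch follows; the table content comes from the slide, while the names `ALLOGRAPHS`, `GRAPHEME_OF` and `graphemic` are hypothetical, and the code points chosen for τ, Ɛ and √ are an assumption about how the glyphs are encoded.

```python
# Grapheme -> allographs, as listed in the table.
ALLOGRAPHS = {
    "t": ["\u03c4", "\u0190", "\u221a"],   # τ | Ɛ | √ (assumed code points)
    "u": ["u", "v"],
    "z": ["z"],
}

# Invert the table: allograph -> grapheme.
GRAPHEME_OF = {allo: gr for gr, allos in ALLOGRAPHS.items() for allo in allos}

def graphemic(text: str) -> str:
    """Replace every known allograph with its grapheme; keep everything else."""
    return "".join(GRAPHEME_OF.get(ch, ch) for ch in text)

print(graphemic("para\u0190ur"))   # paratur
print(graphemic("vel"))            # uel
```

Keeping the table in the grapheme-to-allographs direction mirrors how it is presented; the inverted lookup is derived so the two can never drift apart.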

  43. Graphemes / allographs: what to transcribe?
      ● What the project wants!
        – based on its scientific interests
        – (and on time / money)
      ● But: framed in a larger model

  44. Saussure, pertinence and the scribe’s toolbox

  45. Saussure, pertinence and the scribe’s toolbox
      MS A: a b c d e f g h i l m n o p q r s t u z · ;
      MS B: a b c d e f g h i j l m n o p q r s t u v z . , ; : !

  46. Saussure, pertinence and the scribe’s toolbox
      OCR from Teubner: a b c d e f g h i l m n o p q r s t u z · ;
      OCR from Loeb: a b c d e f g h i j l m n o p q r s t u v z . , ; : !
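
Treating each inventory as a set makes the difference between the two toolboxes explicit. The sketch below uses the two inventories exactly as listed; the variable names are hypothetical, and the middle dot is assumed to be U+00B7.

```python
# The two inventories, one symbol per whitespace-separated item.
MS_A = set("a b c d e f g h i l m n o p q r s t u z \u00b7 ;".split())
MS_B = set("a b c d e f g h i j l m n o p q r s t u v z . , ; : !".split())

only_in_b = sorted(MS_B - MS_A)   # symbols the second toolbox adds
only_in_a = sorted(MS_A - MS_B)   # symbols only the first toolbox has
shared = MS_A & MS_B

print(only_in_b)     # ['!', ',', '.', ':', 'j', 'v']
print(only_in_a)     # ['·']
print(len(shared))   # 21
```

A search for “v” is meaningful against the second inventory but not against the first, which is why the inventory has to be declared alongside the text.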

  47. Saussure, pertinence and the scribe’s toolbox
      ● The toolbox of the scribe
        – Definition of graphemes, allographs…
      ● Writing systems as autonomous semiotic systems (Sampson)
        – Not as epiphenomena of oral language (phonemes)
        – Mandarin / Cantonese
        – “Opaque” orthographies (English)
          ● “knight”, “aile”, “read”, “read” (past tense)
        – Medieval MSS: pronunciation?
      a b c d e f g h i j l m n o p q r s t u v z . , ; : !

  48. Saussure, pertinence and the scribe’s toolbox
      ● “In language there are only differences” (Saussure)
        – “But the statement that everything in language is negative is true only if the signified and the signifier are considered separately; when we consider the sign in its totality, we have something that is positive in its own class”
      a b c d e f g h i j l m n o p q r s t u v z . , ; : !

  49. Saussure, pertinence and the scribe’s toolbox
      ● Can we define the scribe’s (graphematic, signifier) toolbox under complete ignorance of the linguistic (meaning, signified) dimension?
      a b c d e f g h i j l m n o p q r s t u v z . , ; : !

  50. Saussure, pertinence and the scribe’s toolbox
      ● Can we define the scribe’s toolbox under complete ignorance of the linguistic dimension?
      a b c d e f g h i j l m n o p q r s t u v z . , ; : !
