The PROIEL parallel corpus of old Indo-European New Testament translations


  1. The PROIEL parallel corpus of old Indo-European New Testament translations. Dag Trygve Truslew Haug, 23 February 2018. Sections: The PROIEL corpora; The Syntacticus interface; Case studies.

  2. Introduction. The PROIEL corpus: small, but deep. Core: NT in Greek, Latin, Gothic, Classical Armenian and OCS. What can the deep annotation do for us? Three sections: the annotation; the Syntacticus interface; some case studies.

  6. The background. A corpus for linguists: focus on making the most of a limited data set for linguistic research. Pragmatic Resources in Old Indo-European Languages (PROIEL, 2008-2012): word order; anaphoric expressions; definiteness; participles (background events); discourse particles. The corpus should help this research, but also be useful for others. Annotation continues (with fewer resources).

  8. Texts. NT and translations. Classical Greek and Latin: Herodotus; Gallic War, Letters to Atticus, De officiis. Post-classical Greek and Latin: Sphrantzes’ Chronicles; Peregrinatio Aetheriae. Other corpora in the same format: Old Norse, Old Swedish, Medieval English and Romance, Old Russian and OCS.

  9. The PROIEL annotation. Many-layered annotation: morphological annotation; syntactic annotation (dependency/LFG-based); semantic and other customised annotation (e.g. animacy); annotation of information structure and anaphoric links; experimental discourse structure annotation; token alignments.

  12. Morphology. All our languages have relatively rich morphology. Annotated features: inflection, mood, tense, voice, degree, case, person, number, gender, strength. 2623 unique MSD tags in the corpus; 803 unique tags in Greek, 636 in Latin. In addition, 26 POS tags. Also derivational morphology for some categories.

  13. Dependency syntax. Dependencies are asymmetric relations between words. We label these dependencies with the function of the dependent. The dependencies form a tree under an abstract root. No explicit constituency.
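The properties listed on this slide can be illustrated with a minimal sketch. It is not the PROIEL data model: the `Token` class, the helper names, the toy Latin sentence and the relation labels `pred`, `sub` and `obj` are assumptions chosen for illustration. Each token stores a single head and the function of the dependent; a head of `None` stands for the abstract root.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    id: int
    form: str
    head: Optional[int]  # id of the head token; None = attached to the abstract root
    relation: str        # function of the dependent (illustrative labels)


def is_tree(tokens: List[Token]) -> bool:
    """Check that every token reaches the abstract root without a cycle."""
    by_id = {t.id: t for t in tokens}
    for t in tokens:
        seen = set()
        cur = t
        while cur.head is not None:
            if cur.id in seen or cur.head not in by_id:
                return False  # cycle, or a head that does not exist
            seen.add(cur.id)
            cur = by_id[cur.head]
    return True


def dependents(tokens: List[Token], head_id: int) -> List[Tuple[str, str]]:
    """Return (form, relation) for each direct dependent of head_id."""
    return [(t.form, t.relation) for t in tokens if t.head == head_id]


# Hypothetical toy sentence: "Marcus librum legit" (Marcus reads a book).
sentence = [
    Token(1, "Marcus", 3, "sub"),
    Token(2, "librum", 3, "obj"),
    Token(3, "legit", None, "pred"),
]

print(is_tree(sentence))
print(dependents(sentence, 3))
```

Because each token has exactly one head, the structure is asymmetric by construction, and no constituents are stored: phrases can only be read off as a head together with its dependents.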

  18. Semantic annotation: animacy. Tags: HUMAN, ORG, ANIMAL, VEH, CONC, PLACE, NONCONC, TIME. All Greek noun lemmata annotated for animacy. Adjustments at token level. Tag transfer to other parts of speech via anaphoric links. Tag transfer to other languages via token alignments.
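The four steps on this slide (lemma-level tags, token-level adjustments, transfer via anaphoric links, transfer via token alignments) can be sketched as follows. This is an illustration under stated assumptions, not the project's actual procedure: the function name, the dictionary-based inputs, the token ids and the example override are all hypothetical.

```python
def assign_animacy(tokens, lemma_tags, overrides, anaphora, alignments):
    """Sketch of animacy tagging with transfer.

    tokens:     token id -> lemma
    lemma_tags: lemma -> animacy tag (lemma-level annotation)
    overrides:  token id -> animacy tag (token-level adjustment)
    anaphora:   anaphor token id -> antecedent token id
    alignments: source token id -> aligned token id in another language
    """
    tags = {}
    # Steps 1 and 2: lemma-level tag, unless adjusted for this token.
    for tok_id, lemma in tokens.items():
        if tok_id in overrides:
            tags[tok_id] = overrides[tok_id]
        elif lemma in lemma_tags:
            tags[tok_id] = lemma_tags[lemma]
    # Step 3: untagged anaphors inherit the tag of the nearest tagged antecedent.
    for anaphor in anaphora:
        cur, seen = anaphor, set()
        while cur not in tags and cur in anaphora and cur not in seen:
            seen.add(cur)
            cur = anaphora[cur]
        if cur in tags and anaphor not in tags:
            tags[anaphor] = tags[cur]
    # Step 4: aligned tokens in the other language inherit the source tag.
    for src, tgt in alignments.items():
        if src in tags and tgt not in tags:
            tags[tgt] = tags[src]
    return tags


# Hypothetical data: three Greek tokens, two of them aligned to Latin tokens.
tokens = {"g1": "μαθητής", "g2": "αὐτός", "g3": "πλοῖον"}
lemma_tags = {"μαθητής": "HUMAN", "πλοῖον": "VEH"}
overrides = {"g3": "CONC"}          # illustrative token-level adjustment
anaphora = {"g2": "g1"}             # the pronoun g2 refers back to g1
alignments = {"g1": "l1", "g2": "l2"}

print(assign_animacy(tokens, lemma_tags, overrides, anaphora, alignments))
```

In this sketch the pronoun `g2` carries no lemma-level tag and receives HUMAN only through its antecedent, which is the point of routing the transfer through anaphoric links before the alignments.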

  24. Givenness. Givenness tags based on which context the hearer uses to establish reference: Discourse (anaphora) → OLD; Situation (deixis) → ACC-sit; Scenarios (inferences) → ACC-inf; Encyclopedic knowledge → ACC-gen; No context (no extra-NP information) → NEW. Exists for 58756 NPs (full coverage of the Greek gospels + various other texts).
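The mapping from context type to givenness tag on this slide is a simple lookup, sketched below. The tag strings come from the slide; the `Context` enum, the function name and the sample list of NPs are assumptions made for illustration.

```python
from collections import Counter
from enum import Enum


class Context(Enum):
    """Context the hearer uses to establish reference (names are illustrative)."""
    DISCOURSE = "anaphora"
    SITUATION = "deixis"
    SCENARIO = "inference"
    ENCYCLOPEDIC = "general knowledge"
    NONE = "no extra-NP information"


GIVENNESS_TAG = {
    Context.DISCOURSE: "OLD",
    Context.SITUATION: "ACC-sit",
    Context.SCENARIO: "ACC-inf",
    Context.ENCYCLOPEDIC: "ACC-gen",
    Context.NONE: "NEW",
}


def givenness_tag(context: Context) -> str:
    """Return the givenness tag for the context used to resolve an NP."""
    return GIVENNESS_TAG[context]


# Hypothetical sequence of NPs, each classified by its resolving context.
nps = [Context.NONE, Context.DISCOURSE, Context.DISCOURSE, Context.ENCYCLOPEDIC]
print(Counter(givenness_tag(c) for c in nps))
```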

  25. The Syntacticus interface: http://syntacticus.org

  26. Case studies. Select case studies to show the value of deep analysis: OCS aspect; Latin participles; Early Slavic DOM. A few words about the danger of superficial analyses of Biblical data.

  27. Patterns: PSNVNRNSNCVRSNRNCDVRSNVVSNCSNDNVRPCNRSNPVSNP SARPVCDSNPVRSNCVRSNMNVRSNCVRSNCSNVPRDSVSNVSN RSNVSNSNGVSNCVSNSNVCVRSNCVRSNSNVNCNSNNVRSNVD NCVPSNIRPCVPVNNCDVSNVPCVAVNSSNCNSNPCPRSNVSNC DVPCVSNPNRSNRSNVRPCVRNCDSNVRSNCVRSNPVDVPDNVC DDSNCDVRSNPNRNACVVPPCPNNVVPVPPVSASNCVPSNVCVR PCVPSNSACVNAVRPCVAGVPVPVPNARNDSNSAVCVPCVSNPD DRASNSNCDRSNVVRSNNCNRNCNSDNNVVCDVPRPCVVPVSNC VPSNCVPADVGVSNVRPPSDVCSVCVASNVRSNCVADVANCNAV CDVVSNGVPCDADVVCVRANCDVCVPNCSRPCVPCVPGPVPCVP VDRSVNGDDVRPDVCVVRSNPRASNCSNVCVRPAVPCVVPGGVV PVCVVSNPVCVVVCDVRPSNCVCVPDVPCVPVPPVCVPVSNCVR SNPPVNRNPPDVVVACVSNGDPVRNDVCDRANVCVRPD
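The slide does not say how the letter sequence is encoded. One plausible reading, which is an assumption here, is that each letter is the part-of-speech class of one token in running text (for example N for noun, V for verb, S for article, R for preposition). Under that assumption, surface patterns can be searched with a regular expression, as in this sketch; the function name is hypothetical and the sample is the first 20 letters of the slide's sequence.

```python
import re
from typing import List


def find_pattern(pos_string: str, pattern: str) -> List[int]:
    """Return start offsets of every (possibly overlapping) match of pattern.

    The lookahead (?=...) makes each match zero-width, so overlapping
    occurrences are all reported.
    """
    return [m.start() for m in re.finditer(f"(?={pattern})", pos_string)]


# First 20 letters of the sequence on the slide.
sample = "PSNVNRNSNCVRSNRNCDVR"

# Assumed reading: S = article, N = noun, R = preposition.
print(find_pattern(sample, "SN"))   # article followed by noun
print(find_pattern(sample, "RSN"))  # preposition, article, noun
```

A search like this sees only linear order, which is presumably the contrast the case studies draw with the deeper annotation layers.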
