
Morphological Analysis and Generation for Pali

David Alfter, Jürgen Knauth. 18 September 2015. @daalft. Pali: (dead) Indo-Aryan language; fusional language; Rich


  1. Word Class Guesser: Lemma (code excerpt)

         if (ends(lemma, "a", "ā", "i", "ī", "u", "ū", "ant", "vā", "mā", "at")) {
             guesses.add("adjective");
         }
         if (ends(lemma, "a", "i", "aṃ", "ma", "ya")) {
             guesses.add("numeral");
         }
         if (ends(lemma, "uṃ")) {
             guesses.add("indeclinable");
         }
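The excerpt on slide 1 can be completed into a small runnable sketch. The `ends` helper is not shown on the slide; its behaviour here (true if the lemma ends with any of the listed suffixes) is an assumption, as are the class name `WordClassGuesser` and the `guess` method. The diacritics are written as Unicode escapes so the source stays ASCII-only: `\u0101` is ā, `\u012B` is ī, `\u016B` is ū, `\u1E43` is ṃ.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class WordClassGuesser {

    // Assumed helper: true if the lemma ends with at least one of the suffixes.
    static boolean ends(String lemma, String... suffixes) {
        for (String suffix : suffixes) {
            if (lemma.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }

    // Collects every word class whose listed endings match the lemma.
    // Only the three rules visible in the excerpt are included.
    static Set<String> guess(String lemma) {
        Set<String> guesses = new LinkedHashSet<>();
        if (ends(lemma, "a", "\u0101", "i", "\u012B", "u", "\u016B",
                "ant", "v\u0101", "m\u0101", "at")) {
            guesses.add("adjective");
        }
        if (ends(lemma, "a", "i", "a\u1E43", "ma", "ya")) {
            guesses.add("numeral");
        }
        if (ends(lemma, "u\u1E43")) {
            guesses.add("indeclinable");
        }
        return guesses;
    }
}
```

Because the rules are independent `if` statements rather than an `if`/`else` chain, a lemma such as `dasa` receives both the "adjective" and the "numeral" guess; the guesser proposes candidates and leaves the final choice to a later step.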

  2. Results

         Word class          Accuracy
         Nouns-Adjectives    99.96%
         Pronouns            88.57%
         Numerals            76.62%
         Verbs               63.37%

  3. Sandhi

  4. Compound Sandhi

  5. Intuition: identify possible sandhi loci; split into n words such that ∀ n : w_n ∈ D (every part is an entry of the dictionary D)
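The intuition on slide 5 can be sketched as an exhaustive dictionary split. This is a simplification under stated assumptions: every character position is treated as a possible locus, the parts are joined by plain concatenation (no sandhi change is undone at the cut), and the class name `CompoundSplitter` and the toy dictionary in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class CompoundSplitter {

    // Returns every way to cut `word` into parts that are all dictionary entries.
    static List<List<String>> split(String word, Set<String> dictionary) {
        List<List<String>> analyses = new ArrayList<>();
        collect(word, 0, dictionary, new ArrayList<>(), analyses);
        return analyses;
    }

    // Depth-first search: try each dictionary word starting at `start`,
    // then recurse on the remainder of the string.
    private static void collect(String word, int start, Set<String> dictionary,
                                List<String> parts, List<List<String>> analyses) {
        if (start == word.length()) {
            analyses.add(new ArrayList<>(parts)); // whole word consumed
            return;
        }
        for (int end = start + 1; end <= word.length(); end++) {
            String candidate = word.substring(start, end);
            if (dictionary.contains(candidate)) {
                parts.add(candidate);
                collect(word, end, dictionary, parts, analyses);
                parts.remove(parts.size() - 1); // backtrack
            }
        }
    }
}
```

With a dictionary containing `buddha`, `vacana` and `buddhavacana`, splitting `buddhavacana` returns two analyses, `[buddha, vacana]` and `[buddhavacana]`. A word that is not covered by the dictionary yields an empty list, which shows how strongly this approach depends on dictionary coverage.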

  6. Problems: requires extensive dictionary; more than one analysis possible; not a compound

  7. External Sandhi

  8. Corpus-based resolution: sandhi-inducing words ca (and), hi (because), pi (also)

  9. Hand-written rules: regular expressions

  10. Replacement rules

          Pattern        Replacement
          \bpañca\b      X
          ñca\b          ṃ ca
          X              pañca
          ñhi\b          ṃ hi
          ñpi\b          ṃ pi
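The replacement rules on slide 10 can be read as an ordered pipeline: the word pañca is first masked so that the ñca rule cannot split it, then restored. A minimal sketch of that reading follows. The class name `ExternalSandhiResolver`, the use of a private-use code point instead of the slide's `X` as the mask, and the `UNICODE_CHARACTER_CLASS` flag are assumptions. The source is kept ASCII-only: `\u00F1` is ñ and `\u1E43` is ṃ.

```java
import java.util.regex.Pattern;

final class ExternalSandhiResolver {

    // Stand-in for the slide's "X": a private-use code point (U+E000),
    // chosen here so the mask cannot collide with real text.
    private static final String MASK = "\uE000";

    // Makes the word boundary Unicode-aware for letters with diacritics.
    private static final int FLAGS = Pattern.UNICODE_CHARACTER_CLASS;

    private static final Pattern PANCA = Pattern.compile("\\bpa\u00F1ca\\b", FLAGS);
    private static final Pattern NCA = Pattern.compile("\u00F1ca\\b", FLAGS);
    private static final Pattern NHI = Pattern.compile("\u00F1hi\\b", FLAGS);
    private static final Pattern NPI = Pattern.compile("\u00F1pi\\b", FLAGS);

    // Applies the rules in the order given on the slide.
    static String resolve(String text) {
        String out = PANCA.matcher(text).replaceAll(MASK);   // protect the numeral
        out = NCA.matcher(out).replaceAll("\u1E43 ca");      // word-final "nca" rule
        out = out.replace(MASK, "pa\u00F1ca");               // restore the numeral
        out = NHI.matcher(out).replaceAll("\u1E43 hi");
        out = NPI.matcher(out).replaceAll("\u1E43 pi");
        return out;
    }
}
```

For example, `resolve("evañca pañca")` returns `evaṃ ca pañca`: the first word is split, while the masked numeral passes through unchanged.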

  12. Internal Sandhi

  13. Internal Sandhi

  14. Conclusion

  15. Paradigms for Generation and Analysis

  16. Dictionary Integration for additional information

  17. Rule-based and heuristic backup

  18. RegEx-based External Sandhi Resolution

  19. Lookup

  20. Server Architecture

  21. Well documented REST API: easy integration

  22. Data Processing

  23. Extract structured data from unstructured data

  24. [n. ag. fr. abhijjhita in med. function] one who covets M <smallcaps>i.</smallcaps> 287 (T. abhijjhātar, v. l. °itar) = A <smallcaps>v.</smallcaps> 265 (T. °itar, v. l. °ātar).

  26. Pacati, [Ved. pacati, Idg. *peqǔō, Av. pac-; Obulg. peka to fry, roast, Lith, kepū bake, Gr. pέssw cook, pέpwn ripe] to cook, boil, roast Vin. IV, 264; fig. torment in purgatory (trs. and intrs.): Niraye pacitvā after roasting in N. S. II, 225, PvA. 10, 14. -- ppr. pacanto tormenting, Gen. pacato (+Caus. pācayato) D. I, 52 (expld at DA. I, 159, where read pacato for paccato, by pare daṇḍena pīḷentassa). -- pp. pakka (q. v.). ‹-› Caus. pacāpeti & pāceti (q. v.). -- Pass. paccati to be roasted or tormented (q. v.). (Page 382)

  27. Manual annotation

  28. Open Problems

  29. Verbs

  30. Use verb form table: attested forms only

  31. Internal Sandhi

  32. Illustrating Calculation: Splitting Internal Sandhi

  33. "When two vowels meet, one may be elided." When two vowels meet: elide first vowel; elide second vowel; no elision

  34. 8 vowels, n-vowel word: N = (1 + (2 × 8))^n; n = 1 → N = 17; n = 2 → N = 289; n = 3 → N = 4913
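One way to read the formula on slide 34, offered as an interpretation rather than the authors' stated derivation: each vowel of the surface form is either original (1 case), or stands next to a vowel that was elided before it or after it (2 positions, 8 possible vowels each). With n vowels treated independently, the number of candidate source forms grows exponentially:

```latex
\begin{align*}
N(n) &= (1 + 2 \cdot 8)^{n} = 17^{n} \\
N(1) &= 17 \\
N(2) &= 17^{2} = 289 \\
N(3) &= 17^{3} = 4913
\end{align*}
```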

  35. "A final dental is assimilated to the following consonant"

  36. "A final dental is assimilated to the following consonant": (DENTAL) (CONSONANT) : duplicate($2)

  37. kk: t k; kk: th k; kk: d k; kk: dh k; kk: n k; kk: l k; kk: s k; ... 224 possibilities
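Slide 37 lists what undoing the assimilation rule means for a single geminate. A minimal sketch of that reverse step is given below, assuming the seven finals listed on the slide and handling only doubled single-letter consonants such as `kk`; the class name `DentalAssimilation` and the method `undo` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class DentalAssimilation {

    // The seven finals listed on the slide.
    static final String[] DENTALS = {"t", "th", "d", "dh", "n", "l", "s"};

    // Reverse of "(DENTAL)(CONSONANT) : duplicate($2)": for a geminate such as
    // "kk", list every "dental consonant" pair that could have produced it.
    static List<String> undo(String geminate) {
        if (geminate.length() != 2 || geminate.charAt(0) != geminate.charAt(1)) {
            throw new IllegalArgumentException(
                    "not a doubled single-letter consonant: " + geminate);
        }
        String consonant = geminate.substring(1);
        List<String> candidates = new ArrayList<>();
        for (String dental : DENTALS) {
            candidates.add(dental + " " + consonant);
        }
        return candidates;
    }
}
```

Calling `undo("kk")` returns the seven pairs from the slide, `t k` through `s k`. Repeating this for every consonant that can follow the dental is what multiplies one short forward rule into the large number of reverse possibilities quoted on the slide.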

  38. Sandhi merge rules: 151 rules
