Efficient parsing with a large-scale unification-based grammar


  1. 0. Efficient parsing with a large-scale unification-based grammar: Lessons from a multi-year, multi-team endeavour. Liviu Ciortuz, Department of Computer Science, University of Iasi, Romania. “ALEAR” Workshop, FP7 E.U. Project, Humboldt Universität, Berlin, Germany, November 2008

  2. 1. PLAN • Fore-ground: LinGO, the large-scale HPSG for English; key efficiency issues in parsing with large-scale unification grammars • Back-ground: unification-based grammars in the small; OSF- and OSF-theory unification; FS expansion; compilation of OSF- and OSF-theory unification; LIGHT: the language and the system; two classes of feature paths: QC and GR

  3. 2. 1. Fore-ground: Based on
     • “Collaborative Language Engineering”, St. Oepen, D. Flickinger, J. Tsujii, H. Uszkoreit (eds.), Center for the Study of Language and Information, Stanford, 2002.
     • L. Ciortuz. “LIGHT – a constraint language and compiler system for typed-unification grammars.” In LNAI vol. 2479, M. Jarke, J. Köhler, G. Lakemeyer (eds.), Springer-Verlag, 2002, pp. 3–17.
     • L. Ciortuz. “On two classes of feature paths in large-scale unification grammars.” In New Developments in Parsing Technologies, Harry Bunt, Giorgio Satta, John Carroll (eds.), Kluwer Academic Publishers, 2004, pp. 203–227.

  4. 3. 1.1. LinGO – the English Resource Grammar, EUBP version, www.delph-in.net
     Short description: from “Efficiency in Unification-Based Parsing”, Natural Language Engineering, special issue, 6(1), 2000
     • Support theory: HPSG (Head-driven Phrase Structure Grammar) [Pollard and Sag, 1987, 1994]
     • Size: un-expanded: 2.47MB, expanded: 40.34MB; 15059 types, 62 rules, 6897 lexical entries
     • Developed within: TDL / PAGE (Type Description Language) [Kiefer, 1994], DFKI; LKB (Linguistic Knowledge Base) [Copestake, 1999], CSLI Stanford
     • Applications: machine translation of spoken and edited language, email auto response, consumer opinion tracking, question answering

  5. 4. Systems running LinGO ERG DFKI Saarbruecken TDL / PAGE Parsing (Control) Stanford Univ. LKB interpreter DFKI Saarbruecken PET FS Unifier Haifa Univ. [AMALIA] (Logic) ALE Tokyo Univ. LiLFeS compiler / AM DFKI Saarbruecken OSF LIGHT

  6. 5. Some comparisons on performances in processing LinGO, reported by [Oepen, Callmeier, 2000]
     version      year   test suite   av. parsing time (sec.)   space (Kb)
     TDL / PAGE   1996   ‘tsnlp’      3.69                      19016
     TDL / PAGE   1996   ‘aged’       2.16                      79093
     PET          2000   ‘tsnlp’      0.03                      333
     PET          2000   ‘aged’       0.14                      1435

  7. 6. Performances of LIGHT w.r.t. other systems processing LinGO
     system   optimization          average parsing time on CSLI test-suite (sec./sentence)
     LIGHT    quick-check           0.04
     PET      quick-check           0.04
     LiLFeS   CFG filter            0.06
     LIGHT    without quick-check   0.07
     PET      without quick-check   0.11
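
The effect of the quick-check filter can be read off the figures on this slide by simple division. The snippet below is only arithmetic on the reported averages; the dictionary layout and the function name `quick_check_speedup` are illustrative choices, not part of the talk.

```python
# Average parsing times (sec./sentence) copied from the slide above.
times = {
    ("LIGHT", "quick-check"): 0.04,
    ("PET", "quick-check"): 0.04,
    ("LiLFeS", "CFG filter"): 0.06,
    ("LIGHT", "without quick-check"): 0.07,
    ("PET", "without quick-check"): 0.11,
}

def quick_check_speedup(system):
    """Ratio of the time without quick-check to the time with it."""
    return times[(system, "without quick-check")] / times[(system, "quick-check")]

for system in ("LIGHT", "PET"):
    print(f"{system}: {quick_check_speedup(system):.2f}x faster with quick-check")
```

On these numbers the filter gives a factor of about 1.75 for LIGHT and about 2.75 for PET; the reported times have only two decimals, so the ratios are rough.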

  8. 7. 1.2 Key efficiency issues in parsing with large-scale (LinGO-like) unification-based grammars (I) • choosing the right logical framework, and making your grammar a logical, declarative grammar • grammar expansion: full vs. partial expansion • sort lattice encoding • FS unification: compilation • FS sharing • lexicon pre-compilation

  9. 8. Key efficiency issues in parsing with large-scale (LinGO-like) unification-based grammars (II) • exploring grammar particularities: quick check (QC) pre-unification filtering; (generalised) grammar reduction (GR) • two-step parsing ◦ hyper-active parsing ◦ ambiguity packing (based on FS subsumption) ◦ grammar approximation: CFGs
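
Quick check is usually described as a cheap filter: before running full unification, the sorts found at a few selected feature paths of the two structures are compared, and the pair is rejected at once if any of them clash. The sketch below illustrates that idea only. The dictionary encoding of feature structures, the toy hierarchy `PARENT` and the path list `QC_PATHS` are assumptions made for the example; a real system chooses its paths from failure statistics on the grammar.

```python
# Toy sort hierarchy (hypothetical): child -> parent, "top" is the most general sort.
PARENT = {"categ": "top", "noun": "categ", "verb": "categ", "det": "categ"}

def ancestors(sort):
    """Return the sort followed by all its supersorts, up to the root."""
    chain = [sort]
    while sort in PARENT:
        sort = PARENT[sort]
        chain.append(sort)
    return chain

def compatible(a, b):
    """In a tree-shaped hierarchy two sorts are compatible iff one subsumes the other."""
    return a in ancestors(b) or b in ancestors(a)

def sort_at(fs, path):
    """Sort found at a feature path, or "top" when the path is absent."""
    node = fs
    for feature in path:
        node = node.get("feats", {}).get(feature)
        if node is None:
            return "top"
    return node["sort"]

# Paths assumed, for illustration, to be frequent causes of unification failure.
QC_PATHS = [("CAT",), ("HEAD", "CAT")]

def quick_check_vector(fs):
    """Pre-computed once per feature structure."""
    return [sort_at(fs, path) for path in QC_PATHS]

def quick_check(vec_a, vec_b):
    """False means unification is certain to fail; True means it still has to be tried."""
    return all(compatible(x, y) for x, y in zip(vec_a, vec_b))

noun_sign = {"sort": "sign", "feats": {"CAT": {"sort": "noun"}}}
verb_sign = {"sort": "sign", "feats": {"CAT": {"sort": "verb"}}}
open_sign = {"sort": "sign", "feats": {}}

print(quick_check(quick_check_vector(noun_sign), quick_check_vector(verb_sign)))
print(quick_check(quick_check_vector(noun_sign), quick_check_vector(open_sign)))
```

The filter is one-sided: a failed check rules the pair out, while a passed check says nothing about whether the full unification succeeds.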

  10. 9. 2. Back-ground: PLAN 2.1 Unification-based grammars in the small 2.2 The Logics of feature structures 2.2.1 OSF notions 2.2.2 OSF- and OSF-theory unification 2.2.3 The osf unify function 2.2.4 The type-consistent OSF unifier 2.2.5 Feature Structure expansion 2.3 Compiled OSF-unification 2.4 Compiled OSF-theory unification 2.5 LIGHT: the language and the system 2.6 Two classes of feature paths in unification grammars: quick check (QC) paths, and generalised reduction (GR) paths

  11. 10. 2.1 Unification-based grammars in the small
     Two sample feature structures (OSF notation)
       vp
       [ ARGS < verb
                [ HEAD #1,
                  OBJECT #3:np,
                  SUBJECT #2:sign ],
                #3 >,
         HEAD #1,
         SUBJECT #2 ]

       satisfy_HPSG_principles
       [ CAT #1,
         SUBCAT #2,
         HEAD top
              [ CAT #1,
                SUBCAT #3|#2 ],
         COMP top
              [ CAT #3,
                SUBCAT nil ] ]

  12. 11. HPSG principles as feature constraints
     • head principle: satisfy_HPSG_principles [ head.cat = cat ]
     • saturation principle: satisfy_HPSG_principles [ comp.subcat = nil ]
     • subcategorization principle: satisfy_HPSG_principles [ head.subcat = comp.cat | subcat ]
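
The three principles above are path equations, and a coreference tag such as #1 stands for one shared node rather than two equal copies. A minimal sketch of that reading follows. It assumes a nested-dictionary encoding, reads `comp.cat | subcat` as a list cell whose FIRST is comp.cat and whose REST is subcat, and uses the helper names `node`, `follow`, `cons` and `satisfies_principles`, all of which are hypothetical.

```python
def node(sort, **feats):
    """A feature-structure node: a sort plus a feature -> node mapping."""
    return {"sort": sort, "feats": feats}

def follow(fs, path):
    """Follow a dotted feature path such as "HEAD.CAT"."""
    for feature in path.split("."):
        fs = fs["feats"][feature]
    return fs

def cons(first, rest):
    """List cell for the `first | rest` notation (assumed FIRST/REST features)."""
    return node("cons", FIRST=first, REST=rest)

# Shared nodes play the role of the tags #1, #2, #3.
tag1, tag2, tag3 = node("categ"), node("categ_list"), node("categ")

phrase = node(
    "satisfy_HPSG_principles",
    CAT=tag1,
    SUBCAT=tag2,
    HEAD=node("top", CAT=tag1, SUBCAT=cons(tag3, tag2)),
    COMP=node("top", CAT=tag3, SUBCAT=node("nil")),
)

def satisfies_principles(fs):
    # head principle: head.cat = cat (the very same node)
    head_ok = follow(fs, "HEAD.CAT") is follow(fs, "CAT")
    # saturation principle: comp.subcat = nil
    saturated = follow(fs, "COMP.SUBCAT")["sort"] == "nil"
    # subcategorization principle: head.subcat = comp.cat | subcat
    sub = follow(fs, "HEAD.SUBCAT")
    subcat_ok = (
        sub["sort"] == "cons"
        and follow(sub, "FIRST") is follow(fs, "COMP.CAT")
        and follow(sub, "REST") is follow(fs, "SUBCAT")
    )
    return head_ok and saturated and subcat_ok

print(satisfies_principles(phrase))
```

Identity (`is`) rather than equality is used on purpose: two distinct nodes that happen to carry the same sort do not satisfy a path equation.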

  13. 12. A sample sort hierarchy [figure]. Sorts shown: top, start, string, phrase_or_word, categ, diff_list, list, categ_list, cons, det, noun, adjective, verb, phrase, categ_cons, nil, word, satisfy_HPSG_principles, det_le, noun_le, adjective_le, verb_le, pnoun_le, lh_phrase, rh_phrase, iverb_le, tverb_le, cnoun_le; lexical sorts: is, nice, kisses, the, mary, girl, thinks, laughs, embarrasses, embarrassed, john, meets, pretty, met, kissed

  14. 13. An expanded feature structure...
       lh_phrase
       [ PHON list,
         CAT #1:categ,
         SUBCAT #2:categ_list,
         HEAD #4:phrase_or_word
              [ PHON list,
                CAT #1,
                SUBCAT #3|#2 ],
         COMP #5:phrase_or_word
              [ PHON list,
                CAT #3,
                SUBCAT nil ],
         ARGS <#4, #5> ]

     ... rewritten as a rule
       lh_phrase
       [ PHON list,
         CAT #1:categ,
         SUBCAT #2:categ_list,
         HEAD #4,
         COMP #5 ]
       <-
       #4:phrase_or_word
       [ PHON list,
         CAT #1,
         SUBCAT #3|#2 ],
       #5:phrase_or_word
       [ PHON list,
         CAT #3,
         SUBCAT nil ].
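
Both expansion and rule application rest on unifying feature structures whose nodes carry sorts. The sketch below is a generic destructive graph unifier with forwarding pointers, given only to make the operation concrete; it is not the compiled OSF unifier of the talk. The `Node` class, the toy hierarchy `PARENT` and the tree-only `glb` are assumptions, and a failed unification is not undone.

```python
# Toy sort hierarchy (hypothetical): child -> parent, "top" is the most general sort.
PARENT = {"categ": "top", "noun": "categ", "verb": "categ", "list": "top", "nil": "list"}

def ancestors(sort):
    chain = [sort]
    while sort in PARENT:
        sort = PARENT[sort]
        chain.append(sort)
    return chain

def glb(a, b):
    """Greatest lower bound in a tree-shaped hierarchy, or None if the sorts clash."""
    if a in ancestors(b):
        return b
    if b in ancestors(a):
        return a
    return None

class Node:
    def __init__(self, sort="top", **feats):
        self.sort = sort
        self.feats = dict(feats)
        self.forward = None  # set once this node has been merged into another

def deref(node):
    """Follow forwarding pointers to the representative node."""
    while node.forward is not None:
        node = node.forward
    return node

def unify(a, b):
    """Destructively unify two nodes; return False on a sort clash."""
    a, b = deref(a), deref(b)
    if a is b:
        return True
    sort = glb(a.sort, b.sort)
    if sort is None:
        return False
    a.sort = sort
    b.forward = a  # merge b into a before recursing, so shared nodes stay shared
    for feature, value in b.feats.items():
        if feature in a.feats:
            if not unify(a.feats[feature], value):
                return False
        else:
            a.feats[feature] = value
    return True

shared = Node("categ")                       # plays the role of a tag such as #1
x = Node(CAT=shared, HEAD=Node(CAT=shared))  # CAT and HEAD.CAT are the same node
y = Node(HEAD=Node(CAT=Node("noun")))

print(unify(x, y), deref(x.feats["CAT"]).sort)
print(unify(Node("noun"), Node("verb")))
```

Because CAT and HEAD.CAT share a node, refining HEAD.CAT to `noun` also refines CAT, which is the behaviour the coreference tags are meant to express.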

  15. 14. Tree representation of a feature structure [figure: the lh_phrase feature structure drawn as a tree, with edges labelled PHON, CAT, SUBCAT, HEAD, COMP, ARGS, FIRST and REST, and nodes labelled with the sorts diff_list, list, categ, phrase_or_word, nil and the tags #1 to #5]

  16. A simple typed-unification HPSG-like grammar
     types:
       start [ SUBCAT nil ]
       cons [ FIRST top, REST list ]
       diff_list [ FIRST_LIST list, REST_LIST list ]
       categ_cons [ FIRST categ, REST categ_list ]
       phrase_or_word [ PHON list, CAT categ, SUBCAT categ_list ]
       phrase [ HEAD #1:phrase_or_word, COMP #2:phrase_or_word, ARGS cons ]
       satisfy_HPSG_principles
         [ CAT #1,
           SUBCAT #2,
           HEAD top [ CAT #1, SUBCAT #3|#2 ],
           COMP top [ CAT #3, SUBCAT nil ] ]
       det_le [ CAT det, SUBCAT nil ]
       noun_le [ CAT noun ]
       pnoun_le [ SUBCAT nil ]
       cnoun_le [ SUBCAT <det> ]
       adjective_le [ CAT adjective, SUBCAT nil ]
       iverb_le [ CAT verb, SUBCAT <noun> ]
       tverb_le [ CAT verb, SUBCAT <noun, noun> ]
     program: // rules
       lh_phrase [ HEAD #1, COMP #2, ARGS <#1,#2> ]
       rh_phrase [ HEAD #1, COMP #2, ARGS <#2,#1> ]
     query: // lexical entries
       the [ PHON <"the"> ]
       girl [ PHON <"girl"> ]
       john [ PHON <"john"> ]
       mary [ PHON <"mary"> ]
       nice [ PHON <"nice"> ]
       embarrassed [ PHON <"embarrassed"> ]
       pretty [ PHON <"pretty"> ]
       met [ PHON <"met"> ]
       kissed [ PHON <"kissed"> ]
       is [ PHON <"is">, CAT verb, SUBCAT <adjective, noun> ]
       laughs [ PHON <"laughs"> ]
       kisses [ PHON <"kisses"> ]
       thinks [ PHON <"thinks">, CAT verb, SUBCAT <verb, noun> ]
       meets [ PHON <"meets"> ]
       embarrasses [ PHON <"embarrasses"> ]

  17. A simple typed-unification grammar
     sorts:
       sign:top. rule:sign. np:rule. vp:rule. s:rule.
       lex_entry:sign. det:lex_entry. noun:lex_entry. verb:lex_entry.
       the:det. a:det. cat:noun. mouse:noun. catches:verb.
     types:
       3sing [ NR sing, PERS third ]
     program: // rules
       np [ ARGS < det [ HEAD top [ TRANS #1 ] ],
                   noun [ HEAD #2:top [ TRANS #1 ], KEY-ARG + ] >,
            HEAD #2 ]
       vp [ ARGS < verb [ HEAD #1, OBJECT #3:np, SUBJECT #2:np, KEY-ARG + ],
                   #3 >,
            HEAD #1,
            SUBJECT #2 ]
       s [ ARGS < #2:np,
                  vp [ HEAD #1, SUBJECT #2, KEY-ARG + ] >,
           HEAD #1 ]
     query: // lexical entries
       the [ HEAD top [ TRANS top [ DETNESS + ] ], PHON < "the" > ]
       a [ HEAD top [ TRANS top [ DETNESS - ] ], PHON < "a" > ]
       cat [ HEAD top [ AGR 3sing, TRANS top [ PRED cat ] ], PHON < "cat" > ]
       mouse [ HEAD top [ AGR 3sing, TRANS top [ PRED mouse ] ], PHON < "mouse" > ]
       catches [ HEAD top [ AGR #2:3sing,
                            TENSE present,
                            TRANS top [ ARG1 #3, ARG2 #1, PRED catches ] ],
                 OBJECT sign [ HEAD top [ TRANS #1 ] ],
                 PHON < "catches" >,
                 SUBJECT sign [ HEAD top [ AGR #2, TRANS #3 ] ] ]
     The context-free backbone of the above grammar (∗ marks the key argument, KEY-ARG +)
       np → det ∗noun      det ⇒ the | a
       vp → ∗verb np       noun ⇒ cat | mouse
       s → np ∗vp          verb ⇒ catches
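
The `sorts:` declarations in this grammar have the form `child:parent.` and define a tree rooted at `top`. As a small illustration, the sketch below parses exactly those declarations and answers subsumption and greatest-lower-bound queries. The function names are hypothetical, and the `glb` shown only handles tree-shaped hierarchies such as this one; it is not the sort lattice encoding used by any of the systems mentioned in the talk.

```python
# Sort declarations copied from the grammar above ("child:parent.").
DECLS = (
    "sign:top. rule:sign. np:rule. vp:rule. s:rule. "
    "lex_entry:sign. det:lex_entry. noun:lex_entry. verb:lex_entry. "
    "the:det. a:det. cat:noun. mouse:noun. catches:verb."
)

def parse_sorts(text):
    """Turn "child:parent." declarations into a child -> parent dictionary."""
    parent = {}
    for decl in text.split("."):
        decl = decl.strip()
        if decl:
            child, sup = decl.split(":")
            parent[child.strip()] = sup.strip()
    return parent

def ancestors(sort, parent):
    """The sort itself followed by all its supersorts, up to the root."""
    chain = [sort]
    while sort in parent:
        sort = parent[sort]
        chain.append(sort)
    return chain

def subsumes(general, specific, parent):
    return general in ancestors(specific, parent)

def glb(a, b, parent):
    """Greatest lower bound in a tree: the more specific sort, or None on a clash."""
    if subsumes(a, b, parent):
        return b
    if subsumes(b, a, parent):
        return a
    return None

PARENT = parse_sorts(DECLS)
print(ancestors("mouse", PARENT))
print(glb("sign", "cat", PARENT), glb("np", "vp", PARENT))
```

With multiple inheritance, as in the larger hierarchy of the HPSG-like grammar, two incomparable sorts can still have a common subsort, so a tree walk like this one would no longer be enough.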

  18. 17. Parsing The cat catches a mouse [figure: parse chart over the input positions 0 the 1 cat 2 catches 3 a 4 mouse 5, with items numbered 0 to 12 and labels KC, DC and RC]

  19. 18. The final content of the chart when parsing The cat catches a mouse
     item   syn. rule / lex. categ.   start - end   env
     12     s → .np vp.               0 - 5         7
     11     s → np .vp.               2 - 5         6
     10     vp → .verb np.            2 - 5         5
     9      np → .det noun.           3 - 5         4
     8      np → det .noun.           4 - 5         3
     7      vp → .verb. np            2 - 3         2
     6      np → .det noun.           0 - 2         1
     5      np → det .noun.           1 - 2         0
     4      det ⇒ the                 0 - 1
     3      noun ⇒ cat                1 - 2
     2      verb ⇒ catches            2 - 3
     1      det ⇒ a                   3 - 4
     0      noun ⇒ mouse              4 - 5
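
The chart above holds 13 items: 5 lexical ones and 8 rule items, each rule item covering a stretch of its right-hand side that always contains the key argument. A bottom-up sketch over the context-free backbone alone reproduces that inventory. It is an illustration under assumptions, not the LIGHT parser: feature structures and environments are ignored, and the `Edge` tuple, the agenda strategy and the function names are hypothetical.

```python
from collections import namedtuple

# Context-free backbone; the integer is the position of the key argument.
RULES = [
    ("np", ("det", "noun"), 1),
    ("vp", ("verb", "np"), 0),
    ("s", ("np", "vp"), 1),
]
LEXICON = {"the": "det", "a": "det", "cat": "noun", "mouse": "noun", "catches": "verb"}

# rhs[lo:hi] is the part of the right-hand side already covered;
# start..end is the covered stretch of the input.
Edge = namedtuple("Edge", "lhs rhs lo hi start end")

def is_passive(edge):
    """An edge is passive (complete) when its whole right-hand side is covered."""
    return edge.lo == 0 and edge.hi == len(edge.rhs)

def extend(active, passive):
    """Grow an incomplete edge to the left or right with an adjacent complete edge."""
    out = []
    if is_passive(active) or not is_passive(passive):
        return out
    if (active.lo > 0 and active.rhs[active.lo - 1] == passive.lhs
            and passive.end == active.start):
        out.append(active._replace(lo=active.lo - 1, start=passive.start))
    if (active.hi < len(active.rhs) and active.rhs[active.hi] == passive.lhs
            and passive.start == active.end):
        out.append(active._replace(hi=active.hi + 1, end=passive.end))
    return out

def parse(words):
    chart = set()
    # Lexical edges have an empty right-hand side and are passive from the start.
    agenda = [Edge(LEXICON[w], (), 0, 0, i, i + 1) for i, w in enumerate(words)]
    while agenda:
        edge = agenda.pop()
        if edge in chart:
            continue
        chart.add(edge)
        new = []
        if is_passive(edge):
            # Start every rule whose key argument matches this complete edge.
            for lhs, rhs, key in RULES:
                if rhs[key] == edge.lhs:
                    new.append(Edge(lhs, rhs, key, key + 1, edge.start, edge.end))
            for other in chart:
                new.extend(extend(other, edge))
        else:
            for other in chart:
                new.extend(extend(edge, other))
        agenda.extend(new)
    return chart

chart = parse("the cat catches a mouse".split())
for e in sorted(chart, key=lambda e: (e.start, e.end, e.lhs, e.lo)):
    print(e.lhs, e.rhs, (e.lo, e.hi), f"{e.start}-{e.end}")
```

Starting each rule from its key argument is what keeps the chart small here: no np item is ever started from a determiner alone, so `det ⇒ the` and `det ⇒ a` only take part once a noun item is waiting to their right.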
