
  1. Parsing with Lexicalized TAG (1); Extracting and comparing LTAG (2)
     Presentation by Philip John Gorinski
     Seminar “Recent Advances in Parsing Technology”, Saarland University, Winter Term 2011/12
     (1) Yves Schabes, Aravind K. Joshi, 1990
     (2) Fei Xia, Chung-hye Han, Martha Palmer, and Aravind Joshi, 2001

  2. Overview
     ● Introduction
       ● Lexicalized TAG, Advantages of parsing with LTAG
     ● Parsing LTAGs
       ● bottom-up
       ● top-down
       ● bottom-up + dynamic top-down
     ● Extracting and Comparing LTAG
       ● Data
       ● Extraction
       ● Language comparison using LTAGs
     ● Conclusion

  4. Introduction: Lexicalized TAG
     ● like regular Tree Adjoining Grammar
       ● initial trees (α-trees) / auxiliary trees (β-trees)
       ● substitution (↓) / adjunction (*) of trees
     ● additional properties
       ● lexical “anchor” for each tree, i.e., all trees associated with the lexicon
       ● here also: separation of lexicon and tree families
     4 / 36 Parsing with lexicalized TAG

  5. Introduction: Lexicalized TAG
     ● Substitution:
       [figure: the initial tree NP(D the, N girl) is substituted at the NP1↓ node of the tree S(NP0(D the, N boy), VP(V saw, NP1↓)), yielding “the boy saw the girl”]

  6. Introduction: Lexicalized TAG
     ● Adjunction:
       [figure: the auxiliary tree N(A pretty, N*) is adjoined at the N node above “girl” in the tree for “the boy saw the girl”, yielding “the boy saw the pretty girl”]
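
The two operations on slides 5 and 6 can be illustrated with a small sketch. This is only a toy model under stated assumptions: the `Node` class and the helper names `substitute`, `adjoin`, `find_foot` and `leaves` are invented for this illustration and do not come from the slides or from any TAG toolkit.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None   # set on lexical leaves
    subst: bool = False          # substitution site (marked ↓ on the slides)
    foot: bool = False           # foot node of an auxiliary tree (marked *)


def leaves(node: Node) -> List[str]:
    """Return the words on the frontier, left to right."""
    if node.word is not None:
        return [node.word]
    return [w for child in node.children for w in leaves(child)]


def substitute(tree: Node, initial: Node) -> bool:
    """Plug `initial` into the first open substitution site with a matching label."""
    for i, child in enumerate(tree.children):
        if child.subst and child.label == initial.label:
            tree.children[i] = initial
            return True
        if substitute(child, initial):
            return True
    return False


def find_foot(node: Node) -> Optional[Node]:
    """Locate the foot node of an auxiliary tree, if there is one."""
    if node.foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def adjoin(target: Node, aux: Node) -> None:
    """Adjoin `aux` at `target`: the material below `target` moves under the foot."""
    foot = find_foot(aux)
    if foot is None or aux.label != target.label or foot.label != target.label:
        raise ValueError("auxiliary tree does not fit the target node")
    foot.children, foot.word, foot.foot = target.children, target.word, False
    target.children, target.word = aux.children, None


# Elementary trees for the slide example (tree shapes are simplified).
saw = Node("S", [Node("NP", subst=True),
                 Node("VP", [Node("V", word="saw"), Node("NP", subst=True)])])
girl_n = Node("N", word="girl")
boy = Node("NP", [Node("D", word="the"), Node("N", word="boy")])
girl = Node("NP", [Node("D", word="the"), girl_n])
pretty = Node("N", [Node("A", word="pretty"), Node("N", foot=True)])

substitute(saw, boy)    # fills the subject site
substitute(saw, girl)   # fills the object site
after_substitution = " ".join(leaves(saw))
adjoin(girl_n, pretty)  # inserts "pretty" above "girl"
after_adjunction = " ".join(leaves(saw))
```

Both operations mutate the trees in place, which keeps the sketch short; a real parser would more likely record derivations instead of rewriting trees.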

  7. Introduction: Lexicalized TAG
     ● Tree families
       ● essentially LTAG trees, but abstracted anchor
       ● e.g., family of verbs taking one object (np0Vnp1)
       [figure: trees of the family, among them the declarative tree S(NP0↓, VP(V◊, NP1↓)) and a wh-tree with a fronted NPi↓ (+wh) and a trace εi]
     ● Lexicon: associates verbs with tree families

  8. Introduction: Advantages
     ● TAG provides extended domain of locality
       ● capture non-local features in a localized fashion
       ● 'production-like'
       ● LTAG preserves this feature
     ● LTAG provides linking to lexical information
       ● very useful for actual parsing
       ● limited search space, prevention of recursion [...]

  10. Parsing LTAGs
      ● General two-step strategy for lexicalized grammars
        1. select elementary structures for lexical input items
        2. parse the sentence with respect to the resulting set of structures
      ● first step 'filters' the grammar
        ● may drastically reduce search space
          ➔ LTAGs are finitely ambiguous!
        ● may guide top-down parser by using bottom-up information, e.g., item's position in input string
      ● second step suitable for any parsing algorithm
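
The first step of this strategy, selecting elementary trees through their lexical anchors, might look roughly like the sketch below. The lexicon contents, the tree names and the function `select_trees` are hypothetical and serve only to show the filtering idea; the second step (the actual parser) is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: word -> names of the elementary trees it anchors.
TOY_LEXICON: Dict[str, List[str]] = {
    "the": ["alpha_D"],
    "boy": ["alpha_NP"],
    "girl": ["alpha_NP"],
    "saw": ["alpha_np0Vnp1", "alpha_NP"],  # verb reading and noun reading
    "pretty": ["beta_A"],
}


def select_trees(words: List[str],
                 lexicon: Dict[str, List[str]]) -> List[Tuple[str, str, int]]:
    """Step 1: keep only trees anchored by a word of the input.

    Each selected tree is paired with its anchor word and the anchor's
    1-based position, so that a later parsing step can use the position.
    """
    selected = []
    for position, word in enumerate(words, start=1):
        for tree_name in lexicon.get(word, []):
            selected.append((tree_name, word, position))
    return selected


sentence = "the boy saw the girl".split()
selected = select_trees(sentence, TOY_LEXICON)
```

Trees whose anchor does not occur in the sentence (here `beta_A` for "pretty") never reach the parser, which is the sense in which the first step filters the grammar.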

  11. Parsing LTAGs: bottom-up
      ● CKY-type parser for TAG (Vijay-Shanker and Joshi, 1985)
        ● data driven
        ● bottom-up information of first stage has no effect on algorithm itself
        ● grammar filtering reduces number of nodes in the recognition matrix

  12. Parsing LTAGs: top-down
      ● like pushdown automata for CFG parsing (Lang, 1990)
      ● indices for subtrees spanning the input
        ● CFG: 2 indices; (L)TAG: 4 indices for positions left/right of anchor in auxiliary trees
      [figure: auxiliary tree with root X and foot node X*, with the four indices i, j, k, l marked along the input]

  13. Parsing LTAGs: top-down
      ● problem for top-down: left-recursion
        ● A → A B
        ● infinite search space
        ● quite frequent phenomenon in TAG
      ● solved by grammar filtering for LTAG
        ● parser considers only elementary trees selected by first stage
        ● can be distinguished by typology and position in input string
          ➔ each tree only used once
        ● finite search space even for top-down parser!

  14. Parsing: bottom-up + dynamic top-down
      ● Earley-type TAG parser (Schabes and Joshi, 1988)
        ● scan / predict / complete
        ● use bottom-up prediction to guide top-down parsing
      ● straightforward parsing for LTAGs
        ● lexicalization simplifies certain steps of the algorithm

  15. Parsing: bottom-up + dynamic top-down
      1. first pass selects subset of grammar
         ➔ limits search space
      2. each tree is anchored
         ➔ same state set cannot predict that a tree can be substituted and be completed
         ➔ same state set cannot predict an auxiliary tree for left adjunction and right completion
      3. information of anchor position can be used to filter top-down prediction / completions for adjunction and substitution

  16. Parsing: bottom-up + dynamic top-down
      the₁ men₂ who₃ hate₄ women₅ that₆ smoke₇ cigarettes₈ are₉ intolerant₁₀
      ● with normal TAG, “men” could be predicted for substitution in “hate/smoke” structure
        ● would lead to backtracking in later analysis
      ● lexicalization prevents prediction!
        ● anchor position does not match the string
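
The anchor-position filter in this example can be approximated as follows. This is a deliberately simplified condition, not the state-set machinery of the Earley-type parser: it assumes that a tree predicted for substitution after `scanned` words have been read must have its anchor in the unread part of the input. The names `NP_TREES` and `predictable` are made up for the illustration.

```python
from typing import List, Tuple

WORDS = "the men who hate women that smoke cigarettes are intolerant".split()

# (anchor word, 1-based anchor position) of the NP trees picked by the first pass.
NP_TREES: List[Tuple[str, int]] = [
    (word, pos)
    for pos, word in enumerate(WORDS, start=1)
    if word in {"men", "women", "cigarettes"}
]


def predictable(trees: List[Tuple[str, int]], scanned: int) -> List[Tuple[str, int]]:
    """Keep only trees whose anchor lies after the words scanned so far (assumed rule)."""
    return [(word, pos) for word, pos in trees if pos > scanned]


# Candidates for the object of "hate" (4 words scanned) and of "smoke" (7 words scanned).
after_hate = predictable(NP_TREES, scanned=4)
after_smoke = predictable(NP_TREES, scanned=7)
```

Under this assumption the tree anchored by “men” (position 2) is never proposed as an object of “hate” or “smoke”, so the backtracking mentioned on the slide does not arise.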

  18. Motivation
      ● Automatic extraction of grammars has motivations in both theoretical linguistics and NLP engineering
      ● Theoretical motivation
        ● quantitative testing of Universal Grammar
        ● explore similarities and differences of languages
      ● Engineering motivation
        ● links between structures of different grammars
        ● valuable for parsing, lexicon development, machine translation ...
      18 / 36 Extracting and comparing LTAG

  19. Data
      ● 3 languages for comparison
        ● English, Chinese, Korean
        ● Germanic, Sino-Tibetan, Altaic
      ● Different word order
        ● SVO (En, Ch) vs. SOV (Ko)
        ● permutable argument NPs (Ko)
      ● Subject/Object deletion
        ● freely (Ch, Ko) vs. none (En)
      ● Inflectional morphology
        ● rich (Ko) vs. little (En) vs. none (Ch)

  20. Data
      ● English Penn Treebank II (Marcus et al., 1993)
        ● 1,174K words, ~23.85 words/sentence, 94 tags
      ● Chinese Penn Treebank (Xia et al., 2000)
        ● 100K words, ~23.81 words/sentence, 92 tags
      ● Korean Penn Treebank (Han et al., 2001)
        ● 54K words, ~10.71 words/sentence, 61 tags
      ● All provide phrase structure annotation
      ● Use similar annotation scheme

  21. Data
      ● Example of English Penn Treebank sentence [figure]

  23. Extraction
      ● Tool: LexTract
        ● recognizes 3 types of initial/auxiliary LTAG trees
          ● Spine: predicate-argument relations
          ● Mod: modification rules
          ● Conj: coordination relations
        ● each extracted tree should fall into exactly one category

  24. Extraction
      ● Spine-trees
        ● X^0: anchor, head of X^m
        ● tree is formed by
          ● a spine X^m → X^(m-1) → ... → X^0
          ● the arguments of X^0

  25. Extraction
      ● Mod-trees
        ● W^q: root with two children
        ● W^q*: adjunction node with same label as W^q
        ● X^m: modifier of W^q*, spine-tree with

  26. Extraction
      ● Conj-trees
        ● root with 3 children
          ● Conjunct: adjunction node X^m*
          ● Conjunction
          ● Conjunct: spine tree X^m → ... → X^0

  27. Extraction
      “(at) underwriters still draft policies using fountain pens and blotting paper”
      [figure: the spine-trees, mod-trees and conj-tree extracted from this sentence]

  28. Extraction: Results
                 template types   etree types   word types   context-free rules
      English         6,926         131,397       49,206          1,524
      Chinese         1,140          21,125       10,772            515
      Korean            632          13,941       10,035            152
      ● Templates: etrees with lexical items removed
      ● CFG extracted by reading rules off the templates
      ● small subsets of frequent templates cover majority of tokens
        ● English: Top 100 (500, 1000, 1500) = 87.1% (96.6%, 98.4%, 99.0%)
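
The coverage figures in the last bullet are cumulative shares of tokens accounted for by the k most frequent templates. A minimal sketch of that computation is given below; the counts are invented toy values, not the treebank numbers, and `top_k_coverage` is a hypothetical helper name.

```python
from collections import Counter


def top_k_coverage(template_counts: Counter, k: int) -> float:
    """Share of all tokens covered by the k most frequent templates.

    Assumes `template_counts` is non-empty.
    """
    total = sum(template_counts.values())
    covered = sum(count for _, count in template_counts.most_common(k))
    return covered / total


# Toy distribution: a few templates dominate, as the slide reports for English.
toy_counts = Counter({"t1": 60, "t2": 25, "t3": 10, "t4": 4, "t5": 1})
coverage_top1 = top_k_coverage(toy_counts, 1)
coverage_top2 = top_k_coverage(toy_counts, 2)
coverage_all = top_k_coverage(toy_counts, 5)
```

With real data, `template_counts` would hold one entry per extracted template and its token frequency in the treebank.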

  30. Language Comparison
      ● Make LTAGs comparable
        ● create new shared tagset
        ● merge original tags into new tags
        ● replace original treebank tags
        ● re-run LexTract
      ● Compare LTAGs for English, Chinese, Korean
        ● templates
        ● context-free rules
        ● sub-templates
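
The tag-merging step can be pictured as a relabelling pass over the treebank trees before extraction is re-run. The sketch below assumes trees are nested tuples of the form `(label, child, ...)` with words as plain strings. The source tags are ordinary Penn Treebank part-of-speech tags, but the target tags in `SHARED_TAGS` are hypothetical, since the slides do not list the actual shared tagset.

```python
# Hypothetical mapping from original treebank tags to shared tags.
SHARED_TAGS = {"NN": "N", "NNS": "N", "NNP": "N", "VBD": "V", "VBZ": "V", "DT": "D"}


def relabel(tree, mapping):
    """Return a copy of `tree` with every node label replaced via `mapping`.

    Labels missing from the mapping are kept unchanged; words (strings) are untouched.
    """
    if isinstance(tree, str):
        return tree
    label, *children = tree
    return (mapping.get(label, label), *[relabel(child, mapping) for child in children])


tree = ("S",
        ("NP", ("DT", "the"), ("NN", "boy")),
        ("VP", ("VBD", "saw"), ("NP", ("DT", "the"), ("NN", "girl"))))
merged = relabel(tree, SHARED_TAGS)
```

After such a pass, templates extracted from different treebanks share one label inventory and can be compared directly.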

  31. Language Comparison
      ● new tagsets reduce templates by ~50%
      ● few shared, high-frequency templates account for large portion of observed data across languages
