Parsing with Lexicalized TAG (1)
Extracting and comparing LTAG (2)
Presentation by Philip John Gorinski
Seminar “Recent Advances in Parsing Technology”, Saarland University, Winter Term 2011/12
(1) Yves Schabes, Aravind K. Joshi, 1990
(2) Fei Xia, Chung-hye Han, Martha Palmer, and Aravind Joshi, 2001

Overview
● Introduction
  ● Lexicalized TAG, Advantages of parsing with LTAG
● Parsing LTAGs
  ● bottom-up
  ● top-down
  ● bottom-up + dynamic top-down
● Extracting and Comparing LTAG
  ● Data
  ● Extraction
  ● Language comparison using LTAGs
● Conclusion

Introduction: Lexicalized TAG
● like regular Tree Adjoining Grammar
  ● initial trees (α-trees) / auxiliary trees (β-trees)
  ● substitution (↓) / adjunction (*) of trees
● additional properties
  ● lexical “anchor” for each tree, i.e., all trees are associated with the lexicon
  ● here also: separation of lexicon and tree families

Introduction: Lexicalized TAG
Substitution:
[Figure: the NP tree for “the girl” (D N) is substituted at the NP1↓ node of the S tree for “the boy saw”, yielding “the boy saw the girl”.]

Introduction: Lexicalized TAG
Adjunction:
[Figure: the auxiliary tree for “pretty” (N dominating A and the foot node N*) is adjoined at the N node above “girl” in “the boy saw the girl”, yielding “the boy saw the pretty girl”.]
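The two operations on the slides above can be illustrated with a small sketch. It is not taken from the presentation: the `Node` class and the `substitute` / `adjoin` helpers are hypothetical names, and the trees are a hand-built approximation of the “the boy saw the (pretty) girl” example.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    subst: bool = False  # substitution node (marked ↓ on the slides)
    foot: bool = False   # foot node of an auxiliary tree (marked *)

    def leaves(self):
        """Return the frontier of the tree, left to right."""
        if not self.children:
            return [self.label]
        return [leaf for child in self.children for leaf in child.leaves()]


def substitute(tree, initial):
    """Replace the first matching substitution node below the root with `initial`."""
    for i, child in enumerate(tree.children):
        if child.subst and child.label == initial.label:
            tree.children[i] = deepcopy(initial)
            return True
        if substitute(child, initial):
            return True
    return False


def _replace_foot(aux, subtree):
    """Hang `subtree` where the foot node of the auxiliary tree `aux` was."""
    for i, child in enumerate(aux.children):
        if child.foot:
            aux.children[i] = subtree
            return True
        if _replace_foot(child, subtree):
            return True
    return False


def adjoin(tree, aux, target_label):
    """Adjoin `aux` at the first internal node labelled `target_label` below the root."""
    for i, child in enumerate(tree.children):
        if child.label == target_label and child.children and not child.subst:
            new_subtree = deepcopy(aux)
            _replace_foot(new_subtree, child)
            tree.children[i] = new_subtree
            return True
        if adjoin(child, aux, target_label):
            return True
    return False


# Hand-built toy trees (assumed shapes, following the slide example).
saw = Node("S", [
    Node("NP", [Node("D", [Node("the")]), Node("N", [Node("boy")])]),
    Node("VP", [Node("V", [Node("saw")]), Node("NP", subst=True)]),
])
girl = Node("NP", [Node("D", [Node("the")]), Node("N", [Node("girl")])])
pretty = Node("N", [Node("A", [Node("pretty")]), Node("N", foot=True)])

substitute(saw, girl)                # fills the NP↓ slot under VP
adjoin(saw.children[1], pretty, "N")  # adjoins inside the VP, i.e. at "girl"
sentence = " ".join(saw.leaves())
```

In this sketch `adjoin` is called on the VP subtree so that the first N node found is the one above “girl” rather than the one above “boy”; a real parser would identify the adjunction site by its address in the tree.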

Introduction: Lexicalized TAG
● Tree families
  ● essentially LTAG trees, but with an abstracted anchor
  ● e.g., family of verbs taking one object (np0Vnp1)
[Figure: two trees of the family: the declarative tree S(NP0↓, VP(V◊, NP1↓)) and a wh-variant with a fronted NPi↓ (+wh) and a trace εi.]
● Lexicon: associates verbs with tree families

Introduction: Advantages
● TAG provides extended domain of locality
  ● capture non-local features in a localized fashion
  ● 'production-like'
● LTAG preserves this feature
● LTAG provides linking to lexical information
  ● very useful for actual parsing
  ● limited search space, prevention of recursion [...]

Parsing LTAGs
● General two-step strategy for lexicalized grammars
  1. select elementary structures for lexical input items
  2. parse the sentence with respect to the resulting set of structures
● first step 'filters' the grammar
  ● may drastically reduce search space
  ➔ LTAGs are finitely ambiguous!
  ● may guide top-down parser by using bottom-up information, e.g., item's position in input string
● second step suitable for any parsing algorithm
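The first step of this strategy can be sketched as a lookup that keeps only the trees anchored by words of the input and records each anchor's position. The toy `LEXICON`, the bracketed template strings, and the names `SelectedTree` and `select_trees` are assumptions made for illustration, not the data structures of any actual parser.

```python
from typing import NamedTuple


class SelectedTree(NamedTuple):
    tree: str      # bracketed tree with the anchor filled in
    anchor: str    # the lexical item anchoring the tree
    position: int  # 1-based position of the anchor in the input


# Toy lexicalized grammar: each word lists the tree shapes it can anchor.
# "◊" marks the anchor slot, "↓" a substitution node, "*" a foot node.
LEXICON = {
    "the": ["(D ◊)"],
    "boy": ["(NP D↓ (N ◊))"],
    "girl": ["(NP D↓ (N ◊))"],
    "pretty": ["(N (A ◊) N*)"],
    "saw": ["(S NP0↓ (VP (V ◊) NP1↓))"],
    "slept": ["(S NP0↓ (VP (V ◊)))"],
}


def select_trees(sentence):
    """Step 1: select the elementary trees anchored by the input words."""
    tokens = sentence.split()
    return [
        SelectedTree(shape.replace("◊", word), word, position)
        for position, word in enumerate(tokens, start=1)
        for shape in LEXICON.get(word, [])
    ]


selected = select_trees("the boy saw the pretty girl")
```

The tree for “slept” never reaches the second step because its anchor does not occur in the input, and the two occurrences of “the” yield two distinct entries that differ only in their position.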

Parsing LTAGs: bottom-up
● CKY-type parser for TAG (Vijay-Shanker and Joshi, 1985)
  ● data driven
  ● bottom-up information of first stage has no effect on algorithm itself
  ● grammar filtering reduces number of nodes in the recognition matrix

Parsing LTAGs: top-down
● like pushdown automata for CFG parsing (Lang, 1990)
● indices for subtrees spanning the input
  ● CFG: 2 indices; (L)TAG: 4 indices for positions left/right of anchor in auxiliary trees
[Figure: auxiliary tree with root X and foot node X*, annotated with the four input positions i, j, k, l.]

Parsing LTAGs: top-down
● problem for top-down: left-recursion
  ● A → A B
  ● infinite search space
  ● quite frequent phenomenon in TAG
● solved by grammar filtering for LTAG
  ● parser considers only elementary trees selected by first stage
  ● can be distinguished by typology and position in input string
  ➔ each tree only used once
  ● finite search space even for top-down parser!

Parsing: bottom-up + dynamic top-down
● Earley-type TAG parser (Schabes and Joshi, 1988)
  ● scan / predict / complete
  ● use bottom-up prediction to guide top-down parsing
● straightforward parsing for LTAGs
  ● lexicalization simplifies certain steps of the algorithm

Parsing: bottom-up + dynamic top-down
1. first pass selects subset of grammar
  ➔ limits search space
2. each tree is anchored
  ➔ the same state set cannot both predict a tree for substitution and complete it
  ➔ the same state set cannot both predict an auxiliary tree for left adjunction and complete it on the right
3. information about the anchor position can be used to filter top-down predictions / completions for adjunction and substitution

Parsing: bottom-up + dynamic top-down
the₁ men₂ who₃ hate₄ women₅ that₆ smoke₇ cigarettes₈ are₉ intolerant₁₀
● with normal TAG, “men” could be predicted for substitution in “hate/smoke” structure
  ● would lead to backtracking in later analysis
● lexicalization prevents prediction!
  ● anchor position does not match the string
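The position check behind this example can be sketched as follows. This is a deliberately simplified necessary condition, not the actual Earley-type algorithm: it only assumes that a tree predicted after `scanned` input tokens must have its anchor at a later position. The noun list and the function name `np_candidates` are hypothetical.

```python
TOKENS = "the men who hate women that smoke cigarettes are intolerant".split()
NOUNS = {"men", "women", "cigarettes"}  # toy category assignment (assumption)


def np_candidates(tokens, scanned, use_anchor_filter):
    """Return (word, position) pairs whose NP tree may be predicted after `scanned` tokens."""
    result = []
    for position, word in enumerate(tokens, start=1):
        if word not in NOUNS:
            continue
        # A tree predicted now covers input to the right of `scanned`,
        # so its anchor must lie at a later position.
        if use_anchor_filter and position <= scanned:
            continue
        result.append((word, position))
    return result


# Object slot of "hate": four tokens ("the men who hate") have been scanned.
unfiltered = np_candidates(TOKENS, scanned=4, use_anchor_filter=False)
filtered = np_candidates(TOKENS, scanned=4, use_anchor_filter=True)
```

Without the filter, “men” (position 2) is still a candidate for the object of “hate”; with it, only nouns anchored to the right of position 4 remain.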

Motivation
● Automatic extraction of grammars has motivations in both theoretical linguistics and NLP engineering
● Theoretical motivation
  ● quantitative testing of Universal Grammar
  ● explore similarities and differences of languages
● Engineering motivation
  ● links between structures of different grammars
  ● valuable for parsing, lexicon development, machine translation ...

Data
● 3 languages for comparison
  ● English, Chinese, Korean
  ● Germanic, Sino-Tibetan, Altaic
● Different word order
  ● SVO (En, Ch) vs. SOV (Ko)
  ● permutable argument NPs (Ko)
● Subject/Object deletion
  ● freely (Ch, Ko) vs. none (En)
● Inflectional morphology
  ● rich (Ko) vs. little (En) vs. none (Ch)

Data
● English Penn Treebank II (Marcus et al., 1993)
  ● 1,174K words, ~23.85 words/sentence, 94 tags
● Chinese Penn Treebank (Xia et al., 2000)
  ● 100K words, ~23.81 words/sentence, 92 tags
● Korean Penn Treebank (Han et al., 2001)
  ● 54K words, ~10.71 words/sentence, 61 tags
● All provide phrase structure annotation
● All use a similar annotation scheme

Data
● Example of English Penn Treebank sentence

Extraction
● Tool: LexTract
  ● recognizes 3 types of initial/auxiliary LTAG trees
    ● Spine: predicate-argument relations
    ● Mod: modification rules
    ● Conj: coordination relations
  ● each extracted tree should fall into exactly one category

Extraction
● Spine-trees
  ● X^0: anchor, head of X^m
  ● tree is formed by
    ● a spine X^m → X^(m-1) → ... → X^0
    ● the arguments of X^0
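A minimal sketch of how a spine and its arguments could be read off a phrase-structure tree. It assumes that the head child of every node is already known (given here by hand as an index) and that every non-head sibling along the spine is an argument. How LexTract identifies heads and separates arguments from modifiers is not shown on the slides, so this is an illustration only.

```python
# A node is (label, children, head_index); words are leaves with no children.
NP_SUBJ = ("NP", [("underwriters", [], None)], 0)
NP_OBJ = ("NP", [("policies", [], None)], 0)
VERB = ("VBP", [("draft", [], None)], 0)
TREE = ("S", [NP_SUBJ, ("VP", [VERB, NP_OBJ], 0)], 1)


def spine_and_arguments(node):
    """Follow head children from the root down to the anchor word.

    Returns the labels on the spine (ending with the anchor word itself)
    and the labels of all non-head siblings met on the way down.
    """
    spine, arguments = [], []
    while True:
        label, children, head_index = node
        spine.append(label)
        if not children:
            return spine, arguments
        arguments += [
            child[0] for i, child in enumerate(children) if i != head_index
        ]
        node = children[head_index]


spine, arguments = spine_and_arguments(TREE)
```

For this toy tree the spine runs S → VP → VBP → “draft”, and the two NPs are collected as arguments of the verb.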

Extraction
● Mod-trees
  ● W^q: root with two children
  ● W^q*: adjunction node with same label as W^q
  ● X^m: modifier of W^q*, spine-tree with

Extraction
● Conj-trees
  ● root with 3 children
    ● Conjunct: adjunction node X^m*
    ● Conjunction
    ● Conjunct: spine tree X^m → ... → X^0

Extraction
“(at) underwriters still draft policies using fountain pens and blotting paper”
[Figure: the spine-trees, mod-trees, and conj-tree extracted from this sentence.]

Extraction: Results

Language   template types   etree types   word types   context-free rules
English    6,926            131,397       49,206       1,524
Chinese    1,140            21,125        10,772       515
Korean     632              13,941        10,035       152

● Templates: etrees with lexical items removed
● CFG extracted by reading rules off the templates
● small subsets of frequent templates cover the majority of tokens
  ● English: the top 100 (500, 1000, 1500) templates cover 87.1% (96.6%, 98.4%, 99.0%)
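The relation between etrees, templates, and context-free rules can be sketched as below. The tuple encoding of trees and the helper names `transitive`, `to_template`, and `cfg_rules` are assumptions for illustration and are not LexTract's actual representation.

```python
# A node is (label, children); "↓" marks substitution nodes, "*" foot nodes.
def transitive(verb):
    """Toy etree for a transitive verb (assumed shape)."""
    return ("S", (("NP↓", ()), ("VP", (("V", ((verb, ()),)), ("NP↓", ())))))


def to_template(node):
    """Remove the lexical item, leaving an anchor slot marked with ◊."""
    label, children = node
    dominates_word = (
        len(children) == 1
        and not children[0][1]
        and children[0][0][-1] not in "↓*"
    )
    if dominates_word:
        return (label + "◊", ())
    return (label, tuple(to_template(child) for child in children))


def cfg_rules(node):
    """Read context-free rules (parent, children) off a template."""
    label, children = node
    if not children:
        return set()

    def strip(symbol):
        return symbol.rstrip("↓◊*")

    rules = {(strip(label), tuple(strip(child[0]) for child in children))}
    for child in children:
        rules |= cfg_rules(child)
    return rules


etrees = [transitive("draft"), transitive("saw")]
templates = {to_template(tree) for tree in etrees}
rules = cfg_rules(next(iter(templates)))
```

Two etree types collapse into a single template type here, which is why the table shows far fewer templates than etrees, and fewer context-free rules still.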

Language Comparison
● Make LTAGs comparable
  ● create new shared tagset
  ● merge original tags into new tags
  ● replace original treebank tags
  ● re-run LexTract
● Compare LTAGs for English, Chinese, Korean
  ● templates
  ● context-free rules
  ● sub-templates
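The tag-merging step can be illustrated with a small sketch. The mapping in `TAG_MAP` is hypothetical (the slides do not list the shared tagset), and the tuple encoding of templates is an assumption.

```python
# Hypothetical merge of treebank-specific tags into shared tags.
TAG_MAP = {"VBD": "V", "VBP": "V", "NN": "N", "NNS": "N"}


def retag(node):
    """Replace every label that has an entry in TAG_MAP; keep all others."""
    label, children = node
    return (TAG_MAP.get(label, label), tuple(retag(child) for child in children))


def transitive_template(verb_tag):
    """Toy template for a transitive verb carrying the given tag."""
    return ("S", (("NP", ()), ("VP", ((verb_tag, ()), ("NP", ())))))


before = {transitive_template("VBD"), transitive_template("VBP")}
after = {retag(template) for template in before}
```

Templates that differ only in fine-grained tags become identical after retagging, so the number of template types shrinks.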

Language Comparison
● new tagsets reduce templates by ~50%
● a few shared, high-frequency templates account for a large portion of the observed data across languages
