Grammar formalisms
Tree Adjoining Grammar: Formal Properties, Parsing

Part I: Formal Properties of TAG

Laura Kallmeyer, Timm Lichte, Wolfgang Maier
Universität Tübingen
16.05.2007 and 21.05.2007

Outline: Formal Properties of TAG

1. Some sample languages
2. Pumping Lemma for TAL
3. Closure Properties

Some sample languages (1)

Central question: What does the class of Tree Adjoining Languages (TAL) look like?

Some languages that are in TAL \ CFL:
- The copy language $\{ww \mid w \in \{a,b\}^*\}$
- The counting languages for 3 and 4: $\{a_1^k a_2^k a_3^k \mid k \ge 0\}$ and $\{a_1^k a_2^k a_3^k a_4^k \mid k \ge 0\}$
Some sample languages (2)

Some languages that are not in TAL:
- The double copy language $\{www \mid w \in \{a,b\}^*\}$. In general, any copy language with more than one copy following the first $w$ is not in TAL.
- The counting languages for $n > 4$: $\{a_1^k a_2^k \dots a_n^k \mid k \ge 0\}$
- Languages of exponential growth: $\{a^{2^k} \mid k \ge 0\}$

Some sample languages (3)

⇒ TAG extends CFG, but only in a very limited way.

In order to situate a class of languages with respect to other classes, one needs to know something about the properties of this class. Particularly useful:
- Pumping lemmas
- Closure properties

Pumping Lemma for TAL (1)

For CFL, the following pumping lemma holds:

Let $L$ be a context-free language. Then there is a constant $c$ such that for all $w \in L$ with $|w| \ge c$: $w = x v_1 y v_2 z$ with
- $|v_1 v_2| \ge 1$,
- $|v_1 y v_2| \le c$, and
- for all $i \ge 0$: $x v_1^i y v_2^i z \in L$.

Pumping Lemma for TAL (2)

The reason why this holds is the following:
- In the context-free derivation tree, from a certain tree height on, there is always a path with two occurrences of the same nonterminal.
- Then the part between the two occurrences can be iterated. This means that the strings to the left and to the right of this part are pumped.

How about TAL? The TAG derivation trees are context-free. Therefore, the same iteration is possible here.
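To make the CFL pumping lemma above concrete, here is a minimal sketch for the language $\{a^n b^n \mid n \ge 1\}$. The function names and the particular decomposition of `aaabbb` are illustrative assumptions, not part of the slides; other decompositions work equally well.

```python
def pump_cfl(x, v1, y, v2, z, i):
    """Return x v1^i y v2^i z, the i-fold pumped word of the CFL pumping lemma."""
    return x + v1 * i + y + v2 * i + z


def in_anbn(word):
    """Membership test for {a^n b^n | n >= 1}."""
    n = len(word) // 2
    return n >= 1 and word == "a" * n + "b" * n


# One possible decomposition of w = aaabbb as x v1 y v2 z (an assumption).
x, v1, y, v2, z = "a", "a", "ab", "b", "b"

# i = 1 gives back w itself; every other i stays inside the language.
pumped = [pump_cfl(x, v1, y, v2, z, i) for i in range(4)]
print(pumped)
print(all(in_anbn(p) for p in pumped))
```

The two pumped substrings `v1` and `v2` sit on either side of `y`, which mirrors the iterated part of the derivation tree between the two occurrences of the same nonterminal.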
Pumping Lemma for TAL (3)

Iteration in TAG derivation trees:

[Figure: iteration of an auxiliary tree β in a TAG derivation tree]

Pumping Lemma for TAL (4)

Looking at what this means for the strings, one can show the following:

Pumping Lemma for TAL: If $L$ is a TAL, then there is a constant $c$ such that if $w \in L$ and $|w| \ge c$, then there are $x, y, z, v_1, v_2, w_1, w_2, w_3, w_4 \in T^*$ such that
- $|v_1 v_2 w_1 w_2 w_3 w_4| \le c$,
- $|w_1 w_2 w_3 w_4| \ge 1$,
- $w = x v_1 y v_2 z$, and
- $x w_1^n v_1 w_2^n y w_3^n v_2 w_4^n z \in L$ for all $n \ge 0$.

Vijay-Shanker (1987) even claims that a stronger version of this lemma holds, but one step in his proof is not clear. Therefore we use this weak form.

Pumping Lemma for TAL (5)

As a corollary of the pumping lemma, one obtains that TAL are of constant growth (the word length grows in a linear way):

A language $L$ has the constant growth property iff there is a constant $c_0 > 0$ and a finite set of constants $C \subset \mathbb{N} \setminus \{0\}$ such that for all $w \in L$ with $|w| > c_0$, there is a $w' \in L$ with $|w| = |w'| + c$ for some $c \in C$.

Pumping Lemma for TAL (6)

Pumping lemmas can be used to show that certain languages are not in a certain class.

Example: To show: $L = \{a^n b^m a^n b^m a^n b^m \mid n, m \ge 0\}$ is not a TAL.

Assume that $L$ is a TAL and therefore satisfies the pumping lemma with a constant $c$. Consider the word $w = a^{c+1} b^{c+1} a^{c+1} b^{c+1} a^{c+1} b^{c+1}$.

None of the $w_i$, $1 \le i \le 4$, from the pumping lemma can contain both $a$'s and $b$'s. Furthermore, at least three of them must contain the same letter and be inserted into the three different blocks $a^{c+1}$ or into the three different blocks $b^{c+1}$, respectively. Contradiction, since then either $|v_1| \ge c + 1$ or $|v_2| \ge c + 1$.
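As an illustration of the pumping lemma for TAL stated above, the following sketch pumps a word of the counting language for 4, written here with the letters a, b, c, d in place of $a_1, \dots, a_4$. The function names and the chosen decomposition are assumptions made for this example only.

```python
def pump_tal(x, w1, v1, w2, y, w3, v2, w4, z, n):
    """Return x w1^n v1 w2^n y w3^n v2 w4^n z as in the TAL pumping lemma."""
    return x + w1 * n + v1 + w2 * n + y + w3 * n + v2 + w4 * n + z


def in_count4(word):
    """Membership test for {a^k b^k c^k d^k | k >= 0}."""
    k = len(word) // 4
    return word == "a" * k + "b" * k + "c" * k + "d" * k


# Assumed decomposition of w = a^2 b^2 c^2 d^2 = x v1 y v2 z with empty v1, v2.
k = 2
parts = dict(x="a" * k, w1="a", v1="", w2="b",
             y="b" * k + "c" * k, w3="c", v2="", w4="d", z="d" * k)

words = [pump_tal(**parts, n=n) for n in range(5)]
print(all(in_count4(w) for w in words))
print([len(w) for w in words])
```

Each pumping step adds $|w_1 w_2 w_3 w_4| = 4$ symbols, so the word lengths grow by a constant amount, which is the behaviour the constant growth property describes.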
Closure Properties (1)

It is often useful to reduce a language $L$ to a simpler language before showing that it is not in a certain class $C$. This can be done with closure properties.

The argument showing that $L$ is not in a class $C$ then goes as follows:

Assume that $L$ is in $C$. Then (supposing $C$ is closed under operation $f$), $L' = f(L)$ is also in $C$. If we know that $L'$ is not in $C$, this is a contradiction. Consequently, $L$ is not in $C$.

Closure Properties (2)

TAL are closed under
- union, concatenation, Kleene closure and substitution,
- homomorphisms, intersection with regular languages, and inverse homomorphisms.

⇒ TALs form a substitution-closed Full Abstract Family of Languages (AFL).

(Full AFL = closed under intersection with regular languages, homomorphisms, inverse homomorphisms, union, concatenation and Kleene star.)

Closure Properties (3)

Example: To show: the double copy language $L = \{www \mid w \in \{a,b\}^*\}$ is not in TAL.

Assume that $L$ is in TAL. Then (since TAL is closed under intersection with regular languages), the language

$L' := L \cap a^* b^* a^* b^* a^* b^* = \{a^n b^m a^n b^m a^n b^m \mid n, m \ge 0\}$

is in TAL as well. Contradiction, since $L'$ does not satisfy the pumping lemma for TAL. Consequently, $L$ is not in TAL.

Part II: TAG Parsing
Outline: TAG Parsing

4. Parsing Basics
   - Parsing and Recognition
   - Items and Deduction-Based Parsing
   - CYK and Earley
5. Earley-Style Parsing for TAG
   - Preliminaries
   - Items and Inference Rules
   - From Recognition to Parsing
6. Summary
   - Mild Context-Sensitivity
   - Parsing

Recognition and parsing

What do we want to use a grammar for? We are interested in knowing
- whether a certain word/sentence is licensed by the grammar (recognition),
- the structure(s) that a grammar assigns to a grammatical word/sentence (parsing).

Context-free grammar

CFG $G$ with two rules: $S \to aSb$, $S \to ab$.

Recognize input aabb: yes!

Parse input aabb (parse tree in bracket notation):

    [S a [S a b] b]

Tree-adjoining grammar

TAG with three trees (bracket notation; NA marks a null-adjunction node, * marks the foot node):

    α:  [S ε]
    β1: [S_NA a [S S_NA* a]]
    β2: [S_NA b [S S_NA* b]]

Recognize input abab: yes!

Parse input abab: the derived tree

    [S_NA a [S_NA b [S [S_NA [S_NA ε] a] b]]]

and the derivation tree: α, with β1 adjoined at its root, and β2 adjoined at the inner S node of β1.
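The difference between recognition and parsing can be sketched for the CFG example above ($S \to aSb$, $S \to ab$). This is only a hand-written recursive sketch for this one grammar; the function names and the nested-tuple tree representation are assumptions, not something the slides prescribe.

```python
def parse(word):
    """Return the parse tree of `word` for S -> a S b | a b, or None.

    Trees are nested tuples whose first element is the node label.
    """
    if word == "ab":
        return ("S", "a", "b")
    if len(word) >= 4 and word[0] == "a" and word[-1] == "b":
        inner = parse(word[1:-1])
        if inner is not None:
            return ("S", "a", inner, "b")
    return None


def recognize(word):
    """Recognition only answers yes/no; it discards the structure."""
    return parse(word) is not None


print(recognize("aabb"))
print(parse("aabb"))
print(recognize("abab"))
```

Note that `abab` is rejected by this CFG, while the TAG example above accepts it as a word of the copy language.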
Characterization of Parsing (CFG case)

Given a grammar $G$, we want to check the grammaticality of a certain input $w$ and find the corresponding structure.

Very informal description:
- Initialize: Start with trees related to terminal symbols (bottom-up) or related to the root symbol (top-down)
- Parse: Successively combine trees to bigger trees according to rewriting rules
- Goal: Stop when we have a tree with root node labeled with the goal label ("S") and yield exactly $w$

How do we do this?

We fill a chart with all possible constituents and check if it contains the goal tree. For this, the CYK algorithm for CFG (Chomsky Normal Form, CNF) can be used:

    for each position p0
        C[p0, p0+1] := { A ∈ N | A → w_{p0+1} ∈ P }
    for each position p0
        for each position p1
            for each position p2
                C[p0, p2] := C[p0, p2] ∪ { A ∈ N | A → B C ∈ P
                                            ∧ B ∈ C[p0, p1] ∧ C ∈ C[p1, p2] }
    return true if S ∈ C[0, n]

The algorithm proceeds bottom-up.

CYK example

Input with positions: 0 I 1 saw 2 a 3 man 4 with 5 a 6 telescope 7

Chart (rows: end position; columns: start position):

    end \ start   0     1     2     3     4     5     6
    7             S     VP    NP          PP    NP    N
    6                                           Det
    5                                     P
    4             S     VP    NP    N
    3                         Det
    2                   V
    1             NP

Towards parsing schemata (1)

- Problem: The parsing strategy (i.e. the strategy of getting the final parse tree) is hidden in a bunch of control structures (loops, chart).
- These are implementation details the parsing strategy does not depend on.
- Better: Parsing schemata!
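The CYK pseudocode above can be made runnable roughly as follows. This is a sketch under stated assumptions: the grammar for the example sentence is a hypothetical CNF grammar chosen to match the categories in the example chart, and the loops are ordered by increasing span length, which is one way to make sure shorter spans are filled before they are used. The loops and the chart are exactly the control structures that the slide on parsing schemata refers to.

```python
from collections import defaultdict


def cyk(words, lexical, binary, start="S"):
    """CYK recognizer for a CNF grammar; returns (recognized, chart).

    lexical: set of (A, word) pairs for rules A -> word
    binary:  set of (A, B, C) triples for rules A -> B C
    chart[p0, p2] holds all nonterminals deriving words[p0:p2].
    """
    n = len(words)
    chart = defaultdict(set)
    # Initialize: spans of length 1 from the lexical rules.
    for p0, word in enumerate(words):
        chart[p0, p0 + 1] = {a for (a, w) in lexical if w == word}
    # Combine: longer spans from two adjacent shorter spans.
    for length in range(2, n + 1):
        for p0 in range(0, n - length + 1):
            p2 = p0 + length
            for p1 in range(p0 + 1, p2):
                for (a, b, c) in binary:
                    if b in chart[p0, p1] and c in chart[p1, p2]:
                        chart[p0, p2].add(a)
    return start in chart[0, n], chart


# Hypothetical grammar for the example sentence (an assumption).
LEXICAL = {("NP", "I"), ("V", "saw"), ("Det", "a"),
           ("N", "man"), ("N", "telescope"), ("P", "with")}
BINARY = {("S", "NP", "VP"), ("VP", "V", "NP"), ("VP", "VP", "PP"),
          ("NP", "Det", "N"), ("NP", "NP", "PP"), ("PP", "P", "NP")}

sentence = "I saw a man with a telescope".split()
recognized, chart = cyk(sentence, LEXICAL, BINARY)
print(recognized)
print(chart[1, 7], chart[0, 4])
```

The recognizer only reports whether S covers the whole input; recovering the two readings of the sentence would additionally require storing backpointers in the chart.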