Kallmeyer/Maier ESSLLI 2008

Parsing beyond context-free grammar: Tree Adjoining Grammar Parsing
Laura Kallmeyer, Wolfgang Maier
University of Tübingen
ESSLLI Course 2008

Overview
1. Tree Adjoining Grammars
2. An Earley parser for TAG
   (a) Introduction
   (b) Items
   (c) Inference Rules
3. LR Parsing
   (a) Introduction
   (b) Construction of the automaton
   (c) The recognizer

Tree Adjoining Grammars (1)
A Tree Adjoining Grammar (TAG) (Joshi & Schabes 1997) is a tree-rewriting system, i.e., a set of elementary trees with two operations:
• adjunction: replacing an internal node with a new tree. The new tree is an auxiliary tree and has a special leaf, the foot node.
• substitution: replacing a leaf with a new tree. The new tree is an initial tree.
Notation: $\gamma[p, \gamma']$ is the tree obtained by replacing the node at position $p$ in $\gamma$ with the tree $\gamma'$ (by substitution or adjunction).

Tree Adjoining Grammars (2)
(1) John sometimes laughs
Elementary trees (bracket notation, VP* is the foot node):
  [S NP [VP [V laughs]]]     [NP John]     [VP [ADV sometimes] VP*]
Derived tree laugh[1, john][2, sometimes]:
  [S [NP John] [VP [ADV sometimes] [VP [V laughs]]]]
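The two operations can be tried out on example (1) with a small sketch. Everything named here (`Node`, `node_at`, `substitute`, `adjoin`, `yield_of`) is a hypothetical helper introduced for illustration, not part of the course material; positions are written as tuples of 1-based daughter indices, with the empty tuple standing for the root.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    foot: bool = False  # True only for the foot node of an auxiliary tree


def node_at(tree: Node, address: Tuple[int, ...]) -> Node:
    """Follow 1-based daughter indices from the root; () is the root."""
    node = tree
    for i in address:
        node = node.children[i - 1]
    return node


def find_foot(tree: Node) -> Optional[Node]:
    if tree.foot:
        return tree
    for child in tree.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def substitute(gamma: Node, address: Tuple[int, ...], initial: Node) -> Node:
    """Replace the leaf at `address` with a copy of an initial tree."""
    result = copy.deepcopy(gamma)
    target = node_at(result, address)
    assert not target.children and target.label == initial.label
    target.children = copy.deepcopy(initial).children
    return result


def adjoin(gamma: Node, address: Tuple[int, ...], auxiliary: Node) -> Node:
    """Replace the internal node at `address` with a copy of an auxiliary tree."""
    result = copy.deepcopy(gamma)
    target = node_at(result, address)
    aux = copy.deepcopy(auxiliary)
    foot = find_foot(aux)
    assert foot is not None and foot.label == target.label == aux.label
    foot.children = target.children  # old subtree moves below the foot node
    foot.foot = False
    target.children = aux.children   # auxiliary tree takes the node's place
    return result


def yield_of(tree: Node) -> List[str]:
    if not tree.children:
        return [tree.label]
    return [w for child in tree.children for w in yield_of(child)]


laugh = Node("S", [Node("NP"), Node("VP", [Node("V", [Node("laughs")])])])
john = Node("NP", [Node("John")])
sometimes = Node("VP", [Node("ADV", [Node("sometimes")]), Node("VP", foot=True)])

# laugh[1, john][2, sometimes]
derived = adjoin(substitute(laugh, (1,), john), (2,), sometimes)
print(" ".join(yield_of(derived)))
```

Both operations copy their arguments, so the elementary trees stay reusable across derivations.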
Tree Adjoining Grammars (3)
A Tree Adjoining Grammar (TAG) is a quadruple $G = \langle N, T, I, A \rangle$ such that
• $T$ and $N$ are disjoint alphabets of terminals and nonterminals,
• $I$ is a finite set of initial trees, and
• $A$ is a finite set of auxiliary trees.
The trees in $I \cup A$ are called elementary trees.
$G$ is lexicalized iff each elementary tree has at least one leaf with a terminal label.
A TAG allows one to specify, for each node,
1. whether adjunction is mandatory and
2. which trees can be adjoined.

Tree Adjoining Grammars (4)
A derivation starts with an initial tree. In a final derived tree, all leaves must have terminal labels.
Let $G = \langle N, T, I, A \rangle$ be a TAG. Let $\gamma$ and $\gamma'$ be finite trees.
• $\gamma \Rightarrow \gamma'$ in $G$ iff there is a node position $p$ and an instance $\gamma'_0$ of a tree $\gamma_0 \in I \cup A$ (or of a tree derived from some such $\gamma_0$) such that $\gamma' = \gamma[p, \gamma'_0]$. $\stackrel{*}{\Rightarrow}$ is the reflexive transitive closure of $\Rightarrow$.
• The tree language of $G$ is $L_T(G) := \{\gamma \mid$ there is an $\alpha \in I$ such that $\alpha \stackrel{*}{\Rightarrow} \gamma$, all leaves in $\gamma$ have terminal labels, and there are no OA (obligatory adjunction) nodes in $\gamma\}$.

Tree Adjoining Grammars (5)
Languages TAG can generate:
• $\{ww \mid w \in \{a, b\}^*\}$
• $L_4 := \{a^n b^n c^n d^n \mid n \geq 0\}$
Languages TAG cannot generate:
• $\{w^n \mid w \in \{a, b\}^*\}$ for any $n > 2$.
  ⇒ TAGs generate only a limited amount of cross-serial dependencies.
• $L_k := \{a_1^n a_2^n a_3^n \ldots a_k^n \mid n \geq 0\}$ for any $k > 4$.
  ⇒ TAG can "count up to 4, not further".
• $L := \{a^{2^n} \mid n \geq 0\}$.
  ⇒ TAG cannot generate languages whose word lengths grow exponentially.

Tree Adjoining Grammars (6)
TAGs are mildly context-sensitive:
• TAGs are slightly more powerful than CFGs: they can describe a limited amount of cross-serial dependencies.
• TAGs are polynomially parsable (complexity $O(n^6)$).
• TALs (tree adjoining languages) are of constant growth.
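Constant growth can be made concrete by comparing the word lengths of $L_4$ with those of $\{a^{2^n}\}$. The sketch below is only an illustration under an informal reading of the property (the gaps between consecutive word lengths stay bounded); the helper name `length_gaps` is made up for this example.

```python
from typing import Iterable, List


def length_gaps(lengths: Iterable[int]) -> List[int]:
    """Differences between consecutive distinct word lengths."""
    ordered = sorted(set(lengths))
    return [b - a for a, b in zip(ordered, ordered[1:])]


# Word lengths of a^n b^n c^n d^n for n = 0..5
l4_lengths = [len("a" * n + "b" * n + "c" * n + "d" * n) for n in range(6)]

# Word lengths of a^(2^n) for n = 0..5
exp_lengths = [2 ** n for n in range(6)]

print(length_gaps(l4_lengths))   # gaps stay at 4
print(length_gaps(exp_lengths))  # gaps double at every step
```

For $L_4$ every gap is 4, so a single constant bounds them all. For $\{a^{2^n}\}$ no such bound exists, which is why this language lies outside the tree adjoining languages.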
Earley Parsing: Introduction (1)
• The left-to-right CKY parser (Vijay-Shanker & Joshi, 1985) is very slow: $O(n^6)$ in the worst case and in the best case. Just as in the CFG version of CKY, too many partial trees are produced that are not pertinent to the final tree.
• This behaviour is due to the purely bottom-up approach: no predictive information whatsoever is used.
• Goal: an Earley-style parser. The first one was given in Schabes & Joshi (1988). Here, we present the algorithm from Joshi & Schabes (1997). We assume a TAG without substitution nodes.

Earley Parsing: Introduction (2)
• Earley parsing: left-to-right scanning of the string, using predictions to restrict the hypothesis space.
• Traversal of elementary trees, with the current position marked by a dot. The dot can have exactly four positions with respect to a node: left above (la), left below (lb), right above (ra), right below (rb).

Earley Parsing: Introduction (3)
General idea: Whenever we are
• left above a node, we can predict an adjunction and start the traversal of the adjoined tree;
• left of a foot node, we can move back to the adjunction site and traverse the tree below it;
• right of an adjunction site, we continue the traversal of the adjoined tree at the right of its foot node;
• right above the root of an auxiliary tree, we can move back to the right of the adjunction site.

Earley Parsing: Items (1)
What kind of information do we need in an item characterizing a partial parsing result?
$[\alpha, \mathit{dot}, \mathit{pos}, i, j, k, l, \mathit{sat?}]$
where
• $\alpha \in I \cup A$ is a (dotted) tree; $\mathit{dot}$ and $\mathit{pos}$ are the address and the location of the dot;
• $i, j, k, l$ are indices on the input string, where $i, l \in \{0, \ldots, n\}$, $j, k \in \{0, \ldots, n\} \cup \{-\}$, $n = |w|$, and $-$ means unbound value;
• $\mathit{sat?}$ is a flag. It prevents multiple adjunctions at a single node ($\mathit{sat?} = 1$ means that something has already been adjoined to the dotted node).
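One possible in-memory encoding of such an item is sketched below. The class name `Item`, the field names, the use of `None` for the unbound value and of a boolean for the flag are all assumptions made for this illustration; the slides only fix the eight components. Addresses are tuples of daughter indices, printed as 0 for the root.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

POSITIONS = ("la", "lb", "ra", "rb")


@dataclass(frozen=True)
class Item:
    tree: str              # name of the elementary tree alpha
    dot: Tuple[int, ...]   # address of the dotted node, () for the root
    pos: str               # one of la, lb, ra, rb
    i: int                 # left boundary of the recognized span
    j: Optional[int]       # left boundary below the foot node, None if unbound
    k: Optional[int]       # right boundary below the foot node, None if unbound
    l: int                 # right boundary of the recognized span
    sat: bool = False      # False plays the role of nil, True the role of 1

    def __post_init__(self) -> None:
        if self.pos not in POSITIONS:
            raise ValueError(f"unknown dot position: {self.pos!r}")

    def __str__(self) -> str:
        address = ".".join(map(str, self.dot)) or "0"

        def show(index: Optional[int]) -> str:
            return "-" if index is None else str(index)

        flag = "1" if self.sat else "nil"
        return (f"[{self.tree}, {address}, {self.pos}, {self.i}, "
                f"{show(self.j)}, {show(self.k)}, {self.l}, {flag}]")


predicted = Item("beta", (), "la", 2, None, None, 2)
chart = {predicted, Item("beta", (), "la", 2, None, None, 2)}
print(predicted)
print(len(chart))
```

Making the class frozen keeps items hashable, so a chart can be a plain set and duplicate items collapse automatically.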
Earley Parsing: Items (2)
What do the items mean?
• $[\alpha, \mathit{dot}, \mathit{la}, i, j, k, l, \mathit{nil}]$: In $\alpha$, the part left of the dot ranges from $i$ to $l$. If $\alpha$ is an auxiliary tree, the part below the foot node ranges from $j$ to $k$.
• $[\alpha, \mathit{dot}, \mathit{lb}, i, -, -, i, \mathit{nil}]$: In $\alpha$, the part below the dotted node starts at position $i$.
• $[\alpha, \mathit{dot}, \mathit{rb}, i, j, k, l, \mathit{sat?}]$: In $\alpha$, the part below the dotted node ranges from $i$ to $l$. If $\alpha$ is an auxiliary tree, the part below the foot node ranges from $j$ to $k$. If $\mathit{sat?} = \mathit{nil}$, nothing was adjoined to the dotted node; $\mathit{sat?} = 1$ means that adjunction took place.
• $[\alpha, \mathit{dot}, \mathit{ra}, i, j, k, l, \mathit{nil}]$: In $\alpha$, the part left of and below the dotted node ranges from $i$ to $l$. If $\alpha$ is an auxiliary tree, the part below the foot node ranges from $j$ to $k$.

Earley Parsing: Items (3)
Some notational conventions:
• We use Gorn addresses for the nodes: 0 is the address of the root, $i$ ($1 \leq i$) is the address of the $i$th daughter of the root, and for $p \neq 0$, $p \cdot i$ is the address of the $i$th daughter of the node at address $p$.
• For a tree $\alpha$ and a Gorn address $\mathit{dot}$, $\alpha(\mathit{dot})$ denotes the node at address $\mathit{dot}$ in $\alpha$ (if defined).
• For a node $n$, $\mathit{Adj}(n)$ is the set of trees adjoinable at $n$. $\mathit{nil} \in \mathit{Adj}(n)$ signifies that adjunction is not obligatory. $\mathit{Adj}(n) = \emptyset$ if $n$ has a terminal or $\epsilon$ as label.

Earley Parsing: Inference Rules (1)
ScanTerm:
$$\frac{[\alpha, \mathit{dot}, \mathit{la}, i, j, k, l, \mathit{nil}]}{[\alpha, \mathit{dot}, \mathit{ra}, i, j, k, l+1, \mathit{nil}]} \quad \alpha(\mathit{dot}) \text{ labelled } w_{l+1}$$
[Figure: the dot moves from left above to right above the leaf labelled $w_{l+1}$; the span recognized so far is $w_{i+1} \ldots w_l$.]
Scan-$\epsilon$:
$$\frac{[\alpha, \mathit{dot}, \mathit{la}, i, j, k, l, \mathit{nil}]}{[\alpha, \mathit{dot}, \mathit{ra}, i, j, k, l, \mathit{nil}]} \quad \alpha(\mathit{dot}) \text{ labelled } \epsilon$$

Earley Parsing: Inference Rules (2)
PredictAdjoinable:
$$\frac{[\alpha, \mathit{dot}, \mathit{la}, i, j, k, l, \mathit{nil}]}{[\beta, 0, \mathit{la}, l, -, -, l, \mathit{nil}]} \quad \beta \in \mathit{Adj}(\alpha(\mathit{dot}))$$
[Figure: the dot moves from left above a node labelled A to left above the root of an auxiliary tree with root and foot labelled A; the span recognized so far is $w_{i+1} \ldots w_l$.]
PredictNoAdj:
$$\frac{[\alpha, \mathit{dot}, \mathit{la}, i, j, k, l, \mathit{nil}]}{[\alpha, \mathit{dot}, \mathit{lb}, l, -, -, l, \mathit{nil}]} \quad \mathit{nil} \in \mathit{Adj}(\alpha(\mathit{dot}))$$
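The scanning and prediction rules above can be read as functions from one item to a list of new items. The following is a minimal sketch, not the authors' implementation: the `Item` tuple, the dictionaries `TERMINAL_LABEL` and `ADJ`, and the two-tree toy grammar (`alpha` = VP over "laughs", `beta` = VP over "sometimes" and a VP foot) are assumptions chosen for illustration. `None` stands for both the unbound value and for nil in Adj, and the root address 0 is written as the empty tuple.

```python
from typing import Dict, List, NamedTuple, Optional, Sequence, Set, Tuple

Address = Tuple[int, ...]        # () is the root (Gorn address 0)
NodeId = Tuple[str, Address]     # (tree name, address)


class Item(NamedTuple):
    tree: str
    dot: Address
    pos: str                     # la, lb, ra or rb
    i: int
    j: Optional[int]             # None = unbound
    k: Optional[int]
    l: int
    sat: bool                    # False = nil, True = 1


EPSILON = ""  # label used here for epsilon leaves


def scan(item: Item, terminal_label: Dict[NodeId, str],
         words: Sequence[str]) -> List[Item]:
    """ScanTerm and Scan-epsilon: move the dot from la to ra over a leaf."""
    if item.pos != "la" or item.sat:
        return []
    label = terminal_label.get((item.tree, item.dot))
    if label is None:            # dotted node is not a terminal/epsilon leaf
        return []
    if label == EPSILON:
        return [item._replace(pos="ra")]
    if item.l < len(words) and words[item.l] == label:  # words[l] is w_{l+1}
        return [item._replace(pos="ra", l=item.l + 1)]
    return []


def predict(item: Item, adj: Dict[NodeId, Set[Optional[str]]]) -> List[Item]:
    """PredictAdjoinable (beta in Adj) and PredictNoAdj (nil in Adj)."""
    if item.pos != "la" or item.sat:
        return []
    new_items = []
    for beta in adj.get((item.tree, item.dot), set()):
        if beta is None:
            new_items.append(
                Item(item.tree, item.dot, "lb", item.l, None, None, item.l, False))
        else:
            new_items.append(
                Item(beta, (), "la", item.l, None, None, item.l, False))
    return new_items


# Hypothetical toy grammar without substitution nodes.
TERMINAL_LABEL = {("alpha", (1,)): "laughs", ("beta", (1,)): "sometimes"}
ADJ = {("alpha", ()): {"beta", None}, ("beta", ()): {None}, ("beta", (2,)): {None}}
WORDS = ["sometimes", "laughs"]

start = Item("alpha", (), "la", 0, None, None, 0, False)
predicted = predict(start, ADJ)
scanned = scan(Item("beta", (1,), "la", 0, None, None, 0, False),
               TERMINAL_LABEL, WORDS)
```

Leaves with a terminal or epsilon label have no entry in `ADJ`, which mirrors the convention that Adj is empty for such nodes.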
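Earley Parsing: Inference Rules (3)
PredictAdjoined:
$$\frac{[\beta, \mathit{dot}, \mathit{lb}, l, -, -, l, \mathit{nil}]}{[\alpha, \mathit{dot}', \mathit{lb}, l, -, -, l, \mathit{nil}]} \quad \mathit{dot} = \mathit{foot}(\beta),\ \beta \in \mathit{Adj}(\alpha(\mathit{dot}'))$$
[Figure: the dot moves from left below the foot node of the adjoined tree to left below the adjunction site.]

Earley Parsing: Inference Rules (4)
Complete I:
$$\frac{[\alpha, \mathit{dot}, \mathit{rb}, i, j, k, l, \mathit{nil}],\ [\beta, \mathit{dot}', \mathit{lb}, i, -, -, i, \mathit{nil}]}{[\beta, \mathit{dot}', \mathit{rb}, i, i, l, l, \mathit{nil}]} \quad \mathit{dot}' = \mathit{foot}(\beta),\ \beta \in \mathit{Adj}(\alpha(\mathit{dot}))$$
[Figure: the dot moves from right below the adjunction site to right below the foot node of the adjoined tree; the span below the foot node is $w_{i+1} \ldots w_l$.]

Earley Parsing: Inference Rules (5)
Complete II:
$$\frac{[\alpha, \mathit{dot}, \mathit{rb}, i, j, k, l, \mathit{sat?}],\ [\alpha, \mathit{dot}, \mathit{la}, h, -, -, i, \mathit{nil}]}{[\alpha, \mathit{dot}, \mathit{ra}, h, j, k, l, \mathit{nil}]} \quad \alpha(\mathit{dot}) \in N$$
or
$$\frac{[\alpha, \mathit{dot}, \mathit{rb}, i, -, -, l, \mathit{sat?}],\ [\alpha, \mathit{dot}, \mathit{la}, h, j, k, i, \mathit{nil}]}{[\alpha, \mathit{dot}, \mathit{ra}, h, j, k, l, \mathit{nil}]} \quad \alpha(\mathit{dot}) \in N$$
[Figure: the part left of the node spans $w_{h+1} \ldots w_i$, the part below it spans $w_{i+1} \ldots w_l$; combined, the dot is right above the node and the span is $w_{h+1} \ldots w_l$.]

Earley Parsing: Inference Rules (6)
Adjoin:
$$\frac{[\beta, 0, \mathit{ra}, i, j, k, l, \mathit{nil}],\ [\alpha, \mathit{dot}, \mathit{rb}, j, p, q, k, \mathit{nil}]}{[\alpha, \mathit{dot}, \mathit{rb}, i, p, q, l, 1]} \quad \beta \in \mathit{Adj}(\alpha(\mathit{dot}))$$
[Figure: the auxiliary tree spans $w_{i+1} \ldots w_j$ left of its foot node and $w_{k+1} \ldots w_l$ right of it; the part below the adjunction site spans $w_{j+1} \ldots w_k$; after adjunction the node spans $w_{i+1} \ldots w_l$.]
$\mathit{sat?} = 1$ prevents the new item from being reused in another Adjoin application.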
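The two-premise rules Adjoin and Complete II can be sketched as functions that take a pair of items and return the conclusion, or `None` when the rule does not apply. The `Item` tuple, the function names, the grammar tables `ADJ` and `NONTERMINAL_NODES`, and the toy trees `alpha` and `beta` are hypothetical and chosen only for illustration; `None` encodes an unbound index and the root address 0 is the empty tuple.

```python
from typing import Dict, NamedTuple, Optional, Set, Tuple

Address = Tuple[int, ...]        # () is the root (Gorn address 0)
NodeId = Tuple[str, Address]


class Item(NamedTuple):
    tree: str
    dot: Address
    pos: str
    i: int
    j: Optional[int]             # None = unbound
    k: Optional[int]
    l: int
    sat: bool                    # False = nil, True = 1


def adjoin(aux: Item, site: Item, adj: Dict[NodeId, Set[str]]) -> Optional[Item]:
    """Adjoin: aux = [beta, 0, ra, i, j, k, l, nil], site = [alpha, dot, rb, j, p, q, k, nil]."""
    if aux.pos != "ra" or aux.dot != () or aux.sat:
        return None
    if site.pos != "rb" or site.sat:          # sat = 1 blocks a second adjunction
        return None
    if (aux.j, aux.k) != (site.i, site.l):    # foot span must equal the site span
        return None
    if aux.tree not in adj.get((site.tree, site.dot), set()):
        return None
    return Item(site.tree, site.dot, "rb", aux.i, site.j, site.k, aux.l, True)


def complete_2(below: Item, left: Item, nonterminal_nodes: Set[NodeId]) -> Optional[Item]:
    """Complete II: below = [alpha, dot, rb, i, .., l, sat?], left = [alpha, dot, la, h, .., i, nil]."""
    if (below.pos, left.pos) != ("rb", "la") or left.sat:
        return None
    if (below.tree, below.dot) != (left.tree, left.dot):
        return None
    if left.l != below.i or (below.tree, below.dot) not in nonterminal_nodes:
        return None
    if below.j is not None and left.j is not None:
        return None                           # only one premise may carry foot indices
    j, k = (below.j, below.k) if below.j is not None else (left.j, left.k)
    return Item(below.tree, below.dot, "ra", left.i, j, k, below.l, False)


# Hypothetical toy grammar: beta may adjoin at the root of alpha.
ADJ = {("alpha", ()): {"beta"}}
NONTERMINAL_NODES = {("alpha", ()), ("beta", ()), ("beta", (2,))}

aux_done = Item("beta", (), "ra", 0, 1, 2, 2, False)         # foot covers 1..2
site_done = Item("alpha", (), "rb", 1, None, None, 2, False)  # node covers 1..2
adjoined = adjoin(aux_done, site_done, ADJ)
root_done = complete_2(adjoined, Item("alpha", (), "la", 0, None, None, 0, False),
                       NONTERMINAL_NODES)
```

Feeding `adjoined` back into `adjoin` as the site yields `None`, which is the effect of the flag described above.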