
Parsing beyond context-free grammar: Range Concatenation Grammar Parsing



  1. Kallmeyer/Maier ESSLLI 2008

     Parsing beyond context-free grammar: Range Concatenation Grammar Parsing
     Laura Kallmeyer, Wolfgang Maier
     University of Tübingen
     ESSLLI Course 2008

     Overview
     1. Range Concatenation Grammars (RCG)
     2. Parsing RCG
        (a) Directional top-down parsing
        (b) Earley-style parsing
     3. Uses of RCG

     Range Concatenation Grammar
     The idea behind range concatenation grammar (RCG) is comparable to the idea behind MCFG.
     • Predicate-rewriting clauses describe ranges which are not necessarily adjacent.
     • One predicate can be true or false for a certain string.
     • Some string $w$ is in the language of an RCG if the start predicate is true for $w$.
     • While in MCFG a string is generated, in RCG a string is reduced to $\epsilon$.

     Expressivity of RCG
     • RCG exactly covers the class of PTIME recognizable languages (Bertsch & Nederhof, 2001).
     • Simple RCG (basically non-deleting non-copying RCG) is equivalent to MCFG.
     • RCG can represent languages beyond mild context-sensitivity.

  2. Definition of RCGs: Grammar Definition
     An RCG is a tuple $G = \langle N, T, V, P, S \rangle$ such that
     • $N$ is a finite set of predicates, each with a fixed arity,
     • $T$ and $V$ are disjoint finite sets of terminals and variables,
     • $S \in N$ is the start predicate of arity 1, and
     • $P$ is a finite set of clauses of the form
       $A_0(x_{01}, \ldots, x_{0a_0}) \to \epsilon$
       or
       $A_0(x_{01}, \ldots, x_{0a_0}) \to A_1(x_{11}, \ldots, x_{1a_1}) \ldots A_n(x_{n1}, \ldots, x_{na_n})$
       with $n \geq 1$, $A_i \in N$, $x_{ij} \in (T \cup V)^*$ and $a_i$ being the arity of $A_i$.
     A predicate $A_n(x_{n1}, \ldots, x_{na_n})$ can be written as $A_n(\vec{x}_n)$.

     Definition of RCGs: Instantiation
     A given clause $C$ is instantiated with respect to a string $w$ if variables and arguments are consistently replaced by ranges of $w$.
     Example:
     • $A(\langle i \ldots j \rangle) \to B(\langle i+1 \ldots j \rangle)$
     is an instantiation of the clause
     • $A(a X_1) \to B(X_1)$
     if $w_{i+1} = a$.

     Definition of RCGs: Derivation Relation, Language
     • The derivation relation is defined as follows: for a predicate $A$ of arity $k$, a clause $A(\ldots) \to \ldots$, and ranges $\langle i_1, j_1 \rangle, \ldots, \langle i_k, j_k \rangle$ with respect to a given $w$: if there is an instantiation of this clause with LHS $A(\langle i_1, j_1 \rangle, \ldots, \langle i_k, j_k \rangle)$, then $A(\langle i_1, j_1 \rangle, \ldots, \langle i_k, j_k \rangle)$ can be replaced with the RHS of this instantiation.
     • The language of an RCG $G$ is the set of strings that can be reduced to the empty word:
       $L(G) = \{ w \mid S(\langle 0, |w| \rangle) \overset{*}{\Rightarrow} \epsilon \text{ with respect to } w \}$.

     A sample RCG (1)
     Sample RCG $G$ for the string language $\{ a^n b^k a^n \mid k, n \in \mathbb{N} \}$:
     RCG with $N = \{S, A, B\}$, $T = \{a, b\}$, $V = \{X, Y, Z\}$, start predicate $S$ and clauses
     • $S(XYZ) \to A(X, Z)\, B(Y)$,
     • $A(aX, aY) \to A(X, Y)$,
     • $B(bX) \to B(X)$,
     • $A(\epsilon, \epsilon) \to \epsilon$,
     • $B(\epsilon) \to \epsilon$
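The instantiation example above can be made concrete with a short Python sketch. It is only an illustration under assumed conventions: ranges are written as pairs `(i, j)` of 0-based positions, and the function name `instantiations` is hypothetical, not taken from the slides.

```python
def instantiations(w):
    """Enumerate instantiations of the clause A(a X1) -> B(X1) with respect to w.

    Each result pairs the LHS range <i, j> with the RHS range <i+1, j>.
    """
    n = len(w)
    return [
        ((i, j), (i + 1, j))
        for i in range(n)
        for j in range(i + 1, n + 1)
        if w[i] == "a"  # w[i] is the (i+1)-th symbol of w, i.e. w_{i+1}
    ]


if __name__ == "__main__":
    for lhs, rhs in instantiations("aba"):
        print(f"A({lhs}) -> B({rhs})")
```

Note that the variable $X_1$ may be bound to an empty range such as `(1, 1)`; this is what later allows clauses like $B(\epsilon) \to \epsilon$ to apply.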

  3. A sample RCG (2)
     As an example consider the reduction of $w = aabaa$:
     $S(XYZ) \to A(X, Z)\, B(Y)$ with $X \mapsto w_{0,2} = aa$, $Y \mapsto w_{2,3} = b$, $Z \mapsto w_{3,5} = aa$.
     With this instantiation, $S(w_{0,5}) \Rightarrow A(w_{0,2}, w_{3,5})\, B(w_{2,3})$. Then

     A sample RCG (3)
     $B(bX) \to B(X)$ with $b \mapsto w_{2,3}$, $X \mapsto w_{3,3} = \epsilon$
     and $B(\epsilon) \to \epsilon$ lead to
     $A(w_{0,2}, w_{3,5})\, B(w_{2,3}) \Rightarrow A(w_{0,2}, w_{3,5})\, B(w_{3,3}) \Rightarrow A(w_{0,2}, w_{3,5})$.

     A sample RCG (4)
     $A(aX, aY) \to A(X, Y)$ with $a \mapsto w_{0,1}$, $X \mapsto w_{1,2} = a$ in the first argument and $a \mapsto w_{3,4}$, $Y \mapsto w_{4,5} = a$ in the second
     leads to $A(w_{0,2}, w_{3,5}) \Rightarrow A(w_{1,2}, w_{4,5})$. Then

     A sample RCG (5)
     $A(aX, aY) \to A(X, Y)$ with $a \mapsto w_{1,2}$, $X \mapsto w_{2,2} = \epsilon$ in the first argument and $a \mapsto w_{4,5}$, $Y \mapsto w_{5,5} = \epsilon$ in the second
     and $A(\epsilon, \epsilon) \to \epsilon$ lead to
     $A(w_{1,2}, w_{4,5}) \Rightarrow A(w_{2,2}, w_{5,5}) \Rightarrow \epsilon$.
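The reduction of $aabaa$ can be replayed mechanically. The following Python sketch is a naive memoized top-down recognizer for the sample grammar only; it is not one of the parsing algorithms of the course. The clause encoding, the names `CLAUSES`, `bindings`, `instantiate` and `recognize`, and the restriction that every RHS argument is a single variable are all assumptions made for this illustration.

```python
from functools import lru_cache

# Hypothetical encoding: a clause is (predicate, LHS arguments, RHS).
# An LHS argument is a tuple of symbols: lowercase = terminal, uppercase = variable.
# Each RHS predicate lists one variable per argument (enough for this grammar).
CLAUSES = [
    ("S", (("X", "Y", "Z"),), (("A", ("X", "Z")), ("B", ("Y",)))),
    ("A", (("a", "X"), ("a", "Y")), (("A", ("X", "Y")),)),
    ("B", (("b", "X"),), (("B", ("X",)),)),
    ("A", ((), ()), ()),   # A(eps, eps) -> eps
    ("B", ((),), ()),      # B(eps) -> eps
]


def bindings(symbols, i, j, w, env):
    """Yield variable bindings under which `symbols` covers exactly the range <i, j>."""
    if not symbols:
        if i == j:
            yield env
        return
    head, rest = symbols[0], symbols[1:]
    if head.islower():                      # terminal: must match the next symbol
        if i < j and w[i] == head:
            yield from bindings(rest, i + 1, j, w, env)
    elif head in env:                       # bound variable: must denote the same range
        left, right = env[head]
        if left == i and right <= j:
            yield from bindings(rest, right, j, w, env)
    else:                                   # free variable: try every right boundary
        for m in range(i, j + 1):
            yield from bindings(rest, m, j, w, {**env, head: (i, m)})


def instantiate(lhs_args, ranges, w):
    """All consistent bindings of a clause's LHS arguments to the given ranges."""
    envs = [{}]
    for symbols, (i, j) in zip(lhs_args, ranges):
        envs = [e2 for e in envs for e2 in bindings(symbols, i, j, w, e)]
    return envs


def recognize(w):
    """True iff S(<0, |w|>) can be reduced to the empty word with respect to w."""
    @lru_cache(maxsize=None)
    def holds(pred, ranges):
        for lhs, lhs_args, rhs in CLAUSES:
            if lhs != pred or len(lhs_args) != len(ranges):
                continue
            for env in instantiate(lhs_args, ranges, w):
                if all(holds(p, tuple(env[v] for v in args)) for p, args in rhs):
                    return True
        return False

    return holds("S", ((0, len(w)),))


if __name__ == "__main__":
    for s in ["aabaa", "aba", "bb", "aab", "a"]:
        print(repr(s), recognize(s))
```

The recursion terminates here because every recursive clause of the sample grammar consumes a terminal; a general RCG would need a proper chart-based algorithm such as the ones discussed in the course.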

  4. Definition of RCGs: Other properties (1)
     • An RCG with maximal predicate arity $k$ is called an RCG of arity $k$ (also called a $k$-RCG).
     • An RCG is called non-combinatorial if each of the arguments in the right-hand sides of the productions is a single variable.
     • An RCG is called linear if no variable appears more than once in the left-hand side of a production and no variable appears more than once in its right-hand side.

     Definition of RCGs: Other properties (2)
     • An RCG is called non-erasing if for each production, each variable occurring in the left-hand side occurs also in the right-hand side and vice versa.
     • An RCG is called simple if it is non-combinatorial, linear and non-erasing.
     • A simple RCG is called ordered simple if the range variables are ordered the same way in the RHS and the LHS predicates.
     Ordered simple RCG is equivalent to simple RCG.

     RCG parsing: Treatment of terminals
     Without loss of generality, we presuppose that all non-$\epsilon$ clauses contain no terminals in their arguments.
     For each $t \in T$, we introduce a new clause $T_t(t) \to \epsilon$ and for each clause $C \in P$,
     • we replace each occurrence $t'$ of $t$ in all arguments of all predicates with a variable $V_{t'}$,
     • for each $V_{t'}$, we add the predicate $T_t(V_{t'})$ to the RHS of $C$.
     Furthermore, for every clause we assume that its variables are numbered consecutively from 1 to some $j$.

     RCG parsing: Range vectors
     We will use range vectors similar to those used for MCFG parsing. Range vectors are used to describe variable bindings.
     • $\phi = (\langle x_1, y_1 \rangle, \ldots, \langle x_k, y_k \rangle)$ is a range vector in $w$ if all $\langle x_i, y_i \rangle$ are ranges in $w$ for $1 \leq i \leq k$.
     • $\phi = (\langle x_1, y_1 \rangle, \ldots, \langle x_k, y_k \rangle)$ is a range constraint vector if it contains pairs $\langle x, y \rangle$ where $x, y \in Pos(w) \cup V_r$ ($V_r$ is a set $\{r_1, r_2, \ldots\}$ of range boundary variables) such that if $\langle x, y \rangle \in Pos(w)^2$ then it is a range.
     • $k$ is called the dimension of $\phi$.
     • $\phi(i).l$ denotes the first component and $\phi(i).r$ the second component of the $i$th element of $\phi$.
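The treatment of terminals described above can be sketched in Python as follows. This is a minimal illustration under assumptions that are not in the slides: clauses are encoded as `(predicate, LHS arguments, RHS list)`, every argument is a tuple of symbols, the fresh variable names `V1, V2, ...` are assumed not to clash with existing variables, and the function name `eliminate_terminals` is hypothetical. The consecutive renumbering of all variables mentioned on the slide is not performed.

```python
from itertools import count


def eliminate_terminals(clauses, terminals):
    """Replace terminals in the arguments of non-epsilon clauses by fresh variables.

    For every terminal t a clause T_t(t) -> eps is added, and every replaced
    occurrence contributes a predicate T_t(V) to the RHS of its clause.
    """
    out = [(f"T_{t}", ((t,),), []) for t in sorted(terminals)]
    for pred, lhs_args, rhs in clauses:
        if not rhs:                       # epsilon clauses are left unchanged
            out.append((pred, lhs_args, rhs))
            continue
        fresh = count(1)                  # fresh variable indices for this clause
        extra = []                        # T_t(V) predicates to append to the RHS

        def rewrite(arg):
            new = []
            for sym in arg:
                if sym in terminals:
                    var = f"V{next(fresh)}"
                    extra.append((f"T_{sym}", ((var,),)))
                    new.append(var)
                else:
                    new.append(sym)
            return tuple(new)

        new_lhs = tuple(rewrite(a) for a in lhs_args)
        new_rhs = [(p, tuple(rewrite(a) for a in args)) for p, args in rhs]
        out.append((pred, new_lhs, new_rhs + extra))
    return out


if __name__ == "__main__":
    sample = [
        ("A", (("a", "X"), ("a", "Y")), [("A", (("X",), ("Y",)))]),
        ("A", ((), ()), []),
    ]
    for clause in eliminate_terminals(sample, {"a", "b"}):
        print(clause)
```

Applied to the clause $A(aX, aY) \to A(X, Y)$ of the sample grammar, the sketch produces the equivalent of $A(V_1 X, V_2 Y) \to A(X, Y)\, T_a(V_1)\, T_a(V_2)$.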

  5. RCG parsing: Variable constraint vectors
     The variable constraint vector $\phi$ of a non-$\epsilon$ clause $A(\vec{x}) \to \Phi$ is a range constraint vector of dimension $j$, $j$ being the highest variable index in the clause. It contains only pairs $x \in V_r \times V_r$ and must be consistent with variable adjacencies in the clause.
     Formally, the elements of $\phi$ are pairs from $V_r \times V_r$ such that $\phi(h).r = \phi(i).l$ iff $X_h X_i$ occurs as a substring in one of the arguments of the clause.

     Update of range vectors
     We define an update $\phi'$ of a range constraint vector $\phi$ with respect to an identity $x = y$, $x, y \in Pos(w) \cup V_r$, as follows:
     • if $x = y$, then $\phi' = \phi$;
     • else if $x \in V_r$ and the result $\psi$ of replacing all occurrences of $x$ in $\phi$ with $y$ is a range constraint vector, then $\phi' = \psi$;
     • else if $y \in V_r$ and the result $\psi$ of replacing all occurrences of $y$ in $\phi$ with $x$ is a range constraint vector, then $\phi' = \psi$;
     • otherwise, $\phi'$ is undefined.

     Directional top-down parsing
     Corresponds to the algorithm presented in Boullier (2000).
     Item form:
     • Active items: $[A(\vec{X}) \to \Phi \bullet \Psi, \phi]$
     • Passive items: $[A, \psi, flag]$
     where
     • $\phi$ is a range vector of dimension $j$, $j$ being the highest variable index in the clause,
     • $\psi$ is a range vector of dimension $k$, $k$ being the arity of $A$,
     • $flag \in \{p, c\}$ indicates if a passive item is predicted or completed.

     Directional top-down parsing (axiom and goal)
     • Axiom: $[S, (\langle 0, n \rangle), p]$
     • The goal item is $[S, (\langle 0, n \rangle), c]$.
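The update operation on range constraint vectors defined above can be sketched in Python. The representation is an assumption made for illustration: positions are `int` values in `0..n`, range boundary variables are strings such as `"r1"`, a vector is a tuple of pairs, and an undefined update is signalled by `None`. The names `is_range_constraint_vector` and `update` are hypothetical.

```python
def is_range_constraint_vector(phi, n):
    """Check that every fully instantiated pair <x, y> is a range in a string of length n."""
    for x, y in phi:
        for bound in (x, y):
            if isinstance(bound, int) and not 0 <= bound <= n:
                return False              # position outside Pos(w)
        if isinstance(x, int) and isinstance(y, int) and x > y:
            return False                  # left boundary after right boundary
    return True


def update(phi, x, y, n):
    """Return the update of phi with respect to the identity x = y, or None if undefined."""
    if x == y:
        return phi
    # Try to replace x by y first, then y by x, mirroring the case order above.
    for var, val in ((x, y), (y, x)):
        if isinstance(var, str):          # only range boundary variables can be replaced
            psi = tuple(tuple(val if b == var else b for b in pair) for pair in phi)
            if is_range_constraint_vector(psi, n):
                return psi
    return None


if __name__ == "__main__":
    phi = (("r1", "r2"), ("r2", "r3"))    # X1 X2 adjacent: phi(1).r = phi(2).l
    phi = update(phi, "r1", 0, 5)
    print(phi)
    phi = update(phi, "r2", 2, 5)
    print(phi)
```

Because adjacent variables share a boundary variable, fixing `r2` instantiates the right boundary of the first range and the left boundary of the second at the same time.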
