Basic Parsing Algorithms – Chart Parsing
  Seminar Recent Advances in Parsing Technology, WS 2011/2012
  Anna Schmidt
Talk Outline
  Chart Parsing – Basics
  Chart Parsing – Algorithms
  – Earley Algorithm
  – CKY Algorithm
    → Basics
    → BitPar: Efficient Implementation of CKY
Chart Parsing – Basics
Chart Parsing – Basics
  First proposed by Martin Kay
  Dynamic programming approach
  – Partial results of the computation are stored and reused later if needed
  → The same subproblem is not solved more than once
  Operates on a CFG
  Functionality: recogniser or parser; this talk focuses on the recogniser functionality
Main Components
  Chart
  Edges
  Agenda
Component: Chart
  Is a well-formed substring table (WFST)
  – Stores partial and complete analyses of substrings
  – Information is stored in one triangular half of a two-dimensional array of size (n+1)*(n+1) or n*n (n = number of input words)
  Can also be understood as a (directed) graph
  – Vertices: positions between input words, e.g. 0 Mary 1 feeds 2 the 3 otter 4
  – Edges connect the vertices
  Allows no duplicate entries
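The chart described above can be sketched in Python as a table of cells indexed by (start, end) vertex pairs. This is only an illustrative sketch: the class name `Chart` and its methods `add` and `get` are assumptions, not taken from the talk. Each cell is a set, so duplicate entries are rejected.

```python
from collections import defaultdict


class Chart:
    """Well-formed substring table: cells indexed by (start, end) vertices."""

    def __init__(self, n_words):
        self.n = n_words                 # vertices are numbered 0..n
        self.cells = defaultdict(set)    # (start, end) -> set of entries

    def add(self, start, end, entry):
        """Store an entry; return False if it is already in the cell."""
        if not (0 <= start <= end <= self.n):
            raise ValueError("only the triangle with start <= end is used")
        cell = self.cells[(start, end)]
        if entry in cell:
            return False                 # no duplicate entries
        cell.add(entry)
        return True

    def get(self, start, end):
        """Return the entries spanning start..end (empty set if none)."""
        return self.cells.get((start, end), set())


words = "Mary feeds the otter".split()
chart = Chart(len(words))
first = chart.add(0, 1, "N")     # "Mary" spans vertices 0..1
again = chart.add(0, 1, "N")     # same entry again: rejected
chart.add(2, 4, "NP")            # "the otter" spans vertices 2..4
```

A dictionary of sets only materialises the cells that are actually filled, which is one simple way to use just the triangular half of the array.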
Component: Edge
  Data structure storing information about a particular step in the parsing process
  Edges inhabit the cells of the chart
  An edge contains
  – Start and end position in the input string
  – A dotted rule
  – Optionally, an edge probability
Component: Edge
  A dotted rule consists of
  – Left-hand side (LHS) = a non-terminal symbol
  – Right-hand side (RHS) = a sequence of non-terminal or terminal symbols
  – A dot among the RHS symbols indicating which constituents have already been found
  Edges can be
  – Active / incomplete: the dot is not the last element of the RHS
  – Inactive / complete: the dot is the last element of the RHS
  Example: S → NP • VP (0,1) is an active edge: an NP has been found between positions 0 and 1, and a VP is still needed
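An edge with its dotted rule can be sketched as a small immutable record. The following is a minimal Python sketch; the class name `Edge`, its field names, and the helper methods are illustrative assumptions, not part of the talk.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Edge:
    lhs: str                 # left-hand side: a non-terminal
    rhs: Tuple[str, ...]     # right-hand side: sequence of symbols
    dot: int                 # number of RHS symbols already found
    start: int               # start position in the input
    end: int                 # end position in the input

    def is_complete(self):
        """Inactive / complete: the dot is behind the last RHS symbol."""
        return self.dot == len(self.rhs)

    def next_symbol(self):
        """Symbol right after the dot, or None for a complete edge."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)} ({self.start},{self.end})"


active = Edge("S", ("NP", "VP"), dot=1, start=0, end=1)
inactive = Edge("N", ("Mary",), dot=1, start=0, end=1)
print(active)     # S → NP • VP (0,1)
print(inactive)   # N → Mary • (0,1)
```

Making the record frozen keeps edges hashable, so they can be stored in sets, which fits the requirement that the chart holds no duplicates.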
Component: Agenda
  Organises the order in which tasks are executed
  All tasks (edges) are collected here before being put on the chart
  The ordering of the agenda determines what is processed first
  → It therefore also determines which parse is found first
  – Possible orderings: queue, stack, ordering with respect to probabilities, …
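The effect of the agenda ordering can be illustrated with a small Python sketch that supports the three orderings named above. The class name `Agenda`, the strategy names, and the helper `drain` are hypothetical choices made for this illustration; edges are stood in for by plain strings.

```python
import heapq
from collections import deque


class Agenda:
    """Holds pending edges; the strategy decides which one is popped next."""

    def __init__(self, strategy="queue"):
        if strategy not in ("queue", "stack", "best-first"):
            raise ValueError(strategy)
        self.strategy = strategy
        self.items = [] if strategy == "best-first" else deque()
        self.counter = 0   # tie-breaker so the heap never compares edges

    def push(self, edge, prob=1.0):
        if self.strategy == "best-first":
            # heapq is a min-heap, so negate the probability
            heapq.heappush(self.items, (-prob, self.counter, edge))
            self.counter += 1
        else:
            self.items.append(edge)

    def pop(self):
        if self.strategy == "queue":
            return self.items.popleft()      # first in, first out
        if self.strategy == "stack":
            return self.items.pop()          # last in, first out
        return heapq.heappop(self.items)[2]  # most probable first

    def __bool__(self):
        return bool(self.items)


def drain(strategy):
    """Push three dummy edges and return the order in which they come out."""
    agenda = Agenda(strategy)
    for edge, prob in [("a", 0.2), ("b", 0.7), ("c", 0.1)]:
        agenda.push(edge, prob)
    order = []
    while agenda:
        order.append(agenda.pop())
    return order


print(drain("queue"))       # ['a', 'b', 'c']
print(drain("stack"))       # ['c', 'b', 'a']
print(drain("best-first"))  # ['b', 'a', 'c']
```

The same three edges leave the agenda in three different orders, which is why the ordering influences which parse is found first.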
Parsing Strategies
  Kay differentiates parsing strategies along two dimensions:
  – Bottom-up versus top-down
  – Directed versus undirected
  Directed bottom-up
  – Only build edges for phrases that can actually be incorporated into a higher-level structure
  → Left-Corner Parser
  Directed top-down
  – Only build a new (active) edge if the next word of the input can be used to extend such an edge
  → Earley
  Undirected varieties: no such restrictions
  → Undirected bottom-up: CKY
Parsing Strategies
  Ways of achieving directedness:
  Reachability table
  – Contains for each non-terminal A the set of all symbols that can be the first element of a string dominated by A
  – For example: NP can start with DET, N, ADJ, but not with V
  Rule selection table
  – Table with one row per non-terminal excluding pre-terminals (M) and one column per non-terminal (N)
  – Each cell contains all grammar rules applicable in a situation where M is the 'upper' and N is the 'lower' symbol
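A reachability table of this kind can be computed as a closure over the first symbols of the rules. The sketch below is one possible way to do this in Python, assuming a grammar without empty right-hand sides; the function name `reachability` and the small grammar (which has no ADJ rules) are illustrative assumptions.

```python
def reachability(grammar):
    """Map each non-terminal to all symbols that can start a string it dominates.

    Assumes that no rule has an empty right-hand side.
    """
    # Start with the first RHS symbol of every rule.
    table = {lhs: {rhs[0] for rhs in rules} for lhs, rules in grammar.items()}
    changed = True
    while changed:                       # repeat until nothing new is added
        changed = False
        for lhs, firsts in table.items():
            for sym in list(firsts):
                # Whatever can start `sym` can also start `lhs`.
                new = table.get(sym, set()) - firsts
                if new:
                    firsts |= new
                    changed = True
    return table


# Phrase-structure rules only; N, DET and V are treated as pre-terminals.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("N",), ("DET", "N")],
    "VP": [("V", "NP")],
}

table = reachability(GRAMMAR)
print(sorted(table["NP"]))   # ['DET', 'N']
print(sorted(table["S"]))    # ['DET', 'N', 'NP']
```

With this grammar an NP can start with DET or N but not with V, so a directed parser could use the table to skip rules that cannot match the next word.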
Chart Parsing: Advantages
  No repeated computation of the same subproblem
  Deals well with left-recursive grammars
  Deals well with ambiguity
  No backtracking necessary
Earley Algorithm
Earley Algorithm
  Proposed by Jay Earley
  Top-down search
  Can handle all CFGs
  Efficient:
  – O(n^3) in the general case
  – Faster for particular types of grammar, e.g. O(n^2) for unambiguous grammars
Terminology
  In his paper, Earley does not use the notion of a 'chart'
  He represents the parsing process as sets of states
  – Index of each state set = end position of all states in the set
  – A state largely corresponds to an edge
    - Contains a dotted rule
    - Contains a pointer to the start position
    - End position can be derived from the state set
Terminology
  The two formalisms are very similar
  Examples are easier to follow when represented in charts
  So we will stick with 'chart' representations
Algorithm – Components
  Initialization
  Predictor
  Scanner
  Completer
  The algorithm operates on one half of an array of size (n+1)*(n+1)
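How the four components fit together can be sketched as a small Earley recogniser in Python. This is a minimal sketch under several assumptions, not a definitive implementation: the grammar is read off the talk's example sentence, lexical entries are written as ordinary rules, the top rule X → S eos and the end marker `eos` follow the example, no rule has an empty right-hand side, and the names `GRAMMAR` and `earley_recognise` are hypothetical. States are stored in Earley's style as sets indexed by end position.

```python
GRAMMAR = {
    "X": [("S", "eos")],
    "S": [("NP", "VP")],
    "NP": [("N",), ("DET", "N")],
    "VP": [("V", "NP")],
    "N": [("Mary",), ("otter",)],
    "DET": [("the",)],
    "V": [("feeds",)],
}


def earley_recognise(words, grammar=GRAMMAR, top="X"):
    """Return True if words followed by 'eos' can be derived from `top`.

    A state is (lhs, rhs, dot, start); sets[i] holds the states ending at i.
    """
    tokens = list(words) + ["eos"]
    sets = [[] for _ in range(len(tokens) + 1)]

    def add(i, state):
        if state not in sets[i]:            # no duplicate entries
            sets[i].append(state)

    # Initialisation: put the top rule with the dot at the front into set 0.
    for rhs in grammar[top]:
        add(0, (top, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        for lhs, rhs, dot, start in sets[i]:    # the list grows while iterating
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in grammar:
                    # Predictor: expand the non-terminal after the dot.
                    for new_rhs in grammar[nxt]:
                        add(i, (nxt, new_rhs, 0, i))
                elif i < len(tokens) and tokens[i] == nxt:
                    # Scanner: the terminal after the dot matches the input.
                    add(i + 1, (lhs, rhs, dot + 1, start))
            else:
                # Completer: advance states that were waiting for `lhs`.
                for l2, r2, d2, s2 in sets[start]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, s2))

    return any(
        lhs == top and dot == len(rhs) and start == 0
        for lhs, rhs, dot, start in sets[len(tokens)]
    )


print(earley_recognise("Mary feeds the otter".split()))   # True
print(earley_recognise("Mary feeds".split()))             # False
```

The sketch only recognises; turning it into a parser would additionally require recording which completed states were used to advance each dot.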
Initialise (input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5; chart cells are written [start,end])
  [0,0] X → • S eos
Predict (input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5)
  [0,0] X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
Scan (input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5)
  [0,0] X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
  [0,1] N → Mary •
Complete (input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5)
  [0,0] X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
  [0,1] N → Mary •; NP → N •; S → NP • VP
Predict (input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5)
  [0,0] X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
  [0,1] N → Mary •; NP → N •; S → NP • VP
  [1,1] VP → • V NP; V → • feeds
Scan (input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5)
  [0,0] X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
  [0,1] N → Mary •; NP → N •; S → NP • VP
  [1,1] VP → • V NP; V → • feeds
  [1,2] V → feeds •
Complete (input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5)
  [0,0] X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
  [0,1] N → Mary •; NP → N •; S → NP • VP
  [1,1] VP → • V NP; V → • feeds
  [1,2] V → feeds •; VP → V • NP
Predict (input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5)
  [0,0] X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
  [0,1] N → Mary •; NP → N •; S → NP • VP
  [1,1] VP → • V NP; V → • feeds
  [1,2] V → feeds •; VP → V • NP
  [2,2] NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
Scan (input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5)
  [0,0] X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
  [0,1] N → Mary •; NP → N •; S → NP • VP
  [1,1] VP → • V NP; V → • feeds
  [1,2] V → feeds •; VP → V • NP
  [2,2] NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
  [2,3] DET → the •
Complete (input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5)
  [0,0] X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
  [0,1] N → Mary •; NP → N •; S → NP • VP
  [1,1] VP → • V NP; V → • feeds
  [1,2] V → feeds •; VP → V • NP
  [2,2] NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
  [2,3] DET → the •; NP → DET • N
Predict (input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5)
  [0,0] X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
  [0,1] N → Mary •; NP → N •; S → NP • VP
  [1,1] VP → • V NP; V → • feeds
  [1,2] V → feeds •; VP → V • NP
  [2,2] NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
  [2,3] DET → the •; NP → DET • N
  [3,3] N → • Mary; N → • otter
Scan (input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5)
  [0,0] X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
  [0,1] N → Mary •; NP → N •; S → NP • VP
  [1,1] VP → • V NP; V → • feeds
  [1,2] V → feeds •; VP → V • NP
  [2,2] NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
  [2,3] DET → the •; NP → DET • N
  [3,3] N → • Mary; N → • otter
  [3,4] N → otter •
Complete (input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5)
  [0,0] X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
  [0,1] N → Mary •; NP → N •; S → NP • VP
  [1,1] VP → • V NP; V → • feeds
  [1,2] V → feeds •; VP → V • NP
  [2,2] NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
  [2,3] DET → the •; NP → DET • N
  [2,4] NP → DET N •
  [3,3] N → • Mary; N → • otter
  [3,4] N → otter •
Complete (input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5)
  [0,0] X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
  [0,1] N → Mary •; NP → N •; S → NP • VP
  [1,1] VP → • V NP; V → • feeds
  [1,2] V → feeds •; VP → V • NP
  [1,4] VP → V NP •
  [2,2] NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
  [2,3] DET → the •; NP → DET • N
  [2,4] NP → DET N •
  [3,3] N → • Mary; N → • otter
  [3,4] N → otter •