


  1. Basic Parsing Algorithms – Chart Parsing
     Seminar: Recent Advances in Parsing Technology, WS 2011/2012
     Anna Schmidt

  2. Talk Outline
     - Chart Parsing – Basics
     - Chart Parsing – Algorithms
       - Earley Algorithm
       - CKY Algorithm
         → Basics
         → BitPar: Efficient Implementation of CKY

  3. Chart Parsing – Basics

  4. Chart Parsing – Basics
     - First proposed by Martin Kay
     - Dynamic programming approach
       - Partial results of the computation are stored and reused later if needed
         → The same problem is not solved more than once
     - Operates on a CFG
     - Functionality: recogniser or parser (this talk focuses on the recogniser functionality)

  5. Main Components
     - Chart
     - Edges
     - Agenda

  6. Component: Chart
     - Is a well-formed substring table (WFST)
       - Stores partial and complete analyses of substrings
       - Information is stored in one triangular half of a two-dimensional array of size (n+1)*(n+1) or n*n, where n is the number of input words
     - Can also be understood as a (directed) graph
       - Vertices: positions between input words, e.g. 0 Mary 1 feeds 2 the 3 otter 4
       - Edges connect the vertices
     - Allows no duplicate entries

  7. Component: Edge
     - Data structure storing information about a particular step in the parsing process
     - Edges inhabit the cells of the chart
     - An edge contains
       - Start and end position in the input string
       - A dotted rule
       - Optionally, an edge probability

  8. Component: Edge
     - A dotted rule consists of
       - Left-hand side (LHS) = a non-terminal symbol
       - Right-hand side (RHS) = a sequence of non-terminal and/or terminal symbols
       - A dot between RHS symbols indicating which constituents have already been found
     - Edges can be
       - Active / incomplete: the dot is not the last element of the RHS
       - Inactive / complete: the dot is the last element of the RHS
     - Example: S → NP • VP (0,1) is an active edge spanning positions 0 to 1; NP has been found, VP is still expected
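The edge description above can be made concrete with a small data structure. The following is a minimal sketch, not taken from the talk: the class name `Edge` and its field names are assumptions chosen for illustration. Making the edge immutable and hashable lets a Python set play the role of a chart that allows no duplicate entries.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)  # frozen => hashable, so edges can live in a set
class Edge:
    lhs: str               # left-hand side: a non-terminal
    rhs: Tuple[str, ...]   # right-hand side: sequence of symbols
    dot: int               # number of RHS symbols already found
    start: int             # start position in the input
    end: int               # end position in the input

    def is_complete(self) -> bool:
        """Inactive edge: the dot is the last element of the RHS."""
        return self.dot == len(self.rhs)

    def next_symbol(self) -> Optional[str]:
        """Symbol right after the dot, or None for a complete edge."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)} ({self.start},{self.end})"


# The example edge from the slide.
edge = Edge("S", ("NP", "VP"), dot=1, start=0, end=1)

# Adding an identical edge a second time does not create a duplicate.
chart = {edge, Edge("S", ("NP", "VP"), 1, 0, 1)}
```

Here `edge.is_complete()` is false and `edge.next_symbol()` yields `VP`, matching the slide's notion of an active edge.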

  9. Component: Agenda
     - Organises the order in which tasks are executed
     - Here, all tasks (edges) are collected before being put on the chart
     - The ordering of the agenda determines what is processed first
       → and therefore also which parse is found first
       - Possible orderings: queue, stack, ordering with respect to probabilities, …
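How the agenda ordering changes the processing order can be seen in a small sketch. This is an illustration under assumptions, not the talk's implementation: `run_agenda` and `expand` are hypothetical names, and the tasks are placeholder strings rather than real edges.

```python
from collections import deque


def run_agenda(initial, expand, strategy="queue"):
    """Process tasks from an agenda; return them in the order they reach the chart.

    strategy="queue" takes the oldest task first, "stack" the newest first.
    """
    agenda = deque(initial)
    chart = []    # tasks in the order they were put on the chart
    seen = set()  # the chart allows no duplicate entries
    while agenda:
        task = agenda.popleft() if strategy == "queue" else agenda.pop()
        if task in seen:
            continue
        seen.add(task)
        chart.append(task)
        agenda.extend(expand(task))  # new tasks triggered by this one
    return chart


# Toy dependency structure standing in for "edges that create further edges".
CONSEQUENCES = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}

queue_order = run_agenda(["a"], CONSEQUENCES.__getitem__, "queue")
stack_order = run_agenda(["a"], CONSEQUENCES.__getitem__, "stack")
```

With the queue the tasks are processed as a, b, c, d; with the stack as a, c, b, d. A probability-ordered agenda would replace the deque with a priority queue.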

  10. Parsing Strategies
      - Kay differentiates parsing strategies along two dimensions:
        - Bottom-up versus top-down
        - Directed versus undirected
      - Directed bottom-up
        - Only build edges for phrases that can actually be incorporated into a higher-level structure
          → Left-corner parser
      - Directed top-down
        - Only build a new (active) edge if the next word of the input can be used to extend such an edge
          → Earley
      - Undirected varieties: no such restrictions
        → Undirected bottom-up: CKY

  11. Parsing Strategies: ways of achieving directedness
      - Reachability table
        - Contains, for each non-terminal N, the set of all symbols that can be the first element of a string dominated by N
        - For example: NP can start with DET, N or ADJ, but not with V
      - Rule selection table
        - M*N table, where M ranges over the non-terminals excluding pre-terminals and N over all non-terminals
        - Each cell contains all grammar rules applicable in a situation where M is the 'upper' and N is the 'lower' symbol
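A reachability table of the kind described above can be computed as a fixed point over the grammar rules. The sketch below is one possible way to do this, assuming a grammar without empty right-hand sides; the function name `reachability` and the rule encoding are illustrative choices, and the rules are the phrase-structure rules of the talk's example sentence grammar.

```python
def reachability(rules):
    """Map each non-terminal to every symbol that can start a string it dominates.

    `rules` is a list of (lhs, rhs) pairs with a non-empty tuple as rhs.
    """
    table = {lhs: set() for lhs, _ in rules}
    for lhs, rhs in rules:
        table[lhs].add(rhs[0])  # direct left corner
    changed = True
    while changed:              # propagate until nothing new is added
        changed = False
        for lhs in table:
            for sym in list(table[lhs]):
                new = table.get(sym, set()) - table[lhs]
                if new:
                    table[lhs] |= new
                    changed = True
    return table


RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("N",)),
    ("NP", ("DET", "N")),
    ("VP", ("V", "NP")),
]

table = reachability(RULES)
```

For this grammar, `table["NP"]` contains DET and N but not V, in line with the slide's example (the example grammar has no ADJ).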

  12. Chart Parsing: Advantages
      - No repeated computation of the same subproblem
      - Deals well with left-recursive grammars
      - Deals well with ambiguity
      - No backtracking necessary

  13. Earley Algorithm

  14. Earley Algorithm
      - Proposed by Jay Earley
      - Top-down search
      - Can handle all CFGs
      - Efficient:
        - O(n^3) in the general case, where n is the length of the input
        - Faster for particular types of grammar, e.g. O(n^2) for unambiguous grammars

  15. Terminology
      - In his paper, Earley does not use the notion of a 'chart'
      - He represents the parsing process as sets of states
        - Index of each state set = end position of all states in the set
        - A state largely corresponds to an edge
          - Contains a dotted rule
          - Contains a pointer to the start position
          - The end position can be derived from the state set

  16. Terminology
      - The two formalisms are very similar
      - Examples are easier to follow when represented in charts
      - So we will stick with 'chart' representations

  17. Algorithm – Components
      - Initialization
      - Predictor
      - Scanner
      - Completer
      - The algorithm operates on one half of an array of size (n+1)*(n+1)

  18. Initialise
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart cells [start,end]:
      - [0,0]: X → • S eos

  19. Predict
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart cells [start,end]:
      - [0,0]: X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the

  20. Scan
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart cells [start,end]:
      - [0,0]: X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      - [0,1]: N → Mary •

  21. Complete
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart cells [start,end]:
      - [0,0]: X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      - [0,1]: N → Mary •; NP → N •; S → NP • VP

  22. Predict
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart cells [start,end]:
      - [0,0]: X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      - [0,1]: N → Mary •; NP → N •; S → NP • VP
      - [1,1]: VP → • V NP; V → • feeds

  23. Scan
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart cells [start,end]:
      - [0,0]: X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      - [0,1]: N → Mary •; NP → N •; S → NP • VP
      - [1,1]: VP → • V NP; V → • feeds
      - [1,2]: V → feeds •

  24. Complete
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart cells [start,end]:
      - [0,0]: X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      - [0,1]: N → Mary •; NP → N •; S → NP • VP
      - [1,1]: VP → • V NP; V → • feeds
      - [1,2]: V → feeds •; VP → V • NP

  25. Predict
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart cells [start,end]:
      - [0,0]: X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      - [0,1]: N → Mary •; NP → N •; S → NP • VP
      - [1,1]: VP → • V NP; V → • feeds
      - [1,2]: V → feeds •; VP → V • NP
      - [2,2]: NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the

  26. Scan
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart cells [start,end]:
      - [0,0]: X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      - [0,1]: N → Mary •; NP → N •; S → NP • VP
      - [1,1]: VP → • V NP; V → • feeds
      - [1,2]: V → feeds •; VP → V • NP
      - [2,2]: NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      - [2,3]: DET → the •

  27. Complete
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart cells [start,end]:
      - [0,0]: X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      - [0,1]: N → Mary •; NP → N •; S → NP • VP
      - [1,1]: VP → • V NP; V → • feeds
      - [1,2]: V → feeds •; VP → V • NP
      - [2,2]: NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      - [2,3]: DET → the •; NP → DET • N

  28. Predict
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart cells [start,end]:
      - [0,0]: X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      - [0,1]: N → Mary •; NP → N •; S → NP • VP
      - [1,1]: VP → • V NP; V → • feeds
      - [1,2]: V → feeds •; VP → V • NP
      - [2,2]: NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      - [2,3]: DET → the •; NP → DET • N
      - [3,3]: N → • Mary; N → • otter

  29. Scan
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart cells [start,end]:
      - [0,0]: X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      - [0,1]: N → Mary •; NP → N •; S → NP • VP
      - [1,1]: VP → • V NP; V → • feeds
      - [1,2]: V → feeds •; VP → V • NP
      - [2,2]: NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      - [2,3]: DET → the •; NP → DET • N
      - [3,3]: N → • Mary; N → • otter
      - [3,4]: N → otter •

  30. Complete
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart cells [start,end]:
      - [0,0]: X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      - [0,1]: N → Mary •; NP → N •; S → NP • VP
      - [1,1]: VP → • V NP; V → • feeds
      - [1,2]: V → feeds •; VP → V • NP
      - [2,2]: NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      - [2,3]: DET → the •; NP → DET • N
      - [2,4]: NP → DET N •
      - [3,3]: N → • Mary; N → • otter
      - [3,4]: N → otter •

  31. Complete
      Input: 0 Mary 1 feeds 2 the 3 otter 4 eos 5
      Chart cells [start,end]:
      - [0,0]: X → • S eos; S → • NP VP; NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      - [0,1]: N → Mary •; NP → N •; S → NP • VP
      - [1,1]: VP → • V NP; V → • feeds
      - [1,2]: V → feeds •; VP → V • NP
      - [1,4]: VP → V NP •
      - [2,2]: NP → • N; NP → • DET N; N → • Mary; N → • otter; DET → • the
      - [2,3]: DET → the •; NP → DET • N
      - [2,4]: NP → DET N •
      - [3,3]: N → • Mary; N → • otter
      - [3,4]: N → otter •
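The Initialise, Predict, Scan and Complete steps traced on the slides above can be put together into a small recogniser. The following is a sketch of the textbook state-set formulation without lookahead, not the talk's own code. It assumes a grammar without empty rules, and the names `GRAMMAR` and `earley_recognise` are illustrative. The grammar is read off the example charts; a state `(lhs, rhs, dot, origin)` stored in `sets[j]` corresponds to an edge from `origin` to `j`.

```python
# Grammar read off the example charts; any symbol without rules is a terminal.
GRAMMAR = {
    "X": [("S", "eos")],
    "S": [("NP", "VP")],
    "NP": [("N",), ("DET", "N")],
    "VP": [("V", "NP")],
    "N": [("Mary",), ("otter",)],
    "DET": [("the",)],
    "V": [("feeds",)],
}


def earley_recognise(words, grammar=GRAMMAR, start="X"):
    """Return True if `words` can be derived from `start` (no empty rules assumed)."""
    n = len(words)
    # sets[j] holds states (lhs, rhs, dot, origin) that end at position j.
    sets = [[] for _ in range(n + 1)]

    def add(j, state):
        if state not in sets[j]:  # no duplicate entries
            sets[j].append(state)

    # Initialise: start rule with the dot at the far left.
    for rhs in grammar[start]:
        add(0, (start, rhs, 0, 0))

    for j in range(n + 1):
        i = 0
        while i < len(sets[j]):  # sets[j] may grow while it is processed
            lhs, rhs, dot, origin = sets[j][i]
            i += 1
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Predictor: expand the non-terminal after the dot.
                    for prod in grammar[sym]:
                        add(j, (sym, prod, 0, j))
                elif j < n and words[j] == sym:
                    # Scanner: the terminal after the dot matches the next word.
                    add(j + 1, (lhs, rhs, dot + 1, origin))
            else:
                # Completer: advance states that were waiting for `lhs`.
                for l2, r2, d2, o2 in list(sets[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(j, (l2, r2, d2 + 1, o2))

    return any((start, rhs, len(rhs), 0) in sets[n] for rhs in grammar[start])


accepted = earley_recognise(["Mary", "feeds", "the", "otter", "eos"])
rejected = earley_recognise(["Mary", "the", "feeds", "otter", "eos"])
```

The example sentence is accepted, while the scrambled word order is rejected because no state predicted after position 1 can scan "the". Using lists for the state sets keeps the processing order visible; a set-based index would be the usual choice for larger grammars.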
