CSCI-2320 Syntactic Analysis (Ch 3 & Wikipedia for CYK)



  1. 10/3/17
     CSCI-2320 Syntactic Analysis (Ch 3 & Wikipedia for CYK)
     Mohammad T. Irfan

     AKA "parser"
         Stream of tokens → Parser → Parse tree / syntax error
     Question: What is the grammar?

  2. Parsing algorithms
     - Predictive parser (as opposed to backtracking)
     - Recursive descent (RD) parser: each nonterminal is a function that recognizes input derivable from that nonterminal
     - Top-down
     - LL(1): left-to-right scan, leftmost derivation, and 1 token of look-ahead

     RD parser for assignment stmt
         Assignment → Id = Expr;
         Expr → Term {AddOp Term}
         AddOp → + | -
         Term → Factor {MulOp Factor}
         MulOp → * | /
         Factor → [UnaryOp] Primary
         UnaryOp → -
         Primary → Id | IntLiteral | FloatLiteral | (Expr)
         ... <Lexical syntax for Id, IntLiteral, FloatLiteral> ...

  3. Python code for smaller version
         Expr → Term {(+|-) Term}
         Term → Factor {(*|/) Factor}
         Factor → IntLiteral
         ... <Lexical syntax for IntLiteral> ...
     - Code is available on Blackboard under Assignment 2
     - parser_v1.py: Only checks for syntactic correctness (expression evaluation comes later, when we do semantics)

     Requirements for RD parser
     1. Remove left recursions (why?)
     2. Do "left factoring"
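
The smaller grammar above maps almost line for line onto a recursive descent recognizer. The following is only a minimal sketch, not the course's parser_v1.py: the names `tokenize`, `Parser`, `ParseError`, and `is_valid_expr` are hypothetical, and it assumes that IntLiteral means one or more ASCII digits, since the lexical syntax is elided on the slide.

```python
import re


class ParseError(Exception):
    """Raised when the token stream does not match the grammar."""


def tokenize(text):
    # Assumption: IntLiteral is one or more ASCII digits; whitespace is skipped.
    # Any other non-space character becomes its own token and is rejected later.
    return re.findall(r"[0-9]+|\S", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # The single token of look-ahead; None marks the end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        self.pos += 1

    def expr(self):
        # Expr -> Term {(+|-) Term}
        self.term()
        while self.peek() in ("+", "-"):
            self.advance()
            self.term()

    def term(self):
        # Term -> Factor {(*|/) Factor}
        self.factor()
        while self.peek() in ("*", "/"):
            self.advance()
            self.factor()

    def factor(self):
        # Factor -> IntLiteral
        tok = self.peek()
        if tok is None or not re.fullmatch(r"[0-9]+", tok):
            raise ParseError(f"expected IntLiteral at token {self.pos}, got {tok!r}")
        self.advance()


def is_valid_expr(text):
    """Return True iff the whole input is derivable from Expr."""
    parser = Parser(tokenize(text))
    try:
        parser.expr()
    except ParseError:
        return False
    return parser.peek() is None  # leftover tokens are a syntax error


print(is_valid_expr("2 - 3 * 4"))  # True
print(is_valid_expr("2 + 3 * /"))  # False
```

Each nonterminal is one method, and the `{...}` repetition in the grammar becomes a `while` loop driven by the one-token look-ahead, which is why no backtracking is needed.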

  4. Removing left recursion
     - Example
     - Algorithm (assume no cycle; i.e., no A ⇒⁺ A)
           Nonterminals: A_1, A_2, ..., A_n (ordered arbitrarily)
           For each i:
               For each j < i:
                   Let A_j → δ_1 | δ_2 | ... | δ_k      (no left recursion here)
                   Replace each A_i → A_j γ by
                       A_i → δ_1 γ | δ_2 γ | ... | δ_k γ
               Eliminate left recursion from all A_i productions

     Left factoring
     - IfStmt → if Expr then Stmt
     - IfStmt → if Expr then Stmt else Stmt
     - Why can't an RD parser deal with it?
     - Solution
       - Find the largest common prefix α and factor it out
             A → α β_1 | α β_2
         becomes
             A → α A'
             A' → β_1 | β_2
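
Both transformations can be sketched on a small grammar representation. This is a sketch under stated assumptions, not the algorithm exactly as presented in class: productions are tuples of symbols, the empty tuple `()` stands for ε, and the helper names are hypothetical. The first function applies the standard textbook rewrite for immediate left recursion (A → A α | β becomes A → β A', A' → α A' | ε), which is one common way to do the "eliminate left recursion from all A_i productions" step. The second factors the largest common prefix out of two alternatives.

```python
def remove_immediate_left_recursion(nt, productions):
    """Rewrite A -> A a1 | ... | b1 | ...  as  A -> b1 A' | ...,  A' -> a1 A' | ... | eps."""
    new_nt = nt + "'"
    # alpha parts: what follows the leading A in each left-recursive production
    alphas = [p[1:] for p in productions if p and p[0] == nt]
    # beta parts: productions that do not start with A
    betas = [p for p in productions if not p or p[0] != nt]
    if not alphas:
        return {nt: list(productions)}  # nothing to do
    return {
        nt: [beta + (new_nt,) for beta in betas],
        new_nt: [alpha + (new_nt,) for alpha in alphas] + [()],  # () is epsilon
    }


def common_prefix(p1, p2):
    """Longest sequence of symbols shared at the start of both productions."""
    n = 0
    while n < len(p1) and n < len(p2) and p1[n] == p2[n]:
        n += 1
    return p1[:n]


def left_factor_pair(nt, p1, p2):
    """Rewrite A -> alpha b1 | alpha b2  as  A -> alpha A',  A' -> b1 | b2."""
    alpha = common_prefix(p1, p2)
    if not alpha:
        return {nt: [p1, p2]}  # no common prefix, nothing to factor
    new_nt = nt + "'"
    return {
        nt: [alpha + (new_nt,)],
        new_nt: [p1[len(alpha):], p2[len(alpha):]],
    }


# Expr -> Expr + Term | Term
print(remove_immediate_left_recursion("Expr", [("Expr", "+", "Term"), ("Term",)]))

# IfStmt -> if Expr then Stmt | if Expr then Stmt else Stmt
print(left_factor_pair(
    "IfStmt",
    ("if", "Expr", "then", "Stmt"),
    ("if", "Expr", "then", "Stmt", "else", "Stmt"),
))
```

For the IfStmt pair, the factored grammar is IfStmt → if Expr then Stmt IfStmt' with IfStmt' → ε | else Stmt, so the parser only has to decide after it has already consumed the shared prefix.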

  5. Literature review
     - NP-hard: Given a CFG, is there an LL(1) parser?
     - Impossibility example: $L_G = \{a^n 0 b^n \mid n \ge 1\} \cup \{a^n 1 b^{2n} \mid n \ge 1\}$
     - Why is an LL(1) parser impossible?

     Literature review
     - Is there always a parser (not necessarily LL(1)) for any CFG?
     - CYK algorithm: Cocke & Younger (1967) and Kasami (1965)
       - First parser for any CFG
       - Bottom-up parser
     - Frost (2007): First top-down parser for any CFG; improved by Ridge (2014)

  6. CYK Parsing Algorithm
     https://en.wikipedia.org/wiki/CYK_algorithm

     What it does
     - Given (1) a CFG and (2) a string, verifies whether the string can be derived by this grammar
     - Example
       - Detects syntactic errors in a given C program

  7. Requirements
     - CFG must be in Chomsky Normal Form (CNF)
           A → BC
           A → a
       - No ε in any production
       - OK to have left recursion!
       - Left factoring is out of the question (why?)

     Idea
     - Bottom-up approach + dynamic programming
     - Start with individual symbols of the input string
     - Combine multiple symbols together
       - 2 symbols
       - 3 symbols
       - ...
     - Climb up the grammar hierarchy
     - Yes answer to parsing ⇔ we can get to the start symbol
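
The CNF requirement above is easy to check mechanically. The following is a small sketch with a hypothetical helper `is_cnf`; it assumes a grammar is stored as a dict from each nonterminal to a list of productions (tuples of symbols), and that any symbol that is not a key of the dict is a terminal. Both example grammars are illustrative.

```python
def is_cnf(grammar):
    """Return True iff every production has the form A -> B C or A -> a."""
    nonterminals = set(grammar)  # assumption: every nonterminal has an entry
    for bodies in grammar.values():
        for body in bodies:
            if len(body) == 1 and body[0] not in nonterminals:
                continue  # A -> a (a single terminal)
            if len(body) == 2 and all(sym in nonterminals for sym in body):
                continue  # A -> B C (exactly two nonterminals)
            return False  # epsilon, unit, mixed, or too-long production
    return True


# Illustrative grammar that is already in CNF.
in_cnf = {
    "S": [("A", "X"), ("A", "B")],
    "X": [("S", "B")],
    "A": [("0",)],
    "B": [("1",)],
}

# An equivalent illustrative grammar that is not: bodies mix terminals and nonterminals.
not_cnf = {
    "S": [("0", "S", "1"), ("0", "1")],
}

print(is_cnf(in_cnf))   # True
print(is_cnf(not_cnf))  # False
```

Note that a left-recursive production such as Expr → Expr X passes this check, which matches the point that left recursion is fine for CYK.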

  8. CYK example
     - Input CFG
           Expr → Expr + Term | Expr - Term | Term
           Term → Term * Factor | Term / Factor | Factor
           Factor → 0 | 1 | ... | 9
     - CNF
           Expr → Expr X
           X → AddOp Term
           AddOp → + | -
           Expr → Term Y        # Avoid bypassing Expr → Term → ...
           Term → Term Y
           Y → MultOp Factor
           MultOp → * | /
           Factor → 0 | 1 | ... | 9
           Term → 0 | 1 | ... | 9
           Expr → 0 | 1 | ... | 9

     CYK example (cont...)
     - Input string: 2 - 3 * 4
     - Table for the CNF grammar above (∅ marks an empty cell):

           Length
             5    | Expr               |       |                    |        |
             4    | ∅                  | X     |                    |        |
             3    | Expr               | ∅     | Expr, Term         |        |
             2    | ∅                  | X     | ∅                  | Y      |
             1    | Expr, Term, Factor | AddOp | Expr, Term, Factor | MultOp | Expr, Term, Factor
           Token  | 2                  | -     | 3                  | *      | 4
           Start
           index j| 1                  | 2     | 3                  | 4      | 5

  9. CYK Algorithm
         Inputs: CNF grammar and n tokens
         (Cell (j, i) holds the nonterminals that derive the i tokens starting at index j.)
         Fill in the row for length 1
         For each length i from 2 to n:
             For each start index j from 1 to n-i+1:
                 For k = length of B from 1 to i-1:        (looking for A → BC)
                     If there's a production A → BC s.t.
                        B is in cell (j, k) and
                        C is in cell (j+k, i-k):
                         Add A to cell (j, i)
         Return True iff cell (1, n) contains the start symbol.

     Negative example
     - Input string: 2 + 3 * /
     - Grammar
           Expr → Expr X
           X → AddOp Term
           AddOp → + | -
           Term → Term Y
           Y → MultOp Factor
           MultOp → * | /
           Factor → 0 | 1 | ... | 9
           Term → 0 | 1 | ... | 9
           Expr → 0 | 1 | ... | 9
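
The pseudocode above translates fairly directly into Python. This is a minimal sketch rather than a reference implementation: the function name `cyk` and the rule encoding (a list of `(A, a)` pairs for A → a and a list of `(A, B, C)` triples for A → BC) are assumptions made for illustration. The grammar used is the CNF grammar from the CYK example, with each digit and operator treated as one token.

```python
def cyk(tokens, unit_rules, pair_rules, start):
    """Return True iff `tokens` can be derived from `start`.

    unit_rules: list of (A, a) for productions A -> a
    pair_rules: list of (A, B, C) for productions A -> B C
    """
    n = len(tokens)
    if n == 0:
        return False  # CNF as used here has no epsilon productions

    # table[(j, i)]: nonterminals deriving the i tokens starting at index j (1-based)
    table = {}

    # Row for length 1: match each token against the A -> a productions.
    for j in range(1, n + 1):
        table[(j, 1)] = {a for a, tok in unit_rules if tok == tokens[j - 1]}

    for i in range(2, n + 1):              # length of the span
        for j in range(1, n - i + 2):      # start index, 1 .. n-i+1
            cell = set()
            for k in range(1, i):          # length of the left part B
                left = table[(j, k)]
                right = table[(j + k, i - k)]
                for a, b, c in pair_rules:
                    if b in left and c in right:
                        cell.add(a)
            table[(j, i)] = cell

    return start in table[(1, n)]


# CNF grammar from the CYK example.
UNIT_RULES = [("AddOp", "+"), ("AddOp", "-"), ("MultOp", "*"), ("MultOp", "/")]
UNIT_RULES += [(nt, d) for nt in ("Expr", "Term", "Factor") for d in "0123456789"]
PAIR_RULES = [
    ("Expr", "Expr", "X"),
    ("X", "AddOp", "Term"),
    ("Expr", "Term", "Y"),
    ("Term", "Term", "Y"),
    ("Y", "MultOp", "Factor"),
]

print(cyk(list("2-3*4"), UNIT_RULES, PAIR_RULES, "Expr"))  # True
print(cyk(list("2+3*/"), UNIT_RULES, PAIR_RULES, "Expr"))  # False
```

The three nested loops over i, j, and k give the usual O(n³) behavior in the number of tokens, times the number of A → BC productions checked per split.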

  10. Class Participation 4
     - CNF grammar
           S → AX | AB
           X → SB
           A → 0
           B → 1
     - Parse the following strings using the CYK algorithm
       - 0011 ✔
       - 01010 ✗
     - Collaboration level: 0 (work freely in groups)
