
Predictive Parsers, LL(k) Parsing



  1. 10/17/2012

     Predictive Parsers

     Can we avoid backtracking? Yes, if for a given input symbol and a given non-terminal we can choose the alternative appropriately.

     This is possible if the first terminal of every alternative in a production is unique:

         A → a B D | b B B
         B → c | b c e
         D → d

     Parsing the input "abced" requires no backtracking.

     Left factoring to enable prediction:

         A → αβ | αγ    change to    A  → α A'
                                     A' → β | γ

     For predictive parsers, left recursion must also be eliminated.

     LL(k) Parsing

     • L: left-to-right scan
     • L: leftmost derivation
     • k: k symbols of lookahead

     In practice, k = 1. The parser is table-driven and efficient.

     LL(k) Parser Structure

     [Figure: the parser driver reads the input tokens (terminated by $) through a read head, consults the parse table, works on the syntax stack (with $ at the bottom), and produces the output.]

     • Syntax stack: holds the right-hand side (RHS) of grammar rules
     • Parse table M[A, b]: an entry containing a rule "A → …" or error
     • Parser driver: chooses the next action based on (current token, stack top)

     Sample Parse Table

             int          *          +          (            )        $
         E   E → T X                            E → T X
         X                           X → + E                 X → ε    X → ε
         T   T → int Y                          T → ( E )
         Y                Y → * T    Y → ε                   Y → ε    Y → ε

     Implementation with a 2-D parse table:
     • A row for each non-terminal
     • A column for each possible terminal and for $ (the end-of-input marker)
     • Every table entry contains at most one production
       • Required for a grammar to be LL(1)
       • No backtracking
     • Fixed action for each (non-terminal, input symbol) combination

     LL(1) Parsing Algorithm

     X: the symbol at the top of the syntax stack
     a: the current input symbol

     Parsing is based on (X, a):
     • If X = a = $, the parser halts with "success".
     • If X = a ≠ $, pop X from the stack and advance the input head.
     • If X ≠ a:
       • Case (a): if X ∈ T, the parser halts with "failed"; the input is rejected.
       • Case (b): if X ∈ N and M[X, a] = "X → RHS", pop X and push RHS onto the stack in reverse order.

     Push RHS in Reverse Order

     If M[X, a] = "X → B c D", then X is popped and D, c, B are pushed in that order, so that B ends up on top of the stack.

  2. LL(1) Grammars

     Remove left recursion and perform left factoring. Given the grammar:

         E → T + E | T
         T → int * T | int | ( E )

     The grammar has no left recursion but requires left factoring. After rewriting the grammar, we have:

         E → T X
         X → + E | ε
         T → int Y | ( E )
         Y → * T | ε

     Parse table:

             int          *          +          (            )        $
         E   E → T X                            E → T X
         X                           X → + E                 X → ε    X → ε
         T   T → int Y                          T → ( E )
         Y                Y → * T    Y → ε                   Y → ε    Y → ε

     LL(1) Parsing

     [Figure: animation frames parsing the input tokens int * int $; every frame shows the same parse table. Stack contents in these frames, top first: E $, then T X $, then int Y X $.]
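     The left-factoring step that turns E → T + E | T into E → T X, X → + E | ε can be sketched as below. This is a minimal illustration, not a complete algorithm: `common_prefix` and `left_factor` are hypothetical names, only one round of factoring is done for a single non-terminal, the alternatives are assumed to be distinct, and the caller supplies the fresh non-terminal names.

```python
# Illustrative one-round left factoring for a single non-terminal.
# Alternatives are tuples of symbols; an empty tuple stands for epsilon.

def common_prefix(alternatives):
    """Longest sequence of symbols shared at the start of every alternative."""
    prefix = []
    for symbols in zip(*alternatives):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return tuple(prefix)


def left_factor(nonterminal, alternatives, fresh_names):
    """Factor alternatives that share a first symbol.

    fresh_names is an iterator of unused non-terminal names (an assumption
    of this sketch; a real tool would generate them itself).
    """
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[:1], []).append(alt)  # group by first symbol
    rules = {nonterminal: []}
    for alts in groups.values():
        if len(alts) == 1:
            rules[nonterminal].append(alts[0])      # nothing to factor
            continue
        prefix = common_prefix(alts)
        fresh = next(fresh_names)
        rules[nonterminal].append(prefix + (fresh,))
        rules[fresh] = [alt[len(prefix):] for alt in alts]
    return rules


e_rules = left_factor("E", [("T", "+", "E"), ("T",)], iter(["X"]))
t_rules = left_factor(
    "T", [("int", "*", "T"), ("int",), ("(", "E", ")")], iter(["Y"])
)
```

     With these inputs, `e_rules` holds E → T X and X → + E | ε, and `t_rules` holds T → int Y | ( E ) and Y → * T | ε, matching the rewritten grammar on the slide.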

  3. LL(1) Parsing

     [Figure: further animation frames for the input tokens int * int $, each showing the same parse table. Stack contents in these frames, top first: Y X $, then * T X $, then T X $, then int Y X $.]

  4. LL(1) Parsing

     [Figure: final animation frames for the input tokens int * int $, each showing the same parse table. Stack contents in these frames, top first: Y X $, then X $, then $, at which point the parser accepts.]

     Action List

         Stack          Input            Action
         E $            int * int $      E → T X
         T X $          int * int $      T → int Y
         int Y X $      int * int $      terminal
         Y X $          * int $          Y → * T
         * T X $        * int $          terminal
         T X $          int $            T → int Y
         int Y X $      int $            terminal
         Y X $          $                Y → ε
         X $            $                X → ε
         $              $                Halt and accept

     Constructing the Parse Table

     We need to know where in the table to place each production.

     We have restricted our grammars so that left recursion is eliminated and they have been left factored. That means that each production is uniquely recognizable by the first terminal that the production would derive.

     Thus, we can construct our table from 2 sets:
     • For each symbol A, the set of terminals that can begin a string derived from A. This set is called the FIRST set of A.
     • For each non-terminal A, the set of terminals that can appear after a string derived from A. This set is called the FOLLOW set of A.

  5. First(α)

     First(α) = the set of terminals that start strings of terminals derived from α.

     Apply the following rules until no terminal or ε can be added:
     1. If t ∈ T, then First(t) = { t }. For example, First(+) = { + }.
     2. If X ∈ N and X → ε exists (nullable), then add ε to First(X). For example, First(Y) = { *, ε }.
     3. If X ∈ N and X → Y1 Y2 Y3 … Ym, where Y1, Y2, Y3, ... Ym are grammar symbols, then:
            for each i from 1 to m
                if Y1 … Yi-1 are all nullable (or if i = 1)
                    First(X) = First(X) ∪ First(Yi)

     Follow(α)

     Follow(α) = { t | S ⇒* α t β }

     Intuition: if X → A B, then First(B) ⊆ Follow(A). However, B may be ε, i.e., B ⇒* ε.

     Apply the following rules until nothing more can be added:
     1. $ ∈ Follow(S), where S is the start symbol. E.g., Follow(E) = { $, ... }.
     2. Look at each occurrence of a non-terminal on the right-hand side of a production that is followed by something:
        if A → α B β, then First(β) - { ε } ⊆ Follow(B).
     3. Look at each non-terminal on the RHS that is not followed by anything:
        if (A → α B) or (A → α B β and ε ∈ First(β)), then Follow(A) ⊆ Follow(B).

     Algorithm to Compute FIRST, FOLLOW, and nullable

     Initialize FIRST and FOLLOW to all empty sets, and nullable to all false.

         foreach terminal symbol Z
             FIRST[Z] ← {Z}
         do
             foreach production X → Y1 Y2 … Yk
                 if Y1 … Yk are all nullable (or if k = 0)
                     then nullable[X] ← true
                 foreach i from 1 to k, each j from i + 1 to k
                     if Y1 … Yi−1 are all nullable (or if i = 1)
                         then FIRST[X] ← FIRST[X] ∪ FIRST[Yi]
                     if Yi+1 … Yk are all nullable (or if i = k)
                         then FOLLOW[Yi] ← FOLLOW[Yi] ∪ FOLLOW[X]
                     if Yi+1 … Yj−1 are all nullable (or if i + 1 = j)
                         then FOLLOW[Yi] ← FOLLOW[Yi] ∪ FIRST[Yj]
         until FIRST, FOLLOW, and nullable did not change in this iteration.

     Example

     Grammar:

         E → T X
         X → + E | ε
         T → int Y | ( E )
         Y → * T | ε

         Symbol    First      Follow
         (         (
         )         )
         +         +
         *         *
         int       int
         Y         *, ε       $, ), +
         X         +, ε       $, )
         T         (, int     $, ), +
         E         (, int     $, )

     Constructing LL(1) Parse Table

     To construct the parse table, we check each production A → α:
     • For each terminal a ∈ First(α), add A → α to M[A, a].
     • If ε ∈ First(α), then for each terminal b ∈ Follow(A), add A → α to M[A, b].
     • If ε ∈ First(α) and $ ∈ Follow(A), then add A → α to M[A, $].

     Table for the example grammar after applying the first rule (the ε-productions are placed by the two Follow rules):

             int          *          +          (            )        $
         E   E → T X                            E → T X
         X                           X → + E
         T   T → int Y                          T → ( E )
         Y                Y → * T
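     The fixed-point computation and the table-construction rules above can be sketched together in Python. This is a sketch under stated assumptions rather than the slides' own code: the names `grow`, `analyze`, `first_of`, and `build_table` are hypothetical, FIRST sets hold terminals only with nullability tracked separately (as in the algorithm slide), and FOLLOW of the start symbol is seeded with $ (as in rule 1 of the Follow slide), so the separate $ rule is covered by the Follow rule.

```python
# Illustrative sketch: nullable / FIRST / FOLLOW and LL(1) table construction.
GRAMMAR = [  # (lhs, rhs) pairs; an empty tuple is an epsilon production
    ("E", ("T", "X")),
    ("X", ("+", "E")),
    ("X", ()),
    ("T", ("int", "Y")),
    ("T", ("(", "E", ")")),
    ("Y", ("*", "T")),
    ("Y", ()),
]


def grow(target, extra):
    """Add extra to target in place; report whether target changed."""
    before = len(target)
    target |= extra
    return len(target) != before


def analyze(grammar, start):
    nonterminals = {lhs for lhs, _ in grammar}
    terminals = {s for _, rhs in grammar for s in rhs} - nonterminals
    nullable = set()
    first = {t: {t} for t in terminals}           # FIRST[Z] = {Z}
    first.update({n: set() for n in nonterminals})
    follow = {n: set() for n in nonterminals}
    follow[start].add("$")                        # assumption: seed with $
    changed = True
    while changed:                                # iterate to a fixed point
        changed = False
        for lhs, rhs in grammar:
            if lhs not in nullable and all(s in nullable for s in rhs):
                nullable.add(lhs)
                changed = True
            for i, y in enumerate(rhs):
                if all(s in nullable for s in rhs[:i]):
                    changed |= grow(first[lhs], first[y])
                if y not in nonterminals:
                    continue                      # FOLLOW is kept for non-terminals only
                if all(s in nullable for s in rhs[i + 1:]):
                    changed |= grow(follow[y], follow[lhs])
                for j in range(i + 1, len(rhs)):
                    if all(s in nullable for s in rhs[i + 1:j]):
                        changed |= grow(follow[y], first[rhs[j]])
    return nullable, first, follow


def first_of(seq, nullable, first):
    """Terminals that can begin seq, and whether seq can derive epsilon."""
    out = set()
    for s in seq:
        out |= first[s]
        if s not in nullable:
            return out, False
    return out, True


def build_table(grammar, nullable, first, follow):
    table = {}
    for lhs, rhs in grammar:
        starts, can_vanish = first_of(rhs, nullable, first)
        columns = starts | (follow[lhs] if can_vanish else set())
        for a in columns:
            if (lhs, a) in table:                 # two productions in one cell
                raise ValueError(f"not LL(1): conflict at M[{lhs}, {a}]")
            table[(lhs, a)] = rhs
    return table


nullable, first, follow = analyze(GRAMMAR, "E")
table = build_table(GRAMMAR, nullable, first, follow)
```

     For the example grammar this yields X and Y as the nullable non-terminals, the FIRST and FOLLOW sets shown in the example table, and a parse table with eleven entries, including the ε-productions for X and Y in the columns given by their FOLLOW sets.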
