10/17/2012

Predictive Parsers

Can we avoid backtracking? Yes, if for a given input symbol and a given non-terminal we can always choose the right alternative.

This is possible if the first terminal of every alternative in a production is unique:

    A → a B D | b B B
    B → c | b c e
    D → d

Parsing the input "abced" requires no backtracking. The parser is table-driven and efficient.

Left factoring to enable prediction:

    A → α β | α γ

change to

    A → α A'
    A' → β | γ

For predictive parsers, left recursion must also be eliminated.

LL(k) Parsing

LL(k):
• L: left-to-right scan
• L: leftmost derivation
• k: k symbols of lookahead (in practice, k = 1)

LL(k) Parser Structure

[Figure: the input tokens, ending in $, are scanned by a read head; the parser driver consults the top of the syntax stack and the parse table, and produces the output.]

• Syntax stack: holds the right-hand sides (RHS) of grammar rules
• Parse table: M[A,b] is an entry containing a rule "A → …" or error; the action is fixed for each (non-terminal, input symbol) combination
• Parser driver: chooses the next action based on (current token, stack top)

Sample Parse Table

         int          *          +          (            )        $
    E    E → T X                            E → T X
    X                            X → + E                 X → ε    X → ε
    T    T → int Y                          T → ( E )
    Y                 Y → * T    Y → ε                   Y → ε    Y → ε

Implementation with a 2-D parse table:
• A row for each non-terminal
• A column for each possible terminal and for $ (the end-of-input marker)
• Every table entry contains at most one production
    • Required for a grammar to be LL(1)
    • No backtracking

LL(1) Parsing Algorithm

X: symbol at the top of the syntax stack
a: current input symbol

Parsing based on (X, a):
• If X = a = $, then the parser halts with "success".
• If X = a ≠ $, then pop X from the stack and advance the input head.
• If X ≠ a, then
    Case (a): if X ∈ T, the parser halts with "failed"; the input is rejected.
    Case (b): if X ∈ N and M[X,a] = "X → RHS", pop X and push RHS onto the stack in reverse order. If M[X,a] is an error entry, the input is rejected.

Push RHS in Reverse Order

X: symbol at the top of the syntax stack
a: current input symbol

If M[X,a] = "X → B c D", the stack changes as follows:

    before        after
    X  (top)      B  (top)
    …             c
    $             D
                  …
                  $
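The driver loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `TABLE`, `EPS` and `ll1_parse` are hypothetical, the table is the sample parse table hard-coded as a dictionary, and the input is assumed to be already split into tokens.

```python
EPS = ()  # empty right-hand side, i.e. an epsilon production

# Sample parse table M[non-terminal][token] -> right-hand side.
# A missing entry means "error".
TABLE = {
    "E": {"int": ("T", "X"), "(": ("T", "X")},
    "X": {"+": ("+", "E"), ")": EPS, "$": EPS},
    "T": {"int": ("int", "Y"), "(": ("(", "E", ")")},
    "Y": {"*": ("*", "T"), "+": EPS, ")": EPS, "$": EPS},
}


def ll1_parse(tokens, table=TABLE, start="E"):
    """Return the list of productions applied, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]   # append the end-of-input marker
    stack = ["$", start]            # the top of the stack is the end of the list
    pos = 0                         # position of the read head
    applied = []
    while True:
        top, tok = stack[-1], tokens[pos]
        if top == tok == "$":       # X = a = $: success
            return applied
        if top == tok:              # X = a != $: pop and advance
            stack.pop()
            pos += 1
        elif top not in table:      # case (a): X is a terminal that does not match
            raise SyntaxError(f"expected {top!r}, found {tok!r}")
        else:                       # case (b): X is a non-terminal
            rhs = table[top].get(tok)
            if rhs is None:         # error entry in the table
                raise SyntaxError(f"no rule for ({top!r}, {tok!r})")
            stack.pop()
            stack.extend(reversed(rhs))   # push RHS in reverse order
            applied.append((top, rhs))


if __name__ == "__main__":
    for head, rhs in ll1_parse(["int", "*", "int"]):
        print(head, "→", " ".join(rhs) or "ε")
```

Storing the stack as a list with its top at the end is a design choice of this sketch: it makes "push RHS in reverse order" a single `extend(reversed(rhs))`.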
LL(1) Grammars

Remove left recursion and perform left factoring.

Given the grammar:

    E → T + E | T
    T → int * T | int | ( E )

The grammar has no left recursion but requires left factoring. After rewriting the grammar, we have:

    E → T X
    X → + E | ε
    T → int Y | ( E )
    Y → * T | ε

LL(1) Parsing

Input tokens: int * int $

Parse table:

         int          *          +          (            )        $
    E    E → T X                            E → T X
    X                            X → + E                 X → ε    X → ε
    T    T → int Y                          T → ( E )
    Y                 Y → * T    Y → ε                   Y → ε    Y → ε

[Slides: step-by-step parse of int * int $. Each slide repeats the parse table and shows the read head and the syntax stack. Stack contents on these slides, top first:]

    E $
    T X $
    int Y X $
LL(1) Parsing

Input tokens: int * int $

[Slides: the step-by-step parse continues with the same parse table. Stack contents on these slides, top first:]

    Y X $
    * T X $
    T X $
    int Y X $
LL(1) Parsing

Input tokens: int * int $

[Slides: the last steps of the parse, with the same parse table. Stack contents on these slides, top first:]

    Y X $
    X $
    $          Accept!

Action List

    Stack          Input            Action
    E $            int * int $      E → T X
    T X $          int * int $      T → int Y
    int Y X $      int * int $      terminal
    Y X $          * int $          Y → * T
    * T X $        * int $          terminal
    T X $          int $            T → int Y
    int Y X $      int $            terminal
    Y X $          $                Y → ε
    X $            $                X → ε
    $              $                Halt and accept

("terminal" means that the terminal on top of the stack matches the current token: pop it and advance the read head.)

Constructing the Parse Table

For each production, we need to know in which cells of the table to place it.

We have restricted our grammars so that left recursion is eliminated and they have been left factored. That means that each production is uniquely recognizable by the first terminal that the production would derive.

Thus, we can construct our table from 2 sets:
• For each symbol A, the set of terminals that can begin a string derived from A. This set is called the FIRST set of A.
• For each non-terminal A, the set of terminals that can appear after a string derived from A. This set is called the FOLLOW set of A.
First(α)

First(α) = set of terminals that start strings of terminals derived from α.

• Apply the following rules until no terminal or ε can be added:

1. If t ∈ T, then First(t) = { t }. For example, First(+) = { + }.
2. If X ∈ N and X → ε exists (nullable), then add ε to First(X). For example, First(Y) = { *, ε }.
3. If X ∈ N and X → Y1 Y2 Y3 … Ym, where Y1, Y2, Y3, … Ym are non-terminals, then:
       for each i from 1 to m
           if Y1 … Yi-1 are all nullable (or if i = 1)
               First(X) = First(X) ∪ First(Yi)
   Here ε is carried over to First(X) only when all of Y1 … Ym are nullable.

Follow(α)

Follow(α) = { t | S ⇒* β α t δ }

Intuition: if X → A B, then First(B) ⊆ Follow(A). However, B may be ε, i.e., B ⇒* ε.

Apply the following rules until nothing more can be added:

1. $ ∈ Follow(S), where S is the start symbol. E.g., Follow(E) = { $ … }.
2. Look at each occurrence of a non-terminal on the right-hand side of a production that is followed by something:
       if A → α B β, then First(β) − { ε } ⊆ Follow(B)
3. Look at each N on the RHS that is not followed by anything:
       if (A → α B) or (A → α B β and ε ∈ First(β)), then Follow(A) ⊆ Follow(B)

Algorithm to Compute FIRST, FOLLOW, and nullable

Initialize FIRST and FOLLOW to all empty sets, and nullable to all false.
    foreach terminal symbol Z
        FIRST[Z] ← {Z}
    do
        foreach production X → Y1 Y2 … Yk
            if Y1 … Yk are all nullable (or if k = 0)
                then nullable[X] ← true
            foreach i from 1 to k, each j from i + 1 to k
                if Y1 … Yi−1 are all nullable (or if i = 1)
                    then FIRST[X] ← FIRST[X] ∪ FIRST[Yi]
                if Yi+1 … Yk are all nullable (or if i = k)
                    then FOLLOW[Yi] ← FOLLOW[Yi] ∪ FOLLOW[X]
                if Yi+1 … Yj−1 are all nullable (or if i + 1 = j)
                    then FOLLOW[Yi] ← FOLLOW[Yi] ∪ FIRST[Yj]
    until FIRST, FOLLOW, and nullable did not change in this iteration.

Example

Grammar:

    E → T X
    X → + E | ε
    T → int Y | ( E )
    Y → * T | ε

    Symbol    First     Follow
    (         (
    )         )
    +         +
    *         *
    int       int
    Y         *, ε      $, ), +
    X         +, ε      $, )
    T         (, int    $, ), +
    E         (, int    $, )

    First Set:        Follow Set:
    E → T X           $
    X → + E           E → T X
    X → ε             X → + E
    T → int Y         T → int Y
    T → ( E )         T → ( E )
    Y → * T           Y → * T
    Y → ε

Constructing LL(1) Parse Table

To construct the parse table, we check each production A → α:
• For each terminal a ∈ First(α), add A → α to M[A, a].
• If ε ∈ First(α), then for each terminal b ∈ Follow(A), add A → α to M[A, b].
• If ε ∈ First(α) and $ ∈ Follow(A), then add A → α to M[A, $].

Grammar:

    E → T X
    X → + E
    X → ε
    T → int Y
    T → ( E )
    Y → * T
    Y → ε

    Symbol    First     Follow
    (         (
    )         )
    +         +
    *         *
    int       int
    Y         *, ε      $, ), +
    X         +, ε      $, )
    T         (, int    $, ), +
    E         (, int    $, )

Parse table as shown on the slide (only the entries for non-ε productions filled in):

         int          *          +          (            )        $
    E    E → T X                            E → T X
    X                            X → + E
    T    T → int Y                          T → ( E )
    Y                 Y → * T
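The fixed-point algorithm and the table-construction rules can be tried out on the example grammar with a short Python sketch. It rests on a few assumptions: the helper names (`grow`, `analyze`, `first_of`, `build_table`) are hypothetical; ε is not stored inside the FIRST sets but tracked through the separate `nullable` set, as in the algorithm; and $ is put into FOLLOW of the start symbol up front, following rule 1 for Follow.

```python
# The left-factored example grammar; () stands for an epsilon right-hand side.
GRAMMAR = [
    ("E", ("T", "X")),
    ("X", ("+", "E")), ("X", ()),
    ("T", ("int", "Y")), ("T", ("(", "E", ")")),
    ("Y", ("*", "T")), ("Y", ()),
]


def grow(target, source):
    """Add source to target in place; report whether target changed."""
    before = len(target)
    target |= source
    return len(target) != before


def analyze(grammar, start):
    """Iterate to a fixed point, computing FIRST, FOLLOW and nullable."""
    nonterminals = {head for head, _ in grammar}
    symbols = nonterminals | {s for _, rhs in grammar for s in rhs}
    first = {s: (set() if s in nonterminals else {s}) for s in symbols}
    follow = {n: set() for n in nonterminals}
    follow[start].add("$")          # $ follows the start symbol
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, rhs in grammar:
            if head not in nullable and all(s in nullable for s in rhs):
                nullable.add(head)
                changed = True
            for i, sym in enumerate(rhs):
                if all(s in nullable for s in rhs[:i]):
                    changed |= grow(first[head], first[sym])
                if sym not in nonterminals:
                    continue        # FOLLOW is kept for non-terminals only
                if all(s in nullable for s in rhs[i + 1:]):
                    changed |= grow(follow[sym], follow[head])
                for j in range(i + 1, len(rhs)):
                    if all(s in nullable for s in rhs[i + 1:j]):
                        changed |= grow(follow[sym], first[rhs[j]])
    return first, follow, nullable


def first_of(rhs, first, nullable):
    """Terminals that can start rhs, and whether rhs can derive epsilon."""
    out = set()
    for s in rhs:
        out |= first[s]
        if s not in nullable:
            return out, False
    return out, True


def build_table(grammar, first, follow, nullable):
    """Fill M[A, a]; a doubly defined cell means the grammar is not LL(1)."""
    table = {}
    for head, rhs in grammar:
        terminals, derives_eps = first_of(rhs, first, nullable)
        if derives_eps:             # covers both the FOLLOW rule and the $ rule
            terminals |= follow[head]
        for t in terminals:
            if (head, t) in table:
                raise ValueError(f"conflict at M[{head}, {t}]")
            table[head, t] = rhs
    return table


first, follow, nullable = analyze(GRAMMAR, "E")
table = build_table(GRAMMAR, first, follow, nullable)
```

Because $ is stored in the FOLLOW sets like any other terminal, the second and third table rules collapse into one step in `build_table`.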