
LR Parsing: LALR Parser Generators - PowerPoint PPT Presentation

  1. LR Parsing: LALR Parser Generators

     Outline
     • Review of bottom-up parsing
     • Computing the parsing DFA
     • Using parser generators

     Bottom-up Parsing (Review)
     • A bottom-up parser rewrites the input string to the start symbol
     • The state of the parser is described as α ▸ γ
       – α is a stack of terminals and non-terminals
       – γ is the string of terminals not yet examined
     • Initially: ▸ x1 x2 . . . xn

     The Shift and Reduce Actions (Review)
     • Recall the CFG: E → int | E + (E)
     • A bottom-up parser uses two kinds of actions:
     • Shift pushes a terminal from input on the stack
         E + ( ▸ int )  ⇒  E + ( int ▸ )
     • Reduce pops 0 or more symbols off of the stack (production RHS) and pushes a non-terminal on the stack (production LHS)
         E + (E + ( E ) ▸ )  ⇒  E + ( E ▸ )
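
To make the two actions concrete, here is a minimal Python sketch (not from the slides) that treats a configuration α ▸ γ as a pair of lists: the stack α and the unread input γ. The helper names `show`, `shift` and `reduce` are hypothetical; the transitions replayed at the end are the ones from the slide.

```python
from typing import List, Tuple

Config = Tuple[List[str], List[str]]  # (stack α, unread input γ)


def show(stack: List[str], rest: List[str]) -> str:
    """Render a configuration as 'α ▸ γ'."""
    return " ".join(stack + ["▸"] + rest)


def shift(stack: List[str], rest: List[str]) -> Config:
    """Shift: push the next input terminal onto the stack."""
    return stack + [rest[0]], rest[1:]


def reduce(stack: List[str], rest: List[str], lhs: str, rhs: List[str]) -> Config:
    """Reduce: pop the production RHS (0 or more symbols) and push its LHS."""
    cut = len(stack) - len(rhs)  # slicing by length also handles an empty RHS
    if stack[cut:] != rhs:
        raise ValueError(f"top of stack does not match RHS {rhs}")
    return stack[:cut] + [lhs], rest


# E + ( ▸ int )  =>  E + ( int ▸ )  =>  E + ( E ▸ )
stack, rest = shift(["E", "+", "("], ["int", ")"])
print(show(stack, rest))  # E + ( int ▸ )
stack, rest = reduce(stack, rest, "E", ["int"])
print(show(stack, rest))  # E + ( E ▸ )
```

Computing the length-based cut, rather than slicing with `stack[-len(rhs):]`, matters for ε-productions: `stack[-0:]` would be the whole stack.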

  2. Key Issue: When to Shift or Reduce?
     • Idea: use a deterministic finite automaton (DFA) to decide when to shift or reduce
       – The input is the stack
       – The language consists of terminals and non-terminals
     • We run the DFA on the stack and we examine the resulting state X and the token tok after ▸
       – If X has a transition labeled tok then shift
       – If X is labeled with "A → β on tok" then reduce

     LR(1) Parsing: An Example
     The DFA (states 0 to 11):
       0 --int--> 1    0 --E--> 2     2 --+--> 3     3 --(--> 4
       4 --int--> 5    4 --E--> 6     6 --)--> 7     6 --+--> 8
       8 --(--> 9      9 --int--> 5   9 --E--> 10    10 --)--> 11
       10 --+--> 8
       State 1: E → int on $, +         State 2: accept on $
       State 5: E → int on ), +         State 7: E → E + (E) on $, +
       State 11: E → E + (E) on ), +

     A parse of int + (int) + (int):
       Stack ▸ Input             Action
       ▸ int + (int) + (int)$    shift
       int ▸ + (int) + (int)$    E → int
       E ▸ + (int) + (int)$      shift (×3)
       E + (int ▸ ) + (int)$     E → int
       E + (E ▸ ) + (int)$       shift
       E + (E) ▸ + (int)$        E → E+(E)
       E ▸ + (int)$              shift (×3)
       E + (int ▸ )$             E → int
       E + (E ▸ )$               shift
       E + (E) ▸ $               E → E+(E)
       E ▸ $                     accept

     Representing the DFA
     • Parsers represent the DFA as a 2D table
       – Recall table-driven lexical analysis
     • Lines correspond to DFA states
     • Columns correspond to terminals and non-terminals
     • Typically columns are split into:
       – Those for terminals: the action table
       – Those for non-terminals: the goto table

     Representing the DFA: Example
     The table for a fragment of our DFA:
       State  int   +             (    )             $             E
       …
       3                          s4
       4      s5                                                   g6
       5            r E → int          r E → int
       6            s8                 s7
       7            r E → E+(E)                      r E → E+(E)
       …
     • sk is shift and goto state k
     • r X → α is reduce
     • gk is goto state k
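
The example trace can be replayed with a short Python sketch that applies the slide's rule literally: run the DFA over the whole stack, then look at the resulting state X and the next token. `TRANS` and `REDUCE` are an assumed encoding of the example DFA (states 0 to 11), and acceptance is hard-coded as state 2 on $; none of these names come from the slides.

```python
# Assumed encoding of the example DFA for E -> int | E + (E).
TRANS = {(0, "int"): 1, (0, "E"): 2, (2, "+"): 3, (3, "("): 4,
         (4, "int"): 5, (4, "E"): 6, (6, ")"): 7, (6, "+"): 8,
         (8, "("): 9, (9, "int"): 5, (9, "E"): 10, (10, ")"): 11, (10, "+"): 8}
E_INT = ("E", ["int"])
E_PLUS = ("E", ["E", "+", "(", "E", ")"])
# state -> (lookahead tokens, production to reduce with)
REDUCE = {1: ({"$", "+"}, E_INT), 5: ({")", "+"}, E_INT),
          7: ({"$", "+"}, E_PLUS), 11: ({")", "+"}, E_PLUS)}


def run_dfa(stack):
    """Run the DFA from state 0 over the entire stack; return the final state."""
    state = 0
    for sym in stack:
        state = TRANS[(state, sym)]
    return state


def trace(tokens):
    """Return the list of actions taken while parsing tokens."""
    stack, rest, actions = [], list(tokens) + ["$"], []
    while True:
        x, tok = run_dfa(stack), rest[0]
        if x == 2 and tok == "$":
            actions.append("accept")
            return actions
        if (x, tok) in TRANS:                         # X has a transition on tok
            stack.append(rest.pop(0))
            actions.append("shift")
        elif x in REDUCE and tok in REDUCE[x][0]:     # X labeled "A -> β on tok"
            lhs, rhs = REDUCE[x][1]
            del stack[len(stack) - len(rhs):]
            stack.append(lhs)
            actions.append(f'{lhs} -> {" ".join(rhs)}')
        else:
            raise SyntaxError(f"unexpected {tok!r} in state {x}")


print(trace("int + ( int ) + ( int )".split()))
```

Rerunning the DFA from state 0 at every step is the wasted work that the LR parsing algorithm avoids by remembering a state for each stack element.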

  3. The LR Parsing Algorithm
     • After a shift or reduce action we rerun the DFA on the entire stack
       – This is wasteful, since most of the work is repeated
     • Remember for each stack element on which state it brings the DFA
     • LR parser maintains a stack
         ⟨sym_1, state_1⟩ . . . ⟨sym_n, state_n⟩
       state_k is the final state of the DFA on sym_1 … sym_k

     The LR Parsing Algorithm
         let I = w$ be initial input
         let j = 0
         let DFA state 0 be the start state
         let stack = ⟨ dummy, 0 ⟩
         repeat
           case action[top_state(stack), I[j]] of
             shift k: push ⟨ I[j++], k ⟩
             reduce X → A:
               pop |A| pairs,
               push ⟨ X, goto[top_state(stack), X] ⟩
             accept: halt normally
             error: halt and report error

     Key Issue: How is the DFA Constructed?
     • The stack describes the context of the parse
       – What non-terminal we are looking for
       – What production RHS we are looking for
       – What we have seen so far from the RHS
     • Each DFA state describes several such contexts
       – E.g., when we are looking for non-terminal E, we might be looking either for an int or an E + (E) RHS

     LR(0) Items
     • An LR(0) item is a production with a "▸" somewhere on the RHS
     • The items for T → (E) are
         T → ▸ (E)
         T → ( ▸ E)
         T → (E ▸ )
         T → (E) ▸
     • The only item for X → ε is X → ▸
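
A Python sketch of the same algorithm, assuming the action table is split into three hypothetical dictionaries (`SHIFT`, `REDUCE`, `ACCEPT`) plus a `GOTO` table, all filled in by hand from the example DFA for E → int | E + (E). Each stack entry is a ⟨symbol, state⟩ pair, so the DFA is never rerun; a real parser generator would typically store the tables as arrays indexed by state number.

```python
# Assumed tables for the example DFA: terminal transitions become shift
# entries, non-terminal transitions become goto entries.
SHIFT = {(0, "int"): 1, (2, "+"): 3, (3, "("): 4, (4, "int"): 5,
         (6, ")"): 7, (6, "+"): 8, (8, "("): 9, (9, "int"): 5,
         (10, ")"): 11, (10, "+"): 8}
GOTO = {(0, "E"): 2, (4, "E"): 6, (9, "E"): 10}
E_INT = ("E", ("int",))
E_PLUS = ("E", ("E", "+", "(", "E", ")"))
REDUCE = {(1, "$"): E_INT, (1, "+"): E_INT, (5, ")"): E_INT, (5, "+"): E_INT,
          (7, "$"): E_PLUS, (7, "+"): E_PLUS,
          (11, ")"): E_PLUS, (11, "+"): E_PLUS}
ACCEPT = {(2, "$")}


def lr_parse(tokens):
    """Table-driven LR parse; returns the reductions performed, in order."""
    inp = list(tokens) + ["$"]          # let I = w$
    j = 0
    stack = [("dummy", 0)]              # <symbol, DFA state> pairs
    reductions = []
    while True:
        state, tok = stack[-1][1], inp[j]
        if (state, tok) in ACCEPT:
            return reductions           # accept: halt normally
        if (state, tok) in SHIFT:       # shift k: push <I[j++], k>
            stack.append((tok, SHIFT[(state, tok)]))
            j += 1
        elif (state, tok) in REDUCE:    # reduce X -> A
            lhs, rhs = REDUCE[(state, tok)]
            del stack[len(stack) - len(rhs):]             # pop |A| pairs
            stack.append((lhs, GOTO[(stack[-1][1], lhs)]))
            reductions.append(f"{lhs} -> {' '.join(rhs)}")
        else:                           # error: halt and report error
            raise SyntaxError(f"unexpected {tok!r} in state {state}")


print(lr_parse("int + ( int )".split()))
# ['E -> int', 'E -> int', 'E -> E + ( E )']
```

After a reduction, the goto entry is looked up from the state uncovered by the pops, which is what `top_state(stack)` denotes in the pseudocode.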

  4. LR(0) Items: Intuition
     • An item [X → α ▸ β] says that
       – the parser is looking for an X
       – it has an α on top of the stack
       – Expects to find a string derived from β next in the input
     • Notes:
       – [X → α ▸ a β] means that a should follow. Then we can shift it and still have a viable prefix
       – [X → α ▸] means that we could reduce X
         • But this is not always a good idea!

     LR(1) Items
     • An LR(1) item is a pair:
         X → α ▸ β, a
       – X → αβ is a production
       – a is a terminal (the lookahead terminal)
       – LR(1) means 1 lookahead terminal
     • [X → α ▸ β, a] describes a context of the parser
       – We are trying to find an X followed by an a, and
       – We have (at least) α already on top of the stack
       – Thus we need to see next a prefix derived from βa

     Note
     • The symbol ▸ was used before to separate the stack from the rest of input
       – α ▸ γ, where α is the stack and γ is the remaining string of terminals
     • In items ▸ is used to mark a prefix of a production RHS:
         X → α ▸ β, a
       – Here β might contain terminals as well
     • In both cases the stack is on the left of ▸

     Convention
     • We add to our grammar a fresh new start symbol S and a production S → E
       – Where E is the old start symbol
     • The initial parsing context contains:
         S → ▸ E, $
       – Trying to find an S as a string derived from E$
       – The stack is empty
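
One possible representation of items, sketched in Python: a `NamedTuple` holding the production, the position of ▸, and an optional lookahead, where a missing lookahead means an LR(0) item. The class and function names are illustrative assumptions, not part of any parser generator's API.

```python
from typing import List, NamedTuple, Optional, Tuple


class Item(NamedTuple):
    """An item X -> α ▸ β, with an optional lookahead terminal a (LR(1))."""
    lhs: str
    rhs: Tuple[str, ...]
    dot: int                          # ▸ sits just before rhs[dot]
    lookahead: Optional[str] = None   # None means a plain LR(0) item

    def next_symbol(self) -> Optional[str]:
        """The symbol right after ▸, or None for a complete item [X -> α ▸]."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def __str__(self) -> str:
        body = " ".join(self.rhs[:self.dot] + ("▸",) + self.rhs[self.dot:])
        text = f"{self.lhs} → {body}"
        return text if self.lookahead is None else f"{text}, {self.lookahead}"


def lr0_items(lhs: str, rhs: Tuple[str, ...]) -> List[Item]:
    """All LR(0) items of one production: ▸ in each of len(rhs) + 1 positions."""
    return [Item(lhs, rhs, i) for i in range(len(rhs) + 1)]


for item in lr0_items("T", ("(", "E", ")")):
    print(item)   # T → ▸ ( E ), then T → ( ▸ E ), T → ( E ▸ ), T → ( E ) ▸
print(lr0_items("X", ())[0])        # X → ▸
print(Item("S", ("E",), 0, "$"))    # S → ▸ E, $
```

A complete item (where `next_symbol()` is None) is the candidate for a reduction; an item whose next symbol is a terminal is a candidate for a shift.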

  5. LR(1) Items (Cont.)
     • In context containing
         E → E + ▸ ( E ), +
       – If ( follows then we can perform a shift to context containing
         E → E + ( ▸ E ), +
     • In context containing
         E → E + ( E ) ▸, +
       – We can perform a reduction with E → E + ( E )
       – But only if a + follows

     LR(1) Items (Cont.)
     • Consider the item
         E → E + ( ▸ E ), +
     • We expect a string derived from E ) +
     • There are two productions for E
         E → int and E → E + ( E )
     • We describe this by extending the context with two more items:
         E → ▸ int, )
         E → ▸ E + ( E ), )

     The Closure Operation
     • The operation of extending the context with items is called the closure operation
         Closure(Items) =
           repeat
             for each [X → α ▸ Y β, a] in Items
               for each production Y → γ
                 for each b in First(β a)
                   add [Y → ▸ γ, b] to Items
           until Items is unchanged

     Constructing the Parsing DFA (1)
     • Grammar: E → E + (E) | int
     • Construct the start context: Closure({S → ▸ E, $})
         S → ▸ E, $
         E → ▸ E+(E), $
         E → ▸ int, $
         E → ▸ E+(E), +
         E → ▸ int, +
     • We abbreviate as:
         S → ▸ E, $
         E → ▸ E+(E), $/+
         E → ▸ int, $/+
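
A Python sketch of Closure that follows the pseudocode loop for loop, for the augmented grammar S → E, E → E + (E) | int. Items are assumed to be tuples (X, rhs, dot position, lookahead); `first_sets` and `first_of_seq` are hypothetical helpers that compute First, including nullable non-terminals, so that First(β a) is handled in general.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Item = Tuple[str, Tuple[str, ...], int, str]  # (X, rhs, dot position, lookahead a)

# Augmented example grammar: S -> E,  E -> E + (E) | int
GRAMMAR: Dict[str, list] = {
    "S": [("E",)],
    "E": [("E", "+", "(", "E", ")"), ("int",)],
}


def first_sets(g):
    """Fixpoint computation of First and of the nullable non-terminals."""
    first = {n: set() for n in g}
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, prods in g.items():
            for rhs in prods:
                before = (len(first[lhs]), lhs in nullable)
                for sym in rhs:
                    if sym not in g:          # a terminal starts the string
                        first[lhs].add(sym)
                        break
                    first[lhs] |= first[sym]
                    if sym not in nullable:
                        break
                else:                         # every symbol can derive ε
                    nullable.add(lhs)
                if (len(first[lhs]), lhs in nullable) != before:
                    changed = True
    return first, nullable


def first_of_seq(seq, first, nullable, g):
    """First of a symbol sequence such as β a."""
    out = set()
    for sym in seq:
        if sym not in g:
            out.add(sym)
            return out
        out |= first[sym]
        if sym not in nullable:
            return out
    return out


def closure(items: Iterable[Item], g=GRAMMAR) -> FrozenSet[Item]:
    first, nullable = first_sets(g)
    items = set(items)
    changed = True
    while changed:                                  # repeat ... until unchanged
        changed = False
        for lhs, rhs, dot, a in list(items):
            if dot < len(rhs) and rhs[dot] in g:    # [X -> α ▸ Y β, a]
                y, beta = rhs[dot], rhs[dot + 1:]
                for gamma in g[y]:                  # each production Y -> γ
                    for b in first_of_seq(beta + (a,), first, nullable, g):
                        new = (y, gamma, 0, b)      # [Y -> ▸ γ, b]
                        if new not in items:
                            items.add(new)
                            changed = True
    return frozenset(items)


start = closure({("S", ("E",), 0, "$")})
print(len(start))  # 5
```

Closing over {S → ▸ E, $} yields exactly the five items of the start context listed on the slide.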

  6. Constructing the Parsing DFA (2)
     • A DFA state is a closed set of LR(1) items
     • The start state contains [S → ▸ E, $]
     • A state that contains [X → α ▸, b] is labelled with "reduce with X → α on b"
     • And now the transitions …

     The DFA Transitions
     • A state "State" that contains [X → α ▸ y β, b] has a transition labeled y to a state that contains the items "Transition(State, y)"
       – y can be a terminal or a non-terminal
         Transition(State, y) =
           Items = ∅
           for each [X → α ▸ y β, b] in State
             add [X → α y ▸ β, b] to Items
           return Closure(Items)

     Constructing the Parsing DFA: Example
         State 0:              S → ▸ E, $
                               E → ▸ E+(E), $/+
                               E → ▸ int, $/+
         State 1 (0 on int):   E → int ▸, $/+        E → int on $, +
         State 2 (0 on E):     S → E ▸, $            accept on $
                               E → E ▸ +(E), $/+
         State 3 (2 on +):     E → E+ ▸ (E), $/+
         State 4 (3 on ( ):    E → E+( ▸ E), $/+
                               E → ▸ E+(E), )/+
                               E → ▸ int, )/+
         State 5 (4 on int):   E → int ▸, )/+        E → int on ), +
         State 6 (4 on E):     E → E+(E ▸ ), $/+
                               E → E ▸ +(E), )/+
         and so on…

     LR Parsing Tables: Notes
     • Parsing tables (i.e., the DFA) can be constructed automatically for a CFG
     • But we still need to understand the construction to work with parser generators
       – E.g., they report errors in terms of sets of items
     • What kind of errors can we expect?
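
The transition function and the construction of the whole DFA can be sketched in Python as a worklist over sets of items. This version assumes the grammar has no ε-productions, which reduces First(β a) to First of its first symbol, and it uses a worklist instead of the "repeat until unchanged" loop; both are simplifications, and all names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Item = Tuple[str, Tuple[str, ...], int, str]   # (lhs, rhs, dot position, lookahead)

# Augmented grammar S -> E, E -> E + (E) | int (assumed free of ε-productions).
GRAMMAR = {"S": [("E",)], "E": [("E", "+", "(", "E", ")"), ("int",)]}


def first_sets(g):
    """First of each non-terminal; valid only because no production derives ε."""
    first = {n: set() for n in g}
    changed = True
    while changed:
        changed = False
        for lhs, prods in g.items():
            for rhs in prods:
                head = rhs[0]
                add = first[head] if head in g else {head}
                if not add <= first[lhs]:
                    first[lhs] |= add
                    changed = True
    return first


FIRST = first_sets(GRAMMAR)


def closure(items: Set[Item]) -> FrozenSet[Item]:
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, a = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:     # [X -> α ▸ Y β, a]
            rest = rhs[dot + 1:] + (a,)                # β a, never empty
            heads = FIRST[rest[0]] if rest[0] in GRAMMAR else {rest[0]}
            for gamma in GRAMMAR[rhs[dot]]:
                for b in heads:
                    new = (rhs[dot], gamma, 0, b)
                    if new not in items:
                        items.add(new)
                        work.append(new)
    return frozenset(items)


def transition(state: FrozenSet[Item], y: str) -> FrozenSet[Item]:
    """Transition(State, y): move ▸ past y, then take the closure."""
    moved = {(lhs, rhs, dot + 1, b) for (lhs, rhs, dot, b) in state
             if dot < len(rhs) and rhs[dot] == y}
    return closure(moved)


def build_dfa():
    start = closure({("S", ("E",), 0, "$")})
    states: List[FrozenSet[Item]] = [start]
    index = {start: 0}
    edges: Dict[Tuple[int, str], int] = {}
    work = [start]
    while work:
        st = work.pop()
        for y in sorted({rhs[dot] for (_, rhs, dot, _) in st if dot < len(rhs)}):
            nxt = transition(st, y)
            if nxt not in index:
                index[nxt] = len(states)
                states.append(nxt)
                work.append(nxt)
            edges[(index[st], y)] = index[nxt]
    return states, edges


states, edges = build_dfa()
print(len(states), len(edges))  # 12 13
```

For E → E + (E) | int this yields 12 states and 13 transitions, the same size as the 12-state DFA of the LR(1) parsing example, although the state numbers assigned by the worklist need not agree with the slides.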

  7. Shift/Reduce Conflicts
     • If a DFA state contains both
         [X → α ▸ a β, b] and [Y → γ ▸, a]
     • Then on input "a" we could either
       – Shift into state [X → α a ▸ β, b], or
       – Reduce with Y → γ
     • This is called a shift-reduce conflict

     Shift/Reduce Conflicts
     • Typically due to ambiguities in the grammar
     • Classic example: the dangling else
         S → if E then S | if E then S else S | OTHER
     • Will have DFA state containing
         [S → if E then S ▸, else]
         [S → if E then S ▸ else S, x]
     • If else follows then we can shift or reduce
     • Default (yacc, ML-yacc, etc.) is to shift
       – Default behavior is as needed in this case

     More Shift/Reduce Conflicts
     • Consider the ambiguous grammar
         E → E + E | E * E | int
     • We will have the states containing
         [E → E * ▸ E, +]    --E-->    [E → E * E ▸, +]
         [E → ▸ E + E, +]              [E → E ▸ + E, +]
         …                             …
     • Again we have a shift/reduce on input +
       – We need to reduce (* binds more tightly than +)
       – Recall solution: declare the precedence of * and +

     More Shift/Reduce Conflicts
     • In yacc declare precedence and associativity:
         %left '+'
         %left '*'
     • Precedence of a rule = that of its last terminal
       – See yacc manual for ways to override this default
     • Resolve shift/reduce conflict with a shift if:
       – no precedence declared for either rule or terminal
       – input terminal has higher precedence than the rule
       – the precedences are the same and right associative
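
A Python sketch of both ideas: detecting a shift/reduce conflict inside one DFA state, and resolving it with the precedence rules listed on the last slide. Items are assumed to be tuples (lhs, rhs, dot, lookahead), and precedences are plain integers where a larger number binds more tightly, mirroring the order of the `%left` lines. The function names are hypothetical, and the sketch models only the three rules on the slide; for instance, yacc's `%nonassoc` is not covered.

```python
def shift_reduce_conflicts(state, terminals):
    """Terminals a with both [X -> α ▸ a β, b] and [Y -> γ ▸, a] in the state."""
    shiftable = {rhs[dot] for (_, rhs, dot, _) in state
                 if dot < len(rhs) and rhs[dot] in terminals}
    return sorted((a, lhs, rhs) for (lhs, rhs, dot, a) in state
                  if dot == len(rhs) and a in shiftable)


def resolve(rule_rhs, tok, terminals, prec, assoc):
    """Shift under the three conditions on the slide; otherwise reduce."""
    rule_terms = [s for s in rule_rhs if s in terminals]
    rule_prec = prec.get(rule_terms[-1]) if rule_terms else None  # last terminal
    tok_prec = prec.get(tok)
    if rule_prec is None or tok_prec is None:
        return "shift"      # no precedence declared
    if tok_prec > rule_prec:
        return "shift"      # input terminal binds more tightly than the rule
    if tok_prec == rule_prec and assoc.get(tok) == "right":
        return "shift"      # same precedence, right associative
    return "reduce"


# Dangling else: no precedences declared, so the default is to shift.
IF_THEN = ("if", "E", "then", "S")
IF_ELSE = ("if", "E", "then", "S", "else", "S")
dangling = {("S", IF_THEN, 4, "else"), ("S", IF_ELSE, 4, "x")}
T_IF = {"if", "then", "else", "OTHER"}
print(shift_reduce_conflicts(dangling, T_IF))
print(resolve(IF_THEN, "else", T_IF, {}, {}))   # shift

# %left '+' then %left '*': the later declaration binds more tightly.
PREC = {"+": 1, "*": 2}
ASSOC = {"+": "left", "*": "left"}
T_EXPR = {"+", "*", "int"}
print(resolve(("E", "*", "E"), "+", T_EXPR, PREC, ASSOC))  # reduce
```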
