
Compiler Construction Lecture 7: Bottom-up parsing 2020-01-28



  1. Compiler Construction Lecture 7: Bottom-up parsing 2020-01-28 Michael Engel Includes material by Jan Christian Meyer and Rich Maclin (UNM)

  2. Overview
     • Top-down parsing revisited
     • Bottom-up parsing
     • Comparison to top-down parsing
     • Shift-reduce parsers
     • Conflict resolution

  3. Types of languages and automata (Syntax analysis)
     • Context-free languages are a superset of regular languages
     • Regular languages can be detected by DFAs/NFAs
     • DFAs and NFAs don’t have a memory
     • Stack machines (also called pushdown automata) add memory by introducing operations push and pop
     • They enable the stack machine to memorize (trace) the path they took to get to a state (and revert to a previous one)
     • More powerful than D/NFA
     [Figure: language classes as nested sets, from outermost to innermost: recursively enumerable (type 0), context-sensitive (type 1), context-free (type 2), regular languages (type 3). Stack machines are drawn at the context-free level, finite automata at the regular level.]

  4. Top-down parsing and the stack
     • We’ve seen LL(1) tables and manually built recursive descent parsers
     • Another simple example:

         A → x B | y C
         B → x B | ε
         C → y C | ε

       LL(1) table:

              x          y          EOF
         A    A → x B    A → y C
         B    B → x B               B → ε
         C               C → y C    C → ε

       Recursive descent parser (one function per nonterminal):

         void parse_A() {
           switch (sym) {
             case 'x': add_tree(x,B);
                       match(x);
                       parse_B();
                       break;
             case 'y': add_tree(y,C);
                       match(y);
                       parse_C();
                       break;
             case EOF: error(); break;
           }
           return;
         }

         void parse_B() {
           switch (sym) {
             case 'x': add_tree(x,B);
                       match(x);
                       parse_B();
                       break;
             case 'y': error(); break;
             case EOF: return;      /* B → ε */
           }
           return;
         }

         void parse_C() {
           switch (sym) {
             case 'x': error(); break;
             case 'y': add_tree(y,C);
                       match(y);
                       parse_C();
                       break;
             case EOF: return;      /* C → ε */
           }
           return;
         }
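The parser on this slide leaves `sym`, `match`, `error` and `add_tree` undefined. A minimal compilable sketch of the same idea could look as follows. It assumes the input is a C string of the characters `x` and `y`, uses `'\0'` in place of EOF, records errors in a flag instead of calling `error()`, and omits tree construction. The names `accepts`, `src`, `pos` and `failed` are illustrative and do not come from the slides.

```c
#include <stdbool.h>
#include <stddef.h>

/* Parser state (assumed layout): input string, read position, error flag. */
static const char *src;
static size_t pos;
static bool failed;

/* Current lookahead symbol; '\0' plays the role of EOF. */
static char sym(void) { return src[pos]; }

/* Consume the expected terminal or record a syntax error. */
static void match(char expected) {
    if (sym() == expected) pos++;
    else failed = true;
}

static void parse_B(void) {
    switch (sym()) {
    case 'x':  match('x'); parse_B(); break;   /* B -> x B */
    case '\0': break;                          /* B -> epsilon */
    default:   failed = true; break;           /* empty table entry */
    }
}

static void parse_C(void) {
    switch (sym()) {
    case 'y':  match('y'); parse_C(); break;   /* C -> y C */
    case '\0': break;                          /* C -> epsilon */
    default:   failed = true; break;           /* empty table entry */
    }
}

static void parse_A(void) {
    switch (sym()) {
    case 'x': match('x'); parse_B(); break;    /* A -> x B */
    case 'y': match('y'); parse_C(); break;    /* A -> y C */
    default:  failed = true; break;            /* includes EOF */
    }
}

/* Returns true if the whole input can be derived from A. */
bool accepts(const char *input) {
    src = input;
    pos = 0;
    failed = false;
    parse_A();
    return !failed && sym() == '\0';
}
```

Calling `accepts("yyy")` follows the same chain of calls as the trace of "y y y" on the next slide: `parse_A`, then three nested `parse_C` calls.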

  5. Tracing the recursive descent code
     Grammar:
         A → x B | y C
         B → x B | ε
         C → y C | ε
     • Which derivation do we get when parsing "y y y"?
         A → y C → yy C → yyy C → yyy
     • What is the related hierarchy of function calls?
     [Figure: call stack over time. Recur: parse_A calls match(y) and then parse_C; each parse_C calls match(y) and then the next parse_C, so the stack grows with every call. Unwind: the parse_C calls return one after another, then parse_A returns and parsing is finished.]

  6. Memory in recursive descent code
     • Where is the memory hidden in our parser?
     • We do not explicitly store and retrieve state
     • The programming language hides it:
     • When calling (returning from) a function, state is pushed onto (popped from) the computer’s stack automatically
     • This state includes the return address of the call site
     • We can also build LL(1) parsers using iteration
     • but then we have to implement our own stack…
     • The stack is needed to match beginnings and ends of productions
     • Any production of the form A → x B y where B can contain further instances of x and y, such as:
         Expression → ( Expression )
         Statement → { Statement }
         Comment → (* Comment *)
     [Figure: excerpt of the call stack trace of parse_A, parse_C and match(y) from the previous slide.]
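To illustrate the remark about iterative LL(1) parsers, here is a hedged sketch of a table-driven parser for the grammar from slide 4 that manages its own stack. The helper `predict` stands in for the LL(1) table. The names `predict`, `ll1_accepts` and `MAX_STACK`, and the single-character encoding of symbols, are assumptions made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_STACK 256

/* LL(1) table lookup: the right-hand side predicted for nonterminal nt
 * on lookahead la, or NULL if the table entry is empty.
 * "" encodes the epsilon production; la == '\0' encodes EOF. */
static const char *predict(char nt, char la) {
    switch (nt) {
    case 'A':
        if (la == 'x') return "xB";
        if (la == 'y') return "yC";
        break;
    case 'B':
        if (la == 'x') return "xB";
        if (la == '\0') return "";
        break;
    case 'C':
        if (la == 'y') return "yC";
        if (la == '\0') return "";
        break;
    }
    return NULL;
}

/* Returns true if the whole input can be derived from A. */
bool ll1_accepts(const char *input) {
    char stack[MAX_STACK];
    size_t top = 0;                    /* number of symbols on the stack */
    size_t pos = 0;                    /* read position in the input */

    stack[top++] = 'A';                /* push the start symbol */
    while (top > 0) {
        char s = stack[--top];         /* pop the next expected symbol */
        char la = input[pos];
        if (s == 'x' || s == 'y') {    /* terminal: must match the input */
            if (la != s) return false;
            pos++;
        } else {                       /* nonterminal: expand via the table */
            const char *rhs = predict(s, la);
            if (rhs == NULL) return false;
            size_t n = strlen(rhs);
            if (top + n > MAX_STACK) return false;
            /* Push the right-hand side in reverse so that its first
             * symbol ends up on top of the stack. */
            while (n > 0) stack[top++] = rhs[--n];
        }
    }
    return input[pos] == '\0';         /* all input must be consumed */
}
```

The explicit `stack` array takes over the role that the call stack plays in the recursive version: it remembers what still has to be matched.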

  7. Top-down parsing and the syntax tree
     LL(1) parsers generate a parse tree from top to bottom:
     [Figure: partially built parse tree above the input token stream. Labels: 𝛕, the part of the syntax tree that has already been derived; 𝛃, the current NT symbol; u_0, the initial part of the input token stream that is already derived; u_R, the input token stream remaining to be read.]
     • At this point, the parser tries to find a derivation for 𝛃: u_0 𝛃 u_2 → u_0 v u_2
     • u_R has to be derivable from u_2 to complete parsing (otherwise: syntax error)

  8. Bottom-up parsing
     Can we also construct the parse tree from bottom to top?
     • We try to guess a production 𝛃 → v_1 v_2
     [Figure: parse tree 𝛕 built upward from the input token stream, with v_1 v_2 below 𝛃. Labels: u_0, the initial part already reduced; u_R, the input token stream remaining to be read.]

  9. General idea of bottom-up parsing
     • Bottom-up parsing starts from the input token stream (whereas top-down starts from the grammar start symbol)
     • It reduces a string to the start symbol by inverting productions
     • trying to find a production matching the right-hand side

         E → T + E | T                E ← T + E | T
         T → int × T | int | ε        T ← int × T | int | ε

     • Consider the input token stream int * int + int:

         string                reduction applied
         int × int + int       T → int
         int × T + int         T → int × T
         T + int               T → int
         T + T                 E → T
         T + E                 E → T + E
         E

     • Reading the productions in reverse (from bottom to top) gives a rightmost derivation

  10. The resulting parse tree
     • A bottom-up parser traces a rightmost derivation in reverse

         int × int + int
         int × T + int
         T + int
         T + T
         T + E
         E

     [Figure: parse tree for int × int + int]

                  E
               /  |  \
              T   +   E
            / | \     |
         int  ×  T    T
                 |    |
                int  int

  11. A simple bottom-up parsing algo
     • Idea: split input string (token stream) into two substrings
     • Right substring (a string of terminal symbols) has not been examined so far
     • Left substring has terminals and nonterminals (generated by replacing the right side of a production by the left side)

         I = input string
         repeat
             select a non-empty substring 𝛾 of I
                 where X → 𝛾 is a production in the grammar
             if no such 𝛾 exists, backtrack
             replace one 𝛾 by X in I
         until I == "S"  /* start symbol */
            or all other possibilities exhausted  /* error */
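The backtracking algorithm above can be sketched in C as a recursive search over all possible reductions. This is only an illustration under stated assumptions: every grammar symbol is encoded as one character (`i` for int, `*` for ×), the start symbol is `E` as in the example grammar rather than `S`, and the ε alternative of T is left out because the algorithm only selects non-empty substrings. The names `grammar`, `reduces_to_start` and `MAX_LEN` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEN 64

/* One production X -> rhs over single-character symbols. */
struct production { char lhs; const char *rhs; };

/* E -> T + E | T,  T -> int * T | int  (epsilon alternative omitted). */
static const struct production grammar[] = {
    { 'E', "T+E" }, { 'E', "T" }, { 'T', "i*T" }, { 'T', "i" },
};
static const size_t n_productions = sizeof grammar / sizeof grammar[0];

/* Try every substring of form that matches a right-hand side, replace it
 * by the left-hand side and recurse. A recursive call that returns false
 * is the backtracking step: the loops then try the next candidate. */
bool reduces_to_start(const char *form) {
    if (strcmp(form, "E") == 0) return true;     /* reached the start symbol */
    size_t len = strlen(form);
    if (len >= MAX_LEN) return false;            /* does not fit the buffer */

    for (size_t p = 0; p < len; p++) {
        for (size_t k = 0; k < n_productions; k++) {
            size_t rlen = strlen(grammar[k].rhs);
            if (strncmp(form + p, grammar[k].rhs, rlen) != 0) continue;

            char next[MAX_LEN];
            memcpy(next, form, p);                   /* unchanged prefix */
            next[p] = grammar[k].lhs;                /* X replaces gamma */
            strcpy(next + p + 1, form + p + rlen);   /* unchanged suffix */
            if (reduces_to_start(next)) return true;
        }
    }
    return false;                                /* all possibilities exhausted */
}
```

For example, `reduces_to_start("i*i+i")` succeeds, while `reduces_to_start("i+")` exhausts all reductions and fails. The search can take exponential time, which is one reason to restrict where reductions happen, as the shift-reduce slides do.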

  12. Bottom-up parsing steps
     • Initially, all input is unexamined, written as: ↑ x_1 x_2 x_3 … x_n
     Two kinds of operations:
     • Shift: move ↑ one place to the right
         ABC ↑ xyz  ⇒  ABCx ↑ yz
     • Reduce: apply an inverse production at the right end of the left string
     • If A → xy is a production, then
         Cbxy ↑ ijk  ⇒  CbA ↑ ijk

  13. Example with reductions only
     Grammar:
         E → T + E | T
         T → int × T | int | ε

         int × int ↑ + int     reduce T → int
         int × T ↑ + int       reduce T → int × T
         T + int ↑             reduce T → int
         T + T ↑               reduce E → T
         T + E ↑               reduce E → T + E

  14. Example with shift-reduce parsing
     Grammar:
         E → T + E | T
         T → int × T | int | ε

         ↑ int × int + int     shift
         int ↑ × int + int     shift
         int × ↑ int + int     shift
         int × int ↑ + int     reduce T → int
         int × T ↑ + int       reduce T → int × T
         T ↑ + int             shift
         T + ↑ int             shift
         T + int ↑             reduce T → int
         T + T ↑               reduce E → T
         T + E ↑               reduce E → T + E
         E                     (arrived at start symbol!)

  15. Implementing the memory
     Grammar:
         E → T + E | T
         T → int × T | int | ε
     Idea:
     • Left substring can be implemented by a stack
     • The shift operation pushes a terminal symbol onto the stack
     • reduce pops zero or more symbols off the stack (the right-hand side of a production) and pushes a non-terminal symbol onto the stack (left-hand side of a production)

         stack contents    input token stream     parser operation: stack operation(s)
         []                ↑ int × int + int      shift: push [int]
         [int]             int ↑ × int + int      shift: push [×]
         [int, ×]          int × ↑ int + int      shift: push [int]
         [int, ×, int]     int × int ↑ + int      reduce T → int: pop -> int, push [T]
         [int, ×, T]       int × int ↑ + int      reduce T → int × T: pop, push [T]
         [T]               int × int ↑ + int      …
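A possible C sketch of the stack-based parser follows. The slides do not say how to choose between shift and reduce at this point, so the decision rules below are hand-written for this one grammar and use one token of lookahead; they are an assumption of this example, not a general construction. Symbols are encoded as single characters (`i` for int, `*` for ×), the ε alternative of T is omitted, and the names `top_is`, `reduce` and `shift_reduce_accepts` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_STACK 128

static char stack[MAX_STACK];
static size_t top;                 /* number of symbols on the stack */

/* True if the topmost stack symbols spell out rhs. */
static bool top_is(const char *rhs) {
    size_t n = strlen(rhs);
    return top >= n && memcmp(stack + top - n, rhs, n) == 0;
}

/* reduce: pop the right-hand side, push the left-hand side. */
static void reduce(const char *rhs, char lhs) {
    top -= strlen(rhs);
    stack[top++] = lhs;
}

/* Returns true if input (e.g. "i*i+i") reduces to the start symbol E. */
bool shift_reduce_accepts(const char *input) {
    size_t pos = 0;
    top = 0;
    for (;;) {
        char la = input[pos];      /* lookahead; '\0' means end of input */
        if (top_is("i*T")) {
            reduce("i*T", 'T');                  /* T -> int * T */
        } else if (top_is("T+E")) {
            reduce("T+E", 'E');                  /* E -> T + E */
        } else if (top_is("i") && la != '*') {
            reduce("i", 'T');                    /* T -> int, unless * follows */
        } else if (top_is("T") && la == '\0') {
            reduce("T", 'E');                    /* E -> T, only at end of input */
        } else if (top == 1 && stack[0] == 'E' && la == '\0') {
            return true;                         /* only the start symbol is left */
        } else if (la != '\0' && strchr("i*+", la) != NULL && top < MAX_STACK) {
            stack[top++] = input[pos++];         /* shift: push the next terminal */
        } else {
            return false;                        /* no action applies: error */
        }
    }
}
```

On the input `"i*i+i"` this performs the same sequence of shifts and reductions as the example on slide 14.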

  16. Conflicts in parsing
     Problem:
     • How do we decide when to shift or reduce?
     • Consider the step int ↑ × int + int
     • We could reduce using T → int, giving T ↑ × int + int
     • A fatal mistake: no way to reduce to the start symbol E
     • Generic shift-reduce strategy:
     • If there is a matching pattern (handle) on the stack, reduce
     • Otherwise, shift
     • What if there is a choice (between two matching patterns)?
     • If it’s legal to shift or reduce, there is a shift-reduce conflict
     • If it is legal to reduce by two different productions, there is a reduce-reduce conflict
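The "fatal mistake" can be reproduced with a deliberately naive parser that reduces whenever any right-hand side matches the top of the stack and shifts only otherwise. This is a sketch for demonstration, not a usable strategy. As in the other examples, `i` stands for int and `*` for ×, the ε alternative of T is omitted, and the names `rules` and `greedy_accepts` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_STACK 128

/* One production X -> rhs over single-character symbols. */
struct rule { char lhs; const char *rhs; };

/* E -> T + E | T,  T -> int * T | int  (epsilon alternative omitted). */
static const struct rule rules[] = {
    { 'E', "T+E" }, { 'E', "T" }, { 'T', "i*T" }, { 'T', "i" },
};

/* Naive strategy: reduce as soon as any right-hand side is on top of the
 * stack (first match wins), otherwise shift. No lookahead, no backtracking. */
bool greedy_accepts(const char *input) {
    char stack[MAX_STACK];
    size_t top = 0, pos = 0;

    for (;;) {
        bool reduced = false;
        for (size_t k = 0; k < sizeof rules / sizeof rules[0]; k++) {
            size_t n = strlen(rules[k].rhs);
            if (top >= n && memcmp(stack + top - n, rules[k].rhs, n) == 0) {
                top -= n;                        /* pop the right-hand side */
                stack[top++] = rules[k].lhs;     /* push the left-hand side */
                reduced = true;
                break;
            }
        }
        if (reduced) continue;
        if (input[pos] == '\0' || top == MAX_STACK) break;
        stack[top++] = input[pos++];             /* shift */
    }
    return input[pos] == '\0' && top == 1 && stack[0] == 'E';
}
```

For `"i*i+i"` this parser reduces the first int to T and then to E before × is shifted. It ends with the symbols E × E + E on the stack and rejects a valid input, which shows why the choice between shifting and reducing needs more information than the stack top alone.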
