
Compiler Construction Lecture 9: Practical parsing issues and yacc

Compiler Construction Lecture 9: Practical parsing issues and yacc intro 2020-02-04 Michael Engel Overview Practical parsing issues Error recovery Unary operators Handling context-sensitive ambiguity Left versus right


  1. Compiler Construction Lecture 9: Practical parsing issues and yacc intro 2020-02-04 Michael Engel

  2. Overview
     • Practical parsing issues
       • Error recovery
       • Unary operators
       • Handling context-sensitive ambiguity
       • Left versus right recursion
     • A quick yacc intro
       • Syntax of yacc grammar descriptions
       • yacc-lex interaction
       • Example
     Compiler Construction 09: Practical parsing, yacc

  3. Error recovery (Syntax analysis)
     • Syntax errors are common in program development
     • Our previous parsers have stopped parsing at the first error
       • Is this what a programmer would want? [2]
       • Prefer to find as many syntax errors as possible in each compilation
     • A mechanism for error recovery helps the parser to move on to a state where it can continue parsing when it encounters an error
       • Select one or more words that the parser can use to synchronize the input with its internal state
       • When the parser encounters an error, it discards input symbols until it finds a synchronizing word and then resets its internal state to one consistent with the synchronizing word

  4. Error recovery (Syntax analysis)
     • Consider a language using semicolons as statement separators
     • The semicolon can be used as a synchronizing element: when an error occurs, the parser calls the scanner repeatedly until it finds a semicolon
         foo = func ) 42;
         return foo;
     • Here, a recursive-descent parser can simply discard words until it finds a semicolon and return (fake) success [1]
     • This resynchronization is more complex in an LR(1) parser:
       • it discards input until it finds a semicolon…
       • scans back down the stack to find a state s with a valid Goto[s, Stmt] entry
       • the first such state on the stack represents the statement that contains the error
       • discards entries on the stack above that state, pushes the state Goto[s, Stmt] onto the stack and resumes normal parsing
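
The recursive-descent case on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy statement form `name = num ;`, the whitespace-separated token list, and the helper names `classify` and `parse_statements` are all hypothetical and not part of the lecture. The LR(1) stack-scanning variant is not shown.

```python
def classify(tok):
    """Map a raw token to its syntactic category (toy scanner, an assumption)."""
    if tok in ("=", ";"):
        return tok
    if tok.isdigit():
        return "num"
    if tok.isidentifier():
        return "name"
    return "other"


def parse_statements(tokens):
    """Parse a sequence of `name = num ;` statements.

    On a syntax error, record it, discard tokens up to and including the
    next semicolon (the synchronizing word), and continue with the next
    statement. Returns (number of well-formed statements, error messages).
    """
    pos, parsed, errors = 0, 0, []

    def expect(kind):
        nonlocal pos
        if pos < len(tokens) and classify(tokens[pos]) == kind:
            pos += 1
        else:
            found = tokens[pos] if pos < len(tokens) else "<eof>"
            raise SyntaxError(f"expected {kind}, found {found!r} at token {pos}")

    while pos < len(tokens):
        try:
            expect("name")
            expect("=")
            expect("num")
            expect(";")
            parsed += 1
        except SyntaxError as err:
            errors.append(str(err))
            # Panic mode: skip input until the synchronizing semicolon.
            while pos < len(tokens) and tokens[pos] != ";":
                pos += 1
            pos += 1  # consume the semicolon itself
    return parsed, errors


ok, errs = parse_statements("foo = func ) 42 ; bar = 7 ;".split())
print(ok, errs)
```

With this input the first statement is reported as an error and skipped, and the second statement is still parsed, so one compilation run can report more than one syntax error.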

  5. Unary operators (Syntax analysis)
     • Classic expression grammar includes binary operators only
     • Algebraic notation includes unary operators
       • e.g., unary minus and absolute value
     • Other unary operators:
       • autoincrement (i++)
       • autodecrement (i--)
       • address-of (&)
       • dereference (*)
       • boolean complement (!)
       • typecasts ((int)x)
     • Adding these to the expression grammar requires some care

  6. Unary operators (Syntax analysis)
     Example: expression grammar with an absolute value operator ||x
       Start  → Expr
       Expr   → Expr + Term
              | Expr - Term
              | Term
       Term   → Term × Value
              | Term ÷ Value
              | Value
       Value  → "||" Factor
              | Factor
       Factor → "(" Expr ")"
              | num
              | name
     Parse tree for ||x - 3:
       Start
         Expr
           Expr
             Term
               Value
                 "||"
                 Factor
                   <name,x>
           "-"
           Term
             Value
               Factor
                 <num,3>

  7. Unary operators
     Example: absolute value operator ||x
     • Absolute value should have higher precedence than either × or ÷
     • However, it needs lower precedence than Factor
       • this enforces evaluation of parenthetic expressions before application of ||
     • The example grammar is still LR(1)
       • but it does not allow writing || || x
       • Writing this doesn’t make much sense
       • but it’s a legal mathematical operation, so why not?
       • This would work: ||(|| x)
     • Problem for other operators like * (dereferencing)
       • **p is a common operation in C
     Grammar:
       Start  → Expr
       Expr   → Expr + Term
              | Expr - Term
              | Term
       Term   → Term × Value
              | Term ÷ Value
              | Value
       Value  → "||" Factor
              | Factor
       Factor → "(" Expr ")"
              | num
              | name
     [Figure: parse tree for ||x - 3]

  8. Unary operators
     Problem for other operators like *
     • **p is a common operation in C
     • Solution:
       • add a dereference production for Value as well: Value → "*" Value
     • The resulting grammar is still LR(1)
       • even if we replace the "×" operator in Term → Term × Value with "*", overloading the operator "*" in the way that C does
     • The same approach works for unary minus
     Grammar:
       Start  → Expr
       Expr   → Expr + Term
              | Expr - Term
              | Term
       Term   → Term "*" Value
              | Term ÷ Value
              | Value
       Value  → "*" Value
              | "||" Factor
              | Factor
       Factor → "(" Expr ")"
              | num
              | name
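
To see the effect of the recursive production Value → "*" Value, the grammar above can be tried out with a small hand-written parser. The sketch below is a Python recursive-descent parser, not the LR(1) parser the slides discuss, so the left-recursive Expr and Term productions are replaced by loops. The names `tokenize`, `Parser` and `parse`, the tuple-based tree format, and the use of ASCII "/" in place of "÷" are assumptions made for this illustration.

```python
import re


def tokenize(text):
    """Split the input into numbers, names, "||" and single-character operators."""
    return re.findall(r"\d+|[A-Za-z_]\w*|\|\||[-+*/()]", text)


class Parser:
    def __init__(self, text):
        self.toks = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # Expr -> Expr + Term | Expr - Term | Term   (written as a loop)
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):
        # Term -> Term "*" Value | Term "/" Value | Value   (written as a loop)
        node = self.value()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.value())
        return node

    def value(self):
        # Value -> "*" Value | "||" Factor | Factor
        if self.peek() == "*":
            self.take()
            return ("deref", self.value())   # recursion allows **p, ***p, ...
        if self.peek() == "||":
            self.take()
            return ("abs", self.factor())    # no recursion: || || x is rejected
        return self.factor()

    def factor(self):
        # Factor -> "(" Expr ")" | num | name
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not (tok.isdigit() or tok.isidentifier()):
            raise SyntaxError(f"unexpected token {tok!r}")
        return int(tok) if tok.isdigit() else tok


def parse(text):
    parser = Parser(text)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("**p * 2"))
print(parse("||(|| x)"))
```

In this sketch `**p * 2` parses with two nested dereferences below a multiplication, `||(|| x)` is accepted, and `|| || x` raises a `SyntaxError`, mirroring the difference between the "*" and "||" productions for Value.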

  9. Handling context-sensitive ambiguity (Syntax analysis)
     • Using one word to represent two different meanings can create a syntactic ambiguity
     • Common in early programming languages (FORTRAN, PL/I, Ada)
       • Parentheses used to enclose both the subscript expressions of an array reference and the argument list of a subroutine or function
     • For the input fee(i,j), the compiler cannot tell if fee is a two-dimensional array or a procedure that must be invoked
     • Differentiating between these two cases requires knowledge of fee’s declared type
       • This information is not syntactically obvious
       • The scanner would classify fee as a name in either case

  10. Handling context-sensitive ambiguity (Syntax analysis)
     • We can add productions that derive both subscript expressions and argument lists from Factor
     • Handling this in a classical expression grammar might look like this:
         Factor → FunctionReference
                | ArrayReference
                | "(" Expr ")"
                | num
                | name
         FunctionReference → name "(" ArgList ")"
         ArrayReference    → name "(" ArgList ")"
     • Since the last two productions have identical right-hand sides, this grammar is ambiguous, which creates a reduce-reduce conflict in an LR(1) table builder

  11. Handling context-sensitive ambiguity (Syntax analysis)
     Our grammar results in an LR(1) reduce-reduce conflict
     • Resolving this ambiguity requires extra-syntactic knowledge
       • "Is name a function or an array?"
     • In a recursive-descent parser, the compiler writer can combine the code for FunctionReference and ArrayReference
       • add the extra code required to check the name’s declared type
     • In a table-driven parser built with a parser generator, the solution must work within the framework provided by the tools
     Grammar:
       Factor → FunctionReference
              | ArrayReference
              | "(" Expr ")"
              | num
              | name
       FunctionReference → name "(" ArgList ")"
       ArrayReference    → name "(" ArgList ")"
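
For the recursive-descent option, the combined routine might look roughly like the following Python sketch. It is an assumption-laden illustration: the function name `parse_reference`, the `declared` dictionary standing in for a symbol table, the flat comma-separated ArgList and the tuple results are all invented here.

```python
def parse_reference(tokens, declared):
    """Parse name "(" ArgList ")" and label the result by the name's declared kind.

    `tokens` is a list of strings; `declared` maps names to "function" or
    "array" (a stand-in for a real symbol table).
    """
    if len(tokens) < 3 or tokens[1] != "(" or tokens[-1] != ")":
        raise SyntaxError("expected name '(' ArgList ')'")
    name = tokens[0]
    # Simplification: ArgList is a flat, comma-separated list of words.
    args = [tok for tok in tokens[2:-1] if tok != ","]

    # The extra-syntactic check: one piece of parsing code, two meanings.
    kind = declared.get(name)
    if kind == "function":
        return ("call", name, args)
    if kind == "array":
        return ("index", name, args)
    raise NameError(f"{name!r} is not declared as a function or an array")


print(parse_reference(["fee", "(", "i", ",", "j", ")"], {"fee": "array"}))
```

The syntax is checked once, and only the final labelling step consults the declaration information.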

  12. Handling context-sensitive ambiguity (Syntax analysis)
     Two different approaches to solve this:
     • Rewrite grammar to combine function invocation and array reference into a single production
       • issue is deferred until a later step in translation
       • there, it can be resolved with information from the declarations
         Factor → FunctionOrArrayReference
                | "(" Expr ")"
                | num
                | name
         FunctionOrArrayReference → name "(" ArgList ")"
     • Scanner can classify identifiers based on their declared types
       • requires handshaking between scanner and parser
       • works as long as the language has a define-before-use rule
       • Rewritten in this way, the grammar is unambiguous
       • Since the scanner returns a distinct syntactic category in each case, the parser can distinguish the two cases
         FunctionReference → function_name "(" ArgList ")"
         ArrayReference    → array_name "(" ArgList ")"
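
The second approach, where the scanner consults the declarations, can be sketched as follows in Python. The function name `scan`, the `symbols` dictionary and the category strings are assumptions for illustration; in a real compiler the table would be filled by the parser as it processes declarations, which is the handshaking mentioned above.

```python
import re


def scan(text, symbols):
    """Yield (category, lexeme) pairs.

    Identifiers are classified by their declared kind in `symbols`
    (name -> "function" or "array"); undeclared identifiers stay plain names.
    """
    categories = {"function": "function_name", "array": "array_name"}
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[(),]", text):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            category = categories.get(symbols.get(lexeme), "name")
        elif lexeme.isdigit():
            category = "num"
        else:
            category = lexeme  # punctuation is its own category
        yield category, lexeme


# Declarations seen earlier in the program (define-before-use).
symbols = {"fee": "array", "foe": "function"}
print(list(scan("fee(i, j)", symbols)))
print(list(scan("foe(i, j)", symbols)))
```

Because `fee` and `foe` arrive at the parser with different categories, the two productions no longer share a right-hand side.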

  13. Left versus right recursion (Syntax analysis)
     • Top-down parsers need right-recursive grammars
     • Bottom-up parsers can accommodate either left or right recursion
     • Compiler writers must choose between left and right recursion when writing the grammar for a bottom-up parser. How?
     Stack depth criterion
     • Left recursion can lead to smaller stack depths
     • Accordingly, lower memory use and fewer recursions
     Left-recursive grammar:
       List → List elt
            | elt
     Right-recursive grammar:
       List → elt List
            | elt

  14. Left versus right recursion: stack depth (Syntax analysis)
     • The left-recursive grammar shifts elt1 onto its stack and immediately reduces it to List
     • Next, it shifts elt2 onto the stack and reduces it to List, and so on…
     • It proceeds until it has shifted each of the five elts onto the stack and reduced them to List
     • Thus, the stack reaches
       • a maximum depth of two
       • and an average depth of 10/6 = 1 2/3
     • The stack depth of a left-recursive grammar depends on the grammar, not the input stream
     Grammar (left recursion):
       List → List elt
            | elt
     [Figure: parse tree and derivation under left recursion for the five-element input elt1 elt2 elt3 elt4 elt5]
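
The depth figures can be reproduced with a small simulation of the shift and reduce steps. The Python sketch below is an illustration only: it models just the stack contents, not a real LR parser, and it assumes the depth is sampled just before each reduction and once at the end, which is the convention that yields six samples summing to 10 for five elements. The right-recursive function is added for comparison and is not taken from this slide.

```python
from fractions import Fraction


def left_recursive_depths(n):
    """Stack depths for List -> List elt | elt on an input of n elements (n >= 1)."""
    stack, samples = [], []
    for i in range(n):
        stack.append("elt")              # shift the next element
        samples.append(len(stack))       # depth just before the reduction
        if i == 0:
            stack[-1:] = ["List"]        # reduce by List -> elt
        else:
            stack[-2:] = ["List"]        # reduce by List -> List elt
    samples.append(len(stack))           # final stack holds only List
    return samples


def right_recursive_depths(n):
    """Stack depths for List -> elt List | elt on an input of n elements (n >= 1)."""
    stack = ["elt"] * n                  # every element is shifted first
    samples = [len(stack)]               # depth just before the first reduction
    stack[-1:] = ["List"]                # reduce by List -> elt
    while len(stack) > 1:
        samples.append(len(stack))
        stack[-2:] = ["List"]            # reduce by List -> elt List
    samples.append(len(stack))
    return samples


for name, depths in (("left", left_recursive_depths(5)),
                     ("right", right_recursive_depths(5))):
    print(name, depths, "max:", max(depths),
          "average:", Fraction(sum(depths), len(depths)))
```

For five elements the left-recursive samples are 1, 2, 2, 2, 2, 1, giving a maximum of two and an average of 10/6, while the right-recursive stack grows with the length of the input.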
