Compiler Construction Lecture 9: Practical parsing issues and yacc intro 2020-02-04 Michael Engel
Overview
• Practical parsing issues
  • Error recovery
  • Unary operators
  • Handling context-sensitive ambiguity
  • Left versus right recursion
• A quick yacc intro
  • Syntax of yacc grammar descriptions
  • yacc-lex interaction
  • Example
Compiler Construction 09: Practical parsing, yacc
Error recovery
Syntax analysis
• Syntax errors are common in program development
• The parsers we have built so far stop parsing at the first error
  • Is this what a programmer would want? [2]
  • Programmers prefer to find as many syntax errors as possible in each compilation
• An error-recovery mechanism lets the parser move on to a state from which it can continue parsing after it encounters an error
  • Select one or more words that the parser can use to synchronize the input with its internal state
  • When the parser encounters an error, it discards input symbols until it finds a synchronizing word and then resets its internal state to one consistent with that word
Error recovery
• Consider a language that uses semicolons as statement separators
• The semicolon can serve as the synchronizing word: when an error occurs, the parser calls the scanner repeatedly until it finds a semicolon
    foo = func ) 42 ; return foo ;
• Here, a recursive-descent parser can simply discard words until it finds a semicolon and return (fake) success [1]
• This resynchronization is more complex in an LR(1) parser:
  • it discards input until it finds a semicolon,
  • scans back down the stack to find a state s with a valid Goto[s, Stmt] entry
  • the first such state on the stack represents the statement that contains the error
  • it discards the entries on the stack above that state, pushes the state Goto[s, Stmt] onto the stack, and resumes normal parsing
Unary operators
• The classic expression grammar includes only binary operators
• Algebraic notation includes unary operators
  • e.g., unary minus and absolute value
• Other unary operators:
  • autoincrement (i++)
  • autodecrement (i--)
  • address-of (&)
  • dereference (*)
  • boolean complement (!)
  • typecasts ((int)x)
• Adding these to the expression grammar requires some care
Unary operators
Example: expression grammar with an absolute value operator ||
    Start  → Expr
    Expr   → Expr + Term
           | Expr - Term
           | Term
    Term   → Term × Value
           | Term ÷ Value
           | Value
    Value  → "||" Factor
           | Factor
    Factor → "(" Expr ")"
           | num
           | name
[Figure: parse tree for || x - 3, in which || applies to <name,x> before <num,3> is subtracted]
Unary operators
Example: absolute value operator ||
• Absolute value should have higher precedence than either × or ÷
• However, it needs lower precedence than Factor
  • this enforces evaluation of parenthesized expressions before application of ||
• The example grammar is still LR(1)
  • but it does not allow writing || || x
  • Writing this doesn't make much sense
  • but it's a legal mathematical operation, so why not?
  • This would work: ||(|| x)
• This is a problem for other operators like * (dereference)
  • **p is a common operation in C
Unary operators
• Problem for other operators like *: **p is a common operation in C
• Solution: add a dereference production for Value as well: Value → "*" Value
• The resulting grammar is still LR(1)
  • even if we replace the "×" operator in Term → Term × Value with "*", overloading the operator "*" the way C does
• The same approach works for unary minus
    Start  → Expr
    Expr   → Expr + Term
           | Expr - Term
           | Term
    Term   → Term "*" Value
           | Term ÷ Value
           | Value
    Value  → "*" Value
           | "||" Factor
           | Factor
    Factor → "(" Expr ")"
           | num
           | name
Handling context-sensitive ambiguity
• Using one word to represent two different meanings can create a syntactic ambiguity
• Common in early programming languages (FORTRAN, PL/I, Ada)
  • Parentheses enclose both the subscript expressions of an array reference and the argument list of a subroutine or function call
• For the input fee(i, j), the compiler cannot tell whether fee is a two-dimensional array or a procedure that must be invoked
  • Differentiating between these two cases requires knowledge of fee's declared type
  • This information is not syntactically obvious
  • The scanner would classify fee as a name in either case
Handling context-sensitive ambiguity
• We can add productions that derive both subscript expressions and argument lists from Factor
• In a classical expression grammar, this might look like:
    Factor            → FunctionReference
                      | ArrayReference
                      | "(" Expr ")"
                      | num
                      | name
    FunctionReference → name "(" ArgList ")"
    ArrayReference    → name "(" ArgList ")"
• Since the last two productions have identical right-hand sides, this grammar is ambiguous, which creates a reduce-reduce conflict in an LR(1) table builder
Handling context-sensitive ambiguity
Our grammar results in an LR(1) reduce-reduce conflict
• Resolving this ambiguity requires extra-syntactic knowledge: "Is name a function or an array?"
• In a recursive-descent parser, the compiler writer can combine the code for FunctionReference and ArrayReference
  • and add the extra code required to check the name's declared type
• In a table-driven parser built with a parser generator, the solution must work within the framework provided by the tools
Handling context-sensitive ambiguity
Two different approaches solve this:
• Rewrite the grammar to combine function invocation and array reference into a single production
    Factor                   → FunctionOrArrayReference
                             | "(" Expr ")"
                             | num
                             | name
    FunctionOrArrayReference → name "(" ArgList ")"
  • the issue is deferred until a later step in translation
  • there, it can be resolved with information from the declarations
• The scanner can classify identifiers based on their declared types
  • requires handshaking between scanner and parser
  • works as long as the language has a define-before-use rule
    FunctionReference → function_name "(" ArgList ")"
    ArrayReference    → array_name "(" ArgList ")"
  • Rewritten in this way, the grammar is unambiguous
  • Since the scanner returns a distinct syntactic category in each case, the parser can distinguish the two cases
Left versus right recursion
• Top-down parsers need right-recursive grammars
• Bottom-up parsers can accommodate either left or right recursion
• Compiler writers must therefore choose between left and right recursion when writing the grammar for a bottom-up parser. How?
Stack depth criterion
• Left recursion can lead to smaller stack depths
  • accordingly, lower memory use and fewer recursions
    Left-recursive grammar:   List → List elt | elt
    Right-recursive grammar:  List → elt List | elt
Left versus right recursion: stack depth
• The left-recursive grammar (List → List elt | elt) shifts elt1 onto its stack and immediately reduces it to List
• Next, it shifts elt2 onto the stack and reduces it to List, and so on
• It proceeds until it has shifted each of the five elts onto the stack and reduced them to List
• Thus, the stack reaches
  • a maximum depth of two
  • and an average depth of 10/6 = 1⅔
• The stack depth of a left-recursive grammar depends on the grammar, not on the input stream
[Figure: stack contents and remaining input at each step of the left-recursive parse of elt1 … elt5]