top down parsing

Top-Down Parsing: Slides modified from Louden Book and Dr. Scherger

  1. Compiler Design and Construction Top-Down Parsing Slides modified from Louden Book and Dr. Scherger

  2. Top Down Parsing: A top-down parsing algorithm parses an input string of tokens by tracing out the steps in a leftmost derivation. Such an algorithm is called top-down because the implied traversal of the parse tree is a preorder traversal. 2 Top Down Parsing COSC 4353

  3. Parsing: A top-down parser “discovers” the parse tree by starting at the root (the start symbol) and expanding (predicting) downward in a depth-first manner. Such parsers predict the derivation before the matching is done. A bottom-up parser starts at the leaves (terminals) and determines which production generates them. Then it determines the rules that generate their parents, and so on, until it reaches the root (S).

  4. Parsing Example: Consider the following grammar:
        <program> → begin <stmts> end $
        <stmts> → SimpleStmt ; <stmts>
        <stmts> → begin <stmts> end ; <stmts>
        <stmts> → λ
      Input: begin SimpleStmt; SimpleStmt; end $

  5. Top-down Parsing Example: Input: begin SimpleStmt; SimpleStmt; end $. The parse tree starts as the root alone: <program>

  6. Top-down Parsing Example: Expand the root with <program> → begin <stmts> end $. Tree frontier: begin <stmts> end $

  7. Top-down Parsing Example: Expand <stmts> with <stmts> → SimpleStmt ; <stmts>. Tree frontier: begin SimpleStmt ; <stmts> end $

  8. Top-down Parsing Example: Expand the remaining <stmts> with <stmts> → SimpleStmt ; <stmts> again. Tree frontier: begin SimpleStmt ; SimpleStmt ; <stmts> end $

  9. Top-down Parsing Example: Expand the last <stmts> with <stmts> → λ. Tree frontier: begin SimpleStmt ; SimpleStmt ; end $, which matches the input.
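The derivation traced on the slides above can be sketched as a small recognizer with one procedure per nonterminal. This is only an illustration under stated assumptions: the `Token` enum, the `ok` flag, and `parse_program` are hypothetical names that do not appear in the slides, and the input is assumed to arrive as an array of tokens terminated by `T_EOF` (the `$` marker).

```c
#include <assert.h>
#include <stddef.h>

/* Token kinds for the example grammar (illustrative names). */
typedef enum { T_BEGIN, T_END, T_SIMPLE, T_SEMI, T_EOF } Token;

static const Token *next;  /* current lookahead token */
static int ok;             /* cleared on the first syntax error */

/* Consume the lookahead if it is the expected token, else record an error. */
static void match(Token expected) {
    if (!ok) return;
    if (*next != expected) { ok = 0; return; }
    if (expected != T_EOF) next++;  /* never step past the end marker */
}

/* <stmts> -> SimpleStmt ; <stmts> | begin <stmts> end ; <stmts> | lambda */
static void stmts(void) {
    if (!ok) return;
    switch (*next) {
    case T_SIMPLE:
        match(T_SIMPLE); match(T_SEMI); stmts();
        break;
    case T_BEGIN:
        match(T_BEGIN); stmts(); match(T_END); match(T_SEMI); stmts();
        break;
    default:
        break;  /* lambda: expand to nothing and let the caller match */
    }
}

/* <program> -> begin <stmts> end $ */
static void program(void) {
    match(T_BEGIN); stmts(); match(T_END); match(T_EOF);
}

/* Returns 1 if the T_EOF-terminated token array is a valid <program>. */
int parse_program(const Token *tokens) {
    assert(tokens != NULL);
    next = tokens;
    ok = 1;
    program();
    return ok;
}
```

Calling `parse_program` on the tokens for `begin SimpleStmt; SimpleStmt; end $` makes the same sequence of expansions as slides 5 to 9: one lookahead token is enough to pick the production at every step.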

  10. Two Kinds of Top Down Parsers: Predictive parsers try to make decisions about the structure of the tree below a node based on a few lookahead tokens (usually one!). This means that the next 1 (or k) tokens must be enough to select a single rule to expand. This is a weakness, since little program structure has been seen before predictive decisions must be made. Backtracking parsers solve the lookahead problem by backtracking if one decision turns out to be wrong and making a different choice. But such parsers are slow (exponential time in general).

  11. Top Down Parsers (cont.): Fortunately, many practical techniques have been developed to overcome the predictive lookahead problem, and the version of predictive parsing called recursive-descent is still the method of choice for hand-coding, due to its simplicity. But because of the inherent weakness of top-down parsing, it is not a good choice for machine-generated parsers. Instead, more powerful bottom-up parsing methods should be used (Chapter 5).

  12. Recursive Descent Parsing: Simple, elegant idea: use the grammar rules as recipes for procedure code. Each non-terminal (lhs) corresponds to a procedure. Each appearance of a terminal in the rhs of a rule causes a token to be matched. Each appearance of a non-terminal in the rhs corresponds to a call of the associated procedure.

  13. Recursive Descent Example: Grammar rule: factor → ( exp ) | number. Code:
        void factor(void) {
          if (token == number) match(number);
          else { match('('); exp(); match(')'); }
        }

  14. Recursive Descent Example (cont.): Note how lookahead is not a problem in this example: if the token is number, go one way; if the token is '(', go the other; and if the token is neither, declare an error:
        void match(Token expect) {
          if (token == expect) getToken();
          else error(token, expect);
        }

  15. Recursive Descent Example (cont.): A recursive-descent procedure can also compute values or syntax trees:
        int factor(void) {
          if (token == number) {
            int temp = atoi(tokStr);
            match(number);
            return temp;
          } else {
            match('(');
            int temp = exp();
            match(')');
            return temp;
          }
        }
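The slide code relies on `token`, `tokStr`, `getToken` and `match` without showing them. A minimal sketch of that supporting machinery might look like the following. Everything beyond those four names is an assumption made for illustration: the `Token` enum values, the `errors` counter, the `parse_factor` entry point, and a placeholder `expr` (standing in for the slides' `exp`, renamed to avoid clashing with the C math library) that simply forwards to `factor`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

typedef enum { NUMBER, LPAREN, RPAREN, END, BAD } Token;

static const char *src;  /* remaining input */
static Token token;      /* current lookahead token */
static char tokStr[32];  /* text of the most recent NUMBER */
static int errors;       /* number of syntax errors seen */

/* Scan the next token from src into token (and tokStr for numbers). */
static void getToken(void) {
    while (isspace((unsigned char)*src)) src++;
    if (isdigit((unsigned char)*src)) {
        size_t n = 0;
        while (isdigit((unsigned char)*src)) {
            if (n < sizeof tokStr - 1) tokStr[n++] = *src;
            src++;
        }
        tokStr[n] = '\0';
        token = NUMBER;
        return;
    }
    switch (*src) {
    case '\0': token = END; return;  /* stay on the terminator */
    case '(':  token = LPAREN; break;
    case ')':  token = RPAREN; break;
    default:   token = BAD; break;
    }
    src++;
}

static void match(Token expect) {
    if (token == expect) getToken();
    else errors++;
}

static int factor(void);

/* Placeholder for the slides' exp: no addop handling in this sketch. */
static int expr(void) { return factor(); }

/* factor -> ( exp ) | number */
static int factor(void) {
    int temp = 0;
    if (token == NUMBER) {
        temp = atoi(tokStr);
        match(NUMBER);
    } else if (token == LPAREN) {
        match(LPAREN);
        temp = expr();
        match(RPAREN);
    } else {
        errors++;  /* neither alternative applies */
    }
    return temp;
}

/* Hypothetical entry point: parse one factor and return its value. */
int parse_factor(const char *input) {
    assert(input != NULL);
    src = input;
    errors = 0;
    getToken();  /* prime the one-token lookahead */
    return factor();
}
```

With these definitions, `parse_factor("((7))")` returns 7 with `errors` left at 0. The explicit third branch in `factor` reports an error without recursing, which keeps the sketch from looping when the lookahead fits neither alternative.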

  16. Errors in Recursive Descent Are Tricky to Handle: If an error occurs, we must somehow gracefully exit possibly many recursive calls. Best solution: use exception handling to manage stack unwinding (which C doesn’t have!). But there are worse problems: left recursion doesn’t work!
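Although C has no exceptions, the standard `setjmp`/`longjmp` pair can serve as a rough substitute for leaving many nested calls at once. The sketch below is one possible arrangement, not the approach of the slides: it uses a toy grammar `nested → ( nested ) | x` chosen only to produce deep recursion, and the names `on_error`, `syntax_error` and `parse_nested` are made up for the example. Note that `longjmp` skips any cleanup in the abandoned frames, so this is only safe here because those frames own no resources.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf on_error;  /* recovery point set by the top-level caller */
static const char *p;     /* next unread input character */

/* Abandon every active parsing call and resume at the setjmp site. */
static void syntax_error(void) {
    longjmp(on_error, 1);
}

static void match(char expect) {
    if (*p == expect) p++;
    else syntax_error();
}

/* nested -> ( nested ) | x   (toy grammar, assumed for this sketch) */
static void nested(void) {
    if (*p == '(') {
        match('(');
        nested();
        match(')');
    } else {
        match('x');
    }
}

/* Returns 1 if the whole input is a valid nested, 0 on any syntax error. */
int parse_nested(const char *input) {
    assert(input != NULL);
    p = input;
    if (setjmp(on_error) != 0)
        return 0;  /* reached via longjmp from any recursion depth */
    nested();
    if (*p != '\0') syntax_error();  /* trailing input is an error too */
    return 1;
}
```

For example, `parse_nested("((x))")` returns 1, while `parse_nested("((x)")` returns 0 after jumping straight out of two levels of `nested`.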

  17. Left recursion is impossible! exp → exp addop term | term
        void exp(void) {
          if (token == ??) {
            exp();   // uh, oh!! exp() calls itself before any token is consumed
            addop();
            term();
          } else term();
        }

  18. Review on EBNF

  19. Extra Notation: So far: Backus-Naur Form (BNF), whose metasymbols are |, → and ε. Extended BNF (EBNF): new metasymbols […] and {…}; ε is largely eliminated by these.

  20. EBNF Metasymbols: Brackets […] mean “optional” (like ? in regular expressions):
        exp → term '|' exp | term   becomes:   exp → term [ '|' exp ]
        if-stmt → if ( exp ) stmt | if ( exp ) stmt else stmt   becomes:   if-stmt → if ( exp ) stmt [ else stmt ]
      Braces {…} mean “repetition” (like * in regexps, see next slide).

  21. Braces in EBNF: Replace only left-recursive repetition:
        exp → exp + term | term   becomes:   exp → term { + term }
      Left associativity is still implied. Watch out for choices:
        exp → exp + term | exp - term | term   is not the same as   exp → term { + term } | term { - term }
      The second form cannot mix + and - in one expression; the choice of operator has to go inside the braces, as in exp → term { addop term }.

  22. Simple Expressions in EBNF:
        exp → term { addop term }
        addop → + | -
        term → factor { mulop factor }
        mulop → *
        factor → ( exp ) | number

  23. Left recursion is impossible! exp → exp addop term | term
        void exp(void) {
          if (token == ??) {
            exp();   // uh, oh!! exp() calls itself before any token is consumed
            addop();
            term();
          } else term();
        }

  24. EBNF to the rescue! exp → term { addop term }
        void exp(void) {
          term();
          while (token is an addop) {
            addop();
            term();
          }
        }

  25. This code can even left associate: Left associativity tells us that 5-7+2 = ? (-4 or 0). The answer is 0, because the expression groups as (5-7)+2.
        int exp(void) {
          int temp = term();
          while (token == '+' || token == '-')
            if (token == '+') { match('+'); temp += term(); }
            else { match('-'); temp -= term(); }
          return temp;
        }
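To see the left-associative loop produce 0 for 5-7+2, here is a compact evaluator for the EBNF grammar of slide 22. It is a sketch under simplifying assumptions: it reads characters directly instead of using a scanner, it does no error reporting, and the names `expr` (for the slides' `exp`), `skip_spaces` and `evaluate` are introduced only for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *p;  /* next unread character */

static void skip_spaces(void) {
    while (isspace((unsigned char)*p)) p++;
}

static int expr(void);

/* factor -> ( exp ) | number */
static int factor(void) {
    int value = 0;
    skip_spaces();
    if (*p == '(') {
        p++;
        value = expr();
        skip_spaces();
        if (*p == ')') p++;
    } else {
        while (isdigit((unsigned char)*p))
            value = value * 10 + (*p++ - '0');
    }
    return value;
}

/* term -> factor { * factor } */
static int term(void) {
    int value = factor();
    skip_spaces();
    while (*p == '*') {
        p++;
        value *= factor();
        skip_spaces();
    }
    return value;
}

/* exp -> term { addop term }: the loop folds operands left to right. */
static int expr(void) {
    int value = term();
    skip_spaces();
    while (*p == '+' || *p == '-') {
        char op = *p++;
        int rhs = term();
        value = (op == '+') ? value + rhs : value - rhs;
        skip_spaces();
    }
    return value;
}

/* Hypothetical entry point: evaluate a well-formed expression string. */
int evaluate(const char *input) {
    assert(input != NULL);
    p = input;
    return expr();
}
```

Here `evaluate("5-7+2")` yields 0, since the running value is updated after each operand, while `evaluate("5-(7+2)")` yields -4 because the parentheses force the other grouping.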

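  26. Note that right recursion/assoc. is not a problem: exp → term [ addop exp ]
        void exp(void) {
          term();
          if (token is an addop) {
            addop();
            exp();
          }
        }
      Right associativity tells us how to read 5*2^2 = ? (20 or 100), or, after a = 5, what a=b=2 leaves in a (a ?= 2 or 5). With the usual conventions the answers are 20 and 2, since a=b=2 groups as a=(b=2). Strictly speaking, 5*2^2 is decided by the precedence of ^ over * rather than by associativity.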
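The effect of the right-recursive rule can be checked with a small evaluator that follows exp → term [ addop exp ] literally. This is a sketch with deliberate simplifications: `term` is reduced to an unsigned integer literal, spaces are not handled, and `expr` and `evaluate_right` are names invented for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

static const char *p;  /* next unread character */

/* term -> number   (simplified for this sketch) */
static int term(void) {
    int value = 0;
    while (isdigit((unsigned char)*p))
        value = value * 10 + (*p++ - '0');
    return value;
}

/* exp -> term [ addop exp ] */
static int expr(void) {
    int left = term();
    if (*p == '+' || *p == '-') {
        char op = *p++;
        int right = expr();  /* the whole rest of the input groups first */
        return (op == '+') ? left + right : left - right;
    }
    return left;
}

/* Hypothetical entry point: evaluate with right-associative addops. */
int evaluate_right(const char *input) {
    assert(input != NULL);
    p = input;
    return expr();
}
```

With this grammar `evaluate_right("5-7+2")` returns -4, because the input is read as 5-(7+2). That grouping is wrong for subtraction but is the one wanted for operators such as assignment or exponentiation.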

  27. Non-Recursive Top Down Parsing

  28. Step 1: Make DFA-like Transition Diagrams: One can represent the actions of a predictive parser with a transition diagram for each nonterminal of the grammar. For example, let's draw the diagrams for the following grammar:
        E → T E'
        E' → ε | + T E'
        T → F T'
        T' → ε | * F T'
        F → id | ( E )
      [Figure: transition diagrams. E: 0 -T→ 1 -E'→ 2. E': 3 -+→ 4 -T→ 5 -E'→ 6, with an ε-edge from 3 to 6. T: 7 -F→ 8 -T'→ 9. T': 10 -*→ 11 -F→ 12 -T'→ 13, with an ε-edge from 10 to 13. F: a path labeled ( E ) and a parallel edge labeled id.]

  29. Top Down Parsing: To traverse an edge labeled with a nonterminal, the parser goes to the starting state of the diagram for that nonterminal and returns to the original diagram when it has reached the end state of that nonterminal. The parser has a stack to keep track of these actions. For example, to traverse the T-edge from state 0 to state 1, the parser puts state 1 on the top of the stack, traverses the T-diagram from state 7 to state 9 and then goes to state 1 after popping it off the stack.

  30. Top Down Parsing: An edge labeled with a terminal can be traversed when the current input token equals that terminal. When such an edge is traversed, the current input token is replaced with the next input token. For example, the +-edge from state 3 to state 4 can be traversed when the parser is in state 3 and the input token is +: traversing the edge will replace the + token with the next token.
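The traversal rules described on the last two slides can be sketched as a loop over an edge table with an explicit stack of return states. Several details are assumptions made for this illustration: tokens are single characters with `i` standing for id, the nonterminals E' and T' are written `X` and `Y`, the states of the F diagram are numbered 14 to 17 (the original numbering is not recoverable here), and when several edges leave a state the parser prefers a terminal edge that matches the lookahead and otherwise falls back to the nonterminal or ε edge.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int from; char label; int to; } Edge;

/* Labels: 'E','X','T','Y','F' are nonterminals (X = E', Y = T'),
   0 marks an epsilon edge, anything else is a terminal ('i' = id). */
static const Edge edges[] = {
    {0, 'T', 1}, {1, 'X', 2},                               /* E  -> T E'             */
    {3, '+', 4}, {4, 'T', 5}, {5, 'X', 6}, {3, 0, 6},       /* E' -> + T E' | epsilon */
    {7, 'F', 8}, {8, 'Y', 9},                               /* T  -> F T'             */
    {10, '*', 11}, {11, 'F', 12}, {12, 'Y', 13}, {10, 0, 13}, /* T' -> * F T' | epsilon */
    {14, '(', 15}, {15, 'E', 16}, {16, ')', 17}, {14, 'i', 17} /* F -> ( E ) | id     */
};
enum { N_EDGES = sizeof edges / sizeof edges[0], MAX_DEPTH = 256 };

/* Start state of a nonterminal's diagram, or -1 if nt is not a nonterminal. */
static int start_state(char nt) {
    switch (nt) {
    case 'E': return 0;
    case 'X': return 3;
    case 'T': return 7;
    case 'Y': return 10;
    case 'F': return 14;
    default:  return -1;
    }
}

static int is_final(int s) {
    return s == 2 || s == 6 || s == 9 || s == 13 || s == 17;
}

/* Returns 1 if input (e.g. "i+i*i") is derivable from E, else 0. */
int parse(const char *input) {
    int stack[MAX_DEPTH];  /* return states of diagrams being traversed */
    int sp = 0, state = 0;
    assert(input != NULL);
    for (;;) {
        if (is_final(state)) {
            if (sp == 0) return *input == '\0';  /* finished the E diagram */
            state = stack[--sp];                 /* return to the caller   */
            continue;
        }
        const Edge *chosen = NULL;
        for (int i = 0; i < N_EDGES; i++) {
            const Edge *e = &edges[i];
            if (e->from != state) continue;
            if (e->label == 0 || start_state(e->label) >= 0) {
                if (chosen == NULL) chosen = e;  /* fallback edge */
            } else if (e->label == *input) {
                chosen = e;                      /* terminal matches lookahead */
                break;
            }
        }
        if (chosen == NULL) return 0;            /* no edge applies: syntax error */
        if (chosen->label == 0) {
            state = chosen->to;                  /* epsilon: move, consume nothing */
        } else if (start_state(chosen->label) >= 0) {
            if (sp == MAX_DEPTH) return 0;
            stack[sp++] = chosen->to;            /* remember where to resume */
            state = start_state(chosen->label);
        } else {
            input++;                             /* terminal: advance the input */
            state = chosen->to;
        }
    }
}
```

On the input `i+i*i` the first step pushes state 1 and jumps to state 7, exactly as in the T-edge example above, and the `+` is consumed on the edge from state 3 to state 4.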
