CSE 3341: Principles of Programming Languages
Recursive Descent Parsing
Jeremy Morris
Parsing
  A grammar is a generator for a language
    The rules tell us how to create strings in the language
  A parser is a recognizer for a language
    Confirms or rejects a string as being in the language
  For an arbitrary CFG, a string of n tokens can be parsed in O(n^3) time in the worst case
    Earley's algorithm and the CYK algorithm
  Fortunately, if the CFG is carefully constructed, we can do much better than that
    LL or LR grammars can be parsed in linear time
Top-down vs. Bottom-up parsing
  "Top-down" or predictive parsing (or LL parsing)
    Starts from the start symbol of the grammar (the root of the parse tree) and the left-most token.
    Builds the parse tree "top down", using tokens to decide which rule to expand next.
    Predictive parsers are most often written by hand.
  "Bottom-up" parsing (or LR parsing)
    Builds the parse tree from the leaves upward, matching a collection of nodes to rule expansions.
    Also starts with the left-most token, but has no fixed first rule to expand.
    Bottom-up parsers are most often developed using a parser generator such as Bison or YACC.
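To make "tokens decide which rule to expand" concrete, here is a minimal Java sketch of a predictive recognizer. It assumes a simplified two-rule grammar that is not the course's full grammar, and the class name `PredictiveDemo` and its helpers are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of predictive (top-down) rule selection on a simplified grammar:
//   <stmt> ::= write <id> ;  |  <id> = <operand> ;
// Assumption: an <operand> is a single identifier or integer token.
class PredictiveDemo {
    private final List<String> tokens;
    private int pos = 0;

    PredictiveDemo(String... toks) { tokens = Arrays.asList(toks); }

    private String current() { return pos < tokens.size() ? tokens.get(pos) : "<eof>"; }

    // Consumes the current token if it is exactly `want`, otherwise reports an error.
    private void expect(String want) {
        if (!current().equals(want))
            throw new IllegalStateException("expected '" + want + "', found '" + current() + "'");
        pos++;
    }

    // Consumes the current token if it matches `regex`, otherwise reports an error.
    private void match(String regex, String what) {
        if (!current().matches(regex))
            throw new IllegalStateException("expected " + what + ", found '" + current() + "'");
        pos++;
    }

    private void parseId() { match("[A-Za-z][A-Za-z0-9]*", "an identifier"); }

    private void parseOperand() { match("[A-Za-z][A-Za-z0-9]*|[0-9]+", "an identifier or integer"); }

    // The current token alone predicts which alternative of <stmt> to expand.
    // Returns the name of the rule that was chosen.
    String parseStmt() {
        if (current().equals("write")) {
            pos++;              // consume "write"
            parseId();
            expect(";");
            return "write";
        }
        parseId();
        expect("=");
        parseOperand();
        expect(";");
        return "assign";
    }

    public static void main(String[] args) {
        System.out.println(new PredictiveDemo("write", "Y", ";").parseStmt());
        System.out.println(new PredictiveDemo("Y", "=", "20", ";").parseStmt());
    }
}
```

The parser never backtracks: one token of lookahead is enough to commit to a rule, which is the property a carefully constructed LL grammar provides.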
CORE parsing practice
    program
        int Y,Z;
    begin
        Y = 20;
        Z = 5;
        Y = Y - Z;
        write Y;
    end
Recursive Descent
  An algorithm for walking an already constructed AST
    Top-down rather than bottom-up
    Useful for interpreting parsed code, printing parsed code, generating new code from parsed code
  Basic idea:
    Create one method/procedure for each non-terminal
    The body of that method decides how to walk through its children based on the rules of the language
    Start by calling the procedure for the starting non-terminal
    The algorithm ends when you have walked the entire tree
      Never ends? Infinite loop.
Recursive Descent Example
  [Figure: parse tree with an <if> node whose children are <cond>, <stmt-seq>, and <stmt-seq>]
    void executeIf(??)
        bool b = evaluateCond(??)
        if (b) then executeSS(??)
        else executeSS(??)
Arrays to represent parse trees?
  Each node in the tree → one row in the array.
  Each row has n columns:
    1. Integer corresponding to the non-terminal for the node.
    2. Integer corresponding to which alternative is used to expand that non-terminal.
    3. Row numbers of the children used.
  This is how we determine n above: the maximum number of children needed for our language + 2.
  Disclaimer: Your instructor does not advocate the use of arrays for hand-built parsers in the year 2016! But you should understand how this algorithm works.
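The row layout described above can be sketched in Java as follows. This is only an illustration: the class `ParseTable`, its helper methods, the `NO_CHILD` marker, and the codes for `COND` and `STMT_SEQ` are assumptions (only 8 for `<if>` and 7 for assignment appear in the pseudocode of this deck). Column 0 is deliberately left unused so that the column numbers line up with the 1-based `pt[n,1]`, `pt[n,2]`, `pt[n,3]` notation.

```java
import java.util.Arrays;

// Sketch of the array-based parse tree: one row per node.
// Column 0 is left unused so columns 1..5 match the slides' pt[n,1]..pt[n,5].
class ParseTable {
    static final int NT = 1;            // column holding the non-terminal code
    static final int ALT = 2;           // column holding the alternative used
    static final int FIRST_CHILD = 3;   // columns 3..5 hold the children's row numbers
    static final int MAX_CHILDREN = 3;
    static final int NO_CHILD = -1;     // hypothetical "empty slot" marker

    // IF = 8 and ASSIGN = 7 come from the slides; the other codes are made up.
    static final int ASSIGN = 7, IF = 8, COND = 9, STMT_SEQ = 10;

    final int[][] pt;
    int currentRow = 0;                 // next free row

    ParseTable(int rows) {
        // n = MAX_CHILDREN + 2 columns, plus the unused column 0
        pt = new int[rows][MAX_CHILDREN + 3];
        for (int[] row : pt) Arrays.fill(row, NO_CHILD);
    }

    // Claims the next free row for a new node and returns its row number.
    int newNode(int nonTerminal, int alternative) {
        if (currentRow >= pt.length) throw new IllegalStateException("parse table is full");
        int n = currentRow++;
        pt[n][NT] = nonTerminal;
        pt[n][ALT] = alternative;
        return n;
    }

    // Records childRow as child number `index` (1-based) of `parent`.
    void setChild(int parent, int index, int childRow) {
        if (index < 1 || index > MAX_CHILDREN)
            throw new IllegalArgumentException("bad child index " + index);
        pt[parent][FIRST_CHILD + index - 1] = childRow;
    }

    // Counts the child slots of row n that have been filled in.
    int childCount(int n) {
        int count = 0;
        for (int c = FIRST_CHILD; c < FIRST_CHILD + MAX_CHILDREN; c++)
            if (pt[n][c] != NO_CHILD) count++;
        return count;
    }

    public static void main(String[] args) {
        ParseTable t = new ParseTable(8);
        int ifNode = t.newNode(IF, 2);  // alternative 2: the if/then/else form
        t.setChild(ifNode, 1, t.newNode(COND, 1));
        t.setChild(ifNode, 2, t.newNode(STMT_SEQ, 1));
        t.setChild(ifNode, 3, t.newNode(STMT_SEQ, 1));
        System.out.println(Arrays.toString(t.pt[ifNode]));
    }
}
```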
Recursive Descent Example (revisited)
    void executeIf(int n, int[][] pt)
        bool b = evaluateCond(pt[n,3], pt)
        if (b) then executeSS(pt[n,4], pt)
        else if (pt[n,2] == 2) then executeSS(pt[n,5], pt)
  Note that the pseudocode in these examples may not be complete. Specifically, it lacks the error checking that a real implementation needs in order to report errors.
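A runnable Java version of this walk might look like the sketch below. Java writes `pt[n][3]` where the pseudocode writes `pt[n,3]`. The class `TableWalker`, the `trace` list, and the bodies of `evaluateCond` and `executeSS` are stand-ins invented for the example; in particular, treating a cond node's alternative column as its truth value is purely an assumption so that the demo has something to evaluate.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: executing an <if> node stored in the table.
// Columns follow the slides: [1] non-terminal, [2] alternative, [3..5] children.
class TableWalker {
    static final int IF = 8;                            // code used for <if> on the slides
    static final List<String> trace = new ArrayList<>();

    // Stand-in: treats the cond node's alternative as its truth value (1 = true).
    static boolean evaluateCond(int n, int[][] pt) {
        return pt[n][2] == 1;
    }

    // Stand-in: records which statement sequence would have been executed.
    static void executeSS(int n, int[][] pt) {
        trace.add("ran stmt-seq at row " + n);
    }

    static void executeIf(int n, int[][] pt) {
        // The kind of error check the pseudocode leaves out.
        if (pt[n][1] != IF)
            throw new IllegalStateException("row " + n + " is not an <if> node");
        boolean b = evaluateCond(pt[n][3], pt);
        if (b) {
            executeSS(pt[n][4], pt);
        } else if (pt[n][2] == 2) {                     // alternative 2 has an else branch
            executeSS(pt[n][5], pt);
        }
    }

    public static void main(String[] args) {
        int[][] pt = {
            {0, 8, 2, 1, 2, 3},     // row 0: <if>, alternative 2, children in rows 1, 2, 3
            {0, 9, 2, 0, 0, 0},     // row 1: <cond> that evaluates to false in this stand-in
            {0, 10, 1, 0, 0, 0},    // row 2: then branch
            {0, 10, 1, 0, 0, 0},    // row 3: else branch
        };
        executeIf(0, pt);
        System.out.println(trace);
    }
}
```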
Recursive Descent Example (revisited)
    void printIf(int n, int[][] pt)
        print("if")
        printCond(pt[n,3], pt)
        print("then")
        printSS(pt[n,4], pt)
        if (pt[n,2] == 2) then
            print("else")
            printSS(pt[n,5], pt)
        print("end;")
Recursive Descent Example (again)
    void printAssign(int n, int[][] pt)
        printId(pt[n,3], pt)
        print(" = ")
        printExp(pt[n,4], pt)
Recursive Descent Example (again)
    void execAssign(int n, int[][] pt)
        int result = evalExp(pt[n,4], pt)
        assignIdVal(pt[n,3], result)
Recursive Descent Parsing
  Parsing is harder
    Instead of walking the tree, we are building it as we go
  Same idea, one method for each non-terminal…
    …except that now each method will write values to the table instead of reading from it
  Calling a parse method will create an empty "node" in the tree by using the next free row in the table
    Requires us to keep track of the rows being used
    (Also requires us to have a big table or grow it dynamically)
    Ignore this for now: there's a better approach we'll focus on once we have the idea down
Recursive Descent Parsing Example
    void parseIf(int n, int[][] pt)
        pt[n,1] = 8
        String s = t.currentToken()     // should be "if"
        t.nextToken()                   // consume the token
        pt[n,3] = currentRow++
        parseCond(pt[n,3], pt)
        pt[n,4] = currentRow++
        t.nextToken()                   // consume the "then" token
        parseSS(pt[n,4], pt)
        s = t.currentToken()
        if (s is "else") then
            t.nextToken()               // consume the token
            pt[n,2] = 2                 // indicate we're using the second expansion
            pt[n,5] = currentRow++
            parseSS(pt[n,5], pt)
        else
            pt[n,2] = 1                 // indicate we're using the first expansion
        t.nextToken()                   // why do this?
        t.nextToken()                   // and this?
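The same routine can be written as runnable Java with the missing error checks added. This is a sketch under several assumptions: the class `TableParser`, the token list standing in for the tokenizer `t`, the `expect` helper, the fixed table size, and the one-token stand-ins for `parseCond` and `parseSS` are all invented here. It also assumes the two trailing `nextToken()` calls consume `end` and `;`, which matches the `"end;"` that printIf emits.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: parsing an <if> statement into the table, with error checking.
class TableParser {
    final List<String> tokens;          // stand-in for the tokenizer t on the slides
    int pos = 0;
    final int[][] pt = new int[32][6];  // column 0 unused; columns 1..5 as on the slides
    int currentRow = 0;                 // next free row

    TableParser(String... toks) { tokens = Arrays.asList(toks); }

    String currentToken() { return pos < tokens.size() ? tokens.get(pos) : "<eof>"; }

    void nextToken() { pos++; }

    // Consumes `want`, or reports an error if some other token is found.
    void expect(String want) {
        if (!currentToken().equals(want))
            throw new IllegalStateException(
                "expected '" + want + "' but found '" + currentToken() + "'");
        nextToken();
    }

    // Stand-ins: a real parser would recurse here; these consume a single token.
    void parseCond(int n) { pt[n][1] = 9;  pt[n][2] = 1; nextToken(); }
    void parseSS(int n)   { pt[n][1] = 10; pt[n][2] = 1; nextToken(); }

    void parseIf(int n) {
        pt[n][1] = 8;                   // code for <if>
        expect("if");
        pt[n][3] = currentRow++;        // claim a row for the <cond> child
        parseCond(pt[n][3]);
        expect("then");
        pt[n][4] = currentRow++;        // claim a row for the then branch
        parseSS(pt[n][4]);
        if (currentToken().equals("else")) {
            nextToken();
            pt[n][2] = 2;               // second expansion: with else
            pt[n][5] = currentRow++;
            parseSS(pt[n][5]);
        } else {
            pt[n][2] = 1;               // first expansion: no else
        }
        expect("end");                  // assumed: the statement closes with "end" ";"
        expect(";");
    }

    public static void main(String[] args) {
        TableParser p = new TableParser("if", "c", "then", "s1", "else", "s2", "end", ";");
        int root = p.currentRow++;      // the caller claims the row for the node being parsed
        p.parseIf(root);
        System.out.println(Arrays.toString(p.pt[root]));
    }
}
```

Replacing the bare `nextToken()` calls with `expect(...)` is one simple way to turn a silently wrong parse into a reported syntax error.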
Recursive Descent Parsing
  Are you feeling good about this code?
    As an algorithm it's fine, but as code it leaves a bit to be desired
  The code suffers from a severe lack of abstraction
    We're talking about trees but operating on a table
    Why aren't we operating on a tree?
  Let's talk about an approach that uses a bit more abstraction
    Encapsulate the data in a parse tree class
    Hide our operations a bit: let the parse tree class take care of the details while we focus on the bigger picture
    A dip into object-oriented design
Parse Tree Class Design
  Let's think about the interface
    We're going to have a tree with a cursor: a means of moving from node to node in the tree
  For each node we need to store:
    The non-terminal identity
    The rule alternative used in the expansion of this non-terminal
  For the cursor we need to be able to:
    Move it to each child (child 1, 2 and 3)
    Move it back up to the parent node
  We need to be able to check:
    Is there a child?
    Is there a parent (i.e. are we at the root node)?
Parse Tree Class Design – Interface 1
    interface ParseTree
        // To get the contents of the node
        int getIdentity()
        int getAlternative()
        // To get the number of children
        int getChildCount()
        // To find out whether it is the root (the root has no parent)
        boolean hasParent()
        // To move the cursor
        void moveToChild(int index)
        void moveToParent()
Recursive Descent Example (ParseTree)
    void printIf(ParseTree pt)
        print("if")
        pt.moveToChild(1)
        printCond(pt)
        print("then")
        pt.moveToParent()
        pt.moveToChild(2)
        printSS(pt)
        pt.moveToParent()
        if (pt.getAlternative() == 2) then
            print("else")
            pt.moveToChild(3)
            printSS(pt)
            pt.moveToParent()   // set it back at the if node
        print("end;")
ParseTree Interface Design
  For dealing with variable assignment we need some more operations
    If we're at an <id> node we need the id name and value
  Add a few more methods to the interface:
        // get the id string if we are at an id node
        String getIdString()
        // set the id numeric value if we are at an id node
        void setIdValue(int value)
        // get the numeric value for an id if we are at an id node
        int getIdValue()
Recursive Descent Example (ParseTree)
    void execAssign(ParseTree pt)
        pt.moveToChild(2)       // move to the expression to evaluate
        int value = execExpr(pt)
        pt.moveToParent()
        pt.moveToChild(1)       // move to the ID node to store the value
        pt.setIdValue(value)
        pt.moveToParent()       // restore our cursor to the top of the assign
ParseTree Interface Parsing
  What about parsing?
  For parsing we need to be able to:
    Add nodes
    Set the content of nodes
  Need more operations to be able to do that:
        // add another child to the current node
        void addChild()
        // To set the contents of the node
        void setIdentity(int ident)
        void setAlternative(int alternative)
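One possible way to implement these operations is with linked nodes and a cursor, sketched below in Java. The class names `LinkedParseTree` and `LinkedParseTreeDemo`, the private `Node` class, and the exceptions thrown are assumptions. The id-related methods are left out to keep the sketch short, and child indices are 1-based to match `moveToChild(1)` in the pseudocode.

```java
import java.util.ArrayList;
import java.util.List;

// The tree operations from the slides (id-related methods omitted in this sketch).
interface ParseTree {
    int getIdentity();
    int getAlternative();
    int getChildCount();
    boolean hasParent();
    void moveToChild(int index);
    void moveToParent();
    void addChild();
    void setIdentity(int ident);
    void setAlternative(int alternative);
}

// Hypothetical implementation: linked nodes plus a cursor.
class LinkedParseTree implements ParseTree {
    private static final class Node {
        int identity;
        int alternative;
        Node parent;
        final List<Node> children = new ArrayList<>();
    }

    private Node cursor = new Node();   // the cursor starts at an empty root node

    public int getIdentity() { return cursor.identity; }
    public int getAlternative() { return cursor.alternative; }
    public int getChildCount() { return cursor.children.size(); }
    public boolean hasParent() { return cursor.parent != null; }

    public void setIdentity(int ident) { cursor.identity = ident; }
    public void setAlternative(int alternative) { cursor.alternative = alternative; }

    // Appends an empty child to the current node; the cursor does not move.
    public void addChild() {
        Node child = new Node();
        child.parent = cursor;
        cursor.children.add(child);
    }

    // index is 1-based, matching moveToChild(1) on the slides.
    public void moveToChild(int index) {
        if (index < 1 || index > cursor.children.size())
            throw new IndexOutOfBoundsException("no child " + index);
        cursor = cursor.children.get(index - 1);
    }

    public void moveToParent() {
        if (cursor.parent == null)
            throw new IllegalStateException("already at the root");
        cursor = cursor.parent;
    }
}

class LinkedParseTreeDemo {
    public static void main(String[] args) {
        ParseTree pt = new LinkedParseTree();
        pt.setIdentity(7);              // 7 is the assignment code used on the slides
        pt.setAlternative(1);
        pt.addChild();
        pt.addChild();
        pt.moveToChild(2);
        System.out.println("at child, hasParent = " + pt.hasParent());
        pt.moveToParent();
        System.out.println("children of root = " + pt.getChildCount());
    }
}
```

Because callers only see the interface, the table from the earlier slides could back the same operations without any change to the walking or parsing code.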
Recursive Descent Parsing Example (ParseTree)
    void parseAssign(ParseTree pt)
        pt.setIdentity(7)       // set it to an assignment
        pt.setAlternative(1)    // use expansion 1
        pt.addChild()
        pt.addChild()           // add two children for the assignment node
        pt.moveToChild(1)
        parseID(pt)
        t.nextToken()           // why are we doing this?
        pt.moveToParent()
        pt.moveToChild(2)
        parseExpr(pt)
        t.nextToken()           // why?
        pt.moveToParent()       // why?
Recursive Descent Parsing
  Okay, so we have a bit more abstraction
    Still not great: we can do better
  Let's make this all more object-oriented
    Right now we're treating the whole tree like an object
    Let's make each node an object instead
  Make a separate class for each non-terminal
    Build the printing, parsing and executing logic into each non-terminal class
    Build the available children into each class
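As a rough illustration of the one-class-per-non-terminal idea, the Java sketch below gives each node class its own parse, print, and execute methods and stores its children as fields. Everything here is an assumption made for the example: the class names, the `Tokens` helper standing in for the tokenizer, the map used as variable storage, and the simplified `<exp>` that is either an integer constant or a single id.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical token source; stands in for the tokenizer t on the slides.
class Tokens {
    private final Deque<String> queue;
    Tokens(String... toks) { queue = new ArrayDeque<>(Arrays.asList(toks)); }
    String peek() { return queue.isEmpty() ? "<eof>" : queue.peekFirst(); }
    String take() {
        if (queue.isEmpty()) throw new IllegalStateException("unexpected end of input");
        return queue.removeFirst();
    }
    void expect(String want) {
        String got = take();
        if (!got.equals(want))
            throw new IllegalStateException("expected '" + want + "', found '" + got + "'");
    }
}

class IdNode {
    String name;
    void parse(Tokens t) { name = t.take(); }
    String print() { return name; }
    int execute(Map<String, Integer> memory) {
        Integer v = memory.get(name);
        if (v == null) throw new IllegalStateException(name + " has no value");
        return v;
    }
}

// Simplified <exp> (assumption): an integer constant or a single id.
class ExpNode {
    Integer constant;   // set when the constant alternative is used
    IdNode id;          // set when the id alternative is used
    void parse(Tokens t) {
        if (t.peek().matches("[0-9]+")) {
            constant = Integer.parseInt(t.take());
        } else {
            id = new IdNode();
            id.parse(t);
        }
    }
    String print() { return constant != null ? constant.toString() : id.print(); }
    int execute(Map<String, Integer> memory) {
        return constant != null ? constant : id.execute(memory);
    }
}

// <assign> ::= <id> = <exp> ;   The children are fields of the class.
class AssignNode {
    final IdNode id = new IdNode();
    final ExpNode exp = new ExpNode();
    void parse(Tokens t) { id.parse(t); t.expect("="); exp.parse(t); t.expect(";"); }
    String print() { return id.print() + " = " + exp.print() + ";"; }
    void execute(Map<String, Integer> memory) { memory.put(id.name, exp.execute(memory)); }
}

class NodeClassesDemo {
    public static void main(String[] args) {
        Map<String, Integer> memory = new HashMap<>();
        Tokens t = new Tokens("Y", "=", "20", ";", "Z", "=", "Y", ";");
        for (int i = 0; i < 2; i++) {
            AssignNode a = new AssignNode();
            a.parse(t);
            a.execute(memory);
            System.out.println(a.print());
        }
        System.out.println("Z = " + memory.get("Z"));
    }
}
```

With this layout there is no cursor to move and restore: each node reaches its children directly through its own fields.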