Grammars & Parsing Lecture 12 CS 2112 – Fall 2018
Motivation

Legal sentences:
§ The cat ate the rat.
§ The cat ate the rat slowly.
§ The small cat ate the big rat slowly.
§ The small cat ate the big rat on the mat slowly.
§ The small cat that sat in the hat ate the big rat on the mat slowly.
§ The small cat that sat in the hat ate the big rat on the mat slowly, then got sick.
§ …

Not all sequences of words are legal sentences:
§ The ate cat rat the

How many legal sentences are there?
How many legal programs are there?
Are all Java programs that compile legal programs?
How do we know what programs are legal?

http://java.sun.com/docs/books/jls/third_edition/html/syntax.html
A Grammar

    Sentence ::= Noun Verb Noun
    Noun     ::= boys | girls | bunnies
    Verb     ::= like | see

Grammar: a set of rules for generating sentences in a language.

Our sample grammar has these rules:
§ A Sentence can be a Noun followed by a Verb followed by a Noun
§ A Noun can be ‘boys’ or ‘girls’ or ‘bunnies’
§ A Verb can be ‘like’ or ‘see’

Examples of Sentence:
§ boys see bunnies
§ bunnies like girls
§ …

White space between words does not matter.

The words boys, girls, bunnies, like, see are called tokens or terminals.

The words Sentence, Noun, Verb are called syntactic classes or nonterminals.

This is a very boring grammar because the set of Sentences is finite (exactly 18).
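The count of 18 follows from the rule Sentence ::= Noun Verb Noun: 3 choices of Noun, times 2 choices of Verb, times 3 choices of Noun. A minimal sketch that enumerates them, assuming nothing beyond the grammar on the slide (the class and method names are made up for illustration):

```java
// Sketch: enumerate every Sentence generated by
//   Sentence ::= Noun Verb Noun
// SentenceEnumerator and allSentences are illustrative names, not from the lecture.
final class SentenceEnumerator {
    static final String[] NOUNS = {"boys", "girls", "bunnies"};
    static final String[] VERBS = {"like", "see"};

    // Returns one sentence per (Noun, Verb, Noun) choice.
    static String[] allSentences() {
        String[] out = new String[NOUNS.length * VERBS.length * NOUNS.length];
        int i = 0;
        for (String subject : NOUNS) {
            for (String verb : VERBS) {
                for (String object : NOUNS) {
                    out[i++] = subject + " " + verb + " " + object;
                }
            }
        }
        return out;
    }
}
```

Calling `SentenceEnumerator.allSentences()` yields an array of length 18, starting with "boys like boys" and ending with "bunnies see bunnies".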
A Recursive Grammar

    Sentence ::= Sentence and Sentence
               | Sentence or Sentence
               | Noun Verb Noun
    Noun     ::= boys | girls | bunnies
    Verb     ::= like | see

Examples of Sentences in this language:
§ boys like girls
§ boys like girls and girls like bunnies
§ boys like girls and girls like bunnies and girls like bunnies
§ boys like girls and girls like bunnies and girls like bunnies and girls like bunnies
§ ...

This grammar is more interesting than the last one because the set of Sentences is infinite.

What makes this set infinite? Answer:
§ Recursive definition of Sentence
Detour

What if we want to add a period at the end of every sentence?

    Sentence ::= Sentence and Sentence .
               | Sentence or Sentence .
               | Noun Verb Noun .
    Noun     ::= …

Does this work? No! This produces sentences like:

    girls like boys . and boys like bunnies . .

Here "girls like boys ." is a Sentence, "boys like bunnies ." is a Sentence, and the whole string, with its own final period, is a Sentence.
Sentences with Periods

    TopLevelSentence ::= Sentence .
    Sentence ::= Sentence and Sentence
               | Sentence or Sentence
               | Noun Verb Noun
    Noun ::= boys | girls | bunnies
    Verb ::= like | see

Add a new rule that adds a period only at the end of the sentence.

The tokens here are the 7 words plus the period (.)

This grammar is ambiguous:

    boys like girls and girls like boys or girls like bunnies
Grammar for Simple Expressions

    E ::= integer
        | ( E + E )

Simple expressions:
§ An E can be an integer.
§ An E can be ‘(’ followed by an E followed by ‘+’ followed by an E followed by ‘)’

The set of expressions defined by this grammar is an inductively defined set.
§ Is the language finite or infinite?
§ Do recursive grammars always yield infinite languages?

Here are some legal expressions:
§ 2
§ (3 + 34)
§ ((4+23) + 89)
§ ((89 + 23) + (23 + (34+12)))

Here are some illegal expressions:
§ (3
§ 3 + 4

The tokens in this grammar are (, +, ), and any integer.
Parsing

Grammars can be used in two ways:
§ A grammar defines a language (i.e., the set of properly structured sentences)
§ A grammar can be used to parse a sentence (thus, checking if the sentence is in the language)

To parse a sentence is to build a parse tree.
§ This is much like diagramming a sentence

Example: Show that ((4+23) + 89) is a valid expression E by building a parse tree.

    E
    ├─ (
    ├─ E
    │  ├─ (
    │  ├─ E ─ 4
    │  ├─ +
    │  ├─ E ─ 23
    │  └─ )
    ├─ +
    ├─ E ─ 89
    └─ )
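Ambiguity

A grammar is ambiguous if some strings have more than one parse tree.

Example: arithmetic expressions without precedence:

    E → n
      | E + E
      | E * E
      | ( E )

The string 2 + 3 * 5 has two parse trees:

    + at the root:            * at the root:
    E                         E
    ├─ E ─ 2                  ├─ E
    ├─ +                      │  ├─ E ─ 2
    └─ E                      │  ├─ +
       ├─ E ─ 3               │  └─ E ─ 3
       ├─ *                   ├─ *
       └─ E ─ 5               └─ E ─ 5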
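One way to see the ambiguity concretely is to count parse trees. The sketch below is an illustration, not part of the lecture: it counts the trees that the rules E → n | E + E | E * E assign to a whitespace-separated token string by trying every operator as the root. The parenthesized rule is left out for brevity, and the class and method names are hypothetical.

```java
// Sketch: count parse trees under  E -> n | E + E | E * E.
// ParseTreeCounter is an illustrative name; the ( E ) rule is omitted.
final class ParseTreeCounter {
    // Counts parse trees for a whitespace-separated token string.
    static long count(String expr) {
        String[] tokens = expr.trim().split("\\s+");
        return count(tokens, 0, tokens.length - 1);
    }

    // Number of ways tokens[lo..hi] can be derived from E.
    private static long count(String[] tokens, int lo, int hi) {
        if (lo == hi) {
            return isNumber(tokens[lo]) ? 1 : 0;           // E -> n
        }
        long total = 0;
        for (int k = lo + 1; k < hi; k++) {                // try each operator as the root
            if (tokens[k].equals("+") || tokens[k].equals("*")) {
                total += count(tokens, lo, k - 1) * count(tokens, k + 1, hi);
            }
        }
        return total;
    }

    private static boolean isNumber(String t) {
        return !t.isEmpty() && t.chars().allMatch(Character::isDigit);
    }
}
```

`ParseTreeCounter.count("2 + 3 * 5")` returns 2, matching the two trees on the slide. A grammar is unambiguous only if this count never exceeds 1 for any string.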
Precedence

Ambiguities resulting from not handling precedence can be handled by introducing extra levels of nonterminals.

    E (expr)   → T | T + E
    T (term)   → F | F * T
    F (factor) → n | ( E )

Parse tree for 2 + 3 * 5:

    E
    ├─ T ─ F ─ 2
    ├─ +
    └─ E
       └─ T
          ├─ F ─ 3
          ├─ *
          └─ T ─ F ─ 5

Only one parse tree!
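The three-level grammar translates almost directly into three mutually recursive methods. The sketch below evaluates while it parses, to show that * binds tighter than +. It assumes well-formed input with non-negative integer literals, and all names in it are illustrative rather than taken from the course code.

```java
// Sketch of an evaluator that follows the three-level grammar
//   E -> T | T + E      T -> F | F * T      F -> n | ( E )
// PrecedenceEval is a hypothetical name; input is assumed to be well-formed.
final class PrecedenceEval {
    private final String src;
    private int pos = 0;

    private PrecedenceEval(String src) { this.src = src.replace(" ", ""); }

    static int eval(String text) { return new PrecedenceEval(text).parseE(); }

    // Consumes the next character if it is c; reports whether it did.
    private boolean accept(char c) {
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private int parseE() {                 // E -> T | T + E
        int value = parseT();
        return accept('+') ? value + parseE() : value;
    }

    private int parseT() {                 // T -> F | F * T
        int value = parseF();
        return accept('*') ? value * parseT() : value;
    }

    private int parseF() {                 // F -> n | ( E )
        if (accept('(')) {
            int value = parseE();
            accept(')');                   // assumed present in well-formed input
            return value;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        return Integer.parseInt(src.substring(start, pos));
    }
}
```

`PrecedenceEval.eval("2 + 3 * 5")` gives 17, and `PrecedenceEval.eval("(2 + 3) * 5")` gives 25: multiplication is grouped first unless parentheses say otherwise, because a T must be completed before parseE looks for a +.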
Recursive Descent Parsing

Idea: Use the grammar to design a recursive program to check if a sentence is in the language.

To parse an expression E, for instance:
§ We look for each terminal (i.e., each token)
§ Each nonterminal (e.g., E) can handle itself by using a recursive call

The grammar tells how to write the program!

A recognizer:

    boolean parseE() {
        if (first token is an integer) {
            scan past the integer token;
            return true;
        }
        if (first token is '(') {
            scan past '(' token;
            if (!parseE()) return false;     // left operand must be an E
            if (first token is not '+') return false;
            scan past '+' token;
            if (!parseE()) return false;     // right operand must be an E
            if (first token is not ')') return false;
            scan past ')' token;
            return true;
        }
        return false;
    }
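The recognizer idea can be made runnable along the following lines. This is a sketch under assumptions: the slide does not say how tokens are produced, so the version below reads characters directly from a string, skips blanks, and treats a run of digits as one integer token. The class and helper names are hypothetical.

```java
// A runnable sketch of a recognizer for  E ::= integer | ( E + E ).
// The character-level tokenizing is an assumption made for this example.
final class ExprRecognizer {
    private final String src;
    private int pos = 0;

    private ExprRecognizer(String src) { this.src = src; }

    // True if the whole input is exactly one E.
    static boolean isExpression(String text) {
        ExprRecognizer r = new ExprRecognizer(text);
        if (!r.parseE()) return false;
        r.skipSpaces();
        return r.pos == r.src.length();    // reject trailing input such as "3 + 4"
    }

    private void skipSpaces() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
    }

    // Consumes the single-character token c if it is next.
    private boolean scanPast(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    // Consumes an integer token (a run of digits) if one is next.
    private boolean scanInteger() {
        skipSpaces();
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        return pos > start;
    }

    private boolean parseE() {
        if (scanInteger()) return true;            // E ::= integer
        return scanPast('(') && parseE()           // E ::= ( E + E )
            && scanPast('+') && parseE()
            && scanPast(')');
    }
}
```

With this sketch, `ExprRecognizer.isExpression("((4+23) + 89)")` is true, while `"(3"` and `"3 + 4"` are rejected. Each alternative of the grammar rule becomes one branch of parseE, and each occurrence of E on the right-hand side becomes a recursive call.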
Abstract Syntax Trees vs. Parse Trees

Result of parsing: often a data structure representing the input.

A parse tree has information we don’t need, e.g. parentheses.

Parse tree / concrete syntax tree for 2 + 3 * 5:

    E
    ├─ T ─ F ─ 2
    ├─ +
    └─ E
       └─ T
          ├─ F ─ 3
          ├─ *
          └─ T ─ F ─ 5

Abstract syntax tree:

    +
    ├─ 2
    └─ *
       ├─ 3
       └─ 5

As a data structure:

    new BinaryOp(PLUS,
                 new Num(2),
                 new BinaryOp(TIMES,
                              new Num(3),
                              new Num(5)))
Java Code for Parsing E

    public static ExprNode parseE(Scanner scanner) {
        // E ::= integer
        if (scanner.hasNextInt()) {
            int data = scanner.nextInt();
            return new Node(data);
        }
        // E ::= ( E + E )
        check(scanner, '(');
        ExprNode left = parseE(scanner);
        check(scanner, '+');
        ExprNode right = parseE(scanner);
        check(scanner, ')');
        return new BinaryOpNode(PLUS, left, right);
    }
Responding to Invalid Input

Parsing does two things:
§ checks for validity (is the input a valid sentence?)
§ constructs a tree representing the input (usually an AST, or abstract syntax tree)

Q: How should we respond to invalid input?

A: Throw an exception with as much information for the user as possible
§ the nature of the error
§ approximately where in the input it occurred
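A minimal sketch of this advice for the grammar E ::= integer | ( E + E ) follows. Everything named here is an assumption made for illustration: `ParseError` is a hypothetical unchecked exception that records a character offset, and the "AST" is rendered as a string such as `Plus(Num(3), Num(34))` to keep the example short.

```java
// Sketch: a parser that reports the nature and position of an error.
// ParseError and AstParser are hypothetical names, not course classes.
final class ParseError extends RuntimeException {
    final int position;                    // character offset where parsing failed
    ParseError(String message, int position) {
        super(message + " at position " + position);
        this.position = position;
    }
}

final class AstParser {
    private final String src;
    private int pos = 0;

    private AstParser(String src) { this.src = src; }

    // Returns a string rendering of the tree, or throws ParseError.
    static String parse(String text) {
        AstParser p = new AstParser(text);
        String tree = p.parseE();
        p.skipSpaces();
        if (p.pos < p.src.length()) throw new ParseError("unexpected trailing input", p.pos);
        return tree;
    }

    private void skipSpaces() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
    }

    // Consumes the expected character or reports what was found instead.
    private void check(char expected) {
        skipSpaces();
        if (pos >= src.length())
            throw new ParseError("expected '" + expected + "' but reached end of input", pos);
        if (src.charAt(pos) != expected)
            throw new ParseError("expected '" + expected + "' but found '" + src.charAt(pos) + "'", pos);
        pos++;
    }

    private String parseE() {
        skipSpaces();
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (pos > start) return "Num(" + src.substring(start, pos) + ")";   // E ::= integer
        if (pos >= src.length() || src.charAt(pos) != '(')
            throw new ParseError("expected an integer or '('", pos);
        pos++;                                                              // E ::= ( E + E )
        String left = parseE();
        check('+');
        String right = parseE();
        check(')');
        return "Plus(" + left + ", " + right + ")";
    }
}
```

`AstParser.parse("(3 + 34)")` returns `Plus(Num(3), Num(34))`. For the illegal input `(3`, it throws a ParseError whose message says a '+' was expected but the input ended, at position 2. An unchecked exception is used here only to keep the sketch compact; a checked exception is an equally reasonable choice.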
The associativity problem

Top-down parsing works well with right-recursive grammars, e.g.:

    E (expr)   → T | T + E
    T (term)   → F | F * T
    F (factor) → n | ( E )

Problem: leads to right-associative operators. For example, 1 + 2 + 3 parses as:

    +
    ├─ 1
    └─ +
       ├─ 2
       └─ 3
Reassociation

Trick: rewrite right-recursive rules to use Kleene star:

    E (expr) → T | T + E

becomes

    E → T (+ T)*        <--- “0 or more repetitions of + T”

Recursion becomes a loop:

    public static Expr parseE() {
        Expr e = parseT();
        // Assumes peek() returns the next token as a String, or null at end of input.
        while ("+".equals(peek())) {
            consume("+");
            // The tree built so far becomes the left operand.
            e = new BinaryOpNode(PLUS, e, parseT());
        }
        return e;
    }
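To make the difference visible, the sketch below parses the same input both ways and renders each tree as a fully parenthesized string. It simplifies heavily: T is just a single token, tokens must be separated by whitespace, and the class and method names are hypothetical.

```java
// Sketch contrasting  E -> T | T + E  (recursion) with  E -> T (+ T)*  (loop).
// T is reduced to a single token and trees are rendered as strings;
// both simplifications are assumptions made for this example.
final class Associativity {
    private final String[] tokens;
    private int next = 0;

    private Associativity(String text) { this.tokens = text.trim().split("\\s+"); }

    static String rightRecursive(String text) { return new Associativity(text).parseRightRecursive(); }

    static String withLoop(String text) { return new Associativity(text).parseWithLoop(); }

    private boolean peekIs(String t) { return next < tokens.length && tokens[next].equals(t); }

    private String parseT() { return tokens[next++]; }

    // E -> T | T + E : the recursive call builds the right operand first.
    private String parseRightRecursive() {
        String left = parseT();
        if (!peekIs("+")) return left;
        next++;                                    // consume "+"
        return "(" + left + " + " + parseRightRecursive() + ")";
    }

    // E -> T (+ T)* : each new term is attached to the tree built so far.
    private String parseWithLoop() {
        String e = parseT();
        while (peekIs("+")) {
            next++;                                // consume "+"
            e = "(" + e + " + " + parseT() + ")";
        }
        return e;
    }
}
```

`Associativity.rightRecursive("1 + 2 + 3")` yields `(1 + (2 + 3))`, whereas `Associativity.withLoop("1 + 2 + 3")` yields `((1 + 2) + 3)`. For addition the two groupings have the same value, but for an operator such as subtraction the grouping changes the result, which is why the loop form matters.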
Using a Parser to Generate Code

We can modify the parser so that it generates stack code to evaluate arithmetic expressions:

    Expression    Stack code
    2             PUSH 2
                  STOP
    (2 + 3)       PUSH 2
                  PUSH 3
                  ADD
                  STOP

Goal: Modify parseE to return a string containing stack code for the expression it has parsed.

Method parseE can generate code in a recursive way:
§ For integer i, it returns string “PUSH ” + i + “\n”
§ For (E1 + E2),
  § Recursive calls for E1 and E2 return code strings c1 and c2, respectively
  § Return c1 + c2 + “ADD\n”
§ Top-level method appends a STOP command
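The recipe above can be sketched as follows for the grammar E ::= integer | ( E + E ). The sketch assumes valid input and does no error checking; `StackCodeGen` and `compile` are hypothetical names chosen for the example.

```java
// Sketch: parseE returns stack code instead of a tree.
// Input is assumed to be a valid E; no error checking is done.
final class StackCodeGen {
    private final String src;
    private int pos = 0;

    private StackCodeGen(String src) { this.src = src.replace(" ", ""); }

    // Top-level method: code for the expression, followed by STOP.
    static String compile(String text) {
        return new StackCodeGen(text).parseE() + "STOP\n";
    }

    private String parseE() {
        if (src.charAt(pos) == '(') {          // E ::= ( E1 + E2 )
            pos++;                             // scan past '('
            String c1 = parseE();
            pos++;                             // scan past '+'
            String c2 = parseE();
            pos++;                             // scan past ')'
            return c1 + c2 + "ADD\n";
        }
        int start = pos;                       // E ::= integer
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        return "PUSH " + src.substring(start, pos) + "\n";
    }
}
```

`StackCodeGen.compile("(2 + 3)")` produces the four lines PUSH 2, PUSH 3, ADD, STOP, matching the table on the slide. The code for both operands is emitted before ADD, so a stack machine finds both values on the stack when it reaches the operator.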
Does Recursive Descent Always Work?

No: some grammars cannot be used with recursive descent.
§ A trivial example (causes infinite recursion):

      S ::= b | Sa

§ Can rewrite grammar:

      S ::= b | bA
      A ::= a | aA

Sometimes recursive descent is hard to use.
§ There are more powerful parsing techniques (not covered in this course)

Nowadays, there are automated parser and tokenizer generators.
§ You write down the grammar, it produces the parser and tokenizer automatically
§ Many are based on LR parsing, which can handle a larger class of grammars.
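With the original rule S ::= b | Sa, a method parseS would have to call parseS before consuming any input, so it would never terminate. The rewritten grammar always consumes a token before recursing. A minimal sketch of a recognizer for the rewritten grammar, with illustrative names:

```java
// Sketch: recognizer for the rewritten grammar
//   S ::= b | bA      A ::= a | aA
// Every recursive call happens only after one character has been matched.
final class LeftRecursionFree {
    // True if s is derivable from S.
    static boolean parseS(String s) {
        if (s.isEmpty() || s.charAt(0) != 'b') return false;   // both alternatives start with b
        return s.length() == 1 || parseA(s, 1);                // S ::= b   or   S ::= bA
    }

    // True if the rest of s, starting at index i, is derivable from A.
    private static boolean parseA(String s, int i) {
        if (i >= s.length() || s.charAt(i) != 'a') return false;
        return i + 1 == s.length() || parseA(s, i + 1);        // A ::= a   or   A ::= aA
    }
}
```

`LeftRecursionFree.parseS` accepts "b" and "baaa" and rejects "ab", "bab", and the empty string. Both grammars describe the same language: a single b followed by zero or more a's.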
Exercises

Write a grammar and recursive-descent parser for:

§ palindromes:
    mom
    dad
    I prefer pi
    race car
    A man, a plan, a canal: Panama
    murder for a jar of red rum
    sex at noon taxes
§ strings of the form A^n B^n for some n ≥ 0:
    AB
    AABB
    AAAAAAABBBBBBB
§ Java identifiers: a letter, followed by any number of letters or digits
§ decimal integers: an optional minus sign (-) followed by one or more digits 0-9