Lecture 2: Formal Grammars (CS631, Fall 2000)

Defining Language Syntax
Grammars and Parse Trees I

Related Reading
Chapter 2, Programming Languages: Concepts and Constructs, Ravi Sethi

Overview
We need a way to describe programming languages.
– Grammars
– Parse Trees
– Equivalent grammar notations
    • Context-Free Grammars
    • Backus-Naur Form
    • Extended Backus-Naur Form
Note: On Wed we will expand on these concepts.

A Grammar
<S>  ::= <NP> <V> <NP>           A sentence is a noun phrase, a verb, and a noun phrase.
<NP> ::= <A> <N>                 A noun phrase is an article and a noun.
<V>  ::= eats | loves | hates    A verb is…
<A>  ::= a | the                 An article is…
<N>  ::= dog | cat | rat         A noun is...

A Parse Tree
                    <S>
          ___________|___________
         |           |           |
       <NP>         <V>        <NP>
       /  \          |         /  \
     <A>  <N>      loves     <A>  <N>
      |    |                  |    |
     the  dog                the  cat

Basic Concepts
• We use context-free grammars to define language syntax.
• The grammar defines how to build parse trees; the language is the set of strings derived by some parse tree.
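To make "the language is the set of strings derived by some parse tree" concrete, here is a minimal Python sketch that enumerates every sentence of the <S> grammar above. The dictionary layout and the names GRAMMAR and derive are illustrative assumptions, not part of the lecture. Exhaustive enumeration works here only because this grammar has no recursive productions.

```python
from itertools import product

# The sentence grammar from the slides: each nonterminal maps to its
# alternatives, and each alternative is a sequence of symbols.
# (This dictionary encoding is an assumption made for illustration.)
GRAMMAR = {
    "<S>": [["<NP>", "<V>", "<NP>"]],
    "<NP>": [["<A>", "<N>"]],
    "<V>": [["eats"], ["loves"], ["hates"]],
    "<A>": [["a"], ["the"]],
    "<N>": [["dog"], ["cat"], ["rat"]],
}


def derive(symbol):
    """Return every list of terminals derivable from `symbol`.

    Terminates only because this grammar is not recursive.
    """
    if symbol not in GRAMMAR:  # a terminal derives only itself
        return [[symbol]]
    results = []
    for alternative in GRAMMAR[symbol]:
        # Combine every choice for each child, left to right.
        for parts in product(*(derive(child) for child in alternative)):
            results.append([word for part in parts for word in part])
    return results


sentences = [" ".join(words) for words in derive("<S>")]
print(len(sentences))                        # 108
print("the dog loves the cat" in sentences)  # True
```

The count follows from the grammar: 6 noun phrases × 3 verbs × 6 noun phrases = 108 sentences.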
Formal Definition
A grammar consists of four parts:
– the set of terminals (also called tokens): the atomic symbols that make up the language
– the set of nonterminals: the variables representing language constructs
– the set of productions: tree-building rules that define possible children for each nonterminal
– the start symbol: the nonterminal that forms the root of any parse tree for the grammar

<S>  ::= <NP> <V> <NP>           (<S> is the start symbol)
<NP> ::= <A> <N>                 (a production)
<V>  ::= eats | loves | hates
<A>  ::= a | the
<N>  ::= dog | cat | rat
Nonterminals: <S>, <NP>, <V>, <A>, <N>
Terminals: eats, loves, hates, a, the, dog, cat, rat

Context-free Grammars
• Such grammars are sometimes called context-free grammars (CFGs): the left-hand side of each production is one nonterminal.
• We can use any production for a given nonterminal to decide what children to give it, without looking at the rest of the tree.
(Note: Other kinds of grammars exist: regular grammars (weaker), context-sensitive grammars (stronger), etc.)

Note on CFG Formal Notation
• If you take CS517 you will see one way of expressing CFGs:
    S → aSb | X
    X → cX | ε
• But in programming language studies there is a different notation for the same idea...

Backus-Naur Form (BNF)
Conventions:
– nonterminals are enclosed in angle brackets
– the symbol ::= separates the two sides of a production, and | separates alternatives on the right-hand side
– the special nonterminal <empty> represents the zero-length string

BNF Example
<exp> ::= <exp> + <exp> | <exp> * <exp>
        | ( <exp> )
        | a | b | c
Note that there are six productions in this grammar. It is equivalent to this:
<exp> ::= <exp> + <exp>
<exp> ::= <exp> * <exp>
<exp> ::= ( <exp> )
<exp> ::= a
<exp> ::= b
<exp> ::= c
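The claim that the <exp> grammar has six productions can be checked mechanically. The Python sketch below splits the BNF text on ::= and | and then reads off the four parts of the formal definition. The function name productions and the text encoding are assumptions made for illustration. The sketch also assumes that symbols are separated by spaces, that | never occurs as a terminal, and that the first rule names the start symbol.

```python
# The <exp> grammar from the BNF Example slide, as plain text.
BNF = """
<exp> ::= <exp> + <exp> | <exp> * <exp>
        | ( <exp> )
        | a | b | c
"""


def productions(bnf_text):
    """Split BNF text into (left-hand side, right-hand-side symbols) pairs.

    Simplifying assumptions: symbols are space-separated and `|` is
    always the alternative separator, never a terminal.
    """
    rules = []
    lhs = None
    for line in bnf_text.strip().splitlines():
        if "::=" in line:
            lhs, line = line.split("::=")
            lhs = lhs.strip()
        # Continuation lines reuse the most recent left-hand side.
        for alternative in line.split("|"):
            symbols = alternative.split()
            if symbols:
                rules.append((lhs, symbols))
    return rules


rules = productions(BNF)
nonterminals = {lhs for lhs, _ in rules}
terminals = {s for _, rhs in rules for s in rhs if s not in nonterminals}
start_symbol = rules[0][0]  # assumption: the first rule defines the start symbol

print(len(rules))          # 6
print(sorted(terminals))   # ['(', ')', '*', '+', 'a', 'b', 'c']
print(start_symbol)        # <exp>
```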
Parse Trees
To build a parse tree:
• Put the start symbol at the root.
• Add children to every nonterminal, following any one of the productions for that nonterminal in the grammar.
• Done when all the leaves are terminal.
• Read off leaves from left to right; that's the string derived by the tree.

Example: Parse tree for (a + b * c)
<exp> ::= <exp> + <exp> | <exp> * <exp>
        | ( <exp> )
        | a | b | c

             <exp>
           /   |   \
          (  <exp>  )
           /   |   \
       <exp>   +   <exp>
         |        /  |  \
         a   <exp>   *   <exp>
               |           |
               b           c

Practice Exercise
<exp> ::= <exp> + <exp> | <exp> * <exp>
        | ( <exp> )
        | a | b | c
Show a parse tree for each of these strings:
    a+b
    a*b+c
    (a+b)
    (a+(b))
    ((a+b)*c

Compiler Note
• What you just did is parsing: trying to find a parse tree for a given string.
• That's what compilers do for every program you try to compile: try to build a parse tree for your program, using the grammar for whatever language you used.
• Take CS654 to learn about algorithms for doing this efficiently.

Language Definition
• We use grammars to define the syntax of programming languages.
• The language defined by a grammar is the set of all strings that can be derived by some parse tree for the grammar.
• The set of strings is often infinite although grammars are finite.

Practice Exercise
Give a BNF grammar for each of the following languages:
1. The set of all strings consisting of 0 or more concatenated copies of the string ab.
2. The set of all strings consisting of 0 or more a's followed by 0 or more b's.
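The tree-building rules from the Parse Trees slide can be sketched in Python. The code below encodes the example tree for (a + b * c), checks that every interior node's children match one production of the <exp> grammar, and reads the leaves off left to right. The tuple encoding and the helper names exp, is_parse_tree and leaves are assumptions made for illustration. Note that this only checks a given tree; it does not search for one, which is the harder parsing problem the Compiler Note refers to.

```python
# Productions of the <exp> grammar from the slides.
PRODUCTIONS = {
    "<exp>": [
        ["<exp>", "+", "<exp>"],
        ["<exp>", "*", "<exp>"],
        ["(", "<exp>", ")"],
        ["a"], ["b"], ["c"],
    ]
}


def exp(*children):
    """Build an interior node: (symbol, children). Leaves are plain strings."""
    return ("<exp>", list(children))


# The tree from the slide "Example: Parse tree for (a + b * c)".
TREE = exp("(", exp(exp("a"), "+", exp(exp("b"), "*", exp("c"))), ")")


def label(node):
    """Return the grammar symbol at a node."""
    return node if isinstance(node, str) else node[0]


def is_parse_tree(node):
    """Check that every interior node's children match one production."""
    if isinstance(node, str):
        return node not in PRODUCTIONS  # every leaf must be a terminal
    symbol, children = node
    child_labels = [label(child) for child in children]
    return (child_labels in PRODUCTIONS.get(symbol, [])
            and all(is_parse_tree(child) for child in children))


def leaves(node):
    """Read off the leaves from left to right."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node[1] for leaf in leaves(child)]


print(label(TREE) == "<exp>" and is_parse_tree(TREE))  # True
print("".join(leaves(TREE)))                           # (a+b*c)
```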
Practice Exercise
Give a BNF grammar for each of the following languages:
1. The set of all strings consisting of 0 or more a's with a semicolon after each one.
2. The set of all strings consisting of 1 or more a's separated by semicolons (but not before the first or after the last).
3. The set of all strings consisting of 0 or more a's separated by semicolons (but not before the first or after the last).

EBNF
• Additional syntax to simplify some grammar chores:
– {x} to mean zero or more repetitions of x
– [x] to mean x is optional (i.e. x | <empty>)
– () for grouping
– | to mean a choice among alternatives
– quotes around terminals, if necessary, to distinguish from all these meta-symbols

Practice Exercise
Give an EBNF grammar for each of these languages. Use the EBNF extensions where possible to simplify the grammars.
1. All the languages from the previous set of exercises.
2. The language of legal Pascal compound statements: the keyword begin, followed by 0 or more statements separated by semicolons, followed by end. (Don't worry about productions for the <statement> nonterminal.)
3. The language of legal C iteration statements using while and do. (Don't worry about productions for the <expression> and <statement> nonterminals.)

Many Other Variations
• BNF and EBNF ideas are widely used.
• Exact notation differs, in spite of occasional efforts to get uniformity.
– Niklaus Wirth. What Can We Do About the Unnecessary Diversity of Notation for Syntactic Definitions. Communications of the ACM, November, 1977.
• But as long as you understand the ideas, differences in notation are easy to pick up.
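One reason EBNF is convenient is that each extension maps onto a familiar control structure when writing a recognizer by hand: [x] becomes an if, {x} becomes a loop, and a choice becomes a membership test. The Python sketch below shows this for a small grammar that is not from the lecture and is assumed purely for illustration: <num> ::= [ "-" ] ( 0 | 1 ) { 0 | 1 }, a signed binary numeral. The function name is likewise hypothetical.

```python
def matches_signed_binary(text):
    """Recognize the hypothetical grammar  <num> ::= [ "-" ] ( 0 | 1 ) { 0 | 1 }."""
    pos = 0
    # [ "-" ] : optional, so consume a minus sign only if one is present.
    if pos < len(text) and text[pos] == "-":
        pos += 1
    # ( 0 | 1 ) : grouping with a choice; exactly one alternative is required.
    if pos < len(text) and text[pos] in "01":
        pos += 1
    else:
        return False
    # { 0 | 1 } : zero or more repetitions become a loop.
    while pos < len(text) and text[pos] in "01":
        pos += 1
    # Accept only if the whole input was consumed.
    return pos == len(text)


print(matches_signed_binary("-101"))  # True
print(matches_signed_binary("0"))     # True
print(matches_signed_binary("-"))     # False
print(matches_signed_binary("12"))    # False
```

The same pattern applies to the exercise grammars: write the EBNF first, then translate each bracket form into its control structure.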
Example: Java Grammar Excerpt
(Each indented line is one alternative; the suffix opt marks an optional symbol.)

WhileStatement:
    while ( Expression ) Statement

WhileStatementNoShortIf:
    while ( Expression ) StatementNoShortIf

DoStatement:
    do Statement while ( Expression ) ;

ForStatement:
    for ( ForInit opt ; Expression opt ; ForUpdate opt ) Statement

Example: Java Grammar continued

ForInit:
    StatementExpressionList
    LocalVariableDeclaration

ForUpdate:
    StatementExpressionList

StatementExpressionList:
    StatementExpression
    StatementExpressionList , StatementExpression
Compiler Issues: Abstract Syntax Tree (AST)
• A tree structure used by compilers.
• A parse tree with nonterminals removed, containing only what the compiler needs for code generation.
• Usually, each node is an operator and each subtree of that node is an operand...
But there's no standard definition for this. It depends on the compiler.

AST Example
Parse tree:
             <exp>
           /   |   \
          (  <exp>  )
           /   |   \
       <exp>   +   <exp>
         |        /  |  \
         a   <exp>   *   <exp>
               |           |
               b           c
AST:
        +
       / \
      a   *
         / \
        b   c

Compilers and Interpreters
Source Code → Scanner → Parser → (AST) → Static Analyzer, then either:
    Code Generator → Physical Machine
    Virtual Machine (Interpreter)
– Scanner: converts input file into a stream of tokens for parsing.
– Parser: parses tokens using a grammar; produces Abstract Syntax Tree.
– Static Analyzer: checks things like type correctness.
– Code Generator: generates code for physical machine.
– Virtual Machine (Interpreter): executes the program using a simulated machine (like the Java VM).

Summary
• We use context-free grammars to define language syntax.
• The grammar defines how to build parse trees; the language is the set of strings derived by some parse tree.
• Different notations, same ideas:
– formal grammars
– Backus-Naur Form (BNF)
– Extended BNF (EBNF)

Review Questions
Look at questions 2.4, 2.6, 2.9
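The step from parse tree to AST in the AST Example can be sketched in Python as follows. Since the slides stress that there is no standard AST definition, the tuple layout (operator, left, right) is just one possible choice, and the names exp, to_ast and evaluate as well as the variable values are assumptions made for illustration. The evaluate function stands in, very loosely, for the interpreter box in the pipeline.

```python
def exp(*children):
    """Build a parse-tree node: (symbol, children). Leaves are plain strings."""
    return ("<exp>", list(children))


# Parse tree for (a + b * c), as on the AST Example slide.
PARSE_TREE = exp("(", exp(exp("a"), "+", exp(exp("b"), "*", exp("c"))), ")")


def to_ast(node):
    """Drop nonterminals and parentheses; keep only operators and operands."""
    _, children = node
    if len(children) == 1:           # <exp> ::= a | b | c
        return children[0]
    first, middle, last = children
    if first == "(":                 # <exp> ::= ( <exp> )
        return to_ast(middle)
    # <exp> ::= <exp> + <exp> | <exp> * <exp> : the operator becomes the node.
    return (middle, to_ast(first), to_ast(last))


def evaluate(ast, env):
    """Interpret the AST directly, given values for the variables."""
    if isinstance(ast, str):
        return env[ast]
    op, left, right = ast
    lhs, rhs = evaluate(left, env), evaluate(right, env)
    return lhs + rhs if op == "+" else lhs * rhs


AST = to_ast(PARSE_TREE)
print(AST)                                      # ('+', 'a', ('*', 'b', 'c'))
print(evaluate(AST, {"a": 1, "b": 2, "c": 3}))  # 7
```

The parentheses disappear because the tree shape already records the grouping, which is why the AST is smaller than the parse tree.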