Context-Free Grammars 19 March 2019 OSU CSE 1
BL Compiler Structure
• Tokenizer: string of characters (source code) in, string of tokens (“words”) out
• Parser: string of tokens in, abstract program out
• Code Generator: abstract program in, string of integers (object code) out
• The parser is arguably the most interesting, and most difficult, piece of the BL compiler.
Plan for the BL Parser
• Design a context-free grammar (CFG) to specify syntactically valid BL programs
• Use the grammar to implement a recursive-descent parser (i.e., an algorithm to parse a BL program and construct the corresponding Program object)
• A grammar is a set of formation rules for strings in a language.
• A grammar is context-free if it satisfies certain technical conditions described in the following slides.
Languages
• A language is a set of strings over some alphabet Σ
• If L is a language, then mathematically it is a set of strings over Σ (written: set of string of Σ)
Aside: Characters vs. Tokens
• In the following examples of CFGs, we deal with languages over the alphabet of individual characters (e.g., Java’s char values), i.e., Σ = character
• In the BL project, we deal with languages over an alphabet of tokens (to be explained later)
Example: Real-Number Constants
• Some syntactically valid real-number constants (i.e., some strings in the “language of valid real-number constants”):
  37.044
  615.22E16
  99241.
  18.E-93
CFG Rewrite Rules
real-const → digit-seq . digit-seq |
             digit-seq . digit-seq exponent |
             digit-seq . |
             digit-seq . exponent
exponent   → E digit-seq |
             E + digit-seq |
             E - digit-seq
digit-seq  → digit digit-seq |
             digit
digit      → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• Each of the rules above is a rewrite rule (a replacement rule), which describes how strings in the language may be formed.
• A name on the left of a rewrite rule is called a non-terminal symbol.
• The special CFG symbol → means “can be rewritten as” or “can be replaced by”.
• The special CFG symbol | means “or”, i.e., there are multiple possible “rewrites” for the same non-terminal.
• So this rule ...
  real-const → digit-seq . digit-seq | digit-seq . digit-seq exponent | digit-seq . | digit-seq . exponent
• ... means exactly the same thing as these four separate rewrite rules:
  real-const → digit-seq . digit-seq
  real-const → digit-seq . digit-seq exponent
  real-const → digit-seq .
  real-const → digit-seq . exponent
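The equivalence between one rule with | and several separate rules can be sketched in a few lines of Python. This is only an illustration: the function name expand_alternatives is hypothetical, and it assumes that → and | never appear as terminal symbols in the grammar being split.

```python
def expand_alternatives(rule: str) -> list[str]:
    """Split 'lhs → a | b | c' into one single-alternative rule per choice.

    Assumes '→' and '|' are used only as CFG meta-symbols, never as terminals.
    """
    lhs, rhs = rule.split("→")
    return [f"{lhs.strip()} → {alt.strip()}" for alt in rhs.split("|")]


combined = ("real-const → digit-seq . digit-seq | digit-seq . digit-seq exponent"
            " | digit-seq . | digit-seq . exponent")

# Print the four separate rewrite rules, one per line.
for single in expand_alternatives(combined):
    print(single)
```

Running it prints one real-const rule per alternative, matching the four separate rules listed above.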
• One non-terminal symbol (normally in the first rewrite rule) is called the start symbol.
• A symbol from the alphabet on the right-hand side of a rewrite rule is called a terminal symbol.
• To remember the name: terminal symbols are what you end up with when generating strings in the language (see below).
Four Components of a CFG
• Non-terminal symbols for this CFG:
  – real-const, exponent, digit-seq, digit
• Terminal symbols for this CFG:
  – . , E , + , - , 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9
• Start symbol for this CFG:
  – real-const
• Rewrite rules for this CFG:
  – (see previous slides)
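As a rough sketch, the four components can be written down as plain Python data. The names RULES and START are illustrative assumptions, not part of the BL project; the point is only that the non-terminals and terminals can be read off the rewrite rules.

```python
# Hypothetical encoding of the real-number-constant CFG as plain data.
# Each non-terminal maps to its list of alternatives; each alternative
# is a list of symbols.
RULES = {
    "real-const": [
        ["digit-seq", ".", "digit-seq"],
        ["digit-seq", ".", "digit-seq", "exponent"],
        ["digit-seq", "."],
        ["digit-seq", ".", "exponent"],
    ],
    "exponent": [
        ["E", "digit-seq"],
        ["E", "+", "digit-seq"],
        ["E", "-", "digit-seq"],
    ],
    "digit-seq": [
        ["digit", "digit-seq"],
        ["digit"],
    ],
    "digit": [[d] for d in "0123456789"],
}
START = "real-const"

# Non-terminals are exactly the names that appear on the left of a rule.
non_terminals = set(RULES)

# Terminals are the right-hand-side symbols that are not non-terminals.
terminals = {
    symbol
    for alternatives in RULES.values()
    for alternative in alternatives
    for symbol in alternative
    if symbol not in non_terminals
}

print(sorted(non_terminals))
print(sorted(terminals))
```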
Derivations
• A derivation of a string of terminal symbols consists of a sequence of specific rewrite-rule applications that begins with the start symbol and continues until only terminal symbols remain
  – A string is in the language of the CFG iff there is a derivation that leads to it
• The symbol ⇒ indicates a derivation step, i.e., a specific rewrite-rule application
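The "iff there is a derivation" condition can be checked by brute force for the real-number-constant grammar. The sketch below searches over leftmost derivations; it is not the recursive-descent technique planned for the BL parser, and the names RULES and in_language are hypothetical. The pruning relies on two properties of this particular grammar: no rule produces the empty string, and every terminal is a single character.

```python
from collections import deque

# Hypothetical encoding of the grammar: alternatives as space-separated symbols.
RULES = {
    "real-const": ["digit-seq . digit-seq", "digit-seq . digit-seq exponent",
                   "digit-seq .", "digit-seq . exponent"],
    "exponent": ["E digit-seq", "E + digit-seq", "E - digit-seq"],
    "digit-seq": ["digit digit-seq", "digit"],
    "digit": list("0123456789"),
}


def in_language(target: str, start: str = "real-const") -> bool:
    """Return True iff some leftmost derivation from start yields target."""
    queue = deque([(start,)])
    seen = {(start,)}
    while queue:
        form = queue.popleft()
        # Find the leftmost non-terminal in the current sentential form.
        index = next((i for i, s in enumerate(form) if s in RULES), None)
        if index is None:
            # Only terminals remain: the derivation is complete.
            if "".join(form) == target:
                return True
            continue
        for alternative in RULES[form[index]]:
            new_form = form[:index] + tuple(alternative.split()) + form[index + 1:]
            # Prune: forms never shrink (no empty rules), and the terminals
            # before the rewritten position must already match the target.
            prefix = "".join(new_form[:index])
            if (len(new_form) <= len(target)
                    and target.startswith(prefix)
                    and new_form not in seen):
                seen.add(new_form)
                queue.append(new_form)
    return False


for candidate in ["37.044", "615.22E16", "99241.", "18.E-93", ".5", "5"]:
    print(candidate, in_language(candidate))
```

The search terminates because every sentential form it keeps is no longer than the target string, so only finitely many forms can ever be queued.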
Example: Derivation of 5.6E10
• Begin with the start symbol:
  real-const ⇒
• ... and pick one possible rewrite:
  real-const → digit-seq . digit-seq |
               digit-seq . digit-seq exponent |
               digit-seq . |
               digit-seq . exponent
• Which rewrite is appropriate to derive 5.6E10?
• This is the first step of the derivation:
  real-const ⇒ digit-seq . digit-seq exponent
• Choose a non-terminal to rewrite (here, the first digit-seq):
  real-const ⇒ digit-seq . digit-seq exponent
• ... and pick one possible rewrite:
  digit-seq → digit digit-seq |
              digit
• Which rewrite is appropriate to derive 5.6E10?
• This is the second step of the derivation:
  real-const ⇒ digit-seq . digit-seq exponent
             ⇒ digit . digit-seq exponent
• Choose a non-terminal to rewrite (here, digit):
  real-const ⇒ digit-seq . digit-seq exponent
             ⇒ digit . digit-seq exponent
• ... and pick one possible rewrite:
  digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
• This is the third step of the derivation:
  real-const ⇒ digit-seq . digit-seq exponent
             ⇒ digit . digit-seq exponent
             ⇒ 5 . digit-seq exponent
• Choose a non-terminal to rewrite (here, the remaining digit-seq):
  real-const ⇒ digit-seq . digit-seq exponent
             ⇒ digit . digit-seq exponent
             ⇒ 5 . digit-seq exponent
• ... and pick one possible rewrite:
  digit-seq → digit digit-seq |
              digit
One Derivation of 5.6E10
real-const ⇒ digit-seq . digit-seq exponent
           ⇒ digit . digit-seq exponent
           ⇒ 5 . digit-seq exponent
           ⇒ 5 . digit exponent
           ⇒ 5 . 6 exponent
           ⇒ 5 . 6 E digit-seq
           ⇒ 5 . 6 E digit digit-seq
           ⇒ 5 . 6 E 1 digit-seq
           ⇒ 5 . 6 E 1 digit
           ⇒ 5 . 6 E 1 0
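The derivation above always rewrites the leftmost non-terminal, so it can be replayed mechanically from a list of rule choices. The following is a minimal sketch, assuming a hypothetical RULES table and replay function; each choice names the non-terminal being rewritten and the zero-based index of the alternative used.

```python
# Hypothetical encoding of the grammar: alternatives as space-separated symbols.
RULES = {
    "real-const": ["digit-seq . digit-seq", "digit-seq . digit-seq exponent",
                   "digit-seq .", "digit-seq . exponent"],
    "exponent": ["E digit-seq", "E + digit-seq", "E - digit-seq"],
    "digit-seq": ["digit digit-seq", "digit"],
    "digit": list("0123456789"),
}

# One (non-terminal, alternative index) pair per derivation step for 5.6E10.
CHOICES = [
    ("real-const", 1), ("digit-seq", 1), ("digit", 5), ("digit-seq", 1),
    ("digit", 6), ("exponent", 0), ("digit-seq", 0), ("digit", 1),
    ("digit-seq", 1), ("digit", 0),
]


def replay(start, choices):
    """Apply each choice to the leftmost non-terminal; return every form."""
    form = [start]
    steps = [" ".join(form)]
    for non_terminal, alt in choices:
        # Locate the leftmost non-terminal in the current sentential form.
        index = next(i for i, s in enumerate(form) if s in RULES)
        if form[index] != non_terminal:
            raise ValueError(f"expected {non_terminal}, found {form[index]}")
        # Replace that one symbol with the chosen alternative's symbols.
        form[index:index + 1] = RULES[non_terminal][alt].split()
        steps.append(" ".join(form))
    return steps


steps = replay("real-const", CHOICES)
print("\n⇒ ".join(steps))
```

Other derivations of the same string exist (for example, rewriting exponent before the digits); the list of choices here simply mirrors the order used on the slide.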