Context-free grammars (CFGs)


  1. Context-free grammars (CFGs)

  2. Roadmap. Last time: – RegExp == DFA – JLex: a tool for generating (Java code for) a lexer/scanner • Mainly a collection of 〈regexp, action〉 pairs. This time: – CFGs, the underlying abstraction for parsers. Next week: – Java CUP: a tool for generating (Java code for) a parser • Mainly a collection of 〈CFG-rule, action〉 pairs. Analogy: regexp : JLex :: CFG : Java CUP

  3. RegExps Are Great! Perfect for tokenizing a language. However, they have some limitations: – Can only define a limited family of languages • Cannot use a RegExp to specify all the programming constructs we need – No notion of structure. Let’s explore both of these issues.

  4. Limitations of RegExps: Cannot handle “matching”. E.g., the language of balanced parentheses $L_{()} = \{\, (^n\,)^n \mid n > 0 \,\}$. No DFA exists for this language. Intuition: a given FSM only has a fixed, finite amount of memory – For an FSM, memory = the states – With a fixed, finite amount of memory, how could an FSM remember how many “(” characters it has seen?
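To make the memory intuition concrete, here is a minimal Python sketch of a recognizer for the language of n opening parentheses followed by n closing ones. The function name `in_l_parens` is made up for illustration. The point is that it relies on integer counters that can grow without bound, which is exactly what a fixed set of FSM states cannot provide.

```python
def in_l_parens(text):
    """Return True if text is n '(' characters followed by n ')' characters, n > 0."""
    k = 0
    opens = 0
    # Count the leading run of '(' characters.
    while k < len(text) and text[k] == "(":
        opens += 1
        k += 1
    closes = 0
    # Count the run of ')' characters that follows.
    while k < len(text) and text[k] == ")":
        closes += 1
        k += 1
    # Accept only if the whole input was consumed and the counts match.
    return k == len(text) and opens == closes and opens > 0


print(in_l_parens("((()))"))  # True
print(in_l_parens("(()"))     # False
```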

  5. Theorem: No RegExp/DFA can describe the language $L_{()}$. Proof by contradiction: • Suppose that there exists a DFA $A$ for $L_{()}$ and $A$ has $N$ states • $A$ has to accept the string $(^N\,)^N$ with some path $q_0 q_1 \ldots q_N \ldots q_{2N}$ • The prefix $q_0 \ldots q_N$ visits $N+1$ states, so by the pigeonhole principle some state has to repeat: $q_i = q_j$ for some $0 \le i < j \le N$ • Therefore the run $q_0 q_1 \ldots q_i q_{j+1} \ldots q_N \ldots q_{2N}$ is also accepting • $A$ accepts the string $(^{N-(j-i)}\,)^N \notin L_{()}$, which is a contradiction!
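The pigeonhole step can be replayed mechanically. The sketch below is illustrative only: the three-state transition table `delta` is a made-up DFA, not one from the lecture. For any complete DFA with N states, the code finds a repeated state while reading N opening parentheses, cuts out the loop, and shows that the DFA ends in the same state on the balanced string and on the shorter, unbalanced one.

```python
def run(delta, start, text):
    """Return the list of states visited while reading text (len(text) + 1 entries)."""
    states = [start]
    for ch in text:
        states.append(delta[(states[-1], ch)])
    return states


def pumped_counterexample(delta, start, n_states):
    """Find i < j <= N with q_i == q_j on the prefix '(' * N, then drop the loop."""
    N = n_states
    prefix_states = run(delta, start, "(" * N)  # N + 1 visits over N states
    first_seen = {}
    for j, q in enumerate(prefix_states):
        if q in first_seen:
            i = first_seen[q]
            # Removing the j - i parentheses read inside the loop keeps the run valid.
            return "(" * (N - (j - i)) + ")" * N
        first_seen[q] = j
    raise AssertionError("unreachable: N + 1 visits over N states must repeat")


# Hypothetical complete DFA with states 0, 1, 2 over the alphabet {'(', ')'}.
delta = {
    (0, "("): 1, (0, ")"): 2,
    (1, "("): 0, (1, ")"): 1,
    (2, "("): 2, (2, ")"): 2,
}
N = 3
balanced = "(" * N + ")" * N
shorter = pumped_counterexample(delta, 0, N)

# Both strings end in the same state, so the DFA accepts both or rejects both.
print(shorter, run(delta, 0, balanced)[-1] == run(delta, 0, shorter)[-1])
```

Whatever set of accepting states is chosen for this table, the DFA gives the same verdict on both strings, so it cannot accept exactly the balanced ones.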

  6. Limitations of RegExps: No Structure. Our Enhanced-RegExp scanner can emit a stream of tokens, e.g. X = Y + Z becomes ID ASSIGN ID PLUS ID … but this doesn’t really enforce any order of operations
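As a rough illustration of the point above, here is a small regexp-driven scanner written with Python's `re` module (not JLex). The token names follow the slide, while the exact patterns in `TOKEN_SPEC` are assumptions. It emits a flat token list and accepts a malformed statement just as readily as a well-formed one.

```python
import re

# (token name, regexp) pairs; the patterns are illustrative assumptions.
TOKEN_SPEC = [
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return the list of token names for text, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append(match.lastgroup)
        pos = match.end()
    return tokens


print(tokenize("X = Y + Z"))  # ['ID', 'ASSIGN', 'ID', 'PLUS', 'ID']
print(tokenize("X = + Y"))    # ['ID', 'ASSIGN', 'PLUS', 'ID']
```

The second call succeeds even though the input is not a sensible statement: the scanner sees only a sequence of tokens, not how they nest.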

  7. The Chomsky Hierarchy (Noam Chomsky). Language classes, ordered from most powerful to most efficient: • Recursively enumerable (Turing machine) • Context-Sensitive • Context-Free (happy medium?) • Regular (FSM)

  8. Context Free Grammars (CFGs) A set of (recursive) rewriting rules to generate patterns of strings Can envision a “parse tree” that keeps structure

  9. CFG: Intuition. S → ‘(’ S ‘)’: a rule that says that you can rewrite S to be an S surrounded by a single set of parentheses. Before applying the rule: S. After applying the rule: S with children ( S ).

  10. Context Free Grammars (CFGs) A CFG is a 4-tuple (N,Σ,P,S) • N is a set of non-terminals, e.g., A, B, S, … • Σ is the set of terminals • P is a set of production rules • S ∈ N is the initial non-terminal symbol (“start symbol”)

  11. Context Free Grammars (CFGs) A CFG is a 4-tuple (N,Σ,P,S) • N is a set of non-terminals, e.g., A, B, S, … (placeholders / interior nodes in the parse tree) • Σ is the set of terminals (tokens from the scanner) • P is a set of production rules (rules for deriving strings) • S (in N) is the initial non-terminal symbol (if not otherwise specified, use the non-terminal that appears on the LHS of the first production as the start)
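One possible way to hold the 4-tuple in code is sketched below. The class name `CFG`, the helper `make_cfg`, and the choice to infer N from the left-hand sides of the productions are all assumptions made for illustration. The default start symbol follows the convention stated above (LHS of the first production).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset  # N
    terminals: frozenset     # Sigma
    productions: tuple       # P, as (lhs, rhs) pairs where rhs is a tuple of symbols
    start: str               # S

    def __post_init__(self):
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a non-terminal")
        if self.nonterminals & self.terminals:
            raise ValueError("a symbol cannot be both terminal and non-terminal")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"LHS {lhs!r} is not a non-terminal")
            unknown = [sym for sym in rhs if sym not in symbols]
            if unknown:
                raise ValueError(f"unknown symbols on the RHS: {unknown}")


def make_cfg(terminals, productions, start=None):
    """Build a CFG, inferring N from the LHSs and defaulting S to the first LHS."""
    productions = tuple((lhs, tuple(rhs)) for lhs, rhs in productions)
    nonterminals = frozenset(lhs for lhs, _ in productions)
    if start is None:
        start = productions[0][0]
    return CFG(nonterminals, frozenset(terminals), productions, start)


# The balanced-parentheses grammar: S -> ( S ) | epsilon (empty RHS).
parens = make_cfg(terminals={"(", ")"}, productions=[("S", ["(", "S", ")"]), ("S", [])])
print(parens.start, sorted(parens.terminals))
```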

  12. Production Syntax: LHS → RHS • LHS: a single nonterminal symbol • RHS: an expression, i.e. a sequence of terminals and nonterminals. Examples: S → ‘(’ S ‘)’ and S → ε

  13. Production Shorthand: the two rules Nonterm → expression and Nonterm → ε can equivalently be written Nonterm → expression | ε. For example, S → ‘(’ S ‘)’ and S → ε are equivalently written S → ‘(’ S ‘)’ | ε

  14. Derivations. To derive a string: • Start by setting the “current sequence” to the start symbol • Repeatedly: – Find a nonterminal X in the current sequence – Find a production of the form X → α – “Apply” the production: create a new “current sequence” in which α replaces X • Stop when there are no more non-terminals • This process derives a string of terminal symbols
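The procedure above can be sketched in a few lines of Python for the parentheses grammar S → ( S ) | ε. The names `PRODUCTIONS` and `derive` are made up for this example. As a simplification, the caller supplies the sequence of productions to apply, and each one rewrites the leftmost occurrence of its left-hand side.

```python
PRODUCTIONS = [
    ("S", ["(", "S", ")"]),  # index 0: S -> ( S )
    ("S", []),               # index 1: S -> epsilon
]
NONTERMINALS = {"S"}


def derive(start, choices):
    """Apply the productions picked by `choices` and return every current sequence."""
    current = [start]
    steps = [list(current)]
    for index in choices:
        lhs, rhs = PRODUCTIONS[index]
        if lhs not in current:
            raise ValueError(f"no {lhs} left to rewrite")
        pos = current.index(lhs)                      # find the nonterminal X
        current = current[:pos] + rhs + current[pos + 1:]  # alpha replaces X
        steps.append(list(current))
    if any(sym in NONTERMINALS for sym in current):
        raise ValueError("derivation is not finished: non-terminals remain")
    return steps


for step in derive("S", [0, 0, 1]):
    print("".join(step) or "ε")
# Prints S, (S), ((S)), (()) on separate lines.
```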

  15. Derivation Syntax • We’ll use the symbol “⇒” for “derives” • We’ll use the symbol “$\stackrel{+}{\Rightarrow}$” for “derives in one or more steps” (also written as “⇒⁺”) • We’ll use the symbol “$\stackrel{*}{\Rightarrow}$” for “derives in zero or more steps” (also written as “⇒*”)

  16. An Example Grammar

  23. An Example Grammar. Terminals (for readability, bold and lowercase): • begin, end: program boundary • semicolon: represents “;”, separates statements • assign: represents “=” in an assignment statement • id: identifier / variable name • plus: represents the “+” operator in an expression

  29. An Example Grammar. Terminals (for readability, bold and lowercase): begin, end, semicolon, assign, id, plus. Nonterminals (for readability, italics and UpperCamelCase): • Prog: root of the parse tree • Stmts: list of statements • Stmt: a single statement • Expr: a mathematical expression

  30. An Example Grammar. Terminals (for readability, bold and lowercase): begin, end, semicolon, assign, id, plus. Nonterminals (for readability, italics and UpperCamelCase): Prog, Stmts, Stmt, Expr. Productions (define the syntax of legal programs): • Prog → begin Stmts end • Stmts → Stmts semicolon Stmt | Stmt • Stmt → id assign Expr • Expr → id | Expr plus id

  31. An Example Grammar. Terminals (for readability, bold and lowercase): • begin, end: program boundary • semicolon: represents “;”, separates statements • assign: represents “=” in an assignment statement • id: identifier / variable name • plus: represents the “+” operator in an expression. Nonterminals (for readability, italics and UpperCamelCase): • Prog: root of the parse tree • Stmts: list of statements • Stmt: a single statement • Expr: an expression. Productions (define the syntax of legal programs): • Prog → begin Stmts end • Stmts → Stmts semicolon Stmt | Stmt • Stmt → id assign Expr • Expr → id | Expr plus id

  32. Productions 1. Prog → begin Stmts end 2. Stmts → Stmts semicolon Stmt 3. | Stmt 4. Stmt → id assign Expr 5. Expr → id 6. | Expr plus id

  36. Productions: 1. Prog → begin Stmts end 2. Stmts → Stmts semicolon Stmt 3. | Stmt 4. Stmt → id assign Expr 5. Expr → id 6. | Expr plus id. Parse Tree: Prog. Derivation Sequence: Prog. Key: terminal, Nonterminal, rule used.

  37. Productions: 1. Prog → begin Stmts end 2. Stmts → Stmts semicolon Stmt 3. | Stmt 4. Stmt → id assign Expr 5. Expr → id 6. | Expr plus id. Parse Tree: Prog. Derivation Sequence: Prog ⇒ begin Stmts end (rule 1). Key: terminal, Nonterminal, rule used.

  38. Productions: 1. Prog → begin Stmts end 2. Stmts → Stmts semicolon Stmt 3. | Stmt 4. Stmt → id assign Expr 5. Expr → id 6. | Expr plus id. Parse Tree: Prog, with children begin Stmts end. Derivation Sequence: Prog ⇒ begin Stmts end (rule 1). Key: terminal, Nonterminal, rule used.
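The derivation sequence started above can be continued in code. The sketch below numbers the six productions as on the slide and replays a leftmost derivation from a list of rule numbers. The target program (a single statement `id assign id plus id`) and the rule sequence 1, 3, 4, 6, 5 are assumptions picked for illustration, not the continuation used in the lecture.

```python
# Productions numbered as on the slide; rules 3 and 6 are the "|" alternatives.
RULES = {
    1: ("Prog", ["begin", "Stmts", "end"]),
    2: ("Stmts", ["Stmts", "semicolon", "Stmt"]),
    3: ("Stmts", ["Stmt"]),
    4: ("Stmt", ["id", "assign", "Expr"]),
    5: ("Expr", ["id"]),
    6: ("Expr", ["Expr", "plus", "id"]),
}
NONTERMINALS = {lhs for lhs, _ in RULES.values()}


def leftmost_derivation(rule_numbers, start="Prog"):
    """Return the sentential forms obtained by rewriting the leftmost non-terminal."""
    current = [start]
    forms = [" ".join(current)]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        # Position of the leftmost non-terminal, or None if only terminals remain.
        pos = next((k for k, sym in enumerate(current) if sym in NONTERMINALS), None)
        if pos is None or current[pos] != lhs:
            raise ValueError(f"rule {number} does not apply to the leftmost non-terminal")
        current = current[:pos] + rhs + current[pos + 1:]
        forms.append(" ".join(current))
    return forms


rule_sequence = [1, 3, 4, 6, 5]
forms = leftmost_derivation(rule_sequence)
print(forms[0])
for number, form in zip(rule_sequence, forms[1:]):
    print(f"⇒ {form}   (rule {number})")
```

The first printed step matches the slide (Prog ⇒ begin Stmts end by rule 1); the last sentential form is `begin id assign id plus id end`, which contains only terminals.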
