CIS 352 Parsing, Part I Jim Royer April 2, 2019 CIS 352 ❖ Parsing, Part I 1
Miss Teen South Carolina’s Famous Answer
https://kellblog.com/2007/09/01/parsing-the-unparseable-miss-teen-south-carolinas-answer/
The Syntactic Side of Languages (Again)

Natural Languages:
  stream of phonemes --(via lexical analysis)--> stream of words --(via parsing)--> sentences

Artificial Languages:
  stream of characters --(via lexical analysis)--> stream of tokens --(via parsing)--> abstract syntax

Tokens: variable names, numerals, operators, key-words, . . .

Stream of characters:
  int main(void) {
    printf("hello, world\n");
    return 0;
  }

Stream of tokens:
  int main ( void ) {
  printf ( "hello, world\n" ) ;
  return 0 ;
  }
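The lexical-analysis step for the C fragment above can be sketched with a regular-expression tokenizer. This is a minimal illustration, not a full C lexer: the token class names (STRING, NUMBER, NAME, PUNCT, KEYWORD), the keyword set, and the function name `tokenize` are all assumptions made for this example.

```python
import re

# Hypothetical token classes, just enough for the slide's C fragment.
TOKEN_SPEC = [
    ("STRING", r'"(?:\\.|[^"\\])*"'),   # string literal with escapes
    ("NUMBER", r"\d+"),
    ("NAME",   r"[A-Za-z_]\w*"),
    ("PUNCT",  r"[(){};]"),
    ("SKIP",   r"\s+"),                 # whitespace separates tokens
]
KEYWORDS = {"int", "void", "return"}    # assumed keyword set
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Turn a stream of characters into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind = m.lastgroup
        if kind != "SKIP":
            lexeme = m.group()
            if kind == "NAME" and lexeme in KEYWORDS:
                kind = "KEYWORD"
            tokens.append((kind, lexeme))
        pos = m.end()
    return tokens


source = 'int main(void) { printf("hello, world\\n"); return 0; }'
tokens = tokenize(source)
print(" ".join(lexeme for _, lexeme in tokens))
```

The printed line is the token stream with one space between tokens; the parser then works on `tokens` rather than on raw characters.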
Context Free Grammars, 1

Grammars: rules for organizing
◮ word-streams into sentences
◮ token-streams into abstract syntax (parse trees)

Context Free Grammars (CFGs)
◮ Terminals: concrete syntax (e.g., printf ( . . . ))
◮ Nonterminals: syntactic categories (e.g., Noun-Phrase, key-word, . . . )

Example (Palindromes over { a, b, c })
  A ::= ε | a | b | c | aAa | bAb | cAc
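The palindrome grammar can be read directly as a recursive membership test: each case of the function corresponds to one group of productions. This is a sketch for illustration; the function name `derives_A` is an assumption.

```python
def derives_A(w):
    """Return True when w is in A's category, for
    A ::= ε | a | b | c | aAa | bAb | cAc."""
    if any(ch not in "abc" for ch in w):
        return False              # only a, b, c are terminals
    if len(w) <= 1:
        return True               # productions A ::= ε | a | b | c
    # productions A ::= aAa | bAb | cAc: the ends match, the middle is an A
    return w[0] == w[-1] and derives_A(w[1:-1])


print(derives_A("abcba"), derives_A("abca"))
```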
CFGs Examples: LC

Phrases               P ::= C | E | B
Commands              C ::= skip | ℓ := E | C ; C | if B then C else C | while B do C
Integer Expressions   E ::= n | !ℓ | E ⊛ E    (⊛ ∈ { +, −, ×, . . . })
Boolean Expressions   B ::= b | E ⊛ E         (⊛ ∈ { =, <, ≥, . . . })

Integers    n ∈ ℤ = { . . . , −3, −2, −1, 0, 1, 2, 3, . . . }
Booleans    b ∈ 𝔹 = { true, false }
Locations   ℓ ∈ L = { x0, x1, x2, . . . }

!ℓ ≡ the integer currently stored in ℓ

  x1 := 1; x2 := !x0;    // Computes factorial of !x0
  while (!x2>0) do
    x1 := (!x1*!x2); x2 := (!x2-1)
CFGs Examples: A Fragment of English

⟨sentence⟩ ::= ⟨subject⟩ ⟨verb1⟩ | ⟨subject⟩ ⟨verb2⟩ ⟨object⟩
⟨subject⟩  ::= ⟨article⟩ ⟨noun⟩ | ⟨pronoun⟩
⟨object⟩   ::= that ⟨sentence⟩
⟨verb1⟩    ::= swims | pauses | exists
⟨verb2⟩    ::= believes | hopes | imagines
⟨article⟩  ::= a | some | the
⟨noun⟩     ::= lizard | truth | man
⟨pronoun⟩  ::= he | she | it
CFGs, 2

◮ CFGs recursively specify a finite collection of sets of strings, called syntactic categories.
◮ Each syntactic category is named by a nonterminal symbol.
  E.g.: ⟨object⟩, ⟨verb1⟩, and ⟨noun⟩.
◮ One of the nonterminals is chosen to be the start symbol; its syntactic category is the language given by the grammar.
  E.g.: ⟨sentence⟩.
◮ A syntactic category (named by nonterminal N) is described by a set of productions of the form
    N ::= X_1 . . . X_n
  where each X_i is a terminal or nonterminal (and n could be 0). E.g.:
    ⟨sentence⟩ ::= ⟨subject⟩ ⟨verb1⟩
    ⟨sentence⟩ ::= ⟨subject⟩ ⟨verb2⟩ ⟨object⟩
    ⟨object⟩   ::= that ⟨sentence⟩
Example: Translating a regular expression to CFG

Notation: X_e = the nonterminal for reg. exp. e

  For:              Add:
  e = a             X_e ::= a
  e = ε             X_e ::= ε
  e = (e1 | e2)     X_e ::= X_e1 | X_e2
  e = (e1 e2)       X_e ::= X_e1 X_e2
  e = (e′)∗         X_e ::= X_e′ X_e | ε

For e = (01|10)∗:
  X_(01|10)∗ ::= X_01|10 X_(01|10)∗ | ε
  X_01|10    ::= X_01 | X_10
  X_01       ::= X_0 X_1
  X_10       ::= X_1 X_0
  X_0        ::= 0
  X_1        ::= 1
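The translation table can be sketched as a recursive function over a regular-expression syntax tree. Everything about the representation here is an assumption made for illustration: regular expressions are nested tuples tagged "sym", "eps", "alt", "cat", and "star"; symbols are single characters; a grammar is a dict from nonterminal names to lists of right-hand sides; and `show` and `to_cfg` are hypothetical helper names.

```python
def show(e):
    """Render a regex tree as text, used only to name the nonterminal X_e."""
    kind = e[0]
    if kind == "sym":
        return e[1]
    if kind == "eps":
        return "ε"
    if kind == "alt":
        return f"{show(e[1])}|{show(e[2])}"
    if kind == "cat":
        # parenthesize alternatives so that names stay unambiguous
        return "".join(f"({show(c)})" if c[0] == "alt" else show(c)
                       for c in e[1:])
    if kind == "star":
        return f"({show(e[1])})*"
    raise ValueError(f"unknown regex node {kind!r}")


def to_cfg(e, rules):
    """Add the productions for e to rules and return the name of X_e."""
    name = "X_" + show(e)
    if name in rules:
        return name
    kind = e[0]
    if kind == "sym":
        rules[name] = [[e[1]]]                              # X_e ::= a
    elif kind == "eps":
        rules[name] = [[]]                                  # X_e ::= ε
    elif kind == "alt":
        rules[name] = [[to_cfg(e[1], rules)],               # X_e ::= X_e1
                       [to_cfg(e[2], rules)]]               #       | X_e2
    elif kind == "cat":
        rules[name] = [[to_cfg(e[1], rules),                # X_e ::= X_e1 X_e2
                        to_cfg(e[2], rules)]]
    elif kind == "star":
        rules[name] = [[to_cfg(e[1], rules), name], []]     # X_e ::= X_e' X_e | ε
    else:
        raise ValueError(f"unknown regex node {kind!r}")
    return name


zero, one = ("sym", "0"), ("sym", "1")
example = ("star", ("alt", ("cat", zero, one), ("cat", one, zero)))  # (01|10)*
rules = {}
start = to_cfg(example, rules)
for lhs, alternatives in rules.items():
    print(lhs, "::=", " | ".join(" ".join(rhs) or "ε" for rhs in alternatives))
```

An empty right-hand side (the empty list) stands for ε. The six productions produced for `example` should match the six listed for (01|10)∗, up to the order in which they are printed.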
A Big-Step Semantics for CFG

Notation: N ⇓ w means w is in N’s syntactic category.

Rule:
\[
\frac{N_1 \Downarrow w_1 \quad \cdots \quad N_k \Downarrow w_k}{N \Downarrow w}
\qquad \bigl(N ::= u_0 N_1 u_1 N_2 \dots N_k u_k, \;\; w = u_0 w_1 u_1 \dots w_k u_k\bigr)
\]

A dodgy grammar:
  ⟨exp⟩ ::= ⟨exp⟩ + ⟨exp⟩
          | ⟨exp⟩ − ⟨exp⟩
          | ⟨exp⟩ ∗ ⟨exp⟩
          | ⟨exp⟩ / ⟨exp⟩
          | ⟨num⟩
          | ( ⟨exp⟩ )

Derivation of ⟨exp⟩ ⇓ 2+3∗4 (premises listed before conclusions):
  ⟨num⟩ ⇓ 2, so ⟨exp⟩ ⇓ 2
  ⟨num⟩ ⇓ 3, so ⟨exp⟩ ⇓ 3
  ⟨num⟩ ⇓ 4, so ⟨exp⟩ ⇓ 4
  ⟨exp⟩ ⇓ 3 and ⟨exp⟩ ⇓ 4, so ⟨exp⟩ ⇓ 3∗4          (⋆)
  ⟨exp⟩ ⇓ 2 and ⟨exp⟩ ⇓ 3∗4, so ⟨exp⟩ ⇓ 2+3∗4      (†)

(⋆) “3*4” = “3” ++ “*” ++ “4”
(†) “2+3*4” = “2” ++ “+” ++ “3*4”
Parse Trees

  ⟨exp⟩ ::= ⟨exp⟩ + ⟨exp⟩ | ⟨exp⟩ − ⟨exp⟩
          | ⟨exp⟩ ∗ ⟨exp⟩ | ⟨exp⟩ / ⟨exp⟩
          | ⟨num⟩ | ( ⟨exp⟩ )

Two parses of 2 + 3 ∗ 4:

         Exp                        Exp
       /  |  \                    /  |  \
    Exp   +   Exp              Exp   *   Exp
     |       / | \            / | \       |
     2    Exp  *  Exp      Exp  +  Exp    4
           |       |        |       |
           3       4        2       3

Definition (Ambiguity)
A CFG is ambiguous when some string in the language has two or more distinct parses.
(Great for lawyers, not-so-great in computing.)

[From a newspaper discussion of a documentary on Merle Haggard.]
“Among those interviewed were his two ex-wives, Kris Kristofferson and Robert Duvall.”
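Ambiguity can be checked mechanically on small inputs by counting parse trees. The sketch below counts the trees that the ⟨exp⟩ grammar above assigns to a token list. It assumes, since the slides do not define ⟨num⟩, that a ⟨num⟩ is any token made up of digits; the function name `count_parses` is hypothetical.

```python
from functools import lru_cache

OPS = {"+", "-", "*", "/"}


def count_parses(tokens):
    """Count the parse trees for tokens under
    <exp> ::= <exp> op <exp> | <num> | ( <exp> )."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # number of trees deriving tokens[i:j] from <exp>
        n = 0
        if j - i == 1 and tokens[i].isdigit():
            n += 1                                   # <exp> ::= <num>
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += count(i + 1, j - 1)                 # <exp> ::= ( <exp> )
        for k in range(i + 1, j - 1):
            if tokens[k] in OPS:                     # <exp> ::= <exp> op <exp>
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(tokens))


print(count_parses(["2", "+", "3", "*", "4"]))
print(count_parses(["(", "2", "+", "3", ")", "*", "4"]))
```

A count above 1 for any string shows the grammar is ambiguous; a count of 1 on a few samples does not prove the opposite.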
Grammar Repair, 1 (§3.4 in Mogensen)

Definition
Suppose ⊕ is an operator (e.g., +, ∗, <).
(a) ⊕ is left-associative when a ⊕ b ⊕ c = (a ⊕ b) ⊕ c. (E.g., −, /)
(b) ⊕ is right-associative when a ⊕ b ⊕ c = a ⊕ (b ⊕ c). (E.g., :, = in C)
(c) ⊕ is non-associative when a ⊕ b ⊕ c is illegal. (E.g., <)

◮ + and ∗ can be either left- or right-associative.
◮ To be consistent with − and /, we treat them as left-assoc.

  For               rewrite                  to
  left-assoc. ⊕     E ::= E ⊕ E | ⟨num⟩      E  ::= E ⊕ E′ | E′
                                             E′ ::= ⟨num⟩
  right-assoc. ⊕    E ::= E ⊕ E | ⟨num⟩      E  ::= E′ ⊕ E | E′
                                             E′ ::= ⟨num⟩

[What is the parse of 1 ⊕ 2 ⊕ 3 under these two grammars?]
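The effect of the two rewrites can be sketched by building the tree each grammar forces for a chain of operands. The tuple representation and the names `parse_left`, `parse_right`, and `evaluate` are assumptions for illustration; `evaluate` reads ⊕ as subtraction only to make the difference between the two trees visible.

```python
def parse_left(nums):
    """Tree shape forced by E ::= E ⊕ E' | E' (operands group to the left)."""
    tree = nums[0]
    for n in nums[1:]:
        tree = (tree, "⊕", n)
    return tree


def parse_right(nums):
    """Tree shape forced by E ::= E' ⊕ E | E' (operands group to the right)."""
    if len(nums) == 1:
        return nums[0]
    return (nums[0], "⊕", parse_right(nums[1:]))


def evaluate(tree):
    """Evaluate a tree, reading ⊕ as subtraction (an assumption)."""
    if isinstance(tree, int):
        return tree
    left, _, right = tree
    return evaluate(left) - evaluate(right)


left_tree = parse_left([1, 2, 3])      # ((1, '⊕', 2), '⊕', 3)
right_tree = parse_right([1, 2, 3])    # (1, '⊕', (2, '⊕', 3))
print(left_tree, evaluate(left_tree))
print(right_tree, evaluate(right_tree))
```

With ⊕ read as subtraction the two trees evaluate to different integers, which is why the choice of associativity has to be built into the grammar.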
Grammar Repair, 2 (§3.4 in Mogensen)

Definition
Operators have an ordering called precedence. In an expression a ⊕ b ⊙ c:
◮ if precedence(⊕) > precedence(⊙), then: a ⊕ b ⊙ c = (a ⊕ b) ⊙ c.
◮ if precedence(⊕) < precedence(⊙), then: a ⊕ b ⊙ c = a ⊕ (b ⊙ c).
◮ if precedence(⊕) = precedence(⊙), then:
  ➱ if ⊕ and ⊙ are both left-assoc., then: a ⊕ b ⊙ c = (a ⊕ b) ⊙ c.
  ➱ if ⊕ and ⊙ are both right-assoc., then: a ⊕ b ⊙ c = a ⊕ (b ⊙ c).
  ➱ Otherwise, no standard answer.
Grammar Repair, 3 (§3.4 in Mogensen)

  ⟨exp⟩ ::= ⟨exp⟩ + ⟨exp⟩ | ⟨exp⟩ − ⟨exp⟩     (level 1 precedence)
          | ⟨exp⟩ ∗ ⟨exp⟩ | ⟨exp⟩ / ⟨exp⟩     (level 2 precedence)
          | ⟨num⟩ | ( ⟨exp⟩ )                 (level 3 precedence)

◮ Handle left- and right-associativity as before.
◮ Each level gets its own nonterminal.
◮ Go from lowest to highest precedence levels.

  ⟨exp⟩1 ::= ⟨exp⟩1 + ⟨exp⟩2 | ⟨exp⟩1 − ⟨exp⟩2 | ⟨exp⟩2
  ⟨exp⟩2 ::= ⟨exp⟩2 ∗ ⟨exp⟩3 | ⟨exp⟩2 / ⟨exp⟩3 | ⟨exp⟩3
  ⟨exp⟩3 ::= ⟨num⟩ | ( ⟨exp⟩1 )

[More problems and repairs in the next homework.]
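The repaired three-level grammar maps naturally onto a hand-written parser with one method per nonterminal. The sketch below is one way to do that, not the course's parser: it replaces the left-recursive productions with loops (a standard technique that still yields left-associative trees), assumes ⟨num⟩ is a run of digits, and uses hypothetical names (`tokenize`, `Parser`, `parse`). The toy tokenizer silently skips characters it does not recognize.

```python
import re


def tokenize(text):
    """Split text into number, operator, and parenthesis tokens."""
    return re.findall(r"\d+|[-+*/()]", text)


class Parser:
    def __init__(self, tokens):
        self.toks = tokens
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def exp1(self):
        # <exp>1 ::= <exp>1 + <exp>2 | <exp>1 - <exp>2 | <exp>2
        tree = self.exp2()
        while self.peek() in ("+", "-"):
            op = self.take()
            tree = (op, tree, self.exp2())
        return tree

    def exp2(self):
        # <exp>2 ::= <exp>2 * <exp>3 | <exp>2 / <exp>3 | <exp>3
        tree = self.exp3()
        while self.peek() in ("*", "/"):
            op = self.take()
            tree = (op, tree, self.exp3())
        return tree

    def exp3(self):
        # <exp>3 ::= <num> | ( <exp>1 )
        tok = self.take()
        if tok is not None and tok.isdigit():
            return int(tok)
        if tok == "(":
            tree = self.exp1()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return tree
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.exp1()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("2+3*4"))
print(parse("8-3-2"))
```

Trees are written as (operator, left, right) tuples. Because `exp1` only ever calls `exp2` for its operands, a `*` can never sit above a `+` unless parentheses are used.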
A Small-Steps Semantics for CFGs

Warning: Greek letters!

Notation:
(a) αNβ ⇒ αγβ means αNβ rewrites to αγβ by applying the production N ::= γ.
\[
\frac{}{G \vdash \alpha N \beta \Rightarrow \alpha \gamma \beta}
\qquad \bigl(N ::= \gamma \text{ is in } G\bigr)
\]
(b) ⇒∗ = the reflexive-transitive closure of ⇒.

  ⟨sentence⟩ ⇒ ⟨subject⟩ ⟨verb2⟩ ⟨object⟩
             ⇒ ⟨article⟩ ⟨noun⟩ ⟨verb2⟩ ⟨object⟩
             ⇒ the ⟨noun⟩ ⟨verb2⟩ ⟨object⟩
             ⇒ the man ⟨verb2⟩ ⟨object⟩
             ⇒ the man believes ⟨object⟩
             ⇒ the man believes that ⟨sentence⟩
             ⇒ the man believes that ⟨subject⟩ ⟨verb1⟩
             ⇒ the man believes that ⟨article⟩ ⟨noun⟩ ⟨verb1⟩
             ⇒ the man believes that some ⟨noun⟩ ⟨verb1⟩
             ⇒ the man believes that some lizard ⟨verb1⟩
             ⇒ the man believes that some lizard exists
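The derivation above always rewrites the leftmost nonterminal, so it can be replayed as a list of production choices. The sketch below encodes the English-fragment grammar as a dict (nonterminal name to list of right-hand sides, in the order the alternatives were given) and applies one ⇒ step per choice. The dict layout and the names `G`, `step`, and `derive` are assumptions made for this example.

```python
# The English fragment; alternatives appear in the order given in the grammar.
G = {
    "sentence": [["subject", "verb1"], ["subject", "verb2", "object"]],
    "subject":  [["article", "noun"], ["pronoun"]],
    "object":   [["that", "sentence"]],
    "verb1":    [["swims"], ["pauses"], ["exists"]],
    "verb2":    [["believes"], ["hopes"], ["imagines"]],
    "article":  [["a"], ["some"], ["the"]],
    "noun":     [["lizard"], ["truth"], ["man"]],
    "pronoun":  [["he"], ["she"], ["it"]],
}


def step(form, choice):
    """One => step: rewrite the leftmost nonterminal with its choice-th production."""
    for i, sym in enumerate(form):
        if sym in G:
            return form[:i] + G[sym][choice] + form[i + 1:]
    raise ValueError("no nonterminal left to rewrite")


def derive(start, choices):
    """Return every sentential form of the leftmost derivation picked by choices."""
    forms = [[start]]
    for choice in choices:
        forms.append(step(forms[-1], choice))
    return forms


# Production choices that replay the derivation shown above.
forms = derive("sentence", [1, 0, 2, 2, 0, 0, 0, 0, 1, 0, 2])
for form in forms:
    print(" ".join(f"<{s}>" if s in G else s for s in form))
```

Each printed line is one sentential form; the list `forms` as a whole witnesses ⟨sentence⟩ ⇒∗ the final string.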
Digression

See Graham Hutton’s slides for Chapter 8 of his “Programming in Haskell” text:
http://www.cs.nott.ac.uk/~gmh/chapter8.ppt

Also:
◮ Hutton’s “Programming in Haskell, 2/e” homepage:
  http://www.cs.nott.ac.uk/~gmh/book.html
◮ Hutton’s Example Parsing Library (from the 1st edition; not GHC 8.0.1 compliant):
  http://www.cs.nott.ac.uk/~gmh/Parsing.lhs
◮ Erik Meijer’s video lecture based on Hutton’s Chapter 8:
  http://channel9.msdn.com/Series/C9-Lectures-Erik-Meijer-Functional-Programming-Fundamentals/C9-Lectures-Dr-Erik-Meijer-Functional-Programming-Fundamentals-Chapter- . . .
  (Skip to time 6:05 for the beginning of the discussion of parsers.)