Compiler Construction Lecture 6: Top-down parsing and LL(1) parser construction 2020-01-24 Michael Engel Includes material by Jan Christian Meyer
Overview
• Ambiguity of grammars revisited
• Elimination of left recursion
• Top-down parsing
• Recursive descent parsers: structure and implementation
• Table-driven LL(1) parsers
• Table generation
Compiler Construction 06: Top-down, LL(1) parsing
Ambiguity of grammars (Syntax analysis)
• For the compiler, it is important that each sentence in the language defined by a context-free grammar has a unique rightmost (or leftmost) derivation
• A grammar in which multiple rightmost (or leftmost) derivations exist for a sentence is called an ambiguous grammar
  • It can produce multiple derivations and multiple parse trees
• Multiple parse trees imply multiple possible meanings for a single program!
Ambiguity of grammars: example
• The "dangling else" problem in ALGOL-like languages (e.g. PASCAL): the "else" part is optional
    1 Statement → if Expr then Statement else Statement
    2           | if Expr then Statement
    3           | Assignment
    4           | …other statements…
• This statement
    if Expr1 then if Expr2 then Assignment1 else Assignment2
  has two distinct rightmost derivations with different behaviors:
    Parse tree 1 (else belongs to the outer if):
      if Expr1 then [ if Expr2 then Assignment1 ] else Assignment2
    Parse tree 2 (else belongs to the inner if):
      if Expr1 then [ if Expr2 then Assignment1 else Assignment2 ]
Removing ambiguity
• We can modify the grammar to include a rule that determines which if controls an else:
    1 Statement → if Expr then Statement
    2           | if Expr then WithElse else Statement
    3           | Assignment
    4 WithElse  → if Expr then WithElse else WithElse
    5           | Assignment
• This solution restricts the set of statements that can occur in the then part of an if-then-else construct
  • It accepts the same set of sentences as the original grammar
  • but ensures that each else has an unambiguous match to a specific if
Removing ambiguity: example
• The modified grammar
    1 Statement → if Expr then Statement
    2           | if Expr then WithElse else Statement
    3           | Assignment
    4 WithElse  → if Expr then WithElse else WithElse
    5           | Assignment
  has only one rightmost derivation for the example
    if Expr1 then if Expr2 then Assignment1 else Assignment2

    Rule  Sentential form
          Statement
    1     if Expr then Statement
    2     if Expr then if Expr then WithElse else Statement
    3     if Expr then if Expr then WithElse else Assignment
    5     if Expr then if Expr then Assignment else Assignment
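The difference between the two grammars can be checked mechanically by counting parse trees for the example sentence. The following is a minimal sketch, not part of the lecture material: the function name `count_parses`, the token spellings (`expr` and `assign` stand in for Expr and Assignment, treated as terminals) and the dictionary encoding of the grammars are assumptions made for illustration. It only works for grammars without left recursion and without empty productions, which holds for both grammars here.

```python
from functools import lru_cache


def count_parses(grammar, start, tokens):
    """Count parse trees for `tokens` rooted at `start`.

    Assumes (for this sketch) no left recursion and no empty bodies.
    `grammar` maps a nonterminal to a list of bodies (tuples of symbols);
    any symbol that is not a key of `grammar` is a terminal.
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def sym(symbol, i, j):
        # Number of trees with root `symbol` covering tokens[i:j].
        if symbol not in grammar:
            return 1 if j == i + 1 and tokens[i] == symbol else 0
        return sum(seq(body, i, j) for body in grammar[symbol])

    @lru_cache(maxsize=None)
    def seq(body, i, j):
        # Number of ways the symbol sequence `body` covers tokens[i:j].
        if not body:
            return 1 if i == j else 0
        total = 0
        for k in range(i + 1, j + 1):  # each symbol covers at least one token
            first = sym(body[0], i, k)
            if first:
                total += first * seq(body[1:], k, j)
        return total

    return sym(start, 0, len(tokens))


AMBIGUOUS = {
    "Statement": [
        ("if", "expr", "then", "Statement", "else", "Statement"),
        ("if", "expr", "then", "Statement"),
        ("assign",),
    ],
}

UNAMBIGUOUS = {
    "Statement": [
        ("if", "expr", "then", "Statement"),
        ("if", "expr", "then", "WithElse", "else", "Statement"),
        ("assign",),
    ],
    "WithElse": [
        ("if", "expr", "then", "WithElse", "else", "WithElse"),
        ("assign",),
    ],
}

SENTENCE = "if expr then if expr then assign else assign".split()

if __name__ == "__main__":
    print(count_parses(AMBIGUOUS, "Statement", SENTENCE))
    print(count_parses(UNAMBIGUOUS, "Statement", SENTENCE))
```

Under these assumptions the original grammar yields two parse trees for the sentence and the modified grammar yields one, matching the derivation table above.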
Order of derivations
    1 Expr → "(" Expr ")"
    2      | Expr Op name
    3      | name
    4 Op   → +
    5      | -
    6      | ×
    7      | ÷

• Rightmost: rewrite, at each step, the rightmost nonterminal
    Rule  Sentential form
          Expr
    2     Expr Op name
    6     Expr × name
    1     "(" Expr ")" × name
    2     "(" Expr Op name ")" × name
    4     "(" Expr + name ")" × name
    3     "(" name + name ")" × name

• Leftmost: rewrite, at each step, the leftmost nonterminal
    Rule  Sentential form
          Expr
    2     Expr Op name
    1     "(" Expr ")" Op name
    2     "(" Expr Op name ")" Op name
    3     "(" name Op name ")" Op name
    4     "(" name + name ")" Op name
    6     "(" name + name ")" × name

• The parse tree is identical for both:
    Expr → Expr × name(c)
      Expr → "(" Expr ")"
        Expr → Expr + name(b)
          Expr → name(a)
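The two derivation orders can be replayed step by step. The sketch below is an illustration only: the names `PRODUCTIONS`, `NONTERMINALS` and `derive` are hypothetical, and the rule numbers are those of the grammar on this slide. Each rule is applied to the leftmost or rightmost nonterminal of the current sentential form, and the function raises an error if the rule's head does not match that nonterminal.

```python
# Numbered productions of the expression grammar: rule -> (head, body).
PRODUCTIONS = {
    1: ("Expr", ["(", "Expr", ")"]),
    2: ("Expr", ["Expr", "Op", "name"]),
    3: ("Expr", ["name"]),
    4: ("Op", ["+"]),
    5: ("Op", ["-"]),
    6: ("Op", ["×"]),
    7: ("Op", ["÷"]),
}
NONTERMINALS = {"Expr", "Op"}


def derive(rules, leftmost, start="Expr"):
    """Apply `rules` in order, always rewriting the leftmost (or rightmost)
    nonterminal. Returns the list of sentential forms as strings."""
    form = [start]
    steps = [" ".join(form)]
    for rule in rules:
        head, body = PRODUCTIONS[rule]
        positions = [i for i, s in enumerate(form) if s in NONTERMINALS]
        if not positions:
            raise ValueError("no nonterminal left to rewrite")
        i = positions[0] if leftmost else positions[-1]
        if form[i] != head:
            raise ValueError(f"rule {rule} does not apply to {form[i]}")
        form[i:i + 1] = body  # replace the nonterminal by the rule body
        steps.append(" ".join(form))
    return steps


if __name__ == "__main__":
    for line in derive([2, 6, 1, 2, 4, 3], leftmost=False):
        print("rightmost:", line)
    for line in derive([2, 1, 2, 3, 4, 6], leftmost=True):
        print("leftmost: ", line)
```

Both rule sequences end in the same sentence, `( name + name ) × name`, even though the intermediate sentential forms differ.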
Left factoring
• Parsers (and scanners) only have a limited lookahead to upcoming tokens
• Example: given a production
    A → a b c d e f X g h | a b c d e f Y g h
  the parser is unable to choose between the two alternatives if it can only look one token ahead
• As with NFA → DFA conversion, if we can postpone the decision until it makes a difference, that works
• Rewriting the grammar as
    A  → a b c d e f A'
    A' → X g h | Y g h
  preserves the language by adding one nonterminal A' that follows the common prefix shared by several productions and collects their differing remainders
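One round of left factoring can be sketched in a few lines. This is an illustrative sketch rather than a complete algorithm: the function name `left_factor`, the tuple encoding of production bodies and the way new nonterminals are named (the head followed by primes) are assumptions. Alternatives are grouped by their first symbol, and each group with more than one member is replaced by its longest common prefix followed by a new nonterminal.

```python
def left_factor(head, bodies):
    """One round of left factoring for the alternatives of `head`.

    `bodies` is a list of tuples of symbols; the empty tuple stands for
    the empty string. Returns a dict of head -> list of bodies.
    """
    # Group the alternatives by their first symbol.
    groups = {}
    for body in bodies:
        groups.setdefault(body[:1], []).append(body)

    result = {head: []}
    primes = 0
    for first, group in groups.items():
        if len(group) == 1 or not first:
            result[head].extend(group)  # nothing to factor
            continue
        # Longest prefix common to every alternative in the group.
        prefix = group[0]
        for body in group[1:]:
            n = 0
            while n < min(len(prefix), len(body)) and prefix[n] == body[n]:
                n += 1
            prefix = prefix[:n]
        primes += 1
        new_head = head + "'" * primes
        result[head].append(prefix + (new_head,))
        result[new_head] = [body[len(prefix):] for body in group]
    return result


if __name__ == "__main__":
    factored = left_factor("A", [tuple("abcdefXgh"), tuple("abcdefYgh")])
    for nonterminal, alternatives in factored.items():
        print(nonterminal, "→", " | ".join(" ".join(b) for b in alternatives))
```

A full implementation would repeat this step until no two alternatives of any nonterminal share a prefix; a single round is enough for the example above.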
Left recursion
• Let's consider this grammar for a list of 'a's:
    A → A a | a
  which derives the following words:
    A → a
    A → A a → aa
    A → A a → A aa → aaa
    …
• The production A → A a is left recursive: its head (the nonterminal A) appears again as the leftmost symbol of its body
An equivalent grammar
• The same sequences can be generated by this grammar:
    A  → a A'
    A' → a A' | 𝜁
  (𝜁 is the empty string; choosing A' → 𝜁 ends the recursion through A')
• It derives the following words:
    A → a A' → a
    A → a A' → aa A' → aa
    A → a A' → aa A' → aaa A' → aaa
    …
Eliminating left recursion
• If a nonterminal has m productions that are left recursive and n productions that are not
    A → A 𝛽1 | A 𝛽2 | A 𝛽3 | … | A 𝛽m
    A → 𝛾1 | 𝛾2 | 𝛾3 | … | 𝛾n
  (Greek letters, except 𝜁, stand for arbitrary combinations of other terminals and nonterminals)
  we can introduce A' and rewrite the productions as (see [1]):
    A  → 𝛾1 A' | 𝛾2 A' | 𝛾3 A' | … | 𝛾n A'
    A' → 𝛽1 A' | 𝛽2 A' | 𝛽3 A' | … | 𝛽m A' | 𝜁
• This generates the same language and removes (immediate) left recursion
• "Immediate" because left recursion can also happen in several steps (indirectly), e.g. the productions
    A → B x and B → A y
  result in
    A → B x → A y x
  Here, A again shows up on the left when derived from A
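The rewriting rule for immediate left recursion translates directly into code. The following is a minimal sketch under stated assumptions: the function name `eliminate_immediate_left_recursion` and the tuple encoding of bodies are hypothetical, the empty tuple stands for 𝜁, and indirect left recursion (as in A → B x, B → A y) is not handled.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite  A → A β1 | … | A βm | γ1 | … | γn  as
         A  → γ1 A' | … | γn A'
         A' → β1 A' | … | βm A' | (empty)
    Bodies are tuples of symbols; () stands for the empty string.
    """
    # The β_i: what follows the leading A in each left-recursive body.
    betas = [body[1:] for body in bodies if body and body[0] == head]
    # The γ_j: all bodies that do not start with A.
    gammas = [body for body in bodies if not body or body[0] != head]

    if not betas:
        return {head: list(bodies)}  # nothing to do

    new_head = head + "'"
    return {
        head: [gamma + (new_head,) for gamma in gammas],
        new_head: [beta + (new_head,) for beta in betas] + [()],
    }


if __name__ == "__main__":
    grammar = eliminate_immediate_left_recursion("A", [("A", "a"), ("a",)])
    for nonterminal, alternatives in grammar.items():
        shown = [" ".join(b) if b else "(empty)" for b in alternatives]
        print(nonterminal, "→", " | ".join(shown))
```

Applied to A → A a | a, this yields A → a A' and A' → a A' | 𝜁, the equivalent grammar for lists of 'a's.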
What can we do with CFGs now?
• So far, we have encountered (see also [2])
  • Context-Free Grammars, their derivations and syntax trees
  • Ambiguous grammars, and mentioned that there's no single, true way to disambiguate them (it depends on what we want them to stand for)
  • Left factoring, which always shortens the distance to the next nonterminal
  • Left recursion elimination, which always shifts a nonterminal to the right
Recursive descent parsing
• Example: grammar that models "if" and "while" statements:
    P → if COND then STATEMENT end
      | if COND then STATEMENT else STATEMENT end
      | while COND do STATEMENT end
• Let's make it a bit simpler:
    P → i C t S z | i C t S e S z | w C d S z
    C → c
    S → s
• Let us parse the string "ictsesz"
• A top-down parser begins at the start symbol P and chooses a production:
    P → ???
Recursive descent: what next?
• If we can only look ahead by one token and read an "i", we have to choose between two productions:
    P → i C t S z | i C t S e S z
• We cannot make this choice before seeing more of the token stream
• Left factoring makes this choice possible with only one token of lookahead
• It generates the following grammar:
    P  → i C t S P' | w C d S z
    P' → z | e S z
    C  → c
    S  → s
Recursive descent: what next?
    P  → i C t S P' | w C d S z
    P' → z | e S z
    C  → c
    S  → s
• Now we only have one production to choose from when reading an "i":
    P → i C t S P'
• and we can generate the parse tree equivalent to the derivation:
    P
    ├ i
    ├ C
    ├ t
    ├ S
    └ P'
Recursive descent: going down…
    P  → i C t S P' | w C d S z
    P' → z | e S z
    C  → c
    S  → s
• Recursive descent implies that we follow the children of the current parse tree node down to the leaves (which must be terminal symbols)
• So let's see if we can parse "ictsesz"
• We follow the tree from P (with children i, C, t, S, P') to its first child
• The input token sequence (the arrow indicates the parser's position in the token stream):
    ictsesz
    ↑
• We have an "i" as lookahead ⇒ matches the first production for P!
• Now, the remaining token stream is "ctsesz"
Backtrack and repeat
    P  → i C t S P' | w C d S z
    P' → z | e S z
    C  → c
    S  → s
• We have an "i" as lookahead ⇒ match!
• Now, the remaining token stream is "ctsesz"
• We return (backtrack) to P, with children i, C, t, S, P', to continue parsing with its next child
• The input token sequence:
    ictsesz
     ↑
• This gives us the nonterminal C
• A nonterminal cannot match any token directly, so we need to pick a production for C and descend into it
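The walk through the parse tree described on the last slides can be written as one function per nonterminal. The following is a minimal sketch, not the lecture's reference implementation: the class and method names (`Parser`, `parse_P`, `parse_P_prime`, …), the `$` end marker and the tuple representation of the parse tree are assumptions made for illustration. Each token is a single character, as in the simplified grammar.

```python
class ParseError(Exception):
    """Raised when the lookahead token fits no production."""


class Parser:
    """Recursive descent parser for the left-factored grammar
         P  → i C t S P' | w C d S z
         P' → z | e S z
         C  → c
         S  → s
    """

    def __init__(self, text):
        self.tokens = list(text) + ["$"]  # "$" marks the end of input
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # Consume one terminal, or fail if the lookahead differs.
        if self.peek() != expected:
            raise ParseError(
                f"expected {expected!r}, found {self.peek()!r} at {self.pos}")
        self.pos += 1
        return expected

    def parse_P(self):
        # One token of lookahead selects the production.
        if self.peek() == "i":
            return ("P", self.match("i"), self.parse_C(), self.match("t"),
                    self.parse_S(), self.parse_P_prime())
        if self.peek() == "w":
            return ("P", self.match("w"), self.parse_C(), self.match("d"),
                    self.parse_S(), self.match("z"))
        raise ParseError(f"no production for P on {self.peek()!r}")

    def parse_P_prime(self):
        if self.peek() == "z":
            return ("P'", self.match("z"))
        if self.peek() == "e":
            return ("P'", self.match("e"), self.parse_S(), self.match("z"))
        raise ParseError(f"no production for P' on {self.peek()!r}")

    def parse_C(self):
        return ("C", self.match("c"))

    def parse_S(self):
        return ("S", self.match("s"))


def parse(text):
    parser = Parser(text)
    tree = parser.parse_P()
    parser.match("$")  # the whole input must be consumed
    return tree


if __name__ == "__main__":
    print(parse("ictsesz"))
```

Because the grammar is left-factored, every function decides on its production by inspecting a single lookahead token, so this sketch never has to undo a choice once it is made.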