Introduction to Parsing
Ambiguity and Syntax Errors

Compiler Design 1 (2011)

Outline
• Regular languages revisited
• Parser overview
• Context-free grammars (CFG’s)
• Derivations
• Ambiguity
• Syntax errors

Languages and Automata
• Formal languages are very important in CS
  – Especially in programming languages
• Regular languages
  – The weakest formal languages widely used
  – Many applications
• We will also study context-free languages

Limitations of Regular Languages
Intuition: A finite automaton that runs long enough must repeat states
• A finite automaton cannot remember # of times it has visited a particular state
• because a finite automaton has finite memory
  – Only enough to store in which state it is
  – Cannot count, except up to a finite limit
• Many languages are not regular
• E.g., language of balanced parentheses is not regular: $\{\, (^i\, )^i \mid i \ge 0 \,\}$
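To make the counting argument concrete, here is a minimal sketch of a recognizer for the balanced-parentheses language above. The function name `is_balanced_block` is an illustrative choice, not something from the slides. The point is that it relies on an integer counter that can grow without bound, which is exactly what a finite automaton does not have.

```python
def is_balanced_block(s: str) -> bool:
    """Return True if s is i opening parentheses followed by i closing ones, i >= 0."""
    count = 0           # unbounded counter: the memory a finite automaton lacks
    seen_close = False  # becomes True once the first ')' has been read
    for ch in s:
        if ch == "(":
            if seen_close:   # a '(' after a ')' breaks the (^i )^i shape
                return False
            count += 1
        elif ch == ")":
            seen_close = True
            count -= 1
            if count < 0:    # more ')' than '(' so far
                return False
        else:
            return False     # any other character is not in the alphabet
    return count == 0


print(is_balanced_block("((()))"))
print(is_balanced_block("(()"))
```

Capping `count` at any fixed maximum would turn this into a finite-state machine, and it would then reject (or wrongly accept) inputs nested deeper than that maximum.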
The Functionality of the Parser
• Input: sequence of tokens from lexer
• Output: parse tree of the program

Example
• If-then-else statement
    if (x == y) then z = 1; else z = 2;
• Parser input
    IF (ID == ID) THEN ID = INT ; ELSE ID = INT ;
• Possible parser output

            IF-THEN-ELSE
           /     |      \
         ==      =       =
        /  \    / \     / \
       ID  ID  ID INT  ID INT

Comparison with Lexical Analysis

  Phase    Input                    Output
  Lexer    Sequence of characters   Sequence of tokens
  Parser   Sequence of tokens       Parse tree

The Role of the Parser
• Not all sequences of tokens are programs . . .
• . . . Parser must distinguish between valid and invalid sequences of tokens
• We need
  – A language for describing valid sequences of tokens
  – A method for distinguishing valid from invalid sequences of tokens
Context-Free Grammars
• Many programming language constructs have a recursive structure
• A STMT is of the form
    if COND then STMT else STMT , or
    while COND do STMT , or
    …
• Context-free grammars are a natural notation for this recursive structure

CFGs (Cont.)
• A CFG consists of
  – A set of terminals T
  – A set of non-terminals N
  – A start symbol S (a non-terminal)
  – A set of productions
Assuming $X \in N$ the productions are of the form
    $X \rightarrow \varepsilon$ , or
    $X \rightarrow Y_1 Y_2 \ldots Y_n$ where $Y_i \in N \cup T$

Notational Conventions
• In these lecture notes
  – Non-terminals are written upper-case
  – Terminals are written lower-case
  – The start symbol is the left-hand side of the first production

Examples of CFGs
A fragment of our example language (simplified):

  STMT → if COND then STMT else STMT
       | while COND do STMT
       | id = int
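The four components of a CFG map directly onto a small data structure. The sketch below is one possible encoding, with the class name `Grammar` and the method `is_well_formed` invented for illustration; the STMT fragment above is used as the instance. COND is declared as a non-terminal even though this fragment gives it no productions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset
    nonterminals: frozenset
    start: str
    # Each production is a pair (X, (Y1, ..., Yn)); the empty tuple encodes X → ε.
    productions: tuple

    def is_well_formed(self) -> bool:
        """Check the side conditions from the definition of a CFG."""
        symbols = self.terminals | self.nonterminals
        return (
            self.start in self.nonterminals
            and not (self.terminals & self.nonterminals)
            and all(
                lhs in self.nonterminals and all(y in symbols for y in rhs)
                for lhs, rhs in self.productions
            )
        )


# Hypothetical encoding of the STMT fragment; lower-case strings are terminals.
STMT_GRAMMAR = Grammar(
    terminals=frozenset({"if", "then", "else", "while", "do", "id", "=", "int"}),
    nonterminals=frozenset({"STMT", "COND"}),
    start="STMT",
    productions=(
        ("STMT", ("if", "COND", "then", "STMT", "else", "STMT")),
        ("STMT", ("while", "COND", "do", "STMT")),
        ("STMT", ("id", "=", "int")),
    ),
)

print(STMT_GRAMMAR.is_well_formed())
```

Following the notational convention, the start symbol here is the left-hand side of the first production.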
Examples of CFGs (cont.)
Grammar for simple arithmetic expressions:

  E → E * E
    | E + E
    | ( E )
    | id

The Language of a CFG
Read productions as replacement rules:
  $X \rightarrow Y_1 \ldots Y_n$
    Means X can be replaced by $Y_1 \ldots Y_n$
  $X \rightarrow \varepsilon$
    Means X can be erased (replaced with empty string)

Key Idea
(1) Begin with a string consisting of the start symbol “S”
(2) Replace any non-terminal X in the string by a right-hand side of some production $X \rightarrow Y_1 \cdots Y_n$
(3) Repeat (2) until there are no non-terminals in the string

The Language of a CFG (Cont.)
More formally, we write
  $X_1 \cdots X_i \cdots X_n \rightarrow X_1 \cdots X_{i-1}\, Y_1 \cdots Y_m\, X_{i+1} \cdots X_n$
if there is a production
  $X_i \rightarrow Y_1 \cdots Y_m$
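The one-step relation can be sketched as a function on tuples of symbols. The name `derive_step` and the constants `PLUS`, `TIMES`, and `ID` are illustrative assumptions; the productions themselves come from the arithmetic grammar above.

```python
def derive_step(form, index, production):
    """Rewrite the non-terminal at form[index] using production (X, rhs).

    form is a tuple of symbols; an empty rhs models X → ε (X is erased).
    """
    lhs, rhs = production
    if form[index] != lhs:
        raise ValueError(f"symbol at position {index} is {form[index]!r}, not {lhs!r}")
    return form[:index] + tuple(rhs) + form[index + 1:]


# Hypothetical names for three productions of the arithmetic grammar.
PLUS = ("E", ("E", "+", "E"))
TIMES = ("E", ("E", "*", "E"))
ID = ("E", ("id",))

form = ("E",)                        # step (1): start symbol
form = derive_step(form, 0, PLUS)    # E + E
form = derive_step(form, 0, TIMES)   # E * E + E
form = derive_step(form, 0, ID)      # id * E + E
print(" ".join(form))
```

Repeating the call until no "E" remains in `form` corresponds to step (3) of the key idea.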
The Language of a CFG (Cont.)
Write
  $X_1 \cdots X_n \rightarrow^{*} Y_1 \cdots Y_m$
if
  $X_1 \cdots X_n \rightarrow \cdots \rightarrow \cdots \rightarrow Y_1 \cdots Y_m$
in 0 or more steps

The Language of a CFG
Let G be a context-free grammar with start symbol S. Then the language of G is:
  $\{\, a_1 \cdots a_n \mid S \rightarrow^{*} a_1 \cdots a_n \text{ and every } a_i \text{ is a terminal} \,\}$

Terminals
• Terminals are called so because there are no rules for replacing them
• Once generated, terminals are permanent
• Terminals ought to be tokens of the language

Examples
L(G) is the language of the CFG G
Strings of balanced parentheses $\{\, (^i\, )^i \mid i \ge 0 \,\}$
Two grammars:

  S → ( S )
  S → ε
      OR
  S → ( S ) | ε
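The definition of L(G) can be explored by brute force for small grammars. The sketch below enumerates every terminal string up to a length bound by always expanding the left-most non-terminal. The function name `language_up_to` is invented here, and the search is only guaranteed to terminate under the assumption that every production which keeps a non-terminal also adds at least one terminal, as is the case for the parentheses grammar.

```python
from collections import deque


def language_up_to(productions, start, max_len):
    """Collect terminal strings of length <= max_len derivable from start.

    productions maps each non-terminal to a list of right-hand sides (tuples).
    Assumption: productions that keep a non-terminal also add a terminal,
    otherwise this search may not terminate.
    """
    def n_terminals(form):
        return sum(1 for sym in form if sym not in productions)

    results = set()
    seen = {(start,)}
    queue = deque(seen)
    while queue:
        form = queue.popleft()
        # Position of the left-most non-terminal, or None if all are terminals.
        pos = next((i for i, sym in enumerate(form) if sym in productions), None)
        if pos is None:
            results.add("".join(form))
            continue
        for rhs in productions[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            # Terminals are permanent, so forms with too many can be dropped.
            if n_terminals(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return results


PARENS = {"S": [("(", "S", ")"), ()]}   # S → ( S ) | ε
print(sorted(language_up_to(PARENS, "S", 6), key=len))
```

The pruning step depends on the fact stated on the Terminals slide: once generated, a terminal never disappears, so a form that already has too many terminals cannot lead to a short enough string.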
Example
A fragment of our example language (simplified):

  STMT → if COND then STMT
       | if COND then STMT else STMT
       | while COND do STMT
       | id = int
  COND → (id == id)
       | (id != id)

Example (Cont.)
Some elements of our language:

  id = int
  if (id == id) then id = int else id = int
  while (id != id) do id = int
  while (id == id) do while (id != id) do id = int
  if (id != id) then if (id == id) then id = int else id = int

Arithmetic Example
Simple arithmetic expressions:

  E → E + E | E ∗ E | (E) | id

Some elements of the language:

  id            id + id
  (id)          id ∗ id
  (id) ∗ id     id ∗ (id)

Notes
The idea of a CFG is a big step. But:
• Membership in a language is just “yes” or “no”; we also need the parse tree of the input
• Must handle errors gracefully
• Need an implementation of CFG’s (e.g., yacc)
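As a rough illustration of what "an implementation" of the STMT grammar could look like, here is a hand-written recursive-descent recognizer. This is not how yacc works; it is a swapped-in technique chosen because it fits in a few lines. The names `tokenize` and `accepts` are invented, and the recognizer assumes an `else` always attaches to the nearest unmatched `if`. It answers only "yes" or "no", which is precisely the limitation the first bullet above points out.

```python
def tokenize(text):
    """Split on whitespace, treating each parenthesis as its own token."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def accepts(text):
    """Return True if text is a STMT of the grammar fragment."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(*choices):
        nonlocal pos
        if peek() not in choices:
            raise SyntaxError(f"expected one of {choices} at token {pos}")
        pos += 1

    def cond():
        # COND → (id == id) | (id != id)
        expect("(")
        expect("id")
        expect("==", "!=")
        expect("id")
        expect(")")

    def stmt():
        if peek() == "if":
            expect("if")
            cond()
            expect("then")
            stmt()
            if peek() == "else":   # assumption: else binds to the nearest if
                expect("else")
                stmt()
        elif peek() == "while":
            expect("while")
            cond()
            expect("do")
            stmt()
        else:
            # STMT → id = int
            expect("id")
            expect("=")
            expect("int")

    try:
        stmt()
    except SyntaxError:
        return False
    return pos == len(tokens)      # reject trailing tokens


print(accepts("if (id != id) then if (id == id) then id = int else id = int"))
print(accepts("if (id == id) id = int"))
```

Each non-terminal becomes one function, and each alternative of a production becomes one branch, so the code mirrors the grammar's recursive structure.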
More Notes
• Form of the grammar is important
  – Many grammars generate the same language
  – Parsing tools are sensitive to the grammar
Note: Tools for regular languages (e.g., lex/ML-Lex) are also sensitive to the form of the regular expression, but this is rarely a problem in practice

Derivations and Parse Trees
A derivation is a sequence of productions
  $S \rightarrow \cdots \rightarrow \cdots \rightarrow \cdots$
A derivation can be drawn as a tree
  – Start symbol is the tree’s root
  – For a production $X \rightarrow Y_1 \cdots Y_n$ add children $Y_1 \cdots Y_n$ to node X

Derivation Example
• Grammar
    E → E + E | E ∗ E | (E) | id
• String
    id ∗ id + id

Derivation Example (Cont.)

  E
  → E + E
  → E ∗ E + E
  → id ∗ E + E
  → id ∗ id + E
  → id ∗ id + id

          E
        / | \
       E  +  E
     / | \   |
    E  ∗  E  id
    |     |
    id    id
Derivation in Detail (1)

  E

          E

Derivation in Detail (2)

  E
  → E + E

          E
        / | \
       E  +  E

Derivation in Detail (3)

  E
  → E + E
  → E ∗ E + E

          E
        / | \
       E  +  E
     / | \
    E  ∗  E

Derivation in Detail (4)

  E
  → E + E
  → E ∗ E + E
  → id ∗ E + E

          E
        / | \
       E  +  E
     / | \
    E  ∗  E
    |
    id
Derivation in Detail (5)

  E
  → E + E
  → E ∗ E + E
  → id ∗ E + E
  → id ∗ id + E

          E
        / | \
       E  +  E
     / | \
    E  ∗  E
    |     |
    id    id

Derivation in Detail (6)

  E
  → E + E
  → E ∗ E + E
  → id ∗ E + E
  → id ∗ id + E
  → id ∗ id + id

          E
        / | \
       E  +  E
     / | \   |
    E  ∗  E  id
    |     |
    id    id

Notes on Derivations
• A parse tree has
  – Terminals at the leaves
  – Non-terminals at the interior nodes
• An in-order traversal of the leaves is the original input
• The parse tree shows the association of operations, the input string does not

Left-most and Right-most Derivations
• The example is a left-most derivation
  – At each step, replace the left-most non-terminal
• There is an equivalent notion of a right-most derivation

  E
  → E + E
  → E + id
  → E ∗ E + id
  → E ∗ id + id
  → id ∗ id + id
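The properties listed under "Notes on Derivations" can be checked on a concrete tree. The sketch below encodes the parse tree of id ∗ id + id as nested `(label, children)` pairs; this encoding and the function names `leaves` and `interior_labels` are assumptions made for illustration, and the grammar is assumed to use no ε-productions.

```python
# A node is (label, children); a leaf has an empty list of children.
TREE = ("E", [
    ("E", [
        ("E", [("id", [])]),
        ("*", []),
        ("E", [("id", [])]),
    ]),
    ("+", []),
    ("E", [("id", [])]),
])


def leaves(node):
    """Return the leaf labels in left-to-right order."""
    label, children = node
    if not children:
        return [label]
    out = []
    for child in children:
        out.extend(leaves(child))
    return out


def interior_labels(node):
    """Return the set of labels that appear on nodes with children."""
    label, children = node
    if not children:
        return set()
    found = {label}
    for child in children:
        found |= interior_labels(child)
    return found


print(" ".join(leaves(TREE)))
print(interior_labels(TREE))
```

The nesting of the tuples is what records that ∗ groups its operands before +; the flat list returned by `leaves` carries no such information.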
Right-most Derivation in Detail (1)

  E

          E

Right-most Derivation in Detail (2)

  E
  → E + E

          E
        / | \
       E  +  E

Right-most Derivation in Detail (3)

  E
  → E + E
  → E + id

          E
        / | \
       E  +  E
             |
             id

Right-most Derivation in Detail (4)

  E
  → E + E
  → E + id
  → E ∗ E + id

          E
        / | \
       E  +  E
     / | \   |
    E  ∗  E  id
Right-most Derivation in Detail (5)

  E
  → E + E
  → E + id
  → E ∗ E + id
  → E ∗ id + id

          E
        / | \
       E  +  E
     / | \   |
    E  ∗  E  id
          |
          id

Right-most Derivation in Detail (6)

  E
  → E + E
  → E + id
  → E ∗ E + id
  → E ∗ id + id
  → id ∗ id + id

          E
        / | \
       E  +  E
     / | \   |
    E  ∗  E  id
    |     |
    id    id

Derivations and Parse Trees
• Note that right-most and left-most derivations have the same parse tree
• The difference is just in the order in which branches are added

Summary of Derivations
• We are not just interested in whether s ∈ L(G)
  – We need a parse tree for s
• A derivation defines a parse tree
  – But one parse tree may have many derivations
• Left-most and right-most derivations are important in parser implementation
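The claim that one parse tree may have many derivations can be illustrated by reading both a left-most and a right-most derivation off the same tree. In this sketch the tree encoding (`(label, children)` pairs) and the function name `derivation` are assumptions made for the example; the tree itself is the one built in the detailed slides.

```python
# A node is (label, children); a leaf has an empty list of children.
TREE = ("E", [
    ("E", [
        ("E", [("id", [])]),
        ("*", []),
        ("E", [("id", [])]),
    ]),
    ("+", []),
    ("E", [("id", [])]),
])


def derivation(tree, rightmost=False):
    """List the sentential forms obtained by expanding tree one node at a time."""
    form = [tree]                       # current form, kept as a list of subtrees
    steps = [" ".join(label for label, _ in form)]
    while True:
        # Nodes that still have children are the unexpanded non-terminals.
        open_positions = [i for i, (_, kids) in enumerate(form) if kids]
        if not open_positions:
            return steps
        i = open_positions[-1] if rightmost else open_positions[0]
        form[i:i + 1] = form[i][1]      # replace the node by its children
        steps.append(" ".join(label for label, _ in form))


for line in derivation(TREE):
    print(line)
print("---")
for line in derivation(TREE, rightmost=True):
    print(line)
```

Both calls walk the same tree and end on the same string; only the order in which the branches are opened differs, which matches the observation that the two derivations share one parse tree.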