CS502: Compiler Design
Syntax Analysis
Manas Thakur
Fall 2020
Where are we?
[Figure: phases of a compiler. Front end: character stream → Lexical Analyzer → token stream → Syntax Analyzer → syntax tree → Semantic Analyzer → syntax tree → Intermediate Code Generator → intermediate representation. Back end: Machine-Independent Code Optimizer → intermediate representation → Code Generator → target machine code → Machine-Dependent Code Optimizer → target machine code. All phases share the Symbol Table.]
Roles of Parsing / Syntax analysis
● Read the specification given by the language implementor.
● Get help from the lexer to collect tokens.
● Check whether the sequence of tokens matches the specification.
● Declare that the program structure is valid, or report errors in a useful manner.
● Later: also identify some semantic errors.
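Specifying the syntax
● Regular expressions are not powerful enough for most syntactic constructs (for example, arbitrarily nested parentheses).
● Syntactic constructs are specified using context-free grammars.
● The corresponding language is called a context-free language.
● CFGs subsume REs: every language described by a regular expression can also be described by a CFG.
   – Then why did we use REs for scanning?
● Right tool for the right job! REs are simpler to write for tokens, and they translate into efficient finite-automaton scanners.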
Context-Free Grammar (CFG)
Example:
   list → list + digit
   list → list – digit
   list → digit
   digit → 0 | 1 | ... | 8 | 9
A CFG has four components:
1. A set of terminals, called tokens. Terminals are the elementary symbols of the language being parsed.
2. A set of non-terminals, called variables. A non-terminal represents a set of strings of terminals.
3. A set of productions.
   – They define the syntactic rules.
4. A start symbol, designated by a non-terminal.
In the example, the terminals are +, – and the digits 0 to 9; the non-terminals are list and digit; and list is the start symbol.
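The four components of a CFG can be written down directly as data. The sketch below is one possible Python encoding of the list grammar, assuming a hypothetical `Grammar` class (not from the slides) that stores each production body as a tuple of symbols and uses an ASCII "-" in place of "–". The `is_well_formed` check only confirms that heads are non-terminals and that every body symbol is declared.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    """Hypothetical 4-tuple encoding of a CFG."""
    terminals: frozenset
    nonterminals: frozenset
    productions: dict  # head -> list of bodies; each body is a tuple of symbols
    start: str

    def is_well_formed(self) -> bool:
        # Heads must be non-terminals; every body symbol must be declared.
        symbols = self.terminals | self.nonterminals
        return (
            self.start in self.nonterminals
            and all(head in self.nonterminals for head in self.productions)
            and all(
                sym in symbols
                for bodies in self.productions.values()
                for body in bodies
                for sym in body
            )
        )

DIGITS = [str(d) for d in range(10)]

LIST_GRAMMAR = Grammar(
    terminals=frozenset(DIGITS + ["+", "-"]),  # "-" stands in for "–"
    nonterminals=frozenset({"list", "digit"}),
    productions={
        "list": [("list", "+", "digit"), ("list", "-", "digit"), ("digit",)],
        "digit": [(d,) for d in DIGITS],       # digit → 0 | 1 | ... | 9
    },
    start="list",
)

print(LIST_GRAMMAR.is_well_formed())  # True
```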
Productions
● All of the following are productions (or rules):
   list → list + digit
   list → list – digit
   list → digit
   digit → 0 | 1 | ... | 8 | 9
● The non-terminal on the left of → is the head (or left side) of the production; the string on the right is its body (or right side).
Derivations
● A grammar derives strings by beginning with the start symbol and repeatedly replacing a non-terminal by the body of a production for that non-terminal.
   list → list + digit
   list → list – digit
   list → digit
   digit → 0 | 1 | ... | 8 | 9
● The above grammar derives sentences like
   – 3+1-0+8-2+0+1+5
   – 0
● The set of all such strings forms the language specified by the above CFG.
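To make "the language specified by the CFG" concrete, here is a hedged Python sketch of a membership test for the list grammar. It assumes input strings without spaces, writes "–" as "-", and the function name `in_list_language` is hypothetical. Each branch mirrors one production: a single digit is a list, and a longer list must end in an operator and a digit preceded by a shorter list.

```python
DIGITS = set("0123456789")

def in_list_language(s: str) -> bool:
    if len(s) == 1:
        # list → digit
        return s in DIGITS
    # list → list + digit  |  list → list - digit:
    # the string must end in an operator followed by a digit,
    # and the remaining prefix must itself be a list.
    return (
        len(s) >= 3
        and s[-1] in DIGITS
        and s[-2] in "+-"
        and in_list_language(s[:-2])
    )

for w in ["3+1-0+8-2+0+1+5", "0", "3+", "12", ""]:
    print(repr(w), in_list_language(w))
```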
Practice
● Write a CFG to generate strings of the form 0^n 1^n.
   – S --> 0S1
   – S --> ε
   – Can also be written as:
      ● S --> 0S1 | ε
● Homework:
   – wcw^r (where w^r denotes the reverse of w)
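A small Python sketch of the S → 0S1 | ε grammar: `derive` (a hypothetical helper) records the sentential forms produced by applying S → 0S1 n times and then S → ε, and `recognize` peels off one outer 0...1 pair per step, mirroring the same two productions. It does not attempt the homework.

```python
def derive(n: int) -> list[str]:
    """Return the sentential forms of the derivation of 0^n 1^n."""
    form = "S"
    forms = [form]
    for _ in range(n):
        form = form.replace("S", "0S1")  # S → 0S1
        forms.append(form)
    form = form.replace("S", "")         # S → ε
    forms.append(form)
    return forms

def recognize(w: str) -> bool:
    # ε is in the language; 0 w' 1 is in the language whenever w' is.
    if w == "":
        return True
    return len(w) >= 2 and w[0] == "0" and w[-1] == "1" and recognize(w[1:-1])

print(" → ".join(derive(2)))  # S → 0S1 → 00S11 → 0011
```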
Derivations (cont.)
● Given a CFG, we can derive strings in the associated CFL by successively replacing the non-terminals based on productions.
Grammar:
   goal → expr
   expr → expr op expr | num | id
   id → a | b | ... | z
   num → 0 | 1 | ... | 9
   op → + | - | * | /
Example derivation (x + 2 * y):
   goal → expr
        → expr op expr
        → id op expr
        → x op expr
        → x + expr
        → x + expr op expr
        → x + num op expr
        → x + 2 op expr
        → x + 2 * expr
        → x + 2 * id
        → x + 2 * y
Leftmost derivations
● What did we do at each step in the previous derivation?
   – Replaced the leftmost non-terminal.
   – This is called a leftmost derivation.
   – expr, expr op expr, id op expr, etc. are the leftmost sentential forms.
Grammar:
   goal → expr
   expr → expr op expr | num | id
   id → a | b | ... | z
   num → 0 | 1 | ... | 9
   op → + | - | * | /
Example derivation (x + 2 * y):
   goal → expr
        → expr op expr
        → id op expr
        → x op expr
        → x + expr
        → x + expr op expr
        → x + num op expr
        → x + 2 op expr
        → x + 2 * expr
        → x + 2 * id
        → x + 2 * y
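The leftmost strategy can be replayed mechanically. In this Python sketch, `leftmost_derivation` and the `script` of production bodies are illustrative assumptions: the caller supplies only which body to use at each step, while the code always rewrites the leftmost non-terminal and checks that the chosen body really is a production for it.

```python
GRAMMAR = {
    "goal": [("expr",)],
    "expr": [("expr", "op", "expr"), ("num",), ("id",)],
    "id": [(c,) for c in "abcdefghijklmnopqrstuvwxyz"],
    "num": [(d,) for d in "0123456789"],
    "op": [(o,) for o in "+-*/"],
}

def leftmost_derivation(start, bodies):
    form = [start]
    forms = [form]
    for body in bodies:
        # Index of the leftmost non-terminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        if tuple(body) not in GRAMMAR[form[i]]:
            raise ValueError(f"{form[i]} → {' '.join(body)} is not a production")
        form = form[:i] + list(body) + form[i + 1:]
        forms.append(form)
    return forms

script = [
    ("expr",), ("expr", "op", "expr"), ("id",), ("x",), ("+",),
    ("expr", "op", "expr"), ("num",), ("2",), ("*",), ("id",), ("y",),
]

for f in leftmost_derivation("goal", script):
    print(" ".join(f))
```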
Rightmost derivations
● Replace the rightmost non-terminal at each step.
   – This is called a rightmost derivation.
   – expr, expr op expr, expr op id, etc. are the rightmost sentential forms.
Grammar:
   goal → expr
   expr → expr op expr | num | id
   id → a | b | ... | z
   num → 0 | 1 | ... | 9
   op → + | - | * | /
Example derivation (x + 2 * y):
   goal → expr
        → expr op expr
        → expr op id
        → expr op y
        → expr * y
        → expr op expr * y
        → expr op num * y
        → expr op 2 * y
        → expr + 2 * y
        → id + 2 * y
        → x + 2 * y
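The rightmost strategy can be replayed mechanically as well. In this Python sketch (the names `rightmost_derivation` and `script` are hypothetical), the caller lists the production bodies in order, and the code always rewrites the non-terminal with the largest index in the current sentential form.

```python
GRAMMAR = {
    "goal": [("expr",)],
    "expr": [("expr", "op", "expr"), ("num",), ("id",)],
    "id": [(c,) for c in "abcdefghijklmnopqrstuvwxyz"],
    "num": [(d,) for d in "0123456789"],
    "op": [(o,) for o in "+-*/"],
}

def rightmost_derivation(start, bodies):
    form = [start]
    forms = [form]
    for body in bodies:
        # Index of the rightmost non-terminal in the current sentential form.
        i = max(k for k, sym in enumerate(form) if sym in GRAMMAR)
        if tuple(body) not in GRAMMAR[form[i]]:
            raise ValueError(f"{form[i]} → {' '.join(body)} is not a production")
        form = form[:i] + list(body) + form[i + 1:]
        forms.append(form)
    return forms

script = [
    ("expr",), ("expr", "op", "expr"), ("id",), ("y",), ("*",),
    ("expr", "op", "expr"), ("num",), ("2",), ("+",), ("id",), ("x",),
]

for f in rightmost_derivation("goal", script):
    print(" ".join(f))
```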
Formally
● →* denotes a derivation of zero or more steps.
● →+ denotes a derivation of one or more steps.
● If S →* β, then β is a sentential form of the associated grammar G.
● L(G) = {w | S →+ w and w consists only of terminals}; each w ∈ L(G) is called a sentence of G.
● The process of discovering a derivation is called parsing.
● The output is a parse tree, which we shall see tomorrow.
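The definition of L(G) suggests a brute-force enumerator: explore the sentential forms β with S →* β and keep those made only of terminals. The Python sketch below (the function `sentences` and the length bound are assumptions for illustration) does this for the list grammar, writing "–" as "-". It only expands the leftmost non-terminal, which is enough because every sentence has a leftmost derivation. It also prunes forms longer than the bound, which is safe only because this grammar has no ε-productions, so sentential forms never shrink.

```python
from collections import deque

GRAMMAR = {
    "list": [("list", "+", "digit"), ("list", "-", "digit"), ("digit",)],
    "digit": [(d,) for d in "0123456789"],
}

def sentences(start: str, max_len: int) -> set[str]:
    found = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        nts = [k for k, sym in enumerate(form) if sym in GRAMMAR]
        if not nts:
            # Only terminals left: form is a sentence of G.
            found.add("".join(form))
            continue
        i = nts[0]  # expand the leftmost non-terminal
        for body in GRAMMAR[form[i]]:
            new = form[:i] + body + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

L3 = sentences("list", 3)
print(len(L3), "3+1" in L3, "3+" in L3)  # 210 True False
```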
Syntax Analysis (Cont.)
Parse Tree
● A pictorial representation of a program's derivation.
   expr → expr + expr | expr * expr | id
   id → a | b | ... | z
● A parse tree for x + y * z:
   expr
   ├── expr
   │   └── id
   │       └── x
   ├── +
   └── expr
       ├── expr
       │   └── id
       │       └── y
       ├── *
       └── expr
           └── id
               └── z
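A parse tree has non-terminals at its interior nodes and terminals at its leaves. The Python sketch below builds the tree for x + y * z by hand (the `Node` class and `leaf_expr` helper are hypothetical) and reads its leaves from left to right, which recovers the derived string.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                  # grammar symbol at this node
    children: list = field(default_factory=list)

    def leaves(self):
        # Leaves read left to right give the string the tree derives.
        if not self.children:
            return [self.label]
        return [leaf for c in self.children for leaf in c.leaves()]

def leaf_expr(name):
    # expr → id → name
    return Node("expr", [Node("id", [Node(name)])])

tree = Node("expr", [
    leaf_expr("x"),
    Node("+"),
    Node("expr", [leaf_expr("y"), Node("*"), leaf_expr("z")]),
])

print("".join(tree.leaves()))  # x+y*z
```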
Precedence
● Another parse tree for x+y*z:
   expr
   ├── expr
   │   ├── expr
   │   │   └── id
   │   │       └── x
   │   ├── +
   │   └── expr
   │       └── id
   │           └── y
   ├── *
   └── expr
       └── id
           └── z
● Evaluating this tree bottom-up (a left-to-right walk that evaluates children before their parent) gives: (x+y)*z
   – Wrong answer!
   – Should have been: x+(y*z)
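To see why the shape matters, the sketch below evaluates both trees bottom-up. It assumes a compact tuple encoding (left, operator, right) that drops the expr/id wrapper nodes, and hypothetical values x = 1, y = 2, z = 3.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}
ENV = {"x": 1, "y": 2, "z": 3}  # hypothetical variable values

def evaluate(tree):
    if isinstance(tree, str):   # leaf: an identifier
        return ENV[tree]
    left, op, right = tree      # interior node: evaluate children first
    return OPS[op](evaluate(left), evaluate(right))

first = ("x", "+", ("y", "*", "z"))   # tree with + at the root: x+(y*z)
second = (("x", "+", "y"), "*", "z")  # tree with * at the root: (x+y)*z

print(evaluate(first), evaluate(second))  # 7 9
```

Both trees derive the same string, yet they compute different values; that is the precedence problem in miniature.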
The precedence problem
● Our grammar has no notion of precedence, or of an implied order of evaluation.
● Ideally, multiplication should be performed before addition.
● Will the green grammar generate all the strings that could be generated by the orange grammar?
Orange grammar:
   expr → expr + expr | expr * expr | id
   id → a | b | ... | z
Green grammar:
   expr → expr + term | term
   term → term * factor | factor
   factor → id
   id → a | b | ... | z
● Does it solve the problem?
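The first question can be checked empirically for short strings. This Python sketch (function and variable names are illustrative; identifiers are restricted to x, y, z as an assumption to keep the search small) enumerates every string of at most five symbols that each grammar derives and compares the two sets. A bounded check like this is evidence, not a proof; in general, both grammars generate exactly the strings of identifiers separated by + or *.

```python
from collections import deque

IDS = [(c,) for c in "xyz"]  # assumption: id → x | y | z only
ORANGE = {
    "expr": [("expr", "+", "expr"), ("expr", "*", "expr"), ("id",)],
    "id": IDS,
}
GREEN = {
    "expr": [("expr", "+", "term"), ("term",)],
    "term": [("term", "*", "factor"), ("factor",)],
    "factor": [("id",)],
    "id": IDS,
}

def language(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from start.

    Pruning by length is safe because neither grammar has ε-productions.
    """
    found, seen, queue = set(), {(start,)}, deque([(start,)])
    while queue:
        form = queue.popleft()
        nts = [k for k, s in enumerate(form) if s in grammar]
        if not nts:
            found.add("".join(form))
            continue
        i = nts[0]  # leftmost derivations reach every sentence
        for body in grammar[form[i]]:
            new = form[:i] + body + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

orange, green = language(ORANGE, "expr", 5), language(GREEN, "expr", 5)
print(orange == green, len(orange))  # True 129
```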
New derivation and parse tree
   expr → expr + term | term
   term → term * factor | factor
   factor → id
   id → a | b | ... | z
Derivation of x + y * z:
   expr → expr + term
        → expr + term * factor
        → expr + term * id
        → expr + term * z
        → expr + factor * z
        → expr + id * z
        → expr + y * z
        → term + y * z
        → factor + y * z
        → id + y * z
        → x + y * z
Parse tree:
   expr
   ├── expr
   │   └── term
   │       └── factor
   │           └── id
   │               └── x
   ├── +
   └── term
       ├── term
       │   └── factor
       │       └── id
       │           └── y
       ├── *
       └── factor
           └── id
               └── z
● Correct tree walk! The tree now evaluates x+(y*z).
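One way to see that the green grammar solves the problem is to parse with it. The sketch below is a hand-written recursive-descent parser (a technique not covered on these slides) that builds nested (left, operator, right) tuples. Because recursive descent cannot call a left-recursive rule directly, each rule of the form X → X op Y | Y is written as a loop: parse one Y, then repeatedly consume op Y. This standard rewriting keeps the same language and left associativity. Single-character tokens without spaces are assumed, and `parse` is a hypothetical name.

```python
def parse(tokens: str):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():
        # factor → id
        nonlocal pos
        tok = peek()
        if tok is None or not ("a" <= tok <= "z"):
            raise SyntaxError(f"expected identifier at position {pos}")
        pos += 1
        return tok

    def term():
        # term → term * factor | factor, written as a loop
        nonlocal pos
        node = factor()
        while peek() == "*":
            pos += 1
            node = (node, "*", factor())
        return node

    def expr():
        # expr → expr + term | term, written as a loop
        nonlocal pos
        node = term()
        while peek() == "+":
            pos += 1
            node = (node, "+", term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
    return tree

print(parse("x+y*z"))  # ('x', '+', ('y', '*', 'z'))
```

Because * is handled one level below +, a product always ends up as a subtree of a sum, never the other way round.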
Ambiguity
● रोको मत जाने दो (roko mat jaane do)
   – Read as "roko, mat jaane do": "Stop! Don't let go."
   – Read as "roko mat, jaane do": "Don't stop; let go."
   – Whether to stop or let go?
● Sarah gave a bath to her dog wearing a pink t-shirt.
   – Who was wearing the pink t-shirt?
Ambiguity in grammars
● If a grammar has more than one leftmost derivation (or more than one rightmost derivation) for a single sentential form, then it is ambiguous.
● Example (an ambiguous grammar!):
   <stmt> → if <expr> then <stmt>
          | if <expr> then <stmt> else <stmt>
          | <other stmts>
● Try deriving the sentential form:
   – if E1 then if E2 then S1 else S2
[Figure: two parse trees for this sentential form. In one, else S2 belongs to the inner if: if E1 then (if E2 then S1 else S2). In the other, it belongs to the outer if: if E1 then (if E2 then S1) else S2.]
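Ambiguity can be demonstrated by counting parse trees. The Python sketch below counts, with memoisation, the number of ways a token sequence can be parsed as <stmt> under the ambiguous grammar. As simplifying assumptions, any token starting with "E" stands for a complete <expr> and any token starting with "S" for an <other stmts>; the function name `count_trees` is hypothetical.

```python
from functools import lru_cache

def count_trees(tokens):
    tokens = tuple(tokens)

    def is_expr(t): return t.startswith("E")
    def is_other(t): return t.startswith("S")

    @lru_cache(maxsize=None)
    def stmt(i, j):
        """Number of ways tokens[i:j] can be parsed as a <stmt>."""
        if j - i == 1:
            return 1 if is_other(tokens[i]) else 0      # <other stmts>
        if (j - i < 4 or tokens[i] != "if" or not is_expr(tokens[i + 1])
                or tokens[i + 2] != "then"):
            return 0
        total = stmt(i + 3, j)                          # if E then S
        for k in range(i + 3, j):                       # if E then S else S
            if tokens[k] == "else":
                total += stmt(i + 3, k) * stmt(k + 1, j)
        return total

    return stmt(0, len(tokens))

sentence = "if E1 then if E2 then S1 else S2".split()
print(count_trees(sentence))  # 2
```

The count of 2 corresponds to the two possible attachments of else S2.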
Resolving ambiguity
● Need to rearrange the grammar.
● Match each else with the closest unmatched then:
   <stmt> → <matched>
          | <unmatched>
   <matched> → if <expr> then <matched> else <matched>
             | <other stmts>
   <unmatched> → if <expr> then <stmt>
               | if <expr> then <matched> else <unmatched>
● Check: if E1 then if E2 then S1 else S2
● Not a trivial task, but it comes with practice.
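Counting parse trees is a quick way to check the rewritten grammar. The Python sketch below encodes <stmt>, <matched>, and <unmatched> as mutually recursive, memoised counters. It assumes that tokens starting with "E" are complete expressions and tokens starting with "S" are other statements; the function names are hypothetical.

```python
from functools import lru_cache

def count_trees(tokens):
    tokens = tuple(tokens)

    def is_expr(t): return t.startswith("E")
    def is_other(t): return t.startswith("S")

    def if_prefix(i, j):
        # tokens[i:j] starts with "if <expr> then" and leaves room for a body
        return (j - i >= 4 and tokens[i] == "if"
                and is_expr(tokens[i + 1]) and tokens[i + 2] == "then")

    def else_positions(i, j):
        return [k for k in range(i + 3, j) if tokens[k] == "else"]

    @lru_cache(maxsize=None)
    def matched(i, j):
        if j - i == 1:
            return 1 if is_other(tokens[i]) else 0    # <other stmts>
        if not if_prefix(i, j):
            return 0
        # if <expr> then <matched> else <matched>
        return sum(matched(i + 3, k) * matched(k + 1, j)
                   for k in else_positions(i, j))

    @lru_cache(maxsize=None)
    def unmatched(i, j):
        if not if_prefix(i, j):
            return 0
        total = stmt(i + 3, j)                        # if <expr> then <stmt>
        # if <expr> then <matched> else <unmatched>
        total += sum(matched(i + 3, k) * unmatched(k + 1, j)
                     for k in else_positions(i, j))
        return total

    def stmt(i, j):
        return matched(i, j) + unmatched(i, j)

    return stmt(0, len(tokens))

sentence = "if E1 then if E2 then S1 else S2".split()
print(count_trees(sentence))  # 1
```

Exactly one tree remains: the one in which else S2 is attached to the inner if E2 then S1.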