Compiler Design Spring 2018 3.3 Top-down parsing Thomas R. Gross Computer Science Department ETH Zurich, Switzerland 1
Overview § 3.1 Introduction § 3.2 Lexical analysis § 3.3 “Top down” parsing § 3.4 “Bottom up” parsing 2
Is w ∈ L(G)? § Recall: given G and w, we want to know whether w ∈ L(G) § Approach: find a derivation § S ⇒ α ⇒ … ⇒ w: yes § Two principal approaches § Start with S (start symbol), work towards w § Guess which production will lead to w § “Top-down” parsing: S ⇒ … ⇒ … ⇒ w § Start with w and try to find a way to get back to S § Guess how w was generated § “Bottom-up” parsing: w ⇐ … ⇐ … ⇐ S 3
3.3 “Top down” parsing § Given w ∈ T* and context-free grammar G(S, T, NT, P) is w ∈ L(G)? § Top-down: find a derivation S ⇒ … ⇒ w § Want to find a left-most derivation § Process input from left-to-right § Languages described by a context-free grammar can be recognized by a stack machine § w recognized ⇔ w ∈ L(G) § Get derivation for free (sequence of actions by stack machine) 4
Simple stack machine § Input string: a + b $ ($ is the end-of-input marker) § ip: input pointer into the input string § Stack with $ at the bottom; TOS/sp mark the top of the stack § Parser control 5
Actions § Error § w ∉ L(G) § Accept § w ∈ L(G) § Match § Consume: remove the token from the input, advance the input pointer § Pop stack § Reduction § Use a production to expand/contract the top of the stack 6
Parser decisions § Parser must decide based on top of stack and current input § Current input § Either the next token § Or some number k of remaining tokens 7
Grammar G7 § Start symbol: S § Terminals: { Id, +, -, *, / } § Non-terminals: { S, Op } § Productions
S → Id Op Id (1) | - Id (2)
Op → + (3) | - (4) | * (5) | / (6) 8
Parser decisions § Parser must decide based on top of stack and current input § Current input § Either the next token § Or some number k of remaining tokens § How can we control the parser? § Must be sure that w ∉ L(G) if we say there is no derivation 18
Grammars & words § Words are finite § Grammars are finite § Finite alphabets § Finite number of productions § Try until you succeed 19
3.3.1 Backtracking parsers 20
Backtracking parsers § Basic idea (given grammar G, word w) § Start with S § Given the state of the stack and the rest of the input § Can we match, consume & pop a symbol? § Yes: do it § No: can we apply a production to the non-terminal X on top of the stack? § Yes: do it § No: stuck, continue with undo § Undo: undo the last step and try another production § Either for X, or (if there are no choices left) § For the non-terminal that was replaced in the previous step § May have to restore input 21
Consider this grammar G8 § Start symbol: S § Terminals: {a, b, x} § Non-terminals: {A, B, S} § Productions: S → x A | x B, A → x A | a, B → x B | b § What is L(G)? 22
Consider this grammar G8 § Start symbol: S § Terminals: {a, b, x} § Non-terminals: {A, B, S} § Productions: S → x A | x B, A → x A | a, B → x B | b § L(G) = { x^n a, x^n b | n > 0 } 23
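The closed form for L(G) can be checked mechanically on short words. The following is a minimal sketch, not part of the slides: it enumerates all words of bounded length by left-most derivation and compares them with the pattern x+(a|b). The names `G8`, `words_up_to`, and `closed_form` are made up for this illustration.

```python
from collections import deque
import re

# Grammar G8 from the slide; upper-case letters are non-terminals.
G8 = {
    "S": ["xA", "xB"],
    "A": ["xA", "a"],
    "B": ["xB", "b"],
}

def words_up_to(grammar, start, max_len):
    """Enumerate all words of length <= max_len by left-most derivation."""
    words = set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Index of the left-most non-terminal, if any.
        pos = next((i for i, c in enumerate(form) if c in grammar), None)
        if pos is None:
            words.add(form)  # only terminals left: a word of L(G)
            continue
        for rhs in grammar[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            # No production of G8 shrinks a sentential form,
            # so forms longer than max_len can be pruned.
            if len(new_form) <= max_len:
                queue.append(new_form)
    return words

generated = words_up_to(G8, "S", 5)
closed_form = re.compile(r"x+[ab]")  # x^n a or x^n b with n > 0
```

Every word in `generated` matches `closed_form`, and for length at most 5 the set consists of exactly the eight words x^n a and x^n b with 1 ≤ n ≤ 4. This is a finite sanity check, not a proof.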
Input: xxxb
Stack | Input | Action
S $ | x x x b | S → xA
x A $ | x x x b | Match
A $ | x x b | A → xA
x A $ | x x b | Match
A $ | x b | A → xA
x A $ | x b | Match
A $ | b | Undo
x A $ | x b | Undo
… | … | …
S $ | x x x b | S → xB
x B $ | x x x b | Match
B $ | x x b | B → xB
… | … | …
26
Backtracking § Accept if the stack is empty and all input is consumed § Reject if there are no more choices to try § Signal error § Easy to implement § May not be efficient, but fast enough in some settings § Can be used for any context-free language 30
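The backtracking scheme can be sketched as a small recursive recognizer for G8. This is an illustrative sketch under stated assumptions, not the lecture's implementation: the function name `parse` and the string encoding of stack and input are chosen here for brevity. Returning from a failed recursive call plays the role of "undo", and because the input is passed as an immutable string it is restored automatically. The recursion terminates because G8 has no left-recursive production.

```python
# Grammar G8; upper-case letters are non-terminals.
G8 = {
    "S": ["xA", "xB"],
    "A": ["xA", "a"],
    "B": ["xB", "b"],
}

def parse(stack, rest, grammar=G8):
    """Return the list of productions used, or None if no derivation exists.

    stack: symbols still to be matched, top of stack first.
    rest:  unconsumed input.
    """
    if not stack:
        # Accept only if the input is used up as well.
        return [] if not rest else None
    top, below = stack[0], stack[1:]
    if top not in grammar:
        # Terminal on top: match, consume and pop, or get stuck.
        if rest and rest[0] == top:
            return parse(below, rest[1:], grammar)
        return None
    for rhs in grammar[top]:
        # Expand; returning from a failed call is the "undo" step,
        # and the unchanged `rest` restores the input.
        result = parse(rhs + below, rest, grammar)
        if result is not None:
            return [f"{top} -> {rhs}"] + result
    return None  # no choices left for this non-terminal
```

For the input from the trace, `parse("S", "xxxb")` first explores S → xA, fails when A meets b, backs out, and then succeeds via S → xB, returning `["S -> xB", "B -> xB", "B -> xB", "B -> b"]`.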
3.3.2 Predictive top-down parsers § For some grammars the first k tokens of the unprocessed input determine the parser’s action § LL(k) grammars § Left-to-right scan, left-most derivation, k symbols of look-ahead § Important subclass: LL(1) § Many programming languages have LL(1) grammars § Predictive parsing: The next k symbols determine everything 31
Example: One token lookahead § Example production stmt → if expr then stmt else stmt | while expr do stmt | begin stmt end § Guess (which production will lead to w ) is possible by looking at first token 32
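Since each alternative of stmt starts with a different keyword, the choice reduces to a lookup on the next token. A minimal sketch of that idea, with the hypothetical names `STMT_PRODUCTIONS` and `predict`:

```python
# Alternatives of stmt from the slide, indexed by the token they start with.
STMT_PRODUCTIONS = {
    "if": "stmt -> if expr then stmt else stmt",
    "while": "stmt -> while expr do stmt",
    "begin": "stmt -> begin stmt end",
}

def predict(lookahead):
    """Pick the stmt production from a single look-ahead token."""
    try:
        return STMT_PRODUCTIONS[lookahead]
    except KeyError:
        # No alternative starts with this token: the input is not a stmt.
        raise SyntaxError(f"no stmt production starts with {lookahead!r}")
```

No guessing and no undo is needed here: one token either selects exactly one production or proves that the input is not a statement.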
Consider G8 (again) § Start symbol: S § Terminals: {a, b, x}, Non-terminals: {A, B, S} § Productions: S → x A | x B, A → x A | a, B → x B | b § Can we use predictive parsing for this grammar? Please justify your answer. You can work in teams. § Bored? How can we use a predictive parser for L(G)? 34
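One possible line of attack on this exercise (a sketch, not the official solution): both alternatives of S start with x, so a single look-ahead token cannot choose between S → x A and S → x B. The language itself, however, is also generated by the left-factored grammar S → x R, R → x R | a | b, which is an assumption introduced here and does not appear on the slides. For that grammar one token always suffices, as the hypothetical recognizer `accepts` shows:

```python
def accepts(word):
    """Predictive recognizer for S -> x R, R -> x R | a | b."""
    tokens = list(word) + ["$"]   # $ marks the end of input
    ip = 0
    # S -> x R: the only alternative, so no decision is needed.
    if tokens[ip] != "x":
        return False
    ip += 1
    # R -> x R | a | b: the look-ahead token selects the alternative.
    while tokens[ip] == "x":
        ip += 1
    if tokens[ip] not in ("a", "b"):
        return False
    ip += 1
    # Accept only if all input has been consumed.
    return tokens[ip] == "$"
```

The recognizer never undoes a step: the decision between a and b is postponed until the token that actually distinguishes the two cases is visible.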
3.3.3 Construction of predictive parsers § Top-down § Predictive: For any combination of (top-of-stack, input) parser knows how to move forward § Towards an “accept” or “reject” decision § Look again at stack machine 43
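Simple stack machine § Input string: a + b $ ($ is the end-of-input marker) § ip: input pointer into the input string § Stack with $ at the bottom; TOS/sp mark the top of the stack § Predictive parser control 44
Simple stack machine § Input string: a + b $ ($ is the end-of-input marker) § ip: input pointer into the input string § Stack with $ at the bottom; TOS/sp mark the top of the stack § Predictive parser control § Parsing table M contains the rules: M[NT, T] = production NT → α 45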
Predictive parser § Two parts 1. (Generic) controller 2. (Grammar-specific) parsing table M § Start with S (start symbol on the stack) § Expand § Pop matching terminals … until stack is empty § Fine print § Assume context-free grammar G(S, T, NT, P) § Add $ to mark bottom of stack, end of input § Goal: Find left-most derivation 46
Part #1: Parser control
repeat {
  X = top of stack
  a = terminal pointed to by ip (input pointer)
  if (X ∈ T) {
    if (X == a) { pop X; ip++ }
  }
} until (X == $) and (*ip == $) 48
Slow motion: Match, consume, pop § Grammar G11 with productions S → AB, A → a, B → b § Input w = a b § Assume we have this intermediate state: stack a $, input a b $ (ip at a) 50
Slow motion: Match, consume, pop § Grammar G11 with productions S → AB, A → a, B → b § Input w = a b § Assume we have this intermediate state: stack a $, input a b $ (ip at a) § After match, consume, pop: stack $, input b $ (ip at b)
Part #2: Parsing table M § Controls specific operation steps of the parsing engine § Specific: for a grammar § Decides what to do if there is a non-terminal on top of the stack § Pick a production § Expand the non-terminal using the production 52
Part #2: Parsing table M § (Again) grammar G11 with productions S → AB, A → a, B → b § Rows: non-terminal on top of the stack; columns: input (terminal) symbol
Non-terminal | a | b
S | |
A | |
B | |
Part #2: Parsing table M § (Again) grammar G11 with productions S → AB, A → a, B → b § Rows: non-terminal on top of the stack; columns: input (terminal) symbol
Non-terminal | a | b
S | S → AB |
A | A → a |
B | | B → b
Part #2: Parsing table M § (Again) grammar with productions S → AB, A → a, B → b § Rows: symbol on top of the stack; columns: input (terminal) symbol
Non-terminal | a | b | $
S | S → AB | |
A | A → a | |
B | | B → b |
$ | | | ACCEPT
§ No entry: Error
Part #2: Parsing table M § (Again) grammar with productions S → AB, A → a, B → b § Rows: symbol on top of the stack; columns: input (terminal) symbol
Non-terminal | a | b | $
S | S → AB | Error | Error
A | A → a | Error | Error
B | Error | B → b | Error
$ | Error | Error | ACCEPT
§ No entry: Error
Part #1 (parser control) revisited
repeat {
  X = top of stack
  a = terminal pointed to by ip (input pointer)
  if (X ∈ T) {
    if (X == a) { pop X; ip++ }
    else error()
  }
  else if (M[X, a] is error entry) error()
  else if (M[X, a] == X → Y1 Y2 … Yn) {
    pop X
    push Yn … Y2 Y1 onto the stack (Y1 ends up on top)
    record production X → Y1 Y2 … Yn
  }
} until (X == $) and (*ip == $) 58
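The control loop and the table for G11 can be put together in a few lines. The following is a sketch under stated assumptions rather than a reference implementation: the table is encoded as a Python dictionary in which missing keys stand for error entries, the ACCEPT entry M[$, $] is checked directly in the loop, and the names `M`, `NONTERMINALS`, and `predictive_parse` are chosen for this illustration.

```python
# Parsing table M for G11 (S -> A B, A -> a, B -> b); missing entries are errors.
M = {
    ("S", "a"): ["A", "B"],
    ("A", "a"): ["a"],
    ("B", "b"): ["b"],
}
NONTERMINALS = {"S", "A", "B"}

def predictive_parse(tokens, table=M, start="S"):
    """Return the productions of a left-most derivation, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]          # top of stack is the end of the list
    ip = 0
    derivation = []
    while True:
        X, a = stack[-1], tokens[ip]
        if X == "$" and a == "$":
            return derivation     # the ACCEPT entry M[$, $]
        if X not in NONTERMINALS:
            # Terminal (or $) on top: match, consume, pop.
            if X != a:
                raise SyntaxError(f"expected {X!r}, found {a!r}")
            stack.pop()
            ip += 1
        else:
            rhs = table.get((X, a))
            if rhs is None:
                raise SyntaxError(f"no entry M[{X}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))   # push Yn ... Y1, so Y1 ends on top
            derivation.append(f"{X} -> {' '.join(rhs)}")
```

For w = a b, `predictive_parse("ab")` returns `["S -> A B", "A -> a", "B -> b"]`, which is the left-most derivation recorded along the way. Only the table is grammar-specific; the loop itself can be reused for any grammar that has such a table.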
Slow motion § Input string: w = a b
Stack | Input | Action
S $ | a b $ | S → AB
A B $ | a b $ | A → a
a B $ | a b $ | match, consume, pop
B $ | b $ | B → b
62
Slow motion § Input string: w = a b
Stack | Input | Action
B $ | b $ | B → b
b $ | b $ | match, consume, pop
$ | $ | ACCEPT
63
Construction of parsing control table M § Table M [top-of-stack, next-input] constructed from grammar productions § Each entry contains one of the following § A production § Error § Accept § The grammar for such a table cannot be ambiguous § M defined ⇒ grammar not ambiguous 65
Grammar G12 for expressions § Start symbol: E § Terminals: T = { ( , ) , * , + , Id } § Non-terminals: NT = { E, E’, F, T, T’ } § Productions
E → T E’ (1)
E’ → + T E’ (2) | ε (3)
T → F T’ (4)
T’ → * F T’ (5) | ε (6)
F → ( E ) (7) | Id (8) 66
L(G) § Arithmetic expressions § Not ambiguous 67
Setting M § Need to capture legal input for all non-terminals § Legal input for X: those strings that start a derivation from X § X ⇒* σ α with σ ∈ T⁺, α ∈ (T ∪ NT)* § M[X, ρ] § X on top of stack § ρ start of (remaining) input w: use production § ρ not start of (remaining) input w: X ⇏* ρ α, so error! 68
Examples for G 12 § Legal input for the following non-terminals § F → ? § T → ? § “)” not OK if either F or T is on top of the stack 70
X on top of stack, input τ § X ⇒* τ § τ ∈ T* § Need start of words w over T* that can be generated from X § How much of the words w do we want to look at? § For now: just 1 symbol (character) § Different productions P1: X → α, P2: X → β, … § P1: Set 1 (of terminals) § P2: Set 2 (of terminals) § … § Put first symbol of w into Set i 73
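The sets of first symbols (commonly called FIRST sets) can be computed by a fixed-point iteration. The sketch below does this for G12; it is an illustration under stated assumptions, not the construction given in the lecture. In particular, the treatment of ε (a set contains ε when the whole right-hand side can vanish) and the names `G12`, `first_of_string`, `first_sets`, and `FIRST` are choices made for this example.

```python
EPS = "ε"
# G12; keys are non-terminals, every other symbol is a terminal.
G12 = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], [EPS]],
    "F":  [["(", "E", ")"], ["Id"]],
}

def first_of_string(symbols, first):
    """Terminals (or ε) that can start a word derived from `symbols`."""
    result = set()
    for sym in symbols:
        if sym == EPS:
            continue
        # A terminal starts only with itself.
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result    # this symbol cannot vanish: stop here
    result.add(EPS)          # every symbol can vanish
    return result

def first_sets(grammar):
    """Iterate to a fixed point: FIRST(X) for every non-terminal X."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

FIRST = first_sets(G12)
```

Set i for a production X → α is then `first_of_string(α, FIRST)`. For example, the two alternatives of F yield {(} and {Id}, and neither `FIRST["F"]` nor `FIRST["T"]` contains ")", in line with the earlier observation that ")" is not legal input when F or T is on top of the stack.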