top down syntax analysis



  1. Top-down Syntax Analysis
     Sebastian Hack (based on slides by Reinhard Wilhelm and Mooly Sagiv)
     http://compilers.cs.uni-saarland.de
     Compiler Construction Core Course 2017, Saarland University

  2. Top-Down Syntax Analysis
     input: a sequence of symbols (tokens)
     output: a syntax tree or an error message
     • Read the input from left to right.
     • Construct the syntax tree top-down, starting with a node labeled with the start symbol.
     • Until the input is accepted (or an error is found), do one of:
       • Predict the expansion of the current leftmost nonterminal (possibly using some lookahead into the remaining input), or
       • Verify the predicted terminal symbol against the next symbol of the remaining input.
     • Finds leftmost derivations.


  4. Grammar for Arithmetic Expressions
     Left-factored grammar $G_2$, i.e. left recursion removed.
       S  → E
       E  → T E′        E generates T with a continuation E′
       E′ → + E | ε     E′ generates a possibly empty sequence of "+ T"s
       T  → F T′        T generates F with a continuation T′
       T′ → ∗ T | ε     T′ generates a possibly empty sequence of "∗ F"s
       F  → id | ( E )
     $G_2$ defines the same language as $G_0$ and $G_1$.
     But the parse tree is not so suitable as an abstract syntax tree!

  5. Recursive Descent Parsing
     • The parser is a program with a procedure X for each non-terminal X, which
       • parses words for non-terminal X,
       • starts with the first symbol already read (into variable nextsym),
       • ends with the following symbol read (into variable nextsym).
     • It uses one symbol of lookahead into the remaining input.
     • It uses the FiFo sets to make the expansion transitions deterministic:
       $\mathrm{FiFo}(N \to \alpha) = \mathrm{FIRST}_1(\alpha) \oplus_1 \mathrm{FOLLOW}_1(N)$

  6. The $\mathrm{FIRST}_1$ Sets
     • A production N → α is applicable for symbols that “begin” α
     • Example: Arithmetic Expressions, Grammar $G_2$
       • The production F → id is applied when the current symbol is id
       • The production F → ( E ) is applied when the current symbol is (
       • The production T → F T′ is applied when the current symbol is id or (
     • Formal definition:
       $\mathrm{FIRST}_1(\alpha) = \{\, 1{:}w \mid \alpha \Rightarrow^{*} w,\ w \in V_T^{*} \,\}$
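The FIRST_1 sets of G_2 can be computed by a simple fixpoint iteration. The following is a minimal sketch, not the method used in the course: the dictionary layout, the use of the empty string for ε, and the names `GRAMMAR`, `first_of_word`, and `compute_first` are all assumptions made for illustration.

```python
# Grammar G_2 from the slides; an empty list is an ε-alternative.
# The empty string "" stands for the empty word ε inside FIRST_1 sets.
GRAMMAR = {
    "S": [["E"]],
    "E": [["T", "E'"]],
    "E'": [["+", "E"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "T"], []],
    "F": [["id"], ["(", "E", ")"]],
}

def first_of_word(word, first):
    """FIRST_1 of a sequence of symbols, given the current FIRST_1 of each nonterminal."""
    result = set()
    for sym in word:
        # A terminal begins only with itself.
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {""}
        if "" not in sym_first:
            return result
    result.add("")  # the word is empty, or every symbol in it can derive ε
    return result

def compute_first():
    """Grow all sets until nothing changes any more (least fixpoint)."""
    first = {n: set() for n in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in GRAMMAR.items():
            for alt in alternatives:
                new = first_of_word(alt, first)
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first

FIRST = compute_first()
for nonterminal in GRAMMAR:
    print(nonterminal, sorted(FIRST[nonterminal]))
```

For `T` this yields the symbols id and (, which agrees with the example on the slide.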

  7. The $\mathrm{FOLLOW}_1$ Sets
     • A production N → ε is applicable for symbols that “can follow” N in some derivation
     • Example: Arithmetic Expressions, Grammar $G_2$
       • The production E′ → ε is applied for symbols # and )
       • The production T′ → ε is applied for symbols #, ) and +
     • Formal definition:
       $\mathrm{FOLLOW}_1(N) = \{\, a \in V_T \mid \exists \alpha, \gamma : S \Rightarrow^{*} \alpha N a \gamma \,\}$
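The FOLLOW_1 sets can be computed by the same kind of fixpoint iteration. This is a sketch under assumptions: the FIRST_1 sets of G_2 are written out by hand, the end marker # is put into FOLLOW_1 of the start symbol (which is how # appears in the example above), and the names `FIRST`, `first_of_word`, and `compute_follow` are invented here.

```python
GRAMMAR = {
    "S": [["E"]],
    "E": [["T", "E'"]],
    "E'": [["+", "E"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "T"], []],
    "F": [["id"], ["(", "E", ")"]],
}

# FIRST_1 sets of G_2, written out by hand; "" stands for ε.
FIRST = {
    "S": {"id", "("}, "E": {"id", "("}, "E'": {"+", ""},
    "T": {"id", "("}, "T'": {"*", ""}, "F": {"id", "("},
}

def first_of_word(word):
    """FIRST_1 of a sequence of grammar symbols."""
    result = set()
    for sym in word:
        sym_first = FIRST.get(sym, {sym})  # a terminal begins only with itself
        result |= sym_first - {""}
        if "" not in sym_first:
            return result
    return result | {""}

def compute_follow(start="S"):
    follow = {n: set() for n in GRAMMAR}
    follow[start].add("#")  # assumption: the end marker follows the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in GRAMMAR.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:
                        continue  # only nonterminals have FOLLOW sets
                    rest = first_of_word(alt[i + 1:])
                    new = rest - {""}
                    if "" in rest:
                        # Whatever follows lhs can also follow sym.
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FOLLOW = compute_follow()
for nonterminal in GRAMMAR:
    print(nonterminal, sorted(FOLLOW[nonterminal]))
```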

  8. Definitions
     Let $k \ge 1$.
     • k-prefix of a word $w = a_1 \dots a_n$:
       $k{:}w = \begin{cases} a_1 \dots a_n & \text{if } n \le k \\ a_1 \dots a_k & \text{otherwise} \end{cases}$
     • k-concatenation $\oplus_k : V^{*} \times V^{*} \to V^{\le k}$, defined by $u \oplus_k v = k{:}uv$
     • extended to languages:
       $k{:}L = \{\, k{:}w \mid w \in L \,\}$
       $L_1 \oplus_k L_2 = \{\, x \oplus_k y \mid x \in L_1,\ y \in L_2 \,\}$
     • $V^{\le k} = \bigcup_{i=0}^{k} V^{i}$, the set of words of length at most k
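These definitions translate directly into code. A minimal sketch, assuming words are represented as tuples of symbols (so the empty tuple is ε); the function names are chosen here for illustration.

```python
def k_prefix(k, w):
    """k:w, the first k symbols of w, or all of w if it is shorter."""
    return w[:k]

def k_concat(k, u, v):
    """u ⊕_k v = k:(uv)."""
    return k_prefix(k, u + v)

def k_prefix_lang(k, language):
    """k:L, the k-prefixes of all words in L."""
    return {k_prefix(k, w) for w in language}

def k_concat_lang(k, l1, l2):
    """L1 ⊕_k L2, the pairwise k-concatenation of two languages."""
    return {k_concat(k, x, y) for x in l1 for y in l2}

# FiFo(E' → ε) = FIRST_1(ε) ⊕_1 FOLLOW_1(E') for the expression grammar:
fifo_epsilon = k_concat_lang(1, {()}, {(")",), ("#",)})

# FiFo(E' → + E) = FIRST_1(+ E) ⊕_1 FOLLOW_1(E'):
fifo_plus = k_concat_lang(1, {("+",)}, {(")",), ("#",)})
```

With k = 1 the FOLLOW set only contributes when the FIRST set contains ε, which is why `fifo_epsilon` consists of ) and # while `fifo_plus` consists of + alone.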

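  9. $\mathrm{FIRST}_k$ and $\mathrm{FOLLOW}_k$
     [Figure: a node X with regions labeled “$\in \mathrm{FIRST}_k(X)$” and “$\in \mathrm{FOLLOW}_k(X)$”]
     • set of k-prefixes of terminal words for α:
       $\mathrm{FIRST}_k : (V_N \cup V_T)^{*} \to 2^{V_T^{\le k}}$
       $\mathrm{FIRST}_k(\alpha) = \{\, k{:}u \mid \alpha \Rightarrow^{*} u \,\}$
     • set of k-prefixes of terminal words that may immediately follow X:
       $\mathrm{FOLLOW}_k : V_N \to 2^{V_{T\#}^{\le k}}$
       $\mathrm{FOLLOW}_k(X) = \{\, w \mid S \Rightarrow^{*} \beta X \gamma \text{ and } w \in \mathrm{FIRST}_k(\gamma) \,\}$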

  10. Parser for $G_2$
      program parser;
      var nextsym: string;
      proc scan;    { reads next input symbol into nextsym }
      proc error(message: string);    { issues error message and stops parser }
      proc accept;    { terminates successfully }
      proc S; begin E end;
      proc E; begin T; E' end;

  11. proc E';
      begin
        case nextsym in
          {"+"}: if nextsym = "+" then scan else error("+ expected") fi; E;
          otherwise ;
        endcase
      end;
      proc T; begin F; T' end;
      proc T';
      begin
        case nextsym in
          {"*"}: if nextsym = "*" then scan else error("* expected") fi; T;
          otherwise ;
        endcase
      end;

  12. proc F;
      begin
        case nextsym in
          {"("}: if nextsym = "(" then scan else error("( expected") fi;
                 E;
                 if nextsym = ")" then scan else error(") expected") fi;
          otherwise if nextsym = "id" then scan else error("id expected") fi;
        endcase
      end;
      begin
        scan; S;
        if nextsym = "#" then accept else error("# expected") fi
      end.
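The parser procedures for G_2 carry over to Python almost line by line. The following is a minimal sketch, assuming the input is already a list of token strings and that the end marker # is appended by the parser itself; `Parser`, `ParseError`, `expect`, and `parse` are names invented for this example.

```python
class ParseError(Exception):
    """Raised where the slide's parser would call error(...)."""

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["#"]  # '#' marks the end of the input
        self.pos = 0
        self.nextsym = None
        self.scan()

    def scan(self):
        """Read the next input symbol into nextsym."""
        self.nextsym = self.tokens[self.pos]
        self.pos += 1

    def expect(self, sym):
        """Verify a predicted terminal against the next input symbol."""
        if self.nextsym != sym:
            raise ParseError(f"{sym} expected, found {self.nextsym}")
        self.scan()

    # One method per nonterminal of G_2.
    def S(self):
        self.E()

    def E(self):
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.nextsym == "+":   # E' → + E
            self.expect("+")
            self.E()
        # otherwise: E' → ε, nothing to do

    def T(self):
        self.F()
        self.T_prime()

    def T_prime(self):
        if self.nextsym == "*":   # T' → * T
            self.expect("*")
            self.T()
        # otherwise: T' → ε

    def F(self):
        if self.nextsym == "(":   # F → ( E )
            self.expect("(")
            self.E()
            self.expect(")")
        else:                     # F → id
            self.expect("id")

def parse(tokens):
    """Return True if tokens form a word of G_2, else raise ParseError."""
    parser = Parser(tokens)
    parser.S()
    if parser.nextsym != "#":
        raise ParseError(f"# expected, found {parser.nextsym}")
    return True
```

For example, `parse(["id", "+", "id", "*", "(", "id", ")"])` succeeds, while `parse(["id", "+"])` raises `ParseError` because `F` finds the end marker where it expects id.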

  13. How to Construct such a Parser Program
      • The code was generated automatically from the grammar and the FiFo sets.
      • The program generating the parser has the functions:
          N_prog : $V_N \to \text{code}$                     (nonterminals)
          C_prog : $(V_N \cup V_T)^{*} \to \text{code}$      (concatenations)
          S_prog : $V_N \cup V_T \to \text{code}$            (symbols)

  14. Parser Schema
      program parser;
      var nextsym: symbol;
      proc scan;    (* reads next input symbol into nextsym *)
      proc error(message: string);    (* issues error message and stops the parser *)
      proc accept;    (* terminates parser successfully *)
      N_prog(X_0);    (* X_0 start symbol *)
      N_prog(X_1);
      ...
      N_prog(X_n);

  15. begin
        scan;
        X_0;
        if nextsym = "#" then accept else error("...") fi
      end

  16. The Non-terminal Procedures
      N = Non-terminal, C = Concatenation, S = Symbol
      N_prog(X) =    (* X → α_1 | α_2 | ··· | α_{k−1} | α_k *)
        proc X;
        begin
          case nextsym in
            FiFo(X → α_1): C_prog(α_1);
            FiFo(X → α_2): C_prog(α_2);
            ...
            FiFo(X → α_{k−1}): C_prog(α_{k−1});
            otherwise C_prog(α_k);
          endcase
        end;

  17. C_prog(α_1 α_2 ··· α_k) =
        S_prog(α_1); S_prog(α_2); ... S_prog(α_k);
      S_prog(a) = if nextsym = a then scan else error("a expected") fi    (terminal a)
      S_prog(Y) = Y    (nonterminal Y)
      The FiFo sets of the alternatives of a nonterminal have to be disjoint (LL(1) grammar).
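The three generator functions can be sketched as small string-producing functions. This is an illustration under assumptions, not the generator used in the course: the output mimics the pseudocode of the slides, the FiFo sets are passed in by hand, and the Python names `s_prog`, `c_prog`, `n_prog`, and `NONTERMINALS` are chosen here.

```python
NONTERMINALS = {"S", "E", "E'", "T", "T'", "F"}

def s_prog(sym):
    """Code for one grammar symbol: call a procedure or check a terminal."""
    if sym in NONTERMINALS:
        return f"{sym};"
    return f'if nextsym = "{sym}" then scan else error("{sym} expected") fi;'

def c_prog(word):
    """Code for a right-hand side: the symbol codes in sequence (';' for ε)."""
    return " ".join(s_prog(sym) for sym in word) or ";"

def n_prog(x, alternatives, fifo):
    """Code for nonterminal x.

    fifo[i] is the FiFo set of alternatives[i] for all but the last
    alternative, which becomes the 'otherwise' branch as in the schema.
    """
    lines = [f"proc {x};", "begin", "  case nextsym in"]
    for alt, lookahead in zip(alternatives[:-1], fifo):
        label = ", ".join(f'"{a}"' for a in sorted(lookahead))
        lines.append(f"    {{{label}}}: {c_prog(alt)}")
    lines.append(f"    otherwise {c_prog(alternatives[-1])}")
    lines += ["  endcase", "end;"]
    return "\n".join(lines)

# E' → + E | ε with FiFo(E' → + E) = {+}
E_PRIME = n_prog("E'", [["+", "E"], []], [{"+"}])
print(E_PRIME)
```

Putting the last alternative under `otherwise` means that an unexpected symbol is only reported later, when a terminal check fails. That matches the schema but delays error detection compared with checking the last FiFo set explicitly.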

  18. A Generative Solution
      Generate the control of a deterministic PDA from the grammar and the FiFo sets.
      • At compiler-generation time, construct a table M with $M : V_N \times V_T \to P$.
        M[N, a] is the production used to expand nonterminal N when the current symbol is a.
      • For some grammars, report that the table cannot be constructed. The compiler writer can then decide to:
        • change the grammar (but not the language),
        • use a more general parser generator,
        • “patch” the table (manually or using some rules).

  19. Creating the Table
      Input: a cfg G, and $\mathrm{FIRST}_1$ and $\mathrm{FOLLOW}_1$ for G.
      Output: the parsing table M, or an indication that such a table cannot be constructed.
      M is constructed as follows:
      • For all $X \to \alpha \in P$ and all terminals $a \in \mathrm{FIRST}_1(\alpha)$, set $M[X, a] = (X \to \alpha)$.
      • If $\varepsilon \in \mathrm{FIRST}_1(\alpha)$, then for all $b \in \mathrm{FOLLOW}_1(X)$, set $M[X, b] = (X \to \alpha)$.
      • Set all other entries of M to error.
      The parser table cannot be constructed if at least one entry is set twice. In that case, G is not LL(1).
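The construction can be sketched in a few lines. In this sketch the FIRST_1 and FOLLOW_1 sets of the expression grammar are written out by hand, missing dictionary keys play the role of error entries, and the names `build_table` and `first_of_word` are assumptions made for illustration.

```python
GRAMMAR = {
    "S": [["E"]],
    "E": [["T", "E'"]],
    "E'": [["+", "E"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "T"], []],
    "F": [["id"], ["(", "E", ")"]],
}
FIRST = {  # "" stands for ε
    "S": {"id", "("}, "E": {"id", "("}, "E'": {"+", ""},
    "T": {"id", "("}, "T'": {"*", ""}, "F": {"id", "("},
}
FOLLOW = {
    "S": {"#"}, "E": {")", "#"}, "E'": {")", "#"},
    "T": {"+", ")", "#"}, "T'": {"+", ")", "#"},
    "F": {"*", "+", ")", "#"},
}

def first_of_word(word, first):
    """FIRST_1 of a right-hand side."""
    result = set()
    for sym in word:
        sym_first = first.get(sym, {sym})  # a terminal begins only with itself
        result |= sym_first - {""}
        if "" not in sym_first:
            return result
    return result | {""}

def build_table(grammar, first, follow):
    """Return M as a dict {(nonterminal, terminal): right-hand side}.

    Keys that are absent correspond to the error entries.
    Raises ValueError if an entry would be set twice (grammar not LL(1)).
    """
    table = {}
    for lhs, alternatives in grammar.items():
        for alt in alternatives:
            word_first = first_of_word(alt, first)
            lookahead = word_first - {""}
            if "" in word_first:
                lookahead |= follow[lhs]
            for a in lookahead:
                if (lhs, a) in table:
                    raise ValueError(f"not LL(1): conflict at M[{lhs}, {a}]")
                table[(lhs, a)] = alt
    return table

TABLE = build_table(GRAMMAR, FIRST, FOLLOW)
```

A grammar such as A → a | a b makes `build_table` raise, because both alternatives claim the entry M[A, a].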

  20. Example – arithmetic expressions
      nonterminal   symbol          production
      S             (, id           S → E
      S             +, ∗, ), #      error
      E             (, id           E → T E′
      E             +, ∗, ), #      error
      E′            +               E′ → + E
      E′            ), #            E′ → ε
      E′            (, ∗, id        error
      T             (, id           T → F T′
      T             +, ∗, ), #      error
      T′            ∗               T′ → ∗ T
      T′            +, ), #         T′ → ε
      T′            (, id           error
      F             id              F → id
      F             (               F → ( E )
      F             +, ∗, )         error

  21. LL-Parser Driver (interprets the table M)
      program parser;
      var nextsym: symbol;
      var st: stack of item;
      proc scan;    (* reads next input symbol into nextsym *)
      proc error(message: string);    (* issues error message and stops the parser *)
      proc accept;    (* terminates parser successfully *)
      proc reduce;    (* replaces [X → β.Yγ][Y → α.] by [X → βY.γ] *)
      proc pop;    (* removes topmost item from st *)
      proc push(i: item);    (* pushes i onto st *)
      proc replaceby(i: item);    (* replaces topmost item of st by i *)

  22. begin
        scan; push([S′ → .S]);
        while nextsym ≠ "#" do
          case top in
            [X → β.aγ]: if nextsym = a then scan; replaceby([X → βa.γ]) else error fi;
            [X → β.Yγ]: if M[Y, nextsym] = (Y → α) then push([Y → .α]) else error fi;
            [X → α.]:   reduce;
            [S′ → S.]:  if nextsym = "#" then accept else error fi
          endcase
        od
      end.
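A table-driven driver can be sketched compactly in Python. Note the simplification: this sketch keeps plain grammar symbols on the stack instead of the items used by the driver above, so there is no reduce step. The table is the one for the expression grammar, written as a dictionary; `TABLE`, `ll1_parse`, and `ParseError` are names invented for this example.

```python
class ParseError(Exception):
    pass

NONTERMINALS = {"S", "E", "E'", "T", "T'", "F"}

# Parsing table M for the expression grammar; absent keys mean "error".
TABLE = {
    ("S", "("): ["E"], ("S", "id"): ["E"],
    ("E", "("): ["T", "E'"], ("E", "id"): ["T", "E'"],
    ("E'", "+"): ["+", "E"], ("E'", ")"): [], ("E'", "#"): [],
    ("T", "("): ["F", "T'"], ("T", "id"): ["F", "T'"],
    ("T'", "*"): ["*", "T"],
    ("T'", "+"): [], ("T'", ")"): [], ("T'", "#"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}

def ll1_parse(tokens):
    """Return the productions of the leftmost derivation, in order of use."""
    symbols = list(tokens) + ["#"]   # '#' marks the end of the input
    pos = 0
    stack = ["#", "S"]               # top of the stack is the end of the list
    derivation = []
    while stack:
        top = stack.pop()
        nextsym = symbols[pos]
        if top in NONTERMINALS:
            # Predict: expand the leftmost nonterminal using the table.
            rhs = TABLE.get((top, nextsym))
            if rhs is None:
                raise ParseError(f"unexpected {nextsym} while expanding {top}")
            derivation.append((top, tuple(rhs)))
            stack.extend(reversed(rhs))
        elif top == nextsym:
            # Verify: the predicted terminal matches the input.
            pos += 1
        else:
            raise ParseError(f"{top} expected, found {nextsym}")
    if pos != len(symbols):
        raise ParseError("input continues after the end marker")
    return derivation

steps = ll1_parse(["id", "+", "id"])
```

Each predict step corresponds to an expand transition and each verify step to consuming one input symbol; the returned list spells out a leftmost derivation.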

  23. Explicit Stack Deterministic Pushdown Automaton
      [Figure: pushdown automaton. Labels: Input (w a v); Stack with item [X → α.Yβ], ρ and #; Control; Parser-Table M; Output (tree).]

  24. LL(k) Grammar
      Goal: formalize our intuition of when the expand transitions of the item pushdown automaton can be made deterministic.
      Means: k-symbol lookahead into the remaining input.

  25. LL(k) Grammar
      • Let $G = (V_N, V_T, P, S)$ be a cfg and k be a natural number. G is an LL(k) grammar iff the following holds: if there exist two leftmost derivations
        $S \Rightarrow^{*}_{lm} u Y \alpha \Rightarrow_{lm} u \beta \alpha \Rightarrow^{*}_{lm} u x$ and
        $S \Rightarrow^{*}_{lm} u Y \alpha \Rightarrow_{lm} u \gamma \alpha \Rightarrow^{*}_{lm} u y$,
        and if $k{:}x = k{:}y$, then $\beta = \gamma$.
      • The expansion of the leftmost non-terminal is always uniquely determined by
        • the consumed part of the input and
        • the next k symbols of the remaining input

  26. Example 1
      Let $G_1$ be the cfg with the productions
        STAT → if id then STAT else STAT fi
             | while id do STAT od
             | begin STAT end
             | id := id
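The slide's own discussion of this example is not part of this excerpt. As a hedged illustration of the one-symbol case: every alternative of STAT starts with a terminal and none derives ε, so the FIRST_1 set of each alternative is just its first symbol. The check below, with names invented here, tests whether these sets are pairwise disjoint.

```python
# The four alternatives of STAT in G_1, as lists of symbols.
STAT_ALTERNATIVES = [
    ["if", "id", "then", "STAT", "else", "STAT", "fi"],
    ["while", "id", "do", "STAT", "od"],
    ["begin", "STAT", "end"],
    ["id", ":=", "id"],
]

# Each alternative begins with a terminal, so FIRST_1 is a singleton.
first_sets = [{alt[0]} for alt in STAT_ALTERNATIVES]

def pairwise_disjoint(sets):
    """True if no symbol occurs in two different sets."""
    seen = set()
    for s in sets:
        if seen & s:
            return False
        seen |= s
    return True

one_symbol_suffices = pairwise_disjoint(first_sets)
print(one_symbol_suffices)
```

Since the four first symbols (if, while, begin, id) differ, one symbol of lookahead selects the alternative.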
