
SableCC


  1. The SableCC Tool

     Compilation 2007
     Michael I. Schwartzbach, BRICS, University of Aarhus

     The input is:
     • a sequence of token definitions
     • a context-free grammar

     The output is:
     • a LALR(1) parser for the defined language
     • available as a Java class

     Our Favorite Grammar in SableCC

         Helpers
           tab = 9;
           cr = 13;
           lf = 10;
         Tokens
           eol = cr | lf | cr lf;
           blank = ' ' | tab;
           star = '*';
           slash = '/';
           plus = '+';
           minus = '-';
           lpar = '(';
           rpar = ')';
           id = 'x' | 'y' | 'z';
         Ignored Tokens
           blank,eol;
         Productions
           start = {plus} start plus term |
                   {minus} start minus term |
                   {term} term;
           term = {mult} term star factor |
                  {div} term slash factor |
                  {factor} factor;
           factor = {id} id |
                    {paren} lpar start rpar;

     Generated Classes

         drwxr-xr-x 2 mis users 4096 Sep 7 09:28 analysis/
         drwxr-xr-x 2 mis users 4096 Sep 7 09:28 lexer/
         drwxr-xr-x 2 mis users 4096 Sep 7 09:32 node/
         drwxr-xr-x 2 mis users 4096 Sep 7 09:28 parser/
         -rw-r--r-- 1 mis users  536 Sep 7 09:32 xyz.sablecc

     We never need to look at this output.

  2. The Main Application

         import parser.*;
         import lexer.*;
         import node.*;
         import java.io.*;

         class Main {
           public static void main(String args[]) {
             try {
               Parser p =
                 new Parser (
                   new Lexer (
                     new PushbackReader(new InputStreamReader(System.in))));
               Start tree = p.parse(); /* parse the input */
             } catch(Exception e) {
               System.out.println(e);
             }
           }
         }

     An Ambiguous Grammar

         X → Λ | a X | a a X

     Any string in this language has exponentially many different parse trees.
     The string a a ... a (n occurrences of a) has exactly Fib(n) parse trees.

     The SableCC Version

         Tokens
           a = 'a';
         Productions
           x = {empty} |
               {one} a x |
               {two} [first]:a [second]:a x;

     • Note that all symbols must have unique names
     • The default name for foo is [foo]:

     SableCC is Unhappy

         reduce/reduce conflict in state [stack: TA TA PX *] on EOF in {
           [ PX = TA PX * ] followed by EOF (reduce),
           [ PX = TA TA PX * ] followed by EOF (reduce)
         }

     The LALR(1) table contains conflicting actions.
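
The growth in the number of parse trees can be checked with a short count. The sketch below is not part of the slides: the class name `ParseTreeCount` is hypothetical, and it assumes the indexing Fib(0) = Fib(1) = 1, which the slide does not spell out. A tree for a^n starts with either {one} (leaving a^(n-1)) or {two} (leaving a^(n-2)), which gives the Fibonacci recurrence.

```java
import java.math.BigInteger;

// Counts the parse trees of the string a^n under the ambiguous grammar
//   x = {empty} | {one} a x | {two} a a x
// Hypothetical helper for illustration; it does not use SableCC.
class ParseTreeCount {
    static BigInteger count(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        BigInteger prev = BigInteger.ONE; // trees for a^0: only {empty}
        BigInteger curr = BigInteger.ONE; // trees for a^1: {one} then {empty}
        for (int i = 2; i <= n; i++) {
            // a^i starts with {one} (rest a^(i-1)) or {two} (rest a^(i-2))
            BigInteger next = curr.add(prev);
            prev = curr;
            curr = next;
        }
        return n == 0 ? prev : curr;
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 10; n++) {
            System.out.println("a^" + n + ": " + count(n) + " parse trees");
        }
    }
}
```

`BigInteger` is used because the counts outgrow `long` quickly as n increases.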

  3. Solution: Less Stupid Grammar

         Tokens
           a = 'a';
         Productions
           x = {empty} |
               {one} a x ;

     A Grammar for If-Statements

         Tokens
           eol = cr | lf | cr lf;
           blank = ' ' | tab;
           exp = 'exp';
           if = 'if';
           then = 'then';
           else = 'else';
           assign = 'assign';
         Ignored Tokens
           blank,eol;
         Productions
           stm = {one} if exp then stm |
                 {both} if exp then [thenbranch]:stm else [elsebranch]:stm |
                 {assign} assign;

     SableCC is Unhappy

         shift/reduce conflict in state [stack: TIf TExp TThen PStm *] on TElse in {
           [ PStm = TIf TExp TThen PStm * TElse PStm ] (shift),
           [ PStm = TIf TExp TThen PStm * ] followed by TElse (reduce)
         }

     But the grammar does not appear to be stupid...

     Solution: Less Natural Grammar

         Productions
           stm = {one} if exp then stm |
                 {both} if exp then [thenbranch]:stm2 else [elsebranch]:stm |
                 {assign} assign;
           stm2 = {both} if exp then [thenbranch]:stm2 else [elsebranch]:stm2 |
                  {assign} assign;
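
In the stm/stm2 grammar, the then-branch of a {both} statement can never be an if without an else, so every else attaches to the nearest if. The sketch below shows the same binding with a hand-written recursive-descent parser. This is a swapped-in technique for illustration, not what SableCC generates, and the class name `DanglingElse` and its bracketed output format are assumptions.

```java
import java.util.Arrays;
import java.util.List;

// Hand-written recursive-descent sketch (not SableCC output) for
//   stm = if exp then stm | if exp then stm else stm | assign
// Taking an 'else' as soon as one is available binds it to the nearest 'if'.
class DanglingElse {
    private final List<String> tokens;
    private int pos = 0;

    private DanglingElse(List<String> tokens) {
        this.tokens = tokens;
    }

    // Parses whitespace-separated tokens and returns a bracketed tree.
    static String parse(String input) {
        DanglingElse p = new DanglingElse(Arrays.asList(input.trim().split("\\s+")));
        String tree = p.stm();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("unexpected token at position " + p.pos);
        }
        return tree;
    }

    private boolean peek(String token) {
        return pos < tokens.size() && tokens.get(pos).equals(token);
    }

    private void expect(String token) {
        if (!peek(token)) {
            throw new IllegalArgumentException("expected '" + token + "' at position " + pos);
        }
        pos++;
    }

    private String stm() {
        if (peek("assign")) {
            pos++;
            return "assign";
        }
        expect("if");
        expect("exp");
        expect("then");
        String thenBranch = stm();
        if (peek("else")) { // greedy choice: the innermost open 'if' takes the 'else'
            pos++;
            String elseBranch = stm();
            return "(if exp then " + thenBranch + " else " + elseBranch + ")";
        }
        return "(if exp then " + thenBranch + ")";
    }
}
```

For example, `DanglingElse.parse("if exp then if exp then assign else assign")` returns `(if exp then (if exp then assign else assign))`, with the else inside the inner statement.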

  4. Dangling Else Problem

     • An example statement:

         if exp then if exp then assign else assign

     • To which if does the else belong?
     • The first grammar is ambiguous
     • Our modified grammar parses the string as:

         if exp then ( if exp then assign else assign )

     The Palindrome Grammar

         Tokens
           zero = '0';
           one = '1';
         Productions
           pal = {empty} |
                 {one} one |
                 {zero} zero |
                 {oneone} [first]:one pal [second]:one |
                 {zerozero} [first]:zero pal [second]:zero;

     SableCC is Unhappy

         shift/reduce conflict in state [stack: TZero *] on TZero in {
           [ PPal = * TZero PPal TZero ] (shift),
           [ PPal = * TZero ] (shift),
           [ PPal = * ] followed by TZero (reduce),
           [ PPal = TZero * ] followed by TZero (reduce)
         }
         shift/reduce conflict in state [stack: TZero *] on TOne in {
           [ PPal = * TOne PPal TOne ] (shift),
           [ PPal = * TOne ] (shift),
           [ PPal = TZero * ] followed by TOne (reduce)
         }

     No Solution!

     • There is no LALR(1) grammar for this language
     • Some grammars are not LALR(1)
     • And some languages are not LALR(1)
     • Some grammars are ambiguous
     • And some languages are ambiguous
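
The palindrome language is still easy to recognize when the whole input is available at once. The sketch below is an illustration only: the class name `Palindrome` is hypothetical, and it is not an LALR(1) parser. It mirrors the pal productions by comparing the first and last token of the remaining slice, which is exactly what a left-to-right parser with one token of lookahead cannot do, since it would have to guess where the middle of the input is.

```java
// Recognizer that mirrors the pal productions on a slice of the input.
// It looks at both ends at once, so it is not a left-to-right LALR(1) parse.
class Palindrome {
    static boolean isPal(String input) {
        for (char c : input.toCharArray()) {
            if (c != '0' && c != '1') {
                throw new IllegalArgumentException("only tokens '0' and '1' are defined");
            }
        }
        return pal(input, 0, input.length());
    }

    // Checks the slice input[lo, hi) against the productions of pal.
    private static boolean pal(String input, int lo, int hi) {
        if (hi - lo <= 1) {
            return true; // {empty}, {one}, or {zero}
        }
        // {oneone} or {zerozero}: first and last token agree, inner part is a pal
        return input.charAt(lo) == input.charAt(hi - 1) && pal(input, lo + 1, hi - 1);
    }
}
```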

  5. Language Containments

     [Figure: nested language classes, LALR(1) inside Unambiguous inside Context-Free. The language { a^i b^j c^k | i=j or j=k } is marked as context-free but not unambiguous; palindromes are marked as unambiguous but not LALR(1).]

     EBNF Features

     SableCC allows right-hand side abbreviations:
     • Optional: x = y?
     • List: x = y*
     • Non-empty list: x = y+

     This has many benefits:
     • shorter
     • less error-prone
     • fewer names must be invented

     EBNF Example

         block = lbrace decl* stm+ rbrace ;
         decl = type id init? semicolon ;
         init = equals exp;

     EBNF Expansion

     • x = y?

         x = {some} y | {none} ;

     • x = y*

         x = {zero} | {more} y x ;

     • x = y+

         x = {one} y | {more} y x ;
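
The three expansions are mechanical, so they can be written as a small rewriting function. The sketch below only reproduces the productions shown on the slide as strings. The class name `EbnfExpansion` and the method `expand` are hypothetical, and nothing here claims to reflect how SableCC performs the expansion internally.

```java
// Rewrites a single EBNF abbreviation (y?, y*, y+) into plain productions,
// following the expansions listed on the slide.
class EbnfExpansion {
    static String expand(String lhs, String element) {
        if (element.length() < 2) {
            throw new IllegalArgumentException("expected a symbol followed by ?, * or +");
        }
        String y = element.substring(0, element.length() - 1);
        char operator = element.charAt(element.length() - 1);
        switch (operator) {
            case '?': // optional
                return lhs + " = {some} " + y + " | {none} ;";
            case '*': // possibly empty list
                return lhs + " = {zero} | {more} " + y + " " + lhs + " ;";
            case '+': // non-empty list
                return lhs + " = {one} " + y + " | {more} " + y + " " + lhs + " ;";
            default:
                throw new IllegalArgumentException("no EBNF operator in: " + element);
        }
    }

    public static void main(String[] args) {
        System.out.println(expand("x", "y?"));
        System.out.println(expand("x", "y*"));
        System.out.println(expand("x", "y+"));
    }
}
```

Each expansion needs a fresh left-hand side name and alternative names, which is the "fewer names must be invented" benefit: with the abbreviations, these names never have to be written by hand.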
