
  1. Signatures and grammars
     / Faculteit Wiskunde en Informatica, 20-9-2011

     PAGE 0
     • Why manual disambiguation in SDF?
       • Fully declarative definition of syntax
       • Implicit disambiguation can be wrong
       • More programming languages can be parsed
       • Modularity
       • Fewer non-terminals/sorts
       • Separation of concerns: form of rules is independent of disambiguation
     • Costs?
       • Intellectual effort
       • False safety, grammar may still be ambiguous

     PAGE 1
     • Associativity of binary operators
     • Priorities between binary operators
     • Transitivity of priorities
     • Prefix expressions and binary expressions
     • Postfix expression with binary operator (slide 3+4)
     • Overloaded commas in expression languages (slide 5+6)

     PAGE 2
         module Expressions
         exports
           sorts E
           lexical syntax
             [\ \t\n] -> LAYOUT
           context-free start-symbols E
           context-free syntax
             "e"         -> E
             E "[" E "]" -> E
             E "+" E     -> E {left}

     Input sentence: e + e [ e + e ]

     PAGE 3
     • The "< >" notation is used to restrict the filtering behavior of priorities to certain arguments.
     • In this case, if a "+" is a direct child of the first "E" in the "[ ]" production, it is filtered.
     • No direct children are filtered other than the ones listed between the angular brackets.

         module Expressions
         exports
           sorts E
           lexical syntax
             [\ \t\n] -> LAYOUT
           context-free start-symbols E
           context-free syntax
             "e" -> E
           context-free priorities
             E "[" E "]" -> E <0> >
             E "+" E     -> E {left}
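The effect of the "< >" restriction on the input sentence e + e [ e + e ] can be checked with a small brute-force enumeration. The Python sketch below is not SDF and does not reflect how an SDF parser works internally; it simply lists every tree for the toy Expressions grammar and then drops the trees that violate {left} and the priority. The names `parse_trees`, `head` and `allowed` are hypothetical, and `bracket_args` numbers only the two E children of the "[ ]" production, which is an assumption of this sketch rather than SDF's own indexing.

```python
from functools import lru_cache

def parse_trees(tokens):
    """Enumerate all trees for E ::= "e" | E "[" E "]" | E "+" E over a token list."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def parse(i, j):
        trees = []
        if j - i == 1 and tokens[i] == "e":
            trees.append("e")
        # E "+" E: try every "+" as the top-level operator
        for k in range(i + 1, j - 1):
            if tokens[k] == "+":
                trees += [("+", l, r) for l in parse(i, k) for r in parse(k + 1, j)]
        # E "[" E "]": the span must end in "]"; try every "[" as the opening bracket
        if j - i >= 4 and tokens[j - 1] == "]":
            for k in range(i + 1, j - 2):
                if tokens[k] == "[":
                    trees += [("[]", l, r) for l in parse(i, k) for r in parse(k + 1, j - 1)]
        return tuple(trees)

    return list(parse(0, len(tokens)))

def head(tree):
    """Top production of a tree: "e", "+" or "[]"."""
    return tree if isinstance(tree, str) else tree[0]

def allowed(tree, bracket_args):
    """Apply {left} on "+" and the priority "[ ]" > "+" on the listed E arguments."""
    if isinstance(tree, str):
        return True
    op, left, right = tree
    if op == "+" and head(right) == "+":      # {left}: no "+" directly in the right argument
        return False
    if op == "[]":
        children = (left, right)
        if any(head(children[n]) == "+" for n in bracket_args):
            return False
    return allowed(left, bracket_args) and allowed(right, bracket_args)

trees = parse_trees("e + e [ e + e ]".split())
restricted = [t for t in trees if allowed(t, bracket_args=(0,))]
unrestricted = [t for t in trees if allowed(t, bracket_args=(0, 1))]
print(len(trees), len(restricted), len(unrestricted))
```

Under these assumptions the sentence has two trees, restricting the priority to the first E leaves exactly one, and applying the priority to both E arguments would also reject the "+" inside the brackets and leave none.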

  2. Signatures and grammars

     PAGE 4-5
         module Expressions
         exports
           sorts E
           lexical syntax
             [\ \t\n] -> LAYOUT
           context-free start-symbols E
           context-free syntax
             "e"                    -> E
             "f" "(" {E ","}+ ")"   -> E
             E "," E                -> E {left}

     Input sentence: f(e,e)

     PAGE 6
     • The priority definition reveals some of the implementation details of SDF, by showing one of the productions that are automatically generated for you.
     • There is no way around this, unless you would like to remove the comma-separated argument list and use the binary comma operator instead for parsing your commas.

         module Expressions
         exports
           sorts E
           lexical syntax
             [\ \t\n] -> LAYOUT
           context-free start-symbols E
           context-free syntax
             "e"                    -> E
             "f" "(" {E ","}+ ")"   -> E
           context-free priorities
             { non-assoc:
                 E       -> {E ","}+
                 E "," E -> E {left}
             }

     PAGE 7
     • Longest match
     • Runaway whitespace and comments
     • Syntactic overloading between identifiers and keywords
     • Useless cloning of identifier classes
     • Dynamically reserved types
     • Dangling else and related ambiguities
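The overloaded comma in f(e,e) can be made visible by counting trees. The following Python sketch is a toy enumeration, not SDF: it lists every tree for the Expressions grammar with both the argument list and the binary comma, and then removes trees in which a binary comma node is a direct element of the argument list, which is the intent of the non-assoc group as this sketch reads it. The names `call_trees` and `no_comma_arguments` and the tuple encoding of trees are hypothetical.

```python
from functools import lru_cache

def call_trees(tokens):
    """Enumerate trees for E ::= "e" | "f" "(" {E ","}+ ")" | E "," E."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def expr(i, j):
        trees = []
        if j - i == 1 and tokens[i] == "e":
            trees.append("e")
        if j - i >= 4 and tokens[i] == "f" and tokens[i + 1] == "(" and tokens[j - 1] == ")":
            trees += [("call", args) for args in arg_list(i + 2, j - 1)]
        for k in range(i + 1, j - 1):          # binary comma operator
            if tokens[k] == ",":
                trees += [(",", l, r) for l in expr(i, k) for r in expr(k + 1, j)]
        return tuple(trees)

    @lru_cache(maxsize=None)
    def arg_list(i, j):
        lists = [(e,) for e in expr(i, j)]     # a one-element list
        for k in range(i + 1, j - 1):          # comma used as list separator
            if tokens[k] == ",":
                lists += [(first,) + rest
                          for first in expr(i, k)
                          for rest in arg_list(k + 1, j)]
        return tuple(lists)

    return list(expr(0, len(tokens)))

def no_comma_arguments(tree):
    """Reject trees where a binary comma node sits directly in an argument list."""
    if isinstance(tree, str):
        return True
    if tree[0] == "call":
        return all(not (isinstance(a, tuple) and a[0] == ",") and no_comma_arguments(a)
                   for a in tree[1])
    return all(no_comma_arguments(child) for child in tree[1:])

trees = call_trees(["f", "(", "e", ",", "e", ")"])
kept = [t for t in trees if no_comma_arguments(t)]
print(len(trees), kept)
```

In this model f(e,e) has two readings, a call with two arguments and a call with one argument that is itself e , e, and only the two-argument reading survives the filter.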

  3. Signatures and grammars

     PAGE 8-9
     • Runaway whitespace and comments

         module Layout
         exports
           sorts P S
           context-free start-symbols P
           lexical syntax
             [\ \t\n] -> LAYOUT
           context-free syntax
             "[" S* "]" -> P
             "s"        -> S

     PAGE 10
     • These trees have typical symptoms:
       • a nullable symbol (for example a star list), which recognizes the empty string
       • two layout nodes surrounding the nullable non-terminal take turns in accepting the layout.

     PAGE 11
         module Layout
         exports
           sorts P S
           context-free start-symbols P
           lexical syntax
             [\ \t\n]        -> LAYOUT
             "%" ~[\%]* "%"  -> LAYOUT
           context-free restrictions
             LAYOUT? -/- [\ \t\n]
             LAYOUT? -/- [\%]
           context-free syntax
             "[" S* "]" -> P
             "s"        -> S
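The "take turns" symptom can be illustrated by counting how whitespace between "[" and "]" may be divided over the two layout positions around an empty S*. The Python sketch below is a deliberate simplification: it only counts the division point and ignores any further ambiguity inside a layout node, and it models the restriction LAYOUT? -/- [\ \t\n] as "a layout node may not be followed by a whitespace character". The function name `layout_splits` is hypothetical.

```python
WHITESPACE = (" ", "\t", "\n")

def layout_splits(gap, restricted):
    """Ways to divide the whitespace `gap` in "[" gap "]" over the layout
    before and the layout after an empty S* list."""
    splits = []
    for cut in range(len(gap) + 1):
        before, after = gap[:cut], gap[cut:]
        # Follow restriction: the first layout node must not be followed by
        # whitespace, so it has to take all of the gap (longest match).
        if restricted and after[:1] in WHITESPACE:
            continue
        splits.append((before, after))
    return splits

print(len(layout_splits("  ", restricted=False)))
print(layout_splits("  ", restricted=True))
```

With two spaces the unrestricted grammar admits three divisions in this model, while the follow restriction leaves only the one in which the first layout node takes everything.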

  4. Signatures and grammars

     PAGE 12-13
     • Syntactic overloading between identifiers and keywords

         module Keywords
         exports
           sorts S E
           context-free start-symbols S
           lexical syntax
             [A-Za-z][A-Za-z0-9]* -> E
             [\ \t\n]             -> LAYOUT
           context-free syntax
             E "(" {E ","}* ")" ";" -> S
             "return" E ";"         -> S
             "(" E ")"              -> E {bracket}

     PAGE 14
     • The symptoms of this ambiguity are:
       • The top two nodes of the ambiguity differ
       • There is more than one instance of syntactic overloading:
         − the "return" keyword and the bracket syntax "( )" interfere with other parts of the syntax
         − ";" is allocated to a different production.
       • Characters that are grouped under a keyword node ("return") in one alternative end up under an identifier node ("E") in another.

     PAGE 15
         module Keywords
         exports
           sorts S E
           context-free start-symbols S
           lexical syntax
             [A-Za-z][A-Za-z0-9]* -> E
             [\ \t\n]             -> LAYOUT
             "return"             -> E {reject}
           context-free syntax
             E "(" {E ","}* ")" ";" -> S
             "return" E ";"         -> S
             "(" E ")"              -> E {bracket}
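A statement such as `return (x);` (an example sentence chosen for this illustration, not taken from the slides) fits both statement productions of the Keywords module. The Python sketch below approximates the two productions with regular expressions and models {reject} as a set of reserved words that may not be read as identifiers. It is a rough approximation: arguments are limited to plain identifiers and brackets to one level. The names `readings`, `CALL` and `RETURN` are hypothetical.

```python
import re

IDENT = r"[A-Za-z][A-Za-z0-9]*"
# E "(" {E ","}* ")" ";" -> S, with identifiers as the only expressions
CALL = re.compile(rf"\s*{IDENT}\s*\(\s*(?:{IDENT}\s*(?:,\s*{IDENT}\s*)*)?\)\s*;\s*")
# "return" E ";" -> S, where E is an identifier or a bracketed identifier
RETURN = re.compile(rf"\s*return\s*(?:\(\s*({IDENT})\s*\)|({IDENT}))\s*;\s*")

def readings(stmt, reserved=frozenset()):
    """Return which statement productions can derive `stmt`."""
    found = []
    # In the call reading every name in the statement is an identifier,
    # so none of them may be a rejected (reserved) word.
    if CALL.fullmatch(stmt) and not set(reserved).intersection(re.findall(IDENT, stmt)):
        found.append("call")
    m = RETURN.fullmatch(stmt)
    if m and (m.group(1) or m.group(2)) not in reserved:
        found.append("return")
    return found

print(readings("return (x);"))
print(readings("return (x);", reserved={"return"}))
```

Without reserved words the statement is read both as a call to a function named return and as a return statement; once "return" is rejected as an identifier, only the return statement remains.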

  5. Signatures and grammars

     PAGE 16-17
     • Useless cloning of identifier classes

         module Clone
         exports
           sorts Identifier ClassName TypeName Declaration
           context-free start-symbols Declaration
           lexical syntax
             [\ \t\n] -> LAYOUT
             [a-z]+   -> Identifier
             [a-z]+   -> ClassName
           context-free syntax
             Identifier              -> TypeName
             ClassName               -> TypeName
             TypeName Identifier ";" -> Declaration

     PAGE 18
     • Symptoms are very clear:
       • The two top productions of the ambiguity are different
       • The two top productions of the ambiguity are injections (chain rules)
       • Each chain rule introduces a lexical (identifier) that has the same definition, but a different non-terminal name
       • Each alternative recognizes exactly the same characters in exactly the same way, modulo non-terminal names

     PAGE 19
     • Useless cloning of identifier classes

         module Clone
         exports
           sorts Identifier Declaration
           context-free start-symbols Declaration
           lexical syntax
             [\ \t\n] -> LAYOUT
             [a-z]+   -> Identifier
           context-free syntax
             Identifier Identifier ";" -> Declaration

  6. Signatures and grammars

     PAGE 20-21
     • Dangling else and related ambiguities

         module DanglingElse
         exports
           sorts S E
           lexical syntax
             [\ \t\n] -> LAYOUT
           context-free restrictions
             LAYOUT? -/- [\t\n\ ]
           context-free start-symbols S
           context-free syntax
             "expr"                        -> E
             "if" E "then" S+              -> S
             "if" E "then" S+ "else" S+    -> S
             "other"                       -> S

     PAGE 22
     • Symptoms in these two trees:
       • Top nodes of the derivations have different productions (this is not always the case in this kind of ambiguity)
       • Both trees exercise the same production rules, but in different vertical order
       • One of the productions is a prefix of the other, and these two productions are the ones that have swapped vertical order between the two derivations
       • In both derivations, the deeply nested statement is the only, or the last, statement of a list of statements

     PAGE 23
     • Dangling else and related ambiguities

         module DanglingElse
         exports
           sorts S E
           lexical syntax
             [\ \t\n] -> LAYOUT
           context-free restrictions
             LAYOUT? -/- [\t\n\ ]
           context-free start-symbols S
           context-free syntax
             "expr"                        -> E
             "if" E "then" S+              -> S {avoid}
             "if" E "then" S+ "else" S+    -> S
             "other"                       -> S
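The first two symptoms of the dangling else can be reproduced by enumerating the trees of a nested conditional. The Python sketch below is a toy enumeration over pre-split tokens for the DanglingElse grammar without {avoid}; it does not model how the {avoid} attribute selects a tree. The input sentence, the names `statement_trees` and `productions`, and the tuple encoding of trees are assumptions of this illustration.

```python
from functools import lru_cache

def statement_trees(tokens):
    """Enumerate trees for S ::= "other" | "if" E "then" S+ | "if" E "then" S+ "else" S+."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def stmt(i, j):
        trees = []
        if j - i == 1 and tokens[i] == "other":
            trees.append("other")
        if j - i >= 4 and tokens[i:i + 3] == ("if", "expr", "then"):
            trees += [("if-then", body) for body in stmts(i + 3, j)]
            for k in range(i + 4, j - 1):      # position of a candidate "else"
                if tokens[k] == "else":
                    trees += [("if-then-else", t, e)
                              for t in stmts(i + 3, k)
                              for e in stmts(k + 1, j)]
        return tuple(trees)

    @lru_cache(maxsize=None)
    def stmts(i, j):
        """Non-empty statement lists (S+) covering tokens[i:j]."""
        lists = [(s,) for s in stmt(i, j)]
        for k in range(i + 1, j):
            lists += [(first,) + rest for first in stmt(i, k) for rest in stmts(k, j)]
        return tuple(lists)

    return list(stmt(0, len(tokens)))

def productions(tree):
    """All production names used in a tree, in top-down order."""
    if isinstance(tree, str):
        return [tree]
    names = [tree[0]]
    for stmt_list in tree[1:]:
        for s in stmt_list:
            names += productions(s)
    return names

trees = statement_trees("if expr then if expr then other else other".split())
for t in trees:
    print(productions(t))
```

For this sentence the enumeration finds two trees whose top productions differ, and both use the same productions with if-then and if-then-else in swapped vertical order.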

