
Grammars and Trees, Dr. Vadim Zaytsev aka @grammarware, 2015



  1. Grammars and Trees Dr. Vadim Zaytsev aka @grammarware 2015

  2. Recap ✓ Lexical analysis ✓ Syntactic analysis ✓ Semantic analysis ✓ Intermediate representation ✓ Code generation ✓ Optimisation ✓ . . .

  3. WHY ✓ Formats everywhere ✓ DSLs are easy ✓ SLs have many faces ✓ 90% automated, 10% hard work

  4. Models of Languages ✓ How can a language be defined?

  5. Models of Languages ✓ Actual (in)finite set ✓ {“a”, “b”, “c”} ✓ {0ⁱ1ⁿ…} ✓ English ✓ set arithmetic works ✓ concatenation, union, difference, intersection, complement, closure
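The "set arithmetic works" point can be tried directly on small finite languages. The following is a minimal Python sketch, not part of the slides: the helper names `concat`, `closure`, `words_up_to` and `complement` are made up for illustration, and closure and complement are cut off at a length or round bound because the real results are infinite.

```python
from itertools import product


def concat(l1, l2):
    """Concatenation: every word of l1 followed by every word of l2."""
    return {u + v for u, v in product(l1, l2)}


def closure(lang, max_rounds):
    """Bounded Kleene closure: words built from at most max_rounds words of lang."""
    result = {""}
    layer = {""}
    for _ in range(max_rounds):
        layer = concat(layer, lang)
        result |= layer
    return result


def words_up_to(alphabet, max_len):
    """All words over the alphabet with length 0..max_len."""
    return {"".join(p) for k in range(max_len + 1) for p in product(alphabet, repeat=k)}


def complement(lang, alphabet, max_len):
    """Complement relative to the words of length <= max_len."""
    return words_up_to(alphabet, max_len) - lang


A = {"a", "b", "c"}
B = {"b", "bb"}

union = A | B
intersection = A & B
difference = A - B
concatenation = concat(A, B)
star = closure({"ab"}, 2)
rest = complement(A, "abc", 1)
```

Union, intersection and difference come for free from Python sets; only concatenation and closure need language-specific code.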

  6. Models of Languages ✓ Formal grammar ✓ term rewriting system ✓ “semi-Thue” ✓ all about rewriting rules ✓ α → β
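A grammar as a set of rewriting rules α → β can be sketched in a few lines of Python. The example grammar (S → aSb, S → ab) and the function names are assumptions chosen for illustration; the length bound is only a safe way to prune the search when no rule shortens the string.

```python
from collections import deque


def rewrite_once(form, rules):
    """All strings obtained from form by applying one rule alpha -> beta at one position."""
    results = set()
    for alpha, beta in rules:
        start = form.find(alpha)
        while start != -1:
            results.add(form[:start] + beta + form[start + len(alpha):])
            start = form.find(alpha, start + 1)
    return results


def generate(start, rules, terminals, max_len):
    """Breadth-first search over derivations; collects terminal-only strings up to max_len."""
    seen = {start}
    queue = deque([start])
    sentences = set()
    while queue:
        form = queue.popleft()
        if all(ch in terminals for ch in form):
            sentences.add(form)
        for nxt in rewrite_once(form, rules):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sentences


# Hypothetical example grammar: S -> aSb | ab
RULES = [("S", "aSb"), ("S", "ab")]
sentences = generate("S", RULES, set("ab"), 6)
print(sorted(sentences, key=len))  # ['ab', 'aabb', 'aaabbb']
```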

  7. Models of Languages ✓ Recognising automaton ✓ states ✓ transitions ✓ extra stuff
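The simplest recognising automaton has only states and transitions, with no "extra stuff" such as a stack or a tape. A minimal sketch in Python, assuming a hypothetical example language (binary strings with an even number of 1s) and made-up state names:

```python
def accepts(word, transitions, start, accepting):
    """Run a deterministic finite automaton; a missing transition rejects the word."""
    state = start
    for symbol in word:
        key = (state, symbol)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accepting


# Hypothetical automaton: tracks the parity of the number of 1s seen so far.
EVEN_ONES = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}

print(accepts("0110", EVEN_ONES, "even", {"even"}))  # True
print(accepts("0111", EVEN_ONES, "even", {"even"}))  # False
```

Adding a stack or a tape to the same loop is what moves the automaton up to the more powerful classes listed later in the deck.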

  8. Models of Languages ✓ Declarative: enumeration / description; characteristic function ✓ Analytic: recogniser / parser; analytic grammar ✓ Generative: term rewriting system; generative grammar

  9. Language instance of Program

  10. [Diagram] Nodes: Language, Automaton, Sentences, Grammar, Program. Edge labels: "modelled by" (repeated).

  11. [Diagram] Same nodes and "modelled by" edges as slide 10, with two added edge labels: "accepts" and "generates".

  12. [Diagram] Same as slide 11, with the added edge label "element of" and two rotated labels that apparently read "parseable by" and "conforms to".

  13. [Diagram] Nodes: Language, Grammar (twice), Program (twice). Edge labels: "defined by" (twice), "conforms to" (twice).

  14. [Diagram] Same as slide 13, with a third "defined by" edge added.

  15. Example: XML ✓ X ::= ![<>]+ | '<' ![>]+ '>' X* '<' '/' ![>]+ '>' ✓ X ::= D | '<' T A* '>' X* '<' '/' T '>' ✓ <!ELEMENT dir (#PCDATA)> <!ATTLIST dir xml:space (def|preserve) 'preserve'> ✓ <xsd:element name="tag"> <xsd:complexType> . . .

  16. Conclusion ✓ “Language” is intangible ✓ Grammars hide in: ✓ data types ✓ API and libraries ✓ protocols and formats ✓ structural commitments ✓ . . . ✓ Not all grammars are equally “good”

  17. Rose by Arwen Grune; p.58 of Grune/Jacobs’ “Parsing Techniques”, 2008

  18. Noam Chomsky (b. 1928) ✓ Unrestricted grammars: α → β ✓ Context-sensitive grammars: α X β → α γ β ✓ Context-free grammars: X → γ ✓ Regular grammars: X → a, X → a B. Noam Chomsky, On Certain Formal Properties of Grammars, Information & Control 2(2):137–167, 1959. Photo: Duncan Rawlinson, Chomsky.jpg, 2004, CC-BY.

  19. Noam Chomsky (b. 1928) ✓ Unrestricted grammars: α → β ✓ Decidable grammars ✓ Context-sensitive grammars: α X β → α γ β ✓ Indexed grammars: A[σ] → α[σ], A[σ] → B[fσ], A[fσ] → α[σ] ✓ Context-free grammars: X → γ ✓ Deterministic CFG ✓ Nested word ✓ Regular grammars: X → a, X → a B ✓ Non-recursive grammars

  20. Grammars | Languages | Automata
      Unrestricted grammars | Recursively enumerable languages | Turing machine
      Decidable grammars | Recursive languages | Terminating automata
      Context-sensitive grammars | Context-sensitive languages | Linear-bounded automata
      Indexed grammars | Languages with macros | Nested stack automata
      Context-free grammars | Context-free languages | Pushdown automata
      Deterministic CFG | Deterministic CFL | Deterministic PDA
      Nested word | Nested word | Visibly PDA
      Regular grammars | Regular languages | FSMs
      Non-recursive grammars | Finite languages | FSMs without cycles

  21. Finite languages ✓ Examples: ✓ Boolean values ✓ languages ✓ countries ✓ cities ✓ postcodes

  22. Regular languages ✓ Regular sets by Stephen Kleene in 1956 ✓ ∅, ε, letters from Σ ✓ concatenation ✓ iteration ✓ alternation ✓ Precisely fit the regular class. Stephen Cole Kleene (1909–1994). S. C. Kleene, Representation of Events in Nerve Nets and Finite Automata. In Automata Studies, pp. 3–42, 1956. Photo from: Konrad Jacobs, S. C. Kleene, 1978, MFO.
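Kleene's three operations map directly onto regular expression syntax. A small sketch using Python's standard `re` module; the pattern `(ab|c)*` is an arbitrary example, not taken from the slides.

```python
import re

# concatenation: "a" followed by "b"
# alternation:   "ab" or "c"
# iteration:     zero or more repetitions of the group
PATTERN = re.compile(r"(ab|c)*")


def in_language(word):
    """True if the whole word belongs to the regular set (ab|c)*."""
    return PATTERN.fullmatch(word) is not None


print(in_language("abcab"))  # True
print(in_language("abca"))   # False
```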

  23. Regular languages ✓ PCRE ✓ “Perl-compatible regular expressions” ✓ (not compatible with Perl) ✓ (not regular) ✓ C library ✓ (backrefs, recursion, assertions…)
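Backreferences are one reason such pattern languages are "not regular". The sketch below uses Python's `re` module as a stand-in, since it is not PCRE but supports the same backreference feature; the pattern is an illustrative assumption. It recognises aⁿbaⁿ, a language that no finite state machine can accept because the two runs of a's must have equal length.

```python
import re

# \1 must repeat exactly the text captured by the first group.
SAME_ON_BOTH_SIDES = re.compile(r"(a+)b\1")


def balanced_around_b(word):
    """True for words of the form a^n b a^n with n >= 1."""
    return SAME_ON_BOTH_SIDES.fullmatch(word) is not None


print(balanced_around_b("aabaa"))   # True
print(balanced_around_b("aabaaa"))  # False
```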

  24. Context-free ✓ FSM + memory (stack) ✓ Modular composition ✓ A ::= "[" B "]" ; ✓ B ::= A? ; ✓ Forget intersection & diff (the class is not closed under them) ✓ Closed under substitution. Pictured: John Backus (1924–2007)
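The two-rule grammar on this slide can be turned into a recogniser with one function per nonterminal. This is a minimal recursive-descent sketch; the function names are made up, and the call stack plays the role of the "memory (stack)" bullet.

```python
def parse_a(text, pos):
    """A ::= "[" B "]" ; returns the position after A, or None if A does not match."""
    if pos < len(text) and text[pos] == "[":
        after_b = parse_b(text, pos + 1)
        if after_b < len(text) and text[after_b] == "]":
            return after_b + 1
    return None


def parse_b(text, pos):
    """B ::= A? ; try A first, otherwise match the empty string."""
    after_a = parse_a(text, pos)
    return pos if after_a is None else after_a


def recognise(text):
    """True if the whole text is derivable from A."""
    return parse_a(text, 0) == len(text)


print(recognise("[[]]"))  # True
print(recognise("[[]"))   # False
print(recognise("[][]"))  # False
```

The grammar only nests brackets, so a sequence such as `[][]` is rejected.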

  25. Context-sensitive ✓ Explainable only in context ✓ Sentence → List End ✓ List → Name; ✓ List → List “,” Name; ✓ “,” Name End → “and” Name ✓ Parsing in exponential time
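The rule with “,” Name End on its left-hand side only fires in one particular context, which is what makes this grammar context-sensitive. The sketch below replays one derivation in Python over lists of symbols; the rule labels (`start`, `single`, `extend`, `and`) and the helper names are assumptions for illustration.

```python
def apply_rule(form, lhs, rhs):
    """Replace the leftmost occurrence of lhs in form by rhs; None if lhs does not occur."""
    n = len(lhs)
    for i in range(len(form) - n + 1):
        if form[i:i + n] == lhs:
            return form[:i] + rhs + form[i + n:]
    return None


# The four rules from the slide, under hypothetical labels.
RULES = {
    "start": (["Sentence"], ["List", "End"]),
    "single": (["List"], ["Name"]),
    "extend": (["List"], ["List", ",", "Name"]),
    "and": ([",", "Name", "End"], ["and", "Name"]),
}


def derive(steps):
    """Apply the named rules in order, starting from Sentence; returns every sentential form."""
    form = ["Sentence"]
    trace = [form]
    for name in steps:
        lhs, rhs = RULES[name]
        form = apply_rule(form, lhs, rhs)
        if form is None:
            raise ValueError(f"rule {name!r} does not apply")
        trace.append(form)
    return trace


trace = derive(["start", "extend", "extend", "single", "and"])
for form in trace:
    print(" ".join(form))
# last line printed: Name , Name and Name
```

Only the last comma is rewritten to "and", because it is the only one followed by Name End.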

  26. Unbounded ✓ (almost) anything ✓ recognising is impossible ✓ parsing is impossible

  27. Which is which? ✓ Substring search ✓ grep, contains(), find(), substring(), … ✓ Substring replacement ✓ sed, awk, perl, vim, replace(), replaceAll(), … ✓ Pretty-printing ✓ VS.NET, Sublime, TextMate, …

  28. Which is which? ✓ Counting [non-empty] lines in a file ✓ wc -l, grep -c "" ✓ grep -v "^$", sed -n /./p | wc -l ✓ Parsing HTML ✓ <BODY><TABLE><P><A HREF=… ✓ Parsing a postcode ✓ 1098 XG, …
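Two of these tasks sit at the low end of the hierarchy and need no parser at all. A hedged sketch in Python: the postcode shape (four digits, optional space, two capital letters) is an assumption generalised from the single example "1098 XG", and the function names are made up.

```python
import re

# Assumed postcode shape: four digits, an optional space, two capital letters.
POSTCODE = re.compile(r"[0-9]{4} ?[A-Z]{2}")


def is_postcode(text):
    """True if the whole text has the assumed postcode shape."""
    return POSTCODE.fullmatch(text) is not None


def count_non_empty_lines(text):
    """Count lines that contain at least one character, like the sed /./p variant."""
    return sum(1 for line in text.split("\n") if line)


print(is_postcode("1098 XG"))              # True
print(count_non_empty_lines("a\n\nb\n"))   # 2
```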

  29. Popular languages ✓ {aⁱbⁿ…} ✓ 0 counters ✓ 1 counter ✓ n counters ✓ ∞ counters ✓ Dyck language ✓ parentheses ✓ named parentheses. Walther von Dyck (1856–1934). Photo: Zeitlupe, https://en.wikipedia.org/wiki/File:Grabstaette_Walther_von_Dyck.jpg, CC-BY-SA, 2012
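The counter view can be made concrete with the Dyck language. In this Python sketch, one counter is enough for a single kind of parenthesis, while several kinds need a stack. Reading "named parentheses" as "several distinguishable bracket kinds" is an interpretation, and the function names are made up.

```python
def dyck_one_counter(word):
    """Balanced ( and ) using a single counter that must never go negative."""
    depth = 0
    for ch in word:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        else:
            return False
    return depth == 0


# Closing bracket -> the opening bracket it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}


def dyck_stack(word):
    """Balanced brackets of several kinds; the stack remembers which kind is open."""
    stack = []
    for ch in word:
        if ch in PAIRS.values():
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        else:
            return False
    return not stack


print(dyck_one_counter("(()())"))  # True
print(dyck_stack("([)]"))          # False
```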

  30. Popular parsers
      Bottom-up ✓ Reduce the input back to the start symbol ✓ Recognise terminals ✓ Replace terminals by nonterminals ✓ Replace terminals and nonterminals by left-hand side of rule ✓ LR, LR(0), LR(1), LR(k), LALR, SLR, GLR, SGLR, CYK, …
      Top-down ✓ Imitate the production process by rederivation ✓ Each nonterminal is a goal ✓ Replace each goal by subgoals (= elements of its rule) ✓ Parse tree is built from top to bottom ✓ LL, LL(1), LL(k), LL(*), GLL, DCG, RD, Packrat, Earley
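The bottom-up idea of reducing the input back to the start symbol can be sketched as a naive shift-reduce loop in Python. The grammar here (A → [ A ] | [ ]) is an assumed example, and the greedy "reduce whenever a right-hand side is on top of the stack" strategy is a simplification that happens to work for this grammar. It is not an LR table construction.

```python
# Hypothetical example grammar: A -> [ A ]  |  [ ]
RULES = [
    ("A", ["[", "A", "]"]),
    ("A", ["[", "]"]),
]


def shift_reduce(tokens):
    """Shift tokens onto a stack and greedily reduce; accept if only the start symbol remains."""
    stack = []
    for tok in tokens:
        stack.append(tok)  # shift
        reduced = True
        while reduced:  # keep reducing while some right-hand side sits on top
            reduced = False
            for lhs, rhs in RULES:
                if stack[-len(rhs):] == rhs:
                    del stack[-len(rhs):]
                    stack.append(lhs)  # replace the right-hand side by the left-hand side
                    reduced = True
                    break
    return stack == ["A"]


print(shift_reduce(list("[[]]")))  # True
print(shift_reduce(list("[][]")))  # False
```

A top-down parser would instead start from A as its goal and expand it into subgoals, one function per nonterminal.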

  31. Popular parsers
      Bottom-up ✓ Reduce the input back to the start symbol ✓ Recognise terminals ✓ Replace terminals by nonterminals ✓ Replace terminals and nonterminals by left-hand side of rule ✓ LR, LR(0), LR(1), LR(k), LALR, SLR, GLR, SGLR, CYK, …
      Top-down ✓ Imitate the production process by rederivation ✓ Each nonterminal is a goal ✓ Replace each goal by subgoals (= elements of its rule) ✓ Parse tree is built from top to bottom ✓ LL, LL(1), LL(k), LL(*), GLL, DCG, RD, Packrat, Earley
      Tools overlaid on the slide: YACC / bison, JavaCC, Beaver, ANTLR, SableCC, ModelCC, GDK, Rascal, Tom, TXL, ASF+SDF, Rats!, Spoofax, PetitParser

  32. Popular data structures ✓ Lists (of tokens) ✓ Trees (hierarchy!) ✓ Forests (many trees) ✓ Graphs (loops!) ✓ Relations (tables)

  33. Conclusion ✓ Parsing recognises structure ✓ Can be many models of a language ✓ Hierarchy of classes ✓ 90% automated, 10% hard work

  34. Lexical syntax ✓ Terminal symbols ✓ finite sublanguage ✓ regular sublanguage ✓ Keywords ✓ Layout ✓ whitespace ✓ comments

  35. Lexical syntax ✓ Terminal symbols ✓ finite sublanguage ✓ regular sublanguage ✓ Keywords ✓ Layout ✓ whitespace ✓ comments
      lexical Boolean = "True" | "False";
      lexical Id = [a-z]+ !>> [a-z];
      keyword Reserved = "if" | "while";
      lexical Id = [a-z]+ \ Reserved !>> [a-z];
      lexical WS = [\ \t\n\r];
      lexical Cm = "--" ... $;
      layout L = (WS|Cm)* !>> [\ \t\n\r] !>> "--";

  36. Lexical syntax: XML
      layout L = [\ \t\n\r]* !>> [\ \t\n\r];
      lexical D = ![\<\>]* !>> ![\<\>];
      lexical T = [a-z][a-z0-9]* !>> [a-z0-9];
      lexical A = [a-z]+ [=] [\"] ![\"]* [\"];
      lexical X = D | "\<" T A* "\>" X+ "\<" "/" T "\>";

  37. Beyond lexical: XML
      layout L = [\ \t\n\r]* !>> [\ \t\n\r];
      lexical D = ![\<\>]* !>> ![\<\>];
      lexical T = [a-z][a-z0-9]* !>> [a-z0-9];
      lexical A = [a-z]+ [=] [\"] ![\"]* [\"];
      lexical X = D | "\<" T L {A L}* "\>" X+ "\<" "/" T "\>";

  38. Beyond lexical: XML ✓ lexical → syntax
      layout L = [\ \t\n\r]* !>> [\ \t\n\r];
      lexical D = ![\<\>]* !>> ![\<\>];
      lexical T = [a-z][a-z0-9]* !>> [a-z0-9];
      lexical A = [a-z]+ [=] [\"] ![\"]* [\"];
      lexical X = D | "\<" T L {A L}* "\>" X+ "\<" "/" T "\>";

  39. Beyond lexical: XML
      layout L = [\ \t\n\r]* !>> [\ \t\n\r];
      syntax D = W+;
      lexical W = ![\ \t\n\r\<\>]+ !>> ![\ \t\n\r\<\>];
      lexical T = [a-z][a-z0-9]* !>> [a-z0-9];
      lexical A = [a-z]+ [=] [\"] ![\"]* [\"];
      syntax X = D | "\<" T A* "\>" X* "\<" "/" T "\>";

  40. Recap: lexical ✓ Terminal: "if" ✓ Character class: [a-z] ✓ Inverse: ![a-z] ✓ Kleene closures: [a-z]+, [a-z]* ✓ Optionals: [a-z]? ✓ Reserve: [a-z]+ \ Keywords ✓ Follow: [a-z]+ !>> [a-z]
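For readers more at home with regular expressions, the follow restriction and the keyword reservation have rough counterparts in Python's `re` module: a negative lookahead `(?![a-z])` plays the role of `!>> [a-z]`, and a set lookup plays the role of `\ Keywords`. This is a sketch under assumptions: the token names, the `tokens` function, and the comment syntax (`--` up to the end of the line) are illustrative guesses, not a translation of the grammar notation's exact semantics.

```python
import re

RESERVED = {"if", "while"}

TOKEN = re.compile(
    r"(?P<word>[a-z]+(?![a-z]))"          # [a-z]+ !>> [a-z]  (longest match)
    r"|(?P<layout>[ \t\n\r]+|--[^\n]*)"   # whitespace, or an assumed "--" line comment
)


def tokens(text):
    """Split text into (kind, lexeme) pairs, skipping layout."""
    pos = 0
    out = []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        if m.lastgroup == "word":
            word = m.group()
            # [a-z]+ \ Reserved: a reserved word is never an identifier
            out.append(("Reserved" if word in RESERVED else "Id", word))
        pos = m.end()
    return out


print(tokens("if x -- note\nwhile yy"))
# [('Reserved', 'if'), ('Id', 'x'), ('Reserved', 'while'), ('Id', 'yy')]
```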
