
Parsing



  1. UvA MSc SE: Software Construction 2015 • Dr. Vadim Zaytsev, Universiteit van Amsterdam • (word cloud) MetaEnvironment QBasic Eclipse LaTeX Ξ BGF PHP M3 SVG LDF jQuery TSR FST CSS JCL BGF DCG XSD C++ PDG XHTML SQL Parsing DMS BNF git XCK JS Pascal Inkscape LCF Assembly Graphviz ksh GWBasic GraphML DHTML Erlang CodeSurfer ATL Delphi C# yED with Java XLDF Prolog Ruby MediaWiki sh Promela Markdown OS/400 EMF CGI C GIF DITA COBOL HTML dot FPU Python Haskell Flash Grammars 80x86 CRC SPARQL Matlab GrammarLab Wikia Blowfish GDK JSON PCRE Wikidot Turbo Vision Ecore phpbb LCI Smalltalk CVS Rascal Wordpress XBGF EBNF VB Jenkins ASF EDD bibTeX Ada Subversion XSLT WinIce Maple SDF HASP DeGlucker JAXB Django XML SoftIce Grammar Hunter IDA Scheme SPIN Zope Unlambda DTD GRK Grammar Hawk make Perl ANTLR

  2. Grammars & parsing are among the most established areas of CS/SE

  3. N. Chomsky, Syntactic Structures, 1957

  4. N. Chomsky, Aspects of the Theory of Syntax, 1965

  5. A.V. Aho & J.D. Ullman, The Theory of Parsing, Translation and Compiling, Volumes I + II, 1972

  6. A.V. Aho, R. Sethi, J.D. Ullman, Compilers: Principles, Techniques and Tools, 1986

  7. D. Grune, C.J.H. Jacobs, Parsing Techniques: A Practical Guide, 2nd ed., 2008

  8. Why are grammars and parsing relevant?

  9. Language • Programming languages: C, Java, C#, JavaScript • Markup languages: HTML, XML, TeX, Markdown, wikis • Domain-specific languages: BibTeX, CSS, SQL, QL • Data formats: JSON, log files, protocol data, bytecode • … • (formally: a set of strings)

  10. How to define a language? • List all the sentences! • Infinite languages? • Finite recipes = grammars • Infinite grammars? • Two level grammars

  11. Example • Valid sentences/programs/instances: • Alice • Alice and Bob • Alice, Bob and Coen • Alice, Bob, Coen and Daenerys • … • How to define a recipe?

  12. ABCD Grammar • Name → Alice • Name → Bob • Name → Coen • Name → Daenerys • Sentence → List End • List → Name • List → List , Name • , Name End → and Name

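  13. ABCD Grammar • Name → Alice • Name → Bob • Name → Coen • Name → Daenerys • Sentence → List End • List → Name • List → List , Name • , Name End → and Name • Callout: Terminal symbols (Alice, Bob, Coen, Daenerys, the comma, and)

  14. ABCD Grammar • Name → Alice • Name → Bob • Name → Coen • Name → Daenerys • Sentence → List End • List → Name • List → List , Name • , Name End → and Name • Callouts: Terminal symbols (Alice, Bob, Coen, Daenerys, the comma, and) • Nonterminal symbols (Name, Sentence, List, End)

  15. ABCD Grammar • Name → Alice • Name → Bob • Name → Coen • Name → Daenerys • Sentence → List End • List → Name • List → List , Name • , Name End → and Name • Callouts: Terminal symbols (Alice, Bob, Coen, Daenerys, the comma, and) • Nonterminal symbols (Name, Sentence, List, End) • Starting symbol (Sentence)

  16. ABCD Grammar • Name → Alice • Name → Bob • Name → Coen • Name → Daenerys • Sentence → List End • List → Name • List → List , Name • , Name End → and Name • Callouts: Terminal symbols (Alice, Bob, Coen, Daenerys, the comma, and) • Nonterminal symbols (Name, Sentence, List, End) • Starting symbol (Sentence) • Production rules (each line of the form left-hand side → right-hand side)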

  17. Using ABCD • Sentence: Alice and Bob • Production (generative semantics): Sentence → List End → List , Name End → Name , Name End → Name and Name → Alice and Name → Alice and Bob • Recognition (analytic semantics): Alice and Bob → Name and Bob → Name and Name → Name , Name End → List , Name End → List End → Sentence
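The generative derivation on this slide can be replayed mechanically. The following is a minimal sketch, assuming the eight ABCD rules are stored as pairs of token tuples; the names `RULES`, `rewrite`, `derive` and `STEPS` are illustrative and not part of the original slides. Each step names a rule and the token position where its left-hand side must match, so the last rule (`, Name End → and Name`), whose left-hand side has three symbols, is handled the same way as the others.

```python
# ABCD grammar: each rule is (left-hand side, right-hand side), both token tuples.
RULES = [
    (("Name",), ("Alice",)),                   # 0
    (("Name",), ("Bob",)),                     # 1
    (("Name",), ("Coen",)),                    # 2
    (("Name",), ("Daenerys",)),                # 3
    (("Sentence",), ("List", "End")),          # 4
    (("List",), ("Name",)),                    # 5
    (("List",), ("List", ",", "Name")),        # 6
    ((",", "Name", "End"), ("and", "Name")),   # 7: left-hand side has three symbols
]


def rewrite(form, rule_index, position):
    """Apply one rule at a token position; fail if its left-hand side is not there."""
    lhs, rhs = RULES[rule_index]
    if tuple(form[position:position + len(lhs)]) != lhs:
        raise ValueError(f"rule {rule_index} does not match at position {position}")
    return form[:position] + list(rhs) + form[position + len(lhs):]


def derive(steps, start=("Sentence",)):
    """Replay a list of (rule_index, position) steps and return every sentential form."""
    form = list(start)
    trace = [" ".join(form)]
    for rule_index, position in steps:
        form = rewrite(form, rule_index, position)
        trace.append(" ".join(form))
    return trace


# The derivation of "Alice and Bob" from the slide, step by step.
STEPS = [(4, 0), (6, 0), (5, 0), (7, 1), (0, 0), (1, 2)]

if __name__ == "__main__":
    print(" → ".join(derive(STEPS)))
```

Reading the resulting trace from right to left gives the analytic direction: the same rules are applied backwards until only the starting symbol remains.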

  18. Notations • Name → Alice | Bob | Coen | Daenerys • Name → "Alice" | "Bob" | "Coen" | "Daenerys" • Name → “Alice” | “Bob” | “Coen” | “Daenerys” • ⟨ Name ⟩ → Alice | Bob | Coen | Daenerys • ⟨ Name ⟩ → “Alice” | “Bob” | “Coen” | “Daenerys”

  19. Notations • List → List , Name • ⟨ List ⟩ ::= ⟨ List ⟩ "," ⟨ Name ⟩ ; • List "," Name -> List • List <- List ',' Name • define List [List] , [Name] end define • syntax List = List "," Name; • List -> List ',' Name : ['$1'|'$3'].

  20. Common metaconstructs • Choice (disjunction): A | B, A / B • Optional symbols: A?, [A] • Zero or more (Kleene star): A*, {A} • One or more: A+ • Less common (careful!): conjunction, negation, exact repetition, reference naming, priorities

  21. Chomsky-Schützenberger hierarchy • Type-0: Recursively enumerable • Rules: α → β (unrestricted) • Type-1: Context-sensitive • Rules: α A β → αγβ • Type-2: Context-free • Rules: A → γ • Type-3: Regular • Rules: A → a and A → aB Noam Chomsky. On Certain Formal Properties of Grammars, Information & Control 2(2):137–167, 1959.

  22. CFG for ABCD • ⟨Name⟩ → “Alice” | “Bob” | “Coen” | “Daenerys” • ⟨Sentence⟩ → ⟨Name⟩ | ⟨List⟩ “and” ⟨Name⟩ • ⟨List⟩ → ⟨Name⟩ “,” ⟨List⟩ | ⟨Name⟩ • Alternative: ⟨List⟩ → ⟨Name⟩ (“,” ⟨Name⟩)* • Alternative: ⟨List⟩ → {⟨Name⟩ “,”}+
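A context-free grammar of this size maps almost directly onto a hand-written recognizer. The sketch below assumes the iterative form ⟨List⟩ → ⟨Name⟩ (“,” ⟨Name⟩)* together with ⟨Sentence⟩ → ⟨Name⟩ | ⟨List⟩ “and” ⟨Name⟩; the function names `tokenize` and `parse_sentence` and the choice to return the list of names are illustrative assumptions, not something the slides prescribe.

```python
import re

NAMES = {"Alice", "Bob", "Coen", "Daenerys"}


def tokenize(text):
    """Split into words and single non-space characters (so stray symbols are kept)."""
    return re.findall(r"\w+|\S", text)


def parse_sentence(text):
    """Recognize a Sentence and return its names, or raise SyntaxError."""
    tokens = tokenize(text)
    pos = 0

    def name():
        # Name → "Alice" | "Bob" | "Coen" | "Daenerys"
        nonlocal pos
        if pos < len(tokens) and tokens[pos] in NAMES:
            pos += 1
            return tokens[pos - 1]
        raise SyntaxError(f"expected a name at token {pos}")

    names = [name()]
    if pos == len(tokens):
        return names                      # Sentence → Name

    # List → Name ("," Name)*
    while pos < len(tokens) and tokens[pos] == ",":
        pos += 1
        names.append(name())

    # Sentence → List "and" Name
    if pos >= len(tokens) or tokens[pos] != "and":
        raise SyntaxError(f"expected 'and' at token {pos}")
    pos += 1
    names.append(name())

    if pos != len(tokens):
        raise SyntaxError(f"unexpected input at token {pos}")
    return names


if __name__ == "__main__":
    print(parse_sentence("Alice, Bob and Coen"))
```

The loop plays the role of the Kleene star in the grammar, which is why the iterative notation tends to be more convenient for hand-written parsers than the recursive one.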

  23. Regexp for ABCD • ^\w+((, \w+)* and \w+)?$ • S. C. Kleene, Representation of Events in Nerve Nets and Finite Automata. In Automata Studies, pp. 3–42, 1956. • Photo from: Konrad Jacobs, S. C. Kleene, 1978, MFO.
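The regular expression from this slide can be tried out directly. The following sketch uses Python's standard `re` module; the names `ABCD` and `is_abcd` are illustrative. Note that `\w+` accepts any word, so the pattern checks only the shape of the sentence, not whether the names are Alice, Bob, Coen or Daenerys.

```python
import re

# The pattern from the slide, unchanged.
ABCD = re.compile(r"^\w+((, \w+)* and \w+)?$")


def is_abcd(text):
    """Return True if the text has the shape 'A', 'A and B', 'A, B and C', ..."""
    return ABCD.match(text) is not None


if __name__ == "__main__":
    for sample in ["Alice", "Alice and Bob", "Alice, Bob and Coen", "Alice, Bob"]:
        print(sample, "=>", is_abcd(sample))
```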

  24. Rose by Arwen Grune; p.58 of Grune/Jacobs’ “Parsing Techniques”, 2008

  25. Finite world • Explicitly given lists • Acyclic automata • Finite choice grammars (non-recursive, non-iterating) • e.g., users, keywords, postcodes

  26. Regular world • Regular expressions • Finite automata • Grammars: • A → a • A → aB • e.g., substring search, substring replace, counting
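Grammars restricted to rules of the form A → a and A → aB behave like finite automata: each nonterminal is a state and each rule is a transition. The sketch below illustrates this under assumptions: the example grammar S → aS | bS | b (strings over a and b that end in b) and the names `GRAMMAR`, `ACCEPT` and `accepts` are hypothetical and do not come from the slides.

```python
# Hypothetical regular grammar: S → aS | bS | b
# Each rule is (terminal, next nonterminal); None marks a rule of the form A → a.
GRAMMAR = {
    "S": [("a", "S"), ("b", "S"), ("b", None)],
}

ACCEPT = object()  # extra state reached after a rule of the form A → a


def accepts(grammar, start, text):
    """Simulate the grammar as a nondeterministic finite automaton."""
    states = {start}
    for ch in text:
        next_states = set()
        for state in states:
            if state is ACCEPT:
                continue  # nothing may follow a terminating rule
            for terminal, target in grammar.get(state, []):
                if terminal == ch:
                    next_states.add(ACCEPT if target is None else target)
        states = next_states
    return ACCEPT in states


if __name__ == "__main__":
    for sample in ["b", "ab", "aba", "abab"]:
        print(sample, "=>", accepts(GRAMMAR, "S", sample))
```

Tracking a set of states rather than a single one avoids backtracking, since the rules S → bS and S → b both apply to the same input symbol.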
