  1. More Than Parsing (http://babel.ls.fi.upm.es/research/mtp/). A. Herranz (1), P. Nogueira (2). (1) Facultad de Informática, Universidad Politécnica de Madrid. (2) School of CS, University of Nottingham. PROLE 2005. Herranz, Nogueira (UPM, U. Nottingham) MTP PROLE’05 1 / 29

  2. Conclusions... :) GONF is a formalism for specifying both concrete and structured abstract syntax. Syntactic and semantic restrictions and parameterised non-terminals impose the abstraction process at the design level. GONF specifications are language-independent definitions of data types, as reflected in the concrete grammar description. It is a minimal formalism that suits a variety of generation schemes and implementation languages. The formalism has been tested in different developments (SLAM, MTP). GONF-based tool: MTP.

  3. Motivation. Our group is involved in language design and development, with evolving prototypes. Good programming practices are needed: the boundary between the front-end (parsing and structured abstract syntax generation) and the back-end relies on the abstract syntax. We are only interested in the impact on the back-end, but changing the front-end (parsing + AST generation) is tedious and time-consuming. Ordinary tools do not help: the grammar gets cluttered with semantic actions.

  4. Semantic Actions. Most language tools are just parser generators. The abstract syntax tree (AST) scheme is defined by hand in the implementation language, and semantic actions generate the AST node that represents a sentence.
  Example (YACC-like production):
    fun_decl ::= id "(" { /* Actions in C */ } opt_params ")" "{" decls stmts "}" { /* Actions in C */ };
  ◮ Parsing-method dependent.
  ◮ Non-cohesive.
  ◮ Difficult to maintain.
  More recent tools come to the aid.

  5. Our Aims. A formalism and a tool. Just one file: concrete and structured abstract syntax in one go!
  ◮ Good-quality AST scheme generation.
  ◮ Traversal pattern scheme generation.
  ◮ Parser generation: syntax analysis + AST construction.
  Language independent. Impose the AST design directly on the formalism for concrete syntax:
  ◮ Think in terms of the abstract structure while the concrete syntax is being described.
  ◮ Minimise annotations (no semantic actions).
  Improve productivity.

  6. Backus-Naur Form (BNF). CFGs are type definitions: in a production $a \to \alpha_1 \mid \dots \mid \alpha_n$, $a$ is a non-terminal and each $\alpha_i$ is a sequence of symbols. Non-terminals represent sets of sentences and can be represented as sums and products.
  Example (BNF production): stmts → stmt | stmts stmt
  Example (type definition): Stmts = Stmt + Stmts × Stmt
  Sentences are represented as trees with tokens in their leaves.

  7. BNF (contd.). Example: Stmts = Stmt + Stmts × Stmt. The realisation is easy.
  Algebraic approach (Haskell):
    data Stmts = Alt1 Stmt | Alt2 Stmts Stmt
  OO approach (Java):
    abstract class Stmts {...}
    class Alt1 extends Stmts {Stmt stmt; ...}
    class Alt2 extends Stmts {Stmts stmts; Stmt stmt; ...}
  In the type systems of ordinary imperative languages the realisation is a bit more complicated. The types do not reflect the abstract structure naturally. Force the designer to introduce names.
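A minimal Haskell sketch of the direct BNF realisation, assuming a placeholder `Stmt` type (hypothetical; the slides leave it abstract). It shows the left-nested tree that `stmts → stmts stmt` imposes, and the extra traversal needed to recover the flat statement sequence a back-end usually wants.

```haskell
-- Hypothetical placeholder for the statement type; the slides leave Stmt abstract.
newtype Stmt = Stmt String
  deriving (Eq, Show)

-- Direct realisation of Stmts = Stmt + Stmts × Stmt (BNF: stmts → stmt | stmts stmt).
-- Alt1/Alt2 are the kind of arbitrary constructor names plain BNF leaves us with.
data Stmts
  = Alt1 Stmt        -- stmts → stmt
  | Alt2 Stmts Stmt  -- stmts → stmts stmt
  deriving (Eq, Show)

-- Build the tree a left-recursive parse produces for s1 s2 ... sn
-- (at least one statement is required, as in the grammar).
fromStatements :: Stmt -> [Stmt] -> Stmts
fromStatements first rest = foldl Alt2 (Alt1 first) rest

-- Recover the flat sequence that the abstract structure really is.
toStatements :: Stmts -> [Stmt]
toStatements (Alt1 s)    = [s]
toStatements (Alt2 ss s) = toStatements ss ++ [s]
```

For example, `fromStatements (Stmt "a") [Stmt "b", Stmt "c"]` yields `Alt2 (Alt2 (Alt1 (Stmt "a")) (Stmt "b")) (Stmt "c")`, which `toStatements` turns back into the three-element list.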

  8. Object Normal Form (ONF), Wu & Wang. Classification (is-a): $a \to a_1 \mid \dots \mid a_n$. Structure (has-a): $b \to x_1 \dots x_m$. ONF reduces the distance between the concrete syntax and the language's abstract structure.
  Example (not ONF):
    stmt → var_name ":=" expr | fun_name "(" arg_list ")"
  Example (ONF):
    stmt → assign | fun_call
    assign → var_name ":=" expr
    fun_call → fun_name "(" arg_list ")"
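The ONF version maps naturally onto one Haskell type per non-terminal. The sketch below assumes string placeholders for `var_name`, `fun_name`, `expr` and `arg_list` (hypothetical; the slides do not define them) and names the classification constructors `AssignToStmt`/`FunCallToStmt`, following the `XToY` scheme the deck uses for classifications. The `render` helper and its comma-separated argument layout are illustrative assumptions.

```haskell
import Data.List (intercalate)

-- Hypothetical placeholders: the slides do not define these non-terminals.
type VarName = String
type FunName = String
type Expr    = String
type ArgList = [Expr]

-- Classification (is-a): stmt → assign | fun_call
data Stmt
  = AssignToStmt Assign
  | FunCallToStmt FunCall
  deriving (Eq, Show)

-- Structure (has-a): assign → var_name ":=" expr
data Assign = Assign VarName Expr
  deriving (Eq, Show)

-- Structure (has-a): fun_call → fun_name "(" arg_list ")"
data FunCall = FunCall FunName ArgList
  deriving (Eq, Show)

-- Unparse to concrete syntax: the tokens ":=", "(" and ")" reappear here
-- even though the AST does not store them.
render :: Stmt -> String
render (AssignToStmt (Assign v e))      = v ++ " := " ++ e
render (FunCallToStmt (FunCall f args)) = f ++ "(" ++ intercalate ", " args ++ ")"
```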

  9. “Extended” ONF (EONF). But names are not enough: unnatural structures emerge.
  Example (ONF):
    stmt_list → stmt_list_branch | stmt
    stmt_list_branch → stmt_list stmt
  Iteratives and optionals can help by providing a suitable abstract structure:
  Example (EONF):
    stmt_list → stmt+
  Example (Haskell and Java):
    type StmtList = [Stmt]
    class StmtList { public NESeq<Stmt> stmtSeq1; ... }
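To make the EONF point concrete, here is a Haskell sketch (placeholder `Stmt`, hypothetical constructor names) showing that the two ONF productions and `stmt_list → stmt+` carry the same information. It uses `Data.List.NonEmpty` because `stmt+` excludes the empty sequence, whereas the Haskell type `[Stmt]` admits it.

```haskell
import Data.List.NonEmpty (NonEmpty (..))

-- Hypothetical placeholder statement type.
newtype Stmt = Stmt String
  deriving (Eq, Show)

-- ONF reading:  stmt_list → stmt_list_branch | stmt
--               stmt_list_branch → stmt_list stmt
data StmtList
  = BranchToStmtList StmtListBranch
  | StmtToStmtList Stmt
  deriving (Eq, Show)

data StmtListBranch = StmtListBranch StmtList Stmt
  deriving (Eq, Show)

-- EONF reading:  stmt_list → stmt+   (a non-empty sequence)
type StmtListE = NonEmpty Stmt

-- The two types are isomorphic; these conversions witness it.
toEONF :: StmtList -> StmtListE
toEONF (StmtToStmtList s)                      = s :| []
toEONF (BranchToStmtList (StmtListBranch l s)) = toEONF l <> (s :| [])

fromEONF :: StmtListE -> StmtList
fromEONF (s :| rest) = foldl step (StmtToStmtList s) rest
  where
    step acc x = BranchToStmtList (StmtListBranch acc x)
```

The back-end only ever needs `StmtListE`; the ONF types exist purely because of how the concrete grammar was written.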

  10. Iteratives and Optionals. Iteratives and optionals have natural abstract structures in the different approaches, so better ASTs are obtained from EONF descriptions, but...
  Example (EONF):
    record → "RECORD" (var_id ":" type ";")+ "END"
  Nameless composite types are needed: Seq(VarId × Type). Moreover, nameless composites can get out of hand: (x (y z)* w)+. Force the designer to introduce names:
  Example:
    record → "RECORD" field+ "END"
    field → var_id ":" type_id ";"
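A Haskell sketch of the two readings of `record`, assuming string placeholders for `var_id` and `type`/`type_id` (hypothetical). The EONF version needs a nameless pair type; naming the group as `field` gives a record type whose components have meaningful names. The `nameFields` and `fieldNames` helpers are illustrative only.

```haskell
import Data.List.NonEmpty (NonEmpty (..), toList)

-- Hypothetical placeholders for var_id and type / type_id.
type VarId  = String
type TypeId = String

-- EONF reading of:  record → "RECORD" (var_id ":" type ";")+ "END"
-- The repeated group has no name, so its type is a nameless composite.
type RecordNameless = NonEmpty (VarId, TypeId)

-- Named reading of:  record → "RECORD" field+ "END"
--                    field  → var_id ":" type_id ";"
data Field = Field { fieldVar :: VarId, fieldType :: TypeId }
  deriving (Eq, Show)

newtype Record = Record (NonEmpty Field)
  deriving (Eq, Show)

-- Naming the group costs one declaration; converting is mechanical.
nameFields :: RecordNameless -> Record
nameFields = Record . fmap (uncurry Field)

-- Back-end code can now use the selector names instead of fst/snd.
fieldNames :: Record -> [VarId]
fieldNames (Record fs) = map fieldVar (toList fs)
```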

  11. Generalised ONF. A more general and proper extension: designer-defined containers as generic (parameterised) non-terminals. This yields more concise and reusable grammars and better AST definitions.
  Example (GONF):
    list(x, t) → x (t x)*
    arg_list → list(arg, ",")
    stmt_list → list(stmt, ";")
  Parameterised non-terminals define parameterised containers:
  Example (C++):
    template <typename X> class List { X x; Seq<X> xSeq; };
    typedef List<Arg> ArgList;
    typedef List<Stmt> StmtList;
  Related: macro grammars, Thiemann & Neubauer.
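The parameterised production `list(x, t) → x (t x)*` corresponds to one polymorphic type and one generic parser. The sketch below uses a tiny hand-rolled parser over token lists (hypothetical, not MTP's generated code) and treats lower-case identifiers as `arg` for the `arg_list` instance. The separator `t` is parsed but discarded, as in the C++ template above.

```haskell
-- Container for list(x, t) → x (t x)*: one x plus a sequence of x.
-- The separator t is disposable, so it is not stored.
data List x = List x [x]
  deriving (Eq, Show)

-- A minimal parser over tokens, for illustration only (hypothetical).
newtype Parser a = Parser { runParser :: [String] -> Maybe (a, [String]) }

-- Accept one specific token.
token :: String -> Parser String
token t = Parser $ \ts -> case ts of
  (t' : rest) | t' == t -> Just (t, rest)
  _                     -> Nothing

-- Accept any token satisfying a predicate.
satisfy :: (String -> Bool) -> Parser String
satisfy p = Parser $ \ts -> case ts of
  (t : rest) | p t -> Just (t, rest)
  _                -> Nothing

-- One generic parser serves every instance of list(x, t).
list :: Parser x -> Parser sep -> Parser (List x)
list px psep = Parser $ \ts -> do
  (first, ts1) <- runParser px ts
  let go acc input = case runParser psep input of
        Nothing -> Just (reverse acc, input)    -- no separator: the list ends
        Just (_, afterSep) -> do                -- separator seen: an x is required
          (x, rest) <- runParser px afterSep
          go (x : acc) rest
  (others, ts2) <- go [] ts1
  Just (List first others, ts2)

-- Instance for arg_list → list(arg, ","), with identifiers standing in for arg.
type ArgList = List String

argList :: Parser ArgList
argList = list (satisfy isIdent) (token ",")
  where
    isIdent w = not (null w) && all (`elem` ['a' .. 'z']) w
```

Because an `x` is required after each separator, a trailing `,` makes the whole parse fail instead of silently stopping before it.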

  12. GONF Formalisation (Syntax)
    grammar       → production+
    production    → nonterm "→" rhs ";"
    nonterm       → NONTERM formals?
    formals       → parlist(VAR, ",")
    rhs           → classif | struct
    classif       → nonterm ("|" nonterm)+
    struct        → lab_constr+
    lab_constr    → (LAB ":")? constr
    constr        → terminal | non_terminal | sugared | var
    terminal      → TERM
    non_terminal  → NONTERM actuals?
    actuals       → parlist(actual, ",")
    actual        → constr+
    sugared       → "(" constr+ ")" post
    post          → opt | seq0 | seq1
    opt           → "?"
    seq0          → "*"
    seq1          → "+"
    var           → VAR
    parlist(x, t) → "(" x (t x)* ")"
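Applying the deck's own scheme to the GONF grammar above gives a compact AST for GONF itself. The following Haskell sketch is a hand-written reading (all type and constructor names are hypothetical, not MTP output): classifications become sums, structures become products, the post-fix sugar becomes an enumeration, and a printer maps productions back to concrete syntax.

```haskell
import Data.List (intercalate)

-- post → opt | seq0 | seq1   ("?" | "*" | "+")
data Post = Opt | Seq0 | Seq1
  deriving (Eq, Show)

-- constr → terminal | non_terminal | sugared | var
data Constr
  = CTerminal String                 -- terminal → TERM
  | CNonTerminal String [[Constr]]   -- non_terminal → NONTERM actuals?  (actual → constr+)
  | CSugared [Constr] Post           -- sugared → "(" constr+ ")" post
  | CVar String                      -- var → VAR
  deriving (Eq, Show)

-- lab_constr → (LAB ":")? constr
data LabConstr = LabConstr (Maybe String) Constr
  deriving (Eq, Show)

-- nonterm → NONTERM formals?
data NonTerm = NonTerm String [String]
  deriving (Eq, Show)

-- rhs → classif | struct
data Rhs
  = Classif [NonTerm]                -- nonterm ("|" nonterm)+
  | Struct [LabConstr]               -- lab_constr+
  deriving (Eq, Show)

-- production → nonterm "→" rhs ";"
data Production = Production NonTerm Rhs
  deriving (Eq, Show)

showProduction :: Production -> String
showProduction (Production lhs rhs) = showNonTerm lhs ++ " → " ++ showRhs rhs ++ ";"

showRhs :: Rhs -> String
showRhs (Classif alts) = intercalate " | " (map showNonTerm alts)
showRhs (Struct cs)    = unwords (map showLabConstr cs)

showLabConstr :: LabConstr -> String
showLabConstr (LabConstr Nothing c)    = showConstr c
showLabConstr (LabConstr (Just lab) c) = lab ++ ": " ++ showConstr c

showNonTerm :: NonTerm -> String
showNonTerm (NonTerm n []) = n
showNonTerm (NonTerm n fs) = n ++ "(" ++ intercalate ", " fs ++ ")"

showConstr :: Constr -> String
showConstr (CTerminal t)         = t
showConstr (CVar v)              = v
showConstr (CNonTerminal n [])   = n
showConstr (CNonTerminal n acts) =
  n ++ "(" ++ intercalate ", " (map (unwords . map showConstr) acts) ++ ")"
showConstr (CSugared cs p)       = "(" ++ unwords (map showConstr cs) ++ ")" ++ showPost p

showPost :: Post -> String
showPost Opt  = "?"
showPost Seq0 = "*"
showPost Seq1 = "+"
```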

  13. GONF Formalisation (contd.). Iteratives and optionals are treated as syntactic sugar for built-in parameterised non-terminals. Contextual analysis restricts every actual parameter to a sequence of constructs in which at most one element carries information.
  Example (non-valid GONF):
    record → "RECORD" (var_id ":" type ";")+ "END"
  Example (GONF):
    record → "RECORD" field+ "END"
    field → var_id ":" type_id ";"

  14. Disposable Terminals. Symbols with information are those that define AST nodes.
  Example (GONF):
    field → ID COLON type SEMICOLON
  Suppose ID is a terminal with cardinality greater than 1, and COLON and SEMICOLON are terminals with cardinality equal to 1:
  Example: Field = Terminal × Type
  Actual parameters are restricted to only one informative symbol:
  Example (valid GONF production):
    stmts → (stmt SEMICOLON)*
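The informative-symbol rule can be sketched as a filter over right-hand-side symbols. This Haskell model is an assumption: it only distinguishes terminals by cardinality (one lexeme vs. many) and treats every non-terminal as informative; how formal parameters such as `x` and `t` are handled is not covered here. The names `astComponents` and `validActual` are hypothetical.

```haskell
-- Cardinality of a terminal: exactly one lexeme (e.g. COLON) or many (e.g. ID).
data Cardinality = Single | Multiple
  deriving (Eq, Show)

-- Right-hand-side symbols, as far as this check needs them (hypothetical model).
data Symbol
  = Term String Cardinality
  | NonTermSym String
  deriving (Eq, Show)

-- A terminal with a single lexeme carries no information: it is disposable.
informative :: Symbol -> Bool
informative (Term _ Single) = False
informative _               = True

-- Only informative symbols become components of the AST node.
astComponents :: [Symbol] -> [Symbol]
astComponents = filter informative

-- Contextual restriction: an actual parameter may contain
-- at most one informative symbol.
validActual :: [Symbol] -> Bool
validActual syms = length (astComponents syms) <= 1
```

For `field → ID COLON type SEMICOLON` the components are `ID` and `type`, matching Field = Terminal × Type, and the group `(var_id ":" type ";")` is rejected because it has two informative symbols.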

  15. AST Schemes from GONF
  Classifications:
  ◮ Subclassing.
  ◮ Disjoint sums.
  Structures:
  ◮ Named composition (record fields or attributes).
  Parameterised non-terminals:
  ◮ Parametric polymorphic types.
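A toy generator makes the mapping concrete. The `Prod` and `Comp` types below are a hypothetical, simplified input format (not MTP's): a classification becomes a disjoint sum with `XToY` constructors, a structure becomes a single-constructor product over its informative symbols, and formal parameters that survive in the node become type variables.

```haskell
import Data.Char (toUpper)
import Data.List (intercalate)

-- An informative component: a single symbol or a repeated one (from * or +).
data Comp = One String | Many String

-- Simplified productions (hypothetical input format for this sketch).
data Prod
  = Classification String [String]     -- a → a1 | ... | an
  | Structure String [String] [Comp]   -- b(formals) → informative symbols only

-- snake_case → CamelCase, e.g. "type_expr" → "TypeExpr".
camel :: String -> String
camel = concatMap cap . words . map (\c -> if c == '_' then ' ' else c)
  where
    cap (c : cs) = toUpper c : cs
    cap []       = []

compName :: Comp -> String
compName (One n)  = n
compName (Many n) = n

-- Classification → disjoint sum; structure → product;
-- formals that occur in the node → type variables.
toHaskell :: Prod -> String
toHaskell (Classification a alts) =
  "data " ++ camel a ++ " = "
    ++ intercalate " | " [camel ai ++ "To" ++ camel a ++ " " ++ camel ai | ai <- alts]
toHaskell (Structure b formals comps) =
  "data " ++ unwords (camel b : params) ++ " = " ++ unwords (camel b : map renderComp comps)
  where
    params = [f | f <- formals, f `elem` map compName comps]
    ty n | n `elem` formals = n
         | otherwise        = camel n
    renderComp (One n)  = ty n
    renderComp (Many n) = "[" ++ ty n ++ "]"
```

With these definitions, `toHaskell (Classification "type_expr" ["simple_name", "qualified_name"])` yields `data TypeExpr = SimpleNameToTypeExpr SimpleName | QualifiedNameToTypeExpr QualifiedName`, and `toHaskell (Structure "list" ["x", "t"] [One "x", Many "x"])` yields `data List x = List x [x]`, dropping the disposable separator parameter `t`.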

  16. Classification as Subclassing (Practice). Interpreting classifications as is-a relationships is, in many cases, spurious.
  Example (spurious is-a relation):
    type_expr → simple_name | qualified_name
    fun_call → simple_name "(" arg_list ")"
  If a simple_name is-a type_expr, then a function name is a type expression (!?). At the conceptual level we are most likely talking about UML roles, which can be simulated:
  Example (role simulation):
    type_expr → simple_type_name | qualified_type_name
    simple_type_name → simple_name
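In a typed AST the role simulation keeps the two uses of `simple_name` apart. In this Haskell sketch, `SimpleName` as a string wrapper, qualified names as dot-separated paths, and call arguments simplified to names are all assumptions, and the helper functions are hypothetical. A function name has type `SimpleName` and only becomes part of a `TypeExpr` when explicitly wrapped in the `SimpleTypeName` role.

```haskell
import Data.List (intercalate)

-- Hypothetical placeholder: a simple name is an identifier.
newtype SimpleName = SimpleName String
  deriving (Eq, Show)

-- Role simulation:  simple_type_name → simple_name
newtype SimpleTypeName = SimpleTypeName SimpleName
  deriving (Eq, Show)

-- Hypothetical: a qualified name as a path of simple names.
newtype QualifiedTypeName = QualifiedTypeName [SimpleName]
  deriving (Eq, Show)

-- type_expr → simple_type_name | qualified_type_name
data TypeExpr
  = SimpleTypeNameToTypeExpr SimpleTypeName
  | QualifiedTypeNameToTypeExpr QualifiedTypeName
  deriving (Eq, Show)

-- fun_call → simple_name "(" arg_list ")"   (arguments simplified to names)
data FunCall = FunCall SimpleName [SimpleName]
  deriving (Eq, Show)

callee :: FunCall -> SimpleName
callee (FunCall f _) = f

-- The only way to use a simple name as a type expression is to
-- assign it the SimpleTypeName role explicitly.
asTypeExpr :: SimpleName -> TypeExpr
asTypeExpr = SimpleTypeNameToTypeExpr . SimpleTypeName

typeExprName :: TypeExpr -> String
typeExprName (SimpleTypeNameToTypeExpr (SimpleTypeName (SimpleName n))) = n
typeExprName (QualifiedTypeNameToTypeExpr (QualifiedTypeName ns)) =
  intercalate "." [n | SimpleName n <- ns]
```

Passing `callee call` directly where a `TypeExpr` is expected is a type error; the conversion has to go through `asTypeExpr`, which makes the role visible in the code.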

  17. Classification as Disjoint Sums (Practice). Interpreting classifications as algebraic type definitions is much more natural.
  Example (ONF):
    type_expr → simple_name | qualified_name
  Example (Haskell):
    data TypeExpr = SimpleNameToTypeExpr SimpleName
                  | QualifiedNameToTypeExpr QualifiedName
  The automatically generated constructors are meaningful:
    SimpleNameToTypeExpr :: SimpleName -> TypeExpr
    QualifiedNameToTypeExpr :: QualifiedName -> TypeExpr
  Algebraic types can be simulated in OO by using the State design pattern.

  18. More Than Parsing (MTP). MTP is a GONF-based tool. It generates the AST representation from a GONF specification, and it generates a parser that builds the AST nodes. MTP (v0.1) deals with practical issues:
  ◮ Modularisation.
  ◮ Lexical analysis.
  ◮ Grammar analysis and transformation (LL(1)).
  ◮ Automatic error recovery.
  ◮ Awareness of the target language and its practices (Java 1.4).
  ◮ Syntactic sugar (precedence, associativity).
  Practices checked by bootstrapping in v0.3.
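What a generated LL(1) parser does can be sketched by hand. In the ONF grammar `stmt → assign | fun_call`, both alternatives begin with an identifier, so a grammar transformation (left factoring into `stmt → ID stmt_tail`) is needed before one token of look-ahead can choose. This Haskell sketch is illustrative only (MTP itself targets Java 1.4); `expr` and `arg` are simplified to identifiers, an empty argument list is allowed, and the lexer and function names are hypothetical.

```haskell
import Data.Char (isAlpha, isAlphaNum, isSpace)

-- AST for:  stmt → assign | fun_call
--           assign → var_name ":=" expr
--           fun_call → fun_name "(" arg_list ")"
data Stmt = AssignToStmt Assign | FunCallToStmt FunCall
  deriving (Eq, Show)

data Assign = Assign String String
  deriving (Eq, Show)

data FunCall = FunCall String [String]
  deriving (Eq, Show)

-- Tiny lexer: identifiers, ":=", and single-character punctuation.
tokenize :: String -> [String]
tokenize [] = []
tokenize s@(c : cs)
  | isSpace c        = tokenize cs
  | isAlpha c        = let (w, rest) = span isAlphaNum s in w : tokenize rest
  | take 2 s == ":=" = ":=" : tokenize (drop 2 s)
  | otherwise        = [c] : tokenize cs

isIdent :: String -> Bool
isIdent (c : cs) = isAlpha c && all isAlphaNum cs
isIdent []       = False

-- Left-factored:  stmt → ID stmt_tail
parseStmt :: [String] -> Either String (Stmt, [String])
parseStmt (name : rest) | isIdent name = stmtTail name rest
parseStmt ts = Left ("expected an identifier near: " ++ unwords (take 2 ts))

-- stmt_tail → ":=" expr | "(" arg_list ")"; the look-ahead token picks
-- the alternative and, with it, the AST constructor to build.
stmtTail :: String -> [String] -> Either String (Stmt, [String])
stmtTail name (":=" : e : rest)
  | isIdent e = Right (AssignToStmt (Assign name e), rest)
stmtTail name ("(" : rest) = do
  (args, rest') <- parseArgs rest
  Right (FunCallToStmt (FunCall name args), rest')
stmtTail _ ts = Left ("unexpected input near: " ++ unwords (take 2 ts))

-- arg_list, simplified:  ")" | ID ("," ID)* ")"
parseArgs :: [String] -> Either String ([String], [String])
parseArgs (")" : rest) = Right ([], rest)
parseArgs (a : rest) | isIdent a = go [a] rest
  where
    go acc ("," : b : more) | isIdent b = go (b : acc) more
    go acc (")" : more) = Right (reverse acc, more)
    go _ ts = Left ("expected ',' or ')' near: " ++ unwords (take 2 ts))
parseArgs ts = Left ("expected an argument near: " ++ unwords (take 2 ts))
```

The `Left` messages only hint at where automatic error recovery would hook in; this sketch simply stops at the first error.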
