Language Processing

Credits: Sommerville, Chapter 13.4; Andy Pimentel, University of Amsterdam; David Albrecht, Monash University; Charles A. Ofria, Michigan State University; Wuwei Shen, Western Michigan University

Instructor: Peter Baumann
email: p.baumann@jacobs-university.de
tel: -3178
office: room 88, Research 1

320312 Software Engineering (P. Baumann)

To warm up…
"Parser development is still a black art." -- Paul Klint et al., Towards an engineering discipline for GRAMMARWARE, in: ACM TOSEM, May 2005
Some magic in APL (the one-line expressions from the slide are not preserved in this text):
• Sort word list X
• All primes up to R

Roadmap
• Compilers & Co
• Flex & Bison: The Mechanics
• Grammars and actions
• Error handling and debugging
• Wrap-up

Language Processing Systems
Accept a natural or artificial language as input and generate some other representation of that language
• Compiler: generates machine code; ex: gcc
• Interpreter: acts on instructions immediately while they are being processed; ex: SQL, JS
Used when the easiest way to solve a problem is to describe an algorithm or data
• Meta-CASE tools process tool descriptions, method rules, etc. to generate tools

Roadmap
• Compilers & Co
• Flex & Bison: The Mechanics
    • YACC / bison
    • (F)lex
    • Their interplay, and how to code it
• Grammars and actions
• Error handling and debugging
• Wrap-up

What is Bison?
YACC ("Yet Another Compiler Compiler") = a parser generator
• Bison = GNU yacc
• LALR(1)
Parser generator = tool producing a parser for a given grammar
• i.e., it produces the source code of a syntactic analyzer for the corresponding language
• The parser uses a stack to remember all parse tree nodes generated so far (so the stack is empty at the end)
Input: myparser.y[pp] containing grammar (= rules) + actions
Output: C[++] program myparser.c[pp]
• + optionally a header file of tokens, myparser.h

What is (F)lex?
Lex = a scanner generator
• Flex = "fast lex"
• Regular expressions
Input: scanner.l containing patterns (= rules) + actions
Output: C program scanner.c
Typically, the generated scanner produces tokens for the (YACC-generated) parser

Synopsis: How F&B Do The Job
Bison grammar defines admissible sequences ("sentences")
• Context-free grammar (and more)

    expr : NUM '+' NUM ;

Flex grammar defines single tokens ("words")
• Regular expressions

    [+-]?[0-9]+   return NUM;
    "+"           return PLUS;
    [ \t\n]+      /* do nothing */

Ex: 12 + 2
Note: the rule uses '+' while the scanner returns PLUS; the two must agree, so either declare PLUS and use it in the rule, or have the scanner return '+'.

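Flex & Bison: a Team
[Figure: call chain for the input 12 + 2]
• main() calls yyparse() (Bison)
• yyparse() fetches each token with nextToken = yylex() (Flex)
• Flex matches [0-9]+ against the input and reports "saw token NUM!"
• Bison recognizes NUM '+' NUM and reports "OK!" back to main()
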
Bison File Format
Definitions
• Code to be copied before generated code
• Definitions used: tokens, types, etc.
%%
Rules
• Pairs of production rules and actions
• Production rules describe a CFG
%%
Supplementary Code
• User subroutines
• Copied after the end of the bison-generated code
The identical LEX format was actually taken from this...

Bison Definitions Section

    %{
    /* typedefs, includes, namespaces, ...: copied literally into the C source */
    #include <stdio.h>
    #include <stdlib.h>
    %}
    %token ID NUM    /* terminal symbols */
    %start expr      /* start symbol (a non-terminal, obviously) */

Bison Rules Section
Contains the grammar
• referring to previously defined non-terminals and terminals
Example:

    expr   : expr '+' term
           | term
           ;
    term   : term '*' factor
           | factor
           ;
    factor : '(' expr ')'
           | ID
           | NUM
           ;

• A literal like '+' is a char, not a string!
• Be nice: define PLUS

Bison Code Section
"Any other code"
• main(), yyerror(), ...

    /* needs #include <iostream> and using namespace std; in the definitions section */
    void yyerror(const char* err)   /* called by the bison code on a syntax error */
    {
        cerr << "Syntax error: " << err << endl;
    }

    int main()
    {
        return yyparse();           /* this is the parser */
    }

FLEX File Format
Definitions
• Code to be copied before generated code
• Definitions used: tokens, etc.
%%
Rules
• Pairs of patterns and actions
• Patterns are regular expressions
%%
Supplementary Code
• User subroutines
• Copied after the end of the flex-generated code
cf. YACC / bison!

FLEX Code Example

    %{
    #include <stdio.h>
    #include "parser.h"
    %}
    id    [_a-zA-Z][_a-zA-Z0-9]*
    num   [+-]?[0-9]+
    semi  [;]
    wspc  [ \t\n]+
    %%
    {id}    { return ID; }
    {num}   { return NUM; }
    {semi}  { return SEMI; }
    {wspc}  {;}

• ID, NUM, SEMI are defined in bison's parser.h
• An action returns only the token tag: the actual value is passed elsewhere

Sidebar: If I Don't Want to Use FLEX

    #include "parser.h"

    int yylex()
    {
        if (it's a num)            return NUM;
        else if (it's an id)       return ID;
        else if (end of input)     return 0;
        else if (it's an error)    return -1;
    }

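The pseudocode above can be fleshed out in plain C. The following is only a sketch under assumptions: the token codes are hypothetical stand-ins for what parser.h would define, the input comes from a string set via a made-up helper scan_string(), and yylval is a plain int. Unknown characters are handed to the parser as their own character code, a common convention, rather than as -1.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the token codes bison would put into parser.h. */
enum { NUM = 258, ID = 259, SEMI = 260 };

static const char *input_pos = "";  /* assumption: the scanner reads from a string */
int yylval;                         /* assumption: plain int semantic value */

/* Hypothetical helper: selects the text to be scanned. */
void scan_string(const char *text) { input_pos = text; }

int yylex(void)
{
    /* skip whitespace, like the {wspc} rule of the flex example */
    while (*input_pos == ' ' || *input_pos == '\t' || *input_pos == '\n')
        input_pos++;

    if (*input_pos == '\0')
        return 0;                                /* end of input */

    if (isdigit((unsigned char)*input_pos)) {    /* unsigned number */
        char *end;
        yylval = (int)strtol(input_pos, &end, 10);
        input_pos = end;
        return NUM;
    }

    if (isalpha((unsigned char)*input_pos) || *input_pos == '_') {
        while (isalnum((unsigned char)*input_pos) || *input_pos == '_')
            input_pos++;
        return ID;
    }

    if (*input_pos == ';') {
        input_pos++;
        return SEMI;
    }

    /* anything else: pass the raw character on and let the parser decide */
    return (unsigned char)*input_pos++;
}
```

A parser would simply call yylex() repeatedly until it returns 0; a leading sign is left to the grammar here instead of being folded into NUM.
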
Flex/Bison Code: How to Compile & Link

    $ flex scanner.l
    $ bison -d -o myparser.cpp myparser.ypp
    $ gcc -o parser myparser.cpp lex.yy.c -ly -lfl

• scanner.l → flex → lex.yy.c
• myparser.ypp → bison → myparser.cpp
• lex.yy.c + myparser.cpp → gcc → parser
Note: without -o, bison names its output after the input file (myparser.tab.cpp).

Roadmap
• Compilers & Co
• Flex & Bison: The Mechanics
• Grammars and actions
• Error handling and debugging
• Wrap-up

Semantic Actions in Bison: Overview
Action = code executed when a rule is applied
• Any C/C++ code
• Ex: expression evaluation, symbol table insertion / lookup, parse tree build-up
• How to pass information between rules?
Attribute values store intermediate results
• $1, $2, … = result from evaluating (non-)terminal #1, #2, …
• $$ = result of the current expression

    expr : expr '+' term { $$ = $1 + $3; }
         | term          { $$ = $1; }

Dynamics of Rule Processing
Rule "fires" = right-hand side is reduced to the left-hand non-terminal
• Rules are reduced bottom-up
• Successful if, at EOF, only the axiom remains (empty stack)

    term  : term '*' factor { $$ = $1 * $3; }
          | factor          { $$ = $1; }
          ;
    factor: NUM             { $$ = yylval; }

Ex: 2 * 3
• Flex delivers NUM with yylval = 2, then '*', then NUM with yylval = 3
• Bison reduces the first NUM to factor ($$ = 2), then factor to term ($$ = 2)
• Bison reduces the second NUM to factor ($$ = 3)
• Bison reduces term '*' factor to term ($$ = 6)

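The reduction of 2 * 3 can be mimicked with a miniature value stack. This is a simplified sketch, not bison's actual implementation: the stack, its size, and all function names are hypothetical, only values are stacked (bison also keeps a slot for '*'), and there are no overflow checks.

```c
#include <stddef.h>

/* Hypothetical miniature value stack; bison's real one lives inside yyparse(). */
#define STACK_MAX 16
static int value_stack[STACK_MAX];
static size_t depth = 0;

/* "Shift": the scanner delivered a NUM; its value becomes a stack entry. */
void shift_num(int value) { value_stack[depth++] = value; }

/* "Reduce" by  term : term '*' factor  { $$ = $1 * $3; }
   Only values are stacked here, so $1 and $3 are the two topmost entries. */
void reduce_mul(void)
{
    int factor = value_stack[--depth];     /* $3 */
    int term   = value_stack[--depth];     /* $1 */
    value_stack[depth++] = term * factor;  /* $$ replaces the right-hand side */
}

int top_value(void)      { return value_stack[depth - 1]; }
size_t stack_depth(void) { return depth; }
```

Calling shift_num(2), shift_num(3), reduce_mul() leaves a single entry holding 6, which corresponds to the term at the root of the parse tree.
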
Semantic Actions: Larger Example

    expr:   expr '+' term   { $$ = $1 + $3; }
        |   term            { $$ = $1; }
        ;
    term:   term '*' factor { $$ = $1 * $3; }
        |   factor          { $$ = $1; }
        ;
    factor: '(' expr ')'    { $$ = $2; }
        |   ID              { $$ = lookup(symbolTable, yylval); }
        |   NUM             { $$ = yylval; }
        ;

Does not run like this; yylval is trickier in real life and usually needs a union!

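To illustrate the remark about the union: a bison declaration such as %union { int ival; const char *sval; } ends up as a C union type named YYSTYPE, so that a token can carry either a number or a name. The sketch below shows roughly that type together with a made-up symbol table; the field names, struct symbol, the three-argument lookup(), and the two factor_from_* helpers are all assumptions for illustration, not the slide author's code.

```c
#include <string.h>

/* Roughly what  %union { int ival; const char *sval; }  becomes in C:
   one type that can hold either kind of semantic value. */
typedef union {
    int         ival;   /* value of a NUM token or of an expr/term/factor */
    const char *sval;   /* name of an ID token */
} YYSTYPE;

/* Hypothetical symbol table entry: a name bound to an int value. */
struct symbol { const char *name; int value; };

/* Hypothetical lookup(): value bound to name, or 0 if the name is unknown. */
int lookup(const struct symbol *table, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].value;
    return 0;
}

/* What the action for  factor : ID  would compute from the token value. */
int factor_from_id(const struct symbol *table, size_t n, YYSTYPE tok)
{
    return lookup(table, n, tok.sval);
}

/* What the action for  factor : NUM  would compute from the token value. */
int factor_from_num(YYSTYPE tok)
{
    return tok.ival;
}
```

In a real grammar the scanner would fill the matching member of yylval before returning ID or NUM, and the grammar would state which member each symbol uses.
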
Roadmap
• Compilers & Co
• Flex & Bison: The Mechanics
• Grammars and actions
• Error handling and debugging
• Wrap-up

Error Handling: Catch & Recover & Report
Good error handling includes
• Elastic recovery from errors, graceful continuation
• Meaningful diagnostic messages

Error Handling: Catch & Recover
Elastic recovery from errors, graceful continuation
• Implementing good error handling can be extremely tricky!
Example, good for line-oriented input (lab assembler!):
• bison's standard token error eats up all tokens that are not understood
• Predefined macros reset parser + scanner for meaningful continuation
• + individual actions (message output, …)

    line : /* empty */
         | line whatever
         | line error                     /* std error token */
           { yyerror("Failure :-(");      /* msg output etc. */
             yyerrok;                     /* reset parser: leave error recovery mode */
             yyclearin;                   /* reset scanner: discard the lookahead token */
           }

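The idea of giving up on a bad line and resuming at the next one can also be shown without bison. The sketch below is a hand-written analogue, not bison's mechanism: it assumes a hypothetical line format NUM '+' NUM in place of "whatever", checks each line with sscanf, and on failure reports, drops the rest of the line, and continues. All function names are made up.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical line format standing in for "whatever": NUM '+' NUM. */
static int parse_line(const char *line, int *result)
{
    int a, b;
    char extra;
    /* exactly two numbers around a '+', and nothing after them */
    if (sscanf(line, "%d + %d %c", &a, &b, &extra) != 2)
        return 0;
    *result = a + b;
    return 1;
}

/* Parses every line of text. A bad line is reported and skipped, then
   parsing resumes with the next line. Returns the number of bad lines. */
int parse_lines(const char *text, int *sum_of_good_lines)
{
    int errors = 0, lineno = 0;
    *sum_of_good_lines = 0;

    while (*text != '\0') {
        char line[128];
        size_t len = strcspn(text, "\n");
        size_t copy = len < sizeof line - 1 ? len : sizeof line - 1;
        memcpy(line, text, copy);
        line[copy] = '\0';
        lineno++;

        int value;
        if (parse_line(line, &value)) {
            *sum_of_good_lines += value;
        } else {
            fprintf(stderr, "line %d: Failure :-(\n", lineno);
            errors++;                   /* the rest of the line is dropped */
        }

        text += len;
        if (*text == '\n')
            text++;                     /* resynchronise at the next line */
    }
    return errors;
}
```

The line end plays the role of the synchronisation point: whatever went wrong earlier in the line cannot affect how the next line is parsed.
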
Error Handling: Syntactic vs Semantic Errors
Syntactic error: caught by the parser, needs to be cured & reset manually:

    line : /* empty */
         | line whatever
         | line error
           { yyerror("Failure :-(");
             yyerrok;                     /* reset parser */
             yyclearin;                   /* reset scanner */
           }

Semantic error: caught by your action, ignored by the parser

    expr: expr '/' expr
          { if ($3 == 0.0) yyerror("div by zero");
            else $$ = $1 / $3;
          }

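The body of the division action can be written as a plain C function to make the control flow explicit. This is a sketch under assumptions: checked_div() and the error counter are hypothetical, yyerror() is a stand-in for the user-supplied one, and returning 0.0 on error is one possible choice so that $$ has a defined value while parsing continues.

```c
#include <stdio.h>

static int semantic_errors = 0;   /* hypothetical counter, not part of bison */

/* Stand-in for the user-supplied yyerror() from the code section. */
void yyerror(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    semantic_errors++;
}

/* Body of the action for  expr : expr '/' expr  as a function:
   dividend is $1, divisor is $3, the return value is $$. */
double checked_div(double dividend, double divisor)
{
    if (divisor == 0.0) {
        yyerror("div by zero");
        return 0.0;   /* assumption: a defined fallback value for $$ */
    }
    return dividend / divisor;
}
```

Because the parser itself does not notice the problem, a counter like this is one way to let main() decide afterwards whether the input was acceptable.
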
Error Handling: Report
Provide meaningful diagnostic output!
• The compiler needs to give the programmer good advice
Useful information: line number, column number, violating token, what was understood vs what was expected, …
Examples:
• Bad: "Syntax error"
• Good: "Line 15, column 21, near token 'flip': found unknown instruction parameter 'coin'"
Note: the current line number is in the global variable yylineno (in flex, switch it on with %option yylineno)

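A message in the style of the "good" example can be assembled with snprintf. The helper below is a minimal sketch; format_diagnostic() and its parameters are hypothetical, and where line, column, and token text come from is left to the scanner and parser.

```c
#include <stdio.h>

/* Hypothetical helper: writes a positional diagnostic into buf.
   Returns what snprintf returns (the length the full message needs). */
int format_diagnostic(char *buf, size_t size, int line, int column,
                      const char *token, const char *problem)
{
    return snprintf(buf, size, "Line %d, column %d, near token '%s': %s",
                    line, column, token, problem);
}
```

Called with 15, 21, "flip" and "found unknown instruction parameter 'coin'", it reproduces the example message; the result could then be passed to yyerror().
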