Language Processing


Language Processing. Credits: Sommerville, Chapter 13.4; Andy Pimentel, University of Amsterdam; David Albrecht, Monash University; Charles A. Ofria, Michigan State University; Wuwei Shen, Western Michigan University. Instructor: Peter Baumann


  1. Language Processing
     Credits: Sommerville, Chapter 13.4; Andy Pimentel, University of Amsterdam; David Albrecht, Monash University; Charles A. Ofria, Michigan State University; Wuwei Shen, Western Michigan University
     Instructor: Peter Baumann, email: p.baumann@jacobs-university.de, tel: -3178, office: room 88, Research 1
     320312 Software Engineering (P. Baumann)

  2. To warm up…
     • "Parser development is still a black art." -- Paul Klint et al., Towards an engineering discipline for GRAMMARWARE, in: ACM TOSEM, May 2005
     • Some magic in APL (code not preserved in this transcript): sort word list X; all primes up to R

  3. Roadmap
     • Compilers & Co
     • Flex & Bison: The Mechanics
     • Grammars and actions
     • Error handling and debugging
     • Wrap-up

  4. Language Processing Systems
     • Accept a natural or artificial language as input and generate some other representation of that language
       • Compiler: generates machine code; ex: gcc
       • Interpreter: acts immediately on instructions while they are being processed; ex: SQL, JS
     • Used when the easiest way to solve a problem is to describe an algorithm or data
       • Meta-CASE tools process tool descriptions, method rules, etc., to generate tools

  5. Roadmap
     • Compilers & Co
     • Flex & Bison: The Mechanics
       • YACC / bison
       • (F)lex
       • Their interplay, and how to code it
     • Grammars and actions
     • Error handling and debugging
     • Wrap-up

  6. What is Bison?
     • YACC ("Yet Another Compiler Compiler") = a parser generator
       • Bison = GNU yacc
       • LALR(1)
     • Parser generator = tool producing a parser for a given grammar
       • i.e., it produces the source code of a syntactic analyzer for the corresponding language
       • The parser uses a stack to remember all nodes generated in the parse tree up to now (so the stack is empty at the end)
     • Input: myparser.y[pp] containing grammar (= rules) + actions
     • Output: C[++] program myparser.c[pp]
       • plus, optionally, a header file of tokens, myparser.h

  7. What is (F)lex?
     • Lex = a scanner generator
       • Flex = "fast lex"
       • Regular expressions
     • Input: scanner.l containing patterns (= rules) + actions
     • Output: C program scanner.c
     • Typically, the generated scanner produces tokens for the (YACC-generated) parser

  8. Synopsis: How F&B Do The Job
     • Bison grammar defines admissible sequences ("sentences")
       • Context-free grammar (and more)
           expr : NUM '+' NUM ;
     • Flex grammar defines single tokens ("words")
       • Regular expressions
           [+-]?[0-9]+   return NUM;
           "+"           return '+';
           [ \t\n]+      /* do nothing */
     • Ex: 12 + 2

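  9. Flex & Bison: a Team
     • main() calls yyparse(), the Bison-generated parser
     • yyparse() fetches each token from Flex: nextToken = yylex()
     • yylex() reads the input 12 + 2, matches [0-9]+ and reports "saw token NUM!"
     • Bison receives NUM '+' NUM, matches the rule, and returns "OK!" to main()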

  10. Bison File Format
     • Definitions
       • Code to be copied before generated code
       • Definitions used: tokens, types, etc.
     • %%
     • Rules
       • Pairs of production rules and actions
       • Production rules describe the CFG
     • %%
     • Supplementary Code
       • User subroutines
       • Copied after the end of the bison-generated code
     • The identical LEX format was actually taken from this...

  11. Bison Definitions Section
           %{
           /* Typedefs, includes, namespaces, …: copied literally into C source */
           #include <stdio.h>
           #include <stdlib.h>
           %}
           %token ID NUM     /* terminal symbols */
           %start expr       /* start symbol (a non-terminal, obviously) */

  12. Bison Rules Section
     • Contains the grammar
       • referring to previously defined non-terminals and terminals
     • Example:
           expr   : expr '+' term | term ;
           term   : term '*' factor | factor ;
           factor : '(' expr ')' | ID | NUM ;
     • '+' is a char, not a string! Be nice, define PLUS

  13. Bison Code Section
     • "Any other code"
       • main(), yyerror(), ...
           /* called by bison-generated code when it detects a syntax error;
              needs #include <iostream> in the definitions section */
           void yyerror(const char* err)
           {
               std::cerr << "Syntax error: " << err << std::endl;
           }
           int main()
           {
               return yyparse();   /* this is the parser */
           }

  14. FLEX
     • Definitions
       • Code to be copied before generated code
       • Definitions used: tokens, etc.
     • %%
     • Rules
       • Pairs of patterns and actions
       • Patterns are regular expressions
     • %%
     • Supplementary Code
       • User subroutines
       • Copied after the end of the flex-generated code
     • cf. YACC / bison!

  15. FLEX Code Example
           %{
           #include <stdio.h>
           #include "parser.h"   /* bison's header: defines ID, NUM, SEMI */
           %}
           id    [_a-zA-Z][_a-zA-Z0-9]*
           num   [+-]?[0-9]+
           semi  [;]
           wspc  [ \t\n]+
           %%
           {id}    { return ID; }
           {num}   { return NUM; }
           {semi}  { return SEMI; }
           {wspc}  {;}
     • Returns only the token tag; the actual value is passed elsewhere

  16. Sidebar: If I Don't Want to Use FLEX
           #include "parser.h"
           int yylex()
           {
               if (it's a num)         return NUM;
               else if (it's an id)    return ID;
               else if (end of input)  return 0;
               else if (it's an error) return -1;
           }
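The sidebar's pseudocode can be made concrete. The following is a minimal sketch in C, assuming the input is an in-memory string rather than a file. The token codes, scan_string() and token_value are illustrative names introduced here; a real build would take the token codes from bison's parser.h and pass values through yylval.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; a real build would use those from parser.h. */
enum { TOK_ERROR = -1, TOK_EOF = 0, TOK_NUM = 258, TOK_ID = 259 };

/* Read position in the input (assumption: input is a string in memory). */
static const char *cursor = "";

/* Value of the last NUM token, playing the role of yylval. */
static int token_value = 0;

/* Points the scanner at a new input string. */
void scan_string(const char *text)
{
    assert(text != NULL);
    cursor = text;
}

/* Hand-written scanner: returns one token code per call, 0 at end of input. */
int yylex(void)
{
    while (*cursor == ' ' || *cursor == '\t' || *cursor == '\n')
        cursor++;                                   /* skip whitespace */
    if (*cursor == '\0')
        return TOK_EOF;                             /* end of input */
    if (isdigit((unsigned char)*cursor)) {          /* it's a num */
        char *end;
        token_value = (int)strtol(cursor, &end, 10);
        cursor = end;
        return TOK_NUM;
    }
    if (isalpha((unsigned char)*cursor) || *cursor == '_') {   /* it's an id */
        while (isalnum((unsigned char)*cursor) || *cursor == '_')
            cursor++;
        return TOK_ID;
    }
    if (strchr("+*();", *cursor) != NULL)           /* single-char tokens */
        return (unsigned char)*cursor++;
    cursor++;                                       /* skip the bad character */
    return TOK_ERROR;                               /* it's an error */
}
```

Single-character tokens are returned as their own character code, which matches the '+' and '*' literals used in the grammar slides.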

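  17. Flex/Bison Code: How to Compile & Link
           $ flex scanner.l
           $ bison --defines=parser.h -o myparser.cpp myparser.ypp
           $ g++ -o parser myparser.cpp lex.yy.c -ly -lfl
     • Pipeline: scanner.l → flex → lex.yy.c; myparser.ypp → bison → myparser.cpp; both → g++ → parser
     • Plain bison -d would name its outputs myparser.tab.cpp and myparser.tab.hpp; the options above produce the file names used on these slides
     • g++ rather than gcc, since yyerror() uses C++ streams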

  18. Roadmap
     • Compilers & Co
     • Flex & Bison: The Mechanics
     • Grammars and actions
     • Error handling and debugging
     • Wrap-up

  19. Semantic Actions in Bison: Overview
     • Action = code executed when a rule is applied
       • Any C/C++ code
       • Ex: expression evaluation, symbol table insertion / lookup, parse tree build-up
       • How to pass information between rules?
     • Attribute values store intermediate results
       • $1, $2, … = result from evaluating (non-)terminal #1, #2, …
       • $$ = result of current expression
           expr : expr '+' term   { $$ = $1 + $3; }
                | term            { $$ = $1; }
                ;

  20. Dynamics of Rule Processing
     • Rule "fires" = right-hand side reduced to left-hand non-terminal
       • Rules are reduced bottom-up
       • Successful if, at EOF, only the axiom remains (empty stack)
           term  : term '*' factor   { $$ = $1 * $3; }
                 | factor            { $$ = $1; }
                 ;
           factor: NUM               { $$ = yylval; }
                 ;
     • Example, input 2 * 3:
       • Flex: NUM (yylval ← 2), '*', NUM (yylval ← 3)
       • Bison: factor ($$ ← 2), term ($$ ← 2), factor ($$ ← 3), then term '*' factor reduced to term ($$ ← 6)
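The bottom-up evaluation of 2 * 3 can be replayed by hand. This is a rough sketch in C of a value stack, assuming one stack slot per grammar symbol; it is not bison's actual implementation, and push(), pop() and replay_product() are names made up for illustration.

```c
#include <assert.h>

/* Small value stack standing in for the parser's semantic value stack. */
#define STACK_MAX 16
static int value_stack[STACK_MAX];
static int depth = 0;

static void push(int v) { assert(depth < STACK_MAX); value_stack[depth++] = v; }
static int  pop(void)   { assert(depth > 0); return value_stack[--depth]; }

/* Reduce "term : term '*' factor": pop $3, $2, $1 and push $$ = $1 * $3. */
static void reduce_term_times_factor(void)
{
    int v3 = pop();      /* $3: value of factor */
    (void)pop();         /* $2: '*' carries no useful value */
    int v1 = pop();      /* $1: value of term */
    push(v1 * v3);       /* $$ */
}

/* Replays the shift and reduce steps for "left * right", returns the result. */
int replay_product(int left, int right)
{
    depth = 0;
    push(left);          /* shift NUM; factor: NUM and term: factor keep the value */
    push(0);             /* shift '*' (placeholder value) */
    push(right);         /* shift NUM; factor: NUM keeps the value */
    reduce_term_times_factor();
    return pop();        /* the stack is empty again afterwards */
}
```

The rules factor: NUM and term: factor replace one stack slot by another holding the same value, so the sketch leaves those slots untouched.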

  21. Semantic Actions: Larger Example
           expr  : expr '+' term     { $$ = $1 + $3; }
                 | term              { $$ = $1; }
                 ;
           term  : term '*' factor   { $$ = $1 * $3; }
                 | factor            { $$ = $1; }
                 ;
           factor: '(' expr ')'      { $$ = $2; }
                 | ID                { $$ = lookup(symbolTable, yylval); }
                 | NUM               { $$ = yylval; }
                 ;
     • Does not run like this; yylval is trickier in real life and usually needs a union!
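The remark about the union can be illustrated in plain C. The sketch below assumes a semantic value that is either a number (for NUM) or a name (for ID); in a bison file such a type would be declared with %union. The Symbol table layout, the lookup() signature and the rule that unknown names evaluate to 0 are assumptions made here, not taken from the slides.

```c
#include <assert.h>
#include <string.h>

/* Assumed semantic value type: a number for NUM, a name for ID. */
typedef union {
    int         num;
    const char *name;
} SemanticValue;

/* Hypothetical symbol table entry: a name bound to an integer value. */
typedef struct {
    const char *name;
    int         value;
} Symbol;

/* Returns the value bound to name, or 0 if the name is unknown (assumption). */
int lookup(const Symbol *table, int count, const char *name)
{
    assert(table != NULL && name != NULL);
    for (int i = 0; i < count; i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].value;
    return 0;
}

/* Action for "factor : NUM": the scanner stored the number in the union. */
int factor_from_num(SemanticValue v)
{
    return v.num;
}

/* Action for "factor : ID": the scanner stored the identifier's name. */
int factor_from_id(const Symbol *table, int count, SemanticValue v)
{
    return lookup(table, count, v.name);
}
```

Because only one union member is valid at a time, each action has to read the member that the scanner filled in for that token type.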

  22. Roadmap
     • Compilers & Co
     • Flex & Bison: The Mechanics
     • Grammars and actions
     • Error handling and debugging
     • Wrap-up

  23. Error Handling: Catch & Recover & Report
     • Good error handling includes
       • Elastic recovery from errors, graceful continuation
       • Meaningful diagnostic messages

  24. Error Handling: Catch & Recover
     • Elastic recovery from errors, graceful continuation
       • Implementing good error handling can be extremely tricky!
     • Example, good for line-oriented input (lab assembler!):
       • bison's standard token error eats up all non-understood tokens
       • Predefined macros reset parser + scanner for meaningful continuation
       • plus individual actions (message output, …)
           line : /* empty */
                | line whatever
                | line error                     /* std error token */
                    { yyerror("Failure :-(");    /* msg output etc. */
                      yyerrok;                   /* reset parser */
                      yyclearin;                 /* reset scanner: discard the lookahead token */
                    }
                ;
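The same recover-at-line-end idea can be shown without bison. This is a small sketch in C of a hand-written checker that, on a syntax error, skips the rest of the line and continues with the next one. The line syntax NUM + NUM and all function names are assumptions chosen for illustration; unlike the grammar above, the sketch counts an empty line as an error.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Skips blanks and tabs, but not newlines. */
static const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* Parses an unsigned integer; returns the position after it, or NULL. */
static const char *parse_num(const char *p)
{
    p = skip_blanks(p);
    if (!isdigit((unsigned char)*p))
        return NULL;
    while (isdigit((unsigned char)*p))
        p++;
    return p;
}

/* Parses one line "NUM + NUM"; returns its end, or NULL on a syntax error. */
static const char *parse_line(const char *p)
{
    p = parse_num(p);
    if (p == NULL) return NULL;
    p = skip_blanks(p);
    if (*p != '+') return NULL;
    p = parse_num(p + 1);
    if (p == NULL) return NULL;
    p = skip_blanks(p);
    return (*p == '\n' || *p == '\0') ? p : NULL;
}

/* Checks all lines; after an error it resumes at the next line.
   Returns the number of faulty lines. */
int count_bad_lines(const char *text)
{
    int errors = 0;
    assert(text != NULL);
    while (*text != '\0') {
        const char *end = parse_line(text);
        if (end == NULL) {
            errors++;
            end = text;
            while (*end != '\0' && *end != '\n')
                end++;                  /* eat everything up to end of line */
        }
        text = (*end == '\n') ? end + 1 : end;
    }
    return errors;
}
```

One bad line costs exactly one error here, and the following lines are still checked, which is the "graceful continuation" the slide asks for.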

  25. Error Handling: Syntactic vs Semantic Errors
     • Syntactic error: caught by the parser, need to manually cure & reset:
           line : /* empty */
                | line whatever
                | line error
                    { yyerror("Failure :-(");
                      yyerrok;                   /* reset parser */
                      yyclearin;                 /* reset scanner */
                    }
                ;
     • Semantic error: caught by your action, ignored by the parser
           expr : expr '/' expr
                    { if ($3 == 0.0)
                          yyerror("div by zero");
                      else
                          $$ = $1 / $3;
                    }
                ;
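The division check can be tried outside a parser. Below is a minimal sketch in C of the action's logic as a plain function; report() stands in for yyerror(), and returning 0.0 on error is an assumption added here so that the result is always defined (the action on the slide leaves $$ unset in that case).

```c
#include <assert.h>
#include <stdio.h>

/* Number of semantic errors reported so far. */
static int semantic_errors = 0;

/* Stand-in for yyerror(): prints the message and counts the error. */
static void report(const char *msg)
{
    assert(msg != NULL);
    fprintf(stderr, "%s\n", msg);
    semantic_errors++;
}

/* Logic of the action for "expr : expr '/' expr". */
double divide_action(double lhs, double rhs)
{
    if (rhs == 0.0) {
        report("div by zero");
        return 0.0;          /* assumption: give $$ a defined value anyway */
    }
    return lhs / rhs;
}
```

The parser itself never notices this kind of error: parsing continues normally, so the action or its caller has to decide whether to go on.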

  26. Error Handling: Report
     • Provide meaningful diagnostic output!
       • The compiler needs to give the programmer good advice
     • Useful information: line number, column number, violating token, what was understood vs what was expected, …
     • Examples:
       • Bad: "Syntax error"
       • Good: "Line 15, column 21, near token 'flip': found unknown instruction parameter 'coin'"
     • Note: find the current line number in the global variable yylineno (maintained by flex when %option yylineno is set)
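A message in the "good" style needs a position to report. The following C sketch shows one way to track line and column and to format the diagnostic; SourcePos, advance() and format_diagnostic() are hypothetical helpers, not part of flex or bison. With flex, the line could come from yylineno, while the column has to be tracked by hand as sketched here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical source position, 1-based. */
typedef struct {
    int line;
    int column;
} SourcePos;

/* Moves pos past the consumed text, e.g. the lexeme of a token. */
void advance(SourcePos *pos, const char *lexeme)
{
    assert(pos != NULL && lexeme != NULL);
    size_t n = strlen(lexeme);
    for (size_t i = 0; i < n; i++) {
        if (lexeme[i] == '\n') {
            pos->line++;
            pos->column = 1;     /* a new line starts at column 1 */
        } else {
            pos->column++;
        }
    }
}

/* Writes "Line L, column C, near token 'T': MSG" into buf.
   Returns what snprintf returns. */
int format_diagnostic(char *buf, size_t size, SourcePos pos,
                      const char *token, const char *msg)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "Line %d, column %d, near token '%s': %s",
                    pos.line, pos.column, token, msg);
}
```

A scanner would call advance() once per matched lexeme, and yyerror() or a semantic action would call format_diagnostic() with the position of the offending token.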
