introduction to lex or flex

  1. Introduction to lex (or flex) Some slides borrowed from M Scherger

  2. Lex/Flex: A Scanner Generator in C
     - Regular Expression (converted by Thompson's Construction)
     - Nondeterministic Finite Automaton (converted by the "Subset" Construction)
     - Deterministic Finite Automaton
     - Table-driven Scanner
     - So why not do this with a tool?
     Introduction to lex (or flex) Fall 2012

  3. Lex
     - Lex is one such tool for creating lexical analyzers
       - M. E. Lesk and E. Schmidt, 1975
     - Lexical analyzers tokenize input streams
     - Regular expressions define tokens
       - Tokens are the terminals of a language
     - Lex converts regular expressions into DFAs
       - DFAs are implemented as table-driven state machines
     - Some versions of Lex are proprietary, so not all versions of *nix come with an open source version
     - flex (Fast Lexical Analyzer) is an open source version
       - Vern Paxson
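
The "table-driven state machine" idea can be sketched by hand in C. This is only an illustration, not the code that lex or flex generates: the pattern `[0-9]+(\.[0-9]+)?`, the state names, and the function `dfa_longest_match` are all assumptions chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical DFA for the pattern [0-9]+(\.[0-9]+)? (illustration only). */
enum { C_DIGIT, C_DOT, C_OTHER, N_CLASSES };             /* input classes */
enum { S_START, S_INT, S_DOT, S_FRAC, S_DEAD, N_STATES }; /* DFA states */

/* Transition table: next_state[current state][input class]. */
static const int next_state[N_STATES][N_CLASSES] = {
    /* S_START */ { S_INT,  S_DEAD, S_DEAD },
    /* S_INT   */ { S_INT,  S_DOT,  S_DEAD },
    /* S_DOT   */ { S_FRAC, S_DEAD, S_DEAD },
    /* S_FRAC  */ { S_FRAC, S_DEAD, S_DEAD },
    /* S_DEAD  */ { S_DEAD, S_DEAD, S_DEAD },
};

/* accepting[s] is 1 when state s means "a complete token has been seen". */
static const int accepting[N_STATES] = { 0, 1, 0, 1, 0 };

static int char_class(char c) {
    if (c >= '0' && c <= '9') return C_DIGIT;
    if (c == '.') return C_DOT;
    return C_OTHER;
}

/* Returns the length of the longest prefix of s accepted by the DFA,
   or 0 if no prefix is accepted. */
size_t dfa_longest_match(const char *s) {
    assert(s != NULL);
    int state = S_START;
    size_t last_accept = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        state = next_state[state][char_class(s[i])];
        if (state == S_DEAD) break;                  /* no longer match possible */
        if (accepting[state]) last_accept = i + 1;   /* remember longest so far */
    }
    return last_accept;
}
```

For example, `dfa_longest_match("12.")` stops counting at the last accepting state and reports 2, because the trailing dot is not followed by a digit.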

  4. The Basic Process
     - Lex source program (any.l) -> Lex compiler -> lex.yy.c
     - lex.yy.c -> C compiler -> a.out
     - Input stream -> a.out -> sequence of tokens

  5. Format of a lex File

         Definitions
         %%
         Rules
         %%
         User code

     - The 1st section holds declarations of simple name definitions and start conditions
     - The 2nd section holds pattern-action pairs
     - The 3rd section is copied directly to lex.yy.c
       - C code and comments
     - Typical file extensions: .l .lex .flex

  6. Compiling and Running

         > flex linenos.flex
         > gcc lexyy.c -lfl
         > a.out < infile > outfile

     - The generated file is named lex.yy.c on most Unix systems
     - yywrap() issue: linking with -lfl supplies a default yywrap()

  7. Regular Expressions and Lex
     - A regular expression is an expression that matches sets of strings (the "language" of the regular expression).
     - In its basic form, a regular expression is built up out of basic expressions (individual symbols) and the operations
       - choice (|),
       - concatenation (no operator),
       - and repetition (*).
     - A regular expression may also contain certain other metasymbols:
       - parentheses for grouping (to change precedence, just as in arithmetic)
       - others as needed to extend the operator set in useful ways

  8. Regular Expressions in Lex
     - c (c is a single character): matches the character c
     - \c (c is a single character): use this to escape special characters
     - "str" (str is a string): matches the entire string str
     - [str] (str is a string): matches any single character from str

         RE         Matches
         A          A
         x          x
         d          d
         \.         .
         \n         newline
         \t         tab
         "Abc"      Abc
         "The"      The
         [aeiou]    lowercase vowels
         [abcde]    the letters a to e

  9. Regular Expressions: Character Classes
     - [x-y] (x and y are characters): all characters in the range x-y
       - These can be combined
     - [^str] (str is a string): any single character not in str

         RE             Matches
         [a-z]          all lowercase letters
         [0-9]          all digits
         [a-df-z]       lowercase letters except e
         [a-z0-9A-Z]    alphanumeric characters
         [A-Zaeiou]     uppercase letters and lowercase vowels
         [^ \n\t]       all non-whitespace characters
         [^aeiou]       anything but lowercase vowels

  10. Regular Expressions
     - p* (p is a pattern): zero or more occurrences of p
     - p+ (p is a pattern): one or more occurrences of p

         RE       Matches
         A*       (empty string) A AA AAA ...
         r*       (empty string) r rr ...
         ab*c*    a ab ac abb abc acc abbb abbc abcc accc ...
         A+       A AA AAA AAAA ...
         ab+      ab abb abbb ...
         a*b+     b ab bb aab abb bbb ...

  11. Regular Expressions
     - p? (p is a pattern): zero or one occurrence of p
     - p{m,n} (p is a pattern, m and n are ints): matches m through n occurrences of p
       - If ",n" is omitted (p{m}), n = m; if only n is omitted (p{m,}), n = ∞

         RE        Matches
         A?        (empty string) A
         ab?c?     a ab ac abc
         a{1,3}    a aa aaa
         a{1,1}    a
         a{1}      a
         a{3,}     aaa aaaa aaaaa ...

  12. Regular Expressions
     - p1p2 (p1 and p2 are patterns): matches p1 followed by p2
     - (p) (p is a pattern): used to override precedence (group things)
     - p1|p2 (p1 and p2 are patterns): matches either p1 or p2
       - Notice precedence

         RE          Matches
         ab          ab
         a+b+        ab aab abb ...
         (abc)+      abc abcabc abcabcabc ...
         abc+        abc abcc abccc ...
         a|an|the    a an the
         ba|ed       ba ed
         b(a|e)d     bed bad
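
The precedence rows above can be checked outside lex. The sketch below uses POSIX extended regular expressions (`regcomp`/`regexec` from `<regex.h>`) as a stand-in, on the assumption that grouping, `|`, `+` and concatenation behave the same way there as in lex for these particular patterns. The helper name `full_match` is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/* Returns 1 if the whole of text matches the POSIX extended regular
   expression pattern, 0 otherwise (including when the pattern is invalid). */
int full_match(const char *pattern, const char *text) {
    char anchored[256];
    regex_t re;
    assert(pattern != NULL && text != NULL);
    /* Wrap the pattern as ^(pattern)$ so that only whole-string matches count. */
    int n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    assert(n > 0 && (size_t)n < sizeof anchored);   /* pattern must fit */
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0) return 0;
    int ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

With this helper, `full_match("ba|ed", "bed")` is 0 while `full_match("b(a|e)d", "bed")` is 1, which is the precedence difference the slide points out.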

  13. Regular Expression: Extra Things
     - p1/p2 (p1 and p2 are patterns): matches p1 only if it's followed by p2
       - p2 is not part of yytext
       - RE: a+/bc   Input: aaabc bc aaaad   Matches the first aaa only
     - ^p (p is a pattern): matches p only if it is at the start of a line
     - p$ (p is a pattern): matches p only if it is at the end of a line

  14. Two more complex examples
     - Numbers:

         [-+]?[0-9]+(\.[0-9]+)?([Ee][-+]?[0-9]+)?

       or:

         nat       = [0-9]+
         signedNat = [-+]? nat
         number    = signedNat(\. nat)? ([Ee] signedNat)?

     - C comments:

         /\*/*(\**[^/*]/*)*\**\*/
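
To see which prefix of an input the number pattern would claim, it can be tried with POSIX extended regular expressions, which accept this particular pattern unchanged. This is a sketch under that assumption, not how lex itself runs the pattern. The function name `number_prefix_len` is hypothetical, and its return value plays the role that yyleng plays in lex.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns the length of the number found at the very start of text,
   or 0 if text does not start with a number. */
size_t number_prefix_len(const char *text) {
    /* The slide's pattern, anchored with ^ so it only matches at the start. */
    static const char *pattern =
        "^[-+]?[0-9]+(\\.[0-9]+)?([Ee][-+]?[0-9]+)?";
    regex_t re;
    regmatch_t m[1];
    size_t len = 0;
    assert(text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) return 0;
    if (regexec(&re, text, 1, m, 0) == 0)
        len = (size_t)(m[0].rm_eo - m[0].rm_so);   /* extent of the match */
    regfree(&re);
    return len;
}
```

Note that an input such as `42.` yields 2: the fractional part requires at least one digit after the dot, so the dot is left for whatever rule comes next.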

  15. Pattern Matching Examples

  16. Format of a lex File

         Definitions
         %%
         Rules
         %%
         User code

     - The 1st section holds declarations of simple name definitions and start conditions
     - The 2nd section holds pattern-action pairs
     - The 3rd section is copied directly to lex.yy.c
       - C code and comments

  17. Definitions
     - Definitions are of the form: name definition
       - A name begins with a letter or underscore followed by 0 or more letters, digits, '-', or '_'.
       - You access it with {name}
     - Example definitions:

         Digit          [0-9]
         Char           [A-Z]
         AlphaNum       [a-zA-Z0-9]
         ws             [ \n\t]
         IntegerConst   [0-9]+

  18. Definitions Example

         Digit      [0-9]
         Char       [a-zA-Z]
         AlphaNum   [a-zA-Z0-9]
         %%
         {Digit}+"."{Digit}+
         ({Char}|_)({AlphaNum}|[_-])*   {printf("A name '%s'\n", yytext);}
         %%

  19. Rules
     - Rules are of the form: pattern action
       - pattern is the RE to match and action is what to do when it is matched
     - The default rule is to echo the input
     - Lex matches the longest string possible
       - If there is a tie, it matches the 1st rule in the spec
     - Actions can be empty (do nothing)
     - Actions can be complex
       - Use {} if multi-lined
       - Don't forget ';'s
     - yytext contains the string matched
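
The two disambiguation rules (longest match first, then earliest rule) can be written down as a small selection function. This is a sketch of the decision only, under the assumption that the scanner already knows how many characters each rule could match at the current position. The name `pick_rule` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* match_len[i] is the number of characters rule i can match at the current
   input position (0 means rule i does not match). Rules are numbered in the
   order they appear in the spec. Returns the index of the winning rule,
   or -1 if no rule matches. */
int pick_rule(const size_t match_len[], int n_rules) {
    assert(match_len != NULL && n_rules >= 0);
    int best = -1;
    size_t best_len = 0;
    for (int i = 0; i < n_rules; i++) {
        /* Strictly longer wins; on a tie the earlier rule is kept. */
        if (match_len[i] > best_len) {
            best = i;
            best_len = match_len[i];
        }
    }
    return best;
}
```

As an illustration, suppose rule 0 is a keyword pattern `if` and rule 1 is `[a-z]+`. On the input `iffy` the lengths are 2 and 4, so rule 1 wins; on the input `if` both lengths are 2, so rule 0 wins because it comes first.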

  20. Example Rules

         \n         linecount++;
         [0-9]+     sum+=atoi(yytext);
         {ws}+
         a|an|the   printf("found an article\n");
         [aeiou]+   { printf("A string of vowels\n"); vcnt++; }

  21. Predefined Rules
     - ECHO: copy yytext to the output

         [a-z]+   ECHO;

     - REJECT: go to the next alternative; that is, the second-choice rule is selected and its action taken

         she   s++;
         he    h++;

       This won't count the embedded he.

         she   {s++; REJECT;}
         he    {h++; REJECT;}
         \n

       But this will.
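
The difference between the two she/he versions can be imitated in plain C. This is a sketch of the net effect only, not of how REJECT is implemented: it assumes unmatched characters are consumed one at a time, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct she_he_counts { int she; int he; };

/* Counts "she" and "he" in text.
   use_reject == 0: a match consumes its characters, as with plain actions,
                    so the "he" inside "she" is never seen.
   use_reject != 0: after counting, only one character is consumed, so every
                    start position is tried, which is the net effect of the
                    REJECT version. */
struct she_he_counts count_she_he(const char *text, int use_reject) {
    struct she_he_counts c = { 0, 0 };
    assert(text != NULL);
    size_t i = 0, n = strlen(text);
    while (i < n) {
        size_t consumed = 1;   /* unmatched characters advance one at a time */
        if (strncmp(text + i, "she", 3) == 0) {
            c.she++;
            if (!use_reject) consumed = 3;
        } else if (strncmp(text + i, "he", 2) == 0) {
            c.he++;
            if (!use_reject) consumed = 2;
        }
        i += consumed;
    }
    return c;
}
```

On the input `she`, the plain version counts one she and no he, while the REJECT-style version counts one of each.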

  22. Rules Example
     ex1.l:

         %%
         a*b   printf("Token 1 found\n");
         c+    printf("Token 2 found\n");
         %%
         int main(void) {
             yylex();
             return 0;
         }

     The commands:
     - lex ex1.l produces lex.yy.c
     - cc -o ex1 lex.yy.c -ll creates the executable
       - May need -lfl if using flex
     - ./ex1 executes it
       - The default is stdin and stdout, so type aaaaaaabbccd <return>

     Sample session (the first line is the typed input):

         aaaaaaabbccd
         Token 1 found
         Token 1 found
         Token 2 found
         d

  23. An Example
     Count chars, words, lines. The %{ %} pair allows you to make declarations for your lexer.

         %{
         unsigned ccnt = 0, wcnt = 0, lcnt = 0;
         %}
         word   [^ \t\n]+
         eol    \n
         %%
         {word}   {wcnt++; ccnt+=yyleng;}
         {eol}    {ccnt++; lcnt++;}
         .        ccnt++;
         %%
         int main(void) { yylex(); return 0; }
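
For comparison, the same counting logic can be written by hand in C. This is a minimal sketch that mirrors the three rules above (a word is a run of characters other than space, tab and newline). The struct `wc_counts` and the function `count_text` are hypothetical names, and the input is taken from a string instead of stdin.

```c
#include <assert.h>
#include <stddef.h>

struct wc_counts { unsigned chars; unsigned words; unsigned lines; };

/* Counts characters, words and lines in text, mirroring the lex rules:
   {word} adds one word and its length, {eol} adds one char and one line,
   and any other character adds one char. */
struct wc_counts count_text(const char *text) {
    struct wc_counts c = { 0, 0, 0 };
    int in_word = 0;
    assert(text != NULL);
    for (const char *p = text; *p != '\0'; p++) {
        c.chars++;
        if (*p == '\n') c.lines++;
        if (*p == ' ' || *p == '\t' || *p == '\n') {
            in_word = 0;            /* whitespace ends the current word */
        } else if (!in_word) {
            in_word = 1;            /* first character of a new word */
            c.words++;
        }
    }
    return c;
}
```

The lex version gets the word boundary for free from the longest-match rule, whereas the hand-written loop has to track it with the `in_word` flag.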

  24. About lex
     - Lex uses some predefined functions stored in the lex library (link with -ll or -lfl)
     - By default lex copies input to output
     - By default lex reads stdin and writes stdout
     - Lex reads its input (a lex script) and produces lex.yy.c
     - Use %{ and %} in the definitions section to declare globals and put #includes
     - You can use flex instead
     - Not all 'lex'es are equal!
     - The man page has more info!

  25. Example 1: The Simplest Example
     - The simplest example of a lex program is a scanner that acts like the UNIX `cat` program

         %%
         .|\n   ECHO;
         %%

     - Or it could be written as:

         %%
         .    ECHO;
         \n   ECHO;
         %%

  26. Lex Predefined Variables

  27. Flex Internal Names

         Lex internal name      Meaning/Use
         lex.yy.c or lexyy.c    Lex output file name
         yylex                  Lex scanning routine
         yytext                 string matched on current action
         yyleng                 length of yytext
         yyin                   Lex input file (default: stdin)
         yyout                  Lex output file (default: stdout)
         input                  Lex buffered input routine
         ECHO                   Lex default action (print yytext to yyout)

     See the Flex documentation for others.

  28. Flex Operational Conventions
     - yylex() runs until it is stopped by a return
     - Ambiguity is resolved by order
     - Any text not explicitly matched is echoed to stdout
     - EOF is automatically matched and returns 0 from yylex() (unless yywrap() is suitably defined)
     - yylex() returns an int which can be a token
