hw8 use lex yacc to turn this into this

HW8: Use Lex/Yacc to Turn this: Into this: <P> - PowerPoint PPT Presentation



  1. HW8: Use Lex/Yacc to turn this:

         Here's a list:
           * This is item one of a list
           * This is item two. Lists should be
             indented four spaces, with each item
             marked by a "*" two spaces left of
             four-space margin. Lists may contain
             nested lists, like this:
               * Hi, I'm item one of an inner list.
               * Me two.
               * Item 3, inner.
           * Item 3, outer list.
         This is outside both lists; should be back
         to no indent.

         Final suggestions

     Into this:

         <P> Here's a list:
         <UL>
         <LI> This is item one of a list
         <LI>This is item two. Lists should be
         indented four spaces, with each item
         marked by a "*" two spaces left of
         four-space margin. Lists may contain
         nested lists, like this:<UL><LI> Hi, I'm
         item one of an inner list. <LI>Me two.
         <LI> Item 3, inner. </UL><LI> Item 3,
         outer list.</UL>
         This is outside both lists; should be
         back to no indent.
         <P><P> Final suggestions:

     Lex and Yacc: A Quick Tour

     From characters to a parse tree:

         char stream:    if myVar == 6.02e23**2 then f( ..
              |
             LEX
              |
         token stream:   if myVar == 6.02e23**2 then f( ...
              |
             YACC
              |
         parse tree:
             if-stmt
               ==
                 var
                 **
                   float-lit
                   int-lit
               fun call
                 Arg 1
                 Arg 2
               . . .

     Lex / Yacc History

       - Origin: early 1970s at Bell Labs
       - Many versions & many similar tools
           - Lex, flex, jflex, posix, …
           - Yacc, bison, byacc, CUP, posix, …
           - Targets C, C++, C#, Python, Ruby, ML, …
       - We'll use jflex & byacc/j, targeting java
         (but for simplicity, I usually just say lex/yacc)
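
The "char stream to token stream" step in the quick tour can be sketched by hand. The following is only an illustration in plain Java, not what jflex generates: the class name, the token kind names (KW, FLOAT, INT, ID, OP, PUNCT) and the "KIND:text" output format are all assumptions made for this example. It also tries its alternatives in a fixed order, whereas lex-style tools pick the longest match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenStreamSketch {
    // Hypothetical token kinds, checked in this order (keywords before identifiers).
    private static final String[] KINDS = {"KW", "FLOAT", "INT", "ID", "OP", "PUNCT"};

    // One named group per token kind; leading whitespace is skipped.
    private static final Pattern TOKEN = Pattern.compile(
          "\\s*(?:(?<KW>(?:if|then)\\b)"
        + "|(?<FLOAT>\\d+\\.\\d+(?:[eE][+-]?\\d+)?)"
        + "|(?<INT>\\d+)"
        + "|(?<ID>[A-Za-z_]\\w*)"
        + "|(?<OP>\\*\\*|==)"
        + "|(?<PUNCT>[()]))");

    // Turns a character string into a list of "KIND:text" tokens.
    static List<String> tokenize(String input) {
        String text = input.trim();
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        int pos = 0;
        while (pos < text.length()) {
            m.region(pos, text.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("no token at offset " + pos);
            }
            for (String kind : KINDS) {
                if (m.group(kind) != null) {
                    tokens.add(kind + ":" + m.group(kind));
                    break;
                }
            }
            pos = m.end(); // every alternative consumes at least one character
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same fragment as on the slide, without the trailing dots.
        System.out.println(tokenize("if myVar == 6.02e23**2 then f("));
    }
}
```

The parser never sees individual characters, only this list of classified tokens, which is why the grammar can talk about float-lit and int-lit rather than digits.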

  2. Lex: A Lexical Analyzer Generator

       - Input:
           - Regular exprs defining "tokens"
           - Fragments of declarations & code
       - Output:
           - A java program "yylex.java"
       - Use:
           - Compile & link with your main()
           - Calls to yylex() read chars & return successive tokens.

       Flow: my.flex -> jflex -> yylex.java

     Uses

       - "Front end" of many real compilers
           - E.g., gcc
       - "Little languages":
           - Many special purpose utilities evolve some clumsy, ad hoc, syntax
           - Often easier, simpler, cleaner and more flexible to use
             lex/yacc or similar tools from the start

     yacc: A Parser Generator

       - Input:
           - A context-free grammar
           - Fragments of declarations & code
       - Output:
           - A java program & some "header" files
       - Use:
           - Compile & link it with your main()
           - Call yyparse() to parse the entire input
           - yyparse() calls yylex() to get successive tokens

       Flow: my.y -> byaccj -> Parser.java, ParserVal.java

     Lex Input: "mylexer.flex"

         // java stuff
         %%                                     <- %%: Lex section delims
         %byaccj
         %{                                     <- Declarations & code: most
         public foo()…                             copied verbatim to java pgm
         %}
         %%
         [a-zA-Z]+  {foo(); return(42); }       <- Rules: regexps + {Actions}; 42 is the token code
         [ \t\n]    {; /* skip whitespace */}   <- No action
         …
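
The division of labor described above (the parser pulls tokens from the lexer one at a time) can be sketched without either tool. This is a hand-written stand-in, not generated code: the class name, the methods nextToken() and parse() (playing the roles of yylex() and yyparse()), the token code 257, and the tiny language of integers joined by + and - are all assumptions chosen for illustration. Unlike a real yyparse(), parse() here returns the computed value to keep the example short.

```java
class PullParserSketch {
    static final int NUM = 257;   // hypothetical token code, outside the char range

    private final String input;
    private int pos = 0;
    private int tokenValue;       // plays the role of yylval for NUM tokens

    PullParserSketch(String input) {
        this.input = input;
    }

    // Lexer role: skip blanks, return a token code, or 0 at end of input.
    int nextToken() {
        while (pos < input.length() && input.charAt(pos) == ' ') pos++;
        if (pos >= input.length()) return 0;
        char c = input.charAt(pos);
        if (Character.isDigit(c)) {
            int start = pos;
            while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
            tokenValue = Integer.parseInt(input.substring(start, pos));
            return NUM;
        }
        pos++;
        return c;                 // single-character token such as '+' or '-'
    }

    // Parser role: repeatedly asks the lexer for the next token.
    int parse() {
        if (nextToken() != NUM) throw new IllegalStateException("expected a number");
        int value = tokenValue;
        for (int tok = nextToken(); tok != 0; tok = nextToken()) {
            if (tok != '+' && tok != '-') throw new IllegalStateException("expected + or -");
            if (nextToken() != NUM) throw new IllegalStateException("expected a number");
            value = (tok == '+') ? value + tokenValue : value - tokenValue;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(new PullParserSketch("10 + 5 - 3").parse());
    }
}
```

The token code travels through the return value and the token's value through a shared field, which is the same split the generated lexer and parser use.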

  3. Lex Regular Expressions

       - Letters & numbers match themselves
       - Ditto \n, \t, \r
       - Punctuation often has special meaning
           - But can be escaped: \* matches "*"
       - Union, Concatenation and Star
           - r|s, rs, r*; also r+, r?; parens for grouping
       - Character groups
           - [ab*c] == [*cab], [a-z2648AEIOU], [^abc]
           - "^" for "not" only in char groups, not complementation

     Yacc Input: "expr.y"

       Grammar:  S -> E
                 E -> E+n | E-n | n

         %{                                      <- Java decls (to Parser.java)
         import java.io.*;…
         %}
         %token NUM VAR                          <- Yacc decls
         %%
         stmt: exp { printf("%d\n",$1);}         <- Rules and {Actions} (to Parser.java);
             ;                                      C code; java ex later
         exp : exp '+' NUM { $$ = $1 + $3; }
             | exp '-' NUM { $$ = $1 - $3; }
             | NUM         { $$ = $1; }
             ;
         %%
         public static void main(…               <- Java code (to Parser.java)

     Lex/Yacc Interface: Compile Time

         my.y     -> byaccj -> Parser.java, ParserVal.java
         my.flex  -> jflex  -> Yylex.java
         more.java

         Parser.java, ParserVal.java, Yylex.java, more.java -> javac -> Parser.class

     Expression lexer: "expr.l"

         %{
         #include "y.tab.h"
         #define YYSTYPE int
         %}
         %%
         [0-9]+  { yylval = atoi(yytext); return NUM;}
         [ \t]   { /* ignore whitespace */ }
         \n      { return 0; /* logical EOF */ }
         .       { return yytext[0]; /* +-*, etc. */ }
         %%
         yyerror(char *msg){printf("%s,%s\n",msg,yytext);}
         int yywrap(){return 1;}

       y.tab.h:

         #define NUM 258
         #define VAR 259
         extern YYSTYPE yylval;
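
The regular-expression operators listed above can be tried out quickly with java.util.regex, which shares this core notation. This is an experiment aid, not jflex itself: jflex's syntax differs in some details, and the class name and the helper full() are made up for this sketch.

```java
import java.util.regex.Pattern;

class RegexSketch {
    // True when the whole string matches the regular expression.
    static boolean full(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        // Escaped punctuation: \* matches a literal star.
        System.out.println("a\\*b on a*b: " + full("a\\*b", "a*b"));
        // Union, concatenation, star, plus, optional, grouping.
        System.out.println("ab|cd on cd: " + full("ab|cd", "cd"));
        System.out.println("(ab)* on ababab: " + full("(ab)*", "ababab"));
        System.out.println("ab+c? on abbb: " + full("ab+c?", "abbb"));
        // Character groups: order does not matter, and * is literal inside [].
        System.out.println("[ab*c] on *: " + full("[ab*c]", "*"));
        System.out.println("[*cab] on *: " + full("[*cab]", "*"));
        System.out.println("[a-z2648AEIOU]+ on x2E: " + full("[a-z2648AEIOU]+", "x2E"));
        // ^ negates only inside a character group.
        System.out.println("[^abc] on d: " + full("[^abc]", "d"));
        System.out.println("[^abc] on a: " + full("[^abc]", "a"));
    }
}
```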

  4. Lex/Yacc Interface: Run Time

         main()
           -> yyparse()
                -> yylex()
                     Myaction:
                       ...
                       yylval = ...
                       ...
                       return(code)

       Passed from yylex() back to yyparse(): the token code and the token value (yylval).

     Parser "Value" class (slide 20)

         public class ParserVal
         {
           public int ival;
           public double dval;
           public String sval;
           public Object obj;
           public ParserVal(int val)
             { ival=val; }
           public ParserVal(double val)
             { dval=val; }
           public ParserVal(String val)
             { sval=val; }
           public ParserVal(Object val)
             { obj=val; }
         }//end class

         //then do
         yylval = new ParserVal(3.14);
         yylval = new ParserVal(42);
         // ...or something like...
         yylval = new ParserVal(new myTypeOfObject());

         // in yacc actions, e.g.:
         $$.ival = $1.ival + $2.ival;
         $$.dval = $1.dval - $2.dval;

     "Calculator" example
     From http://byaccj.sourceforge.net/

       On this & next 3 slides, some details may be missing or wrong,
       but the big picture is OK.

         %{
         import java.lang.Math;
         import java.io.*;
         import java.util.StringTokenizer;
         %}
         /* YACC Declarations; mainly op prec & assoc */
         %token NUM
         %left '-' '+'
         %left '*' '/'
         %left NEG /* negation--unary minus */
         %right '^' /* exponentiation */
         /* Grammar follows */
         %%
         ...

     More Yacc Declarations

         %token BHTML BHEAD BTITLE BBODY P BR LI    <- Token names & types
         %token EHTML EHEAD ETITLE EBODY
         %token <sval> TEXT                          <- Type of yylval (if any)
         %type <obj> page head title                 <- Nonterm names & types
         %type <obj> words list item items
         %start page                                 <- Start sym
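
One consequence of the value class above is that the constructor overload decides which field gets filled. The sketch below copies the class as shown on the slide into a nested class named Val (a made-up name, so it cannot clash with a generated ParserVal) and adds a hypothetical add() helper that mimics an action such as $$ = new ParserVal($1.dval + $3.dval). It is meant only to show the pitfall, and assumes the real class behaves like the one on the slide.

```java
class ParserValSketch {
    // Copy of the slide's value class under a hypothetical name.
    static class Val {
        int ival;
        double dval;
        String sval;
        Object obj;
        Val(int val)    { ival = val; }
        Val(double val) { dval = val; }
        Val(String val) { sval = val; }
        Val(Object val) { obj = val; }
    }

    // Mimics a yacc action that reads and writes only the dval field.
    static Val add(Val left, Val right) {
        return new Val(left.dval + right.dval);
    }

    public static void main(String[] args) {
        Val fromDouble = new Val(3.14);   // fills dval
        Val fromInt = new Val(42);        // fills ival; dval stays 0.0
        System.out.println("double ctor: dval=" + fromDouble.dval);
        System.out.println("int ctor: ival=" + fromInt.ival + " dval=" + fromInt.dval);
        // The int-constructed operand contributes nothing to a dval-based sum.
        System.out.println("sum via dval: " + add(fromInt, new Val(1.5)).dval);
    }
}
```

A lexer that stores numbers with one constructor and actions that read a different field will silently compute with zeros, so it helps to pick one field per token type and use it consistently.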

  5. "Calculator" example: grammar

       Input is one expression per line; output is its value.

         ...
         /* Grammar follows */
         %%
         input: /* empty string */
              | input line
              ;

         line: '\n'
             | exp '\n' { System.out.println(" " + $1.dval + " "); }
             ;

         exp: NUM               { $$ = $1; }
            | exp '+' exp       { $$ = new ParserVal($1.dval + $3.dval); }
            | exp '-' exp       { $$ = new ParserVal($1.dval - $3.dval); }
            | exp '*' exp       { $$ = new ParserVal($1.dval * $3.dval); }
            | exp '/' exp       { $$ = new ParserVal($1.dval / $3.dval); }
            | '-' exp %prec NEG { $$ = new ParserVal(-$2.dval); }
            | exp '^' exp       { $$=new ParserVal(Math.pow( $1.dval, $3.dval ));}
            | '(' exp ')'       { $$ = $2; }
            ;
         %%
         ...

       Ambiguous grammar; prec/assoc decls are a (smart) hack to fix that.

     "Calculator" example: lexer

       NOT using lex; barehanded lexer with same interface:
       token code via return, value via yylval (see slide 20, the ParserVal class).

         %%
         String ins;
         StringTokenizer st;
         void yyerror(String s){
           System.out.println("par:"+s);
         }
         boolean newline;
         int yylex(){
           String s; int tok; Double d;
           if (!st.hasMoreTokens())
             if (!newline) {
               newline=true;
               return '\n'; //As in classic YACC example
             } else return 0;
           s = st.nextToken();
           try {
             d = Double.valueOf(s); /*this may fail*/
             yylval = new ParserVal(d.doubleValue());
             tok = NUM; }
           catch (Exception e) {
             tok = s.charAt(0);/*if not float, return char*/
           }
           return tok;
         }

     "Calculator" example: driver

         void dotest(){
           BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
           System.out.println("BYACC/J Calculator Demo");
           System.out.println("Note: Since this example uses the StringTokenizer");
           System.out.println("for simplicity, you will need to separate the items");
           System.out.println("with spaces, i.e.: '( 3 + 5 ) * 2'");
           while (true) {
             System.out.print("expression:");
             try {
               ins = in.readLine();
             }
             catch (Exception e) { }
             st = new StringTokenizer(ins);
             newline=false;
             yyparse();
           }
         }
         public static void main(String args[]){
           Parser par = new Parser(false);
           par.dotest();
         }

     Lex and Yacc: More Details
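
To see what the precedence and associativity declarations buy, the same calculator can be written by hand with precedence climbing. This is a different technique from the table-driven parser byacc/j generates; it is sketched here only to make the effect of the four levels (+ and -, then * and /, then NEG, then right-associative ^) concrete. The class and method names are made up, and like the slide's demo it expects items separated by spaces.

```java
import java.util.StringTokenizer;

class PrecedenceSketch {
    private static final int NEG = 3;   // level of unary minus, between * / and ^

    private final StringTokenizer st;
    private String look;                // one-token lookahead, null at end of input

    PrecedenceSketch(String input) {
        st = new StringTokenizer(input);
        advance();
    }

    private void advance() {
        look = st.hasMoreTokens() ? st.nextToken() : null;
    }

    // Binary operator levels, lowest first; -1 means "not a binary operator".
    private static int prec(String op) {
        if (op == null) return -1;
        switch (op) {
            case "+": case "-": return 1;
            case "*": case "/": return 2;
            case "^":           return 4;
            default:            return -1;
        }
    }

    // Parses operators whose level is at least minPrec.
    double parseExpr(int minPrec) {
        double left = parsePrimary();
        while (prec(look) >= minPrec) {
            String op = look;
            int p = prec(op);
            advance();
            // Left-associative: right side must bind tighter. Right-associative (^): same level.
            double right = parseExpr(op.equals("^") ? p : p + 1);
            switch (op) {
                case "+": left += right; break;
                case "-": left -= right; break;
                case "*": left *= right; break;
                case "/": left /= right; break;
                default:  left = Math.pow(left, right); break;
            }
        }
        return left;
    }

    private double parsePrimary() {
        if (look == null) throw new IllegalArgumentException("unexpected end of input");
        String t = look;
        advance();
        if (t.equals("-")) return -parseExpr(NEG);   // absorbs ^ but not * or +
        if (t.equals("(")) {
            double v = parseExpr(1);
            if (!")".equals(look)) throw new IllegalArgumentException("expected )");
            advance();
            return v;
        }
        return Double.parseDouble(t);
    }

    static double eval(String input) {
        PrecedenceSketch p = new PrecedenceSketch(input);
        double v = p.parseExpr(1);
        if (p.look != null) throw new IllegalArgumentException("unexpected token: " + p.look);
        return v;
    }

    public static void main(String[] args) {
        System.out.println(eval("( 3 + 5 ) * 2"));
    }
}
```

Everything the if/else logic on levels does here is what the %left and %right lines express declaratively, which is why the grammar itself can stay short and ambiguous.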
