Introduction to Functional Programming: Lecture 10
John Harrison, University of Cambridge, 5 February 1998

Lecture 10
ML examples II: Recursive Descent Parsing

Topics covered:
• The parsing problem
• Recursive descent
• Parsers in ML
• Higher order parser combinators
• Efficiency and limitations.

Grammar for terms

We would like to have a parser for our terms, so that we don't have to write them in terms of type constructors.

  term     → name ( termlist )
           | name
           | ( term )
           | numeral
           | - term
           | term + term
           | term * term

  termlist → term , termlist
           | term

Here we have a grammar for terms, defined by a set of production rules.

Ambiguity

The task of parsing, in general, is to reverse this, i.e. find a sequence of productions that could generate a given string.

Unfortunately the above grammar is ambiguous, since certain strings can be produced in several ways, e.g.

  term → term + term → term + term * term

and

  term → term * term → term + term * term

These correspond to different 'parse trees'. Effectively, we are free to interpret x + y * z either as x + (y * z) or (x + y) * z.

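The two readings can be written out as two distinct values. The sketch below assumes the term type from the earlier lectures has constructors Var, Const and Fn; only Var and Fn appear in these slides, so the exact declaration is an assumption.

```sml
(* Assumed shape of the term type from the earlier lectures. *)
datatype term = Var of string
              | Const of string
              | Fn of string * term list;

(* x + (y * z): the multiplication is the inner node. *)
val tree1 = Fn("+", [Var "x", Fn("*", [Var "y", Var "z"])]);

(* (x + y) * z: the addition is the inner node. *)
val tree2 = Fn("*", [Fn("+", [Var "x", Var "y"]), Var "z"]);

(* Both trees spell out the same string, yet they are different values. *)
val different = tree1 <> tree2;
```
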
Encoding precedences

We can encode operator precedences by introducing extra categories, e.g.

  atom     → name ( termlist )
           | name
           | numeral
           | ( term )
           | - atom

  mulexp   → atom * mulexp
           | atom

  term     → mulexp + term
           | mulexp

  termlist → term , termlist
           | term

Now it's unambiguous. Multiplication has higher precedence and both infixes associate to the right.

Recursive descent

A recursive descent parser is a series of mutually recursive functions, one for each syntactic category (term, mulexp etc.).

The mutually recursive structure mirrors that in the grammar. This makes them quite easy and natural to write, especially in ML, where recursion is the principal control mechanism.

For example, the procedure for parsing terms, say term, will, on encountering a - symbol, make a recursive call to itself to parse the subterm, and on encountering a name followed by an opening parenthesis, will make a recursive call to termlist. This in turn will make at least one recursive call to term, and so on.

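To make the idea concrete, here is a minimal hand-written sketch with one function per category of a precedence-encoded grammar (atom, mulexp, term, termlist). It is an illustration, not the lecture's code: tokens are plain strings, the classifiers isName and isNumeral are hypothetical helpers, the term type is assumed from the earlier lectures, and representing unary minus as Fn("-", [t]) is an assumption.

```sml
exception Noparse;

(* Assumed term type from the earlier lectures. *)
datatype term = Var of string | Const of string | Fn of string * term list;

(* Hypothetical token classifiers: look only at the first character. *)
fun isName s = s <> "" andalso Char.isAlpha (String.sub (s, 0));
fun isNumeral s = s <> "" andalso Char.isDigit (String.sub (s, 0));

(* Each function takes a token list and returns (tree, unused tokens). *)
fun atom ("(" :: rest) =
      (case term rest of
         (t, ")" :: rest') => (t, rest')
       | _ => raise Noparse)
  | atom ("-" :: rest) =
      let val (t, rest') = atom rest in (Fn("-", [t]), rest') end
  | atom (tok :: rest) =
      if isName tok then
        (case rest of
           "(" :: rest' =>
             (case termlist rest' of
                (args, ")" :: rest'') => (Fn(tok, args), rest'')
              | _ => raise Noparse)
         | _ => (Var tok, rest))
      else if isNumeral tok then (Const tok, rest)
      else raise Noparse
  | atom [] = raise Noparse
and mulexp toks =
      (case atom toks of
         (t, "*" :: rest) =>
           let val (t', rest') = mulexp rest in (Fn("*", [t, t']), rest') end
       | result => result)
and term toks =
      (case mulexp toks of
         (t, "+" :: rest) =>
           let val (t', rest') = term rest in (Fn("+", [t, t']), rest') end
       | result => result)
and termlist toks =
      (case term toks of
         (t, "," :: rest) =>
           let val (ts, rest') = termlist rest in (t :: ts, rest') end
       | (t, rest) => ([t], rest));

(* x + y * z comes out as x + (y * z). *)
val (tree, leftover) = term ["x", "+", "y", "*", "z"];
```

The call structure follows the grammar directly: term calls mulexp, mulexp calls atom, and atom calls back into term and termlist when it meets a parenthesis.
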
Parsers in ML

We assume that a parser accepts a list of input characters or tokens of arbitrary type.

It returns the result of parsing, which has some other arbitrary type, and also the list of input objects not yet processed. Therefore the type of a parser is:

  (α) list → β × (α) list

For example, when given the input characters (x + y) * z the function atom will process the characters (x + y) and leave the remaining characters * z.

It might return a parse tree for the processed expression using our earlier recursive type, and hence we would have:

  atom "(x + y) * z" = Fn("+",[Var "x", Var "y"]),"* z"

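The parser type can be written down as an ML type abbreviation. In this small sketch the name parser and the example function digit are illustrative and not from the lecture. It uses the standard explode, which gives a list of characters.

```sml
exception Noparse;

(* A parser consumes a prefix of an 'a list and returns a 'b plus the rest. *)
type ('a, 'b) parser = 'a list -> 'b * 'a list;

(* Hypothetical example: consume one leading digit and return its value. *)
fun digit (c :: rest) =
      if Char.isDigit c then (ord c - ord #"0", rest) else raise Noparse
  | digit [] = raise Noparse;

(* The annotation checks that digit fits the parser type. *)
val digitParser : (char, int) parser = digit;

val (d, rest) = digitParser (explode "7up");
```

Here d is the parsed result and rest holds the characters not yet processed.
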
Parser combinators

In ML, we can define a series of combinators for plugging parsers together and creating new parsers from existing ones.

By giving some of them infix status, we can make the ML parser program look quite similar in structure to the original grammar.

First we declare an exception to be used where parsing fails:

  exception Noparse;

p1 ++ p2 applies p1 first and then applies p2 to the remaining tokens, returning the pair of results.
many p keeps applying the same parser p as long as possible, collecting the results in a list.
p >> f works like p but then applies f to the result of the parse.
p1 || p2 tries p1 first, and if that fails, tries p2.

The operators ++, >> and || are used infix, in decreasing order of precedence.

Definitions of the combinators

  (* Sequencing: run parser1, then parser2 on what is left. *)
  fun ++ (parser1,parser2) input =
    let val (result1,rest1) = parser1 input
        val (result2,rest2) = parser2 rest1
    in ((result1,result2),rest2)
    end;

  (* Repetition: stop, without failing, when parser raises Noparse. *)
  fun many parser input =
    let val (result,next) = parser input
        val (results,rest) = many parser next
    in ((result::results),rest)
    end
    handle Noparse => ([],input);

  (* Post-processing: apply treatment to the parse result. *)
  fun >> (parser,treatment) input =
    let val (result,rest) = parser input
    in (treatment(result),rest)
    end;

  (* Alternation: fall back to parser2 on the original input. *)
  fun || (parser1,parser2) input =
    parser1 input
    handle Noparse => parser2 input;

Auxiliary functions

We make some of these infix:

  infixr 8 ++;
  infixr 7 >>;
  infixr 6 ||;

We will use the following general functions below:

  fun itlist f [] b = b
    | itlist f (h::t) b = f h (itlist f t b);

  fun K x y = x;

  fun fst(x,y) = x;

  fun snd(x,y) = y;

  (* explode now yields a list of one-character strings. *)
  val explode = map str o explode;

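A few sample evaluations may help fix what the general functions do. The definitions are repeated so that the snippet runs on its own. The sample inputs and the names total, joined, chars and always are illustrative.

```sml
fun itlist f [] b = b
  | itlist f (h::t) b = f h (itlist f t b);
fun K x y = x;
val explode = map str o explode;

(* itlist folds from the right: 1 + (2 + (3 + 0)). *)
val total = itlist (fn x => fn y => x + y) [1, 2, 3] 0;

(* The same fold joins strings: "a" ^ ("b" ^ ("c" ^ "")). *)
val joined = itlist (fn s1 => fn s2 => s1 ^ s2) ["a", "b", "c"] "";

(* The redefined explode gives strings rather than characters. *)
val chars = explode "x+1";

(* K x ignores its second argument, so K true accepts anything. *)
val always = K true "anything";
```
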
Atomic parsers

We need a few primitive parsers to get us started.

  fun some p [] = raise Noparse
    | some p (h::t) = if p h then (h,t)
                      else raise Noparse;

  fun a tok = some (fn item => item = tok);

  fun finished input =
    if input = [] then (0,input)
    else raise Noparse;

The first two accept something satisfying p, and something equal to tok, respectively. The last one makes sure there is no unprocessed input.

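The following sketch puts the atomic parsers and the combinators together on small inputs. The definitions are copied from the slides so that it runs on its own. The example parsers bs, xOrY, bracketed and whole are hypothetical and work on lists of one-character strings.

```sml
exception Noparse;

fun ++ (parser1,parser2) input =
  let val (result1,rest1) = parser1 input
      val (result2,rest2) = parser2 rest1
  in ((result1,result2),rest2) end;
fun many parser input =
  let val (result,next) = parser input
      val (results,rest) = many parser next
  in ((result::results),rest) end
  handle Noparse => ([],input);
fun >> (parser,treatment) input =
  let val (result,rest) = parser input
  in (treatment result,rest) end;
fun || (parser1,parser2) input =
  parser1 input handle Noparse => parser2 input;
infixr 8 ++; infixr 7 >>; infixr 6 ||;

fun some p [] = raise Noparse
  | some p (h::t) = if p h then (h,t) else raise Noparse;
fun a tok = some (fn item => item = tok);
fun finished input = if input = [] then (0,input) else raise Noparse;

(* Repetition: consume as many "b" tokens as possible. *)
val bs = many (a "b");
val r1 = bs ["b", "b", "c"];

(* Alternation: accept either "x" or "y". *)
val xOrY = a "x" || a "y";
val r2 = xOrY ["y", "z"];

(* Sequencing nests to the right, so the result is (_, (v, _)). *)
val bracketed = a "(" ++ xOrY ++ a ")" >> (fn (_, (v, _)) => v);
val r3 = bracketed ["(", "x", ")"];

(* finished rejects trailing input, so this parse fails. *)
val whole = bracketed ++ finished >> (fn (v, _) => v);
val failed = (whole ["(", "x", ")", "y"]; false) handle Noparse => true;
```

Because ++ binds tighter than >>, which binds tighter than ||, none of these definitions needs extra parentheses around the sequences.
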
Lexical analysis

First we want to do lexical analysis, i.e. split the input characters into tokens. This can also be done using our combinators, together with a few character discrimination functions. First we declare the type of tokens:

  datatype token = Name of string
                 | Num of string
                 | Other of string;

We want the lexer to accept a string and produce a list of tokens, ignoring spaces, e.g.

  - lex "sin(x + y) * cos(2 * x + y)";
  > val it =
      [Name "sin", Other "(", Name "x", Other "+",
       Name "y", Other ")", Other "*", Name "cos",
       Other "(", Num "2", Other "*", Name "x",
       Other "+", Name "y", Other ")"] : token list;

Definition of the lexer

  val lex =
    let fun several p = many (some p)
        (* Character classes, tested on one-character strings. *)
        fun lowercase_letter s = "a" <= s andalso s <= "z"
        fun uppercase_letter s = "A" <= s andalso s <= "Z"
        fun letter s = lowercase_letter s orelse uppercase_letter s
        fun alpha s = letter s orelse s = "_" orelse s = "'"
        fun digit s = "0" <= s andalso s <= "9"
        fun alphanum s = alpha s orelse digit s
        fun space s = s = " " orelse s = "\n" orelse s = "\t"
        (* Join a first character and the list of following characters. *)
        fun collect(h,t) = h^(itlist (fn s1 => fn s2 => s1^s2) t "")
        val rawname = some alpha ++ several alphanum >> (Name o collect)
        val rawnumeral = some digit ++ several digit >> (Num o collect)
        val rawother = some (K true) >> Other
        (* A token, followed by any amount of space, which is discarded. *)
        val token = (rawname || rawnumeral || rawother) ++ several space >> fst
        val tokens = (several space ++ many token) >> snd
        val alltokens = (tokens ++ finished) >> fst
    in fst o alltokens o explode
    end;

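With a token type available, a precedence-encoded grammar can be transcribed almost rule for rule using the combinators. The sketch below shows how that might look and is not necessarily the lecture's own parser. It assumes the term type from the earlier lectures, represents unary minus as Fn("-", [t]), and introduces the hypothetical helpers name and numeral. The token list is written by hand so that the snippet runs on its own.

```sml
exception Noparse;

fun ++ (p1,p2) input =
  let val (r1,rest1) = p1 input
      val (r2,rest2) = p2 rest1
  in ((r1,r2),rest2) end;
fun >> (p,f) input = let val (r,rest) = p input in (f r,rest) end;
fun || (p1,p2) input = p1 input handle Noparse => p2 input;
infixr 8 ++; infixr 7 >>; infixr 6 ||;

fun some p [] = raise Noparse
  | some p (h::t) = if p h then (h,t) else raise Noparse;
fun a tok = some (fn item => item = tok);

datatype token = Name of string | Num of string | Other of string;
(* Assumed term type from the earlier lectures. *)
datatype term = Var of string | Const of string | Fn of string * term list;

(* Hypothetical token-level parsers for names and numerals. *)
fun name (Name s :: rest) = (s, rest)
  | name _ = raise Noparse;
fun numeral (Num s :: rest) = (s, rest)
  | numeral _ = raise Noparse;

(* One function per syntactic category; each alternative is one production.
   The explicit input argument is needed because the functions are recursive. *)
fun atom input =
     (   name ++ a (Other "(") ++ termlist ++ a (Other ")")
           >> (fn (f, (_, (args, _))) => Fn(f, args))
      || name >> Var
      || numeral >> Const
      || a (Other "(") ++ term ++ a (Other ")") >> (fn (_, (t, _)) => t)
      || a (Other "-") ++ atom >> (fn (_, t) => Fn("-", [t]))) input
and mulexp input =
     (   atom ++ a (Other "*") ++ mulexp >> (fn (x, (_, y)) => Fn("*", [x, y]))
      || atom) input
and term input =
     (   mulexp ++ a (Other "+") ++ term >> (fn (x, (_, y)) => Fn("+", [x, y]))
      || mulexp) input
and termlist input =
     (   term ++ a (Other ",") ++ termlist >> (fn (t, (_, ts)) => t :: ts)
      || term >> (fn t => [t])) input;

(* Tokens for "x + y * z", written out by hand. *)
val (tree, leftover) =
  term [Name "x", Other "+", Name "y", Other "*", Name "z"];
```

When the first alternative of mulexp or term fails after its leading subparser has succeeded, the second alternative parses the same input again, so this direct transcription repeats work.
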