Parsing, Part III: Using the ReadP package


  1. Parsing, Part III: Using the ReadP package
     Jim Royer, CIS 352, April 9, 2019

     On to ReadP
     • ReadP
       • A small, but fairly complete parsing package (shipped with GHC)
       • package docs: http://hackage.haskell.org/package/base-4.12.0.0/docs/Text-ParserCombinators-ReadP.html
     • Parsec
       • A bigger, more complete parsing package
       • Unlike ReadP, it can handle errors in an OK fashion.
       • package docs: http://hackage.haskell.org/package/parsec
       • The Parsec page on the Haskell Wiki: https://wiki.haskell.org/Parsec

     Primitives Repeated from Hutton’s Parser.hs
     • get :: ReadP Char
       Consumes and returns the next character. Fails on an empty input.
     • (<++) :: ReadP a -> ReadP a -> ReadP a
       Equivalent to Hutton’s +++. (+++ means something else in ReadP.)
     • pfail :: ReadP a
       Equivalent to Hutton’s fail.
     • satisfy :: (Char -> Bool) -> ReadP Char
       Equivalent to Hutton’s sat.
     • char :: Char -> ReadP Char
       Same as in Hutton’s
     • string :: String -> ReadP String
       Same as in Hutton’s

     First Examples

         getLetter, openClose :: ReadP Char
         getLetter = satisfy isLetter
         openClose = do { char '('
                        ; char ')'
                        }

         anbn :: ReadP ()
         anbn = do { char 'a'
                   ; anbn
                   ; char 'b'
                   ; return ()
                   }
                <++ return ()

     • getLetter parses the language { a, b, ..., z, A, B, ..., Z }.
     • openClose parses the language { () }.
     • anbn parses the language { a^n b^n | n ≥ 0 }? (Actually, there are problems.)
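The first examples can be loaded into GHCi more or less as written. The following is a minimal sketch, assuming the slide's `Parser` type stands for `ReadP` and using the `parse = readP_to_S` alias that the slides introduce later; the discarded results of `char` are bound to `_` only to keep the compiler quiet.

```haskell
import Data.Char (isLetter)
import Text.ParserCombinators.ReadP

-- The alias the slides use for running a parser.
parse :: ReadP a -> String -> [(a, String)]
parse = readP_to_S

-- Assumption: the slide's `Parser` is written as `ReadP` here.
getLetter, openClose :: ReadP Char
getLetter = satisfy isLetter
openClose = char '(' >> char ')'

anbn :: ReadP ()
anbn =
  (do _ <- char 'a'
      anbn
      _ <- char 'b'
      return ())
    <++ return ()

-- In GHCi:
--   parse get "ab"         -- [('a',"b")]
--   parse getLetter "x1"   -- [('x',"1")]
--   parse openClose "()"   -- [(')',"")]
--   parse openClose "(]"   -- []
--   parse anbn "aabb"      -- [((),"")]
--   parse anbn "aab"       -- [((),"aab")]
```

The last result shows that anbn succeeds on "aab" without consuming anything, so a successful parse does not by itself mean the whole input was in the language. This may be one of the problems the slide hints at.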

  2. Digression: Running your parser
     • readP_to_S :: ReadP a -> String -> [(a,String)]
       (readP_to_S p str) runs parser p on str and returns the results.

     samples.hs

         sample, openClose :: ReadP Char
         sample = satisfy isLetter
         openClose
           = do { char '(' ; char ')' }
         ...

     After loading samples.hs ...

         *Main> readP_to_S openClose "()"
         [(')',"")]
         *Main> readP_to_S openClose "(]"
         []

     In our parser files, we’ll usually introduce the alias
         parse = readP_to_S

     Two Handy Definitions

         parse :: ReadP a -> String -> [(a,String)]
         parse = readP_to_S

         parseWith :: ReadP a -> String -> a
         parseWith p s
           = case [a | (a,t) <- parse p s, all isSpace t] of
               [a] -> a
               []  -> error "no parse"
               _   -> error "ambiguous parse"

     ReadP’s (+++)
     • (+++) :: ReadP a -> ReadP a -> ReadP a
       (p1 +++ p2) runs parsers p1 and p2 “in parallel” and returns the list of results.
       (Not the same as Hutton’s (+++)!)
       Recall that (p1 <++ p2) tries p1, and if that fails, tries p2.

     Examples

         *Main> parse (string "ask" +++ string "as") "ask him"
         [("as","k him"),("ask"," him")]
         *Main> parse (string "ask" <++ string "as") "ask him"
         [("ask"," him")]
         *Main> parse (string "as" <++ string "ask") "ask him"
         [("as","k him")]

     (+++) versus (<++)
     When we mix (+++) and recursion, things get interesting.

         as1, as2 :: ReadP String
         as1 = do { c <- char 'a'
                  ; cs <- as1
                  ; return (c:cs)
                  }
               +++ return ""
         as2 = same as as1 but with <++.

     After loading samples.hs

         *Main> parse as1 "aaaxxx"
         [("","aaaxxx"),
          ("a","aaxxx"),
          ("aa","axxx"),
          ("aaa","xxx")]
         *Main> parse as2 "aaaxxx"
         [("aaa","xxx")]
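To try the (+++) versus (<++) comparison directly, here is a small self-contained sketch. It writes out as2 in full (the slide only says "same as as1 but with <++") and repeats the slides' parse and parseWith definitions so the file loads on its own.

```haskell
import Data.Char (isSpace)
import Text.ParserCombinators.ReadP

parse :: ReadP a -> String -> [(a, String)]
parse = readP_to_S

-- Keep only parses whose leftover input is whitespace; insist on exactly one.
parseWith :: ReadP a -> String -> a
parseWith p s =
  case [a | (a, t) <- parse p s, all isSpace t] of
    [a] -> a
    []  -> error "no parse"
    _   -> error "ambiguous parse"

as1, as2 :: ReadP String
-- Symmetric choice: every prefix of a's is reported as a result.
as1 = (do { c <- char 'a'; cs <- as1; return (c : cs) }) +++ return ""
-- Left-biased choice: the empty alternative is used only if the left side fails.
as2 = (do { c <- char 'a'; cs <- as2; return (c : cs) }) <++ return ""

-- In GHCi:
--   length (parse as1 "aaaxxx")   -- 4
--   parse as2 "aaaxxx"            -- [("aaa","xxx")]
--   parseWith as1 "aaa  "         -- "aaa"
```

parseWith makes the nondeterminism of as1 harmless here: of the four parses of "aaa  ", only one leaves nothing but whitespace behind.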

  3. Primitives beyond Hutton’s, munch, munch1
     • many :: (ReadP a) -> (ReadP [a])
       Parses zero or more occurrences of the given parser
     • many1 :: (ReadP a) -> (ReadP [a])
       Parses one or more occurrences of the given parser
     • munch, munch1 :: (Char -> Bool) -> ReadP String
       (munch tst) is a greedy variant of (many (satisfy tst)). For example:

         > parse (many (char 'a')) "aaaa"
         [("","aaaa"), ("a","aaa"), ("aa","aa"),
          ("aaa","a"), ("aaaa","")]
         > parse (munch (=='a')) "aaaa"
         [("aaaa","")]

     Notes:
     • Greedy ≈ parses as much of the string as possible.
     • munch and munch1 behave like (<++): they commit to the longest match.
     • many and many1 use (+++).

     Adding Semantics, An Example

         nesting :: ReadP Int
         nesting = do { char '('
                      ; n <- nesting
                      ; char ')'
                      ; m <- nesting
                      ; return (max (n+1) m)
                      }
                   +++ return 0

     [Try (parse nesting "(())"), (parse nesting "()((()())())"), etc.]

     A Few Combinators, 1
     Things to look up in the ReadP docs:
     • skipMany (and friends)
     • between
     • sepBy (and friends)
     • endBy (and friends)
     URL: https://hackage.haskell.org/package/base-4.11.0.0/docs/Text-ParserCombinators-ReadP.html
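A sketch for experimenting with greediness and with the nesting example. The names manyGreedy, many1Greedy, and depths are hypothetical helpers, not part of ReadP or the slides: the first two show what a (<++)-based many could look like, and depths keeps only the parses of nesting that consume the whole input.

```haskell
import Text.ParserCombinators.ReadP

parse :: ReadP a -> String -> [(a, String)]
parse = readP_to_S

-- Hypothetical helpers: a greedy counterpart of many/many1 built on (<++).
manyGreedy, many1Greedy :: ReadP a -> ReadP [a]
manyGreedy p  = many1Greedy p <++ return []
many1Greedy p = do { x <- p; xs <- manyGreedy p; return (x : xs) }

-- The nesting-depth parser from the slide, typed as ReadP Int.
nesting :: ReadP Int
nesting =
  (do _ <- char '('
      n <- nesting
      _ <- char ')'
      m <- nesting
      return (max (n + 1) m))
    +++ return 0

-- Hypothetical helper: depths reported by parses that leave no input behind.
depths :: String -> [Int]
depths s = [n | (n, "") <- parse nesting s]

-- In GHCi:
--   length (parse (many (char 'a')) "aaaa")   -- 5
--   parse (munch (== 'a')) "aaaa"             -- [("aaaa","")]
--   parse (manyGreedy (char 'a')) "aaab"      -- [("aaa","b")]
--   depths "(())"                             -- [2]
--   depths "()((()())())"                     -- [3]
--   depths "(()"                              -- []
```

Because nesting is built with (+++), parse nesting also returns the trivial result that consumes nothing; filtering on empty leftover input is one simple way to pick out the intended answer.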

  4. A Few Combinators, 2
     Simple sentence parsing

         word :: ReadP String
         word = munch1 isLetter

         separator :: ReadP ()
         separator = skipMany1 (oneOf " ,")

         oneOf :: [Char] -> ReadP Char
         oneOf cs = choice [char c | c <- cs]

     Simple sentence parsing (continued)

         sentence :: ReadP [String]
         sentence
           = do { words <- sepBy1 word separator
                ; oneOf ".?!"
                ; return words
                }

         *Main> parseWith sentence "traffic lights are red, blue, and green."
         ["traffic","lights","are","red","blue","and","green"]

     Parsing CSV Files

     A CSV parser (from Real World Haskell)
     CSV: Comma-separated values
     A simple file format used by spreadsheets and databases.
     See: http://en.wikipedia.org/wiki/Comma-separated_values

     A sample

         Year,Make,Model,Description,Price
         1997,Ford,E350,"ac, abs, moon",3000.00
         1999,Chevy,"Venture ""Extended Edition""","",4900.00
         1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
         1996,Jeep,Grand Cherokee,"MUST SELL!
         air, moon roof, loaded",4799.00

     • Commas separate “cells”.
     • Unquoted commas are shown in red on the slide.
     • Inside quoted text "" is a quoted quote.
     • Lines normally end with a newline, but quoted text can cross line boundaries.

     A Grammar for CSV

         ⟨file⟩       ::= ⟨line⟩*
         ⟨line⟩       ::= ((⟨cell⟩ ,)* ⟨cell⟩)? ⟨newline⟩
         ⟨cell⟩       ::= ⟨character⟩+ | ⟨quotedCell⟩
         ⟨quotedCell⟩ ::= " ⟨quotedChar⟩* "
         ⟨quotedChar⟩ ::= ⟨notQuote⟩ | ""
         ⟨notQuote⟩   ::= everything but "
         ⟨newline⟩    ::= \n\r | \r\n | \n | \r
         ⟨character⟩  ::= a | b | ...

     Note: A? ≡ A | ε ≡ 0 or 1 copies of A
     [Stage direction: Copy the grammar to the board.]
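The sentence-parsing slides above can be assembled into one loadable file. This is a sketch under a few assumptions: parse and parseWith are the two handy definitions from earlier in the deck, and the local variable is named ws rather than words to avoid shadowing the Prelude function.

```haskell
import Data.Char (isLetter, isSpace)
import Text.ParserCombinators.ReadP

parse :: ReadP a -> String -> [(a, String)]
parse = readP_to_S

parseWith :: ReadP a -> String -> a
parseWith p s =
  case [a | (a, t) <- parse p s, all isSpace t] of
    [a] -> a
    []  -> error "no parse"
    _   -> error "ambiguous parse"

-- A word is a maximal run of letters.
word :: ReadP String
word = munch1 isLetter

-- Any one of the given characters.
oneOf :: [Char] -> ReadP Char
oneOf cs = choice [char c | c <- cs]

-- One or more spaces or commas between words.
separator :: ReadP ()
separator = skipMany1 (oneOf " ,")

-- Words separated by spaces/commas, ended by sentence punctuation.
sentence :: ReadP [String]
sentence = do
  ws <- sepBy1 word separator
  _ <- oneOf ".?!"
  return ws

-- In GHCi:
--   parseWith sentence "traffic lights are red, blue, and green."
--     -- ["traffic","lights","are","red","blue","and","green"]
--   parse sentence "no end"   -- []
```

Although sepBy1 is nondeterministic, the required closing punctuation rules out every parse that stops early, so parseWith finds exactly one result.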

  5. A parser for CSV, 1

         ⟨file⟩    ::= ⟨line⟩*
         ⟨newline⟩ ::= \n\r | \r\n | \n | \r

         csvFile :: ReadP [[String]]
         csvFile = endBy line eol

         eol :: ReadP String
         eol =     (string "\n\r")
               <++ (string "\r\n")
               <++ (string "\n")
               <++ (string "\r")

     A parser for CSV, 2

         ⟨cell⟩      ::= ⟨character⟩+ | ⟨quotedCell⟩
         ⟨character⟩ ::= a | b | ...

         line :: ReadP [String]
         line = sepBy cell (char ',')

         cell :: ReadP String
         cell = quotedCell
                <++ munch (`notElem` ",\n\r")

     A parser for CSV, 3

         ⟨quotedCell⟩ ::= " ⟨quotedChar⟩* "
         ⟨quotedChar⟩ ::= ⟨notQuote⟩ | ""
         ⟨notQuote⟩   ::= everything but "

         cell :: ReadP String
         cell = quotedCell
                <++ munch (`notElem` ",\n\r")

         quotedCell :: ReadP String
         quotedCell = between (char '"')
                              (char '"')
                              (many quotedChar)

         quotedChar :: ReadP Char
         quotedChar =
               satisfy (/= '"')
           +++ (string "\"\"" >> return '"')

     A parser for CSV, 4: All on one page

         csvFile :: ReadP [[String]]
         csvFile = endBy line eol

         line :: ReadP [String]
         line = sepBy cell (char ',')

         eol :: ReadP String
         eol =     (string "\n\r")
               <++ (string "\r\n")
               <++ (string "\n")
               <++ (string "\r")

         cell :: ReadP String
         cell = quotedCell
                <++ munch (`notElem` ",\n\r")

         quotedCell :: ReadP String
         quotedCell = between (char '"')
                              (char '"')
                              (many quotedChar)

         quotedChar :: ReadP Char
         quotedChar =
               satisfy (/= '"')
           +++ (string "\"\"" >> return '"')

     Parser combinators (other than <++ and +++) are in bold on the slide.
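For reference, the CSV parser from the slides can be run as a single file. The sketch below copies the slide definitions and adds one hypothetical helper, parseCSV, which is not in the slides: it keeps only the parses that consume the entire input, since many and sepBy also report partial parses.

```haskell
import Text.ParserCombinators.ReadP

csvFile :: ReadP [[String]]
csvFile = endBy line eol

line :: ReadP [String]
line = sepBy cell (char ',')

-- Try the two-character line endings before the one-character ones.
eol :: ReadP String
eol = string "\n\r" <++ string "\r\n" <++ string "\n" <++ string "\r"

-- A quoted cell if one is present, otherwise a run of ordinary characters.
cell :: ReadP String
cell = quotedCell <++ munch (`notElem` ",\n\r")

quotedCell :: ReadP String
quotedCell = between (char '"') (char '"') (many quotedChar)

-- Inside quotes, "" stands for a single literal quote character.
quotedChar :: ReadP Char
quotedChar = satisfy (/= '"') +++ (string "\"\"" >> return '"')

-- Hypothetical helper: all parses of the input that leave nothing unread.
parseCSV :: String -> [[[String]]]
parseCSV s = [rows | (rows, "") <- readP_to_S csvFile s]

-- In GHCi:
--   parseCSV "a,b\n\"x, y\",\"say \"\"hi\"\"\"\n"
--     -- [[["a","b"],["x, y","say \"hi\""]]]
--   parseCSV "\"a\nb\",c\n"
--     -- [[["a\nb","c"]]]
```

One thing worth trying: because munch may match zero characters, an empty line can be read either as no cells or as one empty cell, so parseCSV "\n" is likely to return more than one parse.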
