This is the third part of parsing in Haskell. The previous posts are available at Haskell Parsers Part 1 and Haskell Parsers Part 2. In part one we defined a parser as a type and derived a very simple parser, a parser of char. Part two extended the ideas and created monadic, functorial and applicative instances of the parser type. This allowed us to combine parsers monadically – using ‘do’ or in an applicative style – e.g. using ‘<*>’ etc. Which to use is often a stylistic choice.
In this post we’ll look at creating a few extra functions to allow more varied combinations of parsers and we can then use them in a more practical setting which I’ll describe in the final post. Here is the ‘finished’ module of parser functions.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 |
{-# OPTIONS -Wall -fwarn-tabs -fno-warn-type-defaults -fno-warn-unused-do-bind #-} {-# LANGUAGE OverloadedStrings #-} module Parser where import Data.Char import Control.Applicative hiding (many) newtype Parser a = P (String -> [(a,String)]) parse :: Parser a -> String -> [(a,String)] parse (P p) = p ch :: Parser Char ch = P (\s -> case s of [] -> [] (x:xs) -> [(x, xs)]) -- empty list denotes failure, -- and a singleton list denotes success -- That is, fmap applies a function to the -- result value of a parser if the parser succeeds, -- and propagates the failure otherwise. instance Functor Parser where -- fmap :: (a -> b) -> Parser a -> Parser b fmap f p = P (\s -> case parse p s of [] -> [] [(x, xs)] -> [(f x, xs)]) instance Applicative Parser where -- pure :: a -> Parser a pure x = P (\s -> [(x, s)]) -- (<*>) :: Parser (a -> b) -> Parser a -> Parser b pab <*> pa = P (\s -> case parse pab s of [] -> [] [(fab, res)] -> parse (fmap fab pa) res) instance Monad Parser where -- return :: a -> Parser a return x = P (\s -> [(x, s)]) -- (>>=) :: Parser a -> (a -> Parser b) -> Parser b pa >>= f = P (\s -> case parse pa s of [] -> [] [(a, res)] -> parse (f a) res) instance Alternative Parser where -- empty :: Parser a empty = P (\_ -> []) -- (<|>) :: Parser a-> Parser a -> Parser a -- if p1 works the that one otherwise p2 p1 <|> p2 = P (\s -> case parse p1 s of [] -> parse p2 s res -> res) -- First of all, we define a parser satisfy p for -- single characters that satisfyisfy the predicate p satisfy :: (Char -> Bool) -> Parser Char satisfy p = do x <- ch if p x then return x else P (\_ -> []) satisfy' :: (Char -> Bool) -> Parser Char satisfy' p = ch >>= (\x -> if p x then return x else P (\_ -> [])) -- parse a given char char :: Char -> Parser Char char x = satisfy (==x) -- couple of predicates isVowel :: Char -> Bool isVowel c = elem (toLower c) ['a','e', 'i', 'o', 'u'] isConsonant :: Char -> Bool isConsonant = not . isVowel digit :: Parser Char digit = satisfy isDigit lowerCase :: Parser Char lowerCase = satisfy isLower upperCase :: Parser Char upperCase = satisfy isUpper alpha :: Parser Char alpha = satisfy isAlpha alphaNum :: Parser Char alphaNum = satisfy isAlphaNum vowel :: Parser Char vowel = satisfy isVowel consonant :: Parser Char consonant = satisfy isConsonant string :: String -> Parser String string [] = return [] string s@(x:xs) = do char x string xs return s string' :: String -> Parser String string' [] = return [] string' s@(x:xs) = char x *> string' xs *> return s httpVerb :: Parser String httpVerb = string "GET" <|> string "POST" <|> string "DELETE" httpPreamble :: Parser String httpPreamble = do verb <- httpVerb char ',' return verb -- or httpPreamble' :: Parser String httpPreamble' = httpVerb <* char ',' many :: Parser a -> Parser [a] many p = many1 p <|> pure [] many1 :: Parser a -> Parser [a] many1 p = pure (:) <*> p <*> many p ident :: Parser String ident = do x <- lowerCase xs <- many alphaNum return (x:xs) ident' :: Parser String ident' = pure (:) <*> lowerCase <*> many alphaNum natural :: Parser Int natural = do xs <- many1 digit return (read xs) natural' :: Parser Int natural' = pure read <*> many1 digit space :: Parser () space = do many (satisfy isSpace ) return () space' :: Parser () space' = many (satisfy isSpace ) *> return () int :: Parser Int int = do char '-' n <- natural return (-n) <|> natural int' :: Parser Int int' = char '-' *> pure ((-1)*) <*> natural <|> natural dropSpaces :: Parser a -> Parser a dropSpaces p = do space v <- p space return v dropSpaces' :: Parser a -> Parser a dropSpaces' p = space *> p <* space identifier :: Parser String identifier = dropSpaces ident integer :: Parser Int integer = dropSpaces int literal :: String -> Parser String literal xs = dropSpaces (string xs) anyString' :: Parser String anyString' = do x <- alphaNum xs <- many alphaNum return (x:xs) anyString :: Parser String anyString = dropSpaces anyString' comment :: String -> Parser () comment cmnt = do string cmnt many (satisfy ( /='\n') ) return () comment' :: String -> Parser () comment' cmnt = string cmnt *> many (satisfy (/='\n') ) *> return () blockComment :: String -> String -> Parser () blockComment stS endS= do string stS manyUntil ch (string endS) return () blockComment' :: String -> String -> Parser () blockComment' stS endS = string stS >> manyUntil ch (string endS) >> return () manyUntil :: Parser a -> Parser b -> Parser [a] manyUntil p endp = go where go = do endp return [] <|> do x <- p xs <- go return (x:xs) manyUntil' :: Parser a -> Parser b -> Parser [a] manyUntil' p endp = scan' where scan' = endp *> return [] <|> pure (:) <*> p <*> scan' |
Many of the details in the above Haskell file have been discussed but quite a few haven’t!
The ‘many‘ and ‘many1‘ parsers are particularly interesting. They’re like the ‘push-me-pull-you’ of parsers and are mutually recursive. The ‘many‘ tries to run a parser zero or more times and ‘many1‘ tries to run the parser at least once.
1 2 3 4 5 |
many :: Parser a -> Parser [a] many p = many1 p <|> pure [] many1 :: Parser a -> Parser [a] many1 p = pure (:) <*> p <*> many p |
If we first try ‘many‘ it first tries ‘many1‘ and if that fails it puts [] in context and that is the result. When ‘many1‘ is called it ‘loads up’ list concatenation operator and applies the parser so we the applied parser with a partially applied ‘:’ function. Then ‘many‘ is called and that way they go back and forth until ‘many‘ fails, pure [] is run and the list gets resolved. i.e. Parser [a]
Our first use of ‘many‘/’many1‘ is parsing for an identifier where an identifier is, arbitrarily, defined as starting with a lowercase letter then followed by any number of letters/digits.
1 2 3 4 5 |
ident :: Parser String ident = do x <- lowerCase xs <- many alphaNum return (x:xs) |
and then there is the corresponding applicative style:
1 2 |
ident' :: Parser String ident' = pure (:) <*> lowerCase <*> many alphaNum |
Next we define a space parser that just consumes leading spaces from the input string. Here it is in monadic and applicative style.
1 2 3 4 5 6 7 |
space :: Parser () space = do many (satisfy isSpace ) return () space' :: Parser () space' = many (satisfy isSpace ) *> return () |
and in action
1 2 |
λ-> parse space " a1BC " [((),"a1BC ")] |
Notice that the leading spaces in ” a1BC ” are consumed and the trailing spaces are not as the parser stops, as it should, when getting to a non-space character.
To handle leading and trailing spaces for any parser we define ‘dropSpaces‘
1 2 3 4 5 6 7 8 9 |
dropSpaces :: Parser a -> Parser a dropSpaces p = do space v <- p space return v dropSpaces' :: Parser a -> Parser a dropSpaces' p = space *> p <* space |
as can be seen ‘dropSpaces‘ parses spaces then runs the supplied parser and then runs the space parser again. And in ghci…
1 2 |
λ-> parse ( dropSpaces (many1 alphaNum )) " a1BC " [("a1BC","")] |
Finally the ‘manyUntil‘ is a slightly involved parser…
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
manyUntil :: Parser a -> Parser b -> Parser [a] manyUntil p endp = go where go = do endp return [] <|> do x <- p xs <- go return (x:xs) manyUntil' :: Parser a -> Parser b -> Parser [a] manyUntil' p endp = scan' where scan' = endp *> return [] <|> pure (:) <*> p <*> scan' |
It runs a parser ‘p‘ until a ‘terminating’ parser ‘endp‘ succeeds. i.e ‘keep parsing for something until you’re able to parse something else’. Internally it uses the recursive ‘go‘ function to alternate between trying the ‘termination’ parser ‘endp‘ and building up the result by running the ‘p‘ parser and calling ‘go‘ again.
Having ‘manyUntil‘ will now allow us to create parsers for block comments.
1 2 3 4 5 6 7 8 |
blockComment :: String -> String -> Parser () blockComment stS endS= do string stS manyUntil ch (string endS) return () blockComment' :: String -> String -> Parser () blockComment' stS endS = string stS >> manyUntil ch (string endS) >> return () |
and the comment is delimited by the start and end strings supplied to the parser.
Well, thanks for getting this far! In the next post we’ll look at using some of what’s been written and parsing strings into data and actually doing something with that data.
Thank you for detail and step-by-step tutorials about parser!
Will it come 4-th part?
I’ve done nothing yet for part 4. Is there a particular topic you’d like to see?
Maybe, profunctors.