Archive | Parsing RSS for this section

Aeson Revisited

As many of you know, the documentation situation in Haskell leaves something to be desired. Sure, if you are enlightened, and can read the types, you’re supposedly good. Personally, I prefer a little more documentation than “clearly this type is a monoid in the category of endofunctors”, but them’s the breaks.

Long ago, I wrote about some tricks I found out about using Aeson, and I found myself using Aeson again today, and I’d like to revisit one of my suggestions.

Types With Multiple Constructors

Last time we were here, I wrote about parsing JSON objects into Haskell types with multiple constructors. I proposed a solution that works fine for plain enumerations, but not types with fields.

Today I had parse some JSON into the following type:

data Term b a = App b [Term b a] | Var VarId | UVar a

I thought “I’ve done something like this before!” and pulled up my notes. They weren’t terribly helpful. So I delved into the haddocs for Aeson, and noticed that Aeson's Result type is an instance of MonadPlus. Could I use mplus to try all three different constructors, and take whichever one works?

instance (FromJSON b, FromJSON a) => FromJSON (Term b a) where parseJSON (Object v) = parseVar `mplus` parseUVar `mplus` parseApp where parseApp = do ident <- v .: "id" terms <- v .: "terms" return $ App ident terms parseVar = Var <$> v .: "var" parseUVar = UVar <$> v .: "uvar" parseJSON _ = mzero

It turns out that I can.

Particularly Vexing Exceptions

If you’re a developer and you’ve been around the internet for a while, you may have read Eric Lippert’s Vexing Exceptions. If you haven’t, you should go do that, it’s a good read.

Back? Ok then. In the article, the author talks about a situation with C#’s Int32.Parse method. Apparently in C#, the method to parse a string to an int throws an exception upon failure. Because this was so ‘vexing’, the language team later implemented Int32.TryParse that returns a boolean upon success or failure, and has an out parameter to return the actual parsed int.

Personally, given those two options, I prefer the original parse function that can throw. I don’t like out parameters, and checking the boolean is just as irritating as catching a parse exception. However, C# is a language without the Maybe monad.

Unfortunately for us, Haskell’s read function also throws an exception instead of returning Maybe a. Unlike the C# example, this is actually vexing because in Haskell, you can only catch exceptions in the IO monad. So what to do?

After some searching online, I found the answer. There is a function in Text.Read:

readMaybe :: Read a => String -> Maybe a

As it’s name would imply, readMaybe attempts to parse a string into some other type, and wraps it in a Just upon success, or returning Nothing.

Things That Would Have Been Nice To Know: Happstack Requests

Stop me if you’ve heard this one before; after trying and failing to find an existing way to some task, a programmer gives up and creates their own method. No sooner than they’ve finished, they find the existing method. This happened to me.

Flash back a week. I’m sitting here working on my on-again-off-again project for a web app to access my home server. I’m trying to implement an image gallery application, and I need a method to read the URI and return images.

Happstack, my framework of choice, provides two methods for routing. There is some template-haskell magic that I don’t care to deal with, and there is string routing combinators. These combinators look like this:

routeFn = dir "foo" $ dir "bar" $ fooBarRouteFn

Basically, routeFn tests to see if the requested path is /foo/bar, and if so, calls the fooBarRouteFn route function. This is all fine and good for simple static routes, but I want to be able to request an arbitrary path on my server, and this solution won’t work. Looking through the Happstack documentation, I find a function called uriRest that “grab[s] the rest of the URL (dirs + query) and passes it to your handler”. Using this, if I request:

http://localhost:8000/foo/bar?foo=bar

… and I call uriRest, I’ll get:

"foo/bar?foo=bar"

I can then use Parsec to parse that into a usable data structure. Less than ideal, but something I can work with. So, work with it I did, implementing a parser to turn that into:

data URITail = URITail {getRouteDirs :: [String], getQueries :: Maybe [(String, Maybe String)] } deriving (Show)

This worked, though I wouldn’t call it “elegant”. I couldn’t shake the feeling I was doing it wrong. It turns out that I was.

Just Use askRq, Dummy!

It turns out that in any ServerPart, you can call askRq. This returns the Request. Request has a bunch of useful functions, here are a few of particular interest:

rqPaths

rqPaths returns a list of path segments. Exactly what would have been in my URITail‘s getRouteDirs field.

rqInputsQuery

rqInputsQuery returns a list of key-value pairs for each part of the QUERY_STRING. This is exactly what my URITail‘s getQueries field was for. Even better is the fact that this function handles all sorts of encoding issues that I, a web developer with hours of experience, wasn’t even aware of!

and others

Aside from these big two, there are functions to get the request method, get the requesting host, and all sorts of potentially useful information.

Don’t I Feel Silly?

Yeah, but it wouldn’t be the first time. I was pretty proud of my URITail module too, unfortunately it’ll have to languish in the dust bin, serving only as a learning experience. Luckily for you, you can learn from my mistakes.

If you want to route in a dynamic manner in Happstack, feel free to call askRq, and examine the request. Don’t re-invent the wheel.

An Intro To Parsec

Taking a break from the Photo Booth, I’ve been working on a re-write of dmp_helper. If you’ve been following me for a while, you may remember that as a quick, dirty hack I did in perl one day to avoid manually writing out HTML snippets. While it mostly works, it’s objectively terrible. I guess if my goal is to never get a job doing Perl (probably not the worst goal to have) then it might serve me well.

But I digress. DmpHelper, the successor to dmp_helper, is going to be written in Haskell, and make extensive use of the Parsec parser library. The old dmp_helper makes extensive use of regular expressions to convert my custom markup to HTML, but I’d prefer to avoid them here. I find regular expressions to be hard to use and hard to read.

So I set off to work. And not 3 lines into main :: IO (), I came across a need to parse something! This is something that has been parsed many times before, but it’s good practice so it’s a wheel I’ll be re-inventing. I’m talking of course about parsing the ArgV!

Some Choices To Make

But before I get to that, I need to make some choices about how my arguments will look. I’ve decided to use the “double dash” method. Therefore, all flags will take the form of --X, where X is the flag. Any flag can take an optional argument, such as --i foo, all text following a flag is considered to belong the the preceeding flag until another flag is encountered. Some flags are optional, and some are mandatory. The order flags appear is not important.

So far, I have 3 flags implemented: --i [INPUT_FILE], the file to be converted, --o [OUTPUT_FILE], the location to save the resulting HTML, and --v, which toggles on verbose mode. Other flags may be added in as development progresses.

Data ArgV

Next, I’ll create a type for my ArgV.

data ArgV = ArgV {inputFile :: FilePath, outputFile :: FilePath, isVerbose :: Bool} deriving (Show)

As you can see, my type has three fields, which line up with my requirements above.

Introducing Parsec

Parsec is a fairly straight-forward library to use. It has a lot of operators and functions, whcih require some thought. However they all amount to basic functions that do a specific parsing task. Your job is to tell these functions what to do. So let’s get right down to it.

parseArgV :: [String] -> Either ParseError ArgV parseArgV i = parse pArgV "" $ foldArgV i foldArgV :: [String] -> String foldArgV i = foldl' (++) [] i

Here we have two functions to get us started. The function foldArgV collapses the list of strings into a single string, to be parsed. Because of the way that arguments are parsed by the operating system, the argument string --i foo --o bar --v will be collapsed to --ifoo--obar--v. This is good for us because this means we don’t have to worry about parsing white space.

The second function, parseArgV is our entry point into Parsec. All of Parsec’s combinator functions return a Parser monad. The function parse parses the data in the parser monad and returns either an error, or a whatever. In our case, it’s parsing to an ArgV.

Parse takes 3 arguments: a parser, a “user state”, and a string to parse. The first and third are pretty self explanatory, but the second is a mystery. Unfortunately I don’t have an answer for you as to it’s nature. I spent a good hour or two researching it yesterday, and the best I could come up with is “don’t worry about it”. None of the tutorials use it, and the documentation basically just says “this is state. Feel free to pass an empty string, it’ll be fine”. I’ve decided I won’t worry my little head about it for the time being.

pArgV :: CharParser st ArgV pArgV = do permute $ ArgV <$$> pInputFile <||> pOutputFile <|?> (False, pIsVerbose)

This function is the true heart of my parser. I’d like to draw your attention to the invocation of permute. This function implements the requirement that the order of arguments shouldn’t matter. Ordinarily, parsec will linearly go through some input and test it. But what do you do if you need to parse something that doesn’t have a set order? The library Text.Parsec.Perm solves this problem for us.

The function permute has a difficult type signature to follow, so a plain-english explanation is in order. Permute is used in a manner that resembles an applicative functor. Permute takes a function, and then using it’s operators, it takes functions that return a parser for the type of each field in the function in a pseudo-applicative style. These functions don’t need to come in the order that they’re parsed in, but in the order that the initial function expects them to be. Permute will magically parse them in the correct order and return a parser for your type. Let’s break this down line-by-line:

... permute $ ArgV <$$> pInputFile ...

Here we call permute on the constructor for ArgV. We feed it the first function, pInputFile using the <$$> operator. This operator is superficially, and logically similar to the applicative functor’s <$> operator. It creates a new permutation parser for the type on the left using the function on the right.

... <||> pOutputFile ...

Here we feed the second parser function, pOutputFile to the permutation parser using the <||> operator. This operator is superficially, and logically similar to applicative’s <*> operator.

... <|?> (False, pIsVerbose) ...

Finally, we feed it a parser for isVerbose. The operator <|?> works the same as the <||> operator, except that this operator is allowed to fail. If <||> fails, the whole parser fails. If <|?> fails, then the default value (False, in this case) is used. This allows us to have optional parameters.

Moving along from here, here are some extra parsing functions we need.

pInputFile :: CharParser st FilePath pInputFile = do try $ string "--i" manyTill anyChar pEndOfArg

This parser parses the --i parameter. The first line of the do expression parses the input to see if it is --i. Because it is wrapped in a call to try, it will only consume input if it succeeds. Since we don’t actually need this text, we don’t bind it to a name. If the parser had failed, it would exit the function and not evaluated the second line. On the second line, we take all the text until pEndOfArg succeeds. This text is what is returned from the function.

pEndOfArg :: CharParser st () pEndOfArg = nextArg <|> eof where nextArg = do lookAhead $ try $ string "--" return ()

This function detects if the parser is at the end of an argument. Notice the <|> operator. This is the choice operator. Basically, if the parser on the left fails, it runs the parser on the right. You can think of it like a boolean OR.

The parser eof is provided by parsec, and succeeds if there is no more input. The function lookAhead is the inverse of try. It only consumes input if the parser fails. By wrapping a try inside of a lookAhead, we create a parser that never consumes input.

The parser functions pOutputFile and pIsVerbose are almost identical to pInputFile, so I’m not going to bother typeing them out here.

Additional Reading

That’s about all there is to my ArgV parser. You can find some additional reading on the subject in Real World Haskell, whcih has an entire chapter dedicated to this subject. Also, check out the Parsec haddoc page on hackage.

%d bloggers like this: