Khask

Thu 09 July 2015

I just finished a new programming project, this time in Haskell. Khask is an HTTP file server with a GridFS backend.

It's actually my third time writing this program. The first version was Khartoum (in Python). Then I started learning Go, and rewrote the app as Khargo. So when I decided to learn more Haskell, it seemed natural to try to write the same app again.

I still don't feel like I've mastered it. There are some things I'd like to refactor, but I get nasty monad stack errors when I attempt them. Still, Khask now has the following features:

  • Serving files from GridFS (duh).
  • Setting the appropriate Content-Type header.
  • Serving gzipped content to clients that support it.
  • Setting the ETag header to the md5 of the file's content.
  • CORS support. The Access-Control-Allow-Origin header is set to "*", so you can serve web fonts from khask without complaints from IE and Firefox.

Things I learned from this project:

The Haskell web programming ecosystem is not nearly as rich as Python's or Go's.

Though there's a MongoDB driver for Haskell, it doesn't include GridFS support like the Python and Go drivers do. There's a ticket to add that support, but it's been open for four years.

One kind soul even contributed an implementation as an attachment to that ticket, but it was written for a prior version of the MongoDB driver, which used a slightly different interface than the current version.

For Khask, I started with that contributed implementation and hacked until I could at least get it to compile, fixing some things and commenting others out entirely.

It would be good to get proper GridFS support into the Haskell driver, but I don't know whether the implementation I have in Khask should go upstream or not. It has a dependency on Conduit that I'm not sure the Mongo team would want to impose on their users. I'm interested in people's opinions on whether it would be better for the driver to stick with Conduit or switch to lazy bytestrings.

For now, I should probably just put my implementation into Hackage as a separate package. I don't know how to do that yet though.

The learning curve for web programming in Haskell is brutal.

To illustrate, here's the amount of time between the first commit of each of my versions of this program and the commit that contained the first working version:

Language   Days
Python     1
Go         1
Haskell    133

These numbers aren't quite honest though. I had been programming professionally in Python for years before attempting to write this app. It was, however, one of my first projects to entirely eschew a web framework and just use raw WSGI.

The single day to write the Go implementation, on the other hand, is representative of how easy it is to get started in that language. (Though I did have the advantage of a very clear understanding of the app, given that I had already implemented it in Python.) Everything came together very easily. Go's "io.Copy" function is a thing of beauty.

The 133 days to complete the Haskell implementation were not continuous. I created the repo near the end of January, hit some roadblock, and then set it aside until July 5th. So it's more like 6 days of actually doing any work on the project.

My usual practice is that if a side project takes longer than a couple evenings, or maybe a weekend, it gets shelved. The Haskell implementation of my GridFS server had actually been shelved a couple times before I gave it a name and created the repo this time.

People like frameworks

Many Haskell people will try to push you to use a framework, such as Yesod, Snap, or Scotty. I really didn't want that. A framework would be nice if I were just trying to throw up a quick website, but I already have Django for that. I want to learn how to build more meaningful applications. I want to know how to do my own plumbing.

Though Haskell learners and tutorials tend to fixate on monads, monad transformers have the steeper learning curve.

This may be the reason behind the previous observation. Monad transformers are the recommended mechanism for carrying around database connections, handling logging, etc., so they're pretty essential for writing real world web apps. But they tend to be treated as an advanced concept that's introduced late in any Haskell book (Real World Haskell has them in chapter 18. Learn You A Haskell doesn't cover them at all).

Web frameworks, particularly Yesod, seem to want to protect users from dealing with this by hiding the monad transformer complexity and just having the user write in a framework-specific DSL.

Because I'm really trying to grok Haskell itself, rather than a particular framework, I resisted the recommendations to adopt a framework. Also, since my app has no routing requirements (it's essentially one view that just passes the path straight through to GridFS), it seemed like a framework would probably just get in the way.

Currying to the rescue!

I still haven't conquered monad transformers, but I found an alternative way to pass my program-wide MongoDB connection into the per-request handler.

In Haskell, a WAI application has this type:

Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived

That means that it's a function that takes two arguments:

  1. A Request.
  2. A callback function that should be passed a Response.

And then it returns the result of calling the callback. That interface is extremely similar to the WSGI interface in Python, which takes an 'environ' argument (containing all the request data), and a 'start_response' callback.

In Python I had lots of ways to set up a database connection and re-use it between requests. (Usually I implement it as a callable class with the connection set up in __init__.) But Haskell is a lot more strict. How could I make an application that accepted not just a request and a callback, but also a DB connection?

I had spent a couple evenings banging my head against the ReaderT monad transformer before I realized that just a little currying could do the job. In Haskell, every function really takes just one argument: a function of several arguments takes the first and returns a new function that accepts the next, and so on.

So the answer was to just make a function with a signature like this:

Database -> Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived

Then before passing the function to the web server to run my app, I just had to feed it the Database argument. That partially-applied function was then a valid WAI app, and had access to the database connection it needed.
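
A minimal sketch of that trick, assuming nothing beyond base. The types below are stand-ins with the same shape as the WAI ones, not the real Network.Wai or MongoDB types, and the names fileApp and serveOnce are hypothetical, made up for illustration rather than taken from Khask:

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)

-- Stand-in types so the sketch needs only base. In a real app,
-- Request, Response, and ResponseReceived come from Network.Wai.
newtype Database = Database String
newtype Request = Request String
newtype Response = Response String deriving (Eq, Show)
data ResponseReceived = ResponseReceived

-- Same shape as the WAI application type shown above.
type Application =
  Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived

-- A handler that takes the database first, then the usual two arguments.
fileApp :: Database -> Application
fileApp (Database name) (Request path) respond =
  respond (Response ("would fetch " ++ path ++ " from " ++ name))

-- Hypothetical stand-in for a web server: calls the app once and
-- captures whatever Response it hands to the callback.
serveOnce :: Application -> Request -> IO (Maybe Response)
serveOnce app req = do
  captured <- newIORef Nothing
  _ <- app req (\resp -> writeIORef captured (Just resp) >> pure ResponseReceived)
  readIORef captured
```

The expression fileApp (Database "files") has type Application, so it can be handed to anything that expects a plain two-argument app, for example serveOnce (fileApp (Database "files")) (Request "/logo.png").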

Google is not always your friend.

Most of the blog posts about using WAI are broken. If you see a tutorial on Haskell web programming and it's two or more years old, it's unlikely to compile and run today. Ditto for any blog posts or example projects using MongoDB.

This can lead to an incredible amount of frustration for someone new to the language. Haskell is hard enough without having a bunch of promising tutorials result in dead ends.

It's easy to paint yourself into a corner.

Before actually making a web server, I first tried building a little function that could pull a file from GridFS and print it to the command line. After getting that working, I then tried building up my web application from there. Then I got stuck. I had a mostly working version, but the code for fetching file metadata was too tightly connected to the code that streamed the payload to the client, making it impossible for me to detect when a file was missing and reply with a 404.
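
One way to picture the untangled version is to make the metadata lookup its own step that returns a Maybe, so the handler can pick a status code before any payload is touched. This is only a sketch under assumptions: the Store type is an in-memory association list standing in for GridFS, and FileMeta, findMeta, readBody, and respondFor are hypothetical names, not Khask's actual code or the driver's API:

```haskell
-- Hypothetical stand-ins; not the GridFS or WAI API.
data FileMeta = FileMeta
  { contentType :: String
  , md5sum      :: String
  } deriving (Eq, Show)

data Response = Response
  { status  :: Int
  , headers :: [(String, String)]
  , body    :: String
  } deriving (Eq, Show)

-- An in-memory "file store": path -> (metadata, payload).
type Store = [(FilePath, (FileMeta, String))]

-- Step 1: look up metadata only. Nothing means the file is missing.
findMeta :: Store -> FilePath -> Maybe FileMeta
findMeta store path = fst <$> lookup path store

-- Step 2: fetch the payload. Only called once the metadata was found.
readBody :: Store -> FilePath -> String
readBody store path = maybe "" snd (lookup path store)

-- The handler decides between 404 and 200 before reading any payload.
respondFor :: Store -> FilePath -> Response
respondFor store path =
  case findMeta store path of
    Nothing   -> Response 404 [] "Not Found"
    Just meta ->
      Response 200
        [("Content-Type", contentType meta), ("ETag", md5sum meta)]
        (readBody store path)
```

Because findMeta is separate from readBody, the missing-file case is an ordinary pattern match rather than something buried inside the streaming code.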

The community is helpful, except when it's not.

When I got stuck on that 404 problem, I posted a question about it to the Haskell Beginners group on Google Plus. No answer.

Then I saw an ask-anything thread on /r/haskell, and tried my luck there. This time I got one response. It didn't answer my question, but it did point me towards doing a rewrite that ended up getting me unblocked.

WAI middlewares are very nice!

After I had Khask's basic functionality working, the next step was to add things like request logging, response gzipping, and CORS support. It turned out that each of those features was already available as a WAI middleware. Each was just a matter of adding the package to khask.cabal, importing it in Main.hs, and wrapping my application in it on the line where I call 'run'.
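
The reason wrapping is so painless is that a WAI middleware is just a function from one application to another. The sketch below shows that shape with stand-in types that need only base. The middlewares here (addHeader, allowAnyOrigin, tagServer) and the runOnce helper are hypothetical illustrations, not the real logging, gzip, or CORS packages:

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)

-- Stand-in types with the same shape as their WAI counterparts.
newtype Request = Request { reqPath :: String }
data Response = Response
  { respHeaders :: [(String, String)]
  , respBody    :: String
  } deriving (Eq, Show)
data ResponseReceived = ResponseReceived

type Application =
  Request -> (Response -> IO ResponseReceived) -> IO ResponseReceived

-- A middleware takes an application and returns a wrapped application.
type Middleware = Application -> Application

-- Adds one header to whatever response the wrapped app produces.
addHeader :: (String, String) -> Middleware
addHeader h app req respond =
  app req (\(Response hs b) -> respond (Response (h : hs) b))

allowAnyOrigin :: Middleware
allowAnyOrigin = addHeader ("Access-Control-Allow-Origin", "*")

tagServer :: Middleware
tagServer = addHeader ("Server", "khask-sketch")

baseApp :: Application
baseApp req respond = respond (Response [] ("file at " ++ reqPath req))

-- Wrapping is plain function application, innermost middleware first.
wrappedApp :: Application
wrappedApp = allowAnyOrigin (tagServer baseApp)

-- Hypothetical stand-in for a server: run the app once, capture the response.
runOnce :: Application -> Request -> IO (Maybe Response)
runOnce app req = do
  captured <- newIORef Nothing
  _ <- app req (\resp -> writeIORef captured (Just resp) >> pure ResponseReceived)
  readIORef captured
```

Since wrappedApp is still an Application, adding another layer never changes the line that starts the server beyond one more function call.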

In theory, WSGI middlewares should be as useful in Python, but in practice, the Python ecosystem for web middlewares is divided into framework-specific fiefdoms. Django has its own middleware spec that is not compatible with WSGI. Pyramid has "hooks" and "tweens" that only work in Pyramid. CherryPy has something called "tools" that fills this role.

A notable exception is Flask and its underlying Werkzeug library, which are WSGI all the way down (and a delight to use). WSGI middlewares work just fine with Werkzeug. But they don't seem as easy to find as WAI middlewares in Haskell-land.

So what next?

I'm going to try to work my way through Monad Transformers Step by Step, which was recommended to me on /r/haskell. I'm hoping to apply it to a simple web app as I work through. I'll try to blog about what I learn by applying it in a web context.

Why Haskell?

Haskell isn't easy. When doing something IO-intensive like a web app, the learning curve is undeniably steep, even setting aside the abundance of broken tutorials online. I keep asking myself why I'm spending spare time on it instead of another language.

My attraction to Haskell might not be rational, but I like how my Haskell code looks. There is extremely little boilerplate or punctuation demanded by the language. I like that I can write Haskell without needing a heavyweight IDE to deal with such cruft.

I also think that Haskell leads me to a more well-factored implementation than I would create in a more forgiving language like Python. Haskell makes it very hard to cheat.

When writing pure computation or data-transformation code, Haskell is truly beautiful.

As compared to Go, Haskell's (and Python's) expressiveness is just miles better. I am not convinced by the people saying that Go doesn't really need generics. And having seen how errors can be elegantly handled with Maybe and Either in Haskell, going back to Go's manual and error-prone error handling is extremely irritating.
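
As a small illustration of what that looks like, here is a sketch of two fallible steps chained with Either. The FetchError type and the cleanPath, findFile, and fetch functions are hypothetical examples invented for this post, not code from Khask:

```haskell
-- Hypothetical error type for a tiny file lookup.
data FetchError
  = EmptyPath
  | NotFound String
  deriving (Eq, Show)

-- Strip leading slashes; an empty result is an error.
cleanPath :: String -> Either FetchError String
cleanPath raw =
  case dropWhile (== '/') raw of
    "" -> Left EmptyPath
    p  -> Right p

-- Look the path up in an in-memory list of (name, contents) pairs.
findFile :: [(String, String)] -> String -> Either FetchError String
findFile files p = maybe (Left (NotFound p)) Right (lookup p files)

-- The first Left short-circuits the chain; no manual checks between steps.
fetch :: [(String, String)] -> String -> Either FetchError String
fetch files raw = cleanPath raw >>= findFile files
```

The caller gets either the file contents or a typed reason for the failure, and the compiler won't let the failure case be silently ignored.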

I'm hoping that getting to grips with monad transformers will get me to a place where I'm reasonably productive doing web programming in Haskell. I expect that it will never be as fast to hack out a quick solution in Haskell as it is in Python, but I'm OK with that. I suspect that most of Python's advantage in development speed will be burnt up in the time it takes to turn that quick hack into well-tested, well-factored code that's equivalent in quality to the Haskell version.

We'll see.