Asynchronous episode 3 – Adventures in event-land

From callbacks to events

This is my third post on asynchronous JavaScript. In the first one, I explored callbacks and explained how I ended up creating streamline.js, a tool that simplifies programming in callback-land. In the second one, I described how I introduced futures in streamline.js to enable and control parallel execution of asynchronous I/O operations. In this third one, I am going to talk about the other side of node's asynchronous APIs: events.

But before that, I'll quickly explain the process that drove these investigations: I did not build streamline.js just for the fun of it; I built it because I saw big potential in node.js and I had a real project that I wanted to build on top of it. But I quickly realized that callbacks would be a showstopper for us. We are building business applications, and we have to give our developers a simple and friendly programming environment that lets them concentrate on their problem domain. It does not make sense for us to try to turn our developers into callback ninjas. Also, readability is critical for us: business rules must be expressed in a natural and understandable manner. Callbacks just add too much noise, even with libraries to ease the pain. So node.js would have been disqualified if we hadn't found a way to drastically reduce the overhead introduced by callbacks.

Once I had the tool working, I started to convert the several thousand lines that we had written in callback style before. This actually went very smoothly, and I am very pleased with the result: the code is leaner, simpler, much easier to write and, more importantly, much, much easier to read, understand and maintain. It looks like node.js has now become a viable platform for us.

But the conversion left us with some modules that were written in event style rather than callback style, mostly middleware modules and specialized communication clients (HTTP and TCP). So I started to investigate these other modules. We had drastically improved the situation in callback-land. Maybe there was room for improvement in event-land too!

When events aren’t so rosy

When I first looked at node.js, I thought that there was some beauty in all these observers and emitters. But when I started to investigate our event-style modules, which had been written by a brilliant JavaScript programmer on our team, I started to cringe a bit. Of course, the code was correct, and all the events that were emitted were getting handled properly by some observer that had been registered beforehand. But the logic was often hard to follow because the flow jumped abruptly from one place to another instead of following natural call chains. With callbacks, I often had the feeling that the flow had been wired with GOSUB directives. Here, in event-land, it felt more like it had been wired with setjmp/longjmp calls. Node.js ninjas probably get a kick out of this, but is this really a viable approach for large industrial projects?
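To make this concrete, here is a small sketch of the kind of event-style code I have in mind (handleRequest is a hypothetical function): the logic for a single request is scattered across three handlers, so reading the code top to bottom tells you little about the order in which it actually runs:

var http = require('http');

http.createServer(function(req, res) {
  var body = '';
  req.on('data', function(chunk) {
    body += chunk; // control jumps here for every chunk
  });
  req.on('end', function() {
    handleRequest(body, res); // ... and finally lands here, possibly much later
  });
  req.on('error', function(err) {
    res.writeHead(500); // errors take yet another path
    res.end();
  });
});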

And then we ran into an interesting bug. We have an authentication module between our main module and the lower-level modules to which the requests are routed. We had chosen to let the lower-level modules register the data and end observers on requests so that they could stream posted data when necessary. It was all working fine, until we connected our authentication module to the database. Suddenly, our data and end event handlers were not getting called any more!

It did not take us long to figure out why: the authentication module had become asynchronous because of the database calls, and consequently the data and end events were being emitted, and lost, during authentication, before our lower-level modules had a chance to register their observers.
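Here is a minimal sketch of the race (authenticate is a stand-in for our module): once authentication yields to the event loop, the request's data and end events can fire while nobody is listening:

var http = require('http');

http.createServer(function(req, res) {
  // authenticate now performs database calls, so it yields to the event loop
  authenticate(req, function(err, user) {
    // by the time we get here, 'data' and 'end' may already have fired
    // with no listener attached: those events are lost for good
    req.on('data', function(chunk) { /* never called */ });
    req.on('end', function() { /* never called */ });
  });
});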

This raised a design issue: how should we fix this? One way would be to let the lower-level modules invoke the authentication module after setting up their handlers, but then every lower-level module would be responsible for authentication and would have to hold its dispatch until it had received both the end event and the green light from the authentication module. This would mean a proliferation of non-obvious, security-sensitive code snippets. Not good! Another way to fix it would be to install the event handlers in the main module and let them buffer the data. Then we would probably need to introduce pause and resume calls to control the amount of buffering.

But to me, all this raised a red flag: this kind of event-handling logic seems fundamentally fragile, and we would need a great deal of discipline to write code that is robust and easy to maintain.

Funny enough, while I was writing this post, someone asked for help on the node.js forum about a request object losing events. We are obviously not the only ones to have run into this problem.

Inverting the flow

So I started to question the beauty and relevance of all this. Is this event style somehow imposed upon us by the asynchronous nature of node.js? Or is it just an arbitrary API choice that hides other possible designs?

What about trying to invert the flow around node’s stream objects? Instead of letting the stream push the data to its consumer through events, couldn’t we set up the API so that the consumer pulls the data from the stream by calling an asynchronous read method?

I gave it a shot and found out that this was actually fairly easy to do. The pattern is similar to the one I used for futures (actually, I discovered the pattern when experimenting with streams, and applied it to futures afterwards). I packaged the solution as a streams module containing small wrappers around node streams, which I posted here.
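The published module is more complete, but the core of the inversion can be sketched in a few lines: the wrapper registers all its listeners up front, so no event can be missed, and queues the chunks until the consumer asks for them through an asynchronous read (shown here in plain callback form; streamline turns read(_) into exactly this kind of call):

// minimal sketch of the inversion pattern, not the actual module source
function ReadableWrapper(emitter, options) {
  var chunks = [], error = null, done = false, pending = null;

  // hand the next chunk (or error, or null at the end) to a waiting reader
  function deliver() {
    if (!pending) return;
    var cb = pending;
    if (error) { pending = null; cb(error); }
    else if (chunks.length) { pending = null; cb(null, chunks.shift()); }
    else if (done) { pending = null; cb(null, null); }
  }

  // listeners are registered immediately, before any async work happens
  emitter.on('data', function(chunk) { chunks.push(chunk); deliver(); });
  emitter.on('end', function() { done = true; deliver(); });
  emitter.on('error', function(err) { error = err; deliver(); });

  this.read = function(callback) {
    pending = callback;
    deliver();
  };
}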

To keep things simple, I chose to have the basic read method return the data one chunk at a time, and return null once all the chunks have been read and the end event has been received.

With this low level method, I could easily write a readAll method that reads the stream to the end. Here is the streamlined source of this method:

this.readAll = function(_) {
  var chunk, chunks = [];
  // read one chunk at a time; read returns null at the end of the stream
  while ((chunk = this.read(_)))
    chunks.push(chunk);
  return concat(chunks);
}

where concat is an auxiliary function that deals with the fact that chunks could be buffers or strings.
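For reference, here is what a minimal concat could look like (the module's actual helper may differ):

// hypothetical helper: chunks are either all strings or all buffers
function concat(chunks) {
  if (chunks.length === 0) return null;
  if (typeof chunks[0] === 'string') return chunks.join('');
  return Buffer.concat(chunks); // requires a node version that provides Buffer.concat
}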

I also added lowMark and highMark options to the stream constructor so that you can control how much data will be buffered without having to fool around with pause and resume calls.
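In terms of the wrapper sketch above, this only takes two small changes (the exact semantics of lowMark and highMark are my assumption):

// in the 'data' listener: stop the emitter when the queue grows too large
emitter.on('data', function(chunk) {
  chunks.push(chunk);
  if (chunks.length >= options.highMark) emitter.pause();
  deliver();
});

// in deliver(), after handing a chunk to the reader: restart the emitter
if (chunks.length <= options.lowMark) emitter.resume();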

Pull-style streams in action

This streams module is still under development. I’ve only published wrappers for HTTP client and server requests at this time and I haven’t written extensive unit tests yet. But I published a small example that calls Google’s search web service. Here is a condensed version of this little Google client:

function google(str, _) {
  // send the request, wait for the response, verify the status and read the whole body
  var json = streams.httpRequest({
    url: 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=' + str,
    proxy: process.env.http_proxy
  }).end().response(_).checkStatus(200).readAll(_);

  // format each result as a "url \n\t title" line
  return JSON.parse(json).responseData.results.map(function(entry) {
    return entry.url + '\n\t' + entry.titleNoFormatting;
  }).join('\n');
}
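And, like any streamlined function, it can be called either from other streamlined code or through the underlying callback form that the transformation produces:

// from streamlined code:
console.log(google('node.js', _));

// from plain callback code (the transformed signature):
google('node.js', function(err, results) {
  if (err) throw err;
  console.log(results);
});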

This example demonstrates some of the benefits that we get by inverting the flow:

  • The code is smaller.
  • Calls can be chained in a natural way (and even more so with streamline.js, which prevents the callbacks from disrupting the chains).
  • The code is more robust: there is no risk of losing events because listeners were not set up early enough; exceptions are naturally propagated along call chains; pause/resume calls are hidden and buffering is controlled declaratively; etc.
  • The code should look familiar to people who have been programming with traditional I/O APIs (read functions).

Wrapping up

I don't know how the idea of inverting the flow will be received (it may not be completely new either), but I anticipate that some node.js aficionados will treat it as heresy. Aren't callbacks and events the very essence of node? Who dares promote alternative programming styles, first without callbacks (well, not quite), then without events?

To me, the essence of node.js is the combination of an incredible JavaScript engine and a simple, fundamentally asynchronous runtime that treats HTTP as a first-class protocol. APIs are just enablers for innovations on top of this foundation.

What I'm looking for are tools and APIs that will allow us to deliver great products, and I feel that they are starting to shape up. The streams module is just one more piece of the puzzle. It blends very well with streamline.js, but the transformed source can also be used as a normal standalone node.js module with a callback-oriented API.
