Asynchronous JavaScript with Generators – An Experiment

I have recently added a third blade to my little asynchronous programming swiss army knife: the generator blade.

Looks sharp! Here are some details.

Generators

Generators are JavaScript’s flavor of coroutines. They solve one of computer science’s most important problems: generating the mysterious fibonacci numbers!

function genFibos() {  
  var f1 = 0, f2 = 1;  
  while (true) {  
    yield f1;  
    var t = f1;  
    f1 = f2;  
    f2 += t;  
  }  
}

function printFibos() {
	var g = genFibos();  
	for (var i = 0; i < 10; i++) {
	  var num = g.next();
	  print(num);  
	}
}

printFibos();

Generators are characterized by the presence of a yield keyword. Any function that contains a yield, like the genFibos function above, returns a generator.

Generators have a somewhat unusual execution pattern. I’ll quickly describe it on the example above.

The first oddity is that the genFibos() call inside printFibos does not execute the body of the genFibos function. Instead, it just returns a generator which is assigned to the g variable.

The first call to g.next() in the printFibos loop starts the generator: the body of the genFibos function executes until it reaches the yield keyword for the first time. At this point, control is transfered from genFibos to the point where g.next() was called and the num variable receives the value of f1 (1).

Then, execution continues in printFibos‘s for loop; i is incremented and g.next() is called a second time. This call transfers execution to the genFibos function again, at the point where we left it before, i.e. just after the first yield. The genFibos function loops and reaches yield a second time. As previously, control is transfered back to printFibos, and g.next() returns the value of f1 (2), which is assigned to num.

The for loop continues, hits another g.next() which transfers control to the yield point inside genFibos, which causes genFibos to loop again to the next yield, and so on, and so on.

What we have here is a little dance between printFibos and genFibos, with g.next() and yield f1 to jump from one to the other. Coroutines: functions that cooperate by yielding control to each other.

A bit disconcerting at first but it works, and it can help us deal with asynchronous code, as we’ll see shortly.

Generators in JavaScript

Generators are in the works for EcmaScript Harmony but they haven’t landed in V8 yet. So we cannot use them in node.js today.

On the other hand, they have been available for more than 5 years in Firefox (Firefox 2 and up). They are disabled by default but you can enable them by adding a version attribute to the <script> tag:

<script type="application/javascript;version=1.7">
function genFibos() { ... }
function printFibos() { ... }
printFibos();
</script>

And they are also available in luvmonkey, Tim Caswell’s port of libuv to spidermonkey. I’ve used both of these environments in my experimentations.

Streamline.js

Streamline.js is my little language tool for asynchronous JavaScript programming. It does not help much with fibonacci numbers but it does help a lot with code that sits on top of asynchronous APIs. No need to twist your brain any more; you can just write the code as if the APIs were synchronous, as long as you follow one simple rule:

Pass the _ marker everywhere a callback is expected.

Here is an example:

// waits one second (async/non blocking) and returns result
function delay(_, result) {
  setTimeout(_, 1000);
  return result;
}

function helloWorld(_) {
  print("starting ...");
  print(delay(_, 'hello ...'));
  print(delay(_, '... world'));
  print("done!");
}

helloWorld(function(err) {
  if (err) throw err;
  else print("continuing with callback");
});

It executes as follows:

_node helloWorld._js
starting ...
(wait one second)hello ...
(wait one second)... world
done!
continuing with callback

Streamline works by transforming the source, either to callback based code, or to code that uses Marcel Laverdet’s node-fibers library. You can see the callback transformation in action by cutting and pasting the example above in the online demo.

Yielding Hello World!

Supposedly, generators make it easier to write asynchronous code. And they do! Here is the generators version of my little hello world example:

function delay(_, result) {
  yield AsyncGen.invoke(this, setTimeout, [_, 1000], 0);
  yield result;
}

function helloWorld(_) {
  print("starting ...");
  print(yield delay(_, 'hello ...'));
  print(yield delay(_, '... world'));
  print("done!");
  yield;
}

function startDemo() {
  AsyncGen.run(helloWorld, [function(err) {
    if (err) alert(err);
    else print("continuing with callback");
  }], 0);
}

Quite nice! The general flow is similar to the synchronous flow. Instead of having to rewire everything with callbacks you just need to apply some rather simple rules:

First, keep the _ parameter (*) and add a yield keyword in front of all the calls that have this parameter.
Replace all return keywords by a yield.
Don’t forget to add a yield at the end of functions that don’t end with a return. This is because these functions do return at the end; it is just JavaScript that lets you omit the return when you don’t have a value to return.
Use the special AsyncGen.invoke function to call asynchronous APIs that expect a callback, like setTimeout.
Use the special AsyncGen.run function to call any of your generator functions as a node.js callback style function.

(*) You can rename the _ parameter though, as long as you do it consistently.

The other side of the mirror

Looks good. But how will it work? We now have generator functions that call other generator functions, and yield keywords all over the place in our asynchronous code. If you remember the little fibonacci dance that we saw earlier, you can imagine that we’ll need a dance partner to interact with these generator functions. And this time, the dance is going to be wild. So we’ll need a very strong partner on the other side to keep it going!

This strong partner is the the AsyncGen.run function, which gets a little help from AsyncGen.invoke. Here they come:

  var PENDING = {};

  window.AsyncGen = {
    run: function(fn, args, idx) {
a)    var cb = args[idx],
        g;

      function resume(err, val) {
        while (g) {
          try {
b)          val = err ? g.throw(err) : g.send(val);
            err = null;
            // if we get PENDING, the current call completed with a pending I/O
            // resume will be called again when the I/O completes. So just return here.
c)          if (val === PENDING) return;
            // if we get [PENDING, e, r], the current call invoked its callback synchronously
            // we just loop to send/throw what the callback gave us.
d)          if (val && val[0] === PENDING) {
              err = val[1];
              val = val[2];
            }
            // else, if g yielded a value which is not a generator, g is done. 
            // so we unwind it we send val to the parent generator (or through cb if we are at the top)
e)          else if (!isGenerator(val)) {
              g.close();
              g = g.prev;
            }
            // else, we got a new generator which means that g called another generator function
            // the new generator become current and we loop with g.send(undefined) (equiv to g.next()) 
            else {
f)            val.prev = g;
              g = val;
              val = undefined;
            }
          } catch (ex) {
            // the send/throw call failed.
            // we unwind the current generator and we rethrow into the parent generator (or through cb if at the top)
g)          g.close();
            g = g.prev;
            err = ex;
            val = undefined;
          }
        }
        // we have exhausted the stack of generators. 
        // return the result or error through the callback.
h)      cb(err, val);
      }
      // set resume as the new callback
i)    args[idx] = resume;
      // call fn to get the initial generator
j)    g = fn.apply(this, args);
      // start the resume loop
k)    resume();
    },

    invoke: function(that, fn, args, idx) {
      // Set things up so that call returns:
      // * PENDING if it completes with a pending I/O (and cb will be called later)
      // * [PENDING, e, r] if the callback is called synchronously.
      var result = PENDING,
        sync = true;
l)    var cb = args[idx];
      args[idx] = function(e, r) {
m)      if (sync) {
n)        result = [PENDING, e, r];
        } else {
o)        cb(e, r);
        }
      }
p)    fn.apply(that, args);
q)    sync = false;
      return result;
    }
  };

To demonstrate how it works, I’ll use an annotated version of the hello world program:

    function delay(_, result) {
1)    yield AsyncGen.invoke(this, setTimeout, [_, 1000], 0);
2)    yield result;
    }

    function helloWorld(_) {
3)    print("starting ...");
4)    print(yield delay(_, 'hello ...'));
5)    print(yield delay(_, '... world'));
      print("done!");
6)    yield;
    }

    function startDemo() {
7)    AsyncGen.run(helloWorld, [function(err) {
8)      if (err) alert(err);
        else print("continuing with callback");
      }], 0);
9)  }

Here is how execution unfolds (buckle up and be ready for a little rodeo!):

7) startDemo calls AsyncGen.run.
a) and i) run stores the callback in cb and replaces it by resume.
j) run calls helloWorld with resume as callback. Nothing gets executed in helloWorld but a helloWorld generator is returned and assigned to g.
k) run calls resume().
b) resume calls g.send(undefined), which is the same as calling g.next(). Execution jumps to 3) inside helloWorld.
3) helloWorld prints "starting...".
4) helloWorld calls delay. Nothing gets executed in delay but a delay generator is returned and yielded by helloWorld. This yield jumps back to b), where g.send returns the delay generator.
b) The delay generator is assigned to val and err is reset.
c) d) and e) The tests are false so these if branches are skipped.
f) The delay generator is chained with the helloWorld generator and is assigned to g. val is set to undefined.
b) resume calls g.send(undefined). Execution jumps to 1) inside delay.
1) delay calls AsyncGen.invoke to invoke setTimeout with resume as callback.
l) invoke remembers the callback in cb and replaces it by its own callback.
p) invoke calls setTimeout.
p) sync is set to false and invoke returns PENDING.
1) We are back inside delay, and PENDING is yielded to the run loop. Executions jumps to b) with PENDING as return value.
b) PENDING is assigned to val and err is reset.
c) The test is true and resume returns.
k) run returns.
9) startDemo returns.
Sleeping one second…
m) Awaken in setTimeout‘s callback. sync is false.
o) resume is called with both parameters undefined.
b) resume calls g.send(undefined). Execution jumps to 1) inside delay.
2) delay yields result, which is "hello ...". Execution jumps to b) with "hello ..." as returned value.
b) "hello ..." is assigned to val and err is reset.
c) and d) The tests are false.
e) The test is true. The delay generator is closed and popped from the chain. Now, g points to the helloWorld generator which is where we left it, at 4).
b) resume calls g.send("hello ..."). Execution jumps to 4) inside helloWorld.
4) helloWorld prints "hello ...".
5) The same dance happens again with the second delay call, up to c) where the current resume call returns and o), the end of the setTimeout callback.
Sleeping one second…
m) Awaken in setTimeout‘s callback. Same dance as before until "... world" is returned at 5).
5) helloWorld prints "... world" and then "done!".
6) helloWorld yields. Execution jumps to b) with undefined as returned value.
b) undefined is assigned to val and err is reset.
c) and d) The tests are false.
e) The test is true. The helloWorld generator is closed and popped from the chain. Now, g is undefined.
h) The run loop is over. run calls its callback, which was passed by startDemo.
8) startDemo‘s callback prints "continuing with callback".

This was probably tedious and painful to follow but I think that it is worth going through it at least once step by step, to understand how execution unfolds when mixing generators and asynchronous calls. I find it not very obvious, to say the least and I had to take a few aspirins to get the code into a simple form that works.

Also, the step by step execution that I just described did not explore all the branches because helloWorld is a well behaved dancer that does not throw exceptions. But the run function is robust enough to cope with less well behaved dancers.

The nice part is that, with these two functions, we can now write async code with normal, sync-like control flow. For example, we can call the delay function in a loop as:

function printNumbers(_, min, max) {
  for (var i = min; i <= max; i++) print(yield delay(_, i));
  yield;
}

AsyncGen.run(printNumbers, [function(err) {
  if (err) alert(err);
  else print("continuing with callback");
}, 1, 100], 0);

The two functions, run and invoke allow us to cross the mirror in both direction between the callbacks world and the generators world:

callbacks => run => generators => invoke => callbacks

The little trouble with finally

The run function is rather powerful and can handle all JavaScript constructs except one: finally clauses.

The problem is that all return statements are converted to yield statements and run is assuming that the generator is done when it yields a value (e) rather than another generator (f). This works, except in the case where the return was inside a try block with a finally clause. In this case, run must enter the generator again, to execute the finally clause after the return. And, unfortunately, g.send(val) cannot do the job because it would resume just after the return which has been turned into a yield, instead of resuming in the finally clause.

There is nevertheless a workaround, which imposes a small amount of code rewriting. The idea is to rewrite:

function f(_) {
  ...
  try {
    ...
      return x;
    ...
  } finally {
    ...
  }
  ...
}

as:

function f(_) {
  ...
  var result__ = null;
  finally__:
  do {
    try {
      ...
        { result__ = [x]; break finally__; }
      ...
    } finally {
      ...
    }
  } while (false);
  if (result__) return result__[0];
  ...
}

The do { ... } while (false); loop looks silly but it is labeled and it acts as a goto which lets us move the return outside of the try/finally construct.
Once all the returns have been moved outside of the try/finally, they can be converted to yield and the run function is back on tracks.

Looking for the full stack trace, desperately

The second problem that I hit in this experiment was the difficulty to reconstruct the stack trace when catching an exception.

The exception carries the stack trace of the current resume call but I did not find a good way to reconstruct the stack trace that corresponds to the sequence of yield calls. All the information is in the generators that are in the g --> g.prev --> g.prev.prev ... chain but I could not find any API to access this information.

The situation could be improved by making every generator yield upon entry to pass stack trace information to the run function but this is inefficient and insufficient (it can give the function names but not the line numbers). Thing would be much better if generators had an API that run could call to get stack information.

Wrapping up

As I mentioned in the introduction, I have added generators as a third blade to streamline.js. I was quite confident that streamline could be adjusted to generate code that takes advantage of generators but as usual, the proof of the pudding is in the eating. Good news! It works!

You can play with it here:

I’ll probably continue to use streamline, though, even when generators get widely available: the preprocessor handles all the ugly details and hides all the extra yield, invoke and run under the carpet. All that remains is the underscore in every spot where the code yields. Also, streamline provides other features, like futures, that would need to be hand-coded otherwise.

Having three blades makes it easy to compare callbacks, fibers and generators:

First, the generators and fibers transforms produce code that is much leaner and simpler than what the callback transform produces. This does not necessarily mean that they will outperform callbacks because they have their own runtime overhead (coroutines). I did not have time to do serious benchmarking but my experience with real code shows that callbacks win when there is a thin layer of logic on top of I/O calls but fibers can take the advantage when the logic becomes thicker and when caching comes into play.

I cannot benchmark fibers against generators today because they are supported by different engines (V8 and spidermonkey). All I can do is compare the generated code patterns. And here, the fibers transform wins because it does not need to introduce yield at all levels, only at the lowest level. So, fibers have an advantage with thick layers of code, which will need to be balanced with the optimizations that the JS engine may be able to apply to generators.

And I won’t close this post without thanking Marcel one more time. His implementation of the fibers transform gave me a big head start in this experiment.

Links

5 Responses to Asynchronous JavaScript with Generators – An Experiment

deanlandolt says:

May 19, 2012 at 16:06

Nice writeup, Bruno. You really ought to check out Dave Herman’s task.js lib [1] — it really showcases what you can do with generators for control flow. Tasks as composable units of execution make the control flow code we’re all writing today look pretty clunky.

Also, it may be worth pointing out that traceur has generator support. As yet another rewriting tool it’s probably of little too useful to you for streamline’s purposes. But given the semantics are fairly well settled for generators in the next version of javascript I wonder if streamline could borrow or incorporate the task.js API and eliminate the need for any magic underscore stuff. At that point it could be considered a polyfill and I bet would be a whole lot less controversial.

[1] https://github.com/mozilla/task.js

- Bruno Jouhier says:
  
  May 19, 2012 at 21:15
  
  Hi Dean,
  
  I checked the task library before. What did not thrill me is the fact that it is a control flow library. Why would we need libraries with APIs to do control flow, when we already have control flow constructs in the language? I was also a bit scared by the potential overhead of the task/scheduler/promises machinery.
  
  The main problem I’m addressing with streamline is the one of applications that have thick layers of rather conventional logic (things with lots of if/then/else branches and simple loops) on top of I/O layers. The logic is usually easy to express with vanilla JavaScript (and the nice ES5 array utilities) and a lot of code sequences cannot be parallelized (because they get data, then test, then get more data, then test again, and so on, and so on). Then, what’s the benefit of having a control flow library? The JS flow constructs are more concise and also more efficient.
  
  I’m a bit worried by the overhead introduced by control flow libraries. With a bit of preprocessing I can get it down to very simple run/invoke functions but I’ll still be paying a price for all the yield/g.send dancing. So I want to bench this against the callback transform and the fibers transform. The nice thing about the fibers transform is that it introduces almost no overhead in the logic layer; the overhead is only at the very bottom, in the I/O calls. With generators I have the impression that we are going to pay a higher price in the layers that are only “indirectly” asynchronous (they are asynchronous only because they sit on top of async I/O layers, they are not intrinsically async).
  
  But I’ll take a second look.
  
  Bruno
  
  - abral says:
    
    March 14, 2014 at 05:16
    
    task.js is being used in Firefox, it really improves the readability and quality of the code.
    (actually it’s a subset of task.js, without the scheduler and other features: http://mxr.mozilla.org/mozilla-central/source/toolkit/modules/Task.jsm)
Pingback: Harmony Generators in streamline.js | Bruno's Ramblings
Pingback: On Harmony JavaScript Generators | Madesu