I have recently added a third blade to my little asynchronous programming swiss army knife: the generator blade.
Looks sharp! Here are some details.
Generators
Generators are JavaScript’s flavor of coroutines. They solve one of computer science’s most important problems: generating the mysterious fibonacci numbers!
function genFibos() { var f1 = 0, f2 = 1; while (true) { yield f1; var t = f1; f1 = f2; f2 += t; } } function printFibos() { var g = genFibos(); for (var i = 0; i < 10; i++) { var num = g.next(); print(num); } } printFibos();
Generators are characterized by the presence of a yield
keyword. Any function that contains a yield
, like the genFibos
function above, returns a generator.
Generators have a somewhat unusual execution pattern. I’ll quickly describe it on the example above.
The first oddity is that the genFibos()
call inside printFibos
does not execute the body of the genFibos
function. Instead, it just returns a generator which is assigned to the g
variable.
The first call to g.next()
in the printFibos
loop starts the generator: the body of the genFibos
function executes until it reaches the yield
keyword for the first time. At this point, control is transfered from genFibos
to the point where g.next()
was called and the num
variable receives the value of f1
(1).
Then, execution continues in printFibos
‘s for
loop; i
is incremented and g.next()
is called a second time. This call transfers execution to the genFibos
function again, at the point where we left it before, i.e. just after the first yield
. The genFibos
function loops and reaches yield
a second time. As previously, control is transfered back to printFibos
, and g.next()
returns the value of f1
(2), which is assigned to num
.
The for
loop continues, hits another g.next()
which transfers control to the yield point inside genFibos
, which causes genFibos
to loop again to the next yield
, and so on, and so on.
What we have here is a little dance between printFibos
and genFibos
, with g.next()
and yield f1
to jump from one to the other. Coroutines: functions that cooperate by yielding control to each other.
A bit disconcerting at first but it works, and it can help us deal with asynchronous code, as we’ll see shortly.
Generators in JavaScript
Generators are in the works for EcmaScript Harmony but they haven’t landed in V8 yet. So we cannot use them in node.js today.
On the other hand, they have been available for more than 5 years in Firefox (Firefox 2 and up). They are disabled by default but you can enable them by adding a version attribute to the <script>
tag:
<script type="application/javascript;version=1.7"> function genFibos() { ... } function printFibos() { ... } printFibos(); </script>
And they are also available in luvmonkey, Tim Caswell’s port of libuv to spidermonkey. I’ve used both of these environments in my experimentations.
Streamline.js
Streamline.js is my little language tool for asynchronous JavaScript programming. It does not help much with fibonacci numbers but it does help a lot with code that sits on top of asynchronous APIs. No need to twist your brain any more; you can just write the code as if the APIs were synchronous, as long as you follow one simple rule:
Pass the
_
marker everywhere a callback is expected.
Here is an example:
// waits one second (async/non blocking) and returns result function delay(_, result) { setTimeout(_, 1000); return result; } function helloWorld(_) { print("starting ..."); print(delay(_, 'hello ...')); print(delay(_, '... world')); print("done!"); } helloWorld(function(err) { if (err) throw err; else print("continuing with callback"); });
It executes as follows:
_node helloWorld._js starting ... (wait one second)hello ... (wait one second)... world done! continuing with callback
Streamline works by transforming the source, either to callback based code, or to code that uses Marcel Laverdet’s node-fibers library. You can see the callback transformation in action by cutting and pasting the example above in the online demo.
Yielding Hello World!
Supposedly, generators make it easier to write asynchronous code. And they do! Here is the generators version of my little hello world example:
function delay(_, result) { yield AsyncGen.invoke(this, setTimeout, [_, 1000], 0); yield result; } function helloWorld(_) { print("starting ..."); print(yield delay(_, 'hello ...')); print(yield delay(_, '... world')); print("done!"); yield; } function startDemo() { AsyncGen.run(helloWorld, [function(err) { if (err) alert(err); else print("continuing with callback"); }], 0); }
Quite nice! The general flow is similar to the synchronous flow. Instead of having to rewire everything with callbacks you just need to apply some rather simple rules:
- First, keep the _ parameter (*) and add a
yield
keyword in front of all the calls that have this parameter. - Replace all
return
keywords by ayield
. - Don’t forget to add a
yield
at the end of functions that don’t end with areturn
. This is because these functions do return at the end; it is just JavaScript that lets you omit thereturn
when you don’t have a value to return. - Use the special
AsyncGen.invoke
function to call asynchronous APIs that expect a callback, likesetTimeout
. - Use the special
AsyncGen.run
function to call any of your generator functions as a node.js callback style function.
(*) You can rename the _
parameter though, as long as you do it consistently.
The other side of the mirror
Looks good. But how will it work? We now have generator functions that call other generator functions, and yield
keywords all over the place in our asynchronous code. If you remember the little fibonacci dance that we saw earlier, you can imagine that we’ll need a dance partner to interact with these generator functions. And this time, the dance is going to be wild. So we’ll need a very strong partner on the other side to keep it going!
This strong partner is the the AsyncGen.run
function, which gets a little help from AsyncGen.invoke
. Here they come:
var PENDING = {}; window.AsyncGen = { run: function(fn, args, idx) { a) var cb = args[idx], g; function resume(err, val) { while (g) { try { b) val = err ? g.throw(err) : g.send(val); err = null; // if we get PENDING, the current call completed with a pending I/O // resume will be called again when the I/O completes. So just return here. c) if (val === PENDING) return; // if we get [PENDING, e, r], the current call invoked its callback synchronously // we just loop to send/throw what the callback gave us. d) if (val && val[0] === PENDING) { err = val[1]; val = val[2]; } // else, if g yielded a value which is not a generator, g is done. // so we unwind it we send val to the parent generator (or through cb if we are at the top) e) else if (!isGenerator(val)) { g.close(); g = g.prev; } // else, we got a new generator which means that g called another generator function // the new generator become current and we loop with g.send(undefined) (equiv to g.next()) else { f) val.prev = g; g = val; val = undefined; } } catch (ex) { // the send/throw call failed. // we unwind the current generator and we rethrow into the parent generator (or through cb if at the top) g) g.close(); g = g.prev; err = ex; val = undefined; } } // we have exhausted the stack of generators. // return the result or error through the callback. h) cb(err, val); } // set resume as the new callback i) args[idx] = resume; // call fn to get the initial generator j) g = fn.apply(this, args); // start the resume loop k) resume(); }, invoke: function(that, fn, args, idx) { // Set things up so that call returns: // * PENDING if it completes with a pending I/O (and cb will be called later) // * [PENDING, e, r] if the callback is called synchronously. var result = PENDING, sync = true; l) var cb = args[idx]; args[idx] = function(e, r) { m) if (sync) { n) result = [PENDING, e, r]; } else { o) cb(e, r); } } p) fn.apply(that, args); q) sync = false; return result; } };
To demonstrate how it works, I’ll use an annotated version of the hello world program:
function delay(_, result) { 1) yield AsyncGen.invoke(this, setTimeout, [_, 1000], 0); 2) yield result; } function helloWorld(_) { 3) print("starting ..."); 4) print(yield delay(_, 'hello ...')); 5) print(yield delay(_, '... world')); print("done!"); 6) yield; } function startDemo() { 7) AsyncGen.run(helloWorld, [function(err) { 8) if (err) alert(err); else print("continuing with callback"); }], 0); 9) }
Here is how execution unfolds (buckle up and be ready for a little rodeo!):
- 7)
startDemo
callsAsyncGen.run
. - a) and i)
run
stores the callback incb
and replaces it byresume
. - j)
run
callshelloWorld
withresume
as callback. Nothing gets executed inhelloWorld
but ahelloWorld
generator is returned and assigned tog
. - k)
run
callsresume()
. - b)
resume
callsg.send(undefined)
, which is the same as callingg.next()
. Execution jumps to 3) insidehelloWorld
. - 3)
helloWorld
prints"starting..."
. - 4)
helloWorld
callsdelay
. Nothing gets executed indelay
but adelay
generator is returned and yielded byhelloWorld
. Thisyield
jumps back to b), whereg.send
returns thedelay
generator. - b) The
delay
generator is assigned toval
anderr
is reset. - c) d) and e) The tests are false so these
if
branches are skipped. - f) The
delay
generator is chained with thehelloWorld
generator and is assigned tog
.val
is set toundefined
. - b)
resume
callsg.send(undefined)
. Execution jumps to 1) insidedelay
. - 1)
delay
callsAsyncGen.invoke
to invokesetTimeout
withresume
as callback. - l)
invoke
remembers the callback incb
and replaces it by its own callback. - p)
invoke
callssetTimeout
. - p)
sync
is set to false andinvoke
returnsPENDING
. - 1) We are back inside
delay
, andPENDING
is yielded to therun
loop. Executions jumps to b) withPENDING
as return value. - b)
PENDING
is assigned toval
anderr
is reset. - c) The test is true and
resume
returns. - k)
run
returns. - 9)
startDemo
returns. - Sleeping one second…
- m) Awaken in
setTimeout
‘s callback.sync
is false. - o)
resume
is called with both parametersundefined
. - b)
resume
callsg.send(undefined)
. Execution jumps to 1) insidedelay
. - 2)
delay
yieldsresult
, which is"hello ..."
. Execution jumps to b) with"hello ..."
as returned value. - b)
"hello ..."
is assigned toval
anderr
is reset. - c) and d) The tests are false.
- e) The test is true. The
delay
generator is closed and popped from the chain. Now,g
points to thehelloWorld
generator which is where we left it, at 4). - b)
resume
callsg.send("hello ...")
. Execution jumps to 4) insidehelloWorld
. - 4)
helloWorld
prints"hello ..."
. - 5) The same dance happens again with the second
delay
call, up to c) where the currentresume
call returns and o), the end of thesetTimeout
callback. - Sleeping one second…
- m) Awaken in
setTimeout
‘s callback. Same dance as before until"... world"
is returned at 5). - 5)
helloWorld
prints"... world"
and then"done!"
. - 6)
helloWorld
yields. Execution jumps to b) withundefined
as returned value. - b)
undefined
is assigned toval
anderr
is reset. - c) and d) The tests are false.
- e) The test is true. The
helloWorld
generator is closed and popped from the chain. Now,g
isundefined
. - h) The
run
loop is over.run
calls its callback, which was passed bystartDemo
. - 8)
startDemo
‘s callback prints"continuing with callback"
.
This was probably tedious and painful to follow but I think that it is worth going through it at least once step by step, to understand how execution unfolds when mixing generators and asynchronous calls. I find it not very obvious, to say the least and I had to take a few aspirins to get the code into a simple form that works.
Also, the step by step execution that I just described did not explore all the branches because helloWorld
is a well behaved dancer that does not throw exceptions. But the run
function is robust enough to cope with less well behaved dancers.
The nice part is that, with these two functions, we can now write async code with normal, sync-like control flow. For example, we can call the delay
function in a loop as:
function printNumbers(_, min, max) { for (var i = min; i <= max; i++) print(yield delay(_, i)); yield; } AsyncGen.run(printNumbers, [function(err) { if (err) alert(err); else print("continuing with callback"); }, 1, 100], 0);
The two functions, run
and invoke
allow us to cross the mirror in both direction between the callbacks world and the generators world:
callbacks =>
run
=> generators =>invoke
=> callbacks
The little trouble with finally
The run
function is rather powerful and can handle all JavaScript constructs except one: finally
clauses.
The problem is that all return
statements are converted to yield
statements and run
is assuming that the generator is done when it yields a value (e) rather than another generator (f). This works, except in the case where the return
was inside a try
block with a finally
clause. In this case, run
must enter the generator again, to execute the finally
clause after the return
. And, unfortunately, g.send(val)
cannot do the job because it would resume just after the return
which has been turned into a yield
, instead of resuming in the finally
clause.
There is nevertheless a workaround, which imposes a small amount of code rewriting. The idea is to rewrite:
function f(_) { ... try { ... return x; ... } finally { ... } ... }
as:
function f(_) { ... var result__ = null; finally__: do { try { ... { result__ = [x]; break finally__; } ... } finally { ... } } while (false); if (result__) return result__[0]; ... }
The do { ... } while (false);
loop looks silly but it is labeled and it acts as a goto
which lets us move the return
outside of the try/finally
construct.
Once all the returns
have been moved outside of the try/finally
, they can be converted to yield
and the run
function is back on tracks.
Looking for the full stack trace, desperately
The second problem that I hit in this experiment was the difficulty to reconstruct the stack trace when catching an exception.
The exception carries the stack trace of the current resume
call but I did not find a good way to reconstruct the stack trace that corresponds to the sequence of yield
calls. All the information is in the generators that are in the g --> g.prev --> g.prev.prev ...
chain but I could not find any API to access this information.
The situation could be improved by making every generator yield upon entry to pass stack trace information to the run
function but this is inefficient and insufficient (it can give the function names but not the line numbers). Thing would be much better if generators had an API that run
could call to get stack information.
Wrapping up
As I mentioned in the introduction, I have added generators as a third blade to streamline.js. I was quite confident that streamline could be adjusted to generate code that takes advantage of generators but as usual, the proof of the pudding is in the eating. Good news! It works!
You can play with it here:
I’ll probably continue to use streamline, though, even when generators get widely available: the preprocessor handles all the ugly details and hides all the extra yield
, invoke
and run
under the carpet. All that remains is the underscore in every spot where the code yields. Also, streamline provides other features, like futures, that would need to be hand-coded otherwise.
Having three blades makes it easy to compare callbacks, fibers and generators:
First, the generators and fibers transforms produce code that is much leaner and simpler than what the callback transform produces. This does not necessarily mean that they will outperform callbacks because they have their own runtime overhead (coroutines). I did not have time to do serious benchmarking but my experience with real code shows that callbacks win when there is a thin layer of logic on top of I/O calls but fibers can take the advantage when the logic becomes thicker and when caching comes into play.
I cannot benchmark fibers against generators today because they are supported by different engines (V8 and spidermonkey). All I can do is compare the generated code patterns. And here, the fibers transform wins because it does not need to introduce yield
at all levels, only at the lowest level. So, fibers have an advantage with thick layers of code, which will need to be balanced with the optimizations that the JS engine may be able to apply to generators.
And I won’t close this post without thanking Marcel one more time. His implementation of the fibers transform gave me a big head start in this experiment.
Nice writeup, Bruno. You really ought to check out Dave Herman’s task.js lib [1] — it really showcases what you can do with generators for control flow. Tasks as composable units of execution make the control flow code we’re all writing today look pretty clunky.
Also, it may be worth pointing out that traceur has generator support. As yet another rewriting tool it’s probably of little too useful to you for streamline’s purposes. But given the semantics are fairly well settled for generators in the next version of javascript I wonder if streamline could borrow or incorporate the task.js API and eliminate the need for any magic underscore stuff. At that point it could be considered a polyfill and I bet would be a whole lot less controversial.
[1] https://github.com/mozilla/task.js
Hi Dean,
I checked the task library before. What did not thrill me is the fact that it is a control flow library. Why would we need libraries with APIs to do control flow, when we already have control flow constructs in the language? I was also a bit scared by the potential overhead of the task/scheduler/promises machinery.
The main problem I’m addressing with streamline is the one of applications that have thick layers of rather conventional logic (things with lots of if/then/else branches and simple loops) on top of I/O layers. The logic is usually easy to express with vanilla JavaScript (and the nice ES5 array utilities) and a lot of code sequences cannot be parallelized (because they get data, then test, then get more data, then test again, and so on, and so on). Then, what’s the benefit of having a control flow library? The JS flow constructs are more concise and also more efficient.
I’m a bit worried by the overhead introduced by control flow libraries. With a bit of preprocessing I can get it down to very simple run/invoke functions but I’ll still be paying a price for all the yield/g.send dancing. So I want to bench this against the callback transform and the fibers transform. The nice thing about the fibers transform is that it introduces almost no overhead in the logic layer; the overhead is only at the very bottom, in the I/O calls. With generators I have the impression that we are going to pay a higher price in the layers that are only “indirectly” asynchronous (they are asynchronous only because they sit on top of async I/O layers, they are not intrinsically async).
But I’ll take a second look.
Bruno
task.js is being used in Firefox, it really improves the readability and quality of the code.
(actually it’s a subset of task.js, without the scheduler and other features: http://mxr.mozilla.org/mozilla-central/source/toolkit/modules/Task.jsm)
Pingback: Harmony Generators in streamline.js | Bruno's Ramblings
Pingback: On Harmony JavaScript Generators | Madesu