I introduced streamline.js
about four months ago and I never took the time to explain the theory (if it can be called a theory) behind it. I’ll try to do it in this post.
The material that follows is a lot less entertaining than the little tale I started with. The tale could give the wrong impression that streamline.js is just a cute hack. This post gives a different perspective.
Y-R Javascript
For this investigation, I won’t use streamline.js as is because its syntax does not differentiate it well enough from vanilla Javascript. Instead, I will introduce a fictional language that I will call Y-R Javascript (for Yield and Resume Javascript).
Y-R Javascript is a small extension to Javascript. It introduces a new operator that can be applied to function definitions and function calls:
function myFunc@(p1, p2) { // function body here } result = myFunc@(arg1, arg2);
The @
sign is the yield and resume operator. When present in a function definition, it means that the function executes asynchronously. Somewhere in its body, the function yields to the system and the system calls it back with a result or an error. It may actually yield and resume more than once before returning to its caller. I will call such a function an asynchronous function, in contrast with normal Javascript functions (defined without @
) that I will naturally call synchronous functions.
Y-R Javascript gives you two possibilities to call an asynchronous function.
- The first one is to call it as
myFunc@(arg1, arg2)
. In this case, the function will yield and resume (maybe several times) and then return a result to its caller (or throw an exception).
- The second one is to call it as
myFunc(arg1, arg2)
(without@
). In this case, the function will synchronously return a futureF
that you may call later asF@()
to obtain a result. Yielding will happen when you callF@()
, not in the initial call tomyFunc
.
I will also be assuming that the runtime library of Y-R Javascript contains a mixture of synchronous and asynchronous functions. The developer who writes asynchronous programs with Y-R Javascript does not use callbacks. Instead, he uses the @
operator to call asynchronous functions very much like synchronous ones. For example:
function lineCount@(fname) { return readFile@(fname, "utf8").split("\n").length; } function compareLC@(fname1, fname2) { // serial version: start reading fname2 only after // getting the count for fname1 return lineCount@(fname1) - lineCount@(fname2); } function fastCompareLC@(fname1, fname2) { // parallelized I/O with futures var f1 = lineCount(fname1); var f2 = lineCount(fname2); // both I/O operations have been started. // now we yield and resume to obtain the results. return f1@() - f2@(); }
Y-R Javascript also supports anonymous asynchronous functions. The syntax is naturally:
function@(args) { // body }
Y-R Javascript imposes the following rule:
Asynchronous calls are contagious: if a function contains an asynchronous call (an
@
call), then the function itself must be defined as an asynchronous function (defined with an@
).
So, for example, a Y-R Javascript compiler would reject the following function definition:
function lineCount(fname) { return readFile@(fname, "utf8").split("\n").length; }
On the other hand, it would accept the following code:
function getLineCountFunc() { return function@(fname) { return readFile@(fname, "utf8").split("\n").length; } }
In this last sample, getLineCountFunc
is a synchronous function that returns an asynchronous function. The rule given above is not broken because getLineCountFunc
does not contains any asynchronous function calls. What it contains is an asynchronous function definition.
There is also an additional rule to deal with top level calls:
Y-R Javascript does not allow asynchronous calls at the top level of a script module because this would conflict with the fact that
require
works synchronously.
The only exception to this rule is a main script in which we allow top level asynchronous calls.
Equivalence
In this post I want to investigate the relationship between this fictional Y-R Javascript language on one side and Javascript with asynchronous callbacks on the other side.
I will show that any program written in Y-R Javascript can be mechanically translated into an equivalent Javascript program with asynchronous callbacks, and vice versa.
From Y-R to Callbacks
The translation from Y-R Javascript to Javascript with asynchronous callbacks requires many steps. I will not give a complete proof in this post but I will try to give enough information to convince the reader that this transformation is possible and that it preserves the semantics of the Y-R Javascript source.
Synchronous functions
The first thing to observe is that synchronous function definitions may only contain synchronous function calls. So synchronous functions do not need any translation at all.
If a synchronous function contains an asynchronous function definition, like getLineCountFunc
above, then the synchronous function itself does not need any translation but the asynchronous function definition that it contains does need to be translated.
Main script with top level asynchronous calls
So we only have to translate asynchronous function definitions and top level asynchronous calls.
We can eliminate the case of top-level asynchronous calls. As mentioned above, these calls are only allowed in main scripts. Here is an example:
var count = readFile@(argv[1], "utf8").split("\n").length; console.log(argv[1] + " has " + count + " lines.");
This script can be wrapped as follows:
(function@() { var count = readFile@(argv[1], "utf8").split("\n").length; console.log(argv[1] + " has " + count + " lines."); })();
This transformation gives us a module that contains a single synchronous call to an anonymous asynchronous function.
The problem has been simplified. We don’t need to worry about top level asynchronous statements any more. If we know how to transform an arbitrary asynchronous function definition then we will have solved our problem.
Canonicalization steps
Our goal now is to eliminate all the @
calls in an arbitrary function definition and replace them by logic based on asynchronous callbacks.
But we are not going to introduce the callbacks right away. Before that, we are going to transform the Y-R Javascript code into a canonical form. The goal is to reduce the number of language constructs that we have to convert to callback form afterwards.
Each of these canonicalization steps is a simple transformation that preserves the semantics of the program. So the program that we obtain after applying all these steps will behave exactly like the the original program.
Moving variable declarations
The first canonicalization step moves all the variable declarations at the beginning of the function body. Here is an example:
function f1@() { var v1 = 2; if (f2@()) { var v2 = f3@(); return v1 + v2; } return v1; } |
function f1@() { var v1, v2; v1 = 2; if (f2@()) { v2 = f3@(); return v1 + v2; } return v1; } |
Moving functions declarations
We apply the same treatment to functions declarations (subject to hoisting). For example:
function f1@() { f2@(f3()); function f3() {...} } |
function f1@() { function f3() {...} f2@(f3()); } |
Referencing this
and arguments
with variables
If the function body contains the this
keyword, we introduce a new variable to reference this
and we replace all occurrences of this
by this variable. For example:
function f1@() { f2@(this.x); f3(this); |
function f1@() { var __this = this; f2@(__this.x); f3(__this); } |
The arguments
variable is handled in a similar way.
These three steps will allow us to introduce anonymous functions when we convert the body, without having to worry about the scope of variables and functions defined inside the body, and without having to worry about the fact that function calls may modify the value of this
.
Canonicalizing loops
The next step is to transform loops. Javascript has several loop constructs but the most general one is the for(init; cond; incr)
loop and they can all be converted to this form. We also move the init
clause (if any) before the loop. Here are some examples:
while (cond) { ... } |
for (; cond; ) { ... } |
do { ... } while (cond); |
var __1; __1 = true; for (; __1 || cond; ) { __1 = false; ... } |
for (init; cond; incr) { ... } |
init; for (; cond; incr) { ... } |
for (key in obj) { ... } |
var __1, __2, __3; __1 = __keys(obj); __2 = __1.length; __3 = 0; for (; __3 < __2; __3++) { key = __1[__3]; ... } |
Note: __keys
is a little helper functions that returns the properties of the object as an array. It is different from Object.keys
because it also returns the prototype’s properties. To be completely rigorous we should generate an extra test in the loop to skip properties that may have been deleted during the loop.
If the loop condition is a complex condition containing one or more asynchronous calls, we transform it as follows:
for (; cond; incr) { ... } |
for (; (function@() { return cond; })@(); incr) { ... } |
This last transformation ensures that the whole condition will be evaluated at each iteration, even after it has been disassembled (see step below).
Splitting try/catch/finally
This step is very straightforward: we split try
/catch
/finally
statements in two, as follows:
try { ... } catch (ex) { ... } finally { ... } |
try { try { ... } catch (ex) { ... } } finally { ... } |
This will allow us to handle try
/catch
and try
/finally
with independent transformations.
Eliminating lazy operators
The next canonicalization step will eliminate the three lazy operators of Javascript (&&
, ||
and ?:
) when they combine asynchronous expressions. These operators will be converted into function calls as described in the following table:
exp1 && exp2 |
(function@() { var v = exp1; if (!v) return v; return exp2; })@(); |
exp1 || exp2 |
(function@() { var v = exp1; if (v) return v; return exp2; })@(); |
cond ? exp1 : exp2 |
(function@() { if (cond) return exp1; else return exp2; })@(); |
Thanks to this transformation, we won’t have to worry about lazy operators any more. They have been converted to asynchronous functions that contain if
/else
statements.
Disassembling expressions
The next canonicalization step will disassemble all expressions that contain asynchronous calls. Here are three examples:
function f1@() { var x, y; x = f2@(f3() + f4@()); y = f5@(x).f6@(); ... } |
function f1@() { var x, y, __1, ... __1 = f3(); __2 = f4@(); __3 = __1 + __2; x = f2@(__3); __4 = f5@(x); y = __4.f6@(); ... } |
function f1@() { if (f1@() < f2() + f3@()) { ... } } |
function f1@() { var __1, __2, ... __1 = f1@(); __2 = f2(); __3 = f3@(); __4 = __2 + __3; __5 = __1 < __4; if (__5) { ... } } |
function f1@() { return f1@() + f2() + f3@(); } |
function f1@() { var __1, __2, ... __1 = f1@(); __2 = f2(); __3 = f3@(); __4 = __1 + __2 + __3; return __4; } |
This transformation can be implemented by processing the parse tree in a bottom up fashion and introducing an intermediate variable (__1
, __2
, etc.) for every node.
As demonstrated by the second example, the transformation moves asynchronous calls outside of if
conditions. It also moves them outside of switch
, return
and throw
expressions, as demonstrated by the third example above. On the other hand, it does not move asynchronous calls outside of loop conditions but they will be moved a bit later (see loop transformation pattern below).
This transformation is valid because we have taken special care of loop conditions and lazy operators. The following transformation would have been incorrect:
if (f1@() && f2@()) { ... } |
var __1, __2; __1 = f1@(); __2 = f2@(); // wrong! __3 = __1 && __2; if (__3) { ... } |
Post-canonicalization state
We haven’t introduced any callbacks so far. We have just transformed our Y-R Javascript source into another Y-R Javascript source. It is relatively easy to verify that the individual transformations that we have applied preserve execution semantics. So, the transformed source that we have at this stage is another Y-R Javascript program which has the same observable behaviors as the original program.
But the new source will be easier to transform than the original one because all the complex expressions have been disassembled. Asynchronous calls will only be found in the following constructs:
func@(args) obj.func@(args) |
asynchronous procedure call |
v = func@(args) v = obj.func@(args) |
asynchronous function call |
for (; func@(args); incr) for (; obj.func@(args); incr) |
loop condition |
We are now ready to introduce the callbacks.
Transforming an asynchronous function definition
An asynchronous function definition is transformed as follows:
function f@(a1, a2) { ... } |
function f(__cb, a1, a2) { if (!__cb) return __future.call(this, f, arguments); ... __cb(); } |
The __cb
argument is the callback that the transformed f
function will use to return a value or an error. It has the standard node.js callback signature:
__cb(err, result)
The callback is passed as first rather than last argument. This choice is different from the standard node.js convention but it makes it completely trivial to handle optional parameters.
__future
is a helper function similar to the one I described in my April post.
The transformation adds a __cb()
call at the end of the function body because the callback must be called if execution reaches this point without hitting a return
or a throw
.
Transforming return and throw statements
The disassemble step has moved asynchronous calls outside of return
and throw
statements. So these statements can now be easily converted:
return exp; |
return __cb(null, exp); |
throw exp; |
return __cb(exp); |
Transforming loops
All loops have already been transformed into a for
loop. The loops that do not contain any asynchronous calls do not to be transformed any further. The loops that contain asynchronous calls will be converted to callback form as follows:
for (; cond; incr) { loopBody } afterLoop |
var __beenHere = false; (function(__break) { (function __loop() { if (__beenHere) { incr } else { __beenHere = true; } var __doIt = cond; if (__doIt) { loopBody; __loop(); } else { __break(); } })(); })(function() { afterLoop }); |
break
and continue
statements that appear inside the loop will be transformed as follows:
break; |
return __break(); |
continue; |
return __loop(); |
Transforming if statements
Similarly, if
statements that do not contain any asynchronous calls do not need to be transformed. The ones that contains asynchronous calls will be transformed as follows:
if (cond) { thenBody } else { elseBody } afterIf |
(function(__then) { if (cond) { thenBody __then(); } else { elseBody __then(); } })(function() { afterIf }); |
Transforming switch statements
switch
statements have a similar transformation pattern:
switch (exp) { case v1: v1Body break; case v2: case v3: v2v3Body break; default: defaultBody } afterSwitch |
(function(__break) { switch (exp) { case v1: v1Body return __break(); case v2: case v3: v2v3Body return __break(); default: defaultBody return __break(); } })(function() { afterSwitch }); |
break
statements contained in the branches of a switch
will be transformed as follows:
break; |
return __break(); |
Here we assume that the branches of the switch
are all terminated by a break
, a return
or a throw
. If this is not the case and if one of the branches leaks into the next branch, the transformation is still possible but a bit more complex. I will not describe it here.
Transforming try/catch
try
/catch
statements are a bit more challenging. The pattern is the following:
try { tryBody } catch (ex) { catchBody } afterTryCatch |
(function(__then) { (function(__cb) { __tryCatch(__cb, function() { tryBody; __then(); }); })(function(ex, __result) { __tryCatch(__cb, function() { if (ex) { catchBody; __then(); } else { __cb(null, __result); } }); }); })(function() { __tryCatch(__cb, function() { afterTryCatch; }); }); |
The __tryCatch
helper function is defined as follows:
function __tryCatch(__cb, fn) { try { fn(); } catch (e) { try { __cb(e); } catch (ex) { console.error("UNCAUGHT EXCEPTION: " + ex.message + "\n" + ex.stack); } } }
The pattern is rather heavy but it is based on a simple idea: we execute the try
body in a different scope which overrides __cb
. The overriden __cb
executes the catch
body if called with an error. Otherwise, it forwards the result to the enclosing __cb
callback.
The amount of try
/catch
logic is a really overwhelming and may look paranoid. But it is not. The problem is that most of this code is executed in the context of a callback and very strange things (like jumping into catch
clauses that belong to functions from which we have already returned) would happen if we did not plug all the holes and if we let some exceptions bubble up through the callback chain. This is why we invoke __cb
via the __tryCatch
helper function.
Transforming try/finally
try
/finally
is our best of show. The pattern is the following:
try { tryBody } finally { finallyBody } afterTryFinally |
(function(__then) { (function(__cb) { __tryCatch(__cb, function() { tryBody; __cb(null, null, true); }); })(function(__e, __r, __cont) { (function(__then) { __tryCatch(__cb, function() { finallyBody; __then(); }); })(function() { __tryCatch(__cb, function() { if (__cont) __then(); else __cb(__e, __r); }); }) }); })(function() { __tryCatch(__cb, function() { afterTryFinally; }); }); |
The idea is again to override __cb
. I won’t explain it much further here.
Transforming the remaining blocks
This is probably getting very boring but we are now getting close to the end of our transformation journey.
We have now transformed all the control flow statements and we are left with blocks that contain asynchronous function calls.
If you look carefully at the transformations that we have used in the callback generation phase, you will notice the following:
- Every time a control flow statement was encountered, the control flow statement and the statements that followed it have been replaced by a function call like
(function(__then) { ... })(function() { ... })
. - All the code blocks that have been transfered inside these functions are now followed by a function call like
__then()
,__cb()
or__break()
.
So, we are left with blocks that contain the asynchronous function calls produced by the disassembly pass, and that are terminated by a function call (the two cases above).
The blocks that contained a return
or a throw
statement before the transformation now have a return __cb(...)
statement before their ending function call. These blocks can be simplified by removing all the statements that follow the return __cb(...)
statement. After this simplification we can also remove the return
keyword on the return __cb(...)
statement.
So we are left with blocks that contain:
- synchronous statements that we don’t need to transform.
func@(args);
statementsobj.func@(args);
statementsv = func@(args);
statementsv = obj.func@(args);
statements- a function call at the end.
If we manage to transform the statements that contain an asynchronous call we will be done!
Transforming asynchronous calls
The asynchronous calls that remain will be transformed with the following patterns:
func@(args); afterCall |
func(function(__err) { if (__err) return __cb(__err); afterCall }, args); |
v = func@(args); afterCall |
func(function(__err, __result) { if (__err) return __cb(__err); v = __result; afterCall }, args); |
Note: the obj.func@(args)
statements are transformed like the func@(args)
statements.
That’s it! We now have a complete mechanical process that can transform any Y-R Javascript program into an equivalent Javascript program with asynchronous callbacks.
From Callbacks to Y-R
If you read this far you probably deserve a medal. Unfortunately for you, we are only half-way through! What about the other direction? Can any Javascript program with asynchronous callbacks be converted into an equivalent Y-R Javascript program?
Fortunately the reverse proof will be much shorter. We just need to prove two things:
- That the definition of a callback-based asynchronous function definition can be converted to a definition of a Y-R Javascript function.
- That a callback-based asynchronous function call can be converted to a Y-R Javascript function call.
In this second proof, we will assume that the callback function has the default node.js format and we won’t worry about the callback’s position in the argument list. A complete proof would need to take these variations into account but these details don’t really bring anything fundamental to the table. So we will ignore them here.
Transforming a callback-based function definition
This part is completely trivial. The transformation is the the following:
function f(a, b, callback) { ... } |
function f@(a, b) { ... } |
Transforming a callback-based asynchronous call
This one is a little trickier. Here is a solution:
f(a, b, function(err, res) { callbackBody }) |
(function@() { var err, res try { res = f@(a, b); } catch (ex) { err = ex; } callbackBody })(); |
The try
/catch
part is not very interesting. We just need it to extract the error and assign it to an err
variable.
The (function@() { ... })()
wrapper is more interesting. What it does is wrap the function call into an anonymous asynchronous function which is then called synchronously (no @
operator before the parentheses). This call produces a future that we don’t even assign to a variable because we don’t really care about the termination of this anonymous call. What we care about is that the callback body will be called after f
produces a result or an error, which is obviously the case!
Wrapping-up
This was a rather long and probably boring explanation and I haven’t even proved anything formally. But I hope that I have given enough elements to convince you that any Y-R Javascript program can be mechanically converted into an equivalent Javascript program with asynchronous callbacks, and vice versa.
You may wonder why this could be of any interest and why I wasted several hours to write this. The reason is that I think that Y-R Javascript is a much better language than Javascript with asynchronous callbacks when it comes to writing asynchronous code.
Y-R Javascript lets you to write very lean and elegant code. You can express your logic with the native constructs of the language rather than with flow-control libraries; you can do proper exception handing with try
/catch
/finally
; you can chain calls even if some of them are asynchronous, etc. In short, you feel like a Javascript programmer, not like a super-hero who spends his days taming callbacks.
Y-R Javascript has another quality: it makes it very easy to reason about race conditions. The yield and resume points are very easy to spot: they correspond to all the @
calls in your code. If you see a piece of code that does not contain any @
calls, you immediately know that this code never yields and cannot be subject to a race condition of any kind. I think that this is an important feature.
The first part of this dissertation proves that Y-R Javascript is not a leaky abstraction: Y-R programs can be faithfully translated to Javascript programs with asynchronous callbacks.
The second part proves that Y-R Javascript is not a crippled subset of Javascript with asynchronous callbacks. Anything that you can do with asynchronous callbacks can also be done with Y-R Javascript’s primitives (@
calls and futures).
At this point, I have probably exhausted you (and I got my dose too). I will write another post later to illustrate the differences between Y-R Javascript and asynchronous callbacks. The only remaining point that I want to cover quickly is the relationship of all this to my current pet project: streamline.js
Y-R Javacript vs. streamline.js
Streamline.js is very similar to Y-R Javascript. The main difference is on the syntax: streamline.js uses an _
parameter in place of the @
operator.
I chose this syntax for streamline.js because I wanted to be able to edit and reformat my code with standard Javascript tools. It also made implementation easier because I did not have to modify parsers: I could use narcissus as is and I could piggyback the transformation to the CoffeeScript compiler. And, last but not least, it made it easier to call native node.js libraries because you can control where the callback goes in the argument list.
But I don’t have any special attachment to the _
parameter, nor to the @
operator. What I like in all this is the concept of a yield-resume marker that you can tag both onto function definitions and onto asynchronous function calls. I also like the idea of having asynchronous functions return a future when they are called without the marker. And I like the minimalist nature of this design: no extra libraries (promises, futures, tasks), no new keywords, just a small marker!
One last point: the CPS transform that I described here is similar to what I implemented in the streamline compiler. It is not completely identical though (and this article may have a few bugs that I introduced when adapting the patterns) but the general spirit is the same.