I\'ve been very excited about Node JS for awhile. I finally decided to knuckle down and write a test project to learn about generators in the latest Harmony build of Node.
@loganfsmyth already provides a great answer to your question. The goal of my answer is to help you understand how JavaScript generators actually work, as this is a very important step to using them correctly.
Generators implement a state machine, the concept which is nothing new by itself. What's new is that generators allow to use the familiar JavaScript language construct (e.g., for
, if
, try/catch
) to implement a state machine without giving up the linear code flow.
The original goal for generators is to generate a sequence of data, which has nothing to do with asynchrony. Example:
// with generator
function* sequence()
{
var i = 0;
while (i < 10)
yield ++i * 2;
}
for (var j of sequence())
console.log(j);
// without generator
function bulkySequence()
{
var i = 0;
var nextStep = function() {
if ( i >= 10 )
return { value: undefined, done: true };
return { value: ++i * 2, done: false };
}
return { next: nextStep };
}
for (var j of bulkySequence())
console.log(j);
The second part (bulkySequence
) shows how to implement the same state machine in the traditional way, without generators. In this case, we no longer able to use while
loop to generate values, and the continuation happens via nextStep
callback. This code is bulky and unreadable.
Let's introduce asynchrony. In this case, the continuation to the next step of the state machine will be driven not by for of
loop, but by some external event. I'll use a timer interval as a source of the event, but it may as well be a Node.js operation completion callback, or a promise resolution callback.
The idea is to show how it works without using any external libraries (like Q
, Bluebird
, Co
etc). Nothing stops the generator from self-driving itself to the next step, and that's what the following code does. Once all steps of the asynchronous logic have completed (the 10 timer ticks), doneCallback
will be invoked. Note, I don't return any meaningful data with yield
here. I merely use it to suspend and resume the execution:
function workAsync(doneCallback)
{
var worker = (function* () {
// the timer callback drivers to the next step
var interval = setInterval(function() {
worker.next(); }, 500);
try {
var tick = 0;
while (tick < 10 ) {
// resume upon next tick
yield null;
console.log("tick: " + tick++);
}
doneCallback(null, null);
}
catch (ex) {
doneCallback(ex, null);
}
finally {
clearInterval(interval);
}
})();
// initial step
worker.next();
}
workAsync(function(err, result) {
console.log("Done, any errror: " + err); });
Finally, let's create a sequence of events:
function workAsync(doneCallback)
{
var worker = (function* () {
// the timer callback drivers to the next step
setTimeout(function() {
worker.next(); }, 1000);
yield null;
console.log("timer1 fired.");
setTimeout(function() {
worker.next(); }, 2000);
yield null;
console.log("timer2 fired.");
setTimeout(function() {
worker.next(); }, 3000);
yield null;
console.log("timer3 fired.");
doneCallback(null, null);
})();
// initial step
worker.next();
}
workAsync(function(err, result) {
console.log("Done, any errror: " + err); });
Once you understand this concept, you can move on with using promises as wrappers for generators, which takes it to the next powerful level.
First and foremost, it is important to have a good model in your head of exactly what a generator is. A generator function is a function that returns a generator object, and that generator object will step through yield
statements within the generator function as you call .next()
on it.
Given that description, you should notice that asynchronous behavior is not mentioned. Any action on a generator on its own is synchronous. You can run to the first yield
immediately and then do a setTimeout
and then call .next()
to go to the next yield
, but it is the setTimeout
that causes asynchronous behavior, not the generator itself.
So let's cast this in the light of fs.readdir
. fs.readdir
is an async function, and using it in a generator on its own will have no effect. Let's look at your example:
function * read(path){
return yield fs.readdir(path);
}
var gen = read(path);
// gen is now a generator object.
var first = gen.next();
// This is equivalent to first = fs.readdir(path);
// Which means first === undefined since fs.readdir returns nothing.
var final = gen.next();
// This is equivalent to final = undefined;
// Because you are returning the result of 'yield', and that is the value passed
// into .next(), and you are not passing anything to it.
Hopefully it makes it clearer that what you are still calling readdir
synchronously, and you are not passing any callback, so it will probably throw an error or something.
Generally this is accomplished by having the generator yield a special object that represents the result of readdir
before the value has actually been calculated.
For (unrealistic) example, yield
ing a function is a simple way to yield something that represents the value.
function * read(path){
return yield function(callback){
fs.readdir(path, callback);
};
}
var gen = read(path);
// gen is now a generator object.
var first = gen.next();
// This is equivalent to first = function(callback){ ... };
// Trigger the callback to calculate the value here.
first(function(err, dir){
var dirData = gen.next(dir);
// This will just return 'dir' since we are directly returning the yielded value.
// Do whatever.
});
Really, you would want this type of logic to continue calling the generator until all of the yield
calls are done, rather than hard-coding each call. The main thing to notice with this though, is now the generator itself looks synchronous, and everything outside the read
function is super generic.
You need some kind of generator wrapper function that handles this yield value process, and your example of the suspend does exactly this. Another example is co.
The standard method for the method of "return something representing the value" is to return a promise or a thunk since returning a function like I did is kind of ugly.
With the thunk
and co
libraries, you with do the above without the example function:
var thunkify = require('thunkify');
var co = require('co');
var fs = require('fs');
var readdir = thunkify(fs.readdir);
co(function * (){
// `readdir` will call the node function, and return a thunk representing the
// directory, which is then `yield`ed to `co`, which will wait for the data
// to be ready, and then it will start the generator again, passing the value
// as the result of the `yield`.
var dirData = yield readdir(path, callback);
// Do whatever.
})(function(err, result){
// This callback is called once the synchronous-looking generator has returned.
// or thrown an exception.
});
Your update still has some confusion. If you want your list
function to be a generator, then you will need to use co
outside of list
wherever you are calling it. Everything inside of co
should be generator-based and everything outside co
should be callback-based. co
does not make list
automatically asynchronous. co
is used to translate a generator-based async flow control into callback-based flow control.
e.g.
list: function(path, callback){
co(function * (){
var list = yield readdir(path);
// Use `list` right here.
return list;
})(function(err, result){
// err here would be set if your 'readdir' call had an error
// result is the return value from 'co', so it would be 'list'.
callback(err, result);
})
}