armagast April 4, 2011 at 12:35

Waiting for multiple events in Node.js


Probably everyone who starts learning Node.js has difficulty switching to event-driven programming. It is simple as long as the actions can be performed sequentially: we start the next one once the previous one has fired its final event. But what if there are many such actions and each takes a long time? And what if we cannot continue until every one of them has completed?

Everything is simple

A small digression. In Node.js, any event can be associated with a handler function that is called when the event occurs. It does not matter how the handler is handed to the process that triggers the event: as a parameter of an asynchronous function or by “binding” the function to the event. Only one thing matters: we need to wait for each of our handler functions to complete.
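The two ways of attaching a handler can be sketched as follows. This is only a minimal illustration using the standard events module; the event name "done" and the seen array are made up for the example.

```javascript
var EventEmitter = require("events").EventEmitter;

// Hypothetical log of what the handlers observed, in call order.
var seen = [];

// Way 1: the handler is passed as a parameter of an asynchronous function.
setTimeout(function () {
	seen.push("timer fired");
	console.log(seen);
}, 10);

// Way 2: the handler is "bound" to a named event of an emitter.
var emitter = new EventEmitter();
emitter.on("done", function (message) {
	seen.push("done: " + message);
});

// emit calls the bound handlers synchronously, so this entry is recorded first.
emitter.emit("done", "hello");
```

In both cases the program does not know in advance when the handler will run; it only knows that it will be called once the event occurs.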
To start, consider the simplest example. We have several long-running actions (simulated with the setTimeout function), and we start each action only after the previous one has completed.
Here is the example:

console.log("begin");
setTimeout(function () {
	console.log("2000ms timeout");
	setTimeout(function () {
		console.log("1500ms timeout");
		setTimeout(function () {
			console.log("1000ms timeout");
			setTimeout(function () {
				console.log("final");
			}, 500);
		}, 1000);
	}, 1500);
}, 2000);
console.log("end");

We start a process that takes 2000ms; when it completes, we start a second one that takes 1500ms, then a third that takes 1000ms and, finally, the last one that takes 500ms. The result is as follows:

begin
end
2000ms timeout
1500ms timeout
1000ms timeout
final

There is nothing to be done if the program's actions really must be performed sequentially, with no way to start the second action before the first has finished, or the third before the second. But if the actions can be started at the same time, then doing so is not merely possible but necessary!

Why is it necessary?

What is the point of pausing the whole process and waiting for a slow subsystem to finish if we have something else to do in the meantime? None. Therefore, launching several long operations at once (these include work with the network and the disk subsystem) can significantly speed up the program.

A little harder

Suppose that all our operations are independent of each other. Then they can be run in parallel:

console.log("begin");
setTimeout(function () {
	console.log("2000ms timeout");
}, 2000);
setTimeout(function () {
	console.log("1500ms timeout");
}, 1500);
setTimeout(function () {
	console.log("1000ms timeout");
}, 1000);
setTimeout(function () {
	console.log("final");
}, 500);
console.log("end");

Execution Result:

begin
end
final
1000ms timeout
1500ms timeout
2000ms timeout

The main program finished executing, and only then were the callback functions of the “long actions” run, one after another, as their timers expired. But most importantly, the total execution time dropped by roughly two and a half times, from about 5000ms to about 2000ms!
Again, there is nothing complicated in this example, but only until one of the actions has to wait for the others to complete.
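The two-and-a-half-times figure follows directly from the delays used in the examples. Here is a small sketch of the arithmetic, ignoring the overhead of the timers themselves:

```javascript
var delays = [2000, 1500, 1000, 500];

// Sequential: each timer starts only when the previous one has fired,
// so the delays add up.
var sequential = delays.reduce(function (sum, delay) {
	return sum + delay;
}, 0);

// Parallel: all timers start together, so the longest one decides.
var parallel = Math.max.apply(null, delays);

console.log("sequential: " + sequential + "ms");   // sequential: 5000ms
console.log("parallel: " + parallel + "ms");       // parallel: 2000ms
console.log("speed-up: " + sequential / parallel); // speed-up: 2.5
```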

Harder

Suppose we can start the last action only after all the previous ones have completed. The first thing that comes to mind is a counter. Initially its value equals the number of callback functions we must wait for. At the end of each of them the counter is decremented, and when it reaches zero our “waiting function” is called.
For example, like this:

var counter = 3;
console.log("begin");
setTimeout(function () { 
	console.log("2000ms timeout"); 
	if (-- counter == 0) final(); 
}, 2000);
setTimeout(function () { 
	console.log("1500ms timeout"); 
	if (-- counter == 0) final(); 
}, 1500);
setTimeout(function () { 
	console.log("1000ms timeout"); 
	if (-- counter == 0) final(); 
}, 1000);
function final() {
	setTimeout(function () { 
		console.log("final"); 
	}, 500);
}
console.log("end");

Result:

begin
end
1000ms timeout
1500ms timeout
2000ms timeout
final

Everything is fine!
The problems begin when one day we forget to increase the counter for a new action, or forget to add the counter decrement and the “waiting function” call to a new callback function.
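To make the danger concrete, here is a small sketch of the first kind of mistake: a third action was added, but the counter was left at 2. The step helper, the log array and the short delays are made up for the illustration.

```javascript
var log = [];    // hypothetical record of the order of events
var counter = 2; // BUG: three actions are started below, not two

function final() {
	log.push("final");
}

// Builds a callback that records its name and then updates the counter.
function step(name) {
	return function () {
		log.push(name);
		if (-- counter == 0) final();
	};
}

setTimeout(step("30ms timeout"), 30);
setTimeout(step("20ms timeout"), 20);
setTimeout(step("10ms timeout"), 10);

// Print the recorded order once everything has fired.
setTimeout(function () {
	console.log(log.join(", "));
}, 100);
```

Here final is recorded right after the 20ms timeout, that is, before the 30ms action has finished. The opposite mistake, a counter that is too large or a callback that never decrements it, means final is never called at all.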
“All work with the counter must be wrapped in some kind of object!” I thought. As it turned out, I was not the only one: Tim Caswell had already proposed something similar, and I took his idea as a basis.
The wrapper:

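function Combo(finalCallback) {
	this.finalCallback = finalCallback;
	this.result = [];
	this.counter = 0;
}
Combo.prototype = {
	// Registers an "expected function" and returns its wrapper.
	"add" : function (callback) {
		var that = this;
		// Fix the result slot at registration time, so that results are
		// stored in the order of the add calls, whatever the completion order.
		var index = this.counter ++;
		return function () {
			that.result[index] = callback.apply(this, arguments);
			that.check();
		};
	},
	// Decrements the counter and schedules the "waiting function" at zero.
	"check" : function () {
		var that = this;
		this.counter --;
		if (this.counter == 0)
			process.nextTick(function () { 
				that.finalCallback.call(that, that.result); 
			});
	}
};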

The constructor receives the “waiting function” as a parameter; the wrapper launches it once all the “expected functions” have completed. The add method increments the counter and creates a wrapper function for the user's “expected function”. The wrapper function calls the corresponding “expected function”, passing on all the parameters it received, and then runs the check method. The check method, in turn, decrements the counter and, when it reaches zero, starts the “waiting function” passed to the constructor. The results of the “expected functions” are saved as well, just in case, and are passed to the “waiting function” as its parameter.
Example:

var test = new Combo(
		function (result) {
			console.log("final");
			console.log("result:");
			for (var i in result) {	
				console.log("  \"" + i + "\" : \"" + result[i] + "\"");	
			}
		});
console.log("begin");
setTimeout(test.add(function () {
	console.log("2000ms"); return "2000ms"; }), 2000);
setTimeout(test.add(function () { 
	console.log("1500ms"); return "1500ms"; }), 1500);
setTimeout(test.add(function () { 
	console.log("1000ms"); return "1000ms"; }), 1000);
console.log("end");

The result of work:

begin
end
1000ms
1500ms
2000ms
final
result:
  "0" : "2000ms"
  "1" : "1500ms"
  "2" : "1000ms"

Beauty!
Now we can hand the wrapper function created by the add method to closing events (for example, the end event of streams) as their handler, and our “waiting function” will be launched only after all the “expected functions” have completed.
The main thing is not to forget to wrap new “expected functions”.
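As a sketch of what that might look like with streams, the example below waits for the end event of two in-memory PassThrough streams, which stand in for real files or sockets. It carries its own compact variant of the wrapper (an assumption of this sketch, with each result slot fixed at add time) so that it can be run on its own.

```javascript
var PassThrough = require("stream").PassThrough;

// Compact variant of the Combo wrapper; each result slot is fixed at add time.
function Combo(finalCallback) {
	this.finalCallback = finalCallback;
	this.result = [];
	this.counter = 0;
}
Combo.prototype.add = function (callback) {
	var that = this;
	var index = this.counter ++;
	return function () {
		that.result[index] = callback.apply(this, arguments);
		if (-- that.counter == 0)
			process.nextTick(function () {
				that.finalCallback.call(that, that.result);
			});
	};
};

var finished = null; // filled in by the "waiting function"
var combo = new Combo(function (result) {
	finished = result;
	console.log("both streams ended: " + result.join(", "));
});

// Two in-memory streams stand in for real files or sockets.
var first = new PassThrough();
var second = new PassThrough();
first.on("end", combo.add(function () { return "first"; }));
second.on("end", combo.add(function () { return "second"; }));

// "end" is emitted only once the data has been consumed, so drain both.
first.resume();
second.resume();
second.end("some data");
first.end("other data");
```

The “waiting function” runs only after both end events have arrived, regardless of which stream finishes first.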

The hardest part

But what if one of the actions fails? There is no problem with asynchronous functions that take a callback function as a parameter: the first parameter of the callback carries information about the error, if there is one. But streams, for example, emit the error event on failure. In that case the final end event is not emitted, so one of the “expected functions” is never called. As a result, the “waiting function” will never be launched.
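The stuck counter is easy to reproduce. In the sketch below, two plain EventEmitter objects stand in for streams (an assumption made to keep the example short), and the events are emitted by hand: one source ends normally, the other reports an error instead.

```javascript
var EventEmitter = require("events").EventEmitter;

var counter = 2;
var finalCalled = false;

// Shared "end" handler: counts down and fires the final step at zero.
function done() {
	if (-- counter == 0) {
		finalCalled = true;
		console.log("final");
	}
}

var good = new EventEmitter();
var bad = new EventEmitter();
good.on("end", done);
bad.on("end", done);
bad.on("error", function (err) {
	console.log("error: " + err.message);
});

good.emit("end");
// The failing source reports an error and never emits "end".
bad.emit("error", new Error("read failed"));

console.log("final called: " + finalCalled); // final called: false
console.log("counter: " + counter);          // counter: 1
```

The counter stays at 1 forever, because nothing in the error path ever decrements it.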
If it is acceptable for the program to terminate with an exception, very good. But what if the situation needs to be handled and execution continued? We need a mechanism for removing “expected functions”!
For example, like this:

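function Combo(finalCallback) {
	this.finalCallback = finalCallback;
	this.result = {};
	this.counter = 0;
}
Combo.prototype = {
	// Registers an "expected function" under an optional identifier.
	"add" : function (callback, id) {
		var that = this;
		// Without an explicit identifier, use the current counter value.
		if (id == null) id = this.counter;
		this.counter ++;
		return function () {
			// Skip the call if a result was already stored, e.g. by remove.
			if (!that.result.hasOwnProperty(id))
			{
				that.result[id] = callback.apply(this, arguments);
				that.check();
			}
		};
	},
	// Stops waiting for the function with this identifier.
	"remove" : function (id, result) {
		// Ignore identifiers that already have a result, so that the
		// counter is never decremented twice for the same function.
		if (this.result.hasOwnProperty(id)) return;
		this.result[id] = result;
		this.check();
	},
	// Decrements the counter and schedules the "waiting function" at zero.
	"check" : function () {
		var that = this;
		this.counter --;
		if (this.counter == 0)
			process.nextTick(function () { 
				that.finalCallback.call(that, that.result); 
			});
	}
};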

The add method takes an optional second parameter, an identifier by which the “expected function” can later be removed if necessary. In addition, the result of the “expected function” is now stored in the field with that identifier. The remove method receives the identifier of the “expected function” and an “error result”, which is stored in the results object in its place.
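One way remove might be wired to an error event is sketched below. The expectEnd helper is hypothetical, plain EventEmitter objects stand in for streams, and the wrapper is a condensed copy with one extra guard in remove (an assumption of this sketch) so that the block runs on its own.

```javascript
var EventEmitter = require("events").EventEmitter;

// Condensed copy of the wrapper with add, remove and check.
function Combo(finalCallback) {
	this.finalCallback = finalCallback;
	this.result = {};
	this.counter = 0;
}
Combo.prototype.add = function (callback, id) {
	var that = this;
	if (id == null) id = this.counter;
	this.counter ++;
	return function () {
		if (!that.result.hasOwnProperty(id)) {
			that.result[id] = callback.apply(this, arguments);
			that.check();
		}
	};
};
Combo.prototype.remove = function (id, result) {
	if (this.result.hasOwnProperty(id)) return; // already finished or removed
	this.result[id] = result;
	this.check();
};
Combo.prototype.check = function () {
	var that = this;
	if (-- this.counter == 0)
		process.nextTick(function () {
			that.finalCallback.call(that, that.result);
		});
};

var finished = null; // filled in by the "waiting function"
var combo = new Combo(function (result) {
	finished = result;
	console.log(JSON.stringify(result));
});

// Hypothetical helper: wait for "end", but give up the slot on "error".
function expectEnd(source, id) {
	source.on("end", combo.add(function () { return "ok"; }, id));
	source.on("error", function (err) {
		combo.remove(id, "error: " + err.message);
	});
}

var good = new EventEmitter();
var bad = new EventEmitter();
expectEnd(good, "good");
expectEnd(bad, "bad");

good.emit("end");
bad.emit("error", new Error("read failed"));
// prints {"good":"ok","bad":"error: read failed"}
```

Pairing every add with an error handler in one helper makes it harder to forget the remove call for a new source.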
Example:

var test = new Combo(function (result) {
	console.log("final");
	console.log("result:");
	for (var i in result) {
		console.log(" \"" + i + "\" : \"" + result[i] + "\"");
	}
});
console.log("begin");
setTimeout(test.add(function () {
	console.log("2000ms"); return "2000ms"; }), 2000);
setTimeout(test.add(function () { 
	console.log("1500ms"); return "1500ms"; }, "1500ms"), 1500);
setTimeout(test.add(function () { 
	console.log("1000ms"); return "1000ms"; }), 1000);
// error
setTimeout(function () { 
	console.log("something wrong in 1500ms on 1250ms");
	test.remove("1500ms", "error in 1500ms");
}, 1250);
console.log("end");

Result:

begin
end
1000ms
something wrong in 1500ms on 1250ms
2000ms
final
result:
 "0" : "2000ms"
 "2" : "1000ms"
 "1500ms" : "error in 1500ms"

Conclusion

Of course, the wrapper can and should be refined, but developing this object was very good practice for me. If you take your time and think things through, event-driven programming is not so difficult.
And, of course, do not forget about the help of the community: perhaps someone has already solved a problem similar to yours.
