Node.JS - generating the resulting document from other HTTP sources
Node.JS servers often act as aggregator services: they receive dynamic data from other HTTP sources and build an aggregated response from it.
To process the received data, it is convenient to use external processes that operate on the downloaded set of files (for example, the ImageMagick or ffmpeg utilities).
Let us consider this using the example of an HTTP server that acts as a backend for nginx and generates CSS sprites for a set of images.
Asynchronous Read / Write
Client Connection Pool
Each HTTP client object in Node.JS works with a single TCP connection, executing requests one after another, so if we want truly parallel requests we need a client pool (a trade-off between creating a connection per request and pushing everything through one connection).
We will build the most primitive pool possible, assuming that all source requests go to example.com:80.
var ClientPool = function()
{
    this.poolSize = 0;
    this.freeClients = [];
};

ClientPool.prototype.needClient = function()
{
    this.freeClients.push(this.newClient());
    this.poolSize++;
};

ClientPool.prototype.newClient = function()
{
    return http.createClient(80, 'example.com');
};

ClientPool.prototype.request = function(method, url, headers)
{
    if (this.freeClients.length == 0)
    {
        this.needClient();
    }
    var client = this.freeClients.pop();
    var req = client.request(method, url, headers);
    return req;
};

ClientPool.prototype.returnToPool = function(client)
{
    this.freeClients.push(client);
};

var clientPool = new ClientPool();
If you wish, you can change the pool architecture to allow connections to multiple hosts, as well as cap the pool size (spreading requests across the least loaded connections). I leave that as homework for the reader.
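For illustration, here is a minimal sketch of such an extension. The names and structure are my own assumptions, not part of the article's code: clients are kept in per-host buckets keyed by host:port, each bucket is capped, and the client factory is injected so the pool itself stays independent of the transport.

```javascript
// A hypothetical multi-host pool with an upper size limit per host.
// createClient is injected so the pool does not depend on a concrete transport.
var MultiHostPool = function(maxPerHost, createClient)
{
    this.maxPerHost = maxPerHost;
    this.createClient = createClient;
    this.buckets = {}; // 'host:port' -> { size: N, free: [clients] }
};

MultiHostPool.prototype.acquire = function(host, port)
{
    var key = host + ':' + port;
    var bucket = this.buckets[key] || (this.buckets[key] = { size: 0, free: [] });
    if (bucket.free.length > 0)
    {
        return bucket.free.pop();
    }
    if (bucket.size < this.maxPerHost)
    {
        bucket.size++;
        return this.createClient(port, host);
    }
    return null; // cap reached; a real pool would queue the request here
};

MultiHostPool.prototype.release = function(host, port, client)
{
    this.buckets[host + ':' + port].free.push(client);
};
```

With the Node of that era one would pass something like function(port, host) { return http.createClient(port, host); } as the factory; returning null on overflow keeps the sketch short, whereas a production pool would queue callbacks and distribute requests to the least loaded connections.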
Retrieving and saving a file
We need an asynchronous function that performs an HTTP request and saves the body to a file. Its peculiarity is that two chains of asynchronous operations run at once: reading the source HTTP stream and writing to the file. Moreover, we may close the file and invoke the callback only after all write operations have completed, and they do not necessarily complete in order.
Here is an example implementation:
var getFile = function(url, path, callback)
{
    fs.open(path, 'w', 0600, function(err, fd)
    {
        if (err)
        {
            callback(err);
            return;
        }
        var request = clientPool.request('GET', url, { 'Host': 'example.com' });
        request.on('response', function(sourceResponse)
        {
            var statusCode = parseInt(sourceResponse.statusCode, 10);
            if (statusCode < 200 || statusCode > 299)
            {
                sourceResponse.on('end', function()
                {
                    clientPool.returnToPool(sourceResponse.client);
                });
                // The descriptor is already open: close it and remove the empty file
                fs.close(fd, function()
                {
                    removeFile(path);
                });
                callback('Bad status code');
                return;
            }
            var writeErr = null;
            var writesPending = 0;
            var sourceEnded = false;
            // Close the file and report back only when the source has ended
            // AND every outstanding write has finished
            var checkPendingCallback = function()
            {
                if (!sourceEnded || writesPending > 0)
                {
                    return;
                }
                fs.close(fd, function(err)
                {
                    err = err ? err : writeErr;
                    if (err)
                    {
                        removeFile(path);
                        callback(err);
                        return;
                    }
                    // No errors and all written
                    callback(null);
                });
            };
            var position = 0;
            sourceResponse.on('data', function(chunk)
            {
                writesPending++;
                // Each write carries its own absolute position,
                // so the completion order does not matter
                fs.write(fd, chunk, 0, chunk.length, position, function(err, written)
                {
                    writesPending--;
                    if (err)
                    {
                        writeErr = err;
                    }
                    checkPendingCallback();
                });
                position += chunk.length;
            });
            sourceResponse.on('end', function()
            {
                sourceEnded = true;
                checkPendingCallback();
                clientPool.returnToPool(sourceResponse.client);
            });
        });
        request.end();
    });
};
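The counting trick above can be isolated into a tiny reusable helper. This is just a sketch of the same pattern (the helper's name is my own): the callback fires once end() has been called and the pending counter has dropped to zero, with the first reported error winning.

```javascript
// Sketch of the completion-counting pattern used in getFile:
// 'done' fires only when end() has been called AND every begin()
// has been matched by a finish(). The first error reported wins.
var makeCompletionGate = function(done)
{
    var pending = 0;
    var ended = false;
    var error = null;
    var check = function()
    {
        if (ended && pending == 0)
        {
            done(error);
        }
    };
    return {
        begin: function() { pending++; },
        finish: function(err)
        {
            if (err && !error)
            {
                error = err;
            }
            pending--;
            check();
        },
        end: function()
        {
            ended = true;
            check();
        }
    };
};
```

In getFile, begin()/finish() would bracket each fs.write and end() would be called from the response's 'end' event.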
The mechanism of interaction between nginx and our server
In order not to generate sprites anew for each request, we keep the generated sprites on disk, deleting the oldest of them, for example, from cron. If the file already exists, nginx returns it according to the try_files rule. Otherwise the request is passed to our backend, which creates the desired file and, using X-Accel-Redirect, asks nginx to serve the file from an internal location that points to the same physical directory.
In this case, the nginx configuration will look something like this:
upstream sprite_gen {
    server 127.0.0.1:14239;
}

location /out_folder/ {
    alias /var/sprite-gen/out_folder/;
    internal;
}

location / {
    alias /var/sprite-gen/out_folder/;
    try_files $uri @transcoder;
}

location @transcoder {
    proxy_pass http://sprite_gen;
}
This example does not claim to be perfect; it works well for serving large files, including partial (range) responses, with caching.
If the files are small, and we would rather control the regeneration of sprites with missing images ourselves, it is more correct to cache on the nginx side with a rule such as proxy_no_cache $http_pragma.
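A minimal sketch of that variant might look as follows (the cache path and zone name are my assumptions; proxy_cache_path belongs in the http block):

```nginx
# In the http block (path and zone name are illustrative):
proxy_cache_path /var/cache/nginx/sprites keys_zone=sprites:10m max_size=100m;

location @transcoder {
    proxy_pass http://sprite_gen;
    proxy_cache sprites;
    proxy_cache_valid 200 10m;
    # A client Pragma: no-cache header both bypasses and refreshes the cached copy
    proxy_no_cache $http_pragma;
    proxy_cache_bypass $http_pragma;
}
```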
Fetching the source files
Here is a fragment of the HTTP server responsible for fetching the set of files, building the sprite and returning the result to nginx.
var outPath = ''; // Where to put the resulting sprite
var outUrl = ''; // The sprite's URL inside the internal nginx location
var imageUrls = []; // The list of URLs of the source images
var images = []; // The list of local paths for the downloaded images
var waitCounter = images.length;
var needCache = true; // If at least one image is missing and replaced with a placeholder, caching is disabled

var handlePart = function(i)
{
    getFile(imageUrls[i], images[i], function(err)
    {
        waitCounter--;
        if (err)
        {
            removeFile(images[i]);
            images[i] = placeholder; // substitute the placeholder for the missing image
            needCache = false;
        }
        if (waitCounter == 0)
        {
            makeSprite(images, outPath, placeholder, function(err)
            {
                if (err)
                {
                    response.writeHead(500, {
                        'Content-Type': 'text/plain'
                    });
                    response.end('Trouble');
                    return;
                }
                var headers = {
                    'Content-Type': 'image/png',
                    'X-Accel-Redirect': outUrl
                };
                if (needCache)
                {
                    headers['Cache-Control'] = 'max-age=315360000, public';
                    headers['Expires'] = 'Thu, 31 Dec 2037 23:55:55 GMT';
                }
                else
                {
                    headers['Cache-Control'] = 'no-cache, no-store';
                    headers['Pragma'] = 'no-cache';
                }
                response.writeHead(200, headers);
                response.end();
            });
        }
    });
};

for (var i = 0; i < imageUrls.length; i++)
{
    handlePart(i);
}
We generate the output file through an external process
Controlling external processes from Node.JS is easy and convenient. For debugging convenience, we copy the output of the external process to our own console. To build the sprite we choose the GraphicsMagick package (a fork of ImageMagick with a stable API and good performance).
var spriteScript = '/usr/bin/gm';
var placeholder = path.join(__dirname, 'placeholder.jpg');

var getParams = function(count)
{
    return ('montage +frame +shadow +label -background #000000 -tile ' + count + 'x1 -geometry +0+0').split(' ');
};

var removeFile = function(path)
{
    fs.unlink(path, function(err)
    {
        if (err)
        {
            console.log('Cannot remove ' + path);
        }
    });
};

var cleanup = function(inPaths, placeholder)
{
    for (var i = 0; i < inPaths.length; i++)
    {
        if (inPaths[i] == placeholder)
        {
            // The placeholder is shared between requests, do not delete it
            continue;
        }
        removeFile(inPaths[i]);
    }
};

var makeSprite = function(inPaths, outPath, placeholder, callback)
{
    var para = getParams(inPaths.length).concat(inPaths, outPath);
    console.log(['run', spriteScript, para.join(' ')].join(' '));
    var spriter = child_process.spawn(spriteScript, para);
    // Copy the child's output to our console for debugging
    spriter.stderr.addListener('data', function(data)
    {
        console.log(data.toString());
    });
    spriter.stdout.addListener('data', function(data)
    {
        console.log(data.toString());
    });
    spriter.addListener('exit', function(code, signal)
    {
        if (signal != null)
        {
            callback('Internal Server Error - Interrupted by signal ' + signal.toString());
            return;
        }
        if (code != 0)
        {
            callback('Internal Server Error - Code is ' + code.toString());
            return;
        }
        cleanup(inPaths, placeholder);
        callback(null);
    });
};
Small nuances
We form a name for the temporary file
It is better to build the file name from process.pid and a request counter, for example as path.join('/tmp', ['source-file', process.pid, requestCounter].join('-')). In this case the request-processing function should receive the request counter as an argument, since processing of the next request may begin before all steps of the current one have completed.
We clear temporary data from past processes
Let all our temporary files be named source-<pid>-... or sprite-<pid>-...:
var fileExpr = /^(?:source|sprite)\-(\d+)\b/;
var storagePath = '/tmp/';

var cleanupOldFiles = function()
{
    fs.readdir(storagePath, function(err, files)
    {
        if (err)
        {
            console.log('Cannot read ' + storagePath + ' directory.');
            return;
        }
        for (var i = 0; i < files.length; i++)
        {
            var fn = files[i];
            var m = fileExpr.exec(fn);
            if (!m)
            {
                continue;
            }
            var pid = parseInt(m[1], 10);
            if (pid == process.pid)
            {
                // Our own files are still in use
                continue;
            }
            removeFile(path.join(storagePath, fn));
        }
    });
};
Request Processing Skeleton
Suppose we want to get a photo album sprite from some point in time (timespec).
#!/usr/bin/env node
var child_process = require('child_process');
var http = require('http');
var path = require('path');
var fs = require('fs');

var routeExpr = /^\/?(\w)\/([^\/]+)\/(\d+)\/(\d+)x(\d+)\.png$/;
var fileCounter = 0;

http.createServer(function(request, response)
{
    if (request.method != 'GET')
    {
        response.writeHead(405, {'Content-Type': 'text/plain'});
        response.end('Method Not Allowed');
        return;
    }
    var m = routeExpr.exec(request.url);
    if (!m)
    {
        response.writeHead(400, {'Content-Type': 'text/plain'});
        response.end('Bad Request');
        return;
    }
    var mode = m[1];
    var chapter = m[2];
    var timespec = parseInt(m[3], 10);
    var width = parseInt(m[4], 10);
    var height = parseInt(m[5], 10);
    fileCounter++;
    var moments = [timespec];
    addWantedMoments(moments, mode); // application-specific: fill in the remaining moments
    var runner = function(moments, fileCounter, width, height)
    {
        var waitCounter = moments.length;
        var outPath = path.join(storagePath, ['sprite', process.pid, fileCounter].join('-') + '.png');
        var needCache = true;
        for (var i = 0; i < moments.length; i++)
        {
            handlePart(i);
        }
    };
    request.connection.setTimeout(0);
    runner([].concat(moments), fileCounter, width, height);
}).listen(14239, '127.0.0.1'); // the port must match the nginx upstream
console.log('Server running at 127.0.0.1:14239');
cleanupOldFiles();
So now we have a ready application that builds a sprite as the aggregated result of a set of requests to other sites.
It remains to add the specifics (algorithms for obtaining links to the source images, generating placeholders if the sizes keep changing), and this can be put to use.
In fact, one of my mini-applications works exactly as such a dynamic sprite generator.