Node.js and uploading a catalog from 1C to a website

    The other day we wrapped up another project: building a new version of an online catalog. The old version of the site no longer suited the client for several reasons. What set the project apart was its product database: roughly 26,000 catalog items spread across a tree of 513 nodes, plus product characteristics, with almost every item carrying a 1-2 KB text description.

    The catalog export file in CommerceML 2 format for the old site weighed 104 MB. It took 10 minutes to generate on the 1C side, and after being transferred to the hosting it was parsed on the site side for an hour and a half (!) at 100% CPU utilization.

    The way out


    As an alternative to the XML format, we decided to export to JSON. The idea was to parse it with something that has a native parser implementation, namely node.js with its JSON.parse().

    After getting acquainted with the new format, our 1C developer needed only a few iterations to get 1C to produce valid JSON. Export generation time dropped from 10 to 3.5 minutes, and the data that took 104 megabytes in XML fit into 58 megabytes of JSON. That much was expected; something else was not...

    To measure how long parsing the export takes, I sketched some test code:

    // Node.js
    var fs = require('fs');

    function parser(filename, callback){
        fs.readFile(filename, { encoding: 'utf8' }, function(err, dataFromFile){
            var parsedData;
            if(err){
                callback(err);
            } else {
                try {
                    console.time('parse data'); // parsing is a synchronous operation...
                    parsedData = JSON.parse(dataFromFile.toString().trim()); // <- the parsing itself.
                    console.timeEnd('parse data'); // ... which is why we measure the delay head-on.
                    callback(null, parsedData);
                }
                catch (e){
                    callback(e);
                }
            }
        });
    }
    parser('../import/import.json', function(err, data){
        if(err){
            throw (err);
        }
        console.log('groups', data.groups.length);
        console.log('items', data.items.length);
        console.log('properties', data.properties.length);
    });
    

    When I ran it on my machine (3.3 GHz CPU), I did not even have time to get up and go make tea. The result, and the speed with which it appeared in the console, made me suspect the code had a bug and had not actually done the work...
    > node parse.js

    parse data: 718ms
    groups 513
    items 26098
    properties 149

    But it was not a bug. The data really had been parsed and loaded into memory in about two-thirds of a second, and the number of elements in each collection matched exactly what 1C had declared. All that was left was to pick my jaw up off the floor and write a service with the complete data processing cycle.

    General architecture of the export processing service


    In general, uploading to the site follows the standard scheme:
    • generate the export in 1C and pack it with an archiver;
    • upload the generated files via FTP;
    • call the HTTP import handler (a short example follows this list).
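
    Once the FTP transfer has finished, the handler can be triggered with any HTTP client; for illustration, a minimal Node.js sketch, where the URL is an assumption rather than the real endpoint:

    // Node.js - a hypothetical trigger call for the import handler; the URL is an assumption
    var http = require('http');

    http.get('http://example.com/import/run', function(res){
        console.log('import handler answered with status', res.statusCode);
    }).on('error', function(err){
        console.error('failed to reach the import handler:', err.message);
    });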


    The export handler service works according to the following scheme (a minimal sketch follows the list):
    1. unpack the archive;
    2. parse the JSON;
    3. report in the HTTP response that everything is in order or that an error has occurred;
    4. if all is well, set a busy flag and load the data into the database to the end;
    5. die, freeing the memory;
    6. ...
    7. be reborn as a new process - Monit takes care of that.
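
    A minimal sketch of such a handler for illustration, assuming the archive is a zip, that an unzip binary is available on the server, and with fillDatabase as a hypothetical placeholder for the database step:

    // Node.js - a minimal sketch of the export handler; names and paths are assumptions
    var http = require('http'),
        fs = require('fs'),
        execFile = require('child_process').execFile;

    var busy = false; // set while the database is being filled

    function fillDatabase(data, done){
        // hypothetical placeholder: write data.groups, data.items and data.properties to the DB here
        setTimeout(done, 0);
    }

    http.createServer(function(req, res){
        if(busy){ // a repeated call while the DB is being filled returns the busy flag
            res.writeHead(503);
            return res.end('busy');
        }
        // 1. unpack the archive uploaded via FTP (assuming a zip and an `unzip` binary)
        execFile('unzip', ['-o', '../import/import.zip', '-d', '../import'], function(err){
            if(err){ res.writeHead(500); return res.end('unpack error'); }
            // 2. parse the JSON
            fs.readFile('../import/import.json', { encoding: 'utf8' }, function(err, raw){
                if(err){ res.writeHead(500); return res.end('read error'); }
                var data;
                try { data = JSON.parse(raw.trim()); }
                catch (e){ res.writeHead(500); return res.end('invalid JSON'); }
                // 3. report that everything is in order
                res.writeHead(200);
                res.end('ok');
                // 4. set the busy flag and fill the database
                busy = true;
                fillDatabase(data, function(){
                    // 5. die, freeing the memory; Monit brings the process back up (step 7)
                    process.exit(0);
                });
            });
        });
    }).listen(8080);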


    In production (DigitalOcean, the $10 plan), steps 1 through 3 take about 3-4 seconds in total from the moment the handler is called, after which a repeated call to the service returns the busy flag while the database is being filled. The entire export processing cycle, including writing the data into the database, takes 80-90 seconds. CPU utilization during parsing looks like a single peak of up to 70% on a 10-30% baseline.

    In the end:
    • export generation time dropped from 10 to 3.5 minutes;
    • export size shrank from 104 to 58 megabytes (1.5 megabytes after archiving);
    • total server-side processing time went from an hour and a half down to a minute and a half;
    • ???????
    • PROFIT


    P.S. A remedy for debugging headaches


    For all its speed, JSON.parse() is very inconvenient for debugging: if there is an error in the JSON structure, you get almost zero diagnostic information. While your 1C specialist is still mastering JSON, the JSON Lint module helps a lot. It can be used as a standalone utility or as a library. Unlike the regular parser, the exception object it throws reports the line number in the JSON file where the problem occurred, which makes life drastically easier when hunting for mistakes in a file of tens of megabytes. The price of this convenience is speed: it is 5-7 times slower than the native JSON.parse().

    The same test code with JSON Lint will look like this:

    // Node.js
    var fs = require('fs'),
        jsonlint = require("jsonlint"); // very useful at the debugging stage

    function parser(filename, callback){
        fs.readFile(filename, { encoding: 'utf8' }, function(err, dataFromFile){
            var parsedData;
            if(err){
                callback(err);
            } else {
                try {
                    console.time('parse data'); // parsing is a synchronous operation...
                    /* We use jsonlint when we need more detailed information
                     * about where exactly the error sits in the JSON structure.
                     * It is 5-7 times slower than the native JSON.parse().
                     */
                    parsedData = jsonlint.parse(dataFromFile); // the same parsing, but with debugging info.
                    console.timeEnd('parse data'); // ... which is why we measure the delay head-on.
                    callback(null, parsedData);
                }
                catch (e){
                    callback(e);
                }
            }
        });
    }
    parser('../import/import.json', function(err, data){
        if(err){
            throw (err);
        }
        console.log('groups', data.groups.length);
        console.log('items', data.items.length);
        console.log('properties', data.properties.length);
    });
    


    In conclusion, I traditionally hope that this material turns out to be as useful to someone else as it was to us.
