JsonWriterSax - a library for creating JSON

    Some time ago I wrote an application in C++/Qt that sent large amounts of data over the network in JSON format, using the standard QJsonDocument. During rollout I ran into poor performance and an awkward class design that made it hard to detect errors at run time. The result is the JsonWriterSax library, which writes JSON documents in SAX style at high speed and which I have published on github.com under the MIT license. If you are interested, read on.


    A bit of theory


    JSON (JavaScript Object Notation) is a structured text data format specified by Douglas Crockford and based on a subset of ECMAScript (the standard behind JavaScript, JScript, and others). JSON came as a replacement for XML, offering nested structures and built-in data types. It is now widely used on the Internet.


    JSON has drawbacks, though. In my opinion, a DateTime type is clearly missing from the standard types: you have to pass the value as a number or a string, and when parsing you have to decide from context how to interpret it. It is worth noting that the Date type in ECMAScript was created long ago and was not well thought out, so the JS world relies on third-party libraries to work with dates.
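
    To illustrate the two options mentioned above, here is a minimal sketch in plain C++ (no Qt) that encodes the same moment either as a number or as a string. The function names dateAsJsonNumber and dateAsJsonString are made up for this example, and the choice of milliseconds since the Unix epoch and of an ISO 8601 string in UTC is an assumption, not something JSON prescribes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <string>

// Option 1: pass the moment as a JSON number (milliseconds since the Unix epoch).
std::string dateAsJsonNumber(std::int64_t msSinceEpoch) {
    return std::to_string(msSinceEpoch);
}

// Option 2: pass the same moment as a JSON string in ISO 8601 form (UTC).
std::string dateAsJsonString(std::int64_t msSinceEpoch) {
    assert(msSinceEpoch >= 0);  // this sketch only handles dates from 1970 onwards
    const std::time_t seconds = static_cast<std::time_t>(msSinceEpoch / 1000);
    const int millis = static_cast<int>(msSinceEpoch % 1000);

    // std::gmtime returns a pointer to shared static storage, so copy the result.
    const std::tm utc = *std::gmtime(&seconds);

    char date[32];
    std::strftime(date, sizeof date, "%Y-%m-%dT%H:%M:%S", &utc);
    char fraction[8];
    std::snprintf(fraction, sizeof fraction, ".%03dZ", millis);

    // The surrounding quotes make the value a JSON string.
    return std::string("\"") + date + fraction + "\"";
}
```

    Both results are valid JSON values, but nothing in the document tells the receiver that either one is a date. That has to be agreed on separately, which is exactly the "decide from context" problem.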


    There are two basic approaches to parsing and creating structured documents: SAX and DOM. They originated with XML, but they can serve as patterns for building handlers of other formats as well.


    SAX (Simple API for XML)


    SAX is used for sequential processing and lets you handle large documents as a stream. When reading, the parser reports each element or error it finds to the application, but storing the data and tracking nesting is the application's job. When writing, the steps are usually spelled out one by one: start an element, start a sub-element, write a number, write a string, close the sub-element, close the element. The drawbacks are that the programmer must write the code more carefully and understand the document structure better, and that editing an existing document is impossible or severely limited.
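
    The writing side of this style can be sketched in a few lines of standard C++. TinySaxWriter and saxDemo are hypothetical names for this illustration only; this is not the JsonWriterSax API, it supports only arrays and integers, and it does no validation at all.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// A toy SAX-style writer: every call emits text immediately,
// and no document tree is ever kept in memory.
class TinySaxWriter {
public:
    explicit TinySaxWriter(std::ostream& out) : out_(out) {}

    void startArray() {
        separator();
        out_ << '[';
        counts_.push_back(0);  // a new nesting level with no elements yet
    }

    void endArray() {
        assert(!counts_.empty());  // the caller is responsible for correct nesting
        out_ << ']';
        counts_.pop_back();
    }

    void number(long long value) {
        separator();
        out_ << value;
    }

private:
    // Emit a comma before every element except the first one on its level.
    void separator() {
        if (counts_.empty()) return;
        if (counts_.back() > 0) out_ << ',';
        ++counts_.back();
    }

    std::ostream& out_;
    std::vector<int> counts_;  // number of elements written per open array
};

// Usage: the calls mirror the document structure step by step.
std::string saxDemo() {
    std::ostringstream out;
    TinySaxWriter writer(out);
    writer.startArray();
    writer.number(1);
    writer.startArray();
    writer.number(2);
    writer.number(3);
    writer.endArray();
    writer.number(4);
    writer.endArray();
    return out.str();  // "[1,[2,3],4]"
}
```

    The only state the writer holds is one counter per open level, so memory use depends on nesting depth rather than on document size.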


    DOM (Document Object Model)


    With this approach, a document tree is built in memory; it can be serialized, deserialized, and modified. The main drawbacks are high memory consumption and longer processing time. Under the hood, a SAX handler is commonly used.
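
    A minimal sketch of the DOM idea in standard C++ follows. The Node type and the helper functions are hypothetical and cover only numbers and arrays; the point is that the whole tree lives in memory, can be edited, and is turned into text only at the end.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One node of the in-memory tree: either a number or an array of child nodes.
struct Node {
    bool isArray = false;
    long long number = 0;
    std::vector<Node> children;
};

Node makeNumber(long long value) {
    Node n;
    n.number = value;
    return n;
}

Node makeArray(std::vector<Node> items) {
    Node n;
    n.isArray = true;
    n.children = std::move(items);
    return n;
}

// Random access into the tree, something a pure SAX writer cannot offer.
Node& childAt(Node& array, std::size_t index) {
    assert(array.isArray && index < array.children.size());
    return array.children[index];
}

// Serialization walks the finished tree recursively.
std::string serialize(const Node& node) {
    if (!node.isArray) return std::to_string(node.number);
    std::string out = "[";
    for (std::size_t i = 0; i < node.children.size(); ++i) {
        if (i > 0) out += ',';
        out += serialize(node.children[i]);
    }
    return out + "]";
}

// Usage: build, edit, and only then serialize.
std::string domDemo() {
    Node root = makeArray({makeNumber(1), makeNumber(2), makeNumber(3)});
    childAt(root, 1).number = 20;            // change an existing element
    root.children.push_back(makeArray({}));  // append a nested empty array
    return serialize(root);                  // "[1,20,3,[]]"
}
```

    Every element costs a Node object until serialization is finished, which is where the extra memory and time of the DOM approach come from.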


    QJsonDocument problems


    The standard QJsonDocument uses the DOM approach. Creating a document with it is slow; you can see the benchmarks at the end of the article. But the biggest problem for me was the poorly designed error reporting.


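    #include <QJsonArray>
    #include <limits>

    // Keep appending until the array silently stops accepting elements.
    auto max = std::numeric_limits<int>::max();
    QJsonArray ja;
    for(auto i = 0; i < max; ++i) {
        ja.append(i);
        // append() returns void, so the size is the only sign of failure
        if(ja.size() - 1 != i) {
            break;
        }
    }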

    In this example, when memory runs short, the following message is written to the error stream:


    QJson: Document too large to store in data structure

    After that, new data is silently no longer added. For an array, you can detect this by checking the condition


    ja.size() - 1 != i

    But what do you do when working with an object? Check after every insertion that the new key was added? Parse the log in search of an error?


    Library


    The JsonWriterSax library writes a JSON document to a QTextStream in SAX style and is available on GitHub under the MIT license. Memory management is left to the application. The library checks the integrity of the JSON: if an element is added incorrectly, the write function returns an error. A context-free grammar is used for the check. Tests have been written, but some cases may have been missed. If anyone finds a case where the check behaves incorrectly and reports it so that I can fix it, I will be very grateful.
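
    To give a feel for what "the write function returns an error" means in practice, here is a rough sketch in standard C++ of a writer that tracks nesting on a stack and rejects calls that would break the JSON structure. CheckedWriter and its methods are hypothetical names for illustration; the actual library may implement its grammar check quite differently, and this sketch neither escapes keys nor supports strings.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

class CheckedWriter {
public:
    bool startArray()  { return open(Kind::Array, '['); }
    bool startObject() { return open(Kind::Object, '{'); }
    bool endArray()    { return close(Kind::Array, ']'); }
    bool endObject()   { return close(Kind::Object, '}'); }

    // A key is only legal directly inside an object, and not twice in a row.
    bool key(const std::string& name) {
        if (stack_.empty()) return false;
        Frame& top = stack_.back();
        if (top.kind != Kind::Object || top.keyPending) return false;
        if (top.count++ > 0) out_ << ',';
        out_ << '"' << name << "\":";  // note: no escaping in this sketch
        top.keyPending = true;
        return true;
    }

    bool number(long long value) {
        if (!beginValue()) return false;
        out_ << value;
        return true;
    }

    // The document is complete once a root value exists and nothing is open.
    bool end() const { return rootWritten_ && stack_.empty(); }
    std::string text() const { return out_.str(); }

private:
    enum class Kind { Array, Object };
    struct Frame { Kind kind; int count; bool keyPending; };

    // Decide whether a value may appear here; emit the separator if needed.
    bool beginValue() {
        if (stack_.empty()) {
            if (rootWritten_) return false;  // only one root value is allowed
            rootWritten_ = true;
            return true;
        }
        Frame& top = stack_.back();
        if (top.kind == Kind::Object) {
            if (!top.keyPending) return false;  // value without a key
            top.keyPending = false;
            return true;
        }
        if (top.count++ > 0) out_ << ',';
        return true;
    }

    bool open(Kind kind, char bracket) {
        if (!beginValue()) return false;
        out_ << bracket;
        stack_.push_back({kind, 0, false});
        return true;
    }

    bool close(Kind kind, char bracket) {
        if (stack_.empty() || stack_.back().kind != kind) return false;
        if (stack_.back().keyPending) return false;  // dangling key
        out_ << bracket;
        stack_.pop_back();
        return true;
    }

    std::ostringstream out_;
    std::vector<Frame> stack_;
    bool rootWritten_ = false;
};

// Usage: every call reports success, so a mistake surfaces at the call site.
std::string checkedDemo() {
    CheckedWriter w;
    const bool ok = w.startObject() && w.key("a") && w.number(1)
                 && w.endObject() && w.end();
    assert(ok);
    return w.text();  // {"a":1}
}
```

    Because a rejected call writes nothing, the application can react right where the mistake happens instead of searching a log afterwards.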


    I believe that the best description of a library for a programmer is sample code =)


    Examples


    Array creation


    QByteArray ba;
    QTextStream stream(&ba);
    stream.setCodec("utf-8");
    JsonWriterSax writer(stream);
    writer.writeStartArray();
    for(auto i = 0; i < 10; ++i) {
        writer.write(i);
    }
    writer.writeEndArray();
    if(writer.end()) {
        stream.flush();
    } else {
        qWarning() << "Error json";
    }

    As a result, we get


    [0,1,2,3,4,5,6,7,8,9]

    Object creation


    QByteArray ba;
    QTextStream stream(&ba);
    stream.setCodec("utf-8");
    JsonWriterSax writer(stream);
    writer.writeStartObject();
    for(auto i = 0; i < 5; ++i) {
        writer.write(QString::number(i), i);
    }
    for(auto i = 5; i < 10; ++i) {
        writer.write(QString::number(i), QString::number(i));
    }
    writer.writeKey("arr");
    writer.writeStartArray();
    writer.writeEndArray();
    writer.writeKey("o");
    writer.writeStartObject();
    writer.writeEndObject();
    writer.writeKey("n");
    writer.writeNull();
    writer.write(QString::number(11), QVariant(11));
    writer.write("dt", QVariant(QDateTime::fromMSecsSinceEpoch(10)));
    writer.writeEndObject();
    if(writer.end()) {
        stream.flush();
    } else {
        qWarning() << "Error json";
    }

    As a result, we get


    {"0":0,"1":1,"2":2,"3":3,"4":4,"5":"5","6":"6","7":"7","8":"8","9":"9","arr":[],"o":{},"n":null,"11":11,"dt":"1970-01-01T03:00:00.010"}

    Creating a document with nesting and different types


    QByteArray ba;
    QTextStream stream(&ba);
    stream.setCodec("utf-8");
    JsonWriterSax writer(stream);
    writer.writeStartArray();
    for(auto i = 0; i < 1000; ++i) {
        writer.writeStartObject();
        writer.writeKey("key");
        writer.writeStartObject();
        for(auto j = 0; j < 1000; ++j) {
            writer.write(QString::number(j), j);
        }
        writer.writeEndObject();
        writer.writeEndObject();
    }
    writer.writeEndArray();
    if(writer.end()) {
        stream.flush();
    } else {
        qWarning() << "Error json";
    }

    Benchmarks


    The measurements were taken with QBENCHMARK on a release build. The benchmarks are implemented in the JsonWriterSaxTest class.


    elementary OS 5.0 Juno, kernel 4.15.0-38-generic, Intel® Core™2 Quad CPU 9550 @ 2.83GHz, 4 GB RAM, Qt 5.11.2, GCC 5.3.1


    Long number array


    • QJsonDocument: 42 msecs per iteration (total: 85, iterations: 2)
    • JsonWriterSax: 23 msecs per iteration (total: 93, iterations: 4)

    Big one-level object


    • QJsonDocument: 1,170 msecs per iteration (total: 1,170, iterations: 1)
    • JsonWriterSax: 53 msecs per iteration (total: 53, iterations: 1)

    Big complex document


    • QJsonDocument: 1,369 msecs per iteration (total: 1,369, iterations: 1)
    • JsonWriterSax: 463 msecs per iteration (total: 463, iterations: 1)

    elementary OS 5.0 Juno, kernel 4.15.0-38-generic, Intel® Core™ i7-7500U CPU @ 2.70GHz, 8 GB RAM, Qt 5.11.2, GCC 5.3.1


    Long number array


    • QJsonDocument: 29.5 msecs per iteration (total: 118, iterations: 4)
    • JsonWriterSax: 13 msecs per iteration (total: 52, iterations: 4)

    Big one-level object


    • QJsonDocument: 485 msecs per iteration (total: 485, iterations: 1)
    • JsonWriterSax: 31 msecs per iteration (total: 62, iterations: 2)

    Big complex document


    • QJsonDocument: 734 msecs per iteration (total: 734, iterations: 1)
    • JsonWriterSax: 271 msecs per iteration (total: 271, iterations: 1)

    MS Windows 7 SP1, Intel® Core™ i7-4770 CPU @ 3.40GHz, 8 GB RAM, Qt 5.11.0, GCC 5.3.0


    Long number array


    • QJsonDocument: 669 msecs per iteration (total: 669, iterations: 1)
    • JsonWriterSax: 20 msecs per iteration (total: 81, iterations: 4)

    Big one-level object


    • QJsonDocument: 1,568 msecs per iteration (total: 1,568, iterations: 1)
    • JsonWriterSax: 44 msecs per iteration (total: 88, iterations: 2)

    Big complex document


    • QJsonDocument: 1,167 msecs per iteration (total: 1,167, iterations: 1)
    • JsonWriterSax: 375 msecs per iteration (total: 375, iterations: 1)

    MS Windows 7 SP1, Intel® Core™ i3-3220 CPU @ 3.30GHz, 8 GB RAM, Qt 5.11.0, GCC 5.3.0


    Long number array


    • QJsonDocument: 772 msecs per iteration (total: 772, iterations: 1)
    • JsonWriterSax: 26 msecs per iteration (total: 52, iterations: 2)

    Big one-level object


    • QJsonDocument: 2,029 msecs per iteration (total: 2,029, iterations: 1)
    • JsonWriterSax: 59 msecs per iteration (total: 59, iterations: 1)

    Big complex document


    • QJsonDocument: 1,530 msecs per iteration (total: 1,530, iterations: 1)
    • JsonWriterSax: 495 msecs per iteration (total: 495, iterations: 1)
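
    To read these figures as ratios, a small helper is enough. The speedup function below is a made-up name for this illustration; the sample values in the usage note are the Core 2 Quad results listed above.

```cpp
#include <cassert>

// How many times faster the candidate is compared with the baseline,
// given the per-iteration time of each in milliseconds.
double speedup(double baselineMs, double candidateMs) {
    assert(candidateMs > 0.0);  // a zero time would make the ratio meaningless
    return baselineMs / candidateMs;
}
```

    For the Core 2 Quad machine, speedup(42, 23) is about 1.8 for the long number array, speedup(1170, 53) is about 22 for the big one-level object, and speedup(1369, 463) is about 3 for the big complex document. Keep in mind that most of these runs consist of one to four iterations, so the ratios are rough estimates.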

    Perspectives


    In future versions I plan to let users describe the format of custom data types through lambda functions taking a QVariant and to add delimiters for formatting the document (pretty printing). If the community is interested, I may also add a SAX parser.


    By the way, what helped me find the overflow error was another library of mine, which lets you configure the format and output of qInfo(), qDebug(), and qWarning() in the style of the Python logging module. I also plan to release that library as open source; if anyone is interested, write in the comments.

