Excel performance in pure JavaScript is achievable

    Hi Habr!

    We continue the battle for JavaScript performance using the example of building pivot tables. Last time the stumbling block was the asynchronous IndexedDB interface, which, making a callback for each cursor record, works terribly slowly. Having solved this problem by organizing large-block storage, and having applied all the optimizations I know of, I managed to raise the application's performance 20-fold: a calculation over a storage of 1 million facts now takes 21 seconds, which gives some hope of catching up with Excel, which processes that same million rows in 5..7 seconds.

    A single-pass algorithm that uses no indexes or subqueries maps perfectly onto a block storage scheme and, most encouragingly, allows the calculation to be parallelized across threads (workers), essentially repeating the algorithms of "adult" DBMSs. So the possibilities for optimization are far from exhausted. At the moment the calculation is performed by a single worker and WASM is not used; the results of the million-fact test in various browsers are as follows:
    Browser              Time
    Chromium (Linux)     21 sec
    Firefox (Linux)      51 sec
    Chrome (Android)     29 sec
    Firefox (Android)    62 sec
    The application has a test data generator; you can also upload your own JSON and check the numbers. The application is in deep beta, so errors are not handled properly, excuse me. Under the cut are a few cases on speeding up web applications, all of which are, of course, commonplace and obvious; I just, as someone who likes to learn from his own mistakes, checked them, fixed them, and now try to follow them.

    IndexedDB


    A normal DBMS, guaranteed to be supported by all browsers, and without restrictions on database size; you just need to know how to prepare it. My initial storage scheme (1 fact == 1 DB document) turned out to have unacceptable read speed, and I was forced to switch to storing data in large blocks. The block size is a compromise: with too small a block, overall calculation performance drops because of the callbacks; with too large a block, the interface stops responding while a block is being read / written (for example, while a row is being decoded). Experimentally I settled on a block size of 10,000 facts == 1 DB document. The code is not very complicated: we loop over the blocks, inside a block we loop over the facts, form the full id of each fact, and so on (see the sketch below). The only problem is that inside the nested loops you have to make an asynchronous call; with a synchronous interface it might be much faster, although, what could be faster than iterating an array in memory?
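    A minimal sketch of that block loop, assuming an object store named 'blocks' whose keys are sequential block numbers and whose documents each hold 10,000 facts (the store name and field names are illustrative, not the app's real schema):

    function readAllFacts(db, onFact) {
        return new Promise((resolve, reject) => {
            const cursorReq = db.transaction('blocks', 'readonly')
                                .objectStore('blocks')
                                .openCursor();
            cursorReq.onerror = () => reject(cursorReq.error);
            cursorReq.onsuccess = () => {
                const cursor = cursorReq.result;
                if (!cursor) return resolve();      // no more blocks
                const block = cursor.value;         // one document == 10,000 facts
                for (let i = 0; i < block.facts.length; i++) {
                    // full fact id = block number * block size + offset in block
                    onFact(cursor.key * 10000 + i, block.facts[i]);
                }
                cursor.continue();                  // one async callback per block,
            };                                      // not per fact
        });
    }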

    Cross-Thread JavaScript Calls


    Inter-thread postMessage() calls are slow. And it is not even that the transferred data is copied (that actually happens quite quickly); the call itself for some reason carries serious overhead, and simply reducing the frequency of exchanges between the worker and the main thread from 100 times per second to 10 increased application performance by almost 2 times (see the batching sketch below). The problem is aggravated by slow worker initialization (the JS file is read from disk and compiled anew each time) and by the impossibility of reusing a worker's context: the browser kills a finished worker, and heavy operations (opening the database, initializing the data) have to be repeated every time a thread starts. And of course, you cannot pass a complex object to the worker and back by reference (except for an ArrayBuffer, which does not suit us). Thus, the less often workers are started and the fewer messages they exchange, the better.
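    A hedged sketch of the batching idea, as seen from inside a worker (the batch size and message shape here are illustrative):

    const BATCH = 10000;
    let buffer = [];

    function report(fact) {
        buffer.push(fact);
        if (buffer.length >= BATCH) flush();
    }

    function flush() {
        if (buffer.length === 0) return;
        postMessage({ type: 'chunk', rows: buffer }); // one copy per 10k rows,
        buffer = [];                                  // not 10k separate calls
    }
    // remember to call flush() once more when the scan finishes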

    Memory allocations


    Of course it sounds trite, but Array.push(), repeated in a loop a million times, is several times slower than a one-time initialization let arr = new Array(1000000) followed by a check in the loop if (arr[i] !== undefined). Even if the length of the array cannot be calculated in advance, it is wiser to create it "with a margin"; after all, JavaScript has no problem with large objects in memory.
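    A rough comparison sketch (exact numbers vary by engine, but the gap is consistent):

    const N = 1000000;

    console.time('push');
    let a = [];
    for (let i = 0; i < N; i++) a.push(i);          // grows the array a million times
    console.timeEnd('push');

    console.time('preallocated');
    let b = new Array(N);                           // allocate once, "with a margin"
    for (let i = 0; i < N; i++) b[i] = i;
    for (let i = 0; i < b.length; i++) {
        if (b[i] !== undefined) { /* use b[i], skipping any holes */ }
    }
    console.timeEnd('preallocated');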

    PS I ran into a ridiculous feature when trying to initialize an array with a non-primitive:

    let a = new Array(100).fill({});
    a[1].prop = 1;
    a[2].prop = 2;
    console.log(a[1].prop);   // prints 2

    The language clearly does not have enough AI to understand my obvious intentions :)
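    What actually happens is that fill() stores the same single object reference in every slot. A minimal sketch of a fix, giving each slot its own object:

    let a = Array.from({ length: 100 }, () => ({}));
    a[1].prop = 1;
    a[2].prop = 2;
    console.log(a[1].prop);   // prints 1, as intended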

    Working with the DOM


    1. If you need to insert a large amount of data (for example, table rows), it is better to do it in batches of 100..1000 rows: this loads the main thread less and, accordingly, keeps the interface more responsive. Browsers have long since learned to parse large strings of HTML quickly, but are easily bogged down by frequent DOM updates, so a single insertAdjacentHTML() call works on average faster than a series of appendChild() calls (see the first sketch after this list).
    2. If you need to show a large table, a layout built from TABLE or DIV tags cannot handle more than about 10 thousand rows. Of course, you can load the tail / head of the table dynamically while scrolling, but in any case, when content is added to or removed from a block element, the browser has to recalculate the widths, that is, in essence, redraw the entire table again. Fixing the column widths does not help: it is still slow and hangs the interface. Oddly enough, the way out turned out to be the PRE container, which made it possible to display more than 100 thousand rows of drill-down detail and even work with the table (scrolling, searching, scaling) while the "tail" is still loading. The only inconvenience is that when formatting a table with a monospace font you need to know the maximum width of each column in advance (values are padded with spaces; see the second sketch below).
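    A minimal sketch of batched insertion, assuming a renderRow() helper that returns one <tr> string (both names are illustrative, not the app's real code):

    function appendRows(tbody, rows, batchSize = 500) {
        let i = 0;
        function step() {
            // one string parse per batch instead of one DOM call per row
            const html = rows.slice(i, i + batchSize).map(renderRow).join('');
            tbody.insertAdjacentHTML('beforeend', html);
            i += batchSize;
            if (i < rows.length) setTimeout(step, 0); // yield to the UI between batches
        }
        step();
    }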
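    And a sketch of the PRE approach, assuming the column widths are already known (padEnd() does the space padding):

    function renderPreRow(values, widths) {
        // pad every value to its column width so a monospace font aligns all rows
        return values.map((v, i) => String(v).padEnd(widths[i] + 2)).join('') + '\n';
    }
    // usage: pre.insertAdjacentText('beforeend', renderPreRow(row, colWidths));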

    MVC pattern violation


    Unfortunately, high performance is incompatible with some programming patterns that separate the data, business-logic and presentation layers. If real speed is needed, functions should not be divided but combined, especially in the absence of a normal query language, when all calculations are performed with loops and recursion. For example, the display width of a column and the predominant data type in that column clearly belong to the view layer, but to avoid extra passes their calculation is combined with the calculation of the aggregates (the business-logic layer); that is, the pattern is plainly broken (see the sketch below). This is not the only example. Dividing functions into layers implies exchanging data between layers, and at each level the data must be transformed / packed / unpacked, which accounts for the lion's share of the slowdown when frameworks are used. It is better to deliver the original data format straight to the presentation, and if that original format is not optimal, it is more logical (for performance) to change / normalize / denormalize the storage scheme than to try to fix the situation in intermediate layers. Of course, the use of patterns is a religious question; it depends heavily on the project, and all of the above makes sense only if you are writing your own program and not someone else's.
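    A hedged illustration of that combined pass (the function and field names are mine, not the app's): a single loop computes both the aggregates and the view-level column width:

    function scanColumn(facts, keyField, valueField) {
        const sums = new Map();   // business logic: aggregate per key
        let width = 0;            // view concern: widest printed value
        for (const f of facts) {
            const v = f[valueField];
            sums.set(f[keyField], (sums.get(f[keyField]) || 0) + v);
            width = Math.max(width, String(v).length);
        }
        return { sums, width };   // no second pass just for the view layer
    }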

    Prospects for further optimization


    1. Parallel computation. A single-pass algorithm plus block data storage parallelizes perfectly: spin up a pool of workers, give each worker its own range of blocks, wait for them all in the main thread and merge the results (see the sketch after this list). This part of the code is still in progress, but even a 5-fold acceleration of the calculation would bring us close to the cherished goal of catching up with Excel.
    2. WASM. Of course, it is tempting to rewrite the calculation algorithm in wasm, only I am not sure it would help. The bottleneck of the calculation is the speed of the Map, Set and Array objects, and why should those objects work faster under wasm? Traditionally the move to wasm is also justified by compilation speed, but this project is only 1,500 lines; is that really a lot?
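    A minimal sketch of the planned fan-out (the worker file name 'calc.js', the message shape and the mergePartials() callback are assumptions for illustration):

    function computeParallel(totalBlocks, mergePartials) {
        const pool = navigator.hardwareConcurrency || 4;
        const perWorker = Math.ceil(totalBlocks / pool);
        const jobs = [];
        for (let w = 0; w < pool; w++) {
            jobs.push(new Promise((resolve) => {
                const worker = new Worker('calc.js');
                worker.onmessage = (e) => { worker.terminate(); resolve(e.data); };
                // each worker aggregates its own range of blocks
                worker.postMessage({
                    from: w * perWorker,
                    to: Math.min((w + 1) * perWorker, totalBlocks)
                });
            }));
        }
        return Promise.all(jobs).then(mergePartials); // merge in the main thread
    }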

    Summary


    I am delighted with the possibilities of web applications and with the JavaScript language, and I think that, despite the cool attitude of Habr readers to the subject, PWAs may well take over the world. If they want to.

    PS
    Continued: accelerated thanks to multi-threaded computing.
