Radical optimization of market-data handling for a crypto exchange

During the refactoring of our crypto exchange, we decided to rethink how we work with market data. In the classic case, market data is distributed in two ways:

1. a REST interface;
2. a WebSocket broadcast subscription.

REST is typically used to fetch historical data, while up-to-date information is pushed over WebSocket in real time. In some setups WebSocket is not used at all, and updates arrive through periodic REST polling.

And everyone seems happy. But on closer inspection, the huge overhead of this concept becomes apparent, and most of it falls on the REST side. To keep a REST interface running, you must build a backend that meets the requirements of high-load systems. Naturally, there are plenty of stacks to choose from here, from PHP to the now-fashionable Go.

You also need a highly available infrastructure, plus such "trifles" as CI/CD for the services, and the specialists to develop and maintain all of it, and so on, and so on.

At the same time, it is the historical market data that occupies the bulk of the disk space, and it is usually stored in a database. On the one hand this solves the clustering problem, but at critical sizes it becomes an unbearable burden and poses non-trivial tasks for the DevOps team and the developers.

In short... the apparent simplicity and consistency of this concept gets smashed to smithereens by harsh reality.

Separately, note a peculiarity of market data: it only ever accumulates (grows). In the language of a database programmer, we always INSERT and then SELECT, but never UPDATE or DELETE. That is, all the excellent database machinery for reorganizing and optimizing stored data goes unused.

Another important property of market data is its rigidly defined structure. For example, a candle on a chart has only eight parameters:

1. moment in time;
2. exposure (the candle's time interval);
3. maximum price;
4. minimum price;
5. opening price;
6. closing price;
7. volume;
8. average price.

The same can be said about trades.

Based on these premises, and armed with prior experience, I quickly came to the conclusion that it is more correct to treat market data as a structured stream.

For example, a stream of candles can be treated as an array of binary structures:

moment: int32
exposition: int32
min: float64
max: float64
open: float64
close: float64
volume: float64
average: float64

Total: 56 bytes.

The same candle serialized as JSON comes out at about 167 bytes.

The size win is obvious, and it means less traffic and faster delivery to the client.
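The size difference is easy to check with a short sketch (Python here; the field values are illustrative, and the exact JSON size depends on the numbers, so 167 bytes is just one sample):

```python
import json
import struct

# Binary layout of one candle, as described above:
# int32 moment, int32 exposition, then six float64 fields.
# "<" means little-endian with no padding (a hypothetical byte-order choice).
CANDLE_FORMAT = "<ii6d"

candle = {
    "moment": 1_600_000_000,
    "exposition": 60,
    "min": 10950.5,
    "max": 11020.0,
    "open": 11000.0,
    "close": 10980.25,
    "volume": 12.75,
    "average": 10985.1,
}

packed = struct.pack(
    CANDLE_FORMAT,
    candle["moment"], candle["exposition"],
    candle["min"], candle["max"], candle["open"],
    candle["close"], candle["volume"], candle["average"],
)

print(len(packed))              # 56 bytes, always
print(len(json.dumps(candle)))  # noticeably larger, and varies with the values
```

The binary record is fixed-size regardless of the values, which is exactly what makes the offset arithmetic below possible.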

And then, of course, BSON comes to mind. But it does not remove the need for a backend, and in general does not solve the problem fundamentally. Besides, it is not natively supported by browsers.

So I looked the other way. The web is full of services built around streams: audio and video platforms. They exhibit exactly the traits we need:

  1. they work with large amounts of data;
  2. they cope with high loads;
  3. they deliver content in real time while still providing access to historical data.

I dug a little into video streaming, and it turned out to solve all of the market-data problems above at their root. All the magic is hidden in HTTP range requests (the Range and Content-Range headers), supported out of the box by, for example, Nginx. They let you access any part of a static resource (a file on the server) without involving a backend. It works like this: you request a URL and specify in a request header the byte span you want back, for example `Range: bytes=100-200`. I will not dwell on the fine points; you can find all the nuances of the technology in the relevant articles. I think the idea is clear.
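A minimal sketch of such a request (Python's standard library; the URL and file name are hypothetical). For a valid range, Nginx would answer `206 Partial Content` with only the requested bytes:

```python
import urllib.request

# Ask for exactly 100 bytes: positions 100..199 inclusive.
# No backend is involved -- Nginx slices the static file itself.
req = urllib.request.Request(
    "https://example.com/candles/BTCUSD-60.bin",  # hypothetical URL
    headers={"Range": "bytes=100-199"},
)
print(req.get_header("Range"))  # bytes=100-199
```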

In fact, the front end can now request exactly the part of the file it needs, for example the part containing certain candles. And since we know precisely how many bytes a single candle occupies (56 bytes), we can easily compute the required offset. True, we still need to know the point in time at which the series begins, and that is easily solved by adding a header to the file, whose size is also a constant.

That is, the front end first requests the file from position zero and reads the header. At the same time, Nginx reports the file size, which lets us determine both the total number of candles in the file and the starting point.

Now, knowing the starting timestamp and the exact number of candles, the front end can fetch any number of them for any time period, without a backend.
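The offset arithmetic can be sketched as follows (Python; the 16-byte header size and the 60-second interval are assumptions for illustration, not values from the original):

```python
import struct

CANDLE_SIZE = struct.calcsize("<ii6d")  # 56 bytes per candle, as above
HEADER_SIZE = 16                        # hypothetical fixed header size
EXPOSURE = 60                           # hypothetical candle interval, seconds

def candle_count(file_size: int) -> int:
    """Total candles in the file, derived from the size Nginx reports."""
    return (file_size - HEADER_SIZE) // CANDLE_SIZE

def candle_range_header(start_moment: int, from_ts: int, count: int) -> str:
    """Range header covering `count` candles starting at timestamp `from_ts`.

    `start_moment` is the first candle's timestamp, read from the file header.
    """
    index = (from_ts - start_moment) // EXPOSURE
    first = HEADER_SIZE + index * CANDLE_SIZE
    last = first + count * CANDLE_SIZE - 1  # HTTP byte ranges are inclusive
    return f"bytes={first}-{last}"

# e.g. 10 one-minute candles starting one hour after the series began:
print(candle_range_header(1_600_000_000, 1_600_003_600, 10))  # bytes=3376-3935
```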

Oh yes... one more thing: we have the candle-exposure parameter. The solution here is simple too: we keep several files at once, one per exposure. As a small extra bonus, the candle structure shrinks by another 4 bytes.
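With one file per interval, the `exposition` field can move into the file name (or header) and drop out of each record, which is where those 4 bytes come from:

```python
import struct

# Record layout with and without the per-candle exposition field.
full = struct.calcsize("<ii6d")  # moment + exposition + six float64 fields
slim = struct.calcsize("<i6d")   # moment + six float64 fields
print(full, slim, full - slim)   # 56 52 4
```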

At this point the solution was already interesting enough to implement, but it turned out to have, I am not ashamed to say, a very cool extra benefit: browsers can cache data fetched with range requests. That is, the next time the front end requests a portion of the file the browser has already received, the request never reaches your server at all; it is served from the cache.

And even that is not all. With a CDN, caching can also be configured on its side. The net result is a system capable of delivering large volumes of market data with minimal load on your infrastructure and servers.

Needless to say, there was no longer any doubt the idea was sound. All that remained were "mere trifles": generating those files.

As mentioned above, an exchange usually runs two market-data delivery systems: REST and WebSocket, the latter broadcasting the current market data in real time, usually as a separate service. As practice showed, extending this service so that it also appends data to the relevant files is not hard; it took literally a couple of developer-hours. You could say that alongside broadcasting, it now also keeps a log.
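That "log keeping" amounts to little more than this (a Python sketch using the per-interval record layout assumed above; in reality the broadcast service would call this once per closed candle):

```python
import struct

# Per-interval file: moment + six float64 fields, 52 bytes per record.
CANDLE_FORMAT = "<i6d"

def append_candle(path: str, moment: int, low: float, high: float,
                  open_: float, close: float, volume: float,
                  average: float) -> None:
    """Append one closed candle to the interval's file."""
    record = struct.pack(CANDLE_FORMAT, moment, low, high,
                         open_, close, volume, average)
    with open(path, "ab") as f:  # append-only, matching how market data grows
        f.write(record)
```

Because the file is only ever appended to, previously served byte ranges never change, which is what makes the browser and CDN caching described above safe.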

Delivering the files to the nodes is a classic file-system synchronization problem, which the DevOps team solves easily and naturally, for example with rsync.

Now we can safely say: BINGO!

A question may arise: why do all crypto exchanges do it differently? I have no reliable answer, but I do have some thoughts:

  1. the exchanges were created by developers sympathetic to crypto. Perhaps they had no idea how classical stock exchanges work, did not draw on their experience, and reached for readily available cookie-cutter solutions to get quick results: the same REST with its accompanying JSON;
  2. as the saying goes: if it works, don't touch it. The key exchanges had already established the technology, and emerging ones simply borrowed it.

If the community finds the topic interesting, I will continue with articles about other non-standard solutions implemented in our project. You can see how it all works in practice on the exchange's website (see my profile). I especially suggest paying attention to the chart, which clearly demonstrates all the benefits of this technique.
