HighLoad platform for online auto parts stores

    image

    In the last topic, we promised to talk about the internal structure of our platform www.abcp.ru (SaaS solution). Today we will talk about the most interesting module of the platform - warehouses for storing price lists.

    How was it in the beginning?


    Already in the second year of the platform’s operation, we were faced with the growing needs of customers in the amount of price list lines that must be stored in a database. At this point, the data was stored simply in one table in regular MySQL. It turned out that a typical store wants to connect 7 million items of goods to its store, but cannot pay more than 5,000 rubles a month. We were not ready for such a turn either technically or economically, therefore we restrained our customers as much as we could, and at that time we developed a storage system that can withstand significant amounts of data (up to 50 billion records).

    To restrain system performance with data growth, we carried out many software optimizations and even bought a serious Hewlett-Packard StorageWorks storage system for 25 thousand euros, where we placed our MySQL database (based on the high speed of the disk system), but these improvements were not enough for a while, and time went on.

    As it turned out, the stress phase for our team will last more than two years. If we knew the actual development timeline, it would be a major blow to our motivation. But we didn’t know :) Therefore, we succeeded.

    image

    Choosing a solution


    The input was this: the system should store a lot of records (they determined that 50 billion would be more than enough), about half of this data can be updated daily. It is necessary to achieve decentralization in data storage and to allow partial degradation in case of problems with equipment.

    At that time, we only knew about HighLoad that it was very cool, that there were some conferences on this subject, and that the Internet was full of interesting articles. It was fashionable to use solutions in NoSQL projects, say words like “sharding” and all that. We also knew that Oracle is a very powerful database, which is also able to scale, and therefore considered this option.

    After much torment, we rejected:

    1. Powerful Oracle sub. Because it is expensive and it is difficult.
    2. Many different real NoSQL. Because at that time they were not reliable enough or could not withstand the requirements for a mixed read / update load.

    As a result, familiar MySQL (innoDB) won. Proven solution, free of charge, many experts. We have already stepped on all the rakes, so for our team this DBMS was ideal for creating a new version of the storage of price lists with the declared characteristics. The word “sharding” also helped us.

    And this is what we got


    The new storage system for price lists is a distributed version based on simple, inexpensive and understandable MySQL, which we use in NoSQL mode, i.e. just INSERT, UPDATE, DELETE and SELECT. Without any JOIN and with a minimum number of conditions in WHERE.

    image

    At the moment, the service stores ~ 750 million records of goods sold. During the day, sellers completely update a third of the information (~ 250 million records). The number of UPDATE operations is approximately 10 times greater than the number of SELECT operations. The rate of filling / updating prices in the service at the moment is 30,000 positions per second, which allows you to update the above ~ 250 million for 10% of the daily time.

    The scheme looks like a reference example of horizontal database scaling, but, of course, inside the new service, everything is much more complicated. For faster updating, we developed separate modules (also in sharding format) for converting the “zoo” price lists to the standard format, and separate diff services that calculate the difference between price lists (this works faster than a full update of the price list ) Another solution that completely proved itself was the introduction of the RabbitMQ server as the “heart” of the system. The very use of asynchronous queues has prompted our team to a number of ideas that can significantly speed up the system.

    Of course, we have achieved excellent economic indicators. For example, if we significantly expand the subsystem for storing positions in warehouses, then the monthly cost of storing / using one million positions will be ~ 100 rubles. Even at the current exchange rate.

    Also, choosing the most optimal equipment for the created modules, we received an additional economic gain of about 10-20%.

    If you are developing projects in the field of auto parts trade, then you can use our API to store price lists and search for them (we provide access on request).

    That's it for today, see you in the next topic!

    Also popular now: