The new Yandex.Disk synchronization algorithm: how not to choke on 900,000 files

    Yandex.Disk is one of the few Yandex services, of which desktop software is part. And one of its most important components is the algorithm for synchronizing local files with their copy in the cloud. Recently, we had to completely change it. If the old version could hardly digest even several tens of thousands of files and, moreover, did not respond quickly enough to some “complex” user actions, the new version, using the same resources, can handle hundreds of thousands of files.

    In this post I will tell you why it happened: what we could not foresee when we came up with the first version of Yandex.Disk software, and how we created a new one.



    First of all, about the synchronization task itself. Technically speaking, it consists in having the same set of files in the Yandex.Disk folder on the user's computer and in the cloud. That is, user actions such as renaming, deleting, copying, adding and changing files should be synchronized with the cloud automatically.

    Why is it not as simple as it seems at first glance?


    Theoretically, the task may seem quite simple, but in reality we are faced with various difficult situations. For example, a person renamed a folder on his computer, we detected this and sent a command to the backend. However, none of the users waits until the backend confirms the renaming success. A person immediately opens his locally renamed folder, creates a subfolder in it, and, for example, transfers part of the files to it. We are in a situation in which it is impossible to immediately perform all the necessary synchronization operations in the cloud. First you need to wait for the completion of the first operation and only then you can continue.

    The situation can become even more complicated if several users are working with the same account at the same time or if they have a shared folder. And this happens quite often in organizations using Yandex.Disk. Imagine that in the previous example, at the moment when we received confirmation from the backend of the first renaming, another user takes and renames this folder again. In this case, again, you cannot immediately perform the actions that the first user has already completed on his computer. The folder in which he worked locally on the backend at this time is already called differently.

    There are times when a file on a user's computer cannot be named exactly as it is called in the cloud. This can happen if the name contains a character that cannot be used by the local file system, or when the user is invited to a shared folder, and he has his own folder with that name. In such cases, we have to use local aliases and track their relationship with objects in the cloud.

    Previous version of the algorithm


    In the previous version of Yandex.Disk desktop software, a tree comparison algorithm was used to search for changes. Any other solution at that time did not allow the search for movements and renaming, since the backend did not have unique identifiers for objects.

    In this version of the algorithm, we used three main trees: local (Local Index), cloud (Remote Index) and the last synchronized (Stable Index). In addition, to prevent the re-generation of already queued synchronization operations, two additional auxiliary trees were used: the local expected and the cloud expected (Expected Remote Index and Expected Local Index). In these auxiliary trees, the expected state of the local file system and the cloud was stored after all synchronization operations that were already queued.

    The procedure for comparing trees in the old algorithm was as follows:

    1. If the local expected tree and the cloud expected tree are empty, initialize them by copying the last synchronized tree;
    2. We compare the local tree with the expected cloud and, based on the results of comparing individual nodes, add synchronization operations in the cloud to the queue (creating collections, transferring files to the cloud, moving and deleting in the cloud);
    3. For all operations that are queued in the previous step, we fix their future effect in the expected cloud tree;
    4. We compare the cloud tree with the local expected one and, based on the results of comparing individual nodes, add to the queue synchronization operations with the local file system (creating directories, downloading files from the cloud, moving and deleting local files and directories);
    5. For all operations that are queued in the previous step, we fix their future effect in the expected local tree;
    6. If simultaneous operations with the same file or directory fall into the queue (for example, transferring a file to the cloud and downloading the same file from the cloud), then we fix the conflict - the file has changed in two places;
    7. After the synchronization operation is performed in the cloud or with the local file system, we record its result in the last synchronized tree;
    8. When the synchronization operation queue becomes empty, we delete the local expected and cloud expected trees. Synchronization is complete, and we will no longer need them.

    Why did we have to come up with a new algorithm


    The main problems of the tree comparison algorithm were the large memory consumption and the need to compare the whole trees even with small changes, which led to a large processor load. During the processing of changes to even one file, the use of RAM increased by approximately 35%. Suppose a user had 20,000 files. Then, with a simple renaming of a single file of 10Kb in size, memory consumption grew spasmodically - from 116MB to 167MB.

    We also wanted to increase the maximum number of files with which the user can work without problems. Several tens and even hundreds of thousands of files may be, for example, a photographer who stores photosession results in Yandex.Disk. This task became especially urgent when people had the opportunity to buy additional space on Yandex.Disk.

    In the development, I also wanted to change something. Debugging the old version caused difficulties, since the data on the states of one element were in different trees.

    By this time, id of objects appeared on the backend, with the help of which it was possible to more efficiently solve the problem of detecting movements - previously we used paths.

    New algorithm


    We decided to change the data storage structure and replace the three trees (Local Index, Remote Index, Stable Index) with one, which was supposed to reduce the redundancy in the main data structure. Due to the fact that the key to the tree is the path to the file system element, as a result of combining, the amount of RAM used has been significantly reduced.

    We also refused to use auxiliary trees during synchronization, because each element of the tree in the new version stores all the necessary data. This change in structure greatly simplified code debugging.

    Since we understood that this was a major change, we created a prototype that confirmed the effectiveness of the new solution. Let's look at an example of how the data in the tree changes during synchronization of a new file.

    1. After the user added a new file to the Drive folder, the program detected it and added a new element to the tree. Only one state is known for this element - local. Since there are no stable and remote states, no memory is allocated for them;
    2. The program uploads the file. A push comes from the cloud confirming the appearance of a new file, and a remote state is added to the tree;
    3. The states of local and remote are compared. Since they match, a stable state is added;
    4. The local and remote states are deleted. They are no longer needed, since all the information is in stable.


    This example shows that in the new synchronization algorithm only those elements and events are processed, the data on the changes in which were received from the file system or the cloud, and not the whole tree, as it was before. If necessary, parent or child nodes will be processed (for example, in the case of moving a folder).

    Other improvements


    In the new version, we worked on other improvements that affected performance. Saving the tree was made incremental, which allows you to write to the file only the latest changes.

    Yandex.Disk uses sha256 and MD5 digests to check file integrity, detect changed fragments and deduplicate files on the backend. Since this task loads the CPU heavily, in the new version the implementation of digest calculations was significantly optimized. The speed of receiving a file digest is approximately doubled.

    Figures


    Synchronization of unique 20,000 files of 10Kb
    Software versionDownload to CPU.
    Digest Calculation
    CPU
    upload load
    RAM usage, Mb
    Yandex.Disk 1.3.328% (1 core 100%)About 1%102
    Yandex.Disk 1.2.748% (2 cores 100%)About 10%368

    Calculation of digests of unique 20,000 files of 10kb (indexing)
    Software versionCPU loadTime, secRAM usage, Mb
    Yandex.Disk 1.3.325% (1 core 100%)19082
    Yandex.Disk 1.2.750% (2 cores 100%)200245

    Starting from 20,000 synchronized files of 10Kb
    Software versionCPU loadTime, secRAM usage, Mb
    Yandex.Disk 1.3.325% (1 core 100%)1055
    Yandex.Disk 1.2.750% (2 cores 100%)22125

    Upload 1Gb. 10 Mbps Wi-Fi Connection
    Software versionCPU loadTime, sec
    Yandex.Disk 1.3.35%1106
    Yandex.Disk 1.2.75%2530

    What happened


    It can be seen from the examples that the new version of Yandex.Disk software uses about 3 times less RAM and about 2 times less CPU load. Processing minor changes does not increase the amount of memory used.

    As a result of the changes made, the number of files with which the program copes without problems has increased significantly. On the Windows version, 300,000, and on Mac OS X, 900,000 files.

    Also popular now: