Collaborative editing. Part 1

    Good afternoon. Last year, I have been working in the MyOffice project on collaborative editing. Looking back, I can say that this is not an easy and very interesting task. Therefore, I would like to talk in detail about it and give answers to the following questions:

    1. What are the approaches to collaborative editing?
    2. How difficult are they to implement?
    3. Is it possible to take a ready-made library and use it in my project?
    4. Is it possible to conduct development without regard to co-editing?



    In order to give detailed and reasoned answers to them, you need to write a lot of material, so there will be several articles, sit back, we begin.

    As soon as we begin to store files on the server (in the cloud), there is a natural desire to ensure that they are edited by several people. This is especially true for office documents, on which several people are working at once and each of them makes changes.

    But "you can’t just take it and ..." give everyone access to edit one document. A mechanism is needed that will provide convenient and correct modification of one file by several users. Let's look at options for how to do this.

    Possible approaches


    Lock the entire document while editing


    This is the most trivial method. The advantages of this solution are ease of implementation and the ability to work with all types of data. On this the advantages end and the continuous inconvenience for the user begins:

    • Only one user can make edits at a time. The more such a document will be and the more often it will be edited, the more obvious this inconvenience becomes. Imagine TK with over a hundred pages, while one user is forced to wait until his colleague finishes editing a section in another part of the document.
    • Each change in the document leads to the need to completely transfer it to the server, and then download it to all participants, which greatly affects the speed of work and increases the amount of data sent.
    • This solution cannot provide offline work, because in order to start editing a document, it is required to block it on the server from changes by other users. And without a connection to the server, this cannot be done.
    • Unblocked lock. Imagine that one user forgot to unlock, and even went home or went on vacation altogether. Everyone else has to wait or look for a negligent user. Part of this can be avoided by monitoring the time of activity or by giving the administrator the ability to forcefully remove the lock. However, neither one nor the other is an ideal solution, since in the first case it is impossible to understand on the basis of what it is possible to distinguish the missing Internet from the lack of discipline of the user, and in the second one can not do without human intervention and someone’s changes may be lost.

    This option can often be found in solutions for corporate portals, ERP, EDS.

    Lock part of a document


    For a text document, a paragraph or a table can become such a division unit. This improves the whole process of work, namely:

    • Now you can simultaneously edit one document, provided that the changes are made in different parts of it.
    • You cannot completely block the document. Despite the unblocked lock, you can continue to work with the rest of the document.

    But this approach cannot be called ideal:

    • Offline work is still not possible. To block any part of the document, you must contact the server.
    • Part of the document that was locked while editing can be significant. Indeed, a paragraph can be quite large and contain several hundred or thousand characters of text. Even more difficulties arise when working with tables. Without additional tricks, you cannot simultaneously allow changing the contents of cells and the structure of the table — insert or delete rows or columns. This means that any change to one of the cells requires blocking the entire table. In office documents, tables can be of different levels of nesting or of very large size. This moment becomes especially unpleasant when working with spreadsheet documents, where the worksheet consists of one table at all, which means that it must be blocked completely with any changes.
    • From a structural point of view, a document can be simplified as a sequence of paragraphs and tables. And when we insert or delete tables and paragraphs, we edit this sequence. And this means that the correctness of joint editing of such a sequence also needs to be ensured (by the way, there is no fundamental difference between editing a list of paragraphs or a list of letters, unless the first happens less often). Without introducing a fundamentally different synchronization method, it should again be a lock, only this time the lock object will be one for the entire document.

    Some readers may notice that the task of co-editing is successfully solved in version control systems, with a consistent history (SVN or CVS) or with the possibility of branching (GIT, Mercurial). Yes, you are right, but here there are some nuances that make it impossible to apply the same principle in our case:

    • Merge Firstly, for office suite users, this concept is generally unfamiliar. Secondly, combining different versions of a document becomes fundamentally more difficult when moving from simple text to a document with a complex (usually hierarchical) structure, where there is text, nested tables, images, and various formatting settings. Just imagine a similar 3-way merge.
    • Even when using kdiff3 or winmerge, errors occur, but they work with plain text.
    • It will not be easy for an ordinary user of office applications to delve into the world of branching and merging. And having delved into it, it will not be very fast and convenient to use.

    Look from the other side


    We will proceed from the fact that from the point of view of the user, all work with the document should be reduced only to correcting the text, inserting the schedule promised to colleagues, and finishing this. Just as he did before with local documents. All complex processes to maintain the correct state of the document must be hidden from the user. It is important to maintain responsiveness and functionality comparable to editing local files.

    For us, this dictates additional requirements:

    1. Using an optimistic strategy for making changes. Edits should be applied immediately to the local copy of the document. And regardless of this, they are sent to the server, and from it to other clients, while how quickly they are received and applied does not affect the speed of working with the local version.

    It follows that the status of the document at the clients and on the server may differ. This forms the following requirement:

    2. Convergence. When all users finish their work and all notifications about edits will be delivered, both on the server and the clients should have the same state of the document.

    But that is not all. The first two requirements can be fulfilled with the help of a simple but obviously not satisfying user algorithm. If two competing changes arrive on the server, then you can simply ignore what came later by notifying clients accordingly. This will be enough to satisfy the first two requirements, but obviously not enough to make happy whose change has been thrown out. And this means that you need to declare one more requirement:

    3. The principle of conservation of intentions (intention preservation). We must ensure the maximum preservation of the intentions of all users, even if edits are made at the same time and they compete with each other.

    We declare that each edit from each of the users is important for us and we will not cancel it unnecessarily. The need to cancel may still arise. For example, in situations where one user has edited a paragraph that has been deleted by someone else in parallel. In this case, the changes simply cannot be applied, since the paragraph no longer exists.

    The second point worth mentioning in the context of this principle is formalization. The concept of "intention" is quite abstract. Imagine that the text contains the word “custody”, which is simultaneously corrected by two users, and in different ways: “pharmacy” and “optics”. Most of the well-known algorithms (and ours too) work at the letter level, and the result is an “aptica”, which does not correspond to the “high-level” intentions of both authors. There are formalizations of user intentions at the level of weak letter orders (“I want to insert the letter“ and ”after the letter“ t ”, but before the“ k ””). For some algorithms, the preservation of intentions expressed in this way is an integral part of them (this can be read here ).

    In our case, we do not formalize the intentions of users and moreover do not give strict guarantees of their observance, but declare that we will adhere to this principle as much as possible.

    Choosing a Nonblocking Algorithm


    There are two algorithms that satisfy all our requirements: optimistic changes, convergence and preservation of intentions.

    First. Differential synchronization. Which in the end did not suit us


    The algorithm assumes a constant 3-way merge on the client and server side, which is accompanied by the transfer of patches in both directions. The diagram shows a general idea.


    A more detailed description can be seen on the author's website .

    You have probably guessed for what reason this algorithm does not suit us. Correctly, as already mentioned, a 3-way merge for a document with a complex structure is nontrivial. At the same time, it is necessary to have a formal guarantee that it will give the same results when merging in different orders and directions ... This puts an end to the prospect of applying this approach.

    Second. Operation Transformation (OT)


    It is rather a general approach, on the basis of which various algorithms are developed.

    OT is based on a fairly simple idea. We describe all data changes as operations that we forward and transform relative to others without the document itself. When an operation comes from one user to another, we transform it so that it becomes valid with respect to the difference between the documents of these two users, expressed in operations. It sounds abstruse, but in fact the principle is quite simple.

    What is Operation Transformation


    Let's look at another scheme that is popular in OT articles.



    Operation O2:
    • User 1 receives operation O2 del [2, “c”] from user 2.
    • Operation O2 is transformed relative to operation O1, which user 2 did not have at the time of creation of O2.
    • The result is del [3, “c”], since one position has already been inserted before position 2, and at “c” the position has become 3.

    Operation O1:
    • The user 2 from the user 1 receives the operation O1 ins [0, “x”].
    • O1 is transformed relative to O2, but this time during the transformation the operation does not change and is applied in the same form as the author of the operation, user 1.

    The inclusion transformation is used here (in some algorithms, the exception transformation is also used). In what follows, I will use the notation Inc (O1, O2) for operation O1, taking into account the effects of operation O2, that is, in short, we included O2 in O1.

    The basic requirement that inclusion transformations (known as Transition Property 1) must satisfy is:

    Inc (O2, O1) * O1 = Inc (O1, O2) * O2,

    where * is a sequential application, from right to left. For ease of understanding, look at the image:


    The above equality as applied to this picture means that, starting from one state of the document and moving along the right or left branch, we will get the same final state. The fundamental point is that we must begin the "movement" from the same state. If this condition is not met, then the result will be a mismatched state of the documents or a transformed operation may just not make sense. For example, inserting a character into an invalid or non-existent document position.

    As I indicated above, there are several different OT-based algorithms. Their main difference is the solution to the key question: how to make transformations in such a way that any operations involved in the transformation are on the same state of the document, and if this is not possible, then with what other transformations you can change the order in which operations are applied.

    The fact is that the first OT-based algorithms did not provide for the use of a central server. All customers contacted peer-to-peer. Accordingly, the current state of the document among users was expressed by a sequence of operations (O1, O2 ... ON), and at each point in time, each client can have a quantity, composition and order of these operations. In this case, it is impossible to determine a single strict order among all generated operations, you can enter only a weak order according to happens before the criterion. Operations between which there is no such relationship are considered to be competing or concurrent.

    A similar approach also brings with it certain difficulties:

    • Performance. The number of transformations can be very large, customers have to keep the whole history, since at any moment an “ancient” operation from another user can come in arbitrarily.
    • For the correct implementation of the requirement, TP-1 is no longer enough. You also have to require at least TP-2: Inc (O3, Inc (O1, O2) * O2) = Inc (O3, Inc (O2, O1) * O1). Depending on the specific algorithm, it may be necessary to fulfill other requirements for transformations and inverse operations (the full list is here ). If you have a set of operations, then these requirements must be met for any pair or triple, and this is far from always the case. Moreover, in order to have confidence in the convergence of the algorithm, these requirements must be formally verified, that is, proved as a theorem.
    • Complexity. Such algorithms are really difficult, and sometimes they find errors. For example, one of the cases that potentially leads to an error looks like this:



    Those interested in classical peer-to-peer algorithms can find a detailed overview and comparison here .

    The difficulties listed above can be resolved by abandoning the equal rights of all clients and peer-to-peer connections. When adding a central server, the status of documents can be described by a simple revision and require no more than TP-1 execution ... But the next article will be devoted to this topic. I promise that for lovers of algorithms there will be more interesting things.

    The first article from the cycle, meanwhile, came to an end. I will be glad to see in the comments your questions and suggestions regarding the content of the following articles.

    The second part can be found here .

    Till!

    Also popular now: