Parsing ACIDs in NoSQL


    It’s not a secret for anyone that if there is a formulated heuristic rule called CAP Theorem, in contrast to the usual RDBMS system, the class of NoSQL solutions cannot provide full support for ACID . I must say that for a number of tasks this is not necessary and the support of one of the elements leads to a compromise in resolving the rest, as a result - a wide variety of existing solutions . In this article, I would like to consider various architectural approaches to solving problems of partially providing requirements for a transaction system.

    "A" Atomicity

    Atomicity ensures that no transaction is partially committed to the system. Either all of her sub-operations will be completed, or not a single one will be completed.

    NoSQL systems usually choose high performance not for the sake of transactional semantics, since adhering to it introduces additional processing costs. Many systems still provide a key or row level guarantee (Google BigTable) or provide api for atomic operations (Amazon DynamoDB), in which only one thread can modify the record if, for example, you want to have a user visit counter distributed across the cluster . Most systems adhere to non-blocking read-modify-write loops. The cycle consists of three stages - to read the value, modify, write. As you can see, in a multi-threaded environment there are many things that can go wrong, for example, that if someone changes the record between the phases of reading and writing. The main mechanism for resolving such conflicts is the use of an algorithmCompare and Swap , - if someone changed the record during the cycle - we must understand that the record changed and repeat the cycle until our value is established, such an algorithm looks more preferable to a completely blocking mechanism for recording. The number of such cycles can be very large, so we need a certain timeout for the operation, after which the operation will be rejected.

    "C" Consistency

    The transaction reaches its normal completion and, thus, fixing its results, maintains the consistency of the database. Considering the specifics of NoSQL to distribute information between servers, this means whether all replicas containing a copy of the data always contain the same version of data.

    Due to the specifics, modern NoSQL must choose high availability and the ability to scale the cluster horizontally, it turns out that the system cannot provide full data consistency and goes to some assumptions in the definition of consistency. There are two approaches:

    Strict consistency

    Such systems ensure that replicas are always able to agree on a single version of the data returned to the user. Some replicas will not contain this value, but when the system processes the request for the value by key, the machine can always decide which value to return - it simply will not always be the last. How it works - for example, we have N replicas of the same key. When a request comes to update the key value, the system will not give the result to the user until W replicas respond that they received the update. When a user requests a value, the system returns a response to the user when at least Rreplicas returned the same value. Then we consider the system to be strictly consistent if the condition R + W> N is met . The choice of the values ​​of R and W affects how many machines must answer before the answer is returned to the user, usually the condition R + W = N + 1 is chosen - the minimum necessary condition for ensuring strict consistency.

    Possible consistency

    Some systems ( Voldemort, Cassandra, Riak ) allow you to choose R and W at which R + W. When the user requests information, there may be times when the system cannot resolve the conflict between the versions of the key values. To resolve conflicts, a versioning type called vector clock is used. This is a vector associated with each key that contains change counters for each replica. Let servers A , B and C be replicas of the same key, the vector will contain three values (N_A, N_B, N_C) , initially initialized to (0,0,0) . Each time a replica changes the value of a key, it increases the value of its counter in a vector. If B changes the value of a key that previously had a version (39, 1, 5) , the vector will change its value to(39, 2, 5) . When another replica, say C , receives an update from replica B, it compares the value of the vector with its own. As long as all your vector counters are less than those that came from B , the value that comes up has a stable version and you can overwrite your own copy. If there are vectors on B and C in which some counters are larger and some are smaller, for example, (39, 2, 5) and (39, 1, 6) , then the system identifies the conflict.

    The resolution of this conflict varies on different systems, Voldemort returns several copies of the value, giving the resolution of the conflict to the user. Two versions of the user basket on the site can be merged without loss of information, while merging two versions of one editable document requires user intervention. Cassandra, which stores the timestamp of each record, returns the latest if a conflict is detected, this approach does not allow merging two versions without loss of information, but it simplifies the client part.

    "I" Isolation

    During a transaction, concurrent transactions should not affect its outcome.

    Cassandra’s concept of transaction isolation levels is also important here , starting with version 1.1 it guarantees that if you do an update: no competitive read will see a partial data update (login has changed, but password hasn’t), and this is true only at the line level, which are within the same column family and have a common key. This may correspond to the read uncommitted transaction isolation level at which lost update conflicts are resolved . But cassandra

    UPDATE Users
    SET login='login' AND password='password'
    WHERE key='key'

    it does not provide a rollback mechanism at the cluster level, for example, a situation is possible when login and password will be stored on a certain number of nodes, but not enough W in order to give the user the correct result, while the user is forced to resolve this conflict himself. The mechanism for ensuring isolation is that for each record that is changed, an invisible, isolated for clients version is created, which subsequently automatically replaces the old version through the Compare and Swap mechanisms described above.

    "D" Reliability

    Regardless of problems at the lower levels (for example, blackout of the system or malfunctions in the equipment), the changes made by a successfully completed transaction must remain saved after the system returns to work. In other words, if the user received confirmation from the system that the transaction was completed, he can be sure that the changes made by him will not be canceled due to any failure.

    The most predicted failure scenario is a power outage or server restart. In this case, a completely reliable system should not return a response to the user until it writes all the changes from memory to the hard disk. Writing to disk is too long and many NoSQL systems compromise for performance.

    Providing Reliability on the Same Server

    A standard disk can withstand 70-150 operations per second, which amounts to a throughput of up to 150 Mb / s, ssd - 700 Mb / s, DDR - 6000 - 17000 Mb / s. Therefore, ensuring reliability within a single server while ensuring high performance is to reduce the number of records with random access and increase sequential recording. Ideally, the system should minimize the number of records between fsync calls (data synchronization in memory and on disk). For this, several techniques are used.

    Controlling the frequency fsync

    Redis offers several ways to configure when to call fsync . You can configure it to be called after every change to the record — which is the slowest and safest choice. To improve performance, you can cause a flush to disk every N seconds, in the worst case, you lose data in the last N seconds, which may be acceptable for some users. If reliability is not critical at all, then you can disable fsync and rely on the fact that the system itself at some point synchronizes memory with the disk.

    Increase sequential recording through logging

    For effective data retrieval, NoSQL systems often use additional structures, for example, B-trees for constructing indexes, working with it causes multiple random accesses to the disk. To reduce this, some systems ( Cassandra, HBase, Riak ) add update operations to a sequentially written file called redo log . While some structures are rarely written to disk, the log is often written. After the fall, the missing entries can be restored using the log.

    Increase bandwidth by grouping records

    Cassandra groups several simultaneous changes over a short window, which can be combined into one fsync . This approach, called group commit , increases the response time for one user, because he is forced to wait for several other transactions to fix his. The advantage here is obtained by increasing the overall throughput, because Multiple random entries can be combined.

    Ensuring Reliability in a Server Cluster

    Due to the possibility of unforeseen failure of disks and servers, it is necessary to distribute information among several machines.
    Redis is a classic master-slave architecture for data replication. All operations associated with the master go down to the replicas in the form of a log.
    MongoDB is a structure in which a given number of servers stores each document, and it is possible to set the number of W serversdescribed above, which is minimally necessary for recording and returning control to the user.
    HBase achieves multiserver reliability through the use of a distributed HDFS file system .

    In general, you can notice some tendency of modern NoSQL-tools towards providing greater data consistency. But still, while SQL and NoSQL-tools can exist and develop in parallel and solve completely different problems.

    Also popular now: