Transactions in MongoDB

MongoDB is a great database that has become increasingly popular lately. More and more people with SQL experience are starting to use it, and one of the first questions they ask is: does MongoDB have transactions?

If you believe the answers on Stack Overflow, things look bad:

MongoDB doesn't support complex multi-document transactions. If that is something you absolutely need it probably isn't a great fit for you.
If transactions are required, perhaps NoSQL is not for you. Time to go back to ACID relational databases.
MongoDB does a lot of things well, but transactions is not one of those things.
But we will not take their word for it, and will implement transactions (ACID*) based on MVCC ourselves. Below is an account of how these transactions work; those who are eager to see the code are welcome on GitHub (careful, it is Java).

* Strictly speaking, this post is not about MongoDB but about how to build transactions on top of compare-and-set; durability is provided exactly to the extent that the underlying storage provides it.

Data model


Unlike many other NoSQL solutions, MongoDB supports compare-and-set (CAS). It is CAS support that makes it possible to add ACID transactions. If you use any other NoSQL storage with CAS support (for example, HBase, Project Voldemort or ZooKeeper), the approach described here can be applied there as well.
What is CAS?
It is a mechanism that guarantees a change to an object is rejected if the object has been modified by another client since it was last read. A familiar example is a version control system, which will refuse your commit if a colleague managed to commit before you.
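To make the idea concrete, here is a minimal in-memory sketch of compare-and-set. The CasStore class and its method names are hypothetical helpers invented for this illustration; they are not part of MongoDB or any driver.

```javascript
// In-memory stand-in for a store with compare-and-set
// (hypothetical helper, not a MongoDB API).
class CasStore {
  constructor() {
    this.docs = new Map();
  }

  insert(id, value) {
    this.docs.set(id, { version: 0, value });
  }

  read(id) {
    const doc = this.docs.get(id);
    // Return a copy so that callers cannot mutate the stored state directly.
    return { version: doc.version, value: { ...doc.value } };
  }

  // Writes only if the stored version still equals expectedVersion.
  compareAndSet(id, expectedVersion, newValue) {
    const doc = this.docs.get(id);
    if (doc === undefined || doc.version !== expectedVersion) return false;
    this.docs.set(id, { version: expectedVersion + 1, value: newValue });
    return true;
  }
}

const store = new CasStore();
store.insert("gov", { name: "gov", balance: 600 });

// Two clients read the same version of the account.
const seenByAlice = store.read("gov");
const seenByBob = store.read("gov");

// Alice writes first and succeeds; Bob's write is rejected
// because the version has moved on since he read it.
const aliceOk = store.compareAndSet("gov", seenByAlice.version, { name: "gov", balance: 550 });
const bobOk = store.compareAndSet("gov", seenByBob.version, { name: "gov", balance: 500 });

console.log(aliceOk, bobOk, store.read("gov").value.balance); // prints: true false 550
```

Bob is in the position of the developer whose commit was refused: he has to re-read the object and decide again what to write.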

In fact, every object that we want to change within a transaction must be protected by CAS, and this affects the data model. Suppose we are modeling a bank; below is an account model without and with protection. I hope this makes it clear how to change the rest of the model.
Model, unprotected:
{
  _id : ObjectId(".."),
  name : "gov",
  balance : 600
}

Model, protected by CAS:
{
  _id : ObjectId(".."),
  version : 0,
  value : {
    name : "gov",
    balance : 600
  }
}

Data change, unprotected:
db.accounts.update(
  { _id:ObjectId("...") },
  { name:"gov", balance:550 }
);

Data change, protected by CAS:
db.accounts.update({
    _id: ObjectId("..."), version: 0
  },{
    version : 1,
    value : { name:"gov", balance:550 }
});

From here on I will not dwell on the fact that each object has a version and that every change to an object takes its version into account, but keep in mind that any change to an object may fail under concurrent access.
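What handling such a failure might look like is sketched below: re-read the object, recompute the change, and try the versioned write again. This is an in-memory illustration only; casUpdate, changeBalance and the betweenReadAndWrite hook are hypothetical names, and with a real driver the success check would instead be based on the number of modified documents reported by the update.

```javascript
// Minimal in-memory collection with versioned documents (a stand-in for db.accounts).
const accounts = new Map([
  ["gov", { version: 0, value: { name: "gov", balance: 600 } }],
]);

// Mimics update({_id, version}, {...}): applies the write only if the version matches.
function casUpdate(id, expectedVersion, newValue) {
  const doc = accounts.get(id);
  if (!doc || doc.version !== expectedVersion) return false;
  accounts.set(id, { version: expectedVersion + 1, value: newValue });
  return true;
}

// Re-reads and retries until the CAS write goes through or attempts run out.
// `betweenReadAndWrite` is a hook used here only to simulate another client.
function changeBalance(id, delta, maxAttempts = 5, betweenReadAndWrite = () => {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const doc = accounts.get(id);
    const updated = { ...doc.value, balance: doc.value.balance + delta };
    betweenReadAndWrite(attempt);
    if (casUpdate(id, doc.version, updated)) return attempt;
  }
  throw new Error("too much contention on " + id);
}

// Another client sneaks in a deposit during the first attempt only.
const attempts = changeBalance("gov", -50, 5, (attempt) => {
  if (attempt === 1) casUpdate("gov", 0, { name: "gov", balance: 700 });
});

console.log(attempts, accounts.get("gov").value.balance); // prints: 2 650
```

The first attempt loses the race and is rejected; the second one sees the deposit and subtracts from the fresh balance, so neither change is lost.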

Adding a version is not the only change the model needs in order to support transactions; the fully changed model looks like this:

{
  _id : ObjectId(".."),
  version : 0,
  value : {
    name : "gov",
    balance : 600
  },
  updated : null,
  tx : null
}

The added fields are updated and tx. They are bookkeeping data used while a transaction is in progress. updated has the same structure as value: it holds the modified version of the object, which becomes value if the transaction goes through. tx is an ObjectId, a foreign key referencing the _id of the object that represents the transaction. The object representing the transaction is also protected by CAS.

Algorithm


Explaining the algorithm is easy; explaining it so that its correctness is obvious is harder, so I will have to use some entities before I define them.

The following are the true statements, definitions and properties from which the algorithm will later be assembled.
  • value always contains a state that was valid at some point in the past
  • a read operation can modify data in the database
  • a read operation is idempotent
  • an object can be in one of three states: clean - c, dirty uncommitted - d, dirty committed - dc
  • a transaction changes only objects in state c
  • possible transitions between states: c → d, d → c, d → dc, dc → c
  • transitions initiated by a transaction: c → d, d → dc, dc → c
  • possible transition on read: d → c
  • if a d → c transition has happened, the transaction that performed the c → d transition will fail when it commits
  • any database operation may fail
  • a failed read operation must be repeated
  • after a failed write, a new transaction must be started
  • if a commit fails, check whether it actually went through; if not, run the transaction again
  • a transaction has gone through if the object representing it (_id = tx) has been deleted


States

A clean state describes an object after a successful transaction: value contains the data, and updated and tx are null.

A dirty uncommitted state describes an object while a transaction is in progress: updated contains the new version, tx is the _id of the object representing the transaction, and that object exists.

A dirty committed state describes an object after a transaction that succeeded but crashed before it could clean up after itself: updated contains the new version, tx is the _id of the object representing the transaction, but that object has already been deleted.
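The three states can be told apart by looking at the tx field and at whether the transaction object it points to still exists. A minimal sketch, assuming the live transaction objects are available as a Map keyed by _id (the stateOf function, the Map and the "tx-0" / "tx-1" ids are hypothetical, chosen for illustration):

```javascript
// Classifies a document as clean ("c"), dirty uncommitted ("d")
// or dirty committed ("dc").
// `transactions` is a Map of live transaction objects keyed by _id
// (a hypothetical stand-in for the collection that holds them).
function stateOf(doc, transactions) {
  if (doc.tx === null) return "c";
  return transactions.has(doc.tx) ? "d" : "dc";
}

// Only transaction "tx-1" still exists; "tx-0" has already been deleted.
const liveTransactions = new Map([["tx-1", { version: 0 }]]);

const clean = { version: 3, value: { balance: 600 }, updated: null, tx: null };
const dirtyUncommitted = { version: 4, value: { balance: 600 }, updated: { balance: 550 }, tx: "tx-1" };
const dirtyCommitted = { version: 4, value: { balance: 600 }, updated: { balance: 550 }, tx: "tx-0" };

const states = [clean, dirtyUncommitted, dirtyCommitted].map((d) => stateOf(d, liveTransactions));
console.log(states.join(" ")); // prints: c d dc
```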

Transaction

  1. Read the objects that participate in the transaction
  2. Create an object representing the transaction (tx)
  3. For each object, write the new value to updated and tx._id to tx
  4. Delete the tx object
  5. For each object, write the value from updated to value, and set tx and updated to null
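The five steps above can be sketched against in-memory Maps that stand in for the two collections. This is an illustration under stated assumptions, not the author's Java implementation: cas, casDelete, transact, the numeric transaction ids and the "lisa" account are all hypothetical, and the sketch assumes the objects it reads are already clean.

```javascript
// In-memory stand-ins for the two collections (hypothetical; not the MongoDB API).
const accounts = new Map();
const transactions = new Map();
let nextTxId = 1;

// Replaces a document only if its version is unchanged since it was read.
function cas(collection, id, expectedVersion, fields) {
  const doc = collection.get(id);
  if (!doc || doc.version !== expectedVersion) return false;
  collection.set(id, { ...fields, version: expectedVersion + 1 });
  return true;
}

// Deletes a document only if its version is unchanged; this is the commit point.
function casDelete(collection, id, expectedVersion) {
  const doc = collection.get(id);
  if (!doc || doc.version !== expectedVersion) return false;
  return collection.delete(id);
}

// Applies change(values) -> newValues to the given accounts as one transaction.
function transact(ids, change) {
  // 1. Read the participating objects (assumed clean in this sketch).
  const docs = ids.map((id) => ({ id, ...accounts.get(id) }));
  const newValues = change(docs.map((d) => d.value));

  // 2. Create the object that represents the transaction.
  const txId = nextTxId++;
  transactions.set(txId, { version: 0 });

  // 3. Stage the new value in `updated` and point `tx` at the transaction.
  //    On a conflict the tx object is left behind, to be cleaned up later.
  docs.forEach((d, i) => {
    const staged = { value: d.value, updated: newValues[i], tx: txId };
    if (!cas(accounts, d.id, d.version, staged)) throw new Error("write conflict on " + d.id);
  });

  // 4. Delete the transaction object: from here on the transaction is committed.
  if (!casDelete(transactions, txId, 0)) throw new Error("commit failed");

  // 5. Promote `updated` to `value` and clear the bookkeeping fields.
  docs.forEach((d, i) => {
    cas(accounts, d.id, d.version + 1, { value: newValues[i], updated: null, tx: null });
  });
}

accounts.set("gov", { version: 0, value: { name: "gov", balance: 600 }, updated: null, tx: null });
accounts.set("lisa", { version: 0, value: { name: "lisa", balance: 100 }, updated: null, tx: null });

// Move 50 from "gov" to "lisa" atomically.
transact(["gov", "lisa"], ([gov, lisa]) => [
  { ...gov, balance: gov.balance - 50 },
  { ...lisa, balance: lisa.balance + 50 },
]);

console.log(accounts.get("gov").value.balance, accounts.get("lisa").value.balance); // prints: 550 150
```

A failure in step 5 is harmless in this scheme: the objects stay in the dirty committed state, and the next reader finishes the cleanup.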

Reading

  1. Read the object
  2. If it is clean, return it
  3. If it is dirty committed, write the value from updated to value, and set tx and updated to null
  4. If it is dirty uncommitted, change the version of the tx object, then set updated and tx to null
  5. Go to step 1
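The read procedure can be sketched the same way. Again this is an in-memory illustration with hypothetical names (cas, readClean, numeric transaction ids, the "lisa" account), not the author's code. Bumping the version of the tx object in step 4 is what makes the owning transaction's CAS-protected delete, and therefore its commit, fail.

```javascript
// In-memory stand-ins for the two collections (hypothetical; not the MongoDB API).
const accounts = new Map();
const transactions = new Map();

// Replaces a document only if its version is unchanged since it was read.
function cas(collection, id, expectedVersion, fields) {
  const doc = collection.get(id);
  if (!doc || doc.version !== expectedVersion) return false;
  collection.set(id, { ...fields, version: expectedVersion + 1 });
  return true;
}

// Returns the committed value of an object, repairing it first if it is dirty.
function readClean(id) {
  for (;;) {
    const doc = accounts.get(id);                 // 1. read the object
    if (doc.tx === null) return doc.value;        // 2. clean: return it
    const tx = transactions.get(doc.tx);
    if (tx === undefined) {
      // 3. dirty committed: finish the commit on behalf of the crashed writer.
      cas(accounts, id, doc.version, { value: doc.updated, updated: null, tx: null });
    } else if (cas(transactions, doc.tx, tx.version, {})) {
      // 4. dirty uncommitted: the tx version was bumped, so its commit will fail;
      //    now roll the object back to its old value.
      cas(accounts, id, doc.version, { value: doc.value, updated: null, tx: null });
    }
    // 5. go back to step 1 (also covers any CAS that lost a race)
  }
}

// "gov" was staged by transaction 7, which committed (its tx object is gone)
// but crashed before cleaning up.
accounts.set("gov", { version: 1, value: { balance: 600 }, updated: { balance: 550 }, tx: 7 });
// "lisa" was staged by transaction 8, which has not committed yet.
accounts.set("lisa", { version: 1, value: { balance: 100 }, updated: { balance: 150 }, tx: 8 });
transactions.set(8, { version: 0 });

const gov = readClean("gov");
const lisa = readClean("lisa");
console.log(gov.balance, lisa.balance, transactions.get(8).version); // prints: 550 100 1
```

Calling readClean again on either account returns the same value without touching the store, which matches the claim that reads are idempotent.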


For those who are not yet convinced of its correctness, the homework is to verify that all the properties and statements hold, and then use them to prove ACID.

Conclusion


We have added transactions to MongoDB. They are not a panacea, however, and have limitations; some are listed below, and some are in the comments.
  • everything works well (the database is consistent, transactions are not lost) under the assumption that once the storage has confirmed a write, the write really happened and the data will not be lost (MongoDB provides this when journaling is enabled)
  • the transactions are optimistic, so it is better not to use them when an object is changed at a high frequency from different threads
  • 2n + 2 requests are used to change n objects in one transaction
  • over time, tx objects from failed transactions will accumulate - the old ones must be deleted periodically

FAQ


How can such transactions help with sharding and journaling disabled?


In the event of a server failure we can indeed end up with an inconsistent database, but we are protected from the inconsistent state caused by a client crashing in the middle of a write. If the risk of the latter is greater than the risk of the former, then using transactions still increases the reliability of the system.

I use MongoDB in a single-node configuration; will transactions help me?


Yes. If you use journaling, you get honest ACID transactions. If you do not, you have already accepted potential data loss, since you are not using the second way to increase reliability, replication, either. In that case the transactions maintain consistency under concurrent access and client failures during normal operation, but if the server crashes there is a chance of losing consistency. This is not so scary, though: when a single node goes down the system is unavailable anyway, so you can afford a more involved recovery procedure and restore consistency before restarting the node.

Why not use the two-phase transactions from the official documentation?


Because they do not work with more than one client. More precisely, they do work, but under severe restrictions:
  • all changes to a single element must be commutative (plain assignment is no longer allowed)
  • each thread must have an id, and these threads must never die and must be able to communicate with each other

Otherwise, consistency and availability are lost (stuck transactions become possible, and the only reasonable response is to refuse reads and writes of the objects that participate in them).
