MVCC-1. Isolation
Hello, Habr! With this article I am starting a series of article cycles (or a cycle of series? in short, the idea is grand) about the internal structure of PostgreSQL.
The material is based on the administration training courses that we create together with Pavel pluzanov. Not everyone likes watching videos (I certainly do not), and reading slides, even with comments, is just not the same.
Of course, the articles will not repeat the courses one-to-one. I will talk only about how everything works, leaving administration itself aside, but I will try to do so in more depth and detail. And I believe that such knowledge is no less useful to an application developer than to an administrator.
I will aim at those who already have some experience with PostgreSQL and understand, at least in general terms, what is going on; for complete beginners the text will be somewhat heavy going. For example, I will not say a word about how to install PostgreSQL or how to run psql.
The things discussed here do not change much from version to version, but I will use the current, 11th, "vanilla" PostgreSQL.
The first cycle is devoted to questions of isolation and multiversion concurrency, and its plan is as follows:
- Isolation as understood by the standard and by PostgreSQL (this article);
- Forks, files, pages: what happens at the physical level;
- Row versions, virtual transactions and subtransactions;
- Data snapshots and the visibility of row versions; the event horizon;
- In-page vacuum and HOT updates;
- Normal vacuum;
- Autovacuum;
- Transaction id wraparound and freezing.
Let's go.
What is isolation and why is it important?
Probably everyone at least knows that transactions exist, has come across the abbreviation ACID, and has heard about isolation levels. But one still runs into the opinion that all this is mere theory, unnecessary in practice. So I will spend some time trying to explain why it really matters.
You are unlikely to be happy if an application reads incorrect data from the database, or writes incorrect data into it.
But what is "correct" data? It is well known that integrity constraints (such as NOT NULL or UNIQUE) can be created at the database level. If the data always satisfies the integrity constraints (and it does, because the DBMS guarantees this), then the data has integrity.
Correct and having integrity: are these the same thing? Not quite. Not all constraints can be formulated at the database level. Some of them are too complex, for example, spanning several tables at once. And even if a constraint could, in principle, be defined in the database but for some reason was not, that does not mean it may be violated.
So correctness is stricter than integrity, but we do not know exactly what it is. We have no choice but to accept that the reference point for correctness is the application itself, which, as we would like to believe, is written correctly and never makes a mistake. In any case, if the application does not violate integrity but does violate correctness, the DBMS has no way of knowing and will not catch it red-handed.
From here on we will refer to this notion of correctness as consistency.
Let's assume, however, that the application executes only correct sequences of statements. What is the role of the DBMS then, if the application is already correct?
First, it turns out that a correct sequence of statements can temporarily break the consistency of the data, and, oddly enough, that is normal. A hackneyed but clear example is a transfer of funds from one account to another. The consistency rule might sound like this: a transfer never changes the total amount of money in the accounts (such a rule is rather hard to express in SQL as an integrity constraint, so it exists at the application level and is invisible to the DBMS). A transfer consists of two operations: the first decreases the funds in one account, the second increases them in another. The first operation breaks consistency, the second restores it.
A good exercise: try to implement the rule above using integrity constraints. I dare you. ©
What if the first operation completes and the second does not? That can easily happen: during the second operation the power may go out, the server may crash, a division by zero may occur, anything at all. It is clear that consistency gets violated, and this must not be allowed. In principle such situations could be resolved at the application level, at the cost of enormous effort, but fortunately this is not necessary: the DBMS takes care of it. To do so, however, it must know that the two operations form an indivisible whole, that is, a transaction.
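To make this concrete, here is a minimal sketch of such a transfer wrapped in a transaction (it uses the accounts table that is created later in this article; the specific ids are only for illustration):

BEGIN;
-- the first statement temporarily breaks consistency: money has left account 2
UPDATE accounts SET amount = amount - 100 WHERE id = 2;
-- the second statement restores it: the same amount arrives in account 3
UPDATE accounts SET amount = amount + 100 WHERE id = 3;
COMMIT; -- either both changes take effect or, on failure or ROLLBACK, neither does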
An interesting thing happens: knowing only that the operations constitute a transaction, the DBMS helps maintain consistency by guaranteeing the atomicity of transactions, while knowing nothing about the specific consistency rules.
But there is a second, more subtle point. As soon as several concurrent transactions appear in the system, each absolutely correct on its own, together they may work incorrectly. This is because the order of operations gets interleaved: it cannot be assumed that all operations of one transaction are performed first, and only then all operations of another.
A note on simultaneity. Transactions can indeed run truly simultaneously on a system with a multi-core processor, a disk array, and so on. But the same reasoning holds for a server that executes commands sequentially in time-sharing mode: for so many cycles one transaction runs, for the next so many, another. The term concurrent execution is sometimes used to cover both cases.
Situations in which correct transactions work incorrectly together are called concurrency anomalies.
A simple example: if an application wants to read correct data from the database, then at the very least it must not see the changes of other uncommitted transactions. Otherwise it can not only read inconsistent data, but also see something that has never existed in the database at all (if that transaction is rolled back). This anomaly is called a dirty read.
There are other, more complex anomalies as well; we will deal with them a little later.
Of course, concurrent execution cannot simply be given up: otherwise what performance could we even talk about? But neither can we work with incorrect data.
And once again the DBMS comes to the rescue. Transactions can be made to run as if sequentially, one after another; in other words, in isolation from each other. In reality the DBMS may interleave their operations, but it guarantees that the result of concurrent execution coincides with the result of one of the possible sequential executions. And that rules out any possible anomalies.
So, we come to the definition:
A transaction is a set of operations performed by the application that moves the database from one correct state to another correct state (consistency), provided that the transaction is completed in full (atomicity) and runs without interference from other transactions (isolation).
This definition ties together the first three letters of the acronym ACID. They are so closely related to each other that it simply makes no sense to consider one without the others. In fact, it is hard to detach the letter D (durability) as well: after a system crash the changes of uncommitted transactions remain in the database, and something has to be done with them to restore consistency.
Everything would be fine, except that implementing full isolation is technically difficult and comes at the cost of system throughput. That is why, in practice, weakened isolation is used very often (not always, but almost always); it prevents some anomalies, but not all. This means that part of the work of keeping the data correct falls on the application. And that is exactly why it is very important to understand which isolation level is used in the system, which guarantees it gives and which it does not, and how to write correct code under those conditions.
SQL isolation levels and anomalies
The SQL standard has long described four isolation levels. They are defined by listing the anomalies that are or are not allowed when transactions run concurrently at each level. So, to talk about the levels, we first need to get acquainted with the anomalies.
I emphasize that in this part we are talking about the standard, that is, about a theory on which practice relies heavily but from which it also diverges. Therefore all the examples here are speculative. They use the same operations on client accounts: this is quite illustrative, although, admittedly, it has nothing in common with how banking operations are actually organized.
Lost update
Let's start with the lost update. This anomaly occurs when two transactions read the same row of a table, then one transaction updates the row, and after that the second transaction also updates the same row without taking into account the changes made by the first.
For example, two transactions intend to increase the balance of the same account by 100 ₽. The first transaction reads the current value (1000 ₽), then the second transaction reads the same value. The first transaction increases the amount (getting 1100 ₽) and writes this value. The second transaction does the same: it also gets 1100 ₽ and writes it. As a result, the client has lost 100 ₽.
Lost updates are not allowed by the standard at any isolation level.
Dirty read and Read Uncommitted
We have already met the dirty read above. This anomaly occurs when a transaction reads uncommitted changes made by another transaction.
For example, the first transaction transfers all the money from a client's account to another account but does not commit the change. Another transaction reads the account balance, gets 0 ₽, and refuses to hand the client any cash, even though the first transaction is then aborted and rolls back its changes, so the value 0 never existed in the database.
Dirty reading is allowed by the standard at the Read Uncommitted level.
Non-repeatable read and Read Committed
The non-repeatable read anomaly occurs when a transaction reads the same row twice, and between the reads a second transaction modifies (or deletes) that row and commits the change. The first transaction then gets different results.
For example, suppose a consistency rule forbids negative balances on client accounts. The first transaction is going to decrease an account balance by 100 ₽. It checks the current value, gets 1000 ₽, and decides that the decrease is possible. Meanwhile the second transaction decreases the balance to zero and commits. If the first transaction now rechecked the balance, it would get 0 ₽ (but it has already decided to decrease the value, and the account "goes into the red").
The non-repeatable read is allowed by the standard at the Read Uncommitted and Read Committed levels, but Read Committed does not allow the dirty read.
Phantom Read and Repeatable Read
A phantom read occurs when a transaction reads a set of rows twice using the same condition, and between the reads a second transaction adds rows satisfying that condition (and commits). The first transaction then gets different sets of rows.
For example, suppose a consistency rule forbids a client from having more than 3 accounts. The first transaction is going to open a new account, checks the current number of accounts (say, 2), and decides that opening one is possible. Meanwhile the second transaction also opens a new account for the same client and commits. If the first transaction now rechecked the count, it would get 3 (but it is already opening another account, and the client ends up with 4).
The phantom read is allowed by the standard at the Read Uncommitted, Read Committed, and Repeatable Read levels. But at the Repeatable Read level the non-repeatable read is not allowed.
No anomalies and Serializable
The standard defines one more level, Serializable, at which no anomalies are allowed. And this is not at all the same thing as forbidding the lost update together with the dirty, non-repeatable, and phantom reads.
The point is that there are considerably more known anomalies than the ones listed in the standard, and an unknown number of them have yet to be discovered.
Serializable must prevent all anomalies whatsoever. This means that at this level the application developer does not have to think about concurrency at all: if transactions execute correct sequences of statements when running alone, the data will remain consistent when the same transactions run concurrently.
Summary table
Now we can present the table everyone knows. For clarity, a last column has been added to it that is not in the standard.
|                  | Lost update | Dirty read | Non-repeatable read | Phantom read | Other anomalies |
|------------------|-------------|------------|---------------------|--------------|-----------------|
| Read Uncommitted | -           | Yes        | Yes                 | Yes          | Yes             |
| Read Committed   | -           | -          | Yes                 | Yes          | Yes             |
| Repeatable Read  | -           | -          | -                   | Yes          | Yes             |
| Serializable     | -           | -          | -                   | -            | -               |
Why exactly these anomalies?
Of the many possible anomalies, why does the standard list only a few, and why these in particular?
No one seems to know for certain. But practice here was clearly ahead of theory, so it may well be that other anomalies simply had not been thought of at the time (this is about the SQL:92 standard).
In addition, it was assumed that isolation had to be built on locks. The idea of the widely used two-phase locking protocol (2PL) is that while it runs, a transaction locks the rows it works with and releases the locks when it completes. Simplifying greatly, the more locks a transaction acquires, the better it is isolated from other transactions. But system performance suffers all the more, because instead of working together, transactions start queuing up for the same rows.
It seems to me that the difference between the standard's isolation levels is explained precisely by the number of locks required.
If a transaction locks the rows it modifies against modification but not against reading, we get the Read Uncommitted level: lost updates are not allowed, but uncommitted data can be read.
If a transaction locks the rows it modifies against both reading and modification, we get the Read Committed level: uncommitted data cannot be read, but re-reading a row may return a different value (the non-repeatable read).
If a transaction locks the rows it reads as well as the rows it modifies, against both reading and modification, we get the Repeatable Read level: re-reading a row returns the same value.
But Serializable has a problem: a row that does not exist yet cannot be locked. Because of this, the phantom read remains possible: another transaction can add (but not delete) a row that satisfies the condition of a previously executed query, and that row will show up when the query is repeated.
Therefore, to implement the Serializable level, ordinary locks are not enough: what has to be locked is not rows but conditions (predicates). Such locks are called predicate locks. They were proposed back in 1976, but their practical applicability is limited to rather simple conditions for which it is clear whether two different predicates overlap. As far as I know, no system has ever implemented such locks in that form.
PostgreSQL isolation levels
Over time, lock-based transaction management protocols were supplanted by Snapshot Isolation. Its idea is that each transaction works with a consistent snapshot of the data as of a certain point in time; the snapshot contains only the changes that were committed before it was taken.
This kind of isolation automatically rules out the dirty read. Formally, you can specify the Read Uncommitted level in PostgreSQL, but it works exactly like Read Committed, so we will not mention Read Uncommitted any further.
PostgreSQL implements a multiversion variant of this protocol. The idea of multiversion concurrency is that several versions of the same row can coexist in the DBMS. This makes it possible to build a snapshot of the data from the available versions and to get by with a minimum of locks. In fact, only concurrent modifications of the same row block each other. All other operations proceed concurrently: writing transactions never block reading transactions, and reading transactions never block anyone at all.
Thanks to data snapshots, isolation in PostgreSQL is stricter than the standard requires: the Repeatable Read level disallows not only non-repeatable reads but phantom reads as well (although it still does not provide full isolation). And this is achieved without any loss of efficiency.
|                  | Lost update | Dirty read | Non-repeatable read | Phantom read | Other anomalies |
|------------------|-------------|------------|---------------------|--------------|-----------------|
| Read Uncommitted | -           | -          | Yes                 | Yes          | Yes             |
| Read Committed   | -           | -          | Yes                 | Yes          | Yes             |
| Repeatable Read  | -           | -          | -                   | -            | Yes             |
| Serializable     | -           | -          | -                   | -            | -               |
How multiversioning is implemented "under the hood" is the subject of the following articles; for now we will look at each of the three levels in detail through the eyes of the user (the most interesting part, as you know, hides behind "other anomalies"). To do this, let's create a table of accounts. Alice and Bob have 1000 ₽ each, but Bob has two accounts open:
=> CREATE TABLE accounts(
id integer PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
number text UNIQUE,
client text,
amount numeric
);
=> INSERT INTO accounts VALUES
(1, '1001', 'alice', 1000.00),
(2, '2001', 'bob', 100.00),
(3, '2002', 'bob', 900.00);
Read Committed
No dirty read
It is easy to make sure that dirty data cannot be read. Let's begin a transaction. By default it uses the Read Committed isolation level:
=> BEGIN;
=> SHOW transaction_isolation;
transaction_isolation
-----------------------
read committed
(1 row)
More precisely, the default level is set by the default_transaction_isolation parameter, which can be changed if necessary:
=> SHOW default_transaction_isolation;
default_transaction_isolation
-------------------------------
read committed
(1 row)
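As a quick aside (not part of the original session output), the default can be changed for the current session with SET, or a level can be requested for a single transaction right in the BEGIN command:

SET default_transaction_isolation = 'repeatable read';  -- session-wide default
BEGIN ISOLATION LEVEL REPEATABLE READ;                   -- just for this transaction
SHOW transaction_isolation;
COMMIT;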
So, in the open transaction we withdraw funds from the account but do not commit the change yet. The transaction sees its own changes:
=> UPDATE accounts SET amount = amount - 200 WHERE id = 1;
=> SELECT * FROM accounts WHERE client = 'alice';
id | number | client | amount
----+--------+--------+--------
1 | 1001 | alice | 800.00
(1 row)
In the second session we start another transaction, also at the Read Committed level. To distinguish the transactions, the commands of the second one are shown indented and marked with a vertical bar.
To repeat the commands above (which is useful), open two terminals and run psql in each: type the commands of one transaction in the first terminal and the commands of the other in the second.
| => BEGIN;
| => SELECT * FROM accounts WHERE client = 'alice';
| id | number | client | amount
| ----+--------+--------+---------
| 1 | 1001 | alice | 1000.00
| (1 row)
As expected, the other transaction does not see the uncommitted changes: the dirty read is not allowed.
Non-repeatable read
Now let the first transaction commit its changes, and let the second re-execute the same query.
=> COMMIT;
| => SELECT * FROM accounts WHERE client = 'alice';
| id | number | client | amount
| ----+--------+--------+--------
| 1 | 1001 | alice | 800.00
| (1 row)
| => COMMIT;
The query now returns the new data: this is the non-repeatable read anomaly, which is allowed at the Read Committed level.
Practical conclusion: a transaction must not make decisions based on data read by a previous statement, because everything can change between statements. Here is an example whose variations appear in application code so often that it is a classic antipattern:
IF (SELECT amount FROM accounts WHERE id = 1) >= 1000 THEN
UPDATE accounts SET amount = amount - 1000 WHERE id = 1;
END IF;
In the time that passes between the check and the update, other transactions can change the state of the account however they like, so such a "check" protects against nothing. It is convenient to imagine that between the statements of one transaction, any statements of other transactions can be "wedged in", for example like this:
IF (SELECT amount FROM accounts WHERE id = 1) >= 1000 THEN
-----
| UPDATE accounts SET amount = amount - 200 WHERE id = 1;
| COMMIT;
-----
UPDATE accounts SET amount = amount - 1000 WHERE id = 1;
END IF;
If everything can be ruined by rearranging the statements, the code is written incorrectly. And do not fool yourself that such a coincidence will never happen: it will.
How should the code be written correctly? The options usually boil down to the following:
- Do not write such code.
It is not a joke. For example, in this case the check easily turns into an integrity constraint:
ALTER TABLE accounts ADD CHECK (amount >= 0);
Now no check is needed at all: it is enough to simply perform the action and, if necessary, handle the exception that is raised if the constraint would be violated.
- Use a single SQL statement.
Consistency problems arise because, in the gap between statements, another transaction can commit and the visible data changes; if there is only one statement, there are no gaps. PostgreSQL has enough tools to solve complex problems with a single SQL statement: in particular, common table expressions (CTEs), which can contain INSERT/UPDATE/DELETE statements, and the INSERT ON CONFLICT statement, which implements the logic "insert, but if the row already exists, update" in one statement. A small sketch of this approach follows this list.
- Use explicit locks.
The last resort is to explicitly lock all the required rows (SELECT FOR UPDATE) or even the whole table (LOCK TABLE). This always works, but it forfeits the advantages of multiversioning: some operations will be executed sequentially instead of concurrently.
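As an illustration of the single-statement approach, here is a minimal sketch (mine, not from the course material) of the withdrawal from the antipattern above rewritten as one statement: the check moves into the WHERE clause, so no other transaction can slip in between the check and the update.

UPDATE accounts SET amount = amount - 1000
WHERE id = 1 AND amount >= 1000;
-- The application then looks at the reported row count: one row updated means the
-- withdrawal happened; zero rows means the balance was insufficient.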
Inconsistent read
Before moving on to the next isolation level, we have to admit that not everything is so simple. The PostgreSQL implementation allows other, less well-known anomalies that the standard does not regulate.
Suppose the first transaction has started transferring funds from one of Bob's accounts to another:
=> BEGIN;
=> UPDATE accounts SET amount = amount - 100 WHERE id = 2;
Meanwhile another transaction computes Bob's balance, and the computation is performed in a loop over all of Bob's accounts. The transaction starts with the first account (and, obviously, sees its previous state):
| => BEGIN;
| => SELECT amount FROM accounts WHERE id = 2;
| amount
| --------
| 100.00
| (1 row)
At this point, the first transaction completes successfully:
=> UPDATE accounts SET amount = amount + 100 WHERE id = 3;
=> COMMIT;
And the other transaction reads the state of the second account (and sees the new value already):
| => SELECT amount FROM accounts WHERE id = 3;
| amount
| ---------
| 1000.00
| (1 row)
| => COMMIT;
Thus the second transaction got 1100 ₽ in total, that is, incorrect data. This is the inconsistent read anomaly.
How can this anomaly be avoided while staying at Read Committed? Use a single statement, of course. For example:
SELECT sum(amount) FROM accounts WHERE client = 'bob';
So far I have claimed that data visibility can change only between statements, but is that so obvious? And if a query runs for a long time, can it see part of the data in one state and part in another?
Let's check. A convenient way to do so is to insert an artificial delay into a statement by calling the pg_sleep function; its parameter sets the delay in seconds.
=> SELECT amount, pg_sleep(2) FROM accounts WHERE client = 'bob';
While this statement is running, we transfer the funds back in another transaction:
| => BEGIN;
| => UPDATE accounts SET amount = amount + 100 WHERE id = 2;
| => UPDATE accounts SET amount = amount - 100 WHERE id = 3;
| => COMMIT;
The result shows that the statement sees the data in the state it had at the moment the statement started. This is certainly correct.
amount | pg_sleep
---------+----------
0.00 |
1000.00 |
(2 rows)
But here, too, things are not so simple. PostgreSQL lets you define functions, and functions have a volatility category. If a volatile function (one with the VOLATILE category) is called in a query, and that function itself executes another query, then the query inside the function will see data that is not consistent with the data seen by the main query.
=> CREATE FUNCTION get_amount(id integer) RETURNS numeric AS $$
SELECT amount FROM accounts a WHERE a.id = get_amount.id;
$$ VOLATILE LANGUAGE sql;
=> SELECT get_amount(id), pg_sleep(2)
FROM accounts WHERE client = 'bob';
| => BEGIN;
| => UPDATE accounts SET amount = amount + 100 WHERE id = 2;
| => UPDATE accounts SET amount = amount - 100 WHERE id = 3;
| => COMMIT;
In this case, we get incorrect data - 100 ₽ are lost:
get_amount | pg_sleep
------------+----------
100.00 |
800.00 |
(2 rows)
I emphasize that this effect is possible only at the Read Committed isolation level, and only with the VOLATILE volatility category. The trouble is that this isolation level and this volatility category are both the defaults, so, admittedly, it is a very easy trap to step into. Don't!
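A note from me rather than from the course material: for a function like this, which only reads data, the usual fix is to declare it STABLE. STABLE functions execute their queries against the snapshot of the calling query, so the inner SELECT sees the same data as the outer one:

CREATE OR REPLACE FUNCTION get_amount(id integer) RETURNS numeric AS $$
  SELECT amount FROM accounts a WHERE a.id = get_amount.id;
$$ STABLE LANGUAGE sql;  -- STABLE: the inner query uses the caller's snapshot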
Inconsistent read instead of a lost update
An inconsistent read within a single statement can also be obtained, in a somewhat unexpected way, during an update.
Let's see what happens when two transactions try to change the same row. Bob currently has 1000 ₽ in two accounts:
=> SELECT * FROM accounts WHERE client = 'bob';
id | number | client | amount
----+--------+--------+--------
2 | 2001 | bob | 200.00
3 | 2002 | bob | 800.00
(2 rows)
We start a transaction that reduces Bob's balance:
=> BEGIN;
=> UPDATE accounts SET amount = amount - 100 WHERE id = 3;
At the same time, another transaction accrues interest on the accounts of all clients whose total balance is 1000 ₽ or more:
| => UPDATE accounts SET amount = amount * 1.01
| WHERE client IN (
| SELECT client
| FROM accounts
| GROUP BY client
| HAVING sum(amount) >= 1000
| );
Execution of an UPDATE statement consists of two parts. First, a SELECT is effectively performed that selects the rows matching the condition. Since the first transaction's change is not committed, the second transaction cannot see it, and it does not affect which rows are selected for interest accrual. So Bob's accounts meet the condition, and once the update completes, his balance should increase by 10 ₽.
In the second stage, the selected rows are updated one by one. Here the second transaction is forced to wait, because the row with id = 3 is already locked by the first transaction.
Meanwhile, the first transaction commits the changes:
=> COMMIT;
What will be the result?
=> SELECT * FROM accounts WHERE client = 'bob';
id | number | client | amount
----+--------+--------+----------
2 | 2001 | bob | 202.0000
3 | 2002 | bob | 707.0000
(2 rows)
Yes, on the one hand, the UPDATE command should not see the changes made by the first transaction. But on the other hand, it must not lose the changes committed by the first transaction.
After the lock is released, UPDATE re-reads the row it is trying to update (but only that one row!). As a result, Bob is credited 9 ₽, based on a total of 900 ₽. But if Bob had 900 ₽, his accounts should not have been selected at all.
So the transaction gets incorrect data: some rows are seen as of one point in time, others as of another. Instead of a lost update we again get the inconsistent read anomaly.
Attentive readers will note that, with a little help from the application, a lost update is still possible at the Read Committed level. For example, like this:
x := (SELECT amount FROM accounts WHERE id = 1);
UPDATE accounts SET amount = x + 100 WHERE id = 1;
The database is not to blame: it receives two SQL statements and has no way of knowing that the value x + 100 is somehow related to accounts.amount. Do not write code this way.
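For the record, here is how the same logic can be written safely at Read Committed: either let the server do the arithmetic in a single statement, or, if the new value really has to be computed by the application, lock the row while reading it. A minimal sketch:
-- option 1: a single statement, the arithmetic is done atomically by the server
UPDATE accounts SET amount = amount + 100 WHERE id = 1;

-- option 2: read the row with a lock; it stays locked until COMMIT,
-- so no concurrent change can slip in between the read and the write
BEGIN;
SELECT amount FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET amount = amount + 100 WHERE id = 1;  -- new value computed by the application
COMMIT;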
Repeatable Read
No non-repeatable and phantom reads
The very name of this isolation level says that reads are repeatable. Let's verify this and, at the same time, make sure there are no phantom reads. To do so, in the first transaction we return Bob's accounts to their previous state and create a new account for Charlie:
=> BEGIN;
=> UPDATE accounts SET amount = 200.00 WHERE id = 2;
=> UPDATE accounts SET amount = 800.00 WHERE id = 3;
=> INSERT INTO accounts VALUES
(4, '3001', 'charlie', 100.00);
=> SELECT * FROM accounts ORDER BY id;
id | number | client | amount
----+--------+---------+--------
1 | 1001 | alice | 800.00
2 | 2001 | bob | 200.00
3 | 2002 | bob | 800.00
4 | 3001 | charlie | 100.00
(4 rows)
In the second session, we start a transaction at the Repeatable Read level, specifying it in the BEGIN command (the level of the first transaction does not matter).
| => BEGIN ISOLATION LEVEL REPEATABLE READ;
| => SELECT * FROM accounts ORDER BY id;
| id | number | client | amount
| ----+--------+--------+----------
| 1 | 1001 | alice | 800.00
| 2 | 2001 | bob | 202.0000
| 3 | 2002 | bob | 707.0000
| (3 rows)
Now the first transaction commits the changes, and the second re-executes the same request.
=> COMMIT;
| => SELECT * FROM accounts ORDER BY id;
| id | number | client | amount
| ----+--------+--------+----------
| 1 | 1001 | alice | 800.00
| 2 | 2001 | bob | 202.0000
| 3 | 2002 | bob | 707.0000
| (3 rows)
| => COMMIT;
The second transaction continues to see exactly the same data as at the start: neither changes to existing rows nor new rows are visible.
At this level, you don't have to worry about something changing between two statements.
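As a side note, the level does not have to be spelled out in every BEGIN command; a couple of equivalent ways to select it (a sketch):
-- as the first statement of an already started transaction:
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SHOW transaction_isolation;   -- reports the level of the current transaction
COMMIT;

-- or as a default for all transactions of the current session:
SET default_transaction_isolation = 'repeatable read';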
Serialization error in exchange for lost updates
We said above that when two transactions update the same row at the Read Committed level, an inconsistent read anomaly can occur within one statement. It happens because the waiting transaction re-reads the locked row and therefore sees it at a different point in time than the rest of the rows.
At the Repeatable Read level this anomaly is not allowed, and if it is about to occur, nothing can be done about it: the transaction is aborted with a serialization error. Let's verify this by repeating the interest-accrual scenario:
=> SELECT * FROM accounts WHERE client = 'bob';
id | number | client | amount
----+--------+--------+--------
2 | 2001 | bob | 200.00
3 | 2002 | bob | 800.00
(2 rows)
=> BEGIN;
=> UPDATE accounts SET amount = amount - 100.00 WHERE id = 3;
| => BEGIN ISOLATION LEVEL REPEATABLE READ;
| => UPDATE accounts SET amount = amount * 1.01
| WHERE client IN (
| SELECT client
| FROM accounts
| GROUP BY client
| HAVING sum(amount) >= 1000
| );
=> COMMIT;
| ERROR: could not serialize access due to concurrent update
| => ROLLBACK;
Data remained consistent:
=> SELECT * FROM accounts WHERE client = 'bob';
id | number | client | amount
----+--------+--------+--------
2 | 2001 | bob | 200.00
3 | 2002 | bob | 700.00
(2 rows)
The same error occurs for any other concurrent change of the row, even if the columns we are actually interested in have not changed.
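A quick illustration of this point, using the same two sessions as before (a sketch; the expected outcome is shown as a comment): the concurrent transaction rewrites the row without touching the amount at all, yet the Repeatable Read update still fails.
=> BEGIN ISOLATION LEVEL REPEATABLE READ;
=> SELECT amount FROM accounts WHERE id = 3;   -- the snapshot is taken here
| => UPDATE accounts SET number = number WHERE id = 3;   -- a new row version, same values
=> UPDATE accounts SET amount = amount - 1.00 WHERE id = 3;
-- expected: ERROR: could not serialize access due to concurrent update
=> ROLLBACK;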
A practical conclusion: if an application uses the Repeatable Read isolation level for writing transactions, it must be prepared to retry transactions that end with a serialization error. For read-only transactions such an outcome is impossible.
Inconsistent write
So, at the Repeatable Read isolation level PostgreSQL prevents all the anomalies described in the standard. But not all anomalies in general: it turns out that exactly two remain possible. (This holds not only for PostgreSQL, but also for other snapshot-based implementations of isolation.)
The first of them is the inconsistent write (also known as write skew).
Let the following consistency rule apply: negative amounts are allowed on a client's accounts as long as the total across all of that client's accounts remains non-negative.
The first transaction gets the total amount in Bob's accounts: 900 ₽.
=> BEGIN ISOLATION LEVEL REPEATABLE READ;
=> SELECT sum(amount) FROM accounts WHERE client = 'bob';
sum
--------
900.00
(1 row)
The second transaction receives the same amount.
| => BEGIN ISOLATION LEVEL REPEATABLE READ;
| => SELECT sum(amount) FROM accounts WHERE client = 'bob';
| sum
| --------
| 900.00
| (1 row)
The first transaction rightly believes that the amount of one of the accounts can be reduced by 600 ₽.
=> UPDATE accounts SET amount = amount - 600.00 WHERE id = 2;
And the second transaction comes to the same conclusion, but reduces the other account:
| => UPDATE accounts SET amount = amount - 600.00 WHERE id = 3;
| => COMMIT;
=> COMMIT;
=> SELECT * FROM accounts WHERE client = 'bob';
id | number | client | amount
----+--------+--------+---------
2 | 2001 | bob | -400.00
3 | 2002 | bob | 100.00
(2 rows)
We managed to drive Bob's balance negative, even though each transaction is correct on its own.
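Looking ahead: if the Serializable level discussed below is not an option, this particular anomaly can also be prevented at Read Committed or Repeatable Read with explicit locking, so that the check and the update cannot interleave. A possible sketch (FOR UPDATE cannot be combined with an aggregate, hence the separate locking query):
BEGIN;
-- lock all of the client's rows first; a concurrent transaction doing the same will wait here
SELECT id FROM accounts WHERE client = 'bob' FOR UPDATE;
-- the rule is now checked against rows that nobody else can change until we commit
SELECT sum(amount) FROM accounts WHERE client = 'bob';
UPDATE accounts SET amount = amount - 600.00 WHERE id = 2;
COMMIT;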
Read-Only Anomaly
This is the second and last of the anomalies possible at the Repeatable Read level. To demonstrate it, we will need three transactions: two that modify data and a third that only reads.
But first, restore Bob’s account status:
=> UPDATE accounts SET amount = 900.00 WHERE id = 2;
=> SELECT * FROM accounts WHERE client = 'bob';
id | number | client | amount
----+--------+--------+--------
3 | 2002 | bob | 100.00
2 | 2001 | bob | 900.00
(2 rows)
The first transaction accrues interest on the total amount in all of Bob's accounts; the interest is credited to one of them:
=> BEGIN ISOLATION LEVEL REPEATABLE READ; -- 1
=> UPDATE accounts SET amount = amount + (
SELECT sum(amount) FROM accounts WHERE client = 'bob'
) * 0.01
WHERE id = 2;
Then another transaction withdraws money from Bob's other account and commits its changes:
| => BEGIN ISOLATION LEVEL REPEATABLE READ; -- 2
| => UPDATE accounts SET amount = amount - 100.00 WHERE id = 3;
| => COMMIT;
If the first transaction were committed at this point, there would be no anomaly: we could assume that the first transaction ran first and then the second (but not the other way around, because the first transaction saw the state of account id = 3 before that account was changed by the second transaction).
But suppose that at this moment the third (read only) transaction begins, which reads the status of some account that is not affected by the first two transactions:
| => BEGIN ISOLATION LEVEL REPEATABLE READ; -- 3
| => SELECT * FROM accounts WHERE client = 'alice';
| id | number | client | amount
| ----+--------+--------+--------
| 1 | 1001 | alice | 800.00
| (1 row)
And only after that the first transaction is completed:
=> COMMIT;
What state should the third transaction now see?
| => SELECT * FROM accounts WHERE client = 'bob';
Having started, the third transaction could see the changes of the second transaction (already committed), but not of the first (not yet committed). On the other hand, we have already established above that the second transaction should be considered to have started after the first. Whatever state the third transaction sees, it will be inconsistent: this is the read-only transaction anomaly. And at the Repeatable Read level it is allowed:
| id | number | client | amount
| ----+--------+--------+--------
| 2 | 2001 | bob | 900.00
| 3 | 2002 | bob | 0.00
| (2 rows)
| => COMMIT;
Serializable
At the Serializable level, all possible anomalies are prevented. This level is in fact implemented as an add-on on top of snapshot-based isolation. The anomalies that do not occur at Repeatable Read (dirty, non-repeatable and phantom reads) do not occur at Serializable either. And the anomalies that do occur (inconsistent write and the read-only transaction anomaly) are detected, and the transaction is aborted with the familiar serialization error: could not serialize access.
Inconsistent write
To illustrate, let's repeat the inconsistent write scenario:
=> BEGIN ISOLATION LEVEL SERIALIZABLE;
=> SELECT sum(amount) FROM accounts WHERE client = 'bob';
sum
----------
910.0000
(1 row)
| => BEGIN ISOLATION LEVEL SERIALIZABLE;
| => SELECT sum(amount) FROM accounts WHERE client = 'bob';
| sum
| ----------
| 910.0000
| (1 row)
=> UPDATE accounts SET amount = amount - 600.00 WHERE id = 2;
| => UPDATE accounts SET amount = amount - 600.00 WHERE id = 3;
| => COMMIT;
=> COMMIT;
ERROR: could not serialize access due to read/write dependencies among transactions
DETAIL: Reason code: Canceled on identification as a pivot, during commit attempt.
HINT: The transaction might succeed if retried.
Just as at the Repeatable Read level, an application using Serializable must retry transactions that end with a serialization error, as the error message itself (the HINT line) tells us.
We get simplicity of programming, but the price is that a certain share of transactions is forcibly aborted and has to be retried. The whole question, of course, is how big that share is. If only those transactions that genuinely conflict on data with other transactions were aborted, everything would be fine. But such an implementation would inevitably be resource-intensive and inefficient, because it would have to track operations on every row.
In fact, the PostgreSQL implementation allows false positives: some perfectly normal transactions that are simply "unlucky" will be aborted. As we will see later, this depends on many factors, such as the availability of suitable indexes or the amount of memory available. In addition, there are other (fairly serious) implementation restrictions: for example, queries at the Serializable level do not work on replicas, and parallel execution plans are not used for them. Although work on improving the implementation continues, the existing restrictions make this isolation level less attractive.
Parallel plans will appear in PostgreSQL 12 (patch). And queries on replicas may start working in PostgreSQL 13 (another patch).
Read-Only Anomaly
To make sure that a read-only transaction can neither cause an anomaly nor suffer from one, PostgreSQL offers an interesting mechanism: such a transaction can be blocked until its execution is safe. This is the only case where a SELECT statement can be blocked by row updates. Here is what it looks like:
=> UPDATE accounts SET amount = 900.00 WHERE id = 2;
=> UPDATE accounts SET amount = 100.00 WHERE id = 3;
=> SELECT * FROM accounts WHERE client = 'bob' ORDER BY id;
id | number | client | amount
----+--------+--------+--------
2 | 2001 | bob | 900.00
3 | 2002 | bob | 100.00
(2 rows)
=> BEGIN ISOLATION LEVEL SERIALIZABLE; -- 1
=> UPDATE accounts SET amount = amount + (
SELECT sum(amount) FROM accounts WHERE client = 'bob'
) * 0.01
WHERE id = 2;
| => BEGIN ISOLATION LEVEL SERIALIZABLE; -- 2
| => UPDATE accounts SET amount = amount - 100.00 WHERE id = 3;
| => COMMIT;
The third transaction is explicitly declared read-only (READ ONLY) and deferrable (DEFERRABLE):
| => BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE; -- 3
| => SELECT * FROM accounts WHERE client = 'alice';
When it tries to execute the query, the transaction is blocked, because otherwise its execution would lead to an anomaly.
=> COMMIT;
And only after the first transaction commits does the third continue to execute:
| id | number | client | amount
| ----+--------+--------+--------
| 1 | 1001 | alice | 800.00
| (1 row)
| => SELECT * FROM accounts WHERE client = 'bob';
| id | number | client | amount
| ----+--------+--------+----------
| 2 | 2001 | bob | 910.0000
| 3 | 2002 | bob | 0.00
| (2 rows)
| => COMMIT;
Another important note: if you use Serializable isolation, then all transactions in the application must use this level. You cannot mix Read Committed (or Repeatable Read) transactions with Serializable ones. Or rather, you can, but then Serializable will silently behave like Repeatable Read. We will look at why this happens later, when we discuss the implementation.
So if you decide to use Serializable, it is best to set it globally as the default level (although that, of course, does not prevent anyone from explicitly specifying a different level):
ALTER SYSTEM SET default_transaction_isolation = 'serializable';
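Keep in mind that ALTER SYSTEM only writes the value to postgresql.auto.conf; for it to take effect, the configuration has to be reloaded. A sketch:
SELECT pg_reload_conf();              -- re-read the configuration files
SHOW default_transaction_isolation;   -- 'serializable' for sessions that have not overridden it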
A more rigorous presentation of issues related to transactions, consistency and anomalies can be found in Boris Asenovich Novikov's book and lecture course, “Fundamentals of Database Technologies”.
Which isolation level should I use?
The Read Committed isolation level is the default in PostgreSQL, and it seems to be the level used in the vast majority of applications. It is convenient because a transaction can be aborted only in the event of a failure, never as a means of preventing inconsistency; in other words, a serialization error cannot occur.
The flip side of the coin is the large number of possible anomalies, discussed in detail above. The developer has to keep them in mind all the time and write code so that they cannot appear. If the required actions cannot be expressed in a single SQL statement, one has to resort to explicit locking. The most unpleasant part is that the code is hard to test for errors caused by reading inconsistent data, and the errors themselves can occur in unpredictable and irreproducible ways, which makes them hard to fix.
The Repeatable Read isolation level removes some of the consistency problems, but alas, not all of them. So you not only have to remember the remaining anomalies, but also modify the application to handle serialization errors correctly, which is, of course, inconvenient. However, for read-only transactions this level perfectly complements Read Committed and is very convenient, for example, for building reports that use several SQL queries.
Finally, the Serializable level frees you from worrying about inconsistency at all, which makes writing code much easier. The only thing required of the application is the ability to retry any transaction that fails with a serialization error. But the share of aborted transactions, the additional overhead, and the impossibility of parallelizing queries can significantly reduce system throughput. Also keep in mind that the Serializable level is not available on replicas, and that it cannot be mixed with other isolation levels.
To be continued.