Transactions and mechanisms for their control
Transactions
A transaction is a sequence of operations on data that has a beginning and an end.
A transaction is a sequential read and write operation. The end of a transaction can be either saving changes (commit, commit) or canceling changes (rollback, rollback). In relation to the database, a transaction is a series of queries, which are treated as a single query.
Transactions must satisfy ACID properties
Atomicity. A transaction is either fully executed or not performed at all.
Coherence. At the end of the transaction, the restrictions imposed on the data (for example, constraints in the database) must not be violated. Consistency implies that the system will be transferred from one correct state to another correct one.
Isolation. Concurrently executed transactions should not affect each other, for example, changing data that is used by another transaction. The result of parallel transactions should be as if the transactions were executed sequentially.
Sustainability. Once committed, the changes should not be lost.
Transaction log
The log stores changes made by transactions, ensures atomicity and stability of data in the event of a system failure
The log contains the values that the data had before and after their change by the transaction. Write-ahead log strategy requires you to add to the log a record of previous values before the start, and about the final values after the transaction is completed. In the event of a sudden stop of the system, the database reads the log in the reverse order and discards the changes made by transactions. Having met an interrupted transaction, the database executes it and makes changes about it in the log. Being in the state at the time of the failure, the database reads the log in the direct order and returns the changes made by the transactions. Thus, the stability of transactions that have already been committed and the atomicity of the interrupted transaction are preserved.
Simply re-executing erroneous transactions is not enough to recover.
Example. The user has $ 500 in the account and the user decides to withdraw them through an ATM. Two transactions are performed. The first reads the value of the balance and if there are enough funds on the balance, it gives money to the user. The second subtracts the required amount from the balance. Suppose a system crashes and the first operation fails, and the second one fails. In this case, we cannot re-issue money to the user without returning the system to its original state with a positive balance.
Isolation levels
Read Committed
The problem with a dirty read is that a transaction can read the intermediate result of another transaction.
Example. The initial value of the balance is $ 0. T1 adds $ 50 to the balance. T2 reads the balance value ($ 50). T1 cancels the changes and ends. T2 continues execution with incorrect balance data.
The solution is Read Committed, which prohibits reading data modified by a transaction. If transaction A has changed some data set, then transaction B, when accessing this data, is forced to wait for transaction A.
Repeatable Read
The problem of lost changes (Lost Updates). T1 saves changes over changes to T2.
Example. The initial balance value is $ 0 and two transactions simultaneously replenish the balance. T1 and T2 read a balance of $ 0. Then T2 adds $ 200 to $ 0 and saves the result. T1 adds $ 100 to $ 0 and saves the result. The total result is $ 100 instead of $ 300.
Unrepeatable read problem. Repeated reading of the same data returns different values.
Example. T1 reads a balance value of $ 0. Then T2 adds $ 50 to the balance and ends. T1 reads the data again and detects a discrepancy with the previous result.
Repeatable Read ensures that repeated reading will return the same result. Data read by one transaction is forbidden to be changed in others until the transaction is completed. If transaction A has read some data set, then transaction B, when accessing this data, is forced to wait for transaction A.
Orderly Reading (Serializable)
Phantom Reads Two queries that select data by some condition return different values.
Example. T1 asks for the number of all users whose balance is greater than $ 0 but less than $ 100. T2 subtracts $ 1 from a user with a balance of $ 101. T1 re-executes the request.
Orderly reading (Serializable). Transactions are performed as completely sequential. It is forbidden to update and add records that are subject to the conditions of the request. If transaction A requested data from the entire table, then the whole table is frozen for the rest of the transactions until transaction A.
Scheduler
Sets the order in which operations should be performed in parallel transactions
Provides a specified level of isolation. If the result of operations does not depend on their order, then such operations are commutable (Permutable). Commands are read and operations on different data. Read-write and write-write operations are not commutative. The task of the scheduler is to alternate operations performed by parallel transactions so that the result of execution is equivalent to sequential execution of transactions.
Concurrency Control Mechanisms
Optimistic based on conflict detection and resolution, pessimistic based on conflict prevention
With an optimistic approach, several users have at their disposal copies of the data. The first one who completed the editing saves the changes, the rest must merge the changes. An optimistic algorithm allows a conflict to occur, but the system must recover from the conflict.
With a pessimistic approach, the first user to capture data prevents others from receiving data. If conflicts are rare, it is wise to choose an optimistic strategy, as it provides a higher level of parallelism.
Locking
If one transaction has blocked data, then the rest of the transaction must access the data when accessing the data.
A block can be superimposed on a database, table, row, or attribute. Shared Lock (Shared Lock) can be imposed on the same data by several transactions, allows all transactions (including imposed) reading, prohibits change and exclusive capture. Exclusive Lock can be imposed by only one transaction, allows any actions of the imposed transaction, prohibits any actions by the rest.
A deadlock is a situation when transactions are in standby mode, lasting indefinitely
Example. The first transaction is waiting for the release of data captured by the second, while the second is waiting for the release of data captured by the first.
An optimistic solution to the deadlock problem allows a deadlock to occur, but then restores the system by rolling back one of the transactions involved in the deadlock
With a certain frequency, a search for deadlocks is performed. One of the methods of detection is in time, that is, to consider that a deadlock occurred if the transaction takes too long. When a deadlock is found, one of the transactions is rolled back, which allows other transactions involved in the deadlock to complete. The choice of the victim can be based on the value of transactions or their seniority (Wait-Die and Wound-wait schemes).
Each transaction T is assigned a timestamp TS containing the start time of the transaction.
Wait-Die
If TS (Ti) < TS (Tj) , then Ti waits, otherwise Tirolls back and starts anew with the same timestamp.
If a young transaction has captured a resource and an older one is requesting the same resource, then an older transaction is allowed to be expected. If an older transaction has captured a resource, then a young transaction requesting this resource will be rolled back.
Wound-wait.
If TS (Ti) < TS (Tj) , then Tj rolls back and starts again with the same time stamp, otherwise Ti waits.
If a younger transaction has captured a resource and an older transaction is requesting the same resource, then the young transaction will be rolled back. If an older transaction has captured the resource, then a younger transaction requesting this resource is allowed to wait. The seniority-based selection of the victim prevents deadlocks, but rolls back transactions that are not in a state of interlocking. The problem is that transactions can be rolled back many times because An older transaction can hold a resource for a long time.
A pessimistic solution to the problem of deadlocks does not allow the transaction to start execution if there is a risk of deadlock
To detect deadlocks, a graph is constructed (a wait graph, wait-for-graph), the vertices of which are transactions, and the edges are directed from transactions that are waiting for data to be released to the transaction that captured this data. It is believed that a deadlock occurred if the graph is looped. Building a wait graph, especially in distributed databases, is an expensive procedure.
Two-phase locking - preventing deadlocks by capturing all resources used by the transaction at the beginning of the transaction and releasing them at the end
All blocking operations must precede the first unlocking. It has two phases - Growing Phase at which the capture of grabs occurs and Shrinking Phase at which the release of grabs occurs. If it is impossible to capture one of the resources, the transaction begins again. It is possible that a transaction cannot capture the required resources, for example, if several transactions compete for the same resources.
The two-phase commit ensures that the commit is executed on all database replicas
Each database contributes information about the data that will be changed to the log and corresponds to the coordinator OK (Voting Phase). After everyone answered OK, the coordinator sends a signal obliging everyone to commit. After committing the server, they respond OK, if at least one did not answer OK, then the coordinator sends a signal to cancel the changes to all servers (Completion Phase).
Timestamp method
An older transaction rolls back when trying to access data involved in a younger transaction
Each transaction is assigned a TS timestamp corresponding to the start time of the execution. If Ti is older than Tj , then TS (Ti) < TS (Tj) .
When a transaction rolls back, it is assigned a new timestamp. Each data object Q involved in the transaction is marked with two labels. TS-W (Q) - timestamp of the youngest transaction successfully to record over Q . TS-R (Q) - timestamp of the youngest transaction that has performed reading record over Q .
When transaction T requests reading data Qtwo options are possible.
If TS (T) < W-TS (Q) , that is, the data was updated by a younger transaction, then transaction T rolls back.
If TS (T) > = W-TS (Q) , then reading is performed and R-TS (Q) becomes MAX (R-TS (Q), TS (T)) .
When transaction T requests a change in data Q , two options are possible.
If TS (T) < R-TS (Q) , that is, the data has already been read by a younger transaction and if a change is made, then a conflict will arise. Transaction T rolls back.
IfTS (T) < W-TS (Q) , that is, the transaction is trying to overwrite a newer value, transaction T is rolled back. In other cases, the change is performed and W-TS (Q) becomes equal to TS (T) .
No expensive waiting graph construction is required. Older transactions depend on newer ones, therefore there are no loops in the wait column. There are no deadlocks, because transactions do not expect, but are rolled back immediately. Cascading kickbacks are possible. If Ti rolled back, and Tj read the data that Ti changed, then Tj should also roll back. If in this case Tj has already been communicated, then there will be a violation of the stability principle.
One solution to cascading rollbacks. A transaction performs all write operations at the end, and the remaining transactions must wait for the completion of this operation. Transactions are waiting for a commit before reading.
Thomas write rule - a variation of the timestamp method in which data updated by a younger transaction is not allowed to be overwritten by an older one
Transaction T requests a change in data Q . If TS (T) < W-TS (Q) , that is, the transaction is trying to overwrite a newer value, transaction T is not rolled back as in the timestamp method.