Delayed Durability, or the story of how autotest execution was sped up from 11 to 2.5 minutes



Relatively recently I started helping out on a product project that was new to me: a free web service for tracking working hours.

The technology stack was deliberately kept small, and SQL Server 2014 Express was chosen as the data store. One of the first tasks I was given was to investigate whether the autotests could be sped up.

By the time I got involved, the project had been around for quite a while and had accumulated a fair number of tests (at that point I counted about 1,300 autotests). On a build machine with an SSD the tests ran in 4-5 minutes, on a regular HDD in more than 11-12 minutes. We could have given the whole team SSDs, but that would not have solved the underlying problem, especially since the functionality was about to be expanded and there would be even more tests.

All tests were grouped, and before each group ran, the database was cleared of old data. Cleanup was originally done by re-creating the database, but in practice that approach was very slow. It was much faster to simply clear all the tables and reset the IDENTITY values to zero so that subsequent inserts would produce the expected test data. The script implementing that approach was my starting point:

EXEC sys.sp_msforeachtable 'ALTER TABLE ? NOCHECK CONSTRAINT ALL' 
DELETE FROM [dbo].[Project] 
DBCC CHECKIDENT('[dbo].[Project]', RESEED, 0) 
DBCC CHECKIDENT('[dbo].[Project]', RESEED) 
DELETE FROM [dbo].[RecentWorkTask] 
... 
EXEC sys.sp_msforeachtable 'ALTER TABLE ? WITH CHECK CHECK CONSTRAINT ALL' 

The first thing that caught my eye was the disabling and re-enabling of all the constraints on the database. This was meant to speed up the delete operations, but in practice it turned out to be a resource-intensive operation for the server, so I decided to drop this approach.

Another idea that came up right away was to build the cleanup query with dynamic SQL: if a table is referenced by foreign keys, it has to be cleared with DELETE as before; otherwise the data can be removed with minimal logging using the TRUNCATE command.

As a result, the cleanup query took the following form:

DECLARE @SQL NVARCHAR(MAX) = ''
SELECT @SQL = (
    SELECT
        CASE WHEN p.[object_id] IS NOT NULL
            THEN
                CASE WHEN f.referenced_object_id IS NULL
                    THEN N'TRUNCATE TABLE '
                    ELSE N'DELETE FROM '
                END + obj_name
            ELSE ''
        END + CHAR(13) + CHAR(10) +
        CASE WHEN has_identity > 0
            THEN N'DBCC CHECKIDENT('''+ obj_name + N''', RESEED, 0) WITH NO_INFOMSGS'
            ELSE ''
        END + CHAR(13) + CHAR(10)
    FROM (
        SELECT obj_name = QUOTENAME(s.name) + '.' + QUOTENAME(o.name)
             , o.[object_id]
             , has_identity = IdentityProperty(o.[object_id], 'LastValue')
        FROM sys.objects o
        JOIN sys.schemas s ON o.[schema_id] = s.[schema_id]
        WHERE o.is_ms_shipped = 0
            AND o.[type] = 'U'
    ) t
    LEFT JOIN (
        SELECT DISTINCT [object_id]
        FROM sys.partitions
        WHERE [rows] > 0
            AND index_id IN (0, 1)
    ) p ON t.[object_id] = p.[object_id]
    LEFT JOIN (
        SELECT DISTINCT referenced_object_id
        FROM sys.foreign_keys
    ) f ON f.referenced_object_id = t.[object_id]
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)')
PRINT @SQL
--EXEC sys.sp_executesql @SQL

Initially, the autotests ran for about 11 minutes on my machine.



After I dropped the ALTER statements and rewrote the query, all the tests ran about 40 seconds faster.



Of course, we could have been satisfied, quietly set the task status to Resolved, and moved on, but the main problem remained.



During the tests the disk was very heavily loaded, so I decided to look at the wait statistics on the server. To do that, I first cleared sys.dm_os_wait_stats:

DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR)

Then I re-ran the autotests and afterwards executed the following query:

SELECT TOP(20)
      wait_type
    , wait_time = wait_time_ms / 1000.
    , wait_resource = (wait_time_ms - signal_wait_time_ms) / 1000.
    , wait_signal = signal_wait_time_ms / 1000.
    , waiting_tasks_count
    , percentage = 100.0 * wait_time_ms / SUM(wait_time_ms) OVER ()
    , avg_wait = wait_time_ms / 1000. / waiting_tasks_count
    , avg_wait_resource = (wait_time_ms - signal_wait_time_ms) / 1000. / [waiting_tasks_count]
    , avg_wait_signal = signal_wait_time_ms / 1000.0 / waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE [waiting_tasks_count] > 0
    AND max_wait_time_ms > 0
    AND [wait_type] NOT IN (
        N'BROKER_EVENTHANDLER', N'BROKER_RECEIVE_WAITFOR',
        N'BROKER_TASK_STOP', N'BROKER_TO_FLUSH',
        N'BROKER_TRANSMITTER', N'CHECKPOINT_QUEUE',
        N'CHKPT', N'CLR_AUTO_EVENT',
        N'CLR_MANUAL_EVENT', N'CLR_SEMAPHORE',
        N'DBMIRROR_DBM_EVENT', N'DBMIRROR_EVENTS_QUEUE',
        N'DBMIRROR_WORKER_QUEUE', N'DBMIRRORING_CMD',
        N'DIRTY_PAGE_POLL', N'DISPATCHER_QUEUE_SEMAPHORE',
        N'EXECSYNC', N'FSAGENT',
        N'FT_IFTS_SCHEDULER_IDLE_WAIT', N'FT_IFTSHC_MUTEX',
        N'HADR_CLUSAPI_CALL', N'HADR_FILESTREAM_IOMGR_IOCOMPLETION',
        N'HADR_LOGCAPTURE_WAIT', N'HADR_NOTIFICATION_DEQUEUE',
        N'HADR_TIMER_TASK', N'HADR_WORK_QUEUE',
        N'KSOURCE_WAKEUP', N'LAZYWRITER_SLEEP',
        N'LOGMGR_QUEUE', N'ONDEMAND_TASK_QUEUE',
        N'PWAIT_ALL_COMPONENTS_INITIALIZED',
        N'QDS_PERSIST_TASK_MAIN_LOOP_SLEEP',
        N'QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP',
        N'REQUEST_FOR_DEADLOCK_SEARCH', N'RESOURCE_QUEUE',
        N'SERVER_IDLE_CHECK', N'SLEEP_BPOOL_FLUSH',
        N'SLEEP_DBSTARTUP', N'SLEEP_DCOMSTARTUP',
        N'SLEEP_MASTERDBREADY', N'SLEEP_MASTERMDREADY',
        N'SLEEP_MASTERUPGRADED', N'SLEEP_MSDBSTARTUP',
        N'SLEEP_SYSTEMTASK', N'SLEEP_TASK',
        N'SLEEP_TEMPDBSTARTUP', N'SNI_HTTP_ACCEPT',
        N'SP_SERVER_DIAGNOSTICS_SLEEP', N'SQLTRACE_BUFFER_FLUSH',
        N'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
        N'SQLTRACE_WAIT_ENTRIES', N'WAIT_FOR_RESULTS',
        N'WAITFOR', N'WAITFOR_TASKSHUTDOWN',
        N'WAIT_XTP_HOST_WAIT', N'WAIT_XTP_OFFLINE_CKPT_NEW_LOG',
        N'WAIT_XTP_CKPT_CLOSE', N'XE_DISPATCHER_JOIN',
        N'XE_DISPATCHER_WAIT', N'XE_TIMER_EVENT'
    )
ORDER BY [wait_time_ms] DESC

The largest wait by far was WRITELOG:

wait_type               wait_time    waiting_tasks_count  percentage 
----------------------- ------------ -------------------- -----------
WRITELOG                546.798      60261                96.07      
PAGEIOLATCH_EX          13.151       96                   2.31       
PAGELATCH_EX            5.768        46097                1.01       
PAGEIOLATCH_UP          1.243        86                   0.21       
IO_COMPLETION           1.158        89                   0.20       
MEMORY_ALLOCATION_EXT   0.480        683353               0.08       
LCK_M_SCH_S             0.200        34                   0.03       
ASYNC_NETWORK_IO        0.115        688                  0.02       
LCK_M_S                 0.082        10                   0.01       
PAGEIOLATCH_SH          0.052        1                    0.00       
PAGELATCH_UP            0.037        6                    0.00       
SOS_SCHEDULER_YIELD     0.030        3598                 0.00       

We open Paul Randal's wait types library, find WRITELOG there, and cross-check it against MSDN:

This is the log management system waiting for a log flush to disk. It commonly indicates that the I/O subsystem can't keep up with the log flush volume, but on very high-volume systems it could also be caused by internal log flush limits, that may mean you have to split your workload over multiple databases or even make your transactions a little longer to reduce log flushes. To be sure it's the I/O subsystem, use the DMV sys.dm_io_virtual_file_stats to examine the I/O latency for the log file and see if it correlates to the average WRITELOG time. If WRITELOG is longer, you've got internal contention and need to shard. If not, investigate why you're creating so much transaction log.

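Following that advice, the latency of the log file can be checked with a query like the one below. This is just a minimal sketch against the standard DMVs sys.dm_io_virtual_file_stats and sys.master_files (it was not part of the original investigation); the idea is to compare the average write latency of the log files with the WRITELOG times above.

SELECT [db]          = DB_NAME(vfs.database_id)
     , [file]        = mf.name
     , [type]        = mf.type_desc
     , num_of_writes = vfs.num_of_writes
     , avg_write_ms  = vfs.io_stall_write_ms * 1. / NULLIF(vfs.num_of_writes, 0)
FROM sys.dm_io_virtual_file_stats(NULL, NULL) vfs
JOIN sys.master_files mf ON mf.database_id = vfs.database_id
                        AND mf.[file_id] = vfs.[file_id]
WHERE mf.type_desc = N'LOG' -- log files only
ORDER BY avg_write_ms DESC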

So what is left to figure out? Each autotest writes something to the database while it runs. One way to deal with the WRITELOG waits would be to insert data in larger batches rather than row by row. But SQL Server 2014 introduced a new database-level option, Delayed Durability: the ability not to flush the transaction log to disk immediately when a transaction commits.

How is data modified in SQL Server? Suppose we are inserting a new row. SQL Server calls the Storage Engine component, which in turn calls the Buffer Manager (the component that works with buffers in memory and on disk) and tells it that it wants to change the data.

After that, the Buffer Manager accesses the Buffer Pool (an in-memory cache of all our data, stored page by page, 8 KB per page) and modifies the necessary pages in memory. If the pages are not there, it loads them from disk. At the moment the page has been changed in memory, SQL Server cannot yet report that the query is complete. Otherwise one of the ACID principles would be violated, namely Durability, which guarantees that once the modification completes, the data is safely on disk.

After modifying the page in memory, the Storage Engine calls the Log Manager, which writes the change to the log. But it does not do so immediately; the records go through the Log Buffer, which is 60 KB in size and exists to optimize performance when working with the log. The buffer is flushed to the log file when:

  • The buffer fills up, and its contents are saved to the log.
  • The user executes sys.sp_flush_log.
  • A transaction commits, and everything that was in the Log Buffer is written to the log.

Once the data has been saved in the log, the modification is confirmed and SQL Server notifies the client.

Notice that in this flow the data never reaches the data file. SQL Server writes to data files asynchronously, using two mechanisms:

1) Lazy Writer, which runs periodically and checks whether SQL Server has enough memory. If not, pages are evicted from the Buffer Pool; those that have been changed are first written to the data file and only then removed from memory.

2) Checkpoint, which scans for dirty pages roughly once a minute, flushes them to disk, and leaves them in memory.
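
To see these mechanisms at work, here is a small sketch of my own (not from the original workflow): sys.dm_os_buffer_descriptors exposes an is_modified flag for dirty pages in the Buffer Pool, and the CHECKPOINT command triggers the second mechanism manually.

-- how many dirty pages of the current database are sitting in the Buffer Pool
SELECT dirty_pages = COUNT(*)
FROM sys.dm_os_buffer_descriptors
WHERE database_id = DB_ID()
    AND is_modified = 1

CHECKPOINT -- flush the dirty pages of the current database to the data file (they stay in memory)

SELECT dirty_pages = COUNT(*)
FROM sys.dm_os_buffer_descriptors
WHERE database_id = DB_ID()
    AND is_modified = 1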

Now imagine a system with a flood of small transactions that, say, modify data row by row. After every commit, the data goes from the Log Buffer to the transaction log. Remember that these writes reach the log synchronously, and the other transactions have to wait their turn.

Let me show you an example:

USE [master]
GO
SET NOCOUNT ON
IF DB_ID('TT') IS NOT NULL BEGIN
    ALTER DATABASE TT SET SINGLE_USER WITH ROLLBACK IMMEDIATE
    DROP DATABASE TT
END
GO
CREATE DATABASE TT
GO
ALTER DATABASE TT
    MODIFY FILE (NAME = N'TT', SIZE = 25MB, FILEGROWTH = 5MB)
GO
ALTER DATABASE TT
    MODIFY FILE (NAME = N'TT_log', SIZE = 25MB, FILEGROWTH = 5MB)
GO
USE TT
GO
CREATE TABLE dbo.tbl (
      a INT IDENTITY PRIMARY KEY
    , b INT
    , c CHAR(2000)
)
GO
IF OBJECT_ID('tempdb.dbo.#temp') IS NOT NULL
    DROP TABLE #temp
GO
SELECT t.[file_id], t.num_of_writes, t.num_of_bytes_written
INTO #temp
FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL) t
DECLARE @WaitTime BIGINT
      , @WaitTasks BIGINT
      , @StartTime DATETIME = GETDATE()
      , @LogRecord BIGINT = (
              SELECT COUNT_BIG(*)
              FROM sys.fn_dblog(NULL, NULL)
          )
SELECT @WaitTime = wait_time_ms
     , @WaitTasks = waiting_tasks_count
FROM sys.dm_os_wait_stats
WHERE [wait_type] = N'WRITELOG'
DECLARE @i INT = 1
WHILE @i < 5000 BEGIN
    INSERT INTO dbo.tbl (b, c)
    VALUES (@i, 'text')
    SELECT @i += 1
END
SELECT elapsed_seconds = DATEDIFF(MILLISECOND, @StartTime, GETDATE()) * 1. / 1000
     , wait_time = (wait_time_ms - @WaitTime) / 1000.
     , waiting_tasks_count = waiting_tasks_count - @WaitTasks
     , log_record = (
          SELECT COUNT_BIG(*) - @LogRecord
          FROM sys.fn_dblog(NULL, NULL)
       )
FROM sys.dm_os_wait_stats
WHERE [wait_type] = N'WRITELOG'
SELECT [file] = FILE_NAME(o.[file_id])
     , num_of_writes = t.num_of_writes - o.num_of_writes
     , num_of_mb_written = (t.num_of_bytes_written - o.num_of_bytes_written) * 1. / 1024 / 1024
FROM #temp o
CROSS APPLY sys.dm_io_virtual_file_stats(DB_ID(), NULL) t
WHERE o.[file_id] = t.[file_id]

Inserting 5,000 rows took about 42.5 seconds, and almost all of that time, about 42 seconds, was spent waiting on writes to the log.

elapsed_seconds  wait_time  waiting_tasks_count  log_record
---------------- ---------- -------------------- -----------
42.54            42.13      5003                 18748

Physically, SQL Server went to the log file about 5,000 times and wrote roughly 20 MB in total.

file    num_of_writes  num_of_mb_written
------- -------------- ------------------
TT      79             8.72
TT_log  5008           19.65
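
For comparison, the batching alternative mentioned earlier already helps a lot even without any new options: if the same inserts are wrapped in a single explicit transaction, the commit, which is what forces a synchronous log flush, happens only once for the whole batch. A sketch against the same dbo.tbl table (not a measurement from the original run):

BEGIN TRANSACTION

DECLARE @i INT = 1
WHILE @i < 5000 BEGIN
    INSERT INTO dbo.tbl (b, c)
    VALUES (@i, 'text')
    SELECT @i += 1
END

COMMIT TRANSACTION -- a single commit, so the session waits on the log flush only once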

It is exactly for such situations that Delayed Durability exists. When it is enabled, log records are written to disk only when the Log Buffer fills up. You can enable Delayed Durability for the entire database:

ALTER DATABASE TT SET DELAYED_DURABILITY = FORCED
GO

or for individual transactions:

ALTER DATABASE TT SET DELAYED_DURABILITY = ALLOWED
GO
BEGIN TRANSACTION t
...
COMMIT TRANSACTION t WITH (DELAYED_DURABILITY = ON)
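
For completeness, here is a minimal sketch of the per-transaction form against the dbo.tbl demo table (it assumes DELAYED_DURABILITY = ALLOWED has already been set on TT, as above):

USE TT
GO
BEGIN TRANSACTION
    INSERT INTO dbo.tbl (b, c)
    VALUES (0, 'text')
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON) -- this commit returns without waiting for the log flush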

I enabled it at the database level and re-ran the script.

The waits disappeared, and the query now ran in 170 ms on my machine:

elapsed_seconds  wait_time  waiting_tasks_count  log_record
---------------- ---------- -------------------- -----------
0.17             0.00       0                    31958

The reason is that writing to the log became far less intensive:

file    num_of_writes  num_of_mb_written
------- -------------- ------------------
TT      46             9.15
TT_log  275            12.92

Of course, there is a fly in the ointment: the client is told that the changes are committed before the data has physically reached the log file. In the event of a failure we can lose data, up to the size of the buffer, that the application already considered committed.
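
The window of risk can also be narrowed at critical moments: sys.sp_flush_log (the same procedure mentioned in the list of flush triggers above) writes everything currently sitting in the Log Buffer of the current database to disk, making all previously committed delayed durable transactions durable. A minimal sketch:

USE TT
GO
-- harden everything committed so far in this database, e.g. before a planned shutdown
EXEC sys.sp_flush_log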

In my case, durability of the test data was not required, so I set DELAYED_DURABILITY to FORCED on the test database the autotests run against, and the next run of the full suite finished in 2.5 minutes.



And the waits associated with logging now had only a minimal impact on performance:

wait_type            wait_time   waiting_tasks_count  percentage 
-------------------- ----------- -------------------- -----------
PAGEIOLATCH_EX       16.031      61                   43.27      
WRITELOG             15.454      787                  41.72      
PAGEIOLATCH_UP       2.210       36                   5.96       
PAGEIOLATCH_SH       1.472       2                    3.97       
LCK_M_SCH_M          0.756       9                    2.04       
ASYNC_NETWORK_IO     0.464       735                  1.25       
PAGELATCH_UP         0.314       8                    0.84       
SOS_SCHEDULER_YIELD  0.154       2759                 0.41       
PAGELATCH_EX         0.154       44785                0.41       
LCK_M_SCH_S          0.021       7                    0.05       
PAGELATCH_SH         0.011       378                  0.02       

To summarize Delayed Durability:

  • It is available in all editions starting with SQL Server 2014.
  • It can help if writing to the transaction log is your bottleneck (deferred writes in large blocks can be more efficient than many small ones).
  • Concurrent transactions are less likely to compete for I/O when writing to the transaction log.
  • When it is enabled, COMMIT does not wait for the write to the transaction log, which can give a significant performance boost in OLTP workloads.
  • You can safely turn it on only if you are ready to play Russian roulette and, with a "lucky" set of circumstances, lose about 60 KB of data in the event of a failure.
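
A quick way to see which databases have the option enabled (delayed_durability_desc is a standard column of sys.databases starting with SQL Server 2014):

SELECT name, delayed_durability_desc -- DISABLED / ALLOWED / FORCED
FROM sys.databases
ORDER BY name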

UPDATE 2016-07-12: the data cleanup script has been improved:
DECLARE @SQL NVARCHAR(MAX)
      , @FK_TurnOff NVARCHAR(MAX)
      , @FK_TurnOn NVARCHAR(MAX)
SELECT @SQL = (
    SELECT CHAR(13) + CHAR(10) +
        IIF(p.[rows] > 0,
            IIF(t2.referenced_object_id IS NULL, N'TRUNCATE TABLE ', N'DELETE FROM ') + obj_name,
            ''
        ) + CHAR(13) + CHAR(10) +
        IIF(IdentityProperty(t.[object_id], 'LastValue') > 0,
            N'DBCC CHECKIDENT('''+ obj_name + N''', RESEED, 0) WITH NO_INFOMSGS',
            ''
        )
    FROM (
        SELECT obj_name = QUOTENAME(s.name) + '.' + QUOTENAME(o.name), o.[object_id]
        FROM sys.objects o
        JOIN sys.schemas s ON o.[schema_id] = s.[schema_id]
        WHERE o.is_ms_shipped = 0
            AND o.[type] = 'U'
            AND o.name != N'__MigrationHistory'
    ) t
    JOIN sys.partitions p ON t.[object_id] = p.[object_id] AND p.index_id IN (0, 1)
    LEFT JOIN (
        SELECT DISTINCT f.referenced_object_id
        FROM sys.foreign_keys f
    ) t2 ON t2.referenced_object_id = t.[object_id]
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)')
SELECT @FK_TurnOff = CAST(x.query('off/text()') AS NVARCHAR(MAX))
     , @FK_TurnOn = CAST(x.query('on/text()') AS NVARCHAR(MAX))
FROM (
    SELECT [off] = CHAR(10) + 'ALTER TABLE ' + obj + ' NOCHECK CONSTRAINT ' + fk
         , [on] = CHAR(10) + 'ALTER TABLE ' + obj + ' CHECK CONSTRAINT ' + fk
    FROM (
        SELECT fk = QUOTENAME(f.name)
             , obj = QUOTENAME(SCHEMA_NAME(f.[schema_id])) + '.' + QUOTENAME(OBJECT_NAME(f.parent_object_id))
        FROM sys.foreign_keys f
        WHERE f.delete_referential_action = 0
            AND EXISTS(
                    SELECT *
                    FROM sys.partitions p
                    WHERE p.[object_id] = f.parent_object_id
                        AND p.[rows] > 0
                        AND p.index_id IN (0, 1)
                )
    ) t
    FOR XML PATH(''), TYPE
) t(x)
IF @SQL LIKE '%[a-z]%' BEGIN
    SET @SQL = ISNULL(@FK_TurnOff, '') + @SQL + ISNULL(@FK_TurnOn, '')
    PRINT @SQL
    --EXEC sys.sp_executesql @SQL
END


If you want to share this article with an English audience:
Delayed Durability, or The Story of How to Speed Up Autotests From 11 to 2.5 Minutes
