Why you should not shrink your data files
- Translation
- Tutorial
One of my biggest pet peeves is data file shrinking. Even though I owned the shrink code back when I worked at Microsoft, I never got the chance to rewrite it so that shrinking would be a more palatable operation. I really don't like shrink.
Please do not confuse shrinking the transaction log with shrinking data files. Shrinking the log may be necessary if your log has grown beyond acceptable limits, or to remove excessive virtual-log-file fragmentation (see Kimberly's excellent articles here and here). However, shrinking the transaction log should be a rare operation and should never be part of any regular maintenance you run.
Shrinking data files should be done even less frequently, if at all. Here's why: a data file shrink causes massive index fragmentation. Let me demonstrate with a simple script you can run yourself. The script below creates a data file, creates a 10MB "filler" table at the start of the data file, creates a 10MB "production" clustered index, and then analyzes the fragmentation of the new clustered index.
USE [master];
GO
IF DATABASEPROPERTYEX (N'DBMaint2008', N'Version') IS NOT NULL
DROP DATABASE [DBMaint2008];
GO
CREATE DATABASE DBMaint2008;
GO
USE [DBMaint2008];
GO
SET NOCOUNT ON;
GO
-- Create the 10MB filler table at the front of the data file
CREATE TABLE [FillerTable] (
[c1] INT IDENTITY,
[c2] CHAR (8000) DEFAULT 'filler');
GO
-- Fill up the filler table
INSERT INTO [FillerTable] DEFAULT VALUES;
GO 1280
-- Create the production table, which will be "after" the filler table in the data file
CREATE TABLE [ProdTable] (
[c1] INT IDENTITY,
[c2] CHAR (8000) DEFAULT 'production');
CREATE CLUSTERED INDEX [prod_cl] ON [ProdTable] ([c1]);
GO
INSERT INTO [ProdTable] DEFAULT VALUES;
GO 1280
-- Check the fragmentation of the production table
SELECT
[avg_fragmentation_in_percent]
FROM sys.dm_db_index_physical_stats (
DB_ID (N'DBMaint2008'), OBJECT_ID (N'ProdTable'), 1, NULL, 'LIMITED');
GO

avg_fragmentation_in_percent
-----------------------------
0.390625

The logical fragmentation of the clustered index before the shrink is a near-perfect 0.4%.
Now I'll drop the filler table, run a shrink to reclaim the space, and check the fragmentation of the clustered index again:
-- Drop the filler table, creating 10MB of free space at the front of the data file
DROP TABLE [FillerTable];
GO
-- Shrink the database
DBCC SHRINKDATABASE ([DBMaint2008]);
GO
-- Check the index fragmentation again
SELECT
[avg_fragmentation_in_percent]
FROM sys.dm_db_index_physical_stats (
DB_ID (N'DBMaint2008'), OBJECT_ID (N'ProdTable'), 1, NULL, 'LIMITED');
GO

DbId   FileId  CurrentSize  MinimumSize  UsedPages  EstimatedPages
-----  ------  -----------  -----------  ---------  --------------
6      1       1456         152          1448       1440
6      2       63           63           56         56

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

avg_fragmentation_in_percent
-----------------------------
99.296875
Wow! After the shrink, logical fragmentation is almost 100%. The shrink operation completely fragmented the index, removing any chance of efficient range scans on it by ensuring that all range-scan readahead I/Os will be single-page I/Os.
Why does this happen? A data file shrink operation works on a single file at a time and uses the GAM allocation bitmaps (see the article "Inside the Storage Engine: GAM, SGAM, PFS and other allocation maps") to find the highest-allocated page in the file. It then moves that page as close to the front of the file as it can, and repeats the operation over and over. In the case above, this completely reversed the order of the clustered index, taking it from perfectly defragmented to perfectly fragmented.
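If you want to see the page-order reversal for yourself on a test system, one option is the undocumented (but widely known) DBCC IND command, run against the demo database above. This is just an inspection sketch, not part of the original script:

-- Dump the page allocation of ProdTable's clustered index (index_id = 1).
-- Run this once before DBCC SHRINKDATABASE and once after, then compare
-- how PagePID relates to NextPagePID/PrevPagePID in the two result sets.
DBCC IND (N'DBMaint2008', N'ProdTable', 1);
GO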
The same code is used by DBCC SHRINKFILE, DBCC SHRINKDATABASE, and auto-shrink, and they are all equally bad. As well as causing index fragmentation, a data file shrink generates a lot of I/O, uses a lot of CPU, and produces a lot of transaction log records, because everything it does is fully logged.
Data file shrink should never be part of regular maintenance, and you should NEVER, EVER enable auto-shrink. I tried to have it removed from SQL Server 2005 and SQL Server 2008 when I was in a position to do so; the only reason it still exists is backward compatibility. Do not fall into the trap of having a maintenance plan that rebuilds all indexes and then tries to reclaim the space the rebuilds required by running a shrink - that is a zero-sum game where all you do is generate transaction log records for zero real gain in performance.
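If you want to check whether auto-shrink has been left on anywhere and switch it off, here is a minimal sketch (it uses the demo database name from this post; substitute your own):

-- List databases that currently have auto-shrink enabled
SELECT [name]
FROM sys.databases
WHERE [is_auto_shrink_on] = 1;
GO
-- Turn auto-shrink off for a specific database
ALTER DATABASE [DBMaint2008] SET AUTO_SHRINK OFF;
GO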
So when might you need to run a shrink? For example, if you have deleted most of a very large database and the database is unlikely to grow, or if you need to empty a file before removing it.
I recommend the following method:
- Create a new filegroup
- Move all affected tables and indexes to the new filegroup using the CREATE INDEX ... WITH (DROP_EXISTING = ON) ON <filegroup> syntax, which moves the tables and removes their fragmentation at the same time (see the sketch after this list)
- Drop the old filegroup that you were going to shrink anyway (or shrink it right down if it is the primary filegroup)
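A minimal sketch of that approach, reusing the demo objects from above; the new filegroup name, file name, file path, and the old file/filegroup names are assumptions you would replace with your own:

-- 1) Create a new filegroup and add a file to it
ALTER DATABASE [DBMaint2008] ADD FILEGROUP [NewFG];
ALTER DATABASE [DBMaint2008]
    ADD FILE (NAME = N'DBMaint2008_NewFG', FILENAME = N'C:\SQLData\DBMaint2008_NewFG.ndf', SIZE = 20MB)
    TO FILEGROUP [NewFG];
GO
-- 2) Rebuild the clustered index onto the new filegroup;
--    DROP_EXISTING = ON moves the data and removes fragmentation in a single pass
CREATE CLUSTERED INDEX [prod_cl] ON [ProdTable] ([c1])
    WITH (DROP_EXISTING = ON) ON [NewFG];
GO
-- 3) Once everything has been moved, drop the now-empty file and filegroup
--    (hypothetical names; or shrink the primary filegroup right down if that is what you started with)
ALTER DATABASE [DBMaint2008] REMOVE FILE [DBMaint2008_OldFile];
ALTER DATABASE [DBMaint2008] REMOVE FILEGROUP [OldFG];
GO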
Basically, you need to provision some extra disk space before you can shrink the old files, but it is a much cleaner mechanism.
If you have absolutely no choice and must run a file shrink operation, be aware that it will cause index fragmentation, and you should take steps to remove it afterwards if it causes performance problems. The only way to remove index fragmentation without growing the data file again is to use DBCC INDEXDEFRAG or ALTER INDEX ... REORGANIZE. These commands only require a single extra 8KB page, instead of needing to build a whole new index as an index rebuild operation does.
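For the demo index above, that cleanup step would look something like this; ALTER INDEX ... REORGANIZE is the current form, DBCC INDEXDEFRAG the older equivalent:

-- Remove the fragmentation caused by the shrink without growing the data file
ALTER INDEX [prod_cl] ON [ProdTable] REORGANIZE;
GO
-- Older, deprecated equivalent:
-- DBCC INDEXDEFRAG (N'DBMaint2008', N'ProdTable', N'prod_cl');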
Bottom line: avoid running data file shrink at all costs!