How to replace old indexes and not break the system?
- Tutorial
Perhaps, many faced with the historical situation over the years before the appearance on the project, a situation where all possible indexes with all include'ami were created on the table. I saw the index on the inherited database that contained all the fields in the table. At the same time, it is not always possible to quickly change indexes, since a guarantee is often needed that changes will not affect the system’s performance.
As the volume of the table grows, it becomes excruciatingly painful for an aimlessly taken place, but you just won’t kill the index anymore, and usage statistics show that the index is being used.
The described example of consolidation of indexes on a highly loaded database operating 24/7. The application uses only stored procedures. MS SQL Server 2012 SP3 version.
Cluster index source table:
Nonclustered Indexes:
The 1st index of 14 fields in the table contains 9, and the second contains 10.
These 2 indexes total up to 180 GB on each server, 12 servers. This is frustrating and worrying, since indexes with matching fields, as well as 6 included fields in everyone. In addition, sometimes the server selects these indexes in cases where it would be more efficient to use a clustered index, which required recompilation of the procedure. After recompilation, the cluster index was already used and the CPU load was reduced.
Information about the use of indexes is collected from servers using the ClientFile table.
In the described case, both indices are used. If some index is not used, this makes the task easier and often you can simply delete it. These statistics are cleared upon restarting MS SQL Server. When deciding whether to delete an index based on it, you need to make sure that the database does not have any very important report that is considered once a month and uses this index.
To do this, use the spGetPlanUsingIndex procedure (the main query is taken from Jonathan Kehayias articles www.sqlskills.com/blogs/jonathan ), which adds up usage statistics to a table. A job is configured to start the procedure for collecting statistics once every 1 hour.
An important point - not all plans will be in the cache, for example, procedures with the RECOMPILE hint. If such a hint is used, the procedures must be checked and a plan developed for them and included in the analysis.
In this case, the analysis showed that both indices are used by the same procedures, that is, the procedure uses, then one index is then another, for the same query.
There are 24 procedures in total that use these indexes. For each procedure, it was analyzed by which fields the data is filtered, JOIN is made and which fields are specified in SELECT.
This was all done manually in an excel table. Now I understand that it was possible to avoid so much manual labor by writing a query with a selection of this from the xml plan. Fields for selection: seek predicate, predicate and which fields are selected from the table.
Based on the data obtained in the analysis of the use of indices in the plans, a new index is formed.
On the project, when making changes, load testing is necessarily done. The load test with the new index showed that you need to add the Name field to the index in order to remove the Key lookup. The field is added because it is used in the WHERE clause.
Also, after the test, information in the missing index is checked.
Final index
The table is 700 million rows, so the new index was applied by creating a new table and migrating data to it.
Save up to 45% disk space on each server. The server often uses a clustered index, which reduces the number of key lookups in the plans.
As the volume of the table grows, it becomes excruciatingly painful for an aimlessly taken place, but you just won’t kill the index anymore, and usage statistics show that the index is being used.
The described example of consolidation of indexes on a highly loaded database operating 24/7. The application uses only stored procedures. MS SQL Server 2012 SP3 version.
Cluster index source table:
CREATE TABLE [dbo].[ClientFile](
[StorageId] [int] NOT NULL,
[FolderId] [int] NOT NULL,
[ClientFileInternalId] [int] IDENTITY(1,1) NOT NULL,
[FileName] [nvarchar](900) NOT NULL,
[FileExtension] [nvarchar](10) NOT NULL,
[FileClientVersionId] [smallint] NOT NULL,
[ClientFileVersionId] [int] NULL,
[FileInternalId] [bigint] NOT NULL,
[FileLength] [bigint] NOT NULL,
[OrderId] [tinyint] NOT NULL,
[FileFileExtensionId] [int] NULL,
[FileStatus] [tinyint] NOT NULL,
[DirectoryVersionId] [int] NOT NULL,
[DateDeleted] [datetime] NULL,
CONSTRAINT [PK_ClientFile] PRIMARY KEY CLUSTERED
(
[StorageId] ASC,
[ClientFileInternalId] ASC
)
)
Nonclustered Indexes:
CREATE NONCLUSTERED INDEX [IX_ClientFile_StorageId_FileStatus] ON [dbo].[ClientFile]
(
[StorageId] ASC,
[FileStatus] ASC,
[OrderId] DESC
)
INCLUDE (
[ClientFileInternalId],
[FolderId],
[DirectoryVersionId],
[FileInternalId],
[FileClientVersionId],
[FileLength]);
CREATE NONCLUSTERED INDEX [IX_ClientFile_StorageId_FolderId_FileStatus_FileName] ON [dbo].[ClientFile]
(
[StorageId] ASC,
[FolderId] ASC,
[FileStatus] ASC,
[FileName] ASC
)
INCLUDE (
[DateDeleted],
[DirectoryVersionId],
[FileExtension],
[FileInternalId],
[FileClientVersionId],
[FileLength]);
The 1st index of 14 fields in the table contains 9, and the second contains 10.
These 2 indexes total up to 180 GB on each server, 12 servers. This is frustrating and worrying, since indexes with matching fields, as well as 6 included fields in everyone. In addition, sometimes the server selects these indexes in cases where it would be more efficient to use a clustered index, which required recompilation of the procedure. After recompilation, the cluster index was already used and the CPU load was reduced.
Step 1. Analysis of index usage statistics
Information about the use of indexes is collected from servers using the ClientFile table.
Table Index Usage Statistics
declare @dbid int
select @dbid = db_id()
select (cast((user_seeks + user_scans + user_lookups) as float) / case user_updates when 0 then 1.0 else cast(user_updates as float) end) * 100 as [%]
, (user_seeks + user_scans + user_lookups) AS total_usage
, objectname=object_name(s.object_id), s.object_id
, indexname=i.name, i.index_id
, user_seeks, user_scans, user_lookups, user_updates
, last_user_seek, last_user_scan, last_user_update
, last_system_seek, last_system_scan, last_system_update
, 'DROP INDEX ' + i.name + ' ON ' + object_name(s.object_id) as [Command]
from sys.dm_db_index_usage_stats s,
sys.indexes i
where database_id = @dbid
and objectproperty(s.object_id,'IsUserTable') = 1
and i.object_id = s.object_id
and i.index_id = s.index_id
and i.name not like 'PK_%'
and object_name(s.object_id) = 'ClientFile'
order by [%] asc
In the described case, both indices are used. If some index is not used, this makes the task easier and often you can simply delete it. These statistics are cleared upon restarting MS SQL Server. When deciding whether to delete an index based on it, you need to make sure that the database does not have any very important report that is considered once a month and uses this index.
Step 2. Select all plans from the cache that use the described 2 indexes.
To do this, use the spGetPlanUsingIndex procedure (the main query is taken from Jonathan Kehayias articles www.sqlskills.com/blogs/jonathan ), which adds up usage statistics to a table. A job is configured to start the procedure for collecting statistics once every 1 hour.
An important point - not all plans will be in the cache, for example, procedures with the RECOMPILE hint. If such a hint is used, the procedures must be checked and a plan developed for them and included in the analysis.
Fetching plans from the cache with the specified index
CREATE TABLE [dbo].[LogDataFileIndexUsage](
[PlanId] [int] IDENTITY(1,1) NOT NULL,
[PlanDate] [datetime] NOT NULL CONSTRAINT [DF_LogDataFileIndexUsage_PlanDate] DEFAULT (getutcdate()),
[DBname] [varchar](50) NULL,
[SPname] [varchar](256) NULL,
[ScanCount] [int] NULL,
[SeekCount] [int] NULL,
[UpdateCount] [int] NULL,
[RefCount] [int] NULL,
[UseCount] [int] NULL,
[QueryPlan] [xml] NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
SET ANSI_PADDING OFF
ALTER TABLE [dbo].[LogDataFileIndexUsage] ADD [IndexName] [varchar](256) NULL
PRIMARY KEY CLUSTERED
(
[PlanId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE PROCEDURE [dbo].[spGetPlanUsingIndex]
@indexName NVARCHAR(128)
AS
BEGIN
SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
-- Make sure the name passed is appropriately quoted
IF (LEFT(@IndexName, 1) <> '[' AND RIGHT(@IndexName, 1) <> ']') SET @IndexName = QUOTENAME(@IndexName);
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
INSERT INTO [dbo].[LogDataFileIndexUsage]
(
DBName,
SPname,
ScanCount,
SeekCount,
UpdateCount,
RefCount,
UseCount,
QueryPlan,
IndexName
)
SELECT
DB_NAME(E.dbid) AS [DBName],
object_name(E.objectid, dbid) AS [ObjectName],
E.query_plan.value('count(//RelOp[@LogicalOp = ''Index Scan'' or @LogicalOp = ''Clustered Index Scan'']/*/Object[@Index=sql:variable("@IndexName")])','int') AS [ScanCount],
E.query_plan.value('count(//RelOp[@LogicalOp = ''Index Seek'' or @LogicalOp = ''Clustered Index Seek'']/*/Object[@Index=sql:variable("@IndexName")])','int') AS [SeekCount],
E.query_plan.value('count(//Update/Object[@Index=sql:variable("@IndexName")])','int') AS [UpdateCount],
P.refcounts AS [RefCounts],
P.usecounts AS [UseCounts],
E.query_plan AS [QueryPlan],
@IndexName
FROM sys.dm_exec_cached_plans P
CROSS APPLY sys.dm_exec_query_plan(P.plan_handle) E
WHERE
E.query_plan.exist('//*[@Index=sql:variable("@IndexName")]') = 1
OPTION(MAXDOP 1, RECOMPILE);
END
GO
Step 3. Analysis of the collected data
In this case, the analysis showed that both indices are used by the same procedures, that is, the procedure uses, then one index is then another, for the same query.
There are 24 procedures in total that use these indexes. For each procedure, it was analyzed by which fields the data is filtered, JOIN is made and which fields are specified in SELECT.
This was all done manually in an excel table. Now I understand that it was possible to avoid so much manual labor by writing a query with a selection of this from the xml plan. Fields for selection: seek predicate, predicate and which fields are selected from the table.
Step 4. Formation of a new index
Based on the data obtained in the analysis of the use of indices in the plans, a new index is formed.
CREATE NONCLUSTERED INDEX [IX_ClientFile_StorageId_FolderId_FileStatus] ON [dbo].[ClientFile]
([StorageId],
[FolderId],
[FileStatus]);
Step 5. Testing
On the project, when making changes, load testing is necessarily done. The load test with the new index showed that you need to add the Name field to the index in order to remove the Key lookup. The field is added because it is used in the WHERE clause.
Also, after the test, information in the missing index is checked.
Missing index
SELECT
mig.index_group_handle, mid.index_handle,
CONVERT (decimal (28,1),
migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)
) AS improvement_measure,
'CREATE INDEX missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle)
+ ' ON ' + mid.statement
+ ' (' + ISNULL (mid.equality_columns,'')
+ CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
+ ISNULL (mid.inequality_columns, '')
+ ')'
+ ISNULL (' INCLUDE (' + mid.included_columns + ')', '')
+ ' with (online=on)' AS create_index_statement,
migs.*, mid.*, mid.database_id, mid.[object_id]
FROM sys.dm_db_missing_index_groups mig
INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
WHERE CONVERT (decimal (28,1), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) > 10 --AND database_id = 12
AND mid.statement like ('%[ClientFile]')
ORDER BY convert(varchar(10), last_user_seek, 120) desc, migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) /*last_user_seek*/ DESC
Final index
CREATE NONCLUSTERED INDEX [IX_ClientFile_StorageId_FolderId_FileStatus_FileName] ON [dbo].[ClientFile]
([StorageId],
[FolderId],
[FileStatus],
[FileName]);
Step 6. Apply the changes
The table is 700 million rows, so the new index was applied by creating a new table and migrating data to it.
Save up to 45% disk space on each server. The server often uses a clustered index, which reduces the number of key lookups in the plans.