Unpivot
- Tutorial
During my work, I was faced with a wide range of tasks. Some tasks required monotonous work, others came down to pure creativity.
The most interesting tasks that I can now recall, one way or another, touched upon questions of query optimization.
Optimization is, first of all, the search for the optimal query plan. However, what to do in a situation where the standard construction of the language produces a plan that is very far from optimal?
I encountered this kind of problem when I used the UNPIVOT construct to convert columns to rows.
Through a small comparative analysis, a more effective alternative was found for UNPIVOT .
So that the task does not seem abstract, suppose that we have at our disposal a table containing information on the number of medals among users.
Based on this table, we present various methods for converting columns to rows, as well as plans for their implementation.
At one time, SQL Server 2000 did not provide an efficient way to convert columns to rows. As a result, the practice of multiple sampling from the same table, but with a different set of columns combined through the UNION ALL construct, was widely practiced :
A huge minus of this approach is the repeated reading of the data, which significantly reduced the efficiency of such a request.
If you look at the implementation plan, you can easily see this:
With the release of SQL Server 2005 , it became possible to use the new T-SQL language construct - UNPIVOT .
Using UNPIVOT, the previous query can be simplified to:
When executed, we get the following plan:
Starting with SQL Server 2008, it became possible to use the VALUES construct not only to create multi-line INSERT queries, but also inside the FROM block .
Using the VALUES construct , the query above can be rewritten as follows:
At the same time, in comparison with UNPIVOT , the implementation plan will be slightly simplified:
Using dynamic SQL, it is possible to create a “universal” query for any table. The main condition for this is that columns that are not part of the primary key must have compatible data types.
To find out a list of such columns, use the following query:
If you look at the query plan, you will notice that connecting to sys.index_columns is quite expensive: You
can use the INDEX_COL function to get rid of this connection . As a result, the final version of the request will take the following form:
When executed, a request will be generated in accordance with the template:
Even if we take into account the optimizations that we have done, it is worth noting that this method is still slower in comparison with the two previous ones:
A more elegant way to implement dynamic UNPIVOT is to use the following XML trick :
In which for each line XML of the form is formed :
Then the name of each attribute and its value is parsed.
In most cases, using XML results in a slower execution plan - this is the price paid for versatility.
Now compare the results: There is
no dramatic difference in execution speed between UNPIVOT and VALUES . This statement is true when it comes to simply converting columns to rows.
We complicate the task and consider another option, where for each user it is necessary to find out the type of medals, which he has the most.
Let's try to solve the problem using the UNPIVOT construct :
The execution plan shows that the problem is in re-reading data and sorting, which is necessary to organize the data: It
will be easy to get rid of re-reading if you remember that it is allowed to use columns from an external block in a subquery:
Repeated readings were gone, but the sorting operation did not go away:
Let's see how the VALUES construct behaves in this task:
The plan was expectedly simplified, but sorting is still present in the plan:
Let's try to bypass the sorting using the aggregate function:
We got rid of sorting:
Small results:
In a situation where it is necessary to make a simple conversion of columns to rows, it is most preferable to use UNPIVOT or VALUES constructs .
If after the conversion, the obtained data is planned to be used in aggregation or sorting operations, then it is more preferable to use the VALUES construct , which, in most cases, allows to obtain more efficient execution plans.
If the number of columns in the tables is variable, it is recommended to use XML , which, unlike dynamic SQL, can be used inside table functions.
PSTo adapt, part of the examples to SQL Server 2005 features , the VALUES construct :
must be replaced with the combination SELECT UNION ALL SELECT :
UPDATE 10/16/2013 : How do UNPIVOT and VALUES behave on large amounts of data?
Based on a table with the following structure (25 columns total).
This table contains ~ 186000 rows. From a cold start on local SQL Server 2012 SP1 , the operation of converting rows to columns produced the following results. UNPIVOT
execution plan : VALUES
execution plan : In comparison, it can be seen that VALUES is faster (3 seconds), but requires more CPU resources: I’ll add that in each specific situation the difference in performance will vary.
The most interesting tasks that I can now recall, one way or another, touched upon questions of query optimization.
Optimization is, first of all, the search for the optimal query plan. However, what to do in a situation where the standard construction of the language produces a plan that is very far from optimal?
I encountered this kind of problem when I used the UNPIVOT construct to convert columns to rows.
Through a small comparative analysis, a more effective alternative was found for UNPIVOT .
So that the task does not seem abstract, suppose that we have at our disposal a table containing information on the number of medals among users.
IF OBJECT_ID('dbo.UserBadges', 'U') IS NOT NULL
DROP TABLE dbo.UserBadges
GO
CREATE TABLE dbo.UserBadges (
UserID INT
, Gold SMALLINT NOT NULL
, Silver SMALLINT NOT NULL
, Bronze SMALLINT NOT NULL
, CONSTRAINT PK_UserBadges PRIMARY KEY (UserID)
)
INSERT INTO dbo.UserBadges (UserID, Gold, Silver, Bronze)
VALUES
(1, 5, 3, 1),
(2, 0, 8, 1),
(3, 2, 4, 11)
Based on this table, we present various methods for converting columns to rows, as well as plans for their implementation.
1. UNION ALL
At one time, SQL Server 2000 did not provide an efficient way to convert columns to rows. As a result, the practice of multiple sampling from the same table, but with a different set of columns combined through the UNION ALL construct, was widely practiced :
SELECT UserID, BadgeCount = Gold, BadgeType = 'Gold'
FROM dbo.UserBadges
UNION ALL
SELECT UserID, Silver, 'Silver'
FROM dbo.UserBadges
UNION ALL
SELECT UserID, Bronze, 'Bronze'
FROM dbo.UserBadges
A huge minus of this approach is the repeated reading of the data, which significantly reduced the efficiency of such a request.
If you look at the implementation plan, you can easily see this:
2. UNPIVOT
With the release of SQL Server 2005 , it became possible to use the new T-SQL language construct - UNPIVOT .
Using UNPIVOT, the previous query can be simplified to:
SELECT UserID, BadgeCount, BadgeType
FROM dbo.UserBadges
UNPIVOT (
BadgeCount FOR BadgeType IN (Gold, Silver, Bronze)
) unpvt
When executed, we get the following plan:
3. VALUES
Starting with SQL Server 2008, it became possible to use the VALUES construct not only to create multi-line INSERT queries, but also inside the FROM block .
Using the VALUES construct , the query above can be rewritten as follows:
SELECT p.UserID, t.*
FROM dbo.UserBadges p
CROSS APPLY (
VALUES
(Gold, 'Gold')
, (Silver, 'Silver')
, (Bronze, 'Bronze')
) t(BadgeCount, BadgeType)
At the same time, in comparison with UNPIVOT , the implementation plan will be slightly simplified:
4. Dynamic SQL
Using dynamic SQL, it is possible to create a “universal” query for any table. The main condition for this is that columns that are not part of the primary key must have compatible data types.
To find out a list of such columns, use the following query:
SELECT c.name
FROM sys.columns c WITH(NOLOCK)
LEFT JOIN (
SELECT i.[object_id], i.column_id
FROM sys.index_columns i WITH(NOLOCK)
WHERE i.index_id = 1
) i ON c.[object_id] = i.[object_id] AND c.column_id = i.column_id
WHERE c.[object_id] = OBJECT_ID('dbo.UserBadges', 'U')
AND i.[object_id] IS NULL
If you look at the query plan, you will notice that connecting to sys.index_columns is quite expensive: You
can use the INDEX_COL function to get rid of this connection . As a result, the final version of the request will take the following form:
DECLARE @table_name SYSNAME
SELECT @table_name = 'dbo.UserBadges'
DECLARE @SQL NVARCHAR(MAX)
SELECT @SQL = '
SELECT *
FROM ' + @table_name + '
UNPIVOT (
value FOR code IN (
' + STUFF((
SELECT ', [' + c.name + ']'
FROM sys.columns c WITH(NOLOCK)
WHERE c.[object_id] = OBJECT_ID(@table_name)
AND INDEX_COL(@table_name, 1, c.column_id) IS NULL
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, '') + '
)
) unpiv'
PRINT @SQL
EXEC sys.sp_executesql @SQL
When executed, a request will be generated in accordance with the template:
SELECT *
FROM
UNPIVOT (
value FOR code IN ()
) unpiv
Even if we take into account the optimizations that we have done, it is worth noting that this method is still slower in comparison with the two previous ones:
5. XML
A more elegant way to implement dynamic UNPIVOT is to use the following XML trick :
SELECT
p.UserID
, BadgeCount = t.c.value('.', 'INT')
, BadgeType = t.c.value('local-name(.)', 'VARCHAR(10)')
FROM (
SELECT
UserID
, [XML] = (
SELECT Gold, Silver, Bronze
FOR XML RAW('t'), TYPE
)
FROM dbo.UserBadges
) p
CROSS APPLY p.[XML].nodes('t/@*') t(c)
In which for each line XML of the form is formed :
Then the name of each attribute and its value is parsed.
In most cases, using XML results in a slower execution plan - this is the price paid for versatility.
Now compare the results: There is
no dramatic difference in execution speed between UNPIVOT and VALUES . This statement is true when it comes to simply converting columns to rows.
We complicate the task and consider another option, where for each user it is necessary to find out the type of medals, which he has the most.
Let's try to solve the problem using the UNPIVOT construct :
SELECT
UserID
, GameType = (
SELECT TOP 1 BadgeType
FROM dbo.UserBadges b2
UNPIVOT (
BadgeCount FOR BadgeType IN (Gold, Silver, Bronze)
) unpvt
WHERE UserID = b.UserID
ORDER BY BadgeCount DESC
)
FROM dbo.UserBadges b
The execution plan shows that the problem is in re-reading data and sorting, which is necessary to organize the data: It
will be easy to get rid of re-reading if you remember that it is allowed to use columns from an external block in a subquery:
SELECT
UserID
, GameType = (
SELECT TOP 1 BadgeType
FROM (SELECT t = 1) t
UNPIVOT (
BadgeCount FOR BadgeType IN (Gold, Silver, Bronze)
) unpvt
ORDER BY BadgeCount DESC
)
FROM dbo.UserBadges
Repeated readings were gone, but the sorting operation did not go away:
Let's see how the VALUES construct behaves in this task:
SELECT
UserID
, GameType = (
SELECT TOP 1 BadgeType
FROM (
VALUES
(Gold, 'Gold')
, (Silver, 'Silver')
, (Bronze, 'Bronze')
) t (BadgeCount, BadgeType)
ORDER BY BadgeCount DESC
)
FROM dbo.UserBadges
The plan was expectedly simplified, but sorting is still present in the plan:
Let's try to bypass the sorting using the aggregate function:
SELECT
UserID
, BadgeType = (
SELECT TOP 1 BadgeType
FROM (
VALUES
(Gold, 'Gold')
, (Silver, 'Silver')
, (Bronze, 'Bronze')
) t (BadgeCount, BadgeType)
WHERE BadgeCount = (
SELECT MAX(Value)
FROM (
VALUES (Gold), (Silver), (Bronze)
) t(Value)
)
)
FROM dbo.UserBadges
We got rid of sorting:
Small results:
In a situation where it is necessary to make a simple conversion of columns to rows, it is most preferable to use UNPIVOT or VALUES constructs .
If after the conversion, the obtained data is planned to be used in aggregation or sorting operations, then it is more preferable to use the VALUES construct , which, in most cases, allows to obtain more efficient execution plans.
If the number of columns in the tables is variable, it is recommended to use XML , which, unlike dynamic SQL, can be used inside table functions.
PSTo adapt, part of the examples to SQL Server 2005 features , the VALUES construct :
SELECT *
FROM (
VALUES (1, 'a'), (2, 'b')
) t(id, value)
must be replaced with the combination SELECT UNION ALL SELECT :
SELECT id = 1, value = 'a'
UNION ALL
SELECT 2, 'b'
UPDATE 10/16/2013 : How do UNPIVOT and VALUES behave on large amounts of data?
Based on a table with the following structure (25 columns total).
CREATE TABLE [dbo].[WorkOutFactors]
(
WorkOutID BIGINT NOT NULL PRIMARY KEY,
NightHours INT NOT NULL,
EveningHours INT NOT NULL,
HolidayHours INT NOT NULL,
...
)
This table contains ~ 186000 rows. From a cold start on local SQL Server 2012 SP1 , the operation of converting rows to columns produced the following results. UNPIVOT
execution plan : VALUES
execution plan : In comparison, it can be seen that VALUES is faster (3 seconds), but requires more CPU resources: I’ll add that in each specific situation the difference in performance will vary.