AlanDenton October 15, 2013 at 16:10

Unpivot

Tutorial

During my work, I was faced with a wide range of tasks. Some tasks required monotonous work, others came down to pure creativity.

The most interesting tasks that I can now recall, one way or another, touched upon questions of query optimization.

Optimization is, first of all, the search for the optimal query plan. However, what to do in a situation where the standard construction of the language produces a plan that is very far from optimal?

I encountered this kind of problem when I used the UNPIVOT construct to convert columns to rows.

Through a small comparative analysis, a more effective alternative was found for UNPIVOT .

So that the task does not seem abstract, suppose that we have at our disposal a table containing information on the number of medals among users.

IF OBJECT_ID('dbo.UserBadges', 'U') IS NOT NULL
	DROP TABLE dbo.UserBadges
GO
CREATE TABLE dbo.UserBadges (
	  UserID INT
	, Gold SMALLINT NOT NULL
	, Silver SMALLINT NOT NULL
	, Bronze SMALLINT NOT NULL
	, CONSTRAINT PK_UserBadges PRIMARY KEY (UserID)
)
INSERT INTO dbo.UserBadges (UserID, Gold, Silver, Bronze)
VALUES
	(1, 5, 3, 1),
	(2, 0, 8, 1),
	(3, 2, 4, 11)

Based on this table, we present various methods for converting columns to rows, as well as plans for their implementation.

1. UNION ALL

At one time, SQL Server 2000 did not provide an efficient way to convert columns to rows. As a result, the practice of multiple sampling from the same table, but with a different set of columns combined through the UNION ALL construct, was widely practiced :

SELECT UserID, BadgeCount = Gold, BadgeType = 'Gold' 
FROM dbo.UserBadges
    UNION ALL
SELECT UserID, Silver, 'Silver' 
FROM dbo.UserBadges
    UNION ALL
SELECT UserID, Bronze, 'Bronze' 
FROM dbo.UserBadges

A huge minus of this approach is the repeated reading of the data, which significantly reduced the efficiency of such a request.

If you look at the implementation plan, you can easily see this:

2. UNPIVOT

With the release of SQL Server 2005 , it became possible to use the new T-SQL language construct - UNPIVOT .

Using UNPIVOT, the previous query can be simplified to:

SELECT UserID, BadgeCount, BadgeType
FROM dbo.UserBadges
UNPIVOT (
    BadgeCount FOR BadgeType IN (Gold, Silver, Bronze)
) unpvt

When executed, we get the following plan:

3. VALUES

Starting with SQL Server 2008, it became possible to use the VALUES construct not only to create multi-line INSERT queries, but also inside the FROM block .

Using the VALUES construct , the query above can be rewritten as follows:

SELECT p.UserID, t.*
FROM dbo.UserBadges p
CROSS APPLY (
    VALUES 
          (Gold,   'Gold')
        , (Silver, 'Silver')
        , (Bronze, 'Bronze')
) t(BadgeCount, BadgeType)

At the same time, in comparison with UNPIVOT , the implementation plan will be slightly simplified:

4. Dynamic SQL

Using dynamic SQL, it is possible to create a “universal” query for any table. The main condition for this is that columns that are not part of the primary key must have compatible data types.

To find out a list of such columns, use the following query:

SELECT c.name
FROM sys.columns c WITH(NOLOCK)
LEFT JOIN (
	SELECT i.[object_id], i.column_id
	FROM sys.index_columns i WITH(NOLOCK)
	WHERE i.index_id = 1
) i ON c.[object_id] = i.[object_id] AND c.column_id = i.column_id
WHERE c.[object_id] = OBJECT_ID('dbo.UserBadges', 'U')
	AND i.[object_id] IS NULL

If you look at the query plan, you will notice that connecting to sys.index_columns is quite expensive: You

can use the INDEX_COL function to get rid of this connection . As a result, the final version of the request will take the following form:

DECLARE @table_name SYSNAME
SELECT @table_name = 'dbo.UserBadges'
DECLARE @SQL NVARCHAR(MAX)
SELECT @SQL = '
SELECT * 
FROM ' + @table_name + '
UNPIVOT (
    value FOR code IN (
        ' + STUFF((
    SELECT ', [' + c.name + ']'
    FROM sys.columns c WITH(NOLOCK)
    WHERE c.[object_id] = OBJECT_ID(@table_name)
				AND INDEX_COL(@table_name, 1, c.column_id) IS NULL
    FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, '') + '
    )
) unpiv'
PRINT @SQL
EXEC sys.sp_executesql @SQL

When executed, a request will be generated in accordance with the template:

SELECT * 
FROM 
UNPIVOT (
	value FOR code IN ()
) unpiv

Even if we take into account the optimizations that we have done, it is worth noting that this method is still slower in comparison with the two previous ones:

5. XML

A more elegant way to implement dynamic UNPIVOT is to use the following XML trick :

SELECT
      p.UserID
    , BadgeCount = t.c.value('.', 'INT') 
    , BadgeType = t.c.value('local-name(.)', 'VARCHAR(10)') 
FROM (
    SELECT 
          UserID
        , [XML] = (
                SELECT Gold, Silver, Bronze
                FOR XML RAW('t'), TYPE
            )
    FROM dbo.UserBadges
) p
CROSS APPLY p.[XML].nodes('t/@*') t(c)

In which for each line XML of the form is formed :

Then the name of each attribute and its value is parsed.

In most cases, using XML results in a slower execution plan - this is the price paid for versatility.

Now compare the results: There is

no dramatic difference in execution speed between UNPIVOT and VALUES . This statement is true when it comes to simply converting columns to rows.

We complicate the task and consider another option, where for each user it is necessary to find out the type of medals, which he has the most.

Let's try to solve the problem using the UNPIVOT construct :

SELECT 
      UserID
    , GameType = (
        SELECT TOP 1 BadgeType 
        FROM dbo.UserBadges b2 
        UNPIVOT (
            BadgeCount FOR BadgeType IN (Gold, Silver, Bronze)
        ) unpvt
        WHERE UserID = b.UserID 
        ORDER BY BadgeCount DESC
    ) 
FROM dbo.UserBadges b

The execution plan shows that the problem is in re-reading data and sorting, which is necessary to organize the data: It

will be easy to get rid of re-reading if you remember that it is allowed to use columns from an external block in a subquery:

SELECT 
	  UserID
	, GameType = (
		SELECT TOP 1 BadgeType
		FROM (SELECT t = 1) t 
		UNPIVOT (
			BadgeCount FOR BadgeType IN (Gold, Silver, Bronze)
		) unpvt
		ORDER BY BadgeCount DESC
	) 
FROM dbo.UserBadges

Repeated readings were gone, but the sorting operation did not go away:

Let's see how the VALUES construct behaves in this task:

SELECT 
      UserID
    , GameType = (
            SELECT TOP 1 BadgeType
            FROM (
                VALUES
                      (Gold,   'Gold')
                    , (Silver, 'Silver')
                    , (Bronze, 'Bronze')
            ) t (BadgeCount, BadgeType)
            ORDER BY BadgeCount DESC
        ) 
FROM dbo.UserBadges

The plan was expectedly simplified, but sorting is still present in the plan:

Let's try to bypass the sorting using the aggregate function:

SELECT 
	  UserID
	, BadgeType = (
            SELECT TOP 1 BadgeType
            FROM (
                VALUES
                      (Gold,   'Gold')
                    , (Silver, 'Silver')
                    , (Bronze, 'Bronze')
            ) t (BadgeCount, BadgeType)
			WHERE BadgeCount = (
				SELECT MAX(Value)
				FROM (
					VALUES (Gold), (Silver), (Bronze)
				) t(Value)
			)
		) 
FROM dbo.UserBadges

We got rid of sorting:

Small results:

In a situation where it is necessary to make a simple conversion of columns to rows, it is most preferable to use UNPIVOT or VALUES constructs .

If after the conversion, the obtained data is planned to be used in aggregation or sorting operations, then it is more preferable to use the VALUES construct , which, in most cases, allows to obtain more efficient execution plans.

If the number of columns in the tables is variable, it is recommended to use XML , which, unlike dynamic SQL, can be used inside table functions.

PSTo adapt, part of the examples to SQL Server 2005 features , the VALUES construct :

SELECT * 
FROM (
	VALUES (1, 'a'), (2, 'b')
) t(id, value)

must be replaced with the combination SELECT UNION ALL SELECT :

SELECT id = 1, value = 'a'
UNION ALL
SELECT 2, 'b'

UPDATE 10/16/2013 : How do UNPIVOT and VALUES behave on large amounts of data?

Based on a table with the following structure (25 columns total).

CREATE TABLE [dbo].[WorkOutFactors]
(
	WorkOutID BIGINT NOT NULL PRIMARY KEY,
	NightHours INT NOT NULL,
	EveningHours INT NOT NULL,
	HolidayHours INT NOT NULL,
	...
)

This table contains ~ 186000 rows. From a cold start on local SQL Server 2012 SP1 , the operation of converting rows to columns produced the following results. UNPIVOT

execution plan : VALUES

execution plan : In comparison, it can be seen that VALUES is faster (3 seconds), but requires more CPU resources: I’ll add that in each specific situation the difference in performance will vary.

Tags: