
String aggregation in the SQL Server world
- Tutorial
In practice, the tasks of combining strings into one come across quite often. It is very sad, but the T-SQL standard does not provide for the possibility of using string data inside the SUM aggregation function :
Msg 8117, Level 16, State 1, Line 1
Operand data type char is invalid for sum operator.
Although the solution of such problems for MySQL function has been added GROUP_CONCAT , and the Oracle - LISTAGG . In turn, SQL Server does not yet have such built-in functionality.
However, you should not consider this as a drawback, since the capabilities of T-SQLallow string concatenation to be performed more flexibly and efficiently through the use of other constructs, which will be discussed later.
Suppose that we need to combine several lines into a single line, using the data from the following table:
The most obvious solution to this problem is to use the cursor:
However, its use reduces the efficiency of query execution and, at least, does not look too elegant.
To get rid of it, you can concatenate strings by assigning variables:
On the one hand, the design turned out to be very simple, on the other hand, its performance on a large sample leaves much to be desired.
For row aggregation, it is also possible to do it through XML using the following construction:
If you look at the plan, you will notice the high cost of calling the value method :

To get rid of this operation, you can rewrite the query using the XQuery properties :
As a result, we get a very simple and quick execution plan:

In principle, concatenating rows in a single column does not cause much difficulty.
A more interesting situation is when you want to perform concatenation on multiple columns at once. For example, we have the following table:
in which it is necessary to group the data as follows:

Alternatively, you can copy the XML calls , but then we get duplicate reads that can significantly affect the efficiency of the query:
You can easily see this if you look at the execution plan:

To reduce repeated readings, you can use a small XML hack:
But this query will also not be optimal due to repeated calling of the query method .
You can use the cursor:
However, as practice has shown, when working with large ETL packages, the most productive solution is the ability to assign variables in the UPDATE construct :
SQL Server does not have a built-in analogue of the GROUP_CONCAT and LISTAGG functions . However, this does not interfere, depending on the situation, to efficiently perform string concatenation tasks. The purpose of this post is to demonstrate this.
Msg 8117, Level 16, State 1, Line 1
Operand data type char is invalid for sum operator.
Although the solution of such problems for MySQL function has been added GROUP_CONCAT , and the Oracle - LISTAGG . In turn, SQL Server does not yet have such built-in functionality.
However, you should not consider this as a drawback, since the capabilities of T-SQLallow string concatenation to be performed more flexibly and efficiently through the use of other constructs, which will be discussed later.
Suppose that we need to combine several lines into a single line, using the data from the following table:
IF OBJECT_ID('dbo.Chars', 'U') IS NOT NULL
DROP TABLE dbo.Chars
GO
CREATE TABLE dbo.Chars ([Char] CHAR(1) PRIMARY KEY)
INSERT INTO dbo.Chars ([Char])
VALUES ('A'), ('B'), ('C'), ('F'), ('D')
The most obvious solution to this problem is to use the cursor:
DECLARE
@Chars VARCHAR(100)
, @Char CHAR(1)
DECLARE cur CURSOR LOCAL READ_ONLY FAST_FORWARD FOR
SELECT [Char]
FROM dbo.Chars
OPEN cur
FETCH NEXT FROM cur INTO @Char
WHILE @@FETCH_STATUS = 0 BEGIN
SET @Chars = ISNULL(@Chars + ', ' + @Char, @Char)
FETCH NEXT FROM cur INTO @Char
END
CLOSE cur
DEALLOCATE cur
SELECT @Chars
However, its use reduces the efficiency of query execution and, at least, does not look too elegant.
To get rid of it, you can concatenate strings by assigning variables:
DECLARE @Chars VARCHAR(100)
SELECT @Chars = ISNULL(@Chars + ', ' + [Char], [Char])
FROM dbo.Chars
SELECT @Chars
On the one hand, the design turned out to be very simple, on the other hand, its performance on a large sample leaves much to be desired.
For row aggregation, it is also possible to do it through XML using the following construction:
SELECT Chars = STUFF((
SELECT ', ' + [Char]
FROM dbo.Chars
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'), 1, 2, '')
If you look at the plan, you will notice the high cost of calling the value method :

To get rid of this operation, you can rewrite the query using the XQuery properties :
SELECT Chars = STUFF(CAST((
SELECT [text()] = ', ' + [Char]
FROM dbo.Chars
FOR XML PATH(''), TYPE) AS VARCHAR(100)), 1, 2, '')
As a result, we get a very simple and quick execution plan:

In principle, concatenating rows in a single column does not cause much difficulty.
A more interesting situation is when you want to perform concatenation on multiple columns at once. For example, we have the following table:
IF OBJECT_ID('dbo.EntityValues', 'U') IS NOT NULL
DROP TABLE dbo.EntityValues
GO
CREATE TABLE dbo.EntityValues (
EntityID INT
, Value1 CHAR(1)
, Value2 CHAR(1)
)
CREATE NONCLUSTERED INDEX IX_WorkOut_EntityID
ON dbo.EntityValues (EntityID)
GO
INSERT INTO dbo.EntityValues (EntityID, Value1, Value2)
VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (2, 'C', 'Z'), (2, 'F', 'H'), (1, 'D', 'R')
in which it is necessary to group the data as follows:

Alternatively, you can copy the XML calls , but then we get duplicate reads that can significantly affect the efficiency of the query:
SELECT
ev.EntityID
, Values1 = STUFF(CAST((
SELECT [text()] = ', ' + ev2.Value1
FROM dbo.EntityValues ev2
WHERE ev2.EntityID = ev.EntityID
FOR XML PATH(''), TYPE) AS VARCHAR(100)), 1, 2, '')
, Values2 = STUFF(CAST((
SELECT [text()] = ', ' + ev2.Value2
FROM dbo.EntityValues ev2
WHERE ev2.EntityID = ev.EntityID
FOR XML PATH(''), TYPE) AS VARCHAR(100)), 1, 2, '')
FROM (
SELECT DISTINCT EntityID
FROM dbo.EntityValues
) ev
You can easily see this if you look at the execution plan:

To reduce repeated readings, you can use a small XML hack:
SELECT
ev.EntityID
, Values1 = STUFF(REPLACE(
CAST([XML].query('for $a in /a return xs:string($a)') AS VARCHAR(100)), ' ,', ','), 1, 1, '')
, Values2 = STUFF(REPLACE(
CAST([XML].query('for $b in /b return xs:string($b)') AS VARCHAR(100)), ' ,', ','), 1, 1, '')
FROM (
SELECT DISTINCT EntityID
FROM dbo.EntityValues
) ev
CROSS APPLY (
SELECT [XML] = CAST((
SELECT
[a] = ', ' + ev2.Value1
, [b] = ', ' + ev2.Value2
FROM dbo.EntityValues ev2
WHERE ev2.EntityID = ev.EntityID
FOR XML PATH('')
) AS XML)
) t
But this query will also not be optimal due to repeated calling of the query method .
You can use the cursor:
IF OBJECT_ID('tempdb.dbo.#EntityValues') IS NOT NULL
DROP TABLE #EntityValues
GO
SELECT DISTINCT
EntityID
, Values1 = CAST(NULL AS VARCHAR(100))
, Values2 = CAST(NULL AS VARCHAR(100))
INTO #EntityValues
FROM dbo.EntityValues
DECLARE
@EntityID INT
, @Value1 CHAR(1)
, @Value2 CHAR(1)
DECLARE cur CURSOR LOCAL READ_ONLY FAST_FORWARD FOR
SELECT
EntityID
, Value1
, Value2
FROM dbo.EntityValues
OPEN cur
FETCH NEXT FROM cur INTO
@EntityID
, @Value1
, @Value2
WHILE @@FETCH_STATUS = 0 BEGIN
UPDATE #EntityValues
SET
Values1 = ISNULL(Values1 + ', ' + @Value1, @Value1)
, Values2 = ISNULL(Values2 + ', ' + @Value2, @Value2)
WHERE EntityID = @EntityID
FETCH NEXT FROM cur INTO
@EntityID
, @Value1
, @Value2
END
CLOSE cur
DEALLOCATE cur
SELECT *
FROM #EntityValues
However, as practice has shown, when working with large ETL packages, the most productive solution is the ability to assign variables in the UPDATE construct :
IF OBJECT_ID('tempdb.dbo.#EntityValues') IS NOT NULL
DROP TABLE #EntityValues
GO
DECLARE
@Values1 VARCHAR(100)
, @Values2 VARCHAR(100)
SELECT
EntityID
, Value1
, Value2
, RowNum = ROW_NUMBER() OVER (PARTITION BY EntityID ORDER BY 1/0)
, Values1 = CAST(NULL AS VARCHAR(100))
, Values2 = CAST(NULL AS VARCHAR(100))
INTO #EntityValues
FROM dbo.EntityValues
UPDATE #EntityValues
SET
@Values1 = Values1 =
CASE WHEN RowNum = 1
THEN Value1
ELSE @Values1 + ', ' + Value1
END
, @Values2 = Values2 =
CASE WHEN RowNum = 1
THEN Value2
ELSE @Values2 + ', ' + Value2
END
SELECT
EntityID
, Values1 = MAX(Values1)
, Values2 = MAX(Values2)
FROM #EntityValues
GROUP BY EntityID
SQL Server does not have a built-in analogue of the GROUP_CONCAT and LISTAGG functions . However, this does not interfere, depending on the situation, to efficiently perform string concatenation tasks. The purpose of this post is to demonstrate this.