AlanDenton December 15, 2015 at 16:36

XML, XQuery, and triple sadness with performance

Tutorial

A trip to Dnepropetrovsk for a meeting with the Dnepr SQL User Group , a chronic lack of sleep over the last couple of days, but a nice bonus on arrival to Kharkov ... Winter weather, which motivates to write something interesting ...

For a long time there were plans to tell about the "pitfalls" when working with XML and XQuery , which can lead to tricky performance issues.

For those who often use SQL Server , XQuery and like to parse values from XML, it is recommended to familiarize yourself with the following material ...

First, we will generate a test XML on which we will conduct experiments:

USE AdventureWorks2012
GO
IF OBJECT_ID('tempdb.dbo.##temp') IS NOT NULL
    DROP TABLE ##temp
GO
SELECT val = (
    SELECT
          [@obj_id] = o.[object_id]
        , [@obj_name] = o.name
        , [@sch_name] = s.name
        , (
            SELECT i.name, i.column_id, i.user_type_id, i.is_nullable, i.is_identity
            FROM sys.all_columns i
            WHERE i.[object_id] = o.[object_id]
            FOR XML AUTO, TYPE
        )
    FROM sys.all_objects o
    JOIN sys.schemas s ON o.[schema_id] = s.[schema_id]
    WHERE o.[type] IN ('U', 'V')
    FOR XML PATH('obj'), ROOT('objects')
)
INTO ##temp
DECLARE @sql NVARCHAR(4000) = 'bcp "SELECT * FROM ##temp" queryout "D:\sample.xml" -S ' + @@servername + ' -T -w -r -t'
EXEC sys.xp_cmdshell @sql
IF OBJECT_ID('tempdb.dbo.##temp') IS NOT NULL
    DROP TABLE ##temp

For those who have xp_cmdshell disabled, you need to do:

EXEC sp_configure 'show advanced options', 1
GO
RECONFIGURE
GO
EXEC sp_configure 'xp_cmdshell', 1
GO
RECONFIGURE
GO

As a result, on the specified path, we will create a file with the following structure:


  ...
  
    ...
  
  ...

Now we’ll start sampling experiments ...

What is the most efficient way to load data from XML ? Probably, you do not need to open the file with notepad, copy the contents and paste into a variable ... I think that it would be more appropriate to use OPENROWSET :

DECLARE @xml XML SELECT @xml = BulkColumn FROM OPENROWSET(BULK 'D:\sample.xml', SINGLE_BLOB) x SELECT @xml

But there is a funny catch. As it turned out, the combination of loading operations and parsing values from XML can lead to a significant decrease in performance. Suppose we need to get obj_id values from a previously created file:

;WITH cte AS ( SELECT x = CAST(BulkColumn AS XML) FROM OPENROWSET(BULK 'D:\sample.xml', SINGLE_BLOB) x ) SELECT t.c.value('@obj_id', 'INT') FROM cte CROSS APPLY x.nodes('objects/obj') t(c)

On my machine, this request takes a very long time:

(495 row(s) affected) Table 'Worktable'. Scan count 0, logical reads 20788, ..., lob logical reads 7817781, ..., lob read-ahead reads 1022368. SQL Server Execution Times: CPU time = 53688 ms, elapsed time = 53911 ms.

Let's try to separate download and parsing:

DECLARE @xml XML SELECT @xml = BulkColumn FROM OPENROWSET(BULK 'D:\sample.xml', SINGLE_BLOB) x SELECT t.c.value('@obj_id', 'INT') FROM @xml.nodes('objects/obj') t(c)

Everything worked out very quickly:

(1 row(s) affected) Table 'Worktable'. Scan count 0, logical reads 7, ..., lob logical reads 2691, ..., lob read-ahead reads 344. SQL Server Execution Times: CPU time = 15 ms, elapsed time = 51 ms. (495 row(s) affected) SQL Server Execution Times: CPU time = 47 ms, elapsed time = 125 ms.

So what was the problem? Let's analyze the execution plan:

As it turned out, the problem lies in type conversion, so try to pass the parameter in the XML type to the nodes function initially . Next, we will consider a typical situation when parsing requires filtering ... In such cases, remember that SQL Server does not optimize function calls for working with XML . For clarity, I will show that in this request, the value function will be executed twice:

SELECT t.c.value('@obj_id', 'INT') FROM @xml.nodes('objects/obj') t(c) WHERE t.c.value('@obj_id', 'INT') < 0

(404 row(s) affected) SQL Server Execution Times: CPU time = 116 ms, elapsed time = 120 ms.

This nuance can reduce performance, so it is recommended to reduce function calls:

SELECT * FROM ( SELECT id = t.c.value('@obj_id', 'INT') FROM @xml.nodes('objects/obj') t(c) ) t WHERE t.id < 0

(404 row(s) affected) SQL Server Execution Times: CPU time = 62 ms, elapsed time = 74 ms.

Alternatively, you can filter it like this:

SELECT t.c.value('@obj_id', 'INT') FROM @xml.nodes('objects/obj[@obj_id < 0]') t(c)

(404 row(s) affected) SQL Server Execution Times: CPU time = 110 ms, elapsed time = 119 ms.

but we can’t talk about substantial gains. Although QueryCost says the opposite:

It is shown that the third option is the most optimal ... Let this be another argument not to trust QueryCost , which is just an internal assessment.

And the most interesting example for a snack ... There is another VERY important feature when parsing from XML . Run the request:

SELECT t.c.value('../@obj_name', 'SYSNAME') , t.c.value('@name', 'SYSNAME') FROM @xml.nodes('objects/obj/*') t(c)

and look at the runtime, which can only suit those who are in no hurry:

(5273 row(s) affected) SQL Server Execution Times: CPU time = 66578 ms, elapsed time = 66714 ms.

Why is this happening? The SQL Server server has problems reading parent nodes from child nodes (to put it simply, SQL Server is hard to “look back”):

How can we be in this case? It's very simple ... start reading from the parent nodes and subtract the children using CROSS / OUTER APPLY :

SELECT t.c.value('@obj_name', 'SYSNAME') , t2.c2.value('@name', 'SYSNAME') FROM @xml.nodes('objects/obj') t(c) CROSS APPLY t.c.nodes('*') t2(c2)

(5273 row(s) affected) SQL Server Execution Times: CPU time = 156 ms, elapsed time = 184 ms.

It is also interesting to consider a situation where we need to look at 2 levels above. The problem with reading the parent element did not reproduce for me:

USE AdventureWorks2012 GO DECLARE @xml XML SELECT @xml = ( SELECT [@obj_name] = o.name , [columns] = ( SELECT i.name FROM sys.all_columns i WHERE i.[object_id] = o.[object_id] FOR XML AUTO, TYPE ) FROM sys.all_objects o WHERE o.[type] IN ('U', 'V') FOR XML PATH('obj') ) SELECT t.c.value('../../@obj_name', 'SYSNAME') , t.c.value('@name', 'SYSNAME') FROM @xml.nodes('obj/columns/*') t(c)

I also wanted to mention one interesting feature. OPENXML has no problems reading parents :

DECLARE @xml XML , @idoc INT SELECT @xml = BulkColumn FROM OPENROWSET(BULK 'D:\sample.xml', SINGLE_BLOB) x EXEC sys.sp_xml_preparedocument @idoc OUTPUT, @xml SELECT * FROM OPENXML(@idoc, '/objects/obj/*') WITH ( name SYSNAME '../@obj_name', col SYSNAME '@name' ) EXEC sys.sp_xml_removedocument @idoc

(5273 row(s) affected) SQL Server Execution Times: CPU time = 47 ms, elapsed time = 137 ms.

But you don’t need to think now that OPENXML has clear advantages over XQuery . In OPENXML too short stocks. For example, if we forget to call sp_xml_removedocument , then severe memory leaks may occur.

Everything was tested on SQL Server 2012 SP3 (11.00.6020) .

If you want to share this article with an English-speaking audience:
XML, XQuery & Performance Issues

Tags:
sql server
xml
xpath
xml parser
openxml
sql

XML, XQuery, and triple sadness with performance

Also popular now: