
7 myths about Linq to Database
Linq appeared in 2007, and with it the first IQueryable provider, Linq2SQL: it worked only with MS SQL Server, was rather slow, and covered far from all scenarios. Almost 7 years have passed since then. Several Linq providers that work with different DBMSs have appeared, almost all of the technology's “childhood diseases” have been cured, and for a couple of years now Linq to Database (a generic name for the popular providers) has been ready for industrial use.
Nevertheless, not everyone uses Linq to Database. This is explained not only by the fact that a project may be old and hard to rewrite to Linq, but also by various myths that get brought up as arguments. These myths wander from one company to another and are often repeated across the Internet.
In this post I have collected the most popular myths and rebuttals to them.
Myth number 1
The database is handled by a specially trained DBA who writes all the queries, while programmers write application code, so Linq to Database is not needed.
Despite the attractiveness of this myth, the approach usually does not work. To write efficient queries, the DBA must understand very well what is going on in the application and what data each scenario needs.
If the DBA does not have this knowledge, it usually comes down to the DBA writing a small set of CRUD stored procedures for each entity, plus a few stored procedures for the "heaviest" queries. Everything else is done by programmers in application code. This most often works inefficiently, because on average far more data is pulled than a particular scenario needs, and it is hard to optimize.
If the DBA does know every scenario, there are two options:
a) Write a large number of nearly identical stored procedures, one per scenario, which are then painful to maintain.
b) Write a few universal stored procedures with a pile of parameters that glue strings together internally to build optimal queries; adding one more parameter to such a query becomes an extremely painful exercise.
Both options are very laborious for the DBA, so most often the result is a hybrid: a few very complex stored procedures plus plain CRUD for everything else. Linq does the same "string gluing" far more efficiently, so optimal (or close to optimal) queries can be composed right in application code; a short sketch of this follows at the end of this section.
The DBA can create views and functions that will be used in queries from application code, as well as stored procedures for batch processing. But query design is best left to the application side.
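To illustrate that last point, here is a minimal sketch of composing a query dynamically in application code instead of gluing SQL strings inside a stored procedure. The Order entity and its UserId/OrderDate fields mirror the Orders table from the examples further down; the method name, the MyContext type, and the filter parameters are hypothetical.

public List<Order> FindOrders(MyContext context, int? userId, DateTime? fromDate)
{
    IQueryable<Order> query = context.Orders;

    if (userId != null)                        // the filter is added only when it is needed
        query = query.Where(o => o.UserId == userId.Value);

    if (fromDate != null)
        query = query.Where(o => o.OrderDate >= fromDate.Value);

    // The provider generates one SQL statement containing exactly the predicates added above,
    // and the captured values are sent to the server as real query parameters.
    return query
        .OrderByDescending(o => o.OrderDate)
        .Take(50)
        .ToList();
}

No SQL strings are concatenated anywhere; each branch only adds a node to the expression tree, so the query stays cheap to extend with new parameters.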
Myth number 2
Linq generates inefficient SQL queries.
A very frequently repeated myth, but most of the inefficiency of Linq queries is created by people, not by the providers.
The reasons for this are simple:
1) People do not understand how Linq differs from SQL. Linq works with ordered sequences, while SQL works with unordered sets, so some Linq operations add extremely inefficient sorting operators to the generated SQL.
2) People do not understand how IQueryable providers work and how queries are executed in the DBMS. More details are in the previous post: habrahabr.ru/post/230479
That said, providers do have bugs that lead to far-from-optimal queries.
For example, Entity Framework has a bug when navigation properties are used:
context.Orders
    .Where(o => o.Id == id)
    .SelectMany(o => o.OrderLines)
    .Select(l => l.Product)
    .ToList();
Such a query generates the following SQL:
SELECT
[Project1].[Id] AS [Id],
[Project1].[OrderDate] AS [OrderDate],
[Project1].[UserId] AS [UserId],
[Project1].[C1] AS [C1],
[Project1].[OrderId] AS [OrderId],
[Project1].[ProductId] AS [ProductId],
[Project1].[Id1] AS [Id1],
[Project1].[Title] AS [Title]
FROM ( SELECT
[Extent1].[Id] AS [Id],
[Extent1].[OrderDate] AS [OrderDate],
[Extent1].[UserId] AS [UserId],
[Join1].[OrderId] AS [OrderId],
[Join1].[ProductId] AS [ProductId],
[Join1].[Id] AS [Id1],
[Join1].[Title] AS [Title],
CASE WHEN ([Join1].[OrderId] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
FROM [dbo].[Orders] AS [Extent1]
LEFT OUTER JOIN (SELECT [Extent2].[OrderId] AS [OrderId], [Extent2].[ProductId] AS [ProductId], [Extent3].[Id] AS [Id], [Extent3].[Title] AS [Title]
FROM [dbo].[OrderLines] AS [Extent2]
INNER JOIN [dbo].[Products] AS [Extent3] ON [Extent2].[ProductId] = [Extent3].[Id] ) AS [Join1] ON [Extent1].[Id] = [Join1].[OrderId]
WHERE [Extent1].[Id] = @p__linq__0
) AS [Project1]
ORDER BY [Project1].[Id] ASC, [Project1].[C1] ASC
In this query, SQL Server cannot optimize away the computed field and the sort on it, so a real sort has to be performed.
But if you rewrite the Linq query slightly to use an explicit join, the problem disappears:
var orders1 = from o in context.Orders
              where o.Id == id
              join ol in context.OrderLines on o.Id equals ol.OrderId into j
              from p in j.DefaultIfEmpty()
              select p.Product;
orders1.ToArray();
The resulting SQL:
SELECT
[Extent3].[Id] AS [Id],
[Extent3].[Title] AS [Title]
FROM [dbo].[Orders] AS [Extent1]
LEFT OUTER JOIN [dbo].[OrderLines] AS [Extent2] ON [Extent1].[Id] = [Extent2].[OrderId]
LEFT OUTER JOIN [dbo].[Products] AS [Extent3] ON [Extent2].[ProductId] = [Extent3].[Id]
WHERE [Extent1].[Id] = @p__linq__0
This query is well covered by indexes and is optimized by SQL Server.
I have also heard about inefficient queries from NHibernate, but I have not worked with it closely enough to track down such bugs.
Myth number 3
Mapping is slow.
Converting a DataReader row into an object takes a fraction of a microsecond per object. Moreover, the linq2db provider manages to do this even faster than the much-advertised Dapper.
What can be slow is attaching the materialized objects to the change-tracking context. But this is needed only when the objects will be modified and saved back to the database. In other cases you can explicitly tell the provider not to attach the objects to the context, or use projections.
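For example, in Entity Framework a read-only query can skip change tracking entirely, or return a projection. A minimal sketch using the Products set from the examples below (the filter itself is arbitrary):

// AsNoTracking: the materialized entities are not attached to the change-tracking context.
var products = context.Products
    .AsNoTracking()
    .Where(p => p.Title.StartsWith("A"))
    .ToList();

// Projection: only the needed columns are read, and nothing is tracked at all.
var titles = context.Products
    .Where(p => p.Title.StartsWith("A"))
    .Select(p => new { p.Id, p.Title })
    .ToList();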
Myth number 4
Query generation is slow.
Indeed, generating an SQL query from a Linq expression requires walking the expression tree, a fair amount of reflection, and metadata analysis. But in all providers this analysis is performed once, and its results are cached.
As a result, generation takes about 0.4 ms on average for simple queries and up to a few milliseconds for complex ones.
This time is usually lost in the statistical noise of the total query execution time.
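This is easy to check: in Entity Framework, calling ToString() on a query forces SQL generation without touching the database, so the translation step can be timed on its own. A rough sketch (the very first call also pays the one-time metadata warm-up cost):

var sw = System.Diagnostics.Stopwatch.StartNew();

// ToString() on an EF query returns the generated SQL text without executing the query.
var sql = context.Orders.Where(o => o.Id == 42).ToString();

sw.Stop();
Console.WriteLine("SQL generation took {0:F2} ms", sw.Elapsed.TotalMilliseconds);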
Myth number 5
You cannot use hints.
SQL Server has a Plan Guide mechanism that allows you to attach hints to any query. Similar mechanisms exist in other DBMSs.
Even so, hints are rarely needed when using Linq. Linq generates fairly simple queries that the DBMS optimizes on its own as long as statistics, indexes, and constraints are in place. Lock hints are better replaced by setting the correct isolation level and limiting the number of requested rows.
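For instance, instead of sprinkling NOLOCK hints, a read can run under an explicitly chosen isolation level and with an explicit row limit. A minimal sketch using System.Transactions (the since filter value is hypothetical):

// Read under an explicit isolation level instead of per-query lock hints.
using (var scope = new TransactionScope(
    TransactionScopeOption.Required,
    new TransactionOptions { IsolationLevel = IsolationLevel.ReadCommitted })) // or Snapshot, if enabled in the database
{
    var recent = context.Orders
        .Where(o => o.OrderDate >= since)
        .OrderByDescending(o => o.OrderDate)
        .Take(100)        // never ask for more rows than the scenario needs
        .ToList();

    scope.Complete();
}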
Myth number 6
Linq cannot use all the features of SQL.
This is partly true. But many SQL features can be wrapped in functions or views, which can then be used from Linq queries.
Moreover, Entity Framework lets you execute arbitrary SQL queries and map the results to objects, including with change tracking.
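A short sketch of both options in Entity Framework, with table and column names taken from the earlier examples: DbSet.SqlQuery attaches the results to change tracking, while Database.SqlQuery maps to an arbitrary type without tracking.

// Entities returned by DbSet.SqlQuery are tracked by the context.
var tracked = context.Products
    .SqlQuery("SELECT * FROM dbo.Products WHERE Title LIKE @p0", "A%")
    .ToList();

// Database.SqlQuery maps the result to any type and does not track it.
var lineCount = context.Database
    .SqlQuery<int>("SELECT COUNT(*) FROM dbo.OrderLines")
    .Single();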
Myth number 7
Stored procedures run faster than the ad-hoc queries generated by Linq.
This was true in the mid-90s. Today all DBMSs “compile” queries and cache execution plans, regardless of whether they come from an ad-hoc batch or a stored procedure.
That is a brief list of the myths I have come across; if you know others, feel free to add them.