Entity Framework and performance, second attempt
In my first attempt to close a performance gap in Entity Framework, I addressed only materialization. But as work continued I ran into another, more significant limitation, as one might expect: insert, update, and delete operations are also slow. For 100 inserts, EF sends 100 separate insert statements to the database without attempting to batch them.
In addition, an unpleasant bug surfaced in one of my projects: EF version 5.0.0, when working with Oracle, cannot insert strings longer than 2000 characters into Clob / Xml fields.
To address this, I created a component that I called Context Items, with the following features:
1) Bulk Insert (MS Sql): for tables whose primary key is not an Identity column, insertion can be done via the Bulk Insert mechanism supported by MS Sql Server. With Identity keys there is no reliable way to get back the keys generated by the database during a Bulk Insert, so for such tables the component instead groups several ordinary insert statements into a single query. This is significantly slower than Bulk Insert, but still faster than going through EF.
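For reference, this is roughly what a bulk insert looks like through ADO.NET's SqlBulkCopy, which is the standard .NET wrapper over this server feature. The table and column names here are illustrative, not taken from the component:

```csharp
using System.Data;
using Microsoft.Data.SqlClient; // System.Data.SqlClient on older stacks

static class BulkInsertSketch
{
    // Sketch: bulk-inserting rows into a hypothetical Orders table
    // whose primary key is supplied by the client (no Identity column).
    public static void BulkInsertOrders(string connectionString, DataTable orders)
    {
        using var connection = new SqlConnection(connectionString);
        connection.Open();

        using var bulk = new SqlBulkCopy(connection)
        {
            DestinationTableName = "dbo.Orders",
            BatchSize = 5000 // rows sent to the server per round trip
        };

        // Map source columns to destination columns by name,
        // so column order in the DataTable does not matter.
        foreach (DataColumn column in orders.Columns)
            bulk.ColumnMappings.Add(column.ColumnName, column.ColumnName);

        bulk.WriteToServer(orders);
    }
}
```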
2) Sequenced Bulk Insert (MS Sql): Guid is the usual alternative to Identity. It solves the insertion problem but creates another: because the key is longer, Join operations become slower, and because Guids are not sequential, Clustered indexes lose their advantage. As a solution, starting with MS Sql Server 2012 it is possible to use a Sequence to generate primary keys. This provides integer sequential keys, so Clustered indexes work as well as with Identity, while Bulk Insert remains usable for insertion. The component only supports non-cycling Sequences with an increment of 1.
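Assuming a non-cycling sequence with increment 1, a block of consecutive keys can be reserved up front with the built-in sp_sequence_get_range procedure and assigned client-side before the bulk insert. A minimal sketch (the sequence name is hypothetical):

```csharp
using System;
using System.Data;
using Microsoft.Data.SqlClient;

static class SequenceRangeSketch
{
    // Sketch: reserve `count` consecutive keys from a sequence
    // (increment 1, non-cycling) in a single server round trip.
    public static long ReserveKeyRange(SqlConnection connection, string sequenceName, int count)
    {
        using var command = new SqlCommand("sys.sp_sequence_get_range", connection)
        {
            CommandType = CommandType.StoredProcedure
        };
        command.Parameters.AddWithValue("@sequence_name", sequenceName);
        command.Parameters.AddWithValue("@range_size", count);
        var first = command.Parameters.Add("@range_first_value", SqlDbType.Variant);
        first.Direction = ParameterDirection.Output;

        command.ExecuteNonQuery();
        // First key of the reserved range; the caller assigns
        // first, first + 1, ... first + count - 1 to the new rows.
        return Convert.ToInt64(first.Value);
    }
}
```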
3) Bulk Update (MS Sql): the database has no such operation itself; the component implements it by performing the following 4 operations in sequence:
a) A temporary table is created with the same set of fields as the target table.
b) The data is inserted into the temporary table with Bulk Insert.
c) A Join-Update is performed, transferring data from the rows of the temporary table to the matching rows of the target table by primary key.
d) The temporary table is dropped.
Because the operation is not atomic, it is advisable to execute it inside a transaction.
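The four steps above can be sketched as follows; the Orders table, its Id key, and the Status column are hypothetical names, and the whole sequence runs inside one transaction as recommended:

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

static class BulkUpdateSketch
{
    // Sketch of the 4-step Bulk Update for a hypothetical Orders
    // table with primary key Id and one updatable column Status.
    public static void BulkUpdateOrders(string connectionString, DataTable changedRows)
    {
        using var connection = new SqlConnection(connectionString);
        connection.Open();
        using var transaction = connection.BeginTransaction();

        void Execute(string sql)
        {
            using var command = new SqlCommand(sql, connection, transaction);
            command.ExecuteNonQuery();
        }

        // a) temporary table with the same fields as the target table
        Execute("SELECT TOP 0 * INTO #OrdersTemp FROM dbo.Orders;");

        // b) Bulk Insert the changed rows into the temporary table
        using (var bulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, transaction)
            { DestinationTableName = "#OrdersTemp" })
        {
            bulk.WriteToServer(changedRows);
        }

        // c) Join-Update from the temporary table by primary key
        Execute(@"UPDATE t SET t.Status = s.Status
                  FROM dbo.Orders t JOIN #OrdersTemp s ON t.Id = s.Id;");

        // d) drop the temporary table
        Execute("DROP TABLE #OrdersTemp;");

        transaction.Commit();
    }
}
```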
4) Bulk Delete (MS Sql): just like Bulk Update, this operation proceeds in 4 steps:
a) A temporary table is created whose fields match the primary key of the target table.
b) The primary keys of the rows to be deleted are inserted into the temporary table with Bulk Insert.
c) A Join-Delete is performed.
d) The temporary table is dropped.
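Only steps a) and c) differ from Bulk Update; a compact sketch with the same hypothetical Orders/Id names:

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

static class BulkDeleteSketch
{
    // Sketch of the 4-step Bulk Delete; caller supplies an open
    // connection and transaction, and a DataTable of keys to delete.
    public static void BulkDeleteOrders(SqlConnection connection,
                                        SqlTransaction transaction,
                                        DataTable keys)
    {
        void Execute(string sql)
        {
            using var command = new SqlCommand(sql, connection, transaction);
            command.ExecuteNonQuery();
        }

        // a) temporary table holding only the primary-key columns
        Execute("CREATE TABLE #OrdersKeys (Id BIGINT PRIMARY KEY);");

        // b) Bulk Insert the keys of the rows to be deleted
        using (var bulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, transaction)
            { DestinationTableName = "#OrdersKeys" })
        {
            bulk.WriteToServer(keys);
        }

        // c) Join-Delete by primary key
        Execute("DELETE t FROM dbo.Orders t JOIN #OrdersKeys k ON t.Id = k.Id;");

        // d) drop the temporary table
        Execute("DROP TABLE #OrdersKeys;");
    }
}
```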
5) Materialization: this feature carried over from the previous version. In addition, I added a test project to the repository that compares materialization performance with the micro-ORM Dapper. Context Items performs this operation about 3-5% faster than Dapper, and about 40% faster than EF with AsNoTracking; without AsNoTracking, EF is several times slower.
6) Array-bound Insert, Update and Delete (Oracle): the bulk insert component implemented in ODP.Net ignores triggers, constraints, and even primary keys, so it is not suitable for inserting directly into a table; it also does not support transactions. One could try a temporary table as in the MS Sql case, but I chose the method Oracle itself recommends: array binding. In short, the statement sent to the database looks as if a single record is being inserted, but instead of passing a set of field values as parameters, we pass a set of field arrays, thereby inserting, updating, or deleting an array of records rather than a single one. You can read about the method here.
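A minimal sketch of array binding in ODP.Net: the statement is a single-row INSERT, but each parameter's value is an array, and ArrayBindCount tells the driver how many records the batch contains. The Orders table and its columns are hypothetical:

```csharp
using Oracle.ManagedDataAccess.Client; // ODP.NET managed driver

static class ArrayBindingSketch
{
    // Sketch: one INSERT statement, parameter values supplied as arrays,
    // executed once for the whole batch in a single round trip.
    public static void ArrayBoundInsert(OracleConnection connection,
                                        long[] ids, string[] names)
    {
        using var command = connection.CreateCommand();
        command.CommandText = "INSERT INTO Orders (Id, Name) VALUES (:id, :name)";
        command.ArrayBindCount = ids.Length; // number of records in the batch

        command.Parameters.Add(new OracleParameter("id", OracleDbType.Int64) { Value = ids });
        command.Parameters.Add(new OracleParameter("name", OracleDbType.Varchar2) { Value = names });

        command.ExecuteNonQuery(); // inserts ids.Length rows
    }
}
```

Because this goes through an ordinary INSERT, triggers, constraints, and transactions all apply as usual, unlike the ODP.Net bulk insert component.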
7) Equality members: GetHashCode and Equals methods that operate on the primary key are generated for all entities.
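The generated members look roughly like this (the entity and key names are hypothetical); equality is defined solely by the primary key:

```csharp
// Sketch of primary-key-based equality members for a hypothetical entity.
public partial class Order
{
    public long Id { get; set; }

    public override bool Equals(object obj) =>
        obj is Order other && other.Id == Id;

    public override int GetHashCode() => Id.GetHashCode();
}
```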
The Context Items component supports EF versions 5.0.0 - 6.1.3, as well as Oracle and MS Sql Server databases.
Only the database-first approach is supported, and there is a limitation: entity names must match table names. I have not yet gotten around to lifting this restriction.
The component is published on nuget.org under the identifiers contextitemsmssql and contextitemsoracle. You can read more about installation and usage in the repository on GitHub: github.com/repinvv/ContextItems