xzilgiz August 27, 2014 at 13:29

Database Development Solutions

From the sandbox

Good afternoon! In the process of project development, one has to deal with the problem of database performance, because The amount of data is growing, and problem areas float involuntarily.

This post describes not very successful solutions that are invisible in the early stages of the project. And solutions that can further increase productivity significantly.

The post is more aimed at more experienced developers who are already possibly looking for solutions, so I will be brief.

Select in result fields

Many developers use the following construct when writing queries.

SELECT 
  T.Column1,
  (SELECT MAX(Column) FROM Table2) as Column3,
  T.Column2
FROM Table1 T

The fact is that for each row from the table Table1, a SELECT MAX (Column) FROM Table2 is selected, which requires additional resources for a new selection. In this case, performance will decrease as the amount of data in Table1 and Table2 increases.

Acceptable option.

SELECT 
  T.Column1,
  T2.Column3,
  T.Column2
FROM Table1 T
  JOIN (SELECT Max(Column) as Column3 FROM Table2) T2 ON 1 = 1

In this case, the selection from the table Table2 will be performed once, and the result will be substituted for each row from the table Table1.

Replacing Left Join with Union

There are situations when it is necessary to collect data from several tables of the same type in meaning and often resort to the following solution

SELECT
  M.Id, COALESCE(P1.Sum, 0) + COALESCE(P2.Sum, 0) + COALESCE(P3.Sum, 0) + ... + COALESCE(PN.Sum, 0) as Sum
FROM MainTable M
  LEFT JOIN PayTable1 P1 ON P1.Id = M.Id
  LEFT JOIN PayTable2 P2 ON P2.Id = M.Id
  LEFT JOIN PayTable3 P3 ON P3.Id = M.Id
  .
  .
  .
  LEFT JOIN PayTableN PN ON PN.Id = M.Id

This approach has several problems:
1. Produced, multiplying PayTableN tables and with increasing tables, performance will drop, even if there is little data.
2. The need to use LEFT JOIN, which in many DBMS indexes do not work.

You can improve performance by abandoning the operation of multiplication in favor of combining and using JOIN instead of LEFT JOIN.

SELECT R.Id, SUM(R.Sum) as Sum
FROM
  (
  SELECT M.Id, P1.Sum
  FROM MainTable M
    JOIN PayTable1 P1 ON P1.Id = M.Id
  UNION ALL
  SELECT M.Id, P2.Sum
  FROM MainTable M
    JOIN PayTable2 P2 ON P2.Id = M.Id
  UNION ALL
  SELECT M.Id, P3.Sum
  FROM MainTable M
    JOIN PayTable3 P3 ON P3.Id = M.Id
    .
    .
    .
  UNION ALL
  SELECT M.Id, PN.Sum
  FROM MainTable M
    JOIN PayTableN PN ON PN.Id = M.Id
  ) R
GROUP BY R.Id

Moreover, such a construction is easier to maintain, i.e. when adding new PayTableN tables, you need to add the corresponding block in only one revenge.

Using scalar operations in a join condition

There are situations when in the join condition it is necessary to use additional operations on the field, in which case the index is not used

SELECT
  T1.*, T2.*
FROM Table1 T1
  JOIN Table2 T2 ON T2.Id = T1.Id AND COALESCE(T2.Column, 0) = COALESCE(T1.Column, 0)

Possible solutions:
1. To place in the WHERE clause, but this option will not always give a positive result, because the optimizer could do this for you.

SELECT
  T1.*, T2.*
FROM Table1 T1
  JOIN Table2 T2 ON T2.Id = T1.Id 
WHERE COALESCE(T2.Column, 0) = COALESCE(T1.Column, 0)

2. Divide the request into two.

SELECT
  T1.*, T2.*
FROM Table1 T1
  JOIN Table2 T2 ON T2.Id = T1.Id AND T2.Column = T1.Column
UNION
SELECT
  T1.*, T2.*
FROM Table1 T1
  JOIN Table2 T2 ON T2.Id = T1.Id AND T2.Column IS NULL AND T1.Column IS NULL

Queries Using With

The WITH statement certainly helps to make the request more understandable and structured, but many do not quite understand how it works and make the following mistakes.

WITH
  MainSubQuery AS (
    SELECT * FROM Table1
  ),
  SubQuery AS (
    --Здесь на самом деле, какой либо сложный запрос, требующий больших ресурсов
    SELECT * FROM Table2
  )
  SELECT *
  FROM MainSubQuery M
    LEFT JOIN SubQuery Q1 ON Q1.id = M.id AND Q1.Param = 1
    LEFT JOIN SubQuery Q2 ON Q2.id = M.id AND Q2.Param = 2
    LEFT JOIN SubQuery Q3 ON Q3.id = M.id AND Q3.Param = 3
    LEFT JOIN SubQuery Q4 ON Q4.id = M.id AND Q4.Param = 4

The problem is that many people think that the SubQuery subquery will be executed once, and then only the result will be taken and substituted where necessary.
But in fact, the following is obtained:

SELECT *
FROM (SELECT * FROM Table1) M
  LEFT JOIN (SELECT * FROM Table2) Q1 ON Q1.id = M.id AND Q1.Param = 1
  LEFT JOIN (SELECT * FROM Table2) Q2 ON Q2.id = M.id AND Q2.Param = 2
  LEFT JOIN (SELECT * FROM Table2) Q3 ON Q3.id = M.id AND Q3.Param = 3
  LEFT JOIN (SELECT * FROM Table2) Q4 ON Q4.id = M.id AND Q4.Param = 4

The problem can be solved by resorting to temporary tables, i.e. The result of a heavy query should be preliminarily put in a temporary table and then used. This method is also effective if you use NICKNAME from the IBM DB2 world as the source dataset.

INSERT Session.MainSubQuery
  SELECT * FROM Table1;
INSERT Session.SubQuery
  SELECT * FROM Table2;
SELECT *
FROM Session.MainSubQuery M
  LEFT JOIN Session.SubQuery Q1 ON Q1.id = M.id AND Q1.Param = 1
  LEFT JOIN Session.SubQuery Q2 ON Q2.id = M.id AND Q2.Param = 2
  LEFT JOIN Session.SubQuery Q3 ON Q3.id = M.id AND Q3.Param = 3
  LEFT JOIN Session.SubQuery Q4 ON Q4.id = M.id AND Q4.Param = 4;

Escort

Next, we move from machine productivity to human labor productivity. We will talk about how to facilitate the routine work of maintaining and maintaining databases.

In the process of developing a project, it is necessary to recreate procedures, functions, tables, and other database objects, but
not every DBMS has a CREATE OR REPLACE PROCEDURE [FUNCTION, TABLE, VIEW, TRIGER, ...] construct and
constantly updates DROPs in updates , ALTERs, etc. in each DBMS differently.

The solution to this issue may be the introduction of their own service procedures, for example:

CALL[EXECUTE] DropProcedure(SchemaName, ProcedureName)
CALL[EXECUTE] DropTable(TableName, TableName)
CALL[EXECUTE] DropView(SchemaView, ProcedureView)

etc.

And then we already create the corresponding objects, regardless of whether they were created earlier or not.

But with this approach, there also remains a problem related to access rights to objects, i.e. when deleting and creating an object, group rights to objects fly off, because This is a new object for DBMS.

Decisions can be different, for example:
Maintain the rights issuance script up to date and download after each update (inconvenient).
Do not use Group Policy at all (not good).

A rather convenient and effective solution is to enter a pair of service procedures, where one will be launched at the beginning of the update, and the second at the end.

--Сохранение всех прав
CALL[EXECUTE] StoreRights
--Удаление и создание объектов
--Восстановление прав
CALL[EXECUTE] RestoreRights

The next question relates to the maintenance of the structure of the database DBMS MS SQL, namely in connection with the extremely inconvenient method of commenting on tables and columns. Which in turn is rarely done or not done at all.

Example:

  EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'Идентификатор' , @level0type=N'SCHEMA',@level0name=N'main', @level1type=N'TABLE',@level1name=N'TextTable', @level2type=N'COLUMN',@level2name=N'id';

You can also create a set of service wrapper routines.

  EXEC Service.CommentOnTable N'SchemaName', N'TableName', N'TableComment';
  EXEC Service.CommentOnColumn N'SchemaName', N'TableName', N'ColumnName', N'ColumnComment';
  EXEC Service.CommentOnProcedure N'SchemaName', N'TableName', N'ProcedureComment';

It’s already shorter, more informative and more pleasant to work with.

This, of course, is not limited to a set of service procedures; you can think of a lot of things.

All the best, pleasant work!

Tags: