Don't you like triggers?

    Don't you like cats? You just don't know how to cook them! (c) Alf
    When designing sufficiently large relational databases, a decision is often made to deviate from normal form, that is, to “denormalize”.
    The reasons vary: an attempt to speed up access to certain data, limitations of the platform / framework / development tools in use, or simply a lack of qualification on the part of the database developer or designer.
    Strictly speaking, though, blaming the limitations of the framework and the like is essentially an attempt to excuse a lack of qualification.

    Denormalized data is a weak spot through which it is easy to drive the database into an inconsistent state, one that has lost its integrity.

    What to do about it?

    Example


    The database contains a table of financial operations: receipts and write-offs of funds across different accounts.
    At any moment we need to know the current balance of each account.

    With normalized data, the balance is always a computed value: we sum all receipts and subtract all write-offs.
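
    For illustration, assume the operations live in a single operations table with an account column and a signed amount column (receipts positive, write-offs negative; this sign convention is my assumption, implied by the sum() in the later examples). The balance query might then look like this:

    -- normalized approach: compute the balance on demand
    -- :requested_account is a placeholder for the account of interest
    select sum(amount) as balance
    from operations
    where account = :requested_account;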

    However, when the number of operations is very large, recomputing the balance every time becomes too expensive.
    It was therefore decided to store the current balances in a separate table. How do we keep the data in that table up to date?
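
    A minimal schema sketch for this example is shown below. The table and column names match the queries used later in the article; the types and the PostgreSQL-style definitions are my assumptions rather than the author's:

    -- hypothetical minimal schema (PostgreSQL syntax)
    create table operations (
        id      bigserial      primary key,
        account integer        not null,
        amount  numeric(15, 2) not null  -- positive: receipt, negative: write-off
    );

    create table totals (
        account integer        primary key,
        total   numeric(15, 2) not null default 0
    );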

    The solution "as usual"


    In almost all information systems I have worked with, this task was handled by an external application implementing the business logic. That is fine if the application is simple and there is only one data entry point: a form in the user interface. But what if there are imports, APIs, third-party applications and so on? What if those parts are written by different people and different teams? What if there is not one totals table but several, in different sections? What if there is more than one table of operations (I have seen that too)?

    Keeping track of whether a developer, when modifying an operation, remembered to update the whole set of totals tables becomes harder and harder. The data loses integrity: account balances no longer match the operations. Of course, testing should catch such situations, but we do not live in that ideal world.
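
    To make the risk concrete: without a trigger, roughly the following pairing has to be repeated in every write path of every application that touches operations (a sketch based on the hypothetical schema above):

    begin;
    insert into operations (account, amount) values (42, 100.00);
    -- this second statement is easy to forget in any one code path,
    -- and the balances then silently drift away from the operations
    update totals set total = total + 100.00 where account = 42;
    commit;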

    Cats? No, triggers


    As an alternative, the triggers of a mature (“adult”) DBMS can be used to maintain the integrity of denormalized data.
    I have often heard that triggers slow the database down terribly, so using them is impractical.
    The second argument is that all the logic lives in a separate application, and keeping business logic in different places is impractical as well.

    Let's figure it out.

    The brakes

    A trigger fires inside the transaction that modifies data in the table, and the transaction cannot complete until the trigger has done its work. Hence the conclusion: triggers should be as lightweight as possible. Here is an example of a “heavy” query in a trigger:

    update totals
    set total = (select sum(operations.amount)
                 from operations
                 where operations.account = current_account)
    where totals.account = current_account;

    This query goes to the operations table (operations) and sums all operation amounts (amount) for the given account (account).

    As the database grows, such a query will eat up more and more time and resources. The same result, however, can be achieved with a “light” query like this:

    update totals
    set total = totals.total + current_amount
    where totals.account = current_account;

    When a new row is added, such a trigger simply increments the stored total instead of recomputing it, so its cost does not depend on the amount of data in the tables. There is no point in recalculating the total from scratch, because we can be sure the trigger ALWAYS fires when a new operation is added.

    Deleting and updating rows are handled in the same way. A trigger of this kind adds practically no overhead to the operation, yet it guarantees the consistency and integrity of the data.
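
    For completeness, here is one possible PostgreSQL implementation of such a “light” trigger covering insert, update and delete. It is a sketch built on the hypothetical schema above, not the author's exact code:

    -- keeps totals in sync with operations; the cost does not depend on table size
    create or replace function maintain_totals() returns trigger as $$
    begin
        if tg_op in ('UPDATE', 'DELETE') then
            -- back out the old amount from the old account
            update totals set total = total - old.amount
            where account = old.account;
        end if;
        if tg_op in ('INSERT', 'UPDATE') then
            -- add the new amount, creating the totals row if it does not exist yet
            insert into totals (account, total) values (new.account, new.amount)
            on conflict (account) do update
                set total = totals.total + excluded.total;
        end if;
        return null;  -- return value is ignored for AFTER row triggers
    end;
    $$ language plpgsql;

    -- PostgreSQL 11+ syntax; older versions use "execute procedure"
    create trigger operations_totals
    after insert or update or delete on operations
    for each row execute function maintain_totals();

    An AFTER row trigger is used here so the adjustment happens only for rows that actually pass the table's constraints; a BEFORE trigger would also have to return new rather than null.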

    Every time I have observed “brakes” when inserting data into a table with a trigger, it turned out to be a “heavy” query like the one above, and in the vast majority of cases it could be rewritten in the “light” style.

    Business logic

    Here we need to separate the flies from the cutlets: that is, to distinguish code that ensures data integrity from actual business logic. In each case I ask the question: if the data were normalized, would this function still be needed? If the answer is yes, it is business logic; if no, it is there to ensure data integrity. Feel free to wrap the latter kind in triggers.

    However, there is an opinion that all business logic can be implemented perfectly well inside a modern DBMS such as PostgreSQL or Oracle. I keep finding confirmation of this in my just-for-fun project.

    I hope this article will help reduce the number of bugs in your information system.

    Of course, I am far from claiming that everything written here is the ultimate truth. In real life things are more complicated, so the decision in each case is up to you. Use your engineering thinking!

    PS


    Thanks for the reasoned pros and cons.
    A few comments following the discussion of the article:
    • The article deliberately focuses on just one aspect of using triggers, in order to draw attention to this powerful tool. The topic, of course, is much broader.
    • The approach described in the article may make it possible to drop indexes on the operations table, which can speed up inserts into that table. On large volumes this effect easily compensates for the time spent in the trigger, not to mention the storage costs of the indexes.
    • It is important to understand which tool to use for what; that way you avoid many problems, for example with statement restarts in BEFORE triggers.
    • For audit logs, triggers are generally a must-have.
