phoinixrw July 2, 2009 at 14:03

Nested Sets + PostgreSQL TRIGGER

Task

How convenient it is to make selections from trees such as Nested Sets, and how not convenient to manage it. It’s convenient to manage trees like id-> parent_id, but it’s not convenient and expensive to use recursions for selections. It is clear that when using modules for managing trees, part of the problem is removed, but the process of working with the database is not completely transparent i.e. to change the data we use some methods, to change the location of the node in the tree - others, plus transactions would not hurt. This discrepancy can be solved in two ways:

Use stored procedures to work with the table, in which combine both update methods (insert, delete);
Use triggers to exclude any non-standard working methods;

The first method is inconvenient in that when you change the structure of the table, we will also need to change the procedure, as well as be as careful as possible when working with the table so that all changes to the data go through our procedures, and not by direct requests. The second method makes the table somewhat more difficult by introducing additional Boolean fields, and you also have to do some “feint with your ears”, although it allows you to achieve maximum transparency of the work. The first method is to the furnace, especially somewhere on the Internet there is already a similar solution. The database is PostgreSQL, as relevant to me at the moment, add-ons for MySQL will write later.

Table

So, what triggers will we need:

Before inserting a record, to create a gap in the tree and keys for the created node;
Before updating - to rebuild the tree and generate keys for the updated node;
After removal - removal of the gap in the tree remaining after the removal of the node;

Rake:

For the duration of the triggers, you need to lock the table, or a separate tree, if you carried several trees in the same table;
In PostgreSQL and MySQL, you cannot disable recursion in triggers, like this;

The second point is more detailed: In a trigger before updating, records from the same table can be updated, which will entail a repeated call of the trigger, and so on, as well for a trigger called after deletion. In order to understand whether the request from the trigger is called from us or not, we introduce two additional Boolean fields, which we will pass in the request as a parameter (flag) for the trigger, and not as data. Why two? I’ll tell you later. We will form the structure of the table immediately, taking into account the fact that several trees will be stored in it. I will explain why. It is ridiculous to tears to listen to stupid developers who prove with foam at the mouth that they say, ah-ah-ah, with a large number of nodes, updating nodes can affect the whole tree, and this is so hard for the database. Yes exactly. I do not argue.

I do not use a common root node;
I divide the trees into nodes of the zero level;

Example: There is some forum. The forum section has 1'000 posts, each post has 1'000 comments. Total comments - 1'000'000. So, the forum section is NOT the root node of comments, just like posts are NOT nodes of the same tree, they are only tree separators. As a result, I have 1'000 separate trees with 1'000 comments each. Updates take place only within a maximum of 1,000 entries. In some cases, if there is a lot of this, tree separators are first-level comments. As a result, rebuilding a tree is not such a load on the base. Learn the math part. Let's not talk about the sad, table structure: SQL code (1)

CREATE
  ns_tree (
    id SERIAL,
    left_key INTEGER NOT NULL,
    right_key INTEGER NOT NULL,
    level INTEGER NOT NULL DEFAULT 0,
    tree INTEGER NOT NULL,    
    parent_id INTEGER NOT NULL DEFAULT 0,
    _trigger_lock_update BOOLEAN NOT NULL DEFAULT FALSE,
    _trigger_for_delete BOOLEAN NOT NULL DEFAULT FALSE,
    field_1 ...,
    ...
PRIMARY KEY (id)
);

Actually - _trigger_lock_update and _trigger_for_delete , are our auxiliary fields. We will immediately make the function block the tree to change until the transaction is completed: SQL code (2)

CREATE OR REPLACE FUNCTION lock_ns_tree (integer)
    RETURNS boolean AS
$ BODY $
DECLARE tree_id ALIAS FOR $ 1;
    _id INTEGER;
BEGIN
    SELECT id
        INTO _id
        FROM ns_tree
        WHERE tree = tree_id FOR UPDATE;
    RETURN TRUE;
END
$ BODY $
  LANGUAGE 'plpgsql' VOLATILE
  COST 100;
ALTER FUNCTION lock_ns_tree (integer) OWNER TO user;

Create Record

We have 3 options for adding a node to the tree:

Adding subordination to a specific node, then we pass parent_id ;
Adding to a specific point in the tree, then we pass left_key ;
Adding to the end of the tree, then we do not need to transfer anything extra;

In the same sequence, we will determine the destination of the created node. SQL code (3)

CREATE OR REPLACE FUNCTION ns_tree_before_insert_func ()
    RETURNS trigger AS
$ BODY $
Decare
    _left_key INTEGER;
    _level INTEGER;
    _tmp_left_key INTEGER;
    _tmp_right_key INTEGER;
    _tmp_level INTEGER;
    _tmp_id INTEGER;
    _tmp_parent_id INTEGER;
BEGIN
    PERFORM lock_ns_tree (NEW.tree);
- You can’t set these fields with pens:
    NEW._trigger_for_delete: = FALSE;
    NEW._trigger_lock_update: = FALSE;
    _left_key: = 0;
    _level: = 0;
- If we indicated the parent:
    IF NEW.parent_id IS NOT NULL AND NEW.parent_id> 0 THEN
        SELECT right_key, "level" + 1
            INTO _left_key, _level
            FROM ns_tree
            WHERE id = NEW.parent_id AND
                  tree = NEW.tree;
    END IF;
- If we specified the left key:
    IF NEW.left_key IS NOT NULL AND
       NEW.left_key> 0 AND 
       (_left_key IS NULL OR _left_key = 0) THEN
        SELECT id, left_key, right_key, "level", parent_id 
            INTO _tmp_id, _tmp_left_key, _tmp_right_key, _tmp_level, _tmp_parent_id
            FROM ns_tree
            WHERE tree = NEW.tree AND (left_key = NEW.left_key OR right_key = NEW.left_key);
        IF _tmp_left_key IS NOT NULL AND _tmp_left_key> 0 AND NEW.left_key = _tmp_left_key THEN
            NEW.parent_id: = _tmp_parent_id;
            _left_key: = NEW.left_key;
            _level: = _tmp_level;
        ELSIF _tmp_left_key IS NOT NULL AND _tmp_left_key> 0 AND NEW.left_key = _tmp_right_key THEN
            NEW.parent_id: = _tmp_id;
            _left_key: = NEW.left_key;
            _level: = _tmp_level + 1;
        END IF;
    END IF;
- If the parent or left key is not specified, or we did not find anything:
    IF _left_key IS NULL OR _left_key = 0 THEN
        SELECT MAX (right_key) + 1
            INTO _left_key
            FROM ns_tree
            WHERE tree = NEW.tree;
        IF _left_key IS NULL OR _left_key = 0 THEN
            _left_key: = 1;
        END IF;
        _level: = 0;
        NEW.parent_id: = 0; 
    END IF;
- Set the received keys for the node:
    NEW.left_key: = _left_key;
    NEW.right_key: = _left_key + 1;
    NEW. "Level": = _level;
- We form a break in the tree at the insertion point:
    UPDATE ns_tree
        SET left_key = left_key + 
            CASE WHEN left_key> = _left_key 
              THEN 2 
              ELSE 0 
            END,
            right_key = right_key + 2,
            _trigger_lock_update = TRUE
        WHERE tree = NEW.tree AND right_key> = _left_key;
    RETURN NEW;
END
$ BODY $
  LANGUAGE 'plpgsql' VOLATILE
  COST 100;
ALTER FUNCTION ns_tree_before_insert_func () OWNER TO user;
CREATE TRIGGER ns_tree_before_insert_tr
    BEFORE INSERT
    ON ns_tree
    FOR EACH ROW
    EXECUTE PROCEDURE ns_tree_before_insert_func ();

Now some clarifications:

_trigger_lock_update and _trigger_for_delete are control fields, so we immediately reset them regardless of the user's wishes;
Even if we specified parent_id , it’s not a fact that we have such a node and that it is in the corresponding tree. In the current case, if I cannot find a node in this tree, then parent_id is reset, and the node is inserted at the end of the tree. Alternatively, you can filter by tree, but just search for a node by id , then you will need to update the tree field of the created node. It all depends on the priority of the fields and the specific implementation;
If we specified the left key, then at least we need to calculate the parent of the created node, we determine the parent according to the principle: if we found the node using the left key, then the parent will be the same as the found node, otherwise if it is right, then the parent will be the node we found, we also build the level. If the node is not found, then insert at the end of the tree, left_key is then reset;
During the formation of the gap, the _trigger_lock_update field is additionally transferred - just so that the trigger for UPDATE knows that this update is connected exclusively with the structure of the tree, and no additional calculations and changes in the structure are required, since the final key values are already transmitted;

Edit Record

More precisely, this trigger will work exclusively with the structure of the tree, and not with mutable data. The main parameters forcing it to take some action are parent_id or left_key (remembering, of course, _trigger_lock_update as a control parameter for the trigger). The algorithm is simple: first, the coordinates of movement, then rebuild the tree. SQL code (4)

CREATE OR REPLACE FUNCTION ns_tree_before_update_func ()
  RETURNS trigger AS
$ BODY $
Decare
    _left_key INTEGER;
    _level INTEGER;
    _skew_tree INTEGER;
    _skew_level INTEGER;
    _skew_edit INTEGER;
    _tmp_left_key INTEGER;
    _tmp_right_key INTEGER;
    _tmp_level INTEGER;
    _tmp_id INTEGER;
    _tmp_parent_id INTEGER;
BEGIN
    PERFORM lock_ns_tree (OLD.tree);
- Is it worth it to do anything at all:
    IF NEW._trigger_lock_update = TRUE THEN
        NEW._trigger_lock_update: = FALSE;
        IF NEW._trigger_for_delete = TRUE THEN
            NEW = OLD;
            NEW._trigger_for_delete = TRUE;
            RETURN NEW;
        END IF;
        RETURN NEW;
    END IF;
- We reset the values of the fields that the user cannot change:
    NEW._trigger_for_delete: = FALSE;
    NEW.tree: = OLD.tree;
    NEW.right_key: = OLD.right_key;
    NEW. "Level": = OLD. "Level";
    IF NEW.parent_id IS NULL THEN NEW.parent_id: = 0; END IF;
- Check if there are changes related to the structure of the tree
    IF NEW.parent_id = OLD.parent_id AND NEW.left_key = OLD.left_key
    THEN
        RETURN NEW;
    END IF;
- We are rebuilding the tree, well, let's proceed:
    _left_key: = 0;
    _level: = 0;
    _skew_tree: = OLD.right_key - OLD.left_key + 1;
- Determine where we are moving it:
- If parent_id is changed:
    IF NEW.parent_id <> OLD.parent_id THEN
- If in subjection to another evil:
        IF NEW.parent_id> 0 THEN
            SELECT right_key, level + 1
                INTO _left_key, _level
                FROM ns_tree
                WHERE id = NEW.parent_id AND tree = NEW.tree;
- Otherwise, we transfer to the root of the tree:
        ELSE
            SELECT MAX (right_key) + 1 
                INTO _left_key
                FROM ns_tree
                WHERE tree = NEW.tree;
            _level: = 0;
        END IF;
- If suddenly the parent is in the range of the moved node, check:
        IF _left_key IS NOT NULL AND 
           _left_key> 0 AND
           _left_key> OLD.left_key AND
           _left_key <= OLD.right_key 
        THEN
           NEW.parent_id: = OLD.parent_id;
           NEW.left_key: = OLD.left_key;
           RETURN NEW;
        END IF;
    END IF;
- If left_key is specified, and parent_id is not
    IF _left_key IS NULL OR _left_key = 0 THEN
        SELECT id, left_key, right_key, "level", parent_id 
            INTO _tmp_id, _tmp_left_key, _tmp_right_key, _tmp_level, _tmp_parent_id
            FROM ns_tree
            WHERE tree = NEW.tree AND (right_key = NEW.left_key OR right_key = NEW.left_key - 1)
            LIMIT 1;
        IF _tmp_left_key IS NOT NULL AND _tmp_left_key> 0 AND NEW.left_key - 1 = _tmp_right_key THEN
            NEW.parent_id: = _tmp_parent_id;
            _left_key: = NEW.left_key;
            _level: = _tmp_level;
        ELSIF _tmp_left_key IS NOT NULL AND _tmp_left_key> 0 AND NEW.left_key = _tmp_right_key THEN
            NEW.parent_id: = _tmp_id;
            _left_key: = NEW.left_key;
            _level: = _tmp_level + 1;
        ELSIF NEW.left_key = 1 THEN
            NEW.parent_id: = 0;
            _left_key: = NEW.left_key;
            _level: = 0;
        ELSE
           NEW.parent_id: = OLD.parent_id;
           NEW.left_key: = OLD.left_key;
           RETURN NEW;
        END IF;
    END IF;
- Now we know where we are moving the tree
        _skew_level: = _level - OLD. "level";
    IF _left_key> OLD.left_key THEN
- Move up the tree
        _skew_edit: = _left_key - OLD.left_key - _skew_tree;
        UPDATE ns_tree
            SET left_key = CASE WHEN right_key <= OLD.right_key
                                 THEN left_key + _skew_edit
                                 ELSE CASE WHEN left_key> OLD.right_key
                                           THEN left_key - _skew_tree
                                           ELSE left_key
                                      End
                            END,
                "level" = CASE WHEN right_key <= OLD.right_key 
                                 THEN "level" + _skew_level
                                 ELSE "level"
                            END,
                right_key = CASE WHEN right_key <= OLD.right_key 
                                 THEN right_key + _skew_edit
                                 ELSE CASE WHEN right_key <_left_key
                                           THEN right_key - _skew_tree
                                           ELSE right_key
                                      End
                            END,
                _trigger_lock_update = TRUE
            WHERE tree = OLD.tree AND
                  right_key> OLD.left_key AND
                  left_key <_left_key AND
                  id <> OLD.id;
        _left_key: = _left_key - _skew_tree;
    ELSE
- Moving down the tree:
        _skew_edit: = _left_key - OLD.left_key;
        UPDATE ns_tree
            SET
                right_key = CASE WHEN left_key> = OLD.left_key
                                 THEN right_key + _skew_edit
                                 ELSE CASE WHEN right_key <OLD.left_key
                                           THEN right_key + _skew_tree
                                           ELSE right_key
                                      End
                            END,
                "level" = CASE WHEN left_key> = OLD.left_key
                                 THEN "level" + _skew_level
                                 ELSE "level"
                            END,
                left_key = CASE WHEN left_key> = OLD.left_key
                                 THEN left_key + _skew_edit
                                 ELSE CASE WHEN left_key> = _left_key
                                           THEN left_key + _skew_tree
                                           ELSE left_key
                                      End
                            END,
                 _trigger_lock_update = TRUE
            WHERE tree = OLD.tree AND
                  right_key> = _left_key AND
                  left_key <OLD.right_key AND
                  id <> OLD.id;
    END IF;
- The tree was rebuilt, only our current node remained
    NEW.left_key: = _left_key;
    NEW. "Level": = _level;
    NEW.right_key: = _left_key + _skew_tree - 1;
    RETURN NEW;
END
$ BODY $
    LANGUAGE 'plpgsql' VOLATILE
    COST 100;
ALTER FUNCTION ns_tree_before_update_func () OWNER TO user;
CREATE TRIGGER ns_tree_before_update_tr
    BEFORE UPDATE
    ON ns_tree
    FOR EACH ROW
    EXECUTE PROCEDURE ns_tree_before_update_func ();

Explanations:

Initially, in addition to the _trigger_lock_update parameter , we also check the _trigger_for_delete parameter . This is done because during deletion, we don’t pass parameters like changing a field, therefore we will trigger the deletion of records through the UPDATE of certain records. However, it will become more clear further;
Again, in this case, parent_id is more priority than left_key , so we check it first;
When checking left_key, we initially select either the node that will be before the moved node ( right_key = _left_key + 1 ), or the node to which we move the node ( right_key = _left_key ). At the same time, in some cases, the query result will return 2 nodes, both the future neighbor and the future parent, which does not affect the logic in any way; therefore, LIMIT 1 is set so that it’s not just not to select extra data, sorting is not important, since even if the result of the sample is 2 nodes, both of them will be correct, therefore it makes no difference to us which one will return to us. But I want to draw attention to the fact that if we indicate at the moved node left_key = 1, then it’s natural that we won’t have any upstream or parent nodes, for which we use the additional condition " ELSIF NEW.left_key = 1 ";
When rebuilding the tree, an additional condition is " id <> OLD.id ", this is because we cannot change the record in the trigger, which we are changing now.

Delete record

Here with the removal of the most difficult. Firstly, because there are 2 principles for removing nodes:

Removing a node with descendants;
Removing a node without descendants, while the child nodes are shifted up the level in the tree;

Secondly, we cannot pass parameters in the deletion request to limit the recursive call of the trigger, however, the recursive call of the trigger is relevant only for the case of deleting a node with descendants. It will be unprofitable to make a universal trigger for both deletion principles, there will be too much logic. Two different solutions are better after all. In the first solution (deleting a node with descendants) we will have the following algorithm:

Update the child nodes to set the field (parameter) _trigger_for_delete ;
We delete child nodes;
We remove the gap in the keys in the tree (rearrange the tree);

SQL code (5)

CREATE OR REPLACE FUNCTION ns_tree_after_delete_func ()
    RETURNS trigger AS
$ BODY $
Decare
    _skew_tree INTEGER;
BEGIN
    PERFORM lock_ns_tree (OLD.tree);
- Check if the trigger is worth executing:
    IF OLD._trigger_for_delete = TRUE THEN RETURN OLD; END IF;
- We mark on removal of child nodes:
    UPDATE ns_tree
        SET _trigger_for_delete = TRUE,
            _trigger_lock_update = TRUE
        WHERE
            tree = OLD.tree AND
            left_key> OLD.left_key AND
            right_key <OLD.right_key;
- Delete the marked nodes:
    DELETE FROM ns_tree
        WHERE
            tree = OLD.tree AND
            left_key> OLD.left_key AND
            right_key <OLD.right_key;
- We remove the gap in the keys:
    _skew_tree: = OLD.right_key - OLD.left_key + 1;
    UPDATE ns_tree
        SET left_key = CASE WHEN left_key> OLD.left_key
                            THEN left_key - _skew_tree
                            ELSE left_key
                       END,
            right_key = right_key - _skew_tree,
            _trigger_lock_update = TRUE
        WHERE right_key> OLD.right_key AND
            tree = OLD.tree;
    RETURN OLD;
END
$ BODY $
    LANGUAGE 'plpgsql' VOLATILE
    COST 100;
ALTER FUNCTION ns_tree_after_delete_func () OWNER TO user;
CREATE TRIGGER ns_tree_after_delete_tr
    AFTER DELETE
    ON ns_tree
    FOR EACH ROW
    EXECUTE PROCEDURE ns_tree_after_delete_func ();

In the second solution, we simply shift the child tree up one level and remove the key break. SQL code (6)

CREATE OR REPLACE FUNCTION ns_tree_after_delete_2_func ()
    RETURNS trigger AS
$ BODY $
Decare
BEGIN
    PERFORM lock_ns_tree (OLD.tree);
- We remove the gap in the keys and move the child nodes:
   UPDATE ns_tree
        SET left_key = CASE WHEN left_key <OLD.left_key
                            THEN left_key
                            ELSE CASE WHEN right_key <OLD.right_key
                                      THEN left_key - 1 
                                      ELSE left_key - 2
                                 End
                       END,
            "level" = CASE WHEN right_key <OLD.right_key
                           THEN "level" - 1 
                           ELSE "level"
                      END,
            parent_id = CASE WHEN right_key <OLD.right_key AND "level" = OLD.level + 1
                           THEN OLD.parent_id
                           ELSE parent_id
                        END,
            right_key = CASE WHEN right_key <OLD.right_key
                             THEN right_key - 1 
                             ELSE right_key - 2
                        END,
            _trigger_lock_update = TRUE
        WHERE (right_key> OLD.right_key OR
            (left_key> OLD.left_key AND right_key <OLD.right_key)) AND
            tree = OLD.tree;
    RETURN OLD;
END
$ BODY $
  LANGUAGE 'plpgsql' VOLATILE
  COST 100;
ALTER FUNCTION ns_tree_after_delete_2_func () OWNER TO user;
CREATE TRIGGER ns_tree_after_delete_2_tr
    AFTER DELETE
    ON ns_tree
    FOR EACH ROW
    EXECUTE PROCEDURE ns_tree_after_delete_2_func ();

Actually everything. It remains only to put down the indices (I am lazy to write SQL commands here, so I’ll just voice them):

Composite is not unique to fields ( left_key, right_key, level, tree );
Not unique on the field ( parent_id );

Enjoy ;-)

Tags: