tooteetoo July 29, 2017 at 10:40

Comparison * of tree graphs

From the sandbox

Hello, Habr!

* Actually not quite so. When developing an information system, part of which is the various processing of design and technological documentation, I had a problem that can be briefly described as follows. Today we have one composition of the product, several changes in different parts of this product come in a day, and by the evening it is not clear what has changed? Products can sometimes have more than 10,000 elements in the composition, the elements are not unique, and the reality is that changes in composition can actively come, although the product is almost ready. Misunderstanding the scope of change makes planning difficult.

The composition of the product can be represented as a tree graph. Having not found a suitable way to compare the two graphs, I decided to write my ~~bike~~ .

Task

During the day, various changes from the design system come into the accounting system. Production planning is based on data from the accounting system. Conditions allow you to accept all changes per day and recalculate the product specification at night. But, as I wrote above, it is not clear how yesterday’s state differs from today's state.

I want to see what has left the product tree, and what has been added. I also want to see which part or assembly replaced another part or assembly. For example, if an intermediate node was added to a tree branch, then in this case it would be wrong to assume that all children are removed from the old places and inserted into new places. They remained in their places, but an intermediate node was inserted. In addition, an element can “travel” up and down only inside one tree branch, this is due to the specifics of the manufacturing process.

Training

The product specification is recalculated by the PL / SQL procedure inside Oracle. Therefore, I found it logical to make my search for changes in PL / SQL.

The table in which the product tree is stored, let's call it MR_ORDER_TREE, has the following form:

`order_id`	ID of the order on which the tree is mounted
`item_id`	The item ID in the tree, together with order_id forms the primary key and is unique within the order
`item_ref`	ID of the item that the selected item belongs to
`kts_item_id`	ID from the directory of parts and assemblies
`item_qty`	number
`is_item_buy`	Is the product purchased
`item_position`	Assembly Position Number

Ligaments are (item_ref, kts_item_id)unique. In addition to the purchase, position and quantity, there are other attributes of a particular element, but we are not talking about them.

When recalculating the specification, data on changed orders are completely deleted and re-read. Therefore, you need to save the state of the tree until recounting. For this we use a similar table MR_ORDER_TREE_PREV.

The comparison result will be stored in a table MR_ORDER_TREE_COMP, which will additionally have two columns:

`stat`	Column for marking the state of elements: -1 - additional root element (more on that below) 0 - element deleted 1 - element added 2 - element properties changed 3 - unknown state (in case something went wrong) 4 - element did not change
`comm`	Comment providing additional status data

First, take the previous and current trees and drop them into a table MR_ORDER_TREE_COMP. At the same time, add a common root to them, item_idincrease the current tree by the (maximum value + 1) item_idtree with the previous state. We consider all elements of the old tree to be a deletion, and all elements of a new insert.

select nvl(max(item_id), 0) + 1
      into v_id
      from mr_order_tree_prev t
     where t.order_id = p_order_id;
    insert into MR_ORDER_TREE_COMP (ORDER_ID, ITEM_ID, KTS_ITEM_ID, ITEM_QTY, IS_ITEM_BUY, IS_ADD_WORK, ITEM_POSITION, N_GROUP, T_LEVEL, STAT, COMM)
                            values (p_order_id, -1, null, 0, 0, 0, 0, 0, 0, -1, 'Дополнительная голова для расчета');
    insert into MR_ORDER_TREE_COMP (ORDER_ID,
                                    ITEM_ID,
                                    ITEM_REF,
                                    KTS_ITEM_ID,
                                    KTS_ITEM_REF,
                                    ITEM_QTY,
                                    IS_ITEM_BUY,
                                    IS_ADD_WORK,
                                    ITEM_POSITION,
                                    N_GROUP,
                                    STAT,
                                    COMM)
    select p.order_id,
           p.item_id,
           nvl(p.item_ref, -1),
           p.kts_item_id,
           p.kts_item_ref,
           p.item_qty,
           p.is_item_buy,
           p.is_add_work,
           p.item_position,
           p.n_group,
           0,
           'удаление'
      from mr_order_tree_prev p
     where p.order_id = p_order_id;
    insert into MR_ORDER_TREE_COMP (ORDER_ID,
                                    ITEM_ID,
                                    ITEM_REF,
                                    KTS_ITEM_ID,
                                    KTS_ITEM_REF,
                                    ITEM_QTY,
                                    IS_ITEM_BUY,
                                    IS_ADD_WORK,
                                    ITEM_POSITION,
                                    N_GROUP,
                                    STAT,
                                    COMM)
    select p.order_id,
           p.item_id + v_id,
           case
             when p.item_ref is null
               then -1
             else p.item_ref + v_id
           end,
           p.kts_item_id,
           p.kts_item_ref,
           p.item_qty,
           p.is_item_buy,
           p.is_add_work,
           p.item_position,
           p.n_group,
           1,
           'вставка'
      from mr_order_tree p
     where p.order_id = p_order_id;

We also put on the elements of the summary tree their nesting level, so that in the future we do not calculate it every time.

for rec in (select item_id,
             level lev
        from (select *
                from mr_order_tree_comp
               where order_id = p_order_id)
     connect by prior item_id = item_ref
       start with item_id = -1)
loop
  update mr_order_tree_comp c
     set c.t_level = rec.lev
   where c.order_id = p_order_id
         and c.item_id = rec.item_id;
end loop;

We will save the state of the table mr_order_tree_compfor the recalculated order, this will be needed in the future. I used the collection, but I think that you can apply a temporary table.

  procedure save_tree_stat(p_order in number)
  is
  begin
    select TREE_BC_STAT_ROW(c.order_id, c.item_id, c.item_ref, c.kts_item_id, c.kts_item_ref)
      bulk collect into tree_before_calc
      from mr_order_tree_comp c
     where c.order_id = p_order;
  end save_tree_stat;

"Addition" of trees

Now you need to go through the resulting tree by levels and nodes in search of changes. First, we determine the maximum level:

select max(t_level)
  into v_max_lvl
  from mr_order_tree_comp
 where order_id = p_order_id;

Now in the loop we go through the levels of this tree. At each level, also in the loop, we will select the children of each node, and look for the element with the “opposite sign”. Simply put, look for the same kts_item_idwith the same item_ref, but state 1 for 0 and 0 for 1. After that, delete one of them, and transfer the incoming elements to the remaining node.

When an item is processed, put it in some list of processed items. I used the following procedure:

  procedure add_to_rdy (p_item in number,
                        p_order in number)
  is
  begin
    item_ready_list.extend;
    item_ready_list(item_ready_list.last) := tree_rdy_list_row(p_order, p_item);
  end add_to_rdy;

The "processing" of the element was returned by the function

function item_rdy(p_item in number, p_order in number) return number

who looked at the same collection.

The loop is as follows:

<>
     for i in 1..v_max_lvl
     loop
       <>
       for rh in (select c.*
                    from mr_order_tree_comp c
                   where c.order_id = p_order_id
                         and c.t_level = i)
       loop
         <>
         for rl in (select c.*
                      from mr_order_tree_comp c
                     where c.order_id = p_order_id
                           and c.item_ref = rh.item_id
                           and c.stat in (0, 1)
                     order by c.stat)
         loop
           if (item_rdy(rl.item_id, rl.order_id) = 0) then
             if (rl.stat = 0) then
               select count(*)
                 into v_cnt
                 from mr_order_tree_comp c
                where c.order_id = p_order_id
                      and c.item_ref = rh.item_id
                      and c.kts_item_id = rl.kts_item_id
                      and c.stat = 1;
               case
                 when (v_cnt = 1) then
                   select c.item_id
                     into v_item
                     from mr_order_tree_comp c
                    where c.order_id = p_order_id
                          and c.item_ref = rh.item_id
                          and c.kts_item_id = rl.kts_item_id
                          and c.stat = 1;
                   update mr_order_tree_comp c
                      set c.item_ref = v_item
                    where c.item_ref = rl.item_id
                          and c.order_id = p_order_id;
                   update mr_order_tree_comp c
                      set c.stat = 4
                    where c.item_id = v_item
                          and c.order_id = p_order_id;
                   diff_items(p_order_id, rl.item_id, v_item);
                   delete mr_order_tree_comp c
                    where c.item_id = rl.item_id
                          and c.order_id = p_order_id;
                   add_to_rdy(rl.item_id, rl.order_id);
                   add_to_rdy(v_item, rl.order_id);
               end case;
             end if;

For the (rl.stat = 1)logic is similar.

When the same element was found, then everything is simple - you just need to compare their properties. To do this, use the function diff_items. The situation when more than one element is found is more likely an exception and suggests that something is wrong with the composition tree.

Search for “Similar” Items

But what to do when one node has been replaced by another, a node has been inserted in the middle or a node has been removed from the middle?

To determine the described situations, I did not find anything smarter than just comparing the compositions of two nodes in order to determine the ratio of the number of identical kts_item_idto the total number of elements. If the value of this relationship is greater than a certain value, then the nodes are interchangeable. If the node has several replacement options at the current iteration of the loop, then the option with the highest “similarity coefficient” is taken.

Perhaps with such a bold decision I will ever shoot myself in the foot.

Add some code to the main one CASE.

when (v_cnt = 0) then
 select count(*)
   into v_cnt
   from mr_order_tree_comp c
  where c.order_id = p_order_id
        and c.item_ref = rh.item_id
        and c.stat = 1
        and not exists (select 1
                          from table(item_ready_list) a
                         where a.order_id = c.order_id
                               and a.item_id = c.item_id);

Managed to find one item in this node

 if (v_cnt = 1) then
   select c.item_id, c.kts_item_id
     into v_item,    v_kts
     from mr_order_tree_comp c
    where c.order_id = p_order_id
          and c.item_ref = rh.item_id
          and c.stat = 1
          and not exists (select 1
                            from table(item_ready_list) a
                           where a.order_id = c.order_id
                                 and a.item_id = c.item_id);

Let's transfer the contents of the current node to it without hesitation. Provided that both elements are assemblies. The current item is not deleted. Why? If the nodes do not have identical elements at all, then in the future everything will be returned to their places.

   if (is_item_comp(v_item, p_order_id) = is_item_comp(rl.item_id, p_order_id)) then
     update mr_order_tree_comp c
        set c.item_ref = v_item
      where c.item_ref = rl.item_id
            and c.order_id = p_order_id;
     add_to_rdy(rl.item_id, rl.order_id);
     add_to_rdy(v_item, rl.order_id);
   end if;
 end if;

If it was possible to find several elements, then the most similar assembly is taken. To determine the “similarity”, a procedure is used like_degree; the coefficient value for comparison is contained in a variable lperc.

if (v_cnt > 1) then
   begin
     select item_id, kts_item_id, max_lperc
       into v_item,  v_kts,       v_perc
       from (select c.item_id,
                    c.kts_item_id,
                    max(like_degree(rl.item_id, c.item_id, c.order_id)) max_lperc
               from mr_order_tree_comp c
              where c.order_id = p_order_id
                    and c.item_ref = rh.item_id
                    and c.stat = 1
                    and not exists (select 1
                                      from table(item_ready_list) a
                                     where a.order_id = c.order_id
                                           and a.item_id = c.item_id)
                    and is_item_comp(c.item_id, p_order_id) = (select is_item_comp(rl.item_id, p_order_id) from dual)
              group by c.item_id, c.kts_item_id
              order by max_lperc desc)
      where rownum < 2;
     if (v_perc >= lperc) then
       update mr_order_tree_comp c
          set c.item_ref = v_item
        where c.item_ref = rl.item_id
              and c.order_id = p_order_id;
       update mr_order_tree_comp c
          set c.comm = 'Удаление. Заменилось на ' || kts_pack.item_code(v_kts) || ' (' || to_char(v_perc) || '%)'
        where c.item_id = rl.item_id
              and c.order_id = p_order_id;
       add_to_rdy(rl.item_id, rl.order_id);
       add_to_rdy(v_item, rl.order_id);
     end if;
   end;
 end  if;

As a result, some of the elements will be re-linked, and the tree will have the following appearance.

If the additional root element that was added at the beginning is not needed, then it can be deleted.

Now you can take all the remaining elements with the status 0 and 1, and starting from them go up to the root. If the same element with the “opposite status” is found, then compare them, remove the element from 0 from the tree, and change the status for the element with 1.

<>
for rs in (select *
             from mr_order_tree_comp c
            where c.order_id = p_order_id
                  and c.stat in (0,1))
loop
  <>
  for rb in (select *
               from (select *
                       from mr_order_tree_comp c
                      where c.order_id = p_order_id) t
            connect by prior t.item_ref = t.item_id
              start with t.item_id = rs.item_id)
  loop
    select count(*)
      into v_cnt
      from mr_order_tree_comp c
     where c.item_ref = rb.item_id
           and c.kts_item_id = rs.kts_item_id
           and c.stat in (1,0)
           and c.stat != rs.stat;
    if (v_cnt = 1) then
      select c.item_id
        into v_item
        from mr_order_tree_comp c
       where c.item_ref = rb.item_id
             and c.kts_item_id = rs.kts_item_id
             and c.stat in (1,0)
             and c.stat != rs.stat;
      if (rs.stat = 0) then
        update mr_order_tree_comp c
           set c.stat = 4
         where c.order_id = p_order_id
               and c.item_id = v_item;
        diff_items(p_order_id, rs.item_id, v_item);
        update mr_order_tree_comp c
           set c.item_ref = v_item
         where c.order_id = p_order_id
               and c.item_ref = rs.item_id;
        delete mr_order_tree_comp
         where order_id = p_order_id
               and item_id = rs.item_id;
      end if;
      if (rs.stat = 1) then
        update mr_order_tree_comp c
           set c.stat = 4
         where c.order_id = p_order_id
               and c.item_id = rs.item_id;
        diff_items(p_order_id, rs.item_id, v_item);
        update mr_order_tree_comp c
           set c.item_ref = rs.item_id
         where c.order_id = p_order_id
               and c.item_ref = v_item;
        delete mr_order_tree_comp
         where order_id = p_order_id
               and item_id = v_item;
      end if;
      continue items;
    end if;
  end loop branch;
end loop items;

Now go through the remaining elements with status 0 and restore their previous ones item_ref. To do this, use the collection tree_before_calcin which the initial state of the tree was saved mr_order_tree_comp.

After that, the tree gets the desired look.

I believe that there are more beautiful, faster and more correct ways to compare trees. This is my option and I hope that it will be useful to someone and will show how to do it or how not to do it.

Tags:

Comparison * of tree graphs

Task

Training

"Addition" of trees

Search for “Similar” Items

Also popular now: