Comparison * of tree graphs
Hello, Habr!
* Actually not quite so. When developing an information system, part of which is the various processing of design and technological documentation, I had a problem that can be briefly described as follows. Today we have one composition of the product, several changes in different parts of this product come in a day, and by the evening it is not clear what has changed? Products can sometimes have more than 10,000 elements in the composition, the elements are not unique, and the reality is that changes in composition can actively come, although the product is almost ready. Misunderstanding the scope of change makes planning difficult.
The composition of the product can be represented as a tree graph. Having not found a suitable way to compare the two graphs, I decided to write mybike .
During the day, various changes from the design system come into the accounting system. Production planning is based on data from the accounting system. Conditions allow you to accept all changes per day and recalculate the product specification at night. But, as I wrote above, it is not clear how yesterday’s state differs from today's state.
I want to see what has left the product tree, and what has been added. I also want to see which part or assembly replaced another part or assembly. For example, if an intermediate node was added to a tree branch, then in this case it would be wrong to assume that all children are removed from the old places and inserted into new places. They remained in their places, but an intermediate node was inserted. In addition, an element can “travel” up and down only inside one tree branch, this is due to the specifics of the manufacturing process.
The product specification is recalculated by the PL / SQL procedure inside Oracle. Therefore, I found it logical to make my search for changes in PL / SQL.
The table in which the product tree is stored, let's call it
Ligaments are
When recalculating the specification, data on changed orders are completely deleted and re-read. Therefore, you need to save the state of the tree until recounting. For this we use a similar table
The comparison result will be stored in a table
First, take the previous and current trees and drop them into a table
We also put on the elements of the summary tree their nesting level, so that in the future we do not calculate it every time.
We will save the state of the table
Now you need to go through the resulting tree by levels and nodes in search of changes. First, we determine the maximum level:
Now in the loop we go through the levels of this tree. At each level, also in the loop, we will select the children of each node, and look for the element with the “opposite sign”. Simply put, look for the same
When an item is processed, put it in some list of processed items. I used the following procedure:
The "processing" of the element was returned by the function
who looked at the same collection.
The loop is as follows:
For the
When the same element was found, then everything is simple - you just need to compare their properties. To do this, use the function
But what to do when one node has been replaced by another, a node has been inserted in the middle or a node has been removed from the middle?
To determine the described situations, I did not find anything smarter than just comparing the compositions of two nodes in order to determine the ratio of the number of identical
Perhaps with such a bold decision I will ever shoot myself in the foot.
Add some code to the main one
Managed to find one item in this node
Let's transfer the contents of the current node to it without hesitation. Provided that both elements are assemblies. The current item is not deleted. Why? If the nodes do not have identical elements at all, then in the future everything will be returned to their places.
If it was possible to find several elements, then the most similar assembly is taken. To determine the “similarity”, a procedure is used
As a result, some of the elements will be re-linked, and the tree will have the following appearance.
If the additional root element that was added at the beginning is not needed, then it can be deleted.
Now you can take all the remaining elements with the status 0 and 1, and starting from them go up to the root. If the same element with the “opposite status” is found, then compare them, remove the element from 0 from the tree, and change the status for the element with 1.
Now go through the remaining elements with status 0 and restore their previous ones
After that, the tree gets the desired look.
I believe that there are more beautiful, faster and more correct ways to compare trees. This is my option and I hope that it will be useful to someone and will show how to do it or how not to do it.
* Actually not quite so. When developing an information system, part of which is the various processing of design and technological documentation, I had a problem that can be briefly described as follows. Today we have one composition of the product, several changes in different parts of this product come in a day, and by the evening it is not clear what has changed? Products can sometimes have more than 10,000 elements in the composition, the elements are not unique, and the reality is that changes in composition can actively come, although the product is almost ready. Misunderstanding the scope of change makes planning difficult.
The composition of the product can be represented as a tree graph. Having not found a suitable way to compare the two graphs, I decided to write my
Task
During the day, various changes from the design system come into the accounting system. Production planning is based on data from the accounting system. Conditions allow you to accept all changes per day and recalculate the product specification at night. But, as I wrote above, it is not clear how yesterday’s state differs from today's state.
I want to see what has left the product tree, and what has been added. I also want to see which part or assembly replaced another part or assembly. For example, if an intermediate node was added to a tree branch, then in this case it would be wrong to assume that all children are removed from the old places and inserted into new places. They remained in their places, but an intermediate node was inserted. In addition, an element can “travel” up and down only inside one tree branch, this is due to the specifics of the manufacturing process.
Training
The product specification is recalculated by the PL / SQL procedure inside Oracle. Therefore, I found it logical to make my search for changes in PL / SQL.
The table in which the product tree is stored, let's call it
MR_ORDER_TREE
, has the following form:order_id | ID of the order on which the tree is mounted |
item_id | The item ID in the tree, together with order_id forms the primary key and is unique within the order |
item_ref | ID of the item that the selected item belongs to |
kts_item_id | ID from the directory of parts and assemblies |
item_qty | number |
is_item_buy | Is the product purchased |
item_position | Assembly Position Number |
Ligaments are
(item_ref, kts_item_id)
unique. In addition to the purchase, position and quantity, there are other attributes of a particular element, but we are not talking about them. When recalculating the specification, data on changed orders are completely deleted and re-read. Therefore, you need to save the state of the tree until recounting. For this we use a similar table
MR_ORDER_TREE_PREV
. The comparison result will be stored in a table
MR_ORDER_TREE_COMP
, which will additionally have two columns:stat | Column for marking the state of elements: -1 - additional root element (more on that below) 0 - element deleted 1 - element added 2 - element properties changed 3 - unknown state (in case something went wrong) 4 - element did not change |
comm | Comment providing additional status data |
First, take the previous and current trees and drop them into a table
MR_ORDER_TREE_COMP
. At the same time, add a common root to them, item_id
increase the current tree by the (maximum value + 1) item_id
tree with the previous state. We consider all elements of the old tree to be a deletion, and all elements of a new insert.select nvl(max(item_id), 0) + 1
into v_id
from mr_order_tree_prev t
where t.order_id = p_order_id;
insert into MR_ORDER_TREE_COMP (ORDER_ID, ITEM_ID, KTS_ITEM_ID, ITEM_QTY, IS_ITEM_BUY, IS_ADD_WORK, ITEM_POSITION, N_GROUP, T_LEVEL, STAT, COMM)
values (p_order_id, -1, null, 0, 0, 0, 0, 0, 0, -1, 'Дополнительная голова для расчета');
insert into MR_ORDER_TREE_COMP (ORDER_ID,
ITEM_ID,
ITEM_REF,
KTS_ITEM_ID,
KTS_ITEM_REF,
ITEM_QTY,
IS_ITEM_BUY,
IS_ADD_WORK,
ITEM_POSITION,
N_GROUP,
STAT,
COMM)
select p.order_id,
p.item_id,
nvl(p.item_ref, -1),
p.kts_item_id,
p.kts_item_ref,
p.item_qty,
p.is_item_buy,
p.is_add_work,
p.item_position,
p.n_group,
0,
'удаление'
from mr_order_tree_prev p
where p.order_id = p_order_id;
insert into MR_ORDER_TREE_COMP (ORDER_ID,
ITEM_ID,
ITEM_REF,
KTS_ITEM_ID,
KTS_ITEM_REF,
ITEM_QTY,
IS_ITEM_BUY,
IS_ADD_WORK,
ITEM_POSITION,
N_GROUP,
STAT,
COMM)
select p.order_id,
p.item_id + v_id,
case
when p.item_ref is null
then -1
else p.item_ref + v_id
end,
p.kts_item_id,
p.kts_item_ref,
p.item_qty,
p.is_item_buy,
p.is_add_work,
p.item_position,
p.n_group,
1,
'вставка'
from mr_order_tree p
where p.order_id = p_order_id;
We also put on the elements of the summary tree their nesting level, so that in the future we do not calculate it every time.
for rec in (select item_id,
level lev
from (select *
from mr_order_tree_comp
where order_id = p_order_id)
connect by prior item_id = item_ref
start with item_id = -1)
loop
update mr_order_tree_comp c
set c.t_level = rec.lev
where c.order_id = p_order_id
and c.item_id = rec.item_id;
end loop;
We will save the state of the table
mr_order_tree_comp
for the recalculated order, this will be needed in the future. I used the collection, but I think that you can apply a temporary table. procedure save_tree_stat(p_order in number)
is
begin
select TREE_BC_STAT_ROW(c.order_id, c.item_id, c.item_ref, c.kts_item_id, c.kts_item_ref)
bulk collect into tree_before_calc
from mr_order_tree_comp c
where c.order_id = p_order;
end save_tree_stat;
"Addition" of trees
Now you need to go through the resulting tree by levels and nodes in search of changes. First, we determine the maximum level:
select max(t_level)
into v_max_lvl
from mr_order_tree_comp
where order_id = p_order_id;
Now in the loop we go through the levels of this tree. At each level, also in the loop, we will select the children of each node, and look for the element with the “opposite sign”. Simply put, look for the same
kts_item_id
with the same item_ref
, but state 1 for 0 and 0 for 1. After that, delete one of them, and transfer the incoming elements to the remaining node. When an item is processed, put it in some list of processed items. I used the following procedure:
procedure add_to_rdy (p_item in number,
p_order in number)
is
begin
item_ready_list.extend;
item_ready_list(item_ready_list.last) := tree_rdy_list_row(p_order, p_item);
end add_to_rdy;
The "processing" of the element was returned by the function
function item_rdy(p_item in number, p_order in number) return number
who looked at the same collection.
The loop is as follows:
<>
for i in 1..v_max_lvl
loop
<>
for rh in (select c.*
from mr_order_tree_comp c
where c.order_id = p_order_id
and c.t_level = i)
loop
<>
for rl in (select c.*
from mr_order_tree_comp c
where c.order_id = p_order_id
and c.item_ref = rh.item_id
and c.stat in (0, 1)
order by c.stat)
loop
if (item_rdy(rl.item_id, rl.order_id) = 0) then
if (rl.stat = 0) then
select count(*)
into v_cnt
from mr_order_tree_comp c
where c.order_id = p_order_id
and c.item_ref = rh.item_id
and c.kts_item_id = rl.kts_item_id
and c.stat = 1;
case
when (v_cnt = 1) then
select c.item_id
into v_item
from mr_order_tree_comp c
where c.order_id = p_order_id
and c.item_ref = rh.item_id
and c.kts_item_id = rl.kts_item_id
and c.stat = 1;
update mr_order_tree_comp c
set c.item_ref = v_item
where c.item_ref = rl.item_id
and c.order_id = p_order_id;
update mr_order_tree_comp c
set c.stat = 4
where c.item_id = v_item
and c.order_id = p_order_id;
diff_items(p_order_id, rl.item_id, v_item);
delete mr_order_tree_comp c
where c.item_id = rl.item_id
and c.order_id = p_order_id;
add_to_rdy(rl.item_id, rl.order_id);
add_to_rdy(v_item, rl.order_id);
end case;
end if;
For the
(rl.stat = 1)
logic is similar. When the same element was found, then everything is simple - you just need to compare their properties. To do this, use the function
diff_items
. The situation when more than one element is found is more likely an exception and suggests that something is wrong with the composition tree.Search for “Similar” Items
But what to do when one node has been replaced by another, a node has been inserted in the middle or a node has been removed from the middle?
To determine the described situations, I did not find anything smarter than just comparing the compositions of two nodes in order to determine the ratio of the number of identical
kts_item_id
to the total number of elements. If the value of this relationship is greater than a certain value, then the nodes are interchangeable. If the node has several replacement options at the current iteration of the loop, then the option with the highest “similarity coefficient” is taken. Perhaps with such a bold decision I will ever shoot myself in the foot.
Add some code to the main one
CASE
.when (v_cnt = 0) then
select count(*)
into v_cnt
from mr_order_tree_comp c
where c.order_id = p_order_id
and c.item_ref = rh.item_id
and c.stat = 1
and not exists (select 1
from table(item_ready_list) a
where a.order_id = c.order_id
and a.item_id = c.item_id);
Managed to find one item in this node
if (v_cnt = 1) then
select c.item_id, c.kts_item_id
into v_item, v_kts
from mr_order_tree_comp c
where c.order_id = p_order_id
and c.item_ref = rh.item_id
and c.stat = 1
and not exists (select 1
from table(item_ready_list) a
where a.order_id = c.order_id
and a.item_id = c.item_id);
Let's transfer the contents of the current node to it without hesitation. Provided that both elements are assemblies. The current item is not deleted. Why? If the nodes do not have identical elements at all, then in the future everything will be returned to their places.
if (is_item_comp(v_item, p_order_id) = is_item_comp(rl.item_id, p_order_id)) then
update mr_order_tree_comp c
set c.item_ref = v_item
where c.item_ref = rl.item_id
and c.order_id = p_order_id;
add_to_rdy(rl.item_id, rl.order_id);
add_to_rdy(v_item, rl.order_id);
end if;
end if;
If it was possible to find several elements, then the most similar assembly is taken. To determine the “similarity”, a procedure is used
like_degree
; the coefficient value for comparison is contained in a variable lperc
.if (v_cnt > 1) then
begin
select item_id, kts_item_id, max_lperc
into v_item, v_kts, v_perc
from (select c.item_id,
c.kts_item_id,
max(like_degree(rl.item_id, c.item_id, c.order_id)) max_lperc
from mr_order_tree_comp c
where c.order_id = p_order_id
and c.item_ref = rh.item_id
and c.stat = 1
and not exists (select 1
from table(item_ready_list) a
where a.order_id = c.order_id
and a.item_id = c.item_id)
and is_item_comp(c.item_id, p_order_id) = (select is_item_comp(rl.item_id, p_order_id) from dual)
group by c.item_id, c.kts_item_id
order by max_lperc desc)
where rownum < 2;
if (v_perc >= lperc) then
update mr_order_tree_comp c
set c.item_ref = v_item
where c.item_ref = rl.item_id
and c.order_id = p_order_id;
update mr_order_tree_comp c
set c.comm = 'Удаление. Заменилось на ' || kts_pack.item_code(v_kts) || ' (' || to_char(v_perc) || '%)'
where c.item_id = rl.item_id
and c.order_id = p_order_id;
add_to_rdy(rl.item_id, rl.order_id);
add_to_rdy(v_item, rl.order_id);
end if;
end;
end if;
As a result, some of the elements will be re-linked, and the tree will have the following appearance.
If the additional root element that was added at the beginning is not needed, then it can be deleted.
Now you can take all the remaining elements with the status 0 and 1, and starting from them go up to the root. If the same element with the “opposite status” is found, then compare them, remove the element from 0 from the tree, and change the status for the element with 1.
<>
for rs in (select *
from mr_order_tree_comp c
where c.order_id = p_order_id
and c.stat in (0,1))
loop
<>
for rb in (select *
from (select *
from mr_order_tree_comp c
where c.order_id = p_order_id) t
connect by prior t.item_ref = t.item_id
start with t.item_id = rs.item_id)
loop
select count(*)
into v_cnt
from mr_order_tree_comp c
where c.item_ref = rb.item_id
and c.kts_item_id = rs.kts_item_id
and c.stat in (1,0)
and c.stat != rs.stat;
if (v_cnt = 1) then
select c.item_id
into v_item
from mr_order_tree_comp c
where c.item_ref = rb.item_id
and c.kts_item_id = rs.kts_item_id
and c.stat in (1,0)
and c.stat != rs.stat;
if (rs.stat = 0) then
update mr_order_tree_comp c
set c.stat = 4
where c.order_id = p_order_id
and c.item_id = v_item;
diff_items(p_order_id, rs.item_id, v_item);
update mr_order_tree_comp c
set c.item_ref = v_item
where c.order_id = p_order_id
and c.item_ref = rs.item_id;
delete mr_order_tree_comp
where order_id = p_order_id
and item_id = rs.item_id;
end if;
if (rs.stat = 1) then
update mr_order_tree_comp c
set c.stat = 4
where c.order_id = p_order_id
and c.item_id = rs.item_id;
diff_items(p_order_id, rs.item_id, v_item);
update mr_order_tree_comp c
set c.item_ref = rs.item_id
where c.order_id = p_order_id
and c.item_ref = v_item;
delete mr_order_tree_comp
where order_id = p_order_id
and item_id = v_item;
end if;
continue items;
end if;
end loop branch;
end loop items;
Now go through the remaining elements with status 0 and restore their previous ones
item_ref
. To do this, use the collection tree_before_calc
in which the initial state of the tree was saved mr_order_tree_comp
. After that, the tree gets the desired look.
I believe that there are more beautiful, faster and more correct ways to compare trees. This is my option and I hope that it will be useful to someone and will show how to do it or how not to do it.