Materialized views as a means of monitoring data integrity
Data integrity control is one of the most important DBMS functions. The more carefully this control is organized, the easier it is to implement applied logic, because the more restrictions are controlled by the database, the fewer variations of “what if” should be provided for when implementing the logic. At the same time, integrity monitoring is quite convenient to use to verify the correct operation of the application layer. Something like unit tests. An “extra” check can sometimes do a very good job.
The traditional set of restrictions is the limitation of the primary, foreign keys, uniqueness when using normalization, which allows to satisfy the vast majority of cases of control needs. However, in the case when the restriction turns out to be dependent on the values in several tables and rows, these means are not enough. Such restrictions have to be implemented by trigger logic. And implementation is far from always simple. The developer has to keep in mind that data modification can be carried out in a competitive environment, so you need to take care of blocking resources yourself, while still trying to avoid deadlocks. Implementing a row restriction may require access to other rows of the same table, which,
But there is another way. In some cases, it becomes possible to use the restrictions imposed on materialized representations, updated upon the fact of committing transactions (fast refresh on commit). Such restrictions will work as deferred and will not allow you to commit a transaction if suddenly the data integrity is violated. Within the framework of a modifying transaction, restrictions may be violated. On the one hand, this simplifies data modification; on the other hand, it makes it difficult to identify the source of the error. In this article, I would like to give a couple of simple examples of the implementation of such restrictions.
I would like to show the implementation of the approach using a fictitious simplified example. It turned out to be difficult enough to choose such an example so that it was simple enough to perceive, but at the same time, so that the application of the approach is justified, do not blame me if something suddenly happened wrong.
Let us have the need to account for goods in the context of the placement zone. The placement in this case is the store (S) or warehouse (W).
Zone - the physical or logical territory of each particular location. For example - a trading floor, or even shelves of a trading floor, a material room, a refrigerator, a zone of lost goods. Each location can have more than one zone of each type, but one zone of each type must be marked as primary. It will be used by default if the zone of operation is not explicitly defined. The main zone should be one and only one for each type of zone. This will be the first type of restriction that we will try to implement.
The second type of restriction is the composition of the zones. In our example, we restrict ourselves to setting rules for three types of zones:
Each placement record can be in three states - draft (W), active (A), inactive (I). When the record is in the “Draft” state, we give the user the greatest freedom of action and allow us to violate this restriction.
To implement this limitation, we will create a materialized representation that will calculate the main zones for each type of placement zone, and we will impose a constraint on top that controls strict equality to the unit of the calculated value. For queries on the basis of which materialized representations are built, a number of restrictions are defined , which is further tightened by imposing fast update requirements . In our case, we have an aggregated materialized view, and therefore we must create a materialized view log for the zone table, including rowid and new values, the list of fields of which should include all the values that may affect the query result
We must also include in the result returned by the request the value “count (*)”
It should be noted here: in order to evaluate whether the materialized view built on demand can use the fast method for updating, the dbms_mivew.explain_mview procedure exists . It is highly advisable to use it to control whether the fast update method is available for submission. For example, if we forgot to specify count (*) in the request, a materialized view would be successfully created and would work correctly when performing the insert operation. However, upon modification, deletion, the value of primary_count would not be recounted, which would violate the logic of our restriction. However, if we use explain_mview, Oracle will help us tell us our miscalculation.
So, the materialized view is created, it remains only to add a restriction
Please note that the restriction is created deferred. The fact is that in the process of updating the presentation by the Oracle, at some intermediate stage, it may turn out that the restriction will be temporarily violated. To avoid such false positives, it is better to set such restrictions deliberately delayed.
Check the operation of this restriction
Let's try to create a zone with a type that does not have a “main” mark.
We will try to identify two main zones for placement with the same type.
This restriction differs from the previous one in that it relies on the values of not one table, but two. Those. at the same time, you must satisfy the requirements of the fast update method for views with connections and aggregate views . But this is unreal. We cannot simultaneously output the rowid of the attached row and count (*) to the result. For this reason, it is necessary to build a cascade of materialized representations. In one, data sets will be connected, in the other, aggregation.
First you need to create a materialized veiw log for the allocation table. For the zone table, the previously created log will be used.
Next we create join mivew. Unfortunately ANSI does not perceive syntax here, we will use old-style join.
Create materialized veiw log to join view
Create an aggregated materialized view
Well, the limitations themselves
Check the operation of the restrictions:
The placement has been successfully created in the draft status. Let's try to activate it:
Not. You cannot activate a placement if the zones necessary for its type are not defined for it.
It is very convenient to have such restrictions at the stage of testing, starting, stabilizing projects, when there is a lack of confidence in the correctness of the logic, and the data can be modified by the application link.
Obviously, you should not use this approach in cases where it gives a noticeable drawdown in performance.
In case of modification, refinement of the data scheme, the content of the views may not be reliable and will require a complete update. If the time for a complete update of a materialized view could jeopardize the implementation of technical work regulations, you probably should not use this approach either.
The traditional set of restrictions is the limitation of the primary, foreign keys, uniqueness when using normalization, which allows to satisfy the vast majority of cases of control needs. However, in the case when the restriction turns out to be dependent on the values in several tables and rows, these means are not enough. Such restrictions have to be implemented by trigger logic. And implementation is far from always simple. The developer has to keep in mind that data modification can be carried out in a competitive environment, so you need to take care of blocking resources yourself, while still trying to avoid deadlocks. Implementing a row restriction may require access to other rows of the same table, which,
But there is another way. In some cases, it becomes possible to use the restrictions imposed on materialized representations, updated upon the fact of committing transactions (fast refresh on commit). Such restrictions will work as deferred and will not allow you to commit a transaction if suddenly the data integrity is violated. Within the framework of a modifying transaction, restrictions may be violated. On the one hand, this simplifies data modification; on the other hand, it makes it difficult to identify the source of the error. In this article, I would like to give a couple of simple examples of the implementation of such restrictions.
Formulation of the problem
I would like to show the implementation of the approach using a fictitious simplified example. It turned out to be difficult enough to choose such an example so that it was simple enough to perceive, but at the same time, so that the application of the approach is justified, do not blame me if something suddenly happened wrong.
Let us have the need to account for goods in the context of the placement zone. The placement in this case is the store (S) or warehouse (W).
Zone - the physical or logical territory of each particular location. For example - a trading floor, or even shelves of a trading floor, a material room, a refrigerator, a zone of lost goods. Each location can have more than one zone of each type, but one zone of each type must be marked as primary. It will be used by default if the zone of operation is not explicitly defined. The main zone should be one and only one for each type of zone. This will be the first type of restriction that we will try to implement.
The second type of restriction is the composition of the zones. In our example, we restrict ourselves to setting rules for three types of zones:
- Storage zone (K) - the zone in which goods are stored; this zone is required for the warehouse, but can also be defined for the store.
- Area of the sales area (S) - required for the store and cannot be determined for the warehouse
- The zone of the lost product (L) is the logical zone into which the product will be moved, the status of which is not clear, requires clarification. Mandatory for both warehouse and store
Each placement record can be in three states - draft (W), active (A), inactive (I). When the record is in the “Draft” state, we give the user the greatest freedom of action and allow us to violate this restriction.
Implementation
tables
create table location (
loc number primary key
,loc_type varchar2(1 char) check (loc_type in ('S','W'))
,status varchar2(1 char) not null check (status in ('W','A','I'))
,loc_desc varchar2(200 char)
);
create table zone(
zone number primary key
,loc number references location(loc)
,is_pirmary varchar(1 char) not null check (is_pirmary in ('Y','N'))
,zone_type varchar2(1 char) not null
,zone_desc varchar2(200 char)
);
insert into location values (1,'S','W','Магазин 1 без ошибок');
insert into zone values (1,1,'Y','K','Зона хранения магазина 1');
insert into zone values (2,1,'Y','S','Торговый зал магазина 1');
insert into zone values (3,1,'Y','L','Зона утерянного товара магазина 1');
commit;
Limit the number of main zones
To implement this limitation, we will create a materialized representation that will calculate the main zones for each type of placement zone, and we will impose a constraint on top that controls strict equality to the unit of the calculated value. For queries on the basis of which materialized representations are built, a number of restrictions are defined , which is further tightened by imposing fast update requirements . In our case, we have an aggregated materialized view, and therefore we must create a materialized view log for the zone table, including rowid and new values, the list of fields of which should include all the values that may affect the query result
create materialized view log
on zone with rowid
,sequence
(zone,loc,zone_type,is_primary)
including new values
We must also include in the result returned by the request the value “count (*)”
create materialized view mv$zoneloc_pimary$chk
refresh fast on commit
as
select loc
,zone_type
,count(decode(is_primary,'Y',1)) primary_count
,count(*) cnt
from zone
group by loc,zone_type;
It should be noted here: in order to evaluate whether the materialized view built on demand can use the fast method for updating, the dbms_mivew.explain_mview procedure exists . It is highly advisable to use it to control whether the fast update method is available for submission. For example, if we forgot to specify count (*) in the request, a materialized view would be successfully created and would work correctly when performing the insert operation. However, upon modification, deletion, the value of primary_count would not be recounted, which would violate the logic of our restriction. However, if we use explain_mview, Oracle will help us tell us our miscalculation.
SQL> set serveroutput on
SQL> declare
2 result SYS.ExplainMVArrayType;
3 begin
4 dbms_mview.explain_mview(mv => 'select loc
5 ,zone_type
6 ,count(decode(is_primary,''Y'',1)) primary_counnt
7 --,count(*) cnt
8 from zone
9 group by loc,zone_type'
10 ,msg_array => result
11 );
12 for i in 1..result.count
13 loop
14 dbms_output.put(rpad(result(i).capability_name,30,' '));
15 dbms_output.put(' '||result(i).POSSIBLE);
16 dbms_output.put(' '||result(i).MSGTXT);
17 dbms_output.put_line(null);
18 end loop;
19 end;
20 /
PCT F
REFRESH_COMPLETE T
REFRESH_FAST T
REWRITE T
PCT_TABLE F relation is not a partitioned table
REFRESH_FAST_AFTER_INSERT T
REFRESH_FAST_AFTER_ONETAB_DML F COUNT(*) is not present in the select list
REFRESH_FAST_AFTER_ANY_DML F see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
REFRESH_FAST_PCT F PCT is not possible on any of the detail tables in the materialized view
REWRITE_FULL_TEXT_MATCH T
REWRITE_PARTIAL_TEXT_MATCH T
REWRITE_GENERAL T
REWRITE_PCT F general rewrite is not possible or PCT is not possible on any of the detail tables
PCT_TABLE_REWRITE F relation is not a partitioned table
PL/SQL procedure successfully completed
So, the materialized view is created, it remains only to add a restriction
alter table mv$zoneloc_pimary$chk
add constraint zone_loc_primary$chk
check (primary_count=1)
deferrable initially deferred;
Please note that the restriction is created deferred. The fact is that in the process of updating the presentation by the Oracle, at some intermediate stage, it may turn out that the restriction will be temporarily violated. To avoid such false positives, it is better to set such restrictions deliberately delayed.
Check the operation of this restriction
SQL> insert into location values (2,'S','W','Магазин 2 нарушения ограниченя основной зоны ');
1 row inserted
SQL> commit;
Commit complete
Let's try to create a zone with a type that does not have a “main” mark.
SQL> insert into zone values (4,2,'N','S','Основной торговый зал не определен');
1 row inserted
SQL> commit;
ORA-02091: transaction rolled back
ORA-02290: check constraint (ZTXN.ZONE_LOC_PRIMARY$CHK) violated
We will try to identify two main zones for placement with the same type.
SQL> insert into zone values (5,2,'Y','L','Первая основная зона утерянного товара');
1 row inserted
SQL> commit;
Commit complete
SQL> insert into zone values (6,2,'Y','L','Вторая основная зона утерянного товара');
1 row inserted
SQL> commit;
ORA-02091: transaction rolled back
ORA-02290: check constraint (ZTXN.ZONE_LOC_PRIMARY$CHK) violated
Restriction of the composition of the accommodation areas
This restriction differs from the previous one in that it relies on the values of not one table, but two. Those. at the same time, you must satisfy the requirements of the fast update method for views with connections and aggregate views . But this is unreal. We cannot simultaneously output the rowid of the attached row and count (*) to the result. For this reason, it is necessary to build a cascade of materialized representations. In one, data sets will be connected, in the other, aggregation.
First you need to create a materialized veiw log for the allocation table. For the zone table, the previously created log will be used.
create materialized view log
on location with rowid
,sequence
(loc,loc_type,status)
including new values;
Next we create join mivew. Unfortunately ANSI does not perceive syntax here, we will use old-style join.
create materialized view mv$location$zone$join
refresh fast on commit
as
select l.loc
,l.loc_type
,z.zone
,z.zone_type
,l.rowid l_rowid
,z.rowid z_rowid
from location l
,zone z
where z.loc(+) = l.loc
and l.status in ('A','I')
Create materialized veiw log to join view
create materialized view log
on mv$location$zone$join
with rowid
,sequence
(loc,loc_type,zone_type)
including new values;
Create an aggregated materialized view
create materialized view mv$location$zone$agg
refresh fast on commit
as
select loc
,loc_type
,count(decode(zone_type,'K',1)) K_cnt
,count(decode(zone_type,'S',1)) S_cnt
,count(decode(zone_type,'L',1)) L_cnt
,count(*) cnt
from mv$location$zone$join
group by loc,loc_type;
Well, the limitations themselves
alter table mv$location$zone$agg
add constraint wh_zones_chk
check(loc_type != 'W' or K_cnt > 0 and S_cnt = 0 and L_cnt > 0)
deferrable initially deferred;
alter table mv$location$zone$agg
add constraint store_zones_chk
check(loc_type != 'S' or K_cnt >= 0 and S_cnt > 0 and L_cnt > 0)
deferrable initially deferred;
Check the operation of the restrictions:
SQL> insert into location (loc,loc_type,status,loc_desc)
2 values (3,'S','W','Магазин 3 не определены обязательные зоны');
1 row inserted
SQL> commit;
Commit complete
The placement has been successfully created in the draft status. Let's try to activate it:
SQL> update location set status = 'A' where loc = 3;
1 row updated
SQL> commit;
ORA-02091: transaction rolled back
ORA-02290: check constraint (ZTXN.STORE_ZONES_CHK) violated
Not. You cannot activate a placement if the zones necessary for its type are not defined for it.
SQL> insert into zone(zone,loc,is_primary,zone_type) values (7,3,'Y','S');
1 row inserted
SQL> insert into zone(zone,loc,is_primary,zone_type) values (8,3,'Y','L');
1 row inserted
SQL> update location set status = 'A' where loc = 3;
1 row updated
Commit complete
Conclusion
When to use it
First of all, when the connection is so complicated that the risk is high that the application developer will not be able to take everything into account, and the mismatch is critical. I first used this approach when I designed a structure looped over by itself with foreign keys through seven tables. Moreover, these tables are maintained by different business units. This limitation still stands today. And to this day, users find loopholes, send screenshots when this restriction is triggered, and it is not possible to reproduce to close the loophole in the application module, a combination of circumstances from several users is required.It is very convenient to have such restrictions at the stage of testing, starting, stabilizing projects, when there is a lack of confidence in the correctness of the logic, and the data can be modified by the application link.
When not to use it
Obviously, you should not use this approach in cases where it gives a noticeable drawdown in performance.
In case of modification, refinement of the data scheme, the content of the views may not be reliable and will require a complete update. If the time for a complete update of a materialized view could jeopardize the implementation of technical work regulations, you probably should not use this approach either.