ztxn November 16, 2011 at 17:45

Materialized views as a means of monitoring data integrity

Data integrity control is one of the most important DBMS functions. The more carefully this control is organized, the easier it is to implement applied logic, because the more restrictions are controlled by the database, the fewer variations of “what if” should be provided for when implementing the logic. At the same time, integrity monitoring is quite convenient to use to verify the correct operation of the application layer. Something like unit tests. An “extra” check can sometimes do a very good job.

The traditional set of restrictions is the limitation of the primary, foreign keys, uniqueness when using normalization, which allows to satisfy the vast majority of cases of control needs. However, in the case when the restriction turns out to be dependent on the values in several tables and rows, these means are not enough. Such restrictions have to be implemented by trigger logic. And implementation is far from always simple. The developer has to keep in mind that data modification can be carried out in a competitive environment, so you need to take care of blocking resources yourself, while still trying to avoid deadlocks. Implementing a row restriction may require access to other rows of the same table, which,

But there is another way. In some cases, it becomes possible to use the restrictions imposed on materialized representations, updated upon the fact of committing transactions (fast refresh on commit). Such restrictions will work as deferred and will not allow you to commit a transaction if suddenly the data integrity is violated. Within the framework of a modifying transaction, restrictions may be violated. On the one hand, this simplifies data modification; on the other hand, it makes it difficult to identify the source of the error. In this article, I would like to give a couple of simple examples of the implementation of such restrictions.

Formulation of the problem

I would like to show the implementation of the approach using a fictitious simplified example. It turned out to be difficult enough to choose such an example so that it was simple enough to perceive, but at the same time, so that the application of the approach is justified, do not blame me if something suddenly happened wrong.

Let us have the need to account for goods in the context of the placement zone. The placement in this case is the store (S) or warehouse (W).

Zone - the physical or logical territory of each particular location. For example - a trading floor, or even shelves of a trading floor, a material room, a refrigerator, a zone of lost goods. Each location can have more than one zone of each type, but one zone of each type must be marked as primary. It will be used by default if the zone of operation is not explicitly defined. The main zone should be one and only one for each type of zone. This will be the first type of restriction that we will try to implement.

The second type of restriction is the composition of the zones. In our example, we restrict ourselves to setting rules for three types of zones:

Storage zone (K) - the zone in which goods are stored; this zone is required for the warehouse, but can also be defined for the store.
Area of the sales area (S) - required for the store and cannot be determined for the warehouse
The zone of the lost product (L) is the logical zone into which the product will be moved, the status of which is not clear, requires clarification. Mandatory for both warehouse and store

Each placement record can be in three states - draft (W), active (A), inactive (I). When the record is in the “Draft” state, we give the user the greatest freedom of action and allow us to violate this restriction.

Implementation

tables

	create table location (
	  loc number primary key
	  ,loc_type varchar2(1 char) check (loc_type in ('S','W'))
	  ,status varchar2(1 char) not null check (status in ('W','A','I'))
	  ,loc_desc varchar2(200 char)
	);
	create table zone(
	  zone number primary key
	  ,loc number references location(loc)
	  ,is_pirmary varchar(1 char) not null check (is_pirmary in ('Y','N'))
	  ,zone_type varchar2(1 char) not null 
	  ,zone_desc varchar2(200 char)
	);
	insert into location values (1,'S','W','Магазин 1 без ошибок');
	insert into zone values (1,1,'Y','K','Зона хранения магазина 1');
	insert into zone values (2,1,'Y','S','Торговый зал магазина 1');
	insert into zone values (3,1,'Y','L','Зона утерянного товара магазина 1');
	commit;

Limit the number of main zones

To implement this limitation, we will create a materialized representation that will calculate the main zones for each type of placement zone, and we will impose a constraint on top that controls strict equality to the unit of the calculated value. For queries on the basis of which materialized representations are built, a number of restrictions are defined , which is further tightened by imposing fast update requirements . In our case, we have an aggregated materialized view, and therefore we must create a materialized view log for the zone table, including rowid and new values, the list of fields of which should include all the values that may affect the query result

create materialized view log 
  on zone with rowid
               ,sequence
               (zone,loc,zone_type,is_primary) 
               including new values

We must also include in the result returned by the request the value “count (*)”

	create materialized view mv$zoneloc_pimary$chk
	  refresh fast on commit
	as
	select loc
	       ,zone_type
	       ,count(decode(is_primary,'Y',1)) primary_count 
	       ,count(*) cnt
	  from zone 
	  group by loc,zone_type;

It should be noted here: in order to evaluate whether the materialized view built on demand can use the fast method for updating, the dbms_mivew.explain_mview procedure exists . It is highly advisable to use it to control whether the fast update method is available for submission. For example, if we forgot to specify count (*) in the request, a materialized view would be successfully created and would work correctly when performing the insert operation. However, upon modification, deletion, the value of primary_count would not be recounted, which would violate the logic of our restriction. However, if we use explain_mview, Oracle will help us tell us our miscalculation.

	SQL> set serveroutput on
	SQL> declare
	  2    result SYS.ExplainMVArrayType;
	  3  begin
	  4    dbms_mview.explain_mview(mv => 'select loc
	  5                                         ,zone_type
	  6                                         ,count(decode(is_primary,''Y'',1)) primary_counnt
	  7                                         --,count(*) cnt
	  8                                    from zone
	  9                                    group by loc,zone_type'
	 10                             ,msg_array => result
	 11                            );
	 12    for i in 1..result.count
	 13    loop
	 14      dbms_output.put(rpad(result(i).capability_name,30,' '));
	 15      dbms_output.put(' '||result(i).POSSIBLE);
	 16      dbms_output.put(' '||result(i).MSGTXT);
	 17      dbms_output.put_line(null);
	 18    end loop;
	 19  end;
	 20  /
	PCT                            F 
	REFRESH_COMPLETE               T 
	REFRESH_FAST                   T 
	REWRITE                        T 
	PCT_TABLE                      F relation is not a partitioned table
	REFRESH_FAST_AFTER_INSERT      T 
	REFRESH_FAST_AFTER_ONETAB_DML  F COUNT(*) is not present in the select list
	REFRESH_FAST_AFTER_ANY_DML     F see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
	REFRESH_FAST_PCT               F PCT is not possible on any of the detail tables in the materialized view
	REWRITE_FULL_TEXT_MATCH        T 
	REWRITE_PARTIAL_TEXT_MATCH     T 
	REWRITE_GENERAL                T 
	REWRITE_PCT                    F general rewrite is not possible or PCT is not possible on any of the detail tables
	PCT_TABLE_REWRITE              F relation is not a partitioned table
	PL/SQL procedure successfully completed

So, the materialized view is created, it remains only to add a restriction

	alter table mv$zoneloc_pimary$chk 
	   add constraint zone_loc_primary$chk 
	         check (primary_count=1) 
	   deferrable initially deferred;

Please note that the restriction is created deferred. The fact is that in the process of updating the presentation by the Oracle, at some intermediate stage, it may turn out that the restriction will be temporarily violated. To avoid such false positives, it is better to set such restrictions deliberately delayed.

Check the operation of this restriction

	SQL> insert into location values (2,'S','W','Магазин 2 нарушения ограниченя основной зоны ');
	 1 row inserted
	 SQL> commit;
	 Commit complete

Let's try to create a zone with a type that does not have a “main” mark.

	SQL> insert into zone values (4,2,'N','S','Основной торговый зал не определен');
	1 row inserted
	SQL> commit;
	ORA-02091: transaction rolled back
	ORA-02290: check constraint (ZTXN.ZONE_LOC_PRIMARY$CHK) violated

We will try to identify two main zones for placement with the same type.

	SQL> insert into zone values (5,2,'Y','L','Первая основная зона утерянного товара');
	1 row inserted
	SQL> commit;
	Commit complete
	SQL> insert into zone values (6,2,'Y','L','Вторая основная зона утерянного товара');
	1 row inserted
	SQL> commit;
	ORA-02091: transaction rolled back
	ORA-02290: check constraint (ZTXN.ZONE_LOC_PRIMARY$CHK) violated

Restriction of the composition of the accommodation areas

This restriction differs from the previous one in that it relies on the values of not one table, but two. Those. at the same time, you must satisfy the requirements of the fast update method for views with connections and aggregate views . But this is unreal. We cannot simultaneously output the rowid of the attached row and count (*) to the result. For this reason, it is necessary to build a cascade of materialized representations. In one, data sets will be connected, in the other, aggregation.

First you need to create a materialized veiw log for the allocation table. For the zone table, the previously created log will be used.

create materialized view log 
  on location with rowid
                   ,sequence
                   (loc,loc_type,status) 
                   including new values;

Next we create join mivew. Unfortunately ANSI does not perceive syntax here, we will use old-style join.

	create materialized view mv$location$zone$join 
	  refresh fast on commit
	as
	select l.loc
	       ,l.loc_type
	       ,z.zone
	       ,z.zone_type
	       ,l.rowid l_rowid
	       ,z.rowid z_rowid
	 from location l
	      ,zone z 
	 where z.loc(+) = l.loc
	       and l.status in ('A','I')

Create materialized veiw log to join view

create materialized view log 
  on mv$location$zone$join 
    with rowid
         ,sequence
         (loc,loc_type,zone_type) 
         including new values;

Create an aggregated materialized view

	create materialized view mv$location$zone$agg
	  refresh fast on commit
	as
	select loc
	       ,loc_type
	       ,count(decode(zone_type,'K',1)) K_cnt
	       ,count(decode(zone_type,'S',1)) S_cnt
	       ,count(decode(zone_type,'L',1)) L_cnt
	       ,count(*) cnt
	from mv$location$zone$join
	group by loc,loc_type;

Well, the limitations themselves

	alter table mv$location$zone$agg 
	  add constraint wh_zones_chk 
	     check(loc_type != 'W' or K_cnt > 0 and S_cnt = 0 and L_cnt > 0) 
	  deferrable initially deferred;
	alter table mv$location$zone$agg 
	  add constraint store_zones_chk 
	     check(loc_type != 'S' or K_cnt >= 0 and S_cnt > 0 and L_cnt > 0)
	  deferrable initially deferred;

Check the operation of the restrictions:

	SQL> insert into location (loc,loc_type,status,loc_desc)
	  2         values (3,'S','W','Магазин 3 не определены обязательные зоны');
	1 row inserted
	SQL> commit;
	Commit complete

The placement has been successfully created in the draft status. Let's try to activate it:

	SQL> update location set status = 'A' where loc = 3;
	1 row updated
	SQL> commit;
	ORA-02091: transaction rolled back
	ORA-02290: check constraint (ZTXN.STORE_ZONES_CHK) violated

Not. You cannot activate a placement if the zones necessary for its type are not defined for it.

	SQL> insert into zone(zone,loc,is_primary,zone_type) values (7,3,'Y','S');
	1 row inserted
	SQL> insert into zone(zone,loc,is_primary,zone_type) values (8,3,'Y','L');
	1 row inserted
	SQL> update location set status = 'A' where loc = 3;
	1 row updated
	Commit complete

Conclusion

When to use it

First of all, when the connection is so complicated that the risk is high that the application developer will not be able to take everything into account, and the mismatch is critical. I first used this approach when I designed a structure looped over by itself with foreign keys through seven tables. Moreover, these tables are maintained by different business units. This limitation still stands today. And to this day, users find loopholes, send screenshots when this restriction is triggered, and it is not possible to reproduce to close the loophole in the application module, a combination of circumstances from several users is required.

It is very convenient to have such restrictions at the stage of testing, starting, stabilizing projects, when there is a lack of confidence in the correctness of the logic, and the data can be modified by the application link.

When not to use it

Obviously, you should not use this approach in cases where it gives a noticeable drawdown in performance.

In case of modification, refinement of the data scheme, the content of the views may not be reliable and will require a complete update. If the time for a complete update of a materialized view could jeopardize the implementation of technical work regulations, you probably should not use this approach either.

Tags: