Materialized views as a means of monitoring data integrity

    Data integrity control is one of the most important DBMS functions. The more carefully this control is organized, the easier it is to implement applied logic, because the more restrictions are controlled by the database, the fewer variations of “what if” should be provided for when implementing the logic. At the same time, integrity monitoring is quite convenient to use to verify the correct operation of the application layer. Something like unit tests. An “extra” check can sometimes do a very good job.

    The traditional set of restrictions is the limitation of the primary, foreign keys, uniqueness when using normalization, which allows to satisfy the vast majority of cases of control needs. However, in the case when the restriction turns out to be dependent on the values ​​in several tables and rows, these means are not enough. Such restrictions have to be implemented by trigger logic. And implementation is far from always simple. The developer has to keep in mind that data modification can be carried out in a competitive environment, so you need to take care of blocking resources yourself, while still trying to avoid deadlocks. Implementing a row restriction may require access to other rows of the same table, which,

    But there is another way. In some cases, it becomes possible to use the restrictions imposed on materialized representations, updated upon the fact of committing transactions (fast refresh on commit). Such restrictions will work as deferred and will not allow you to commit a transaction if suddenly the data integrity is violated. Within the framework of a modifying transaction, restrictions may be violated. On the one hand, this simplifies data modification; on the other hand, it makes it difficult to identify the source of the error. In this article, I would like to give a couple of simple examples of the implementation of such restrictions.

    Formulation of the problem


    I would like to show the implementation of the approach using a fictitious simplified example. It turned out to be difficult enough to choose such an example so that it was simple enough to perceive, but at the same time, so that the application of the approach is justified, do not blame me if something suddenly happened wrong.

    Let us have the need to account for goods in the context of the placement zone. The placement in this case is the store (S) or warehouse (W).

    Zone - the physical or logical territory of each particular location. For example - a trading floor, or even shelves of a trading floor, a material room, a refrigerator, a zone of lost goods. Each location can have more than one zone of each type, but one zone of each type must be marked as primary. It will be used by default if the zone of operation is not explicitly defined. The main zone should be one and only one for each type of zone. This will be the first type of restriction that we will try to implement.

    The second type of restriction is the composition of the zones. In our example, we restrict ourselves to setting rules for three types of zones:
    • Storage zone (K) - the zone in which goods are stored; this zone is required for the warehouse, but can also be defined for the store.
    • Area of ​​the sales area (S) - required for the store and cannot be determined for the warehouse
    • The zone of the lost product (L) is the logical zone into which the product will be moved, the status of which is not clear, requires clarification. Mandatory for both warehouse and store

    Each placement record can be in three states - draft (W), active (A), inactive (I). When the record is in the “Draft” state, we give the user the greatest freedom of action and allow us to violate this restriction.

    Implementation


    tables

    	create table location (
    	  loc number primary key
    	  ,loc_type varchar2(1 char) check (loc_type in ('S','W'))
    	  ,status varchar2(1 char) not null check (status in ('W','A','I'))
    	  ,loc_desc varchar2(200 char)
    	);
    	create table zone(
    	  zone number primary key
    	  ,loc number references location(loc)
    	  ,is_pirmary varchar(1 char) not null check (is_pirmary in ('Y','N'))
    	  ,zone_type varchar2(1 char) not null 
    	  ,zone_desc varchar2(200 char)
    	);
    	insert into location values (1,'S','W','Магазин 1 без ошибок');
    	insert into zone values (1,1,'Y','K','Зона хранения магазина 1');
    	insert into zone values (2,1,'Y','S','Торговый зал магазина 1');
    	insert into zone values (3,1,'Y','L','Зона утерянного товара магазина 1');
    	commit;
    	


    Limit the number of main zones


    To implement this limitation, we will create a materialized representation that will calculate the main zones for each type of placement zone, and we will impose a constraint on top that controls strict equality to the unit of the calculated value. For queries on the basis of which materialized representations are built, a number of restrictions are defined , which is further tightened by imposing fast update requirements . In our case, we have an aggregated materialized view, and therefore we must create a materialized view log for the zone table, including rowid and new values, the list of fields of which should include all the values ​​that may affect the query result
    create materialized view log 
      on zone with rowid
                   ,sequence
                   (zone,loc,zone_type,is_primary) 
                   including new values
    

    We must also include in the result returned by the request the value “count (*)”
    	create materialized view mv$zoneloc_pimary$chk
    	  refresh fast on commit
    	as
    	select loc
    	       ,zone_type
    	       ,count(decode(is_primary,'Y',1)) primary_count 
    	       ,count(*) cnt
    	  from zone 
    	  group by loc,zone_type;
    	

    It should be noted here: in order to evaluate whether the materialized view built on demand can use the fast method for updating, the dbms_mivew.explain_mview procedure exists . It is highly advisable to use it to control whether the fast update method is available for submission. For example, if we forgot to specify count (*) in the request, a materialized view would be successfully created and would work correctly when performing the insert operation. However, upon modification, deletion, the value of primary_count would not be recounted, which would violate the logic of our restriction. However, if we use explain_mview, Oracle will help us tell us our miscalculation.
    	SQL> set serveroutput on
    	SQL> declare
    	  2    result SYS.ExplainMVArrayType;
    	  3  begin
    	  4    dbms_mview.explain_mview(mv => 'select loc
    	  5                                         ,zone_type
    	  6                                         ,count(decode(is_primary,''Y'',1)) primary_counnt
    	  7                                         --,count(*) cnt
    	  8                                    from zone
    	  9                                    group by loc,zone_type'
    	 10                             ,msg_array => result
    	 11                            );
    	 12    for i in 1..result.count
    	 13    loop
    	 14      dbms_output.put(rpad(result(i).capability_name,30,' '));
    	 15      dbms_output.put(' '||result(i).POSSIBLE);
    	 16      dbms_output.put(' '||result(i).MSGTXT);
    	 17      dbms_output.put_line(null);
    	 18    end loop;
    	 19  end;
    	 20  /
    	PCT                            F 
    	REFRESH_COMPLETE               T 
    	REFRESH_FAST                   T 
    	REWRITE                        T 
    	PCT_TABLE                      F relation is not a partitioned table
    	REFRESH_FAST_AFTER_INSERT      T 
    	REFRESH_FAST_AFTER_ONETAB_DML  F COUNT(*) is not present in the select list
    	REFRESH_FAST_AFTER_ANY_DML     F see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled
    	REFRESH_FAST_PCT               F PCT is not possible on any of the detail tables in the materialized view
    	REWRITE_FULL_TEXT_MATCH        T 
    	REWRITE_PARTIAL_TEXT_MATCH     T 
    	REWRITE_GENERAL                T 
    	REWRITE_PCT                    F general rewrite is not possible or PCT is not possible on any of the detail tables
    	PCT_TABLE_REWRITE              F relation is not a partitioned table
    	PL/SQL procedure successfully completed
    	 

    So, the materialized view is created, it remains only to add a restriction
    	alter table mv$zoneloc_pimary$chk 
    	   add constraint zone_loc_primary$chk 
    	         check (primary_count=1) 
    	   deferrable initially deferred;
    	

    Please note that the restriction is created deferred. The fact is that in the process of updating the presentation by the Oracle, at some intermediate stage, it may turn out that the restriction will be temporarily violated. To avoid such false positives, it is better to set such restrictions deliberately delayed.

    Check the operation of this restriction
    	SQL> insert into location values (2,'S','W','Магазин 2 нарушения ограниченя основной зоны ');
    	 1 row inserted
    	 SQL> commit;
    	 Commit complete
    	

    Let's try to create a zone with a type that does not have a “main” mark.
    	SQL> insert into zone values (4,2,'N','S','Основной торговый зал не определен');
    	1 row inserted
    	SQL> commit;
    	ORA-02091: transaction rolled back
    	ORA-02290: check constraint (ZTXN.ZONE_LOC_PRIMARY$CHK) violated
    	

    We will try to identify two main zones for placement with the same type.
    	SQL> insert into zone values (5,2,'Y','L','Первая основная зона утерянного товара');
    	1 row inserted
    	SQL> commit;
    	Commit complete
    	SQL> insert into zone values (6,2,'Y','L','Вторая основная зона утерянного товара');
    	1 row inserted
    	SQL> commit;
    	ORA-02091: transaction rolled back
    	ORA-02290: check constraint (ZTXN.ZONE_LOC_PRIMARY$CHK) violated
    	

    Restriction of the composition of the accommodation areas


    This restriction differs from the previous one in that it relies on the values ​​of not one table, but two. Those. at the same time, you must satisfy the requirements of the fast update method for views with connections and aggregate views . But this is unreal. We cannot simultaneously output the rowid of the attached row and count (*) to the result. For this reason, it is necessary to build a cascade of materialized representations. In one, data sets will be connected, in the other, aggregation.

    First you need to create a materialized veiw log for the allocation table. For the zone table, the previously created log will be used.
    create materialized view log 
      on location with rowid
                       ,sequence
                       (loc,loc_type,status) 
                       including new values;
    

    Next we create join mivew. Unfortunately ANSI does not perceive syntax here, we will use old-style join.
    	create materialized view mv$location$zone$join 
    	  refresh fast on commit
    	as
    	select l.loc
    	       ,l.loc_type
    	       ,z.zone
    	       ,z.zone_type
    	       ,l.rowid l_rowid
    	       ,z.rowid z_rowid
    	 from location l
    	      ,zone z 
    	 where z.loc(+) = l.loc
    	       and l.status in ('A','I')
    	

    Create materialized veiw log to join view
    create materialized view log 
      on mv$location$zone$join 
        with rowid
             ,sequence
             (loc,loc_type,zone_type) 
             including new values;
    

    Create an aggregated materialized view
    	create materialized view mv$location$zone$agg
    	  refresh fast on commit
    	as
    	select loc
    	       ,loc_type
    	       ,count(decode(zone_type,'K',1)) K_cnt
    	       ,count(decode(zone_type,'S',1)) S_cnt
    	       ,count(decode(zone_type,'L',1)) L_cnt
    	       ,count(*) cnt
    	from mv$location$zone$join
    	group by loc,loc_type;
    	

    Well, the limitations themselves
    	alter table mv$location$zone$agg 
    	  add constraint wh_zones_chk 
    	     check(loc_type != 'W' or K_cnt > 0 and S_cnt = 0 and L_cnt > 0) 
    	  deferrable initially deferred;
    	alter table mv$location$zone$agg 
    	  add constraint store_zones_chk 
    	     check(loc_type != 'S' or K_cnt >= 0 and S_cnt > 0 and L_cnt > 0)
    	  deferrable initially deferred;
    	

    Check the operation of the restrictions:

    	SQL> insert into location (loc,loc_type,status,loc_desc)
    	  2         values (3,'S','W','Магазин 3 не определены обязательные зоны');
    	1 row inserted
    	SQL> commit;
    	Commit complete
    	

    The placement has been successfully created in the draft status. Let's try to activate it:
    	SQL> update location set status = 'A' where loc = 3;
    	1 row updated
    	SQL> commit;
    	ORA-02091: transaction rolled back
    	ORA-02290: check constraint (ZTXN.STORE_ZONES_CHK) violated
    	

    Not. You cannot activate a placement if the zones necessary for its type are not defined for it.
    	SQL> insert into zone(zone,loc,is_primary,zone_type) values (7,3,'Y','S');
    	1 row inserted
    	SQL> insert into zone(zone,loc,is_primary,zone_type) values (8,3,'Y','L');
    	1 row inserted
    	SQL> update location set status = 'A' where loc = 3;
    	1 row updated
    	Commit complete
    	


    Conclusion


    When to use it
    First of all, when the connection is so complicated that the risk is high that the application developer will not be able to take everything into account, and the mismatch is critical. I first used this approach when I designed a structure looped over by itself with foreign keys through seven tables. Moreover, these tables are maintained by different business units. This limitation still stands today. And to this day, users find loopholes, send screenshots when this restriction is triggered, and it is not possible to reproduce to close the loophole in the application module, a combination of circumstances from several users is required.

    It is very convenient to have such restrictions at the stage of testing, starting, stabilizing projects, when there is a lack of confidence in the correctness of the logic, and the data can be modified by the application link.

    When not to use it

    Obviously, you should not use this approach in cases where it gives a noticeable drawdown in performance.

    In case of modification, refinement of the data scheme, the content of the views may not be reliable and will require a complete update. If the time for a complete update of a materialized view could jeopardize the implementation of technical work regulations, you probably should not use this approach either.

    Also popular now: