Normalization of relations. Six normal forms

In this topic I will touch on 6 normal forms and methods for bringing tables to these forms.

The process of designing a database using the NF method is iterative and consists in sequentially transferring a relationship from 1NF to a higher order NF according to certain rules. Each subsequent NF is limited to a certain type of functional dependencies and the elimination of the corresponding anomalies when performing operations on database relationships, as well as maintaining the properties of previous NFs.

Terms used


An attribute is a property of some entity. Often called a table field.

An attribute domain is the set of valid values ​​that an attribute can take.

A tuple is a finite set of interconnected valid attribute values ​​that together describe a certain entity (table row).

A relation is a finite set of tuples (table).

A relationship scheme is a finite set of attributes that define an entity. In other words, this is the structure of a table consisting of a specific set of fields.

Projection - a relation obtained from a given by deleting and (or) rearranging some attributes.

Functional dependencebetween the attributes (attribute sets) X and Y means that for any valid set of tuples in this respect: if two tuples match by the value of X, then they match by the value of Y. For example, if the value of the attribute “Company Name” is Canonical Ltd, then the Headquarters attribute value in such a tuple will always be Millbank Tower, London, United Kingdom. Designation: {X} -> {Y}.

Normal form is a requirement for the structure of tables in the theory of relational databases to eliminate redundant functional dependencies between attributes (table fields) from the database.

Normal Form Method (NF)consists in collecting information about the objects of solving the problem within the framework of one relationship and the subsequent decomposition of this relationship into several interrelated relationships based on the normalization of relations.

The purpose of normalization : to eliminate excessive data duplication, which is the cause of the anomalies that occurred when adding, editing, and deleting tuples (table rows).

An anomaly is such a situation in the database table, which leads to a contradiction in the database or significantly complicates the processing of the database. The reason is excessive duplication of data in the table, which is caused by the presence of functional dependencies on non-key attributes.

Modification Anomaliesmanifested in the fact that a change in some data may entail a review of the entire table and a corresponding change in some records in the table.

Deletion anomalies - when a tuple is deleted from the table, information that is not directly related to the deleted record may disappear.

Add-on anomalies occur when information cannot be placed in a table until it is complete, or inserting a record requires additional viewing of the table.

First normal form


The relation is in 1NF, if all its attributes are simple, all domains used should contain only scalar values. There should be no duplicate rows in the table.

For example, there is a table “Cars”:

FirmModels
BMWM5, X5M, M1
NissanGT-R

Violation of the normalization of 1NF occurs in BMW models, as one cell contains a list of 3 elements: M5, X5M, M1, i.e. it is not atomic. We transform the table to 1NF:
FirmModels
BMWM5
BMWX5M
BMWM1
NissanGT-R

Second normal form


A relation is in 2NF if it is in 1NF and each non-key attribute irreducibly depends on the Primary Key (PC).

Irreducibility means that the potential key does not contain a smaller subset of attributes, from which this functional dependence can also be derived.

For example, given a table:

ModelFirmPriceA discount
M5BMW5,500,0005%
X5MBMW60000005%
M1BMW2,500,0005%
GT-RNissan5,000,00010%

The table is in the first normal form, but not in the second. The price of a car depends on the model and company. The discount depends on the company, that is, the dependence on the primary key is incomplete. This is corrected by decomposition into two relations in which non-key attributes depend on the PC.

ModelFirmPrice
M5BMW5,500,000
X5MBMW6000000
M1BMW2,500,000
GT-RNissan5,000,000

FirmA discount
BMW5%
Nissan10%


Third normal form


The relation is in 3NF when it is in 2NF and each non-key attribute is non-transitively dependent on the primary key. Simply put, the second rule requires the removal of all non-key fields, the contents of which can refer to several table entries in separate tables.

Consider the table:

ModelScoreTelephone
BMWReal auto87-33-98
AudiReal auto87-33-98
NissanNext Auto94-54-12


The table is in 2NF, but not in 3NF.
In relation to the attribute "Model" is the primary key. Cars do not have personal phones, and the phone depends entirely on the store.
Thus, the following functional dependencies exist in relation: Model → Shop, Shop → Phone, Model → Phone.
Dependence Model → Phone is transitive, therefore, the relation is not in 3NF.
As a result of the separation of the initial relationship, two relations are obtained that are in 3NF:

ScoreTelephone
Real auto87-33-98
Next Auto94-54-12


ModelScore
BMWReal auto
AudiReal auto
NissanNext Auto


The normal form of Boyce-Codd (NBFK) (a particular form of the third normal form)


The definition of 3NF is not entirely suitable for the following relationships:
1) the relation has two or more potential keys;
2) two or more potential keys are composite;
3) they intersect, i.e. have at least one attribute.

For relationships that have one potential key (primary), the NBFI is 3NF.

The relation is in the NBFK, when each non-trivial and left-irreducible functional dependence has a potential key as a determinant.

Suppose we consider a relationship that represents data on a daily parking reservation:

Parking numberStart timeEnd timeRate
109:3010:30Thrifty
111:0012:00Thrifty
12 p.m.15:30Standard
210 a.m.12:00Premium B
212:002 p.m.Premium B
23 p.m.18:00Premium A

The tariff has a unique name and depends on the chosen parking lot and the availability of benefits, in particular:
  • Lean: parking 1 for beneficiaries
  • “Standard”: parking 1 for non-beneficiaries
  • Premium-A: parking 2 for beneficiaries
  • Premium B: parking 2 for non-beneficiaries.

Thus, the following composite primary keys are possible: {Parking number, Start time}, {Parking number, End time}, {Tariff, Start time}, {Tariff, End time}.

The ratio is in 3NF. The requirements of the second normal form are satisfied, since all attributes are included in one of the potential keys, and there are no non-key attributes in relation. There are also no transitive dependencies, which corresponds to the requirements of the third normal form. Nevertheless, there is a functional dependence Tariff → Parking number in which the left part (determinant) is not a potential key of the relation, that is, the relation is not in the normal Boyce – Codd form.

The disadvantage of this structure is that, for example, you can mistakenly ascribe the “Lean” tariff to booking a second parking lot, although it can only apply to the first parking lot.

You can improve the structure by decomposing the relationship into two and adding the attribute. Has privileges by obtaining relations satisfying the NBFIs (the attributes included in the primary key are emphasized.):

Tariffs
RateParking numberHas perks
Thrifty1Yes
Standard1Not
Premium A2Yes
Premium B2Not

Reservation
RateStart timeEnd time
Thrifty09:3010:30
Thrifty11:0012:00
Standard2 p.m.15:30
Premium B10 a.m.12:00
Premium B12:002 p.m.
Premium A3 p.m.18:00

Fourth normal form


A relation is in 4NF if it is in the NBCH and all non-trivial multi-valued dependencies are actually functional dependencies on its potential keys.

Regarding R (A, B, C), there is a multi-valued dependence RA -> -> RB if and only if the set of values ​​of B corresponding to the pair of values ​​of A and C depends only on A and does not depend on C.

Suppose that restaurants produce different types of pizza, and restaurant delivery services work only in certain areas of the city. The composite primary key of the corresponding relationship variable includes three attributes: {Restaurant, Pizza view, Delivery area}.

This attitude variable does not correspond to 4NF, as there is the following multi-valued relationship:
{Restaurant} → {Pizza view}
{Restaurant} → {Delivery area}

That is, for example, when adding a new type of pizza, you will have to add one new tuple for each delivery area. A logical anomaly is possible in which only certain delivery areas from restaurant-served areas will correspond to a certain type of pizza.

To prevent anomalies, you need to decompose the relationship by placing independent facts in different ways. In this example, decompose into {Restaurant, Pizza View} and {Restaurant, Delivery Area}.

However, if we add an attribute functionally dependent on the potential key to the original relationship variable, for example, the price taking into account the cost of delivery ({Restaurant, Pizza Type, Delivery Area} → Price), then the resulting ratio will be in 4NF and it can no longer be decomposed without losses.

Fifth normal form


Relations are in 5NF if it is in 4NF and there are no complex dependent connections between attributes.
If “Attribute_1” depends on “Attribute_2”, and “Attribute_2” in turn depends on “Attribute_3”, and “Attribute_3” depends on “Attribute_1”, then all three attributes are necessarily included in one tuple.

This is a very strict requirement, which can be fulfilled only under additional conditions. In practice, it is difficult to find an example of the implementation of this requirement in its pure form.

For example, some table contains three attributes: “Supplier”, “Product” and “Buyer”. Buyer_1 purchases several Products from Supplier_1. Buyer_1 purchased a new Product from Supplier_2. Then, by virtue of the above requirements, the Provider_1 is obliged to deliver to the Buyer_1 the same new Goods, and the Provider_2 must supply the Buyer_1, except for the new Goods, the entire range of Products of the Provider_1. This does not happen in practice. The buyer is free in his choice of goods. Therefore, to eliminate the noted difficulty, all three attributes are distributed in different relations (tables). After highlighting three new relationships (Supplier, Product and Buyer), it must be remembered that when extracting information (for example, about buyers and products), it is necessary to combine all three relationships in the request. Any combination of combining two of the three relationships will inevitably lead to the extraction of incorrect (incorrect) information. Some DBMSs are equipped with special mechanisms that eliminate the extraction of false information. Nevertheless, one should adhere to the general recommendation: to build the database structure in such a way as to avoid the use of 4NF and 5NF.

The fifth normal form is oriented to work with dependent connections. The indicated dependent connections between the three attributes are very rare. Dependent connections between four, five or more attributes are almost impossible to specify.

Domain-key normal form


A relationship variable is in DKNF if and only if each restriction imposed on it is a logical consequence of domain restrictions and key restrictions imposed on this relation variable.
Domain restriction is a restriction that prescribes to use values ​​from a certain specified domain for a particular attribute. The restriction is essentially the specification of a list (or the logical equivalent of a list) of valid type values ​​and the declaration that the specified attribute is of the given type.

Key Constraint - A constraint that states that an attribute or combination of attributes is a potential key.

Any relationship variable located in DKNF is necessarily in 5NF. However, not every variable in the relationship can lead to DKNF.

Sixth normal form


A relationship variable is in its sixth normal form if and only if it satisfies all non-trivial dependencies of the connection. It follows from the definition that a variable is in 6NF if and only if it is irreducible, that is, it cannot be further decomposed without loss. Each ratio variable that is in 6NF is also in 5NF.

The idea of ​​“decomposition to the end” was put forward before the start of research in the field of historical data, but did not find support. However, for chronological databases, the maximum possible decomposition allows you to deal with redundancy and simplifies maintaining the integrity of the database.

For chronological databases, U_operators are defined that unpack the relationships by the specified attributes, perform the corresponding operation, and package the result. In this example, the connection of the projections of the relationship should be done using the operator U_JOIN.

Workers
Tab.№TimePositionHome address
657501-01-2000: 10-02-2003locksmithLenin St., 10
657502/11/2003: 06/15/2006locksmithSovetskaya St., 22
65752006-06-16: 05-03-2009foremanSovetskaya St., 22

The variable relationship “Employees” is not located in 6NF and can be decomposed into variable relations “Positions of employees” and “Home addresses of employees”.

Employee Positions
Tab.№TimePosition
657501-01-2000: 10-02-2003locksmith
65752006-06-16: 05-03-2009foreman

Workers home addresses
Tab.№TimeHome address
657501-01-2000: 10-02-2003Lenin St., 10
657502/11/2003: 06/15/2006Sovetskaya St., 22

Literature


For a deeper and more thorough study of the topic, the book "Introduction to Database Systems" by Chris J. Data is recommended, on the basis of which this article was written.

Also popular now: