maxstroy December 8, 2014 at 15:46

In Search of the Holy Grail of Business Analysis

Do I sing what I see, or see what I sing?

The main task of a business analyst in the development of new software is to study the subject area and formally describe what has been learned in the form of a model (Domain Model). The analyst must sing what he sees and what he wants to see. To do this, he needs a language in which to perform his song. However, the analyst is not always familiar with a suitable language, and therefore often uses other languages. This is partly because project management is carried out not from the point of view of the subject area, but from the point of view of implementation. And then misfortune can befall the analyst: he can stop seeing what he needs to sing and begin to see only what the vocabulary of his chosen language has words for. Everything else ceases to exist for him. Then, instead of singing what he sees, the analyst begins to see what he sings.

Table structure

What do you need to remember?

To study and describe the subject area, the analyst must know:

How do we think?
How do we structure the results of thinking? To do this, you need to know a formal language suited to recording the results of our thinking (referred to below as FM).

Since the analyst does not always have the opportunity to record the results in FM, the analyst must also know:

The areas of knowledge in which other modeling languages exist. For example, SQL lets you create tables. Some parts of this language can be reflected in the ER model. UML is designed to model programs written in object-oriented programming languages.
How to use a language created for other purposes for the task of modeling a domain. To do this, the analyst must be able to map from FM into another language. That is, he must understand how constructs written in FM are translated into the constructs of another language, for example UML.

Unresolved issues

There is a gap between FM and the modeling languages used in programming.
There are candidates for the title of FM, but none of them has yet become universally recognized. University curricula contain no courses devoted to this topic. And as long as there is no topic, there are no questions.

Corollary:

It is generally accepted that UML and ER models can be used to model subject areas without restrictions. Therefore, analysts try to model the subject area in UML or ER even when the restrictions imposed by the modeling language do not allow this to be done correctly.

For those analysts who do understand the problem described above, the Holy Grail would be a tool that makes it possible to model both the subject area and its implementation in program code at the same time. Therefore, scientists keep moving in a circle, creating new ontological standards and, at the same time, programming languages that support these ontologies. But so far the gap remains wide.

A historical excursion

In Europe, Aristotle was the first to try to answer the question of how we structure the results of our thinking. He decided that our consciousness works like this: we assign every object we see to a certain type. A type, according to Aristotle, is a list of attributes that describe instances of this type. Each instance is represented in the model by an ordered set of values of these attributes. Aristotle's assumption arose because records about the objects of our world were originally kept in the form of tables. Aristotle gave us a picture, but he did not possess the knowledge that we possess now, and therefore his picture of types must be supplemented by one more property. I will not name this property yet, leaving it for you to think about. Thus:

a table is a body of knowledge about a type and the instances of that type
the header row (the row of parameters) is the type
a heading in the header row is the name of a parameter
a row of values is an instance of this type
a value in a row of values is a specific attribute value of this instance

If you see an empty table, you have before you the type and its description. If you see a filled-in table, then, in addition to the type and its description, you have descriptions of specific instances of this type. Such a table is pictured at the beginning of the article.
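The correspondence between a table and a type can be sketched in a few lines of Python. This is only an illustration: the `Employee` type, its two attributes and the sample rows are hypothetical and do not come from the article's picture.

```python
from dataclasses import dataclass, fields


# Hypothetical type: its list of attributes plays the role of the header row.
@dataclass
class Employee:
    name: str
    position: str


# The "empty table": only the type and the names of its parameters.
header = [f.name for f in fields(Employee)]

# The "filled-in table": the same type plus one row of values per instance.
instances = [
    Employee("Ivanov", "engineer"),
    Employee("Petrova", "physicist"),
]


def as_table(objects):
    """Return the header row followed by one row of values per instance."""
    return [header] + [[getattr(obj, name) for name in header] for obj in objects]
```

Calling `as_table([])` gives only the header row, that is, the type without instances, while `as_table(instances)` adds one ordered row of attribute values for each instance.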

We also come across tables like this:

What is shown in this table? The question can be answered in two ways, and both answers will be true. I will let you think about it: what does this table model? And what does the data in this table mean?

Two different meanings of the same statement.

As a result, Aristotle gave us the terms “type of objects” and “instance of a type of objects”. As soon as you hear the term “instance”, it means that we are talking about types. For example, take the statement: “I am holding a copy of the book The Three Musketeers.” Here “copy” means “instance”, and the statement is interpreted as follows: there is a type of books, “The Three Musketeers”, and there is a specific instance of this type of objects, a specific book. This statement can be shortened to: “I am holding the book The Three Musketeers.” The shortened statement can be interpreted in two ways:

We can say that the object I hold in my hands has a property: to be a book entitled The Three Musketeers. This is the intensional context.
We can say that we have before us an object (element) of the class of books The Three Musketeers. This is the extensional context.

In Aristotle's view, we work only in the intensional context, where all objects have certain properties. This class of representations is fixed in the MOF standard, and both ER models and OOP follow the same way of typing objects. Question: does this method of typing objects really show how we think and how we structure our knowledge? To find out, we need to conduct an experiment.

Employees and experiments

Suppose there is a group of employees in a laboratory of experimental physics, and a series of experiments conducted by these employees. We ask ourselves: how do we model this subject area in terms of Aristotelian logic? The first thing that comes to mind is to give each employee the attribute “Experiment”, together with a start date and an end date, which indicate when and in which experiment the employee is engaged. That is, in the logic of Aristotle, the experiment becomes an attribute of the employee. This means that the question “What kind of employee is this?” can be answered: “One engaged in the experiment.”

This works exactly until an employee begins to work on two experiments at once. Then a single “Experiment” attribute is no longer enough, because the relationship between employees and experiments becomes many-to-many.
It is no longer possible to model this relationship in terms of attributes. You have to create a new type of entity, “Connection”, which stores a reference to an employee and to an experiment but means nothing by itself. Such objects simply do not exist in nature!

In addition, the model has made a leap. Up to a certain point we managed with two types of entities, but as soon as it turned out that employees can work on several experiments at once, these two types were no longer enough. Our brain doesn't work like that. It makes no difference to the brain whether one employee or several employees work on several experiments. The model of reality in our head does not change because of this. This means that this class of models has limitations.
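The leap described in this section can be made visible in code. The following Python sketch is an assumption-laden illustration: all class and field names are hypothetical, the sample experiments are invented, and the start and end dates mentioned in the text are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Experiment:
    title: str


# Before the change: the experiment is an attribute of the employee.
@dataclass
class EmployeeWithAttribute:
    name: str
    experiment: Optional[Experiment] = None  # room for exactly one experiment


# After the change: the attribute is gone and a third entity type appears.
@dataclass
class Employee:
    name: str


@dataclass
class Connection:
    """Link entity: refers to an employee and an experiment, nothing more."""
    employee: Employee
    experiment: Experiment


def experiments_of(employee, connections):
    """Titles of all experiments the given employee is connected to."""
    return [c.experiment.title for c in connections if c.employee is employee]


ivanov = Employee("Ivanov")
optics, plasma = Experiment("Optics"), Experiment("Plasma")
connections = [Connection(ivanov, optics), Connection(ivanov, plasma)]
```

Nothing about Ivanov or the experiments has changed between the two versions, yet the structure has: `EmployeeWithAttribute` had to be replaced by `Employee` plus `Connection`, and every query now goes through the list of connections, as in `experiments_of(ivanov, connections)`.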

Quantum leap in the model

Sometimes we turn our ideas about things upside down. For example, quantum mechanics has shown us a paradox in which we have to admit that the past can be changed. And then we change our picture of the world. When we find out that employees work on several projects at once, our picture of the world does not change. Yet this knowledge leads to a quantum leap in data modeling!

Perhaps OOP will answer the question of how to model the world? To find out, consider another example. Let's try to model apple trees, varieties of apple trees and their ranges.

Apple Trees and Ranges

Suppose we have apple trees and need to model the range in which they grow. Let me remind you that individual apple trees know nothing about the range. They know only the coordinates of the place where they grow. The range is defined for the class of apple trees and only for the class. Question: which object in the OOP model will store the value of the “Range” parameter? OOP allows you to do this in many ways.

Ambiguity of model implementation

The very fact that this can be implemented in different ways already tells us that OOP does not model our vision of the world. Our vision is unambiguous. We have built this vision over centuries together with the whole European world, and we have arrived at certain models. These models are unambiguous. Therefore, if a design methodology leads us to different models, that methodology is not suitable for modeling a subject area.

The first implementation option

You can create a static variable in the “Apple tree” class. What does this variable mean? It is created once and shared by all objects of the class. Question: is this a variable of the class, or of the objects of the class? Logically, we can conclude: if the value of the variable is accessible to the objects of the class, then it is a variable of the objects of the class, not of the class of objects! That is, creating a static variable does not solve the problem of creating a class variable. Thus, we come to the second way of implementing the task.
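A minimal sketch of this first option in Python, where a class attribute plays the role of the static variable. The attribute name `growing_range`, its value and the coordinates are hypothetical.

```python
class AppleTree:
    # Class-level ("static") attribute: a single value shared by all objects.
    growing_range = "Eastern Europe"  # hypothetical value

    def __init__(self, x, y):
        # An individual tree is only supposed to know its coordinates.
        self.x = x
        self.y = y


tree = AppleTree(55.75, 37.62)

# The shared value is nevertheless reachable through any individual tree.
range_seen_by_tree = tree.growing_range
```

The value is stored once, on the class, but `tree.growing_range` still resolves. This is the point of the argument above: the objects can read the variable, so it behaves like a property of the objects.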

Second implementation option

We create a class “Apple tree subclasses”, in which we declare the parameter “Range”. The object “Class of apple trees”, which is an object of the class “Apple tree subclasses”, will hold the value of the “Range” parameter. Add an “Apple tree list” parameter to the class “Apple tree subclasses”. Then the object “Class of apple trees” will hold a list of references to objects of the class “Apple tree”. What can we do now, thanks to the new structure? We can create a new object of the class “Apple tree subclasses” called “Grushovka” and associate with it a list of those objects of the class “Apple tree” that we want to mark as Grushovka trees. Thus, we can model a subclass of apple trees, the Grushovka variety, and, by adding the necessary operations on lists, operate on classes and not just on objects of classes. This gives us the opportunity to study the intersection of the ranges of different varieties of apple trees, as well as their union.
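The second option can be sketched in Python as follows. This is one possible reading of the description, not a definitive implementation: the names `AppleTreeSubclass`, `growing_range` and `trees`, the representation of a range as a set of region names, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # trees are compared by identity, not by coordinates
class AppleTree:
    x: float
    y: float


@dataclass(eq=False)
class AppleTreeSubclass:
    """A hand-made object that stands for a class of apple trees."""
    name: str
    growing_range: frozenset  # assumed representation: a set of region names
    trees: list = field(default_factory=list)

    def intersection(self, other):
        """Trees that belong to both classes."""
        return [t for t in self.trees if t in other.trees]

    def union(self, other):
        """Trees that belong to at least one of the two classes."""
        return self.trees + [t for t in other.trees if t not in self.trees]

    def shared_range(self, other):
        """Regions in which both classes grow."""
        return self.growing_range & other.growing_range


t1, t2, t3 = AppleTree(1.0, 1.0), AppleTree(2.0, 2.0), AppleTree(3.0, 3.0)
apple_trees = AppleTreeSubclass(
    "Class of apple trees", frozenset({"north", "south"}), [t1, t2, t3]
)
grushovka = AppleTreeSubclass("Grushovka", frozenset({"north"}), [t1, t2])
```

Note that `intersection`, `union` and `shared_range` all had to be written by hand on an auxiliary class; the language itself offers no such operations on the classes it declares.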

What's wrong?

Firstly, we created the object “Class of apple trees” by hand, although OOP proclaims that it lets us work with classes of objects. In fact, OOP has a built-in mechanism for working with the objects of a class, but not with the class itself. That is, when you declare a class, you are actually describing the objects of this class, not the class itself! There are standard operations on classes of objects: intersection and union. OOP has no built-in mechanisms for such operations, so they have to be modeled by hand. For example, in the case of apple trees, to model the Grushovka variety we had to use an auxiliary class, “Apple tree subclasses”.
Secondly, in UML it is not possible to draw a relationship between the object “Class of apple trees” and the objects of the class “Apple tree”. This relationship is called classification. You will not find it in OOP models.
Thirdly, in UML you cannot create an object first and classify it only afterwards. Nor can you reclassify an object if the result of the classification did not suit you. This is a fundamental limitation of OOP when it comes to modeling a real domain. It is equivalent to the limitation of Aristotle's logic.

Was there a boy?

You are looking at the monitor now. Right? In fact, you are not looking at anything. It is just that the mind interprets the signals received from the eyes, decodes them, and compares them with the images stored in its memory. Having chosen a suitable one, it checks whether the resulting image contradicts the signals from other channels of perception. If the signals from different channels do not contradict each other, that is, do not cause dissonance, the mind delivers a verdict: before us is an object that, by its characteristics, can be assigned to the class of monitors. But it can also happen that the brain delivers a different verdict and assigns the object to the class of tomatoes. And then the subject will point a finger at the monitor and say “tomato”. The object will not change its properties; only the perception of this object will change. Therefore, in nature there are no dogs, no cats, no monitors. There is only being.

Conclusions

We saw that modeling in terms of types or OOP classes produces models that are not extensible. That is, the data structure has to be designed in advance with room for growth; otherwise it will not be possible to extend the functionality later without breaking what already exists.
An analysis of the contradictions of Aristotelian logic led to the development of set theory: first the naive theory proposed by Cantor, and then the modern one. In one of the axiomatizations of modern set theory, the term “set” is replaced by the term “class” to emphasize the difference between them. But these are not the classes that everyone is used to (OOP classes).
It turns out that the model of the world that we create is much richer than the one we usually write down on paper as text or as a table. A great deal of effort has been spent on understanding how our consciousness works when it creates a model of what exists, and how that model should be written down.

Let's see how set theory handles these tasks. (To be continued.)

P.S. You might think that I am opposed to domain modeling in classical notations. No, I just want us to be able to think, to convey our thoughts, and to reflect them correctly in the model.
