
Formalization of speech. Some considerations

What is the main difficulty in formalizing a natural language? The fact is that we are accustomed to formalizing it by means of language itself, which leads to a bad infinity. Language is itself a means of formalization, one that mankind has used for a long time and without much success.
Take the first definition that comes to hand: Flight - independent movement of an object in a gaseous medium or vacuum.
It contains six terms, each of which in turn requires a definition:
- independent,
- movement,
- an object,
- gaseous,
- medium,
- vacuum.
Each of these terms has its own definition, which brings in new terms that require definitions of their own, and so on. In the end it turns out that every term in use has already been defined through the others, that is, we have gone in a circle. Not what we were hoping for, of course. The process has to be stopped at some step, but... when to stop? What should the stopping criterion be? Those are the accursed questions.
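The circle is easy to see on a toy example. Below is a minimal Python sketch, not a real dictionary: the entries in DEFINITIONS and the function name find_cycle are my own assumptions, chosen only for illustration. Each term maps to the terms used in its definition, and a depth-first walk reports the first chain that comes back to a term already visited.

```python
# Toy dictionary: each term maps to the terms used in its definition.
# The entries are illustrative assumptions, not taken from a real dictionary.
DEFINITIONS = {
    "flight": ["movement", "object", "medium"],
    "movement": ["object", "location"],
    "object": ["thing"],
    "thing": ["object"],      # "object" and "thing" define each other
    "medium": ["object"],
    "location": [],           # treated here as needing no definition
}


def find_cycle(term, definitions, path=()):
    """Return the first chain of terms that loops back on itself, or None."""
    if term in path:
        # Cut the path at the first occurrence of the repeated term.
        return list(path[path.index(term):]) + [term]
    for used in definitions.get(term, []):
        cycle = find_cycle(used, definitions, path + (term,))
        if cycle is not None:
            return cycle
    return None


print(find_cycle("flight", DEFINITIONS))  # ['object', 'thing', 'object']
```

A term with an empty list of definitions is where the walk stops on its own, which is exactly the role the stopping criterion has to play.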
Recall why we need a language at all: to reflect the reality surrounding us correctly in the process of communication. That reality, by the way, consists of physical objects. We are unable to perceive anything else: physical objects are what a person perceives in the first place.
From this we can conclude that the chain of definitions should stop at terms denoting physical objects. The principle is this: we report what we see.
A logical trap lies in wait here: to understand what someone is trying to communicate, the term still has to be defined.
Suppose someone, pointing a finger, exclaims in amazement:
- Hare!
A “hare” requires either knowledge of the subject or a definition - we find ourselves in the same unenviable position. But the difficulty disappears if you take something elementary - for example, shout:
- Something white!
What is white? Not an object, of course, but a characteristic of one: its color. It is a term that does not require a definition: white is white. The number of colors the human eye can distinguish is limited - accordingly, the number of terms that do not require a definition is limited too.
It is believed that a person has five senses (sometimes more are named, but that is not important here):
- sight,
- touch,
- smell,
- hearing,
- taste.
The functioning of each sense organ results in a certain sensation, whose values do not require definitions, since they correspond to indefinable elements of the surrounding reality.
The question is, which elements of the surrounding reality can be characterized as a simple set of five sensations? Elementary objects! The idea is to decompose a complex object - that same “hare” - into elementary components, each of which is characterized by sensations. Putting the components back together, we get the “hare” as a whole: an object that has a formal and, most importantly, complete verbal definition.
Let's see how this is possible.
Here is a physical object. Note that only a specific, that is, individualized object is available for observation (more precisely, for sensation, because an object can be perceived not only visually but also with the other senses). When I see a hare, it is a very specific hare - this one, and not hares in general.
As a rule, an object is individualized by naming it, but, as in the case of the hare, this does not always happen (only when there is a need for it). Thus, depending on the situation, the term “hare” can mean:
- a particular hare,
- the class to which any hare belongs.
These nuances must be distinguished - say, by means of abbreviations of “individual” (we take the first letter) and “class” (we take the last letter, because “c” in brackets is associated with the copyright sign):
hare (i).
Textual analogue: hare as the name of an individual;
hare (s).
Textual analogue: hare as the name of a class.
If the hare had a unique name, this would be more obvious:
Stepashka (i).
“Stepashka” cannot be the name of a class, but it does require an indication of which class it belongs to. Who knows who else might go by that name?! So we have to indicate the class. We use the symbol “∈” for this:
Stepashka (i) ∈ hare (s).
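For readers who think in code, here is a minimal Python sketch of the two markers and the membership sign. It is only an illustration under my own assumptions: the Term type, the MEMBER_OF table and the declare_member function are hypothetical names, not part of the notation itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    word: str
    kind: str  # "i" = individual object, "s" = class


# Membership relation "∈": individual -> the class it belongs to.
MEMBER_OF = {}


def declare_member(individual, cls):
    """Record `individual ∈ cls`, checking the type markers first."""
    if individual.kind != "i" or cls.kind != "s":
        raise ValueError("expected individual (i) ∈ class (s)")
    MEMBER_OF[individual] = cls


stepashka = Term("Stepashka", "i")
hare_class = Term("hare", "s")
this_hare = Term("hare", "i")  # the same word used for one particular hare

declare_member(stepashka, hare_class)
declare_member(this_hare, hare_class)

print(MEMBER_OF[stepashka].word)  # hare
print(this_hare == hare_class)    # False: same word, different type
```

Keeping the marker inside the term is what lets the same word stand for this hare and for hares in general without confusion.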
Now it is established that Stepashka is one of the hares, but not what hares are. As mentioned earlier, the “hare” must be decomposed into its component parts, each of which must be given characteristics corresponding to the sensations a person perceives.
This is very difficult, mainly because the component parts of an object are three-dimensional, so it can only be done approximately. But in principle it can be done.
Suppose that a hare consists of a head, a trunk, paws and a tail, and that these objects are elementary (in reality they are not, of course). Then, using the symbol “⊂” to indicate that a component is part of a material whole, we get:
head (s) && trunk (s) && 4 * paw (s) && tail (s) ⊂ hare (s).
Textual analogue: a head, a trunk, 4 paws and a tail make up a hare.
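The decomposition can be sketched in Python as a table of parts with multiplicities. This is a rough sketch under my own assumptions: the PARTS table and the elementary_parts function are hypothetical names, and any class missing from the table is simply treated as elementary.

```python
# Composition relation "⊂": class -> {component class: how many}.
# Classes that are absent from the table are treated as elementary.
PARTS = {
    "hare": {"head": 1, "trunk": 1, "paw": 4, "tail": 1},
}


def elementary_parts(cls, parts=PARTS, count=1):
    """Expand a class into elementary components with their multiplicities."""
    if cls not in parts:
        return {cls: count}
    total = {}
    for component, n in parts[cls].items():
        for name, k in elementary_parts(component, parts, count * n).items():
            total[name] = total.get(name, 0) + k
    return total


print(elementary_parts("hare"))  # {'head': 1, 'trunk': 1, 'paw': 4, 'tail': 1}
```

If a component later turns out not to be elementary, it is enough to add its own row to the table; the expansion then goes one level deeper.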
Since the objects are assumed to be elementary, characteristics can be specified for them for each of the senses. Because sensations act on a person in combination, characteristics in space and time may be required as well.
We get an approximate set of characteristics:
• color,
• shape,
• smell,
• taste,
• surface (touch result),
• sound,
• location (spatial coordinate),
• displacement (as the difference between the two locations),
• time,
• duration (as the difference between two points in time),
• speed (as the quotient of displacement and duration).
The set, as I said, is approximate: only the characteristics corresponding to sensations are beyond dispute; the rest is open to discussion. For example, it is clear that a person does not perceive time as such: it can be determined from the symbols on a gadget or from the position of the sun in the sky, but not directly by sensation. Similarly, location is not given absolutely, but relative to other objects.
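The three derived characteristics are plain arithmetic on the observed ones. A small Python sketch, assuming a single spatial axis and made-up readings (the Observation type and the numbers are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    location: float  # position along one axis, relative to a chosen landmark
    time: float      # clock reading, in seconds


def displacement(a, b):
    """Difference between two locations."""
    return b.location - a.location


def duration(a, b):
    """Difference between two points in time."""
    return b.time - a.time


def speed(a, b):
    """Quotient of displacement and duration."""
    return displacement(a, b) / duration(a, b)


first = Observation(location=2.0, time=0.0)
second = Observation(location=8.0, time=3.0)
print(displacement(first, second), duration(first, second), speed(first, second))
# 6.0 3.0 2.0
```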
Now I’ll try to characterize the “head”:
- shape: round
- surface: hard.
Other characteristics are not defined.
That is, the head, if we conditionally treat it as an elementary object, is something round and hard. Conditionally, of course, purely conditionally. After all, language as a means of formalization gives approximate results: how, for example, would you verbally describe a spot of complex geometric shape? You cannot: it simply defies description. Therefore, in this conditional example, the head is approximately round and approximately hard - and that is that.
If you agree, we will write it in braces:
head (s) {shape: round; surface: hard}.
That is, the specified object has the specified characteristics.
Of course, heads can be not only round but other shapes as well: for example, Vovochka, from an old Soviet-era joke, has a square head. Nothing prevents us from introducing logical operators into our notation, in particular the “or” operator:
head (s) {shape: round || square; surface: hard}.
But the hare’s head is round, not square like Vovochka’s! Well, to hell with both of them; we introduce implication:
head (s) {shape: round} if head (s) ⊂ hare (s).
Instead of the hare class, we could have indicated a specific hare, Stepashka, thereby setting his individual characteristics:
head (s) {shape: round} if head (s) ⊂ Stepashka (i).
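The alternative values and the conditional form can be sketched in Python like this. It is a sketch under my own assumptions: the DEFAULTS and REFINEMENTS tables and the characteristics function are hypothetical names, and a set with several values stands in for the “or” alternative.

```python
# Default characteristics of a class: characteristic -> set of allowed values.
# A set with several values plays the role of "or".
DEFAULTS = {
    "head": {"shape": {"round", "square"}, "surface": {"hard"}},
}

# Conditional refinements: (part, whole) -> narrowed characteristics,
# read as "part {...} if part ⊂ whole".
REFINEMENTS = {
    ("head", "hare"): {"shape": {"round"}},
}


def characteristics(part, whole=None):
    """Return the characteristics of `part`, narrowed when it belongs to `whole`."""
    result = {name: set(values) for name, values in DEFAULTS[part].items()}
    for name, values in REFINEMENTS.get((part, whole), {}).items():
        result[name] = set(values)
    return result


print(characteristics("head"))          # shape may be round or square
print(characteristics("head", "hare"))  # shape narrowed to round only
```

A refinement keyed by an individual, such as ("head", "Stepashka"), would work the same way and set individual characteristics.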
The terms used in the characteristics (“round”, “square”, “hard”, etc.) are indefinable: we sense them directly, so no verbal definition is required.
I will designate this type of word with the symbol “a”, from “attribute”, like this:
round (a).
Note that individual objects and classes are nouns (they are entities!), while characteristics are adjectives (that is why they are characteristics!). As far as the correspondence between types and parts of speech goes, everything is in order.
The adjective “round” is an indefinable characteristic, but, say, the adjective “hare-like”, which matches none of a person's sensations, does not qualify as an attribute.
Obviously, “hare-like” should be defined through “hare”, which I have already defined (by decomposing the hare into its component parts). That is, the term “hare” appeared first, and then the adjective “hare-like” was formed from it, meaning: relating to a hare, similar to a hare.
We get a new type, denoted by the symbol “d”, from “dependence”. Indicating the type is, of course, not enough - a reference to the parent term is necessary. We introduce a new notation, using the symbol “=>” to denote the dependence:
hare-like (d) => hare (s).
Now the term “hare-like” is defined - through the parent noun “hare”.
We have defined a dependent adjective through its parent noun. It happens the other way around too, when a dependent noun is formed from a parent adjective. For example, “square” is an adjective denoting the shape of an object. In light of the above, it is clear that the noun “square” came from the adjective “square”, and not the adjective from the noun:
square (d) => square (a).
Thus, in each group of cognate terms there is a parent term from which all the others originate.
Have I now managed to derive all terms from the original indefinable ones? Not yet - there remains a significant group of terms not yet covered: concepts that can be derived by means of formulas.
Take a verb - for example, “move”: we have not yet encountered verbs. What is “to move”? I use not an academic definition, but one that, from my point of view, reflects the essence of the matter:
“Move” - this is when an object changes its location under the influence of another object.
The formula is as follows:
X (i) 1 # move (f) X (i) 3 {displacement: nonzero (a)}.
I hasten to give the necessary explanations.
The formula consists of three parts denoting the subject, the action and the object:
- X (i) 1 is the subject. “X” stands for any individual entity; 1 is its serial number.
- # move (f) is the action. “f” stands for “formula”. The hash sign marks the word being defined (in this example it is superfluous, but it could be needed when a specific subject or object is indicated).
- X (i) 3 is the object. The rest is the same as for the subject. The curly brackets indicate the characteristic that has changed as a result of the subject's influence.
The rules are flexible: new concepts are easily constructed in accordance with them. We take the general unfilled structure (subject - action - object):
X (i) 1 X (f) 2 X (i) 3.
The necessary elements are replaced by specific terms, characteristics are indicated, the element being defined is marked with a hash sign, and logical operators are used if necessary.
Let's practice a little, for example with adverbs, which can also be expressed as formulas.
Take the adverb “carefully”, which, from my point of view, is the parent term in its group of cognate words (“carefulness”, “careful”, “carefully”, “protect”). The word denotes a characteristic, not of an object but of an action. I’ll give a deliberately primitive definition:
“Carefully” - this is when someone moves a thing slowly.
Things are already defined; “slowly” depends on “slow”, which is a speed characteristic of objects.
slowly (d) => slow (a).
And the term “move” has already been dealt with. Thus, we have everything needed to define the term “carefully”:
X (i) 1 move (f) {# carefully (f)} X (i) 3 {speed: slow (a)}.
Here, “carefully” is defined through “move” and “slowly” and, like any other adverb, refers to an action.
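To show how such formulas might be checked mechanically, here is a minimal Python sketch. Everything in it is my own assumption for illustration: the Formula type, the applicable_words function, the name “displacement” for the changed characteristic of the moved object (taken from the list of characteristics), and the value “fast” used in the examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Formula:
    defines: str                 # the word marked with "#"
    base_action: Optional[str]   # action the word builds on; None if the word is the action itself
    object_traits: tuple         # (characteristic, value) pairs required of the object


FORMULAS = [
    # "move": the object's displacement is nonzero.
    Formula("move", None, (("displacement", "nonzero"),)),
    # "carefully": a "move" in which the object's speed is slow.
    Formula("carefully", "move", (("speed", "slow"),)),
]


def applicable_words(observed_traits, formulas=FORMULAS):
    """Return the defined words whose formulas are satisfied by the observed object traits."""
    found = []
    for f in formulas:  # formulas are listed so that base actions come first
        base_ok = f.base_action is None or f.base_action in found
        traits_ok = all(observed_traits.get(k) == v for k, v in f.object_traits)
        if base_ok and traits_ok:
            found.append(f.defines)
    return found


print(applicable_words({"displacement": "nonzero", "speed": "slow"}))  # ['move', 'carefully']
print(applicable_words({"displacement": "nonzero", "speed": "fast"}))  # ['move']
print(applicable_words({"speed": "slow"}))                             # []
```

The last call shows the dependence at work: slowness alone is not enough for “carefully” when nothing has been moved.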
Following such rules, new formula concepts can be defined from previously obtained ones, and so on, using implication and possibly other logical devices. The more complex the abstract concept, the more complex and deeper the structure of the resulting formula. We can obtain a formal definition of any term; how correct it turns out to be depends on us.
Naturally, the proposed language can be extended - there are more than enough possibilities. For example, a notation for synonyms suggests itself:
hippo (s) = hippopotamus (s).
I have not touched on other parts of speech: those used for the emotional coloring of sentences (interjections) or for various technical needs (conjunctions).
And who knows what else! However, it is the direction of thought that matters; the syntax of such a language is a purely secondary, applied question.
To summarize.
We have the following types of words:
- i - individual objects: defined by membership in a class; they are nouns;
- s - classes: defined by decomposition into components, down to the level of elementary classes. Elementary classes are defined by characteristics. Both kinds are nouns;
- a - characteristics of objects and classes. They are adjectives;
- d - dependent terms. Formed from a parent term. May be any part of speech;
- f - formula concepts. They are nouns, verbs, or adjectives.
And the following sequence of word formation:
- At the lowest level are the characteristics of elementary objects, and through them, of classes: red, hard, round, etc.
- A combination of the original characteristics makes it possible to assign a name to an elementary object: for example, all round, red objects growing on trees can be called apples. As a result, we obtain a term suitable for designating both a class (apples in general) and an individual object (this particular apple).
- The existence of individual objects allows unique names to be assigned to them (Stepashka the hare).
- The initial terms are formed arbitrarily, as the need arises (this animal could be called a hare, or it could be called a rabbit; nothing would change).
- Dependent terms, which as a rule are other parts of speech, are formed from the initial terms.
- From terms that have already been defined, formulas can be composed to define further terms, with complex logical conditions.
Readers have probably been wondering for a while why all this is necessary.
I ran into this problem while building a chatbot. The testers (we managed to attract only a few people!) behaved, from my point of view, in the same insane way: they asked questions, expecting to get answers to them. How naive! It was as if they did not know that before asking questions, you have to enter the information into the database. But even once this obstacle was overcome, it proved very hard to foresee the form a question would take, given the variability of human speech.
It costs nothing to enter this text into the database:
“Birds fly.”
Then an answer to the matching question can be obtained. It is much harder to expect an answer to a more intricate question, which is essentially a variation of the first:
“Do cockatoos flap their wings when flying?”
To answer it, you need to know many relationships between words, namely:
- cockatoos belong to the class of birds;
- “fly” and “flight” are cognate words;
- flying is done with wings;
- wings are flapped.
The first two points can, in principle, be loaded into the chatbot from dictionaries (although, as I became convinced, with inescapable sadness in my soul, such dictionaries are all but impossible to find). As for the last two points, there is simply nowhere to get them from. In our heads the information is stored just as the doctor ordered, but you cannot extract it from there. And dictionaries with the required content do not exist at all: the meager ones that occasionally turn up offer verbal formulations, whereas strictly formal ones are needed.
So I asked myself: how can speech be formalized so that, starting from indefinable terms, all the other terms can be defined, making it possible to compile a complete dictionary of lexical relations? If this succeeds, the chatbot will be able to answer the question of whether a cockatoo flaps its wings in flight.
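To make the goal concrete, here is a toy Python sketch of how the four relations could answer the cockatoo question. It is not the chatbot itself: the tables are entered by hand, and all names (IS_A, CAN, COGNATE, USES, HOW_USED, does_while) as well as the extra cognate “flying” and the activity “swimming” are my own assumptions for illustration.

```python
# Hand-entered relations; a real dictionary of lexical relations is exactly what is missing.
IS_A = {"cockatoo": "bird"}                   # cockatoos belong to the class of birds
CAN = {"bird": {"fly"}}                       # "Birds fly."
COGNATE = {"flying": "fly", "flight": "fly"}  # cognate words reduced to one parent term
USES = {"fly": "wings"}                       # flying is done with wings
HOW_USED = {"wings": "flap"}                  # wings are flapped


def ancestors(term):
    """Yield the term itself and every class above it."""
    while term is not None:
        yield term
        term = IS_A.get(term)


def does_while(subject, motion, instrument, activity):
    """Answer 'does <subject> <motion> its <instrument> when <activity>?'."""
    action = COGNATE.get(activity, activity)
    able = any(action in CAN.get(t, set()) for t in ancestors(subject))
    return able and USES.get(action) == instrument and HOW_USED.get(instrument) == motion


print(does_while("cockatoo", "flap", "wings", "flying"))    # True
print(does_while("cockatoo", "flap", "wings", "swimming"))  # False
```

The answer is positive only because every link in the chain was supplied by hand, which is the whole difficulty described above.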
Pointing the way is the only thing I am able to do at the current stage of my life. I am not sure that the considerations expressed here are entirely original: attempts to formalize speech have been made in the course of AI development, and certainly more than once. However, the distinctive feature of my proposal lies not simply in filling a database with phrases in a natural or artificial language (meaningful filling of the database is outside the topic under discussion), but in defining every subsequent term from a limited number of indefinable concepts. I know of no attempts to implement this idea.
Actually, that's all.
[joke]
Can you tell me if the Nobel Prize is given out by check, bank transfer or cash?
[/joke]