Where can we find a tool fine enough to shoe a semantic flea?
Inspired by the post "Dirty programming with a pure soul" (http://habrahabr.ru/company/abbyy/blog/144859/).
Dmitry from ABBYY offered a rich, many-layered metaphor in his post. The author, clearly not short of talent or of the gift of writing, touches on the very thin boundary zone of the transition between substance and mind: from "computer hardware" and its accompanying software infrastructure to the information field, where an invisible "quantum transformation" of physical laws into mathematical stochastics takes place.
Indeed, in order to clean something dirty, you always have to stain something clean. This principle from Murphy's collection is absolutely true of the substantial world, but, as it turns out, it does not extend to the mental sphere at all.
When we speak of "dirty programming" as a "dirty technology" for cleaning and recycling "dirty matter", we should remember that technologies in themselves are neither "clean" nor "dirty", just as chemistry cannot be "socialist" or "capitalist". Technologies can, however, be characterized as "precise" or "coarse". That precision or coarseness is determined by one of the two components of which any technology consists: its tool. Clearly, if an ordinary shovel acts in this capacity, that gardening implement cannot serve as a screwdriver for repairing a watch mechanism, even though the second component of the technology, the methodology, stays the same in both cases: you simply have to turn something with the tool. Even an ordinary electric screwdriver is too coarse for precision screws. Hence the first principle of technotronics: the technological means must be commensurate with the scale of the "dirt", otherwise the dirt will not be removed.
So, to successfully process "dirty" information raw material into a "clean" information product, the condition above must be strictly met: the processing tool must be finer, smaller, and more accurate than the very "elements of dirt" that clog, stain, and soil the raw information being cleaned. Only then is it possible to separate "the cutlets from the flies", that is, the essential from the secondary, the useful from the unnecessary, and so on.
And vice versa: if something cannot be cleaned by any means, cannot be processed to the required level of purity, this means exactly one thing: the tool clearly fails the condition just stated, that is, it is a blunt and coarse instrument.
Today a super-ambitious task confronts programmers, one whose many variations have not been solved in more than half a century, neither by cyberneticists together with linguists nor by brain researchers together with philologists: to have computers recognize any information products of human consciousness embodied as linguistic, epistolary, or verbal representations (text or speech), where what is required is to understand the sense and/or semantic load of any audio-visual or simply graphic images created by people.
One such variation is machine understanding of natural-language texts, both at the level of distinguishing the semantics of individual language graphemes and symbols and at the level of extracting the meaning of a context, which, in my opinion, is exactly what the glorious workers represented by Dmitry are pursuing at such an authoritative and highly respected company as ABBYY.
And, it would seem, if anywhere, then here everything needed is available in necessary and sufficient quantity: the intellectual potential of the staff (unusually high), proper financing (decent), the required methodology (in the widest assortment), and the necessary tools (from Compreno to neuro-semantic networks with all sorts of gadgets; it is all there). And yet the desired result is not there!? Why? Because the tools are simply NOT PRECISE, and in this field they are plainly unsuitable. I will not go so far as to compare their coarseness to Siberian felt boots, but the means being used, which may be fine for solving "substantial" problems, do not correspond at all to the role they are asked to play in the information sphere.
Specifically, linguistic means are NOT able to operate in the field of procedural thinking; you cannot take a "coarse linguo-scalpel" and operate in the hyperfine semantic zone of brain neurons. The tool for this must be of matching caliber: fine enough, accurate, acutely sensitive. As in electrical engineering, for example, where there is a principle that the adjustment step of a variometer (potentiometer) must not be larger than the error being corrected, that is, larger than the "noise", or the "dirt" in the case we are considering.
Linguistics, as you know, can establish, and even then not completely, various connections and relations between words in sentences (speech or text), much as structural engineering can establish connections and relations between building elements (the "words") in various building structures (the "contexts"). Specialists in both fields can express these connections and relations (conrelates) as formalized representations, for example: "three tons of bricks with a Mohs hardness of 6, 1000 pieces in all, are bound by 100 kilograms of Portland cement 500 and 200 kilograms of quartz sand".
You are unlikely to guess what structure such a record describes. In exactly the same way, a computer has no idea what this record means: [the subject "cartridge" is linked to the predicate "entered", forming a predication that relates, via the preposition "into", to the prepositional group "the store"]. But if the builders had simply said "brick wall", we would immediately understand what was being talked about and would not bother ourselves tying tons to kilograms and sand to cement and bricks.
At the same time, it is far from certain that if linguists utter the phrase "The cartridge entered the store", we will understand exactly which [cartridge] is meant: a boss (patron), an electrical lamp socket, a lathe chuck, or an ammunition round. And [the store] can likewise evoke different images, anything from the magazine of an assault rifle to a retail company.
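To make the point concrete, here is a minimal illustrative sketch in Python. The toy lexicon and the `parse` structure are my own assumptions invented for the example, not a description of ABBYY's or anyone else's tools; it only shows that a purely grammatical representation of the sentence leaves every content word with several candidate images.

```python
# A toy illustration: a syntactic parse fixes the grammatical roles,
# but each content word still carries several candidate "images" (senses).
# The lexicon below is invented purely for this example.

SENSES = {
    "cartridge": ["rifle round", "lamp socket", "lathe chuck", "boss (patron)"],
    "store":     ["retail shop", "rifle magazine", "warehouse"],
}

# What a parser can deliver: who did what to what, grammatically.
parse = {
    "subject":   "cartridge",
    "predicate": "entered",
    "oblique":   ("into", "store"),
}
print("grammatical structure:", parse)

# What it cannot deliver on its own: which image each word encodes.
for word in ("cartridge", "store"):
    print(f"{word!r}: {len(SENSES[word])} candidate images -> {SENSES[word]}")

# Without knowledge of the images behind the words, every combination of
# senses is still grammatically admissible: 4 * 3 = 12 readings here.
```

The parse is identical for the office anecdote and the shooting incident; only knowledge of the images behind the words selects one reading.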
These examples should make it clear that no computer can grasp the semantic load of a speech context by means of computational linguistics alone; a completely different tool is required. Which one? Exactly the one we use in everyday life.
Let's look at how thinking and communication between people actually proceed. First, certain thought-forms swarm in an individual's consciousness, which he pictures as Models of Image Behavior (MPOs). Then he builds one or another semantic construct out of these MPOs and decides to convey it to us by the means of natural language, since we, unfortunately or fortunately, do not possess telepathy. In his message the author of the thought-forms presents the MPOs in encoded form using language elements (words, tokens, and so on). The recipient, perceiving the message, engages his associative memory, from which he retrieves his own MPOs, stored away until needed, and with their help tries to understand what the author wanted to say.
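Purely as an illustrative sketch of this encode/decode cycle (the class names, the tiny associative memory, and all the data below are my own assumptions, not any existing system), one might model it roughly like this:

```python
# A sketch of the communication scheme described above:
# the sender encodes models of image behavior (MPOs) into words,
# the recipient decodes the words back through his own associative memory.
# Every name and entry here is invented for the example.

from dataclasses import dataclass

@dataclass(frozen=True)
class ImageModel:
    """A stand-in for one model of image behavior (MPO)."""
    name: str
    behavior: str  # what the image typically does

# The sender's thought-forms.
sender_mpos = [
    ImageModel("wall", "encloses a space, bears a load"),
    ImageModel("brick", "a small rectangular building block"),
]

# Encoding: the MPOs are reduced to words of natural language.
message = [m.name for m in sender_mpos]  # -> ["wall", "brick"]

# The recipient's associative memory: word -> his own stored MPOs.
recipient_memory = {
    "wall":  [ImageModel("wall", "encloses a space, bears a load")],
    "brick": [ImageModel("brick", "a small rectangular building block"),
              ImageModel("brick", "slang: a useless heavy object")],
}

# Decoding: each word pulls out the recipient's own candidate images;
# understanding means choosing among them, not computing them from scratch.
for word in message:
    candidates = recipient_memory.get(word, [])
    print(word, "->", [c.behavior for c in candidates])
```

The essential point of the sketch is that the words travel, while the images stay on each side; understanding is a lookup and a choice in the recipient's own store of images.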
That is how our consciousness works. How does a computer work? What technologies does it use, and are they anything like ours? No, they are not. And since a computer is a calculating machine, developers stubbornly try to CALCULATE the final result instead of teaching the computer to understand meaning by simply storing these very MPOs and then operating with them.
It turns out that "computing" technologies play exactly the role of that "dirty Cinderella", while what we all need are precision means for understanding and recognizing images and meanings.