"Liquid perceptron" or the hypothesis of how to realize real parallelism
In the comments to the article "Algorithmic unsolvability is not an obstacle for algorithmic AI" I argued that the problems of building AI lie not so much in algorithmic unsolvability as in the fundamental (insurmountable) slowness of computers built on the principle of the Turing machine. That article also discussed the theoretical problem of algorithmic unsolvability, using the halting problem as an example.
To this I received the objection that a person does solve the halting problem, but does so with errors whose probability grows with the complexity of the program; that a person's ability to solve algorithmically unsolvable problems (as mass problems) is extremely doubtful; and that the ability to find solutions for individual particular cases proves nothing, because that much is within the power of a computer.
My answer is that it is precisely the "algorithm" by which a person finds particular cases that is underestimated here. This is not within the power of a computer: it lacks a mechanism by which it could single out the particular cases itself. Of course, in the comments to that article many readers immediately encoded those particular cases by hand, but the point is for the computer itself to do this in the course of solving the problem. Finding particular cases appears to fundamentally reduce the complexity of the computation. By simplifying and idealizing, a person then goes on to formulate laws, thereby moving to a qualitatively different level. And this is not accessible to a computer.
But let us take things in order.
How does a person solve problems?
A person solves problems (structural problems, not parametric ones; the terms will be clarified later) much faster than they could in principle be computed on any conceivable computer. At the expense of what? As a hypothesis: at the expense of the universality of the solution, i.e. by reducing it to a certain set of particular, final solutions, and only faith allows us to regard the problem as solved in general, until one of the particular solutions is refuted.
Example. Consider the period when a child is learning to count. He can already count to 100 and can add and subtract numbers up to 10, but he still cannot solve examples with adding tens: asked to solve 10 + 10 = ? or 20 + 10 = ?, he runs into serious difficulty, even though he understands the principles of addition and already counts to 100. In other words, the child only needs to add 10 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1. But neither the adults nor the child himself regard this as a solution. Until he can answer 10 + 10 = 20 immediately, we consider that he does not know it, cannot do it, does not understand how to add. That is, a polynomial computation is not recognized by people as the ability to count; and by that standard the computer cannot count either. One could also try to compute 10 + 1 + 1 + ... + 1 in parallel, but it turns out that parallelization does not reduce the number of operations; it can only slightly reduce the time, and the main thing needed for the solution is knowledge of the global context (see below).
What comes to the rescue? Introducing structure into the computation. It is not known why, but historically people settled on counting in tens (rather than in fives, in dozens, or in binary) as the more convenient way. And so, when we teach a child to count, we separately teach him to add tens by discarding the zero, reducing the problem to the form 10 + 10: drop the zeros, add the units, 1 + 1 = 2, and simply append the zero back to get 20. This is a structural decomposition of the addition operation. The rule is far from universal, but it is precisely this rule that allows one, in the particular cases where it is needed, to count faster than any computer in which this aspect is not structured is even potentially capable.
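To make the contrast concrete, here is a minimal Python sketch (my own illustration, not something from the original discussion) comparing the "count up by ones" computation with the structural "drop the zero, add, append the zero" rule for round tens:

    def add_by_counting(a, b):
        """Add b to a by repeated increments: b elementary steps."""
        result = a
        for _ in range(b):
            result += 1
        return result

    def add_round_tens(a, b):
        """Structural rule for round tens: drop the zeros, add the
        single digits, append the zero back. One small addition."""
        assert a % 10 == 0 and b % 10 == 0, "rule only covers round tens"
        return (a // 10 + b // 10) * 10

    print(add_by_counting(10, 10))  # 20, after 10 increment steps
    print(add_round_tens(10, 10))   # 20, after a single small addition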
Examples of structural adaptation
How can the limit of polynomial computation be overcome for polynomial problems?
In order not to go into details, let us take it that the Rosenblatt perceptron solves the "parity" problem posed by Minsky. Roughly speaking, the difficulty there is that although the perceptron determines in parallel whether the number of ones at the input is even or odd, it requires at least one node connected to all of the inputs. That is, you need to know the global context.
And then we see that the value of parallel devices for tasks such as addition, multiplication and the like turns out to be not very high: the achievable degree of parallelism on such basic things is minimal.
But suppose we have 4 points, each with the value 0 (no point) or 1 (there is a point). Then there are 16 combinations, and for each we must answer whether the number of points is even. If we remove even one combination from this set, the requirement of a global context becomes optional.
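A brief Python check (my own illustration, using the 4-point retina from the text) makes the global-context requirement explicit: flipping any single input of any of the 16 combinations flips the parity answer, so a correct decision element cannot afford to ignore even one input.

    from itertools import product

    # All 16 combinations of 4 binary inputs (point present / absent)
    patterns = list(product([0, 1], repeat=4))
    parity = {p: sum(p) % 2 for p in patterns}  # 1 = odd number of points

    # Flipping any single input of any pattern flips the answer,
    # so no input can be left out of the decision.
    for p in patterns:
        for i in range(4):
            q = list(p)
            q[i] ^= 1
            assert parity[tuple(q)] != parity[p]
    print("the parity answer depends on every one of the 4 inputs")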
Such a solution, with a retina of R = 4 inputs and with the loss of one stimulus, already shows us the logic of the approach: we give up the universality of the solution, but we solve everything else, and in addition we know exactly on which stimulus we will make a mistake.
This approach leads us to a specific device. Suppose a certain system G tells us how many points there are on the retina, i.e. it implements the invariant "size". Then our training set (16 combinations) can be split into 4 parts depending on the size, and training can then be carried out separately on the different "columns" of intermediate neurons. What does this give? It significantly reduces the number of connections required per element, from 4 down to 2. Why does this become possible? Because each training subsample no longer contains the problem "in its entirety", and therefore does not require solving the "parity" problem in general at all.
By using the invariant "size" our problem has even become degenerate, because knowing the size = knowing the number of points, and with that knowledge it is already easy to say whether the number is even. But suppose we are solving a problem where the available "size" invariant does not ease our task so directly. What then? The relief comes indirectly: the main thing is that, with the help of the size invariant, the task is split into parts and, just as in the "parity" task, the training sample is split, which lowers the universality of the required solution. And again an order of magnitude fewer connections are needed than would be needed to solve the problem in full generality.
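A small sketch of that split (again my own illustration, simply assuming the system G is available): with the all-zero stimulus dropped, as above, the remaining combinations fall into 4 "columns" by size, and inside each column the parity answer is constant, so no element of a column has to see all four inputs at once.

    from itertools import product
    from collections import defaultdict

    # Training set without the all-zero stimulus, split by the "size"
    # invariant that system G is assumed to provide.
    patterns = [p for p in product([0, 1], repeat=4) if sum(p) > 0]
    columns = defaultdict(list)
    for p in patterns:
        columns[sum(p)].append(p)

    # Within each column every stimulus has the same parity answer.
    for size, group in sorted(columns.items()):
        answers = {sum(p) % 2 for p in group}
        print(f"size={size}: {len(group)} stimuli, answers {answers}")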
On the way to the liquid perceptron
Now we have to see how we should organize the system G, which in this particular case implements the invariant "size". Did a miracle happen here? Of course not: our law about the necessity of a global context cannot be overcome.
Suppose we implement the system G as 4 threshold elements connected to all inputs but with different thresholds: element a1 passes a signal if the number of points is greater than 0, a2 if it is greater than 1, a3 if greater than 2, and a4 if greater than 3. Here we have again used all-to-all connections and "overspent" the number of connections. The point is that logical structural adaptation without physical structural adaptation really is of little use.
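Written out directly (a sketch under the assumptions of the text: 4 inputs, thresholds greater than 0, 1, 2, 3), this naive G looks as follows; note that every threshold element still needs a connection to every input, which is exactly the overspending described above.

    def system_G(inputs):
        """Naive G: 4 threshold elements a1..a4, each wired to all inputs.
        The pattern of active elements encodes the 'size' invariant."""
        active = sum(inputs)
        return [1 if active > threshold else 0 for threshold in (0, 1, 2, 3)]

    print(system_G([1, 0, 1, 0]))  # [1, 1, 0, 0] -> two points on the retina
    print(system_G([1, 1, 1, 1]))  # [1, 1, 1, 1] -> four points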
But if, for our system G, we physically abandon the principle of wired connections and replace it with a physically more suitable mechanism of "liquid concentration", then a (practical) effect does appear. Such a "liquid concentration" system can be pictured as follows: suppose our perceptron floats in a liquid. The inputs not only send electrical signals over the connections but also act chemically on the liquid. The liquid has a certain concentration level of "acidity", and the intermediate neurons have different levels of tolerance to that acidity. The more electrical signals appear at the inputs, the higher the concentration in the liquid becomes, and as the concentration rises the groups of neurons that are less tolerant of the "acidity" begin to switch off.
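A toy simulation of that mechanism (my own sketch of the idea, with made-up tolerance values): every active input raises the acidity of the shared liquid, and the groups of intermediate neurons whose tolerance is below the current acidity switch off, so the surviving groups carry the global "size" context without any wired all-to-all connections.

    def surviving_groups(inputs, tolerances=(1, 2, 3, 4)):
        """Groups of intermediate neurons that stay active in the liquid."""
        acidity = sum(inputs)  # chemical contribution of all active inputs
        # Less tolerant groups are "switched off" as the concentration rises.
        return [tol for tol in tolerances if tol >= acidity]

    print(surviving_groups([1, 0, 0, 0]))  # acidity 1 -> all groups stay on
    print(surviving_groups([1, 1, 1, 0]))  # acidity 3 -> only groups 3 and 4
    print(surviving_groups([1, 1, 1, 1]))  # acidity 4 -> only group 4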
Readers familiar with neurophysiology will recognize that the system described is essentially a simplified prototype of a biological nervous system. But whereas the theory of perceptrons previously used only ideas about electrical interaction, it now becomes clear why chemical interaction is needed as well: if the electrical interaction implements the "focus", the chemical one provides the "global context".
For some reason everyone is fixated on NP problems. Yet for some reason nobody sets the task of solving class P problems FASTER, up to the point of an instantaneous answer.