Neural network as an activation database
For neural-network-based AI to be universal, we need to understand what a neural network still lacks for such versatility. To find out, let's try to implement full execution of arbitrary programs on a neural network. This will require conditional branches, conditions, and reading and writing of data structures. After that it will become possible to create object-oriented neural networks. The article will have to be split into parts.
Consider the different types of neural clusters. Sensory and effector clusters have already been mentioned.
If it is an And cluster, it is activated only if all of its conditions are active, that is, a signal has arrived at every synapse.

An Or cluster is triggered if at least one of its features has been activated. If such a cluster is part of a chain, then the backward link to the chain is mandatory and is combined with the rest by an And condition: the cluster is activated only if the previous cluster of the chain was active and at least one of its own conditions also fired. By analogy with programming languages, the chain link acts like the instruction pointer in a CPU, a signal saying "I allow the remaining conditions of this cluster to be evaluated". Let's look at some code.
class NC; // neurocluster

class Link {
public:
    NC& _from;
    NC& _to;
    ...
};

class LinksO; // container of outgoing links; convenient to build on boost::intrusive to save memory and improve performance
class LinksI; // container of incoming links, also based on boost::intrusive

struct NeuronA1 {
    qreal _activation = 0;
    static constexpr qreal _threshold = 1; // this threshold is never changed or stored, to save memory; the neuron is always normalized
    bool activated() const { return _activation >= _threshold; }
};

struct NeuronAT {
    qreal _activation = 0;
    qreal _threshold = 1; // is changed and stored
    bool activated() const { return _activation >= _threshold; }
};

class NC {
public:
    LinksO _next;
    LinksO _down;
    LinksI _prev;
    LinksI _up;
    NeuronA1 _nrnSumPrev;
    NeuronAT _nrnSumFromBottom;
    ...
};

// to make it clearer how activation appears on _nrnSumPrev:
void NC::sendActivationToNext() {
    for (Link& link : _next) {
        link._to._nrnSumPrev._activation += 1;
    }
}

// this function is the same for all cluster types: and/or/not and the rest
bool NC::allowedToActivateByPrevChain() const {
    if (_prev.isEmpty())  // no links back in time: the cluster is not part of a chain, so there are no constraining conditions
        return true;      // and the other conditions specific to this cluster type may be checked
    return _nrnSumPrev.activated();
    // the explicit check for the presence of links could be avoided by adjusting the neuron's threshold when links are added or removed:
    // with no links the threshold would always be 0 and the neuron would always be activated.
    // but it is easy to forget to update the threshold, so for research code it is better to check for links than to juggle thresholds.
}
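The code above only implements the chain check; how an And cluster differs from an Or cluster when evaluating its own conditions is left implicit. Here is a hedged sketch of one way it could look, assuming the conditions of a cluster are summed into _nrnSumFromBottom the same way sendActivationToNext() sums into _nrnSumPrev; makeAnd, makeOr and canActivate are illustrative names, not code from the article:

// Sketch, not the article's code: an And and an Or cluster can differ
// only in the threshold of the neuron that sums signals from their feature clusters.
void NC::makeAnd(int conditionCount) { _nrnSumFromBottom._threshold = conditionCount; } // all synapses must fire
void NC::makeOr()                    { _nrnSumFromBottom._threshold = 1; }              // any single feature is enough

bool NC::canActivate() const {
    // the chain link acts as the "instruction pointer": it gates the remaining conditions
    return allowedToActivateByPrevChain() && _nrnSumFromBottom.activated();
}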
Note that _prev usually contains either no link or exactly one. This turns the memory chains into a prefix tree: _next can hold any number of links, while _prev holds at most one. The only difference from an ordinary prefix tree is that each position stores not a single letter but an arbitrary number of symbols. Thanks to this, even storing Zalizniak's dictionary will not take up much memory.
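To make the prefix-tree analogy concrete, here is a hypothetical sketch of how a chain could be extended: before creating a new cluster for the next step, look among the _next children of the current chain cluster for one that already encodes the same set of symbols and reuse it. findOrAppendNext, matches and createNextCluster are assumed helpers, not part of the article's model:

// Sketch under assumptions: extending a memory chain the way a prefix tree is extended.
// "symbols" stands for the set of feature clusters the next step should encode.
NC& NC::findOrAppendNext(const std::vector<NC*>& symbols) {
    for (Link& link : _next) {
        if (link._to.matches(symbols))   // an existing branch already stores this step,
            return link._to;             // so shared prefixes are stored only once
    }
    return createNextCluster(symbols);   // otherwise branch off a new chain cluster
}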
Now, for convenience, let's run a little ahead, so that we don't have to rewrite this code later: we will get rid of neurons and activations right away.

If clusters somehow kept their activation history themselves, instead of sending their activation on to others, we could rewrite this function like this:
bool NC::allowedToActivateByPrevChain() const {
    for (const Link& link : _prev) {
        NC& nc = link._from;
        if (!nc.wasActivated())   // check the previous cycle
            return false;
    }
    return true;
}
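wasActivated() is not defined yet; its real implementations are discussed below. As a minimal placeholder sketch, a cluster could simply remember the cycle on which it last fired; _lastActiveCycle, markActivated and Brain::currentCycle are assumed names:

// Placeholder sketch, assuming each cluster only remembers the cycle number
// on which it last fired. The fuller variants (links into long-term memory,
// a ring buffer per cluster) are described further below.
bool NC::wasActivated() const {
    return _lastActiveCycle == _brain.currentCycle() - 1; // was it active on the previous cycle?
}

void NC::markActivated() {
    _lastActiveCycle = _brain.currentCycle();
}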
Then a lot of problems would immediately go away:

1) After several forecasting cycles there is no need to restore the state of the neural network: the clusters stored, and still store, information about their activation for the corresponding cycles. Prediction can therefore be run much more often and over longer horizons.

2) The neural network becomes resistant to changes: if a link to another cluster is added to a cluster late, there is no need to resend signals so that the activation is re-summed on the destination cluster; the conditions can be checked immediately. The code becomes closer to the functional paradigm, with a minimum of side effects.

3) It becomes possible to introduce arbitrary signal delays: if the activation cache can store data for different cycles, then you can check whether a cluster was active N cycles back.

To do this, add a parameter to the link: the delay time:
class Link {
    ...
    int _delay = 1;
};
and then the function is modified like this:
bool NC::allowedToActivateByPrevChain() const {
    for (const Link& link : _prev) {
        NC& nc = link._from;
        if (!nc.wasActivated(link._delay))   // check link._delay cycles back
            return false;
    }
    return true;
}
4) We get rid of the "stuttering" effect (as in "grass in the yard, firewood on the grass, ..."): signals from newer cycles will not overwrite older ones, and vice versa.

5) There is no danger that an activation will fade away on its own, over time, while it is still needed: conditions can be checked arbitrarily far back in time.

6) Finally, there is no need to write a dozen articles on topics like "controlling a neural network through managing rhythmic activity", "methods for visualizing EEG-like control signals" or "a special DSL for controlling EEG-like rhythms"; all of that can simply be thrown out.
Now about implementing such an activation cache:
1) Biological neural networks offer three options for placing the activation cache: the current activation in the neurocluster itself, in its neurons; activation (in the form of identification waves?) in the hippocampus, where it is kept longer than in the cluster itself; and long-term memory. The result is a three-level cache, just like in modern processors.

2) In the software model, at first glance, it is convenient to place the activation cache in each cluster.
3) More precisely, we already have both: in this model the hippocampus creates a memory chain, and links to all clusters that were active and not inhibited at that moment are written into the memory chain. Each link is stored in one cluster as outgoing and in the other as incoming. So the "cache" is in fact not a cache at all but long-term memory. The difference is that biological neural networks cannot extract information from long-term memory directly, only through activation, whereas an ANN can. This is an advantage of AI over biological neural networks that it would be foolish not to use: why bother with activations if what we need is the semantic information?
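A hedged sketch of the step described above, as it might look on each cycle; recordCycle, createNextChainCluster, isInhibited, addLink and the activeClusters argument are assumed names used only for illustration:

// Sketch under assumptions: appending one step to the memory chain.
// Each recorded link ends up stored twice: as outgoing in the active cluster
// and as incoming in the chain cluster, which is why the "cache" is really long-term memory.
void Brain::recordCycle(const std::vector<NC*>& activeClusters) {
    NC* step = createNextChainCluster(_hippo);   // new chain cluster whose _prev points to the previous step
    for (NC* nc : activeClusters) {
        if (nc->isInhibited())
            continue;                            // inhibited clusters are not written into the chain
        addLink(*nc, *step);                     // outgoing in nc, incoming in step
    }
    _hippo = step;                               // the chain head moves forward by one cycle
}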
So, to check whether a cluster was active N steps back, you can use this (unoptimized) pseudo-code:
NC* Brain::_hippo;   // the current cluster to which current events are added

NC* NC::prevNC(int stepsBack) const {
    // walk backward along the chain via the _prev links,
    // summing link._delay to track the offset back in time
    const NC* cur = this;
    NC* found = nullptr;
    int offset = 0;
    while (offset < stepsBack) {
        if (cur->_prev.isEmpty())                  // memory does not reach that far back
            return nullptr;
        const Link& link = *cur->_prev.begin();    // at most one backward link in a chain
        offset += link._delay;
        found = &link._from;
        cur = found;
    }
    return (offset == stepsBack) ? found : nullptr;  // check the boundaries, (not) return the result
}

bool NC::wasActivated(int stepsAgo) const {
    NC* timeStamp = _brain._hippo->prevNC(stepsAgo);
    if (!timeStamp)                      // the system remembers nothing about that time
        return false;
    return linkExists(timeStamp, this);
    // the link lookup should execute quickly: boost::intrusive provides not only lists
    // but also trees, and the size of one node only grows from 2 to 3 pointers
}
If, for an activation that has since faded into oblivion, it is necessary to preserve not only the presence of the link but also the activation force, the corresponding field can be added to the link itself. Existing fields can also be reused for this purpose without introducing new ones: for example, "importance", on which the lifetime of the link depends.
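For example (a sketch; whether to add a dedicated field or reuse "importance" is left open in the text):

// Sketch: carrying the activation force on the link itself.
class Link {
    ...
    int   _delay = 1;
    qreal _importance = 0;   // can double as the stored activation force for that cycle
};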
But what about clusters whose activation does not reach the threshold yet is still useful, for example for fuzzy recognition or for estimating probabilities? An unoptimized solution is to use the same kind of links. To do this, either create additional link containers inside the cluster and add such links there (so as not to mix them with the normal ones that actually fired), or pile everything together and separate them only by strength. Such links will have to be removed sooner, since there are an order of magnitude more of them than the others. A more optimized solution: each cluster stores a regular activation cache, for example a circular buffer of 16 elements, where each element holds a cycle number and the activation force for that cycle. This gives a two-level cache: weak, subthreshold and the most recent signals go into the buffer inside the cluster, and everything else goes into links to long-term memory; a sketch of such a buffer follows below. Keep in mind that these articles show only pseudo-code and naive algorithms; optimization questions could take up far more space.
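A minimal sketch of the in-cluster part of that two-level cache, assuming a fixed ring of 16 entries; ActivationCache and its method names are assumptions, not the article's code:

// Sketch under assumptions: a per-cluster ring buffer of (cycle, activation) pairs
// for weak, subthreshold and very recent signals. Older or suprathreshold activations
// would live as links in long-term memory instead.
struct ActivationCache {
    struct Entry { qint64 cycle = -1; qreal activation = 0; };
    static const int Size = 16;
    Entry _ring[Size];
    int   _head = 0;

    void store(qint64 cycle, qreal activation) {
        _ring[_head] = { cycle, activation };    // overwrite the oldest entry
        _head = (_head + 1) % Size;
    }

    qreal activationAt(qint64 cycle) const {     // returns 0 if nothing is remembered for that cycle
        for (const Entry& e : _ring)
            if (e.cycle == cycle)
                return e.activation;
        return 0;
    }
};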