WTM (Wave Temporal Memory): a neural network model for solving the problem of adaptive behavior



I present to the reader a neural network model designed to solve the problem of adaptive behavior (and its subtasks: recognition and prediction of sequences).

Foreword


I entered this field (AI and its neighbors) by chance: an article read at leisure, then another, a third, a book, a couple more books, a monograph, and so on. As I moved from popular literature to current academic publications, I kept wondering: why are they doing this wrong? (In AI and adaptive systems, the ambitions tend to be more modest.) Then I came to think that the fundamental ideas underlying these models are flawed: they do not lead to the results their authors hope for. My acquaintance with the "wrong" models continued, and indignation accumulated. Much later I found the "right" ideas in other people's work, but by then it was too late: the accumulated thoughts had to be combined into a single model. That model is what this article is about.

Introduction


The article is divided into two large parts: the theory behind WTM and its implementation.

The concept of adaptability

Adaptability is the ability of the control system (CS) of an autonomous object to acquire knowledge about the properties of the system "environment - control object - control system", to accumulate this knowledge in its memory, and to use it for appropriate control of the control object (CO).

Refinement of scope

We are talking, first of all, about objects that have a CS: a specialized subsystem that works with information. First, we filter out "adaptive" machines that have no control system but only an options knob, providing a predetermined change of properties in anticipated situations. Second, we filter out non-autonomous control systems, in which all knowledge was installed from the start by someone outside the machine, or which during control can access knowledge outside the CS; we are only talking about machines capable of acquiring the knowledge needed for control from their own experience. Third, we screen out machines that work with arbitrarily imposed objective functions: for example, a rocket that must fly to its target and explode there is not a suitable object, because there is nothing in the nature of such objects beyond self-destruction, or more generally beyond fulfilling goals set by someone else, and their control systems can therefore be arranged in any ad hoc way. Living organisms, by contrast, tend toward survival precisely through the use of the knowledge they accumulate for control. Finally, let us clarify what exactly the CS should control. The nervous system of a living organism controls not just the body, but the body immersed in its environment, i.e. the whole world outside itself, and even more than that: a world that includes this nervous system itself, because when controlling it the system also has to account for its own future reactions.

Existing models

A large list of models and related information under the generic name of cognitive models can be found here and here.

Principles of Adaptive Behavior

For myself, I have identified the minimum set of principles a CS must satisfy in order to be called adaptive.

  • The CS should continuously adapt to environmental events.
  • The CS should have a minimal set of innate behavioral acts, or reflexes. The process of adaptation will start from this set.
  • The CS should be able to generalize its experience to other environmental situations.

Theoretical part


As a working example, we will consider the problem of adaptive behavior. Throughout the text, the abbreviation WTM is used to mean "an instance of the wave temporal memory model".

Adaptation

The principle of continuous adaptation means that over time the appropriateness of the CS's reactions should increase. Here we face a fork that needs to be considered in more detail: the process of "increasing appropriateness" differs for active and reactive control systems.

In active systems, there is an element responsible for evaluating the system's own functioning. It contains criteria of appropriateness by which the actions performed by the CS are evaluated. Based on these evaluations, decisions are made and the behavior of the system is changed so as to increase its appropriateness.

Reactive models differ in that they contain no behavior-evaluation unit. Because of this, an increase in the appropriateness of behavior cannot be achieved in such systems by the same means as in active ones, and other ways must be used.

For a better understanding of this difference, let us look more closely at the definition of adaptability. It can be split into two parts:

  1. In the course of its functioning, the CS continuously increases the coverage of the set of environmental events with appropriate reactions.
  2. In the course of its functioning, the CS continuously replaces existing reactions with other, more appropriate ones.

On the first point there is no difference between active and reactive systems: in both cases, increasing coverage reduces to generalizing existing experience to new environmental situations. The difference lies in the second point. Reactive systems are fundamentally incapable of it, since they cannot evaluate the appropriateness of their actions.

For reactive systems, there are two alternative ways to achieve appropriateness:

  1. choosing an initial set of behavioral acts such that, after generalization, the basic environmental situations of the CO have appropriate reactions.
  2. teaching the CS appropriate behavior through the external environment (education). In this case the environment acts as an active agent that shapes its effects on the CO so that the required behavior patterns are formed.

Although both methods pursue the same goal, they differ greatly both in the actions required and in labor costs.

The first method is used to create a whole set of reactions. The general algorithm:

  1. Carry out an in-depth analysis of the future environment of the CO in order to identify the key situations requiring appropriate reactions.
  2. Choose appropriate reactions for the situations found.
  3. Create an instance of the reactive system whose set of basic reactions consists of the obtained situation-reaction pairs.

The second method is used to create a single reaction. The general algorithm:

  1. The developer creates a compound stimulus whose response is the desired behavior.
  2. During operation, whenever the relevant situation occurs, the developer acts on the CO with the created stimulus so that the CS performs the required reaction.
  3. Repeat step 2 until the effect is fixed.

The advantage of the second method is that it can be applied at any time during the operation of the control system, while the first can be applied only at the stage of creating the model instance. The precondition of the second method is that the developer must know the set of the CS's reactions.

Training

Learning in WTM is a two-part process: identifying environmental regularities (patterns) and preserving the sequences of CS reactions to these patterns. In other words, learning is the accumulation of pairs of the form [environmental regularity - reaction].

A regularity of the environment is a frequently repeated sequence of environmental signals. The repetition frequency a sequence needs in order to be memorized is set by the developer at the stage of creating the system (more precisely, in the "Implementation" part).

Preserving sequences of CS reactions

The mechanism of associative connections is used to preserve them.

An associative connection (AC) is a phenomenon in which the activity of one memory element (ME) causes the activation of another ME. We will denote it A → B, where A and B are memory elements. An associative transition is the process of realizing an associative connection: for the association A → B, the associative transition is the onset of activity of B after the activity of A.

A memory element (ME) is a pattern of neural network activity. It can be activated in response to a signal from the environment, or as a result of an associative transition. Note that every reaction of the neural network is an ME, but not every ME is a network reaction. By the activity of an ME we mean the activity of the neurons that make it up.

The strength of an associative connection is a numerical value characterizing the ability of the connection to activate its target ME. For the association A → B, the strength is the ability of A → B to activate B. It takes real values in the range [0, 1] and equals the ratio of the number of activated neurons of B to the total number of neurons of B. The value is not used directly in the implementation, but it is needed to understand the model.
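The definition above can be sketched in a few lines. This is an illustrative fragment, not from a reference implementation: memory elements are modeled as sets of neuron identifiers, and the function name is an assumption.

```python
# Hypothetical sketch: the strength of an association A -> B measured as the
# fraction of B's neurons that the associative transition manages to activate.

def association_strength(target_neurons, activated_neurons):
    """Ratio of activated neurons of B to all neurons of B, in [0, 1]."""
    target = set(target_neurons)
    if not target:
        raise ValueError("target memory element must contain at least one neuron")
    return len(target & set(activated_neurons)) / len(target)

# B consists of 4 neurons; the associative transition activated 3 of them.
strength = association_strength({1, 2, 3, 4}, {2, 3, 4, 7})
```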

The process of preserving sequences of CS reactions consists in the continuous creation of associative connections between successive reactions of the network to environmental signals. When an associative connection is created, it is assigned an initial strength value. With each repeated occurrence of the ME pair, the strength of the connection increases in accordance with the memorization function.

Identifying patterns

Pattern identification is based on the interplay of the memorization and forgetting functions. WTM remembers everything: the more often a pattern occurs, the stronger it becomes (the stronger the associative connections it contains). At the same time, patterns are forgotten: the less often a pattern occurs, the weaker it becomes (the weaker the associative connections it contains). The balance between memorization and forgetting determines which associations remain in memory and which are forgotten.

It is worth giving an example of this balance between memorization and forgetting. Consider the two extreme cases. In the first case, memorization prevails over forgetting. This leads to:

  • extremely detailed memorization of patterns
  • the highest rate of memory filling

In the case where forgetting prevails over memorization, everything is exactly the opposite:

  • only the most general patterns are remembered
  • the minimum rate of memory filling

At the moment, choosing suitable memorization and forgetting functions is one of the most important stages of creating a WTM instance, since in its current state WTM has a limited memory size and no mechanisms for increasing it.

Basic behavioral acts

In addition to the reactions providing appropriateness, the basic set must contain one more class of reactions: the functional basis of the system. Elements of the functional basis correspond to the elementary behavioral acts of the CO (raise the head, bend the second phalanx of the first finger, etc.). All behavior is a combination of basic behavioral acts (and only of them).

At the neural network level, elements of the functional basis are sequences of MEs. Suitable environmental influences are selected for them. As a result, we get a set of pairs [environmental regularity - reaction] corresponding to the selected functional basis.

Behavior in reactive models

Many sources present reactive systems as realizations of the principle [stimulus -> reaction]. In such accounts, the stimulus and the reaction are assumed to be separated by a minimal time interval (read: the time it takes to traverse a reflex arc or a similar structure). This definition follows from the basic principle of reactive systems, determinism; however, it is not entirely accurate. A more accurate scheme looks like this: [stimulus -> internal reaction; deterministic change of internal states; internal state -> external activity] (the word "reaction" is replaced by "activity" because in such a system behavior depends on many stimuli, not on one) (see Fig. 1). What distinguishes this definition is the presence of a model of internal state. Signal propagation is still a strictly deterministic process, but the CS is no longer an automaton with instant responses to stimuli. In such a model, a stimulus may or may not produce an external reaction, and the stimulus and the external reaction can be widely separated in time. Neural networks (WTM in particular) belong precisely to this class of systems (dynamic neural networks are implied).



In the first part of the scheme (Fig. 1.b.1), the internal reaction in WTM is simply the network's reaction to the stimulus. The association mechanism is responsible for the second part (Fig. 1.b.2): at each tick, WTM has a state; for this state there may be matching associative connections, and the process of deterministic state transition is then a process of continuous transition along associative connections. In other words, it is the process of reproducing previously memorized patterns. The third part of the scheme (Fig. 1.b.3) follows from the second: external activity takes place if the reproduced patterns contained external activity.

Generalization

Generalization is the process of transferring a behavioral reaction from one environmental event to another event that is an abstraction of the first (an abstraction is an object lacking some set of properties in comparison with another object, which is called the original, or the special case).

As we already know, sequences of CS reactions are stored in WTM as chains of MEs connected by associations. In WTM terms, an abstract pattern is then a chain of MEs and associative connections in which the original MEs and associations are replaced by their abstractions.

An abstraction of an ME is the ME obtained from the original by removing part of its neurons. An abstraction of an association is the association between abstractions of MEs. The strength of an abstracted association can be less than or equal to the strength of the original association.

It follows that, given the principle by which neural networks are built (one neuron - one property), abstractions of MEs and ACs are parts of the original MEs and ACs. Therefore, for generalization to occur, the MEs in a new chain must be sufficiently close to the MEs in the generalized one. More will be said about "sufficient proximity" later.
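The subset relation between abstractions and originals can be illustrated with a small sketch. Everything here is an assumption for illustration: memory elements are sets of neuron ids, and the overlap threshold standing in for "sufficient proximity" is an arbitrary example value.

```python
# Illustrative sketch: an abstraction of a memory element is a subset of its
# neurons, so a memorized chain generalizes to a new chain when each new ME
# keeps "enough" of the corresponding original ME's neurons.

def generalizes(original_chain, new_chain, min_overlap=0.6):
    """True if every ME of new_chain covers at least min_overlap of the
    corresponding original ME (MEs are sets of neuron ids)."""
    if len(original_chain) != len(new_chain):
        return False
    for orig, new in zip(original_chain, new_chain):
        overlap = len(orig & new) / len(orig)  # fraction of original neurons kept
        if overlap < min_overlap:
            return False
    return True

chain = [{1, 2, 3, 4}, {5, 6, 7, 8}]
close = [{1, 2, 3}, {5, 6, 8}]   # each ME lost one neuron: still close enough
far = [{1}, {5}]                 # too few shared neurons: no generalization
```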

Because the principle of generalization is built into neural networks, in WTM it is not present as a separate mechanism but is just a part of the process of signal propagation through the network.

Stability of recognition to deformations. Situational context

Memorized reaction sequences include associative connections of different strengths. An extreme case is a regularity in which all strengths equal 1: it will be reproduced in full from the appearance of a single one of its elements. The sequences of the "middle band", however, are in a different situation: they can be reproduced effectively only if environmental events match them exactly.

That is, for normal use of a memorized sequence, the current events must coincide with it tick for tick. This state of affairs is no good, which is why WTM has a situational context mechanism. And the described problem is not the only one. There are three main types of deformation:

  1. reordering of events within a sequence
  2. appearance of new events between adjacent elements of a sequence
  3. omission of sequence elements
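The three deformation types, and the failure of exact matching on all of them, can be shown on a toy event sequence (the event names are arbitrary):

```python
# Toy illustration of the three deformation types on an event sequence.

original = ["A", "B", "C", "D"]

reordered = ["A", "C", "B", "D"]       # 1. events B and C swapped
inserted  = ["A", "B", "X", "C", "D"]  # 2. new event X appeared inside
skipped   = ["A", "B", "D"]            # 3. element C was omitted

# Exact, tick-for-tick matching fails on all three deformed variants:
matches = [seq == original for seq in (reordered, inserted, skipped)]
```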

So, the situational context mechanism. It consists of two parts:

  1. each ME is modified by adding its context to it
  2. the ratio of the memorization and forgetting functions is shifted towards stronger forgetting. This makes WTM memorize fewer details, compensating for the growth in ME size.

A situational context is a concise description of recent events. Which time interval counts as "recent" is determined by the developer. It can be pictured as a temporary store from which, at every WTM tick, information about the oldest event is removed and information about the newest one is added.
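The temporary store described above behaves like a fixed-size sliding window. A minimal sketch, with the class name and window length as illustrative assumptions:

```python
# Sketch of the context store: a fixed-size window from which the oldest
# event falls out whenever a new one is added.

from collections import deque

class SituationalContext:
    def __init__(self, window: int):
        self.events = deque(maxlen=window)  # oldest event is dropped automatically

    def tick(self, event):
        self.events.append(event)

    def as_set(self):
        # Order-insensitive view: recognition depends on *which* recent
        # events occurred, not heavily on their order.
        return set(self.events)

ctx = SituationalContext(window=3)
for e in ["A", "B", "C", "D"]:
    ctx.tick(e)
# "A" has fallen out of the window; the context now covers B, C, D.
```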

The context does not depend strongly on the order of events (this is implementation-dependent). Thus, for a successful associative transition (read: successful recognition), what matters is not repeating more details of the current ME, but repeating the same preceding events.

For sequences this means: the further the reproduction of a sequence has progressed, the more likely it is to continue.

Inertia

The situational context mechanism increases the inertia of WTM recognition. Recognition inertia is the tendency of WTM to continue recognizing a pattern it has started to recognize.

Context Groups

Depending on the ratio between the size of the context and the size of the original ME, the properties of WTM vary greatly. If the chosen size of the situational context is larger than the size of the ME, WTM behavior becomes more inert: it will mainly consist of reaction sequences that include elements of the current context (if the context covers a long interval, its rate of change is small, and the ratio of changes to its total size is negligible). We can therefore say that reaction sequences are divided into groups by shared situational context. The division into groups is reinforced by the fact that reproducing a reaction sequence from a certain group adds that regularity to the context, thereby refreshing it and keeping it in the same state.

Implementation


For all the concepts considered in the theoretical part, descriptions of their implementation in neural network terms are given below.

General structure

  • WTM is a multilayer neural network of impulse (spiking) neurons.
  • The network has both forward and feedback connections.
  • Connections between layers are local (i.e., not fully connected) in both the forward and backward directions.
  • Network operation is divided into ticks. In one tick, a signal propagates between adjacent layers of the network.
  • Layers are numbered with integers starting from 1.
  • The distance between two neurons is defined as the distance between the layers containing them.
  • The network operates with a period T (the network operation period), measured in ticks.
  • Input signals are fed to the network every Tinput ticks; Tinput is a multiple of T.
  • WTM uses modified impulse neurons with a threshold activation function: when the accumulated charge exceeds the threshold, the neuron enters the active state. The active state lasts 1 tick.
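The threshold neuron from the last bullet can be sketched as follows. This is a minimal integrate-and-fire-style illustration under my own assumptions (field names, reset-to-zero on firing); it is not a reference WTM implementation.

```python
# Minimal sketch of a threshold ("integrate-and-fire"-style) impulse neuron:
# charge accumulates, the neuron fires for exactly one tick when the
# threshold is exceeded, then resets.

class ImpulseNeuron:
    def __init__(self, threshold: float):
        self.threshold = threshold
        self.charge = 0.0
        self.active = False

    def tick(self, incoming: float) -> bool:
        if self.active:               # the active state lasts 1 tick
            self.active = False
        self.charge += incoming
        if self.charge > self.threshold:
            self.active = True
            self.charge = 0.0         # reset after firing (an assumption)
        return self.active

n = ImpulseNeuron(threshold=1.0)
fired = [n.tick(0.6), n.tick(0.6), n.tick(0.0)]  # fires on the second tick only
```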



And now a few definitions.

Because the process of signal propagation through the network resembles waves, the model received the first part of its name: wave. The "temporal memory" part was borrowed from Jeff Hawkins's HTM (hierarchical temporal memory), due to the similarities between the models.

In the theoretical part, the ME was presented as a static object; for understanding WTM, that view is adequate. In the WTM implementation, an ME is a dynamic object: it is extended in time, and at any moment of its duration only part of the ME is active.
MEs in the process of propagating through the network will be called waves. The "wave front" (the layer currently containing neural activity) is exactly the active part of the ME.

Memorizing MEs

MEs are stored using Hebb's rule of synaptic plasticity. Hebb's rule states that if the activity of one neuron takes part in the excitation of another neuron, the strength of the synaptic connection between them should increase.

Associative connections

ACs between MEs are created using the feedback connections present in the network. An AC links two consecutive waves of activity. For this, the feedback length is chosen equal to T/2 (half the network operation period). The synaptic connections that make up an AC also obey Hebb's rule. The process of AC formation:

  1. a signal arrives at the input neurons, causing a wave of activity to propagate
  2. after time T a second signal arrives, causing a second wave of activity
  3. a feedback signal from the first wave moves towards the second wave
  4. at time T/2 after the appearance of the second wave, the signal from the first wave and the second wave meet in the layer numbered T/2
  5. the signal from the first wave takes part in activating the neurons of layer T/2
  6. the synaptic connections are tuned according to Hebb's rule
  7. this process takes place along the entire length of the network

If there is no second wave, the feedback signal alone will create activity after the first wave. In the theoretical part this was called an associative transition (and also recognition), and the neural activity arising as a result of the associative transition was called an ME created by an associative connection.

The strength of an associative connection, at the implementation level, is a numerical value characterizing the ability of one wave of activity to recreate another wave through its feedback connections. The process of preserving reaction sequences consists in the continuous creation of associative connections between successive waves of activity, as well as in preserving the waves themselves.

The memorization function is the function according to which weight values are increased when they are tuned; it depends on the current synapse weight. The forgetting function is the function according to which weight values decrease over time; it also depends on the current synapse weight.
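The interplay of the two functions can be illustrated with one concrete weight-dependent pair. The exact functions are a design choice left to the developer; the forms, rates, and repetition interval below are assumptions chosen only to show that a repeated connection survives while an unrepeated one fades.

```python
# An illustrative memorization/forgetting pair, both depending on the
# current synapse weight.

def memorize(w: float, rate: float = 0.2) -> float:
    """Saturating increase toward 1, applied when the pattern repeats."""
    return w + rate * (1.0 - w)

def forget(w: float, decay: float = 0.05) -> float:
    """Multiplicative decay, applied on every tick."""
    return w * (1.0 - decay)

# A connection reinforced every 10 ticks survives; one never reinforced fades.
repeated, stale = 0.3, 0.3
for tick in range(100):
    repeated = forget(repeated)
    stale = forget(stale)
    if tick % 10 == 0:
        repeated = memorize(repeated)
```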

Generalization

Generalization is the process of transferring a behavioral reaction from one environmental event to another event that is an abstraction of the first. At the neural network level, this means that the strengths of the associative connections of a regularity remain at a sufficient level when its memory elements are replaced by some of their abstractions.

Situational context

A situational context is a compressed characteristic of recent events that is added to the current ME to improve the quality of pattern generalization. To implement it, an additional mechanism of neuron operation is introduced: activation threshold reduction.

The threshold reduction mechanism: after a neuron has been in the active state, its activation threshold is lowered. Over time, the threshold returns to its initial value. The reduction follows the threshold reduction function, whose value depends on the current threshold value.

Consider an example. Suppose a sequence of signals is fed to a WTM with the threshold reduction mechanism. After each wave, some of the neurons will have lowered activation thresholds. As a result, subsequent waves will contain activity that would not have occurred without the reduced thresholds. This extra activity is exactly our concise characteristic of the situation.

The key point is that when the same signals are repeated, the additional activity is repeated as well.
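The mechanism can be sketched on a single neuron. The drop size and the linear per-tick recovery schedule are assumptions for illustration; the article only fixes the qualitative behavior (threshold falls after activity, then returns to its initial value).

```python
# Sketch of threshold reduction: after firing, a neuron's threshold drops,
# then gradually returns to its base value, so recently active neurons
# fire more easily.

class ContextNeuron:
    def __init__(self, base_threshold: float = 1.0,
                 drop: float = 0.5, recovery: float = 0.1):
        self.base = base_threshold
        self.threshold = base_threshold
        self.drop = drop          # how far the threshold falls after firing
        self.recovery = recovery  # per-tick return toward the base value

    def tick(self, charge: float) -> bool:
        fired = charge > self.threshold
        if fired:
            self.threshold = self.base - self.drop   # lowered after activity
        else:
            # threshold creeps back up to its initial value
            self.threshold = min(self.base, self.threshold + self.recovery)
        return fired

n = ContextNeuron()
n.tick(1.2)                # fires; threshold drops from 1.0 to 0.5
weak_fires = n.tick(0.8)   # the same weak input would not have fired before
```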

This implementation of the context mechanism was chosen for its simplicity and for one side effect that matches the purpose of introducing a context mechanism in the first place.

The effect itself: after a certain wave has propagated through WTM, its repeated propagation becomes easier; moreover, the propagation of waves containing the same neurons (read: waves from the same context group) also becomes easier. This effect can be called WTM's short-term memory. It serves the goal of introducing the context mechanism: to increase recognition inertia.

WTM Instance Plan

  1. Based on the tasks set for the WTM, identify the environmental events that require external reactions.
  2. Identify the functional basis of the WTM.
  3. Choose adequate (appropriate) reactions. These will be called basic.
  4. Create a WTM that matches the selected set of basic reactions.
  5. Additionally train the WTM by acting on the CO through the environment.

Conclusion


This is currently an almost complete description of WTM. Further areas of work:

  1. test applications of the model (there were some before, but the model has changed since, so everything starts anew).
  2. determining regularities in the selection of network characteristics (network length, number of neurons per layer, number of forward and feedback connections, ...) and internal functions (memorization, forgetting, activation threshold reduction, ...). More precisely, their mutual relations that would give the network the required properties (memory capacity, detail of identified patterns, storage duration, level of generalization, ...).
  3. adding network growth mechanisms to WTM to overcome its memory limitations.

I will be glad to receive constructive criticism, and in general any knowledge and experience on this and related topics.
