The logic of thinking. Part 3. Perceptron, convolutional networks

    In the first part, we described the properties of neurons. The second covered the basic properties associated with their learning. In the next part, we will move on to describing how the real brain works. But before that we need to make one last effort and take in a little more theory. For now it will most likely not seem particularly interesting. Perhaps I would have downvoted such a tutorial post myself. But all this "ABC" will greatly help us make sense of things later on.


    Machine learning distinguishes two main approaches: supervised learning (learning with a teacher) and unsupervised learning (learning without a teacher). The methods for extracting principal components described earlier are learning without a teacher. The neural network receives no explanation of what is fed to its input. It simply picks out the statistical patterns present in the input data stream. In contrast, learning with a teacher assumes that for some of the input images, called the training set, we know what output we want to get. Accordingly, the task is to configure the neural network so that it captures the patterns that connect the input and output data.

    In 1958, Frank Rosenblatt described a construct he called the perceptron (Rosenblatt, 1958), which is capable of learning with a teacher (see the title image of the post).

    According to Rosenblatt, the perceptron consists of three layers of neurons. The first layer consists of sensory elements that determine what we have at the input. The second layer consists of associative elements. Their connections with the sensory layer are rigidly fixed and define the transition to a more general, associative description than the one available at the sensory layer.

    The perceptron is trained by changing the weights of the neurons of the third, reacting layer. The goal of training is to make the perceptron classify the presented images correctly.

    The neurons of the third layer act as threshold adders. Accordingly, the weights of each of them define the parameters of a certain hyperplane. If the input signals are linearly separable, the output neurons can act as classifiers for them.

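    If y is the vector of the actual output of the perceptron and d is the vector that we expect to receive, then the error vector indicates the quality of the neural network:

        e = d - y

    If we set the goal of minimizing the mean squared error, we can derive the so-called delta rule for modifying the weights:

        w_ij(n+1) = w_ij(n) + η · e_i(n) · x_j(n)

    Here w_ij is the weight of input j of output neuron i, x_j is the signal at that input, e_i is the corresponding component of the error vector, n is the step number, and η is the learning rate.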

    Zero weights can serve as the initial approximation.
    This rule is nothing more than the Hebb rule applied to the case of the perceptron.
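
    As a minimal sketch of the delta rule, assume a single threshold output neuron, a learning rate eta of 0.1, and the logical OR function as a toy, linearly separable training set. All of these are illustrative choices rather than anything taken from Rosenblatt's paper. Because the update is applied to the thresholded output, this is the form usually called the perceptron learning rule.

```python
def step(z):
    """Threshold activation: fire (1) when the weighted sum is positive."""
    return 1 if z > 0 else 0


def train_delta(samples, eta=0.1, epochs=20):
    """Train one threshold neuron with the delta rule.

    samples: list of (inputs, desired_output) pairs.
    Returns the weights; the last weight plays the role of the bias.
    """
    n = len(samples[0][0])
    w = [0.0] * (n + 1)  # zero weights as the initial approximation
    for _ in range(epochs):
        for x, d in samples:
            xb = list(x) + [1.0]  # constant input for the bias
            y = step(sum(wi * xi for wi, xi in zip(w, xb)))
            e = d - y  # error: desired output minus actual output
            # Delta rule: shift each weight in proportion to error and input.
            w = [wi + eta * e * xi for wi, xi in zip(w, xb)]
    return w


def predict(w, x):
    """Classify one input vector with trained weights."""
    return step(sum(wi * xi for wi, xi in zip(w, list(x) + [1.0])))


# Toy training set (an assumption for this demo): logical OR.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights = train_delta(OR_SAMPLES)
print([predict(weights, x) for x, _ in OR_SAMPLES])
```

    The weights change only when the neuron makes a mistake, so once every sample is classified correctly the rule leaves them alone.
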
    If we place one or more additional reacting layers after the output layer and drop the associative layer, which Rosenblatt introduced more for biological plausibility than out of computational necessity, we get a multilayer perceptron like the one shown in the figure below.

    Multilayer perceptron with two hidden layers (Haykin, 2006)

    If the neurons of the reacting layers were simple linear adders, there would be little point in such a complication. The output, regardless of the number of hidden layers, would still be a linear combination of the input signals. But since threshold adders are used in the hidden layers, each new layer breaks the chain of linearity and can carry its own interesting description.
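
    The collapse of purely linear layers can be checked directly. The sketch below uses arbitrary illustrative weights (they are assumptions, not values from the article) and shows that two linear layers give the same output as one layer whose weight matrix is their product, while inserting a threshold between them breaks this equivalence.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Matrix product of A and B, both given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def threshold(v):
    """Threshold adder output: 1.0 where the sum is positive, else 0.0."""
    return [1.0 if s > 0 else 0.0 for s in v]


# Two layers of linear adders (arbitrary illustrative weights).
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]  # 2 inputs -> 3 hidden neurons
W2 = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.0]]    # 3 hidden -> 2 output neurons

x = [2.0, 5.0]
two_layer = matvec(W2, matvec(W1, x))
one_layer = matvec(matmul(W2, W1), x)  # a single equivalent linear layer
print(two_layer, one_layer)

# With a threshold in the hidden layer the result is no longer
# a linear function of the input.
nonlinear = matvec(W2, threshold(matvec(W1, x)))
print(nonlinear)
```
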

    For a long time it was not clear how to train a multilayer perceptron. The main method, error backpropagation, was described only in 1974 by A. I. Galushkin and, independently and simultaneously, by Paul J. Werbos. It was then rediscovered and became widely known in 1986 (David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams, 1986).

    The method consists of two passes: forward and backward. In the forward pass, a training signal is applied and the activity of all network nodes, including the activity of the output layer, is calculated. Subtracting the obtained activity from the desired one gives the error signal. In the backward pass, the error signal propagates in the opposite direction, from output to input, and the synaptic weights are adjusted so as to minimize this error. A detailed description of the method can be found in many sources (for example, Haykin, 2006).
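
    The two passes can be sketched for a tiny network. The code below is only an illustration under several assumptions: a 2-2-1 network, sigmoid activations in place of hard thresholds (backpropagation needs a differentiable activation), a learning rate of 0.5, arbitrary starting weights, and a single training pair. Real training cycles over the whole training set.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, W2):
    """Forward pass: compute hidden and output activities."""
    xb = x + [1.0]  # constant 1.0 acts as the bias input
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in W1]
    hb = h + [1.0]
    y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in W2]
    return h, y


def train_step(x, d, W1, W2, eta=0.5):
    """One forward and one backward pass; updates W1 and W2 in place.

    Returns the squared error measured before the update.
    """
    h, y = forward(x, W1, W2)
    xb, hb = x + [1.0], h + [1.0]
    # Error signal at the output, scaled by the sigmoid derivative y(1-y).
    delta_out = [(di - yi) * yi * (1.0 - yi) for di, yi in zip(d, y)]
    # Propagate the error signal back through W2 to the hidden layer.
    delta_hid = [
        h[j] * (1.0 - h[j]) * sum(delta_out[k] * W2[k][j] for k in range(len(W2)))
        for j in range(len(h))
    ]
    # Adjust the weights of both layers in the direction that reduces the error.
    for k, row in enumerate(W2):
        for j in range(len(row)):
            row[j] += eta * delta_out[k] * hb[j]
    for j, row in enumerate(W1):
        for i in range(len(row)):
            row[i] += eta * delta_hid[j] * xb[i]
    return 0.5 * sum((di - yi) ** 2 for di, yi in zip(d, y))


# Arbitrary starting weights; the last column of each row is the bias weight.
W1 = [[0.5, -0.4, 0.1], [-0.3, 0.6, -0.2]]
W2 = [[0.7, -0.5, 0.2]]
errors = [train_step([1.0, 0.0], [1.0], W1, W2) for _ in range(200)]
print(errors[0], errors[-1])
```

    The hidden-layer error signal is computed before the output weights are changed, so both layers are updated from the same forward pass.
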

    It is important for us to pay attention to the fact that in a multilayer perceptron information is processed from level to level. At the same time, each layer selects its own set of features characteristic of the input signal. This creates certain analogies with how information is transformed between zones of the cerebral cortex.

    Convolutional networks. Neocognitron

    Comparing the multilayer perceptron with the real brain is a very loose analogy. What they have in common is that, moving up from zone to zone in the cortex or from layer to layer in the perceptron, information acquires an increasingly generalized description. However, the structure of the cortex is much more complicated than the organization of a layer of neurons in the perceptron. The studies of the visual system by D. Hubel and T. Wiesel made it possible to better understand the structure of the visual cortex and encouraged the use of this knowledge in neural networks. The main ideas that were borrowed are the locality of the zones of perception and the division of neurons by function within one layer.

    The locality of perception is already familiar to us. It means that a neuron receiving information does not monitor the entire input space of signals, but only a part of it. As we said earlier, this monitored area is called the receptive field of the neuron.

    The concept of a receptive field requires separate clarification. Traditionally, the receptive field of a neuron is the space of receptors that affect the functioning of the neuron. Receptors here are neurons that directly perceive external signals. Imagine a neural network consisting of two layers, where the first layer is the receptor layer and the second layer consists of neurons connected to the receptors. For each neuron of the second layer, the receptors that have contact with it form its receptive field.

    Now take a complex multilayer network. The farther we move from the input, the harder it becomes to say which receptors affect the activity of deep-lying neurons, and how. At some point it may turn out that, for a given neuron, all existing receptors can be called its receptive field. In such a situation it makes sense to call the receptive field of a neuron only those neurons with which it has direct synaptic contact. To separate these concepts, we will call the space of input receptors the initial receptive field, and the space of neurons that interact with the neuron directly the local receptive field, or simply the receptive field, without further qualification.
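
    How quickly the initial receptive field grows with depth can be estimated with a little arithmetic. The sketch below rests on simplifying assumptions that are not in the text: a one-dimensional network, every neuron connected to `local_size` adjacent neurons of the layer below, and neighboring neurons shifted by one position. The function name and parameters are hypothetical.

```python
def initial_receptive_field(depth, local_size, input_width):
    """Width of the initial receptive field of a neuron `depth` layers
    above the receptor layer, under the assumptions stated above."""
    width = 1
    for _ in range(depth):
        width += local_size - 1  # each layer widens the view by local_size - 1
    return min(width, input_width)  # cannot exceed the whole receptor layer


# 16 receptors, local receptive fields of 5 neurons, depths 1 to 5.
widths = [initial_receptive_field(d, local_size=5, input_width=16)
          for d in range(1, 6)]
print(widths)  # [5, 9, 13, 16, 16]
```

    From the fourth layer on, every receptor can influence the neuron, which is the situation where the initial and the local receptive field have to be told apart.
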

    The division of neurons by function is associated with the discovery of two main types of neurons in the primary visual cortex. Simple neurons respond to a stimulus located at a specific place in their initial receptive field. Complex neurons respond to the stimulus regardless of its position.

    For example, the figure below shows what the sensitivity patterns of the initial receptive fields of simple cells may look like. Positive areas activate such a neuron, negative ones suppress it. For each simple neuron there is a stimulus that suits it best and, accordingly, causes maximum activity. But it is important that this stimulus is rigidly tied to a position in the initial receptive field. The same stimulus, shifted to the side, will not evoke a response from the simple neuron.

    The initial receptive fields of a simple cell (Nicholls J., Martin R., Wallace B., Fuchs P.)

    Complex neurons also have their preferred stimulus, but they are able to recognize this stimulus regardless of its position in the initial receptive field.

    From these two ideas, corresponding models of neural networks were born. The first such network was created by Kunihiko Fukushima. It was named the cognitron. He later created a more advanced network, the neocognitron (Fukushima, 1980). The neocognitron is a construction of several layers. Each layer consists of simple (s) and complex (c) neurons.

    The task of a simple neuron is to monitor its receptive field and recognize the image on which it was trained. Simple neurons are assembled into groups (planes). Within one group, simple neurons are tuned to the same stimulus, but each neuron monitors its own fragment of the receptive field. Together, they cover all possible positions of this image (figure below). All simple neurons of the same plane have the same weights but different receptive fields. The situation can also be pictured differently: as a single neuron that tries its image against all positions of the original picture at once. All this makes it possible to recognize the same image regardless of its position.
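
    A plane of simple neurons with shared weights can be sketched as one small weight matrix slid over the picture. The example is a simplification: the 2x2 pattern, the threshold of 2, and the 4x4 binary image are hypothetical values chosen for illustration, and the function is not Fukushima's actual S-cell formula.

```python
def simple_plane(image, kernel, threshold):
    """Apply one shared set of weights at every position of a 2-D image.

    image and kernel are lists of rows. Returns a plane of 0/1 activities,
    one simple neuron per position of the kernel.
    """
    kh, kw = len(kernel), len(kernel[0])
    rows = len(image) - kh + 1
    cols = len(image[0]) - kw + 1
    plane = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Weighted sum over this neuron's own fragment of the picture.
            s = sum(kernel[i][j] * image[r + i][c + j]
                    for i in range(kh) for j in range(kw))
            out_row.append(1 if s >= threshold else 0)
        plane.append(out_row)
    return plane


# Hypothetical pattern: a short diagonal stroke.
KERNEL = [[1, -1],
          [-1, 1]]
IMAGE = [[0, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 0]]
plane = simple_plane(IMAGE, KERNEL, threshold=2)
for row in plane:
    print(row)
```

    Only the neuron whose fragment contains the whole stroke becomes active, and its coordinates in the plane report where the stroke was found.
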

    Receptive fields of simple cells configured to search for a selected pattern in different positions (Fukushima K., 2013)

    Each complex neuron monitors its plane of simple neurons and fires if at least one of the simple neurons in its plane is active (figure below). The activity of a simple neuron indicates that it has recognized its characteristic stimulus in the particular place that is its receptive field. The activity of a complex neuron means that the same image is present somewhere in the layer monitored by the simple neurons.
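
    The relationship between a plane of simple neurons and its complex neuron can be illustrated in one dimension. This is a sketch under assumptions: the kernel, the threshold, and the input signals are hypothetical, and the complex neuron is modeled as a plain logical OR over its plane, as the paragraph above describes.

```python
def simple_plane(signal, kernel, threshold):
    """Activities of simple neurons sharing one kernel, one per position."""
    positions = len(signal) - len(kernel) + 1
    return [
        1 if sum(k * signal[p + i] for i, k in enumerate(kernel)) >= threshold else 0
        for p in range(positions)
    ]


def complex_cell(plane):
    """Fire if at least one simple neuron in the plane is active."""
    return 1 if any(plane) else 0


# Hypothetical pattern: two adjacent active inputs with silent neighbors.
KERNEL = [-1, 1, 1, -1]
left = [0, 1, 1, 0, 0, 0, 0, 0]
right = [0, 0, 0, 0, 0, 1, 1, 0]
empty = [0, 0, 0, 0, 0, 0, 0, 0]

for signal in (left, right, empty):
    plane = simple_plane(signal, KERNEL, threshold=2)
    print(plane, complex_cell(plane))
```

    The active simple neuron differs between the first two signals, but the complex neuron gives the same answer for both, which is the position independence discussed above.
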

    Neocognitron planes

    Each layer after the input layer receives as its input the picture formed by the complex neurons of the previous layer. From layer to layer the information becomes more and more generalized, which ultimately leads to the recognition of specific images regardless of their location in the original picture and of some transformations.

    As applied to image analysis, this means that the first level recognizes lines at a certain angle passing through small receptive fields. It is able to detect lines of all possible orientations anywhere in the image. The next level detects possible combinations of elementary features, defining more complex shapes. And so on, up to the level at which the desired image can be identified (figure below).

    The recognition process in the neocognitron

    When used for handwriting recognition, such a design is robust to the manner of writing. Recognition success is not affected by shifts across the surface, by rotation, or by deformation (stretching or compression).

    The most significant difference between a neocognitron and a fully connected multilayer perceptron is a significantly smaller number of weights for the same number of neurons. This is due to the "trick" that allows the neocognitron to detect images regardless of their position. A plane of simple cells is essentially one neuron whose weights define a convolution kernel. This kernel is applied to the previous layer, running through it in all possible positions. In effect, the coordinates of the neurons in each plane are the coordinates of these positions. As a result, all neurons in a plane of simple cells watch for an image corresponding to the kernel in their own receptive fields. That is, if such an image occurs anywhere in the input signal for this layer, it will be detected by at least one simple neuron and will cause the activity of the corresponding complex neuron. This trick makes it possible to find a characteristic image wherever it appears. But we must remember that this is precisely a trick, and it does not particularly correspond to how the real cortex works.
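
    The saving in weights is easy to put into numbers. The sketch below assumes a 28x28 input picture and a 5x5 kernel, sizes chosen purely for illustration, and compares one plane of simple cells with shared weights against the same number of neurons without sharing and against fully connected neurons. Bias weights are ignored.

```python
def plane_weight_counts(input_h, input_w, kernel_h, kernel_w):
    """Count weights needed for one plane of neurons under three schemes."""
    # One neuron per position of the kernel on the input picture.
    neurons = (input_h - kernel_h + 1) * (input_w - kernel_w + 1)
    return {
        "neurons": neurons,
        # All neurons of the plane reuse the same kernel.
        "shared_kernel": kernel_h * kernel_w,
        # Local receptive fields, but every neuron has its own weights.
        "local_unshared": neurons * kernel_h * kernel_w,
        # Every neuron is connected to every input element.
        "fully_connected": neurons * input_h * input_w,
    }


counts = plane_weight_counts(28, 28, 5, 5)
for name, value in counts.items():
    print(name, value)
```

    For these sizes the plane has 576 neurons: 25 weights with a shared kernel, 14,400 with local but unshared fields, and 451,584 when fully connected.
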

    Neocognitron training occurs without a teacher. It corresponds to the previously described procedure for extracting a complete set of factors. When real images are fed to the input of the neocognitron, the neurons have no choice but to extract the components characteristic of these images. So, if handwritten digits are fed to the input, the small receptive fields of the simple neurons of the first layer will see lines, corners, and junctions. The size of the competition zones determines how many different factors can be extracted in each spatial area. The most significant components are extracted first. For handwritten digits, these will be lines at different angles. If free factors remain, more complex elements can be extracted as well.

    From layer to layer, the general principle of learning is preserved: factors that are characteristic of many input signals are extracted. If we feed handwritten digits to the first layer, then at a certain level we will obtain factors corresponding to these digits. Each digit will turn out to be a combination of a stable set of features, which will be extracted as a separate factor. The last layer of the neocognitron contains as many neurons as there are images to be detected. The activity of one of the neurons of this layer indicates recognition of the corresponding image (figure below).

    Recognition in the neocognitron (Fukushima K., Neocognitron, 2007)

    The video below gives a visual impression of how the neocognitron works.

    An alternative to learning without a teacher is learning with a teacher. In the example with digits, instead of waiting until the network itself identifies statistically stable shapes, we can tell it which digit is being presented and require the corresponding training. The most significant results in such training of convolutional networks were achieved by Yann LeCun (Y. LeCun and Y. Bengio, 1995). He showed how the error backpropagation method can be used to train networks whose architecture, like that of the neocognitron, vaguely resembles the structure of the cerebral cortex.

    A convolutional network for handwriting recognition (Y. LeCun and Y. Bengio, 1995)

    With this, we will consider the minimum background refreshed and can move on to more interesting and surprising things.


    Previous Parts:
    Part 1. Neuron
    Part 2. Factors
