What neural networks can and cannot do: a five-minute guide for beginners

    More than fifty years have passed since Warren McCulloch and Walter Pitts described the first artificial neuron. Much has changed since then, and today neural network algorithms are used everywhere. But although neural networks are capable of a great deal, researchers working with them run into a number of difficulties, from overfitting to the “black box” problem.

    If the terms “catastrophic forgetting” and “weight regularization” do not mean anything to you yet, read on: let's try to sort everything out in order. / Photo Jun / CC-SA



    Why we love neural networks


    The main advantage of neural networks over other machine learning methods is that they can pick up deeper, sometimes unexpected patterns in data. During training, the network learns to respond to incoming information according to the principle of generalization, and in doing so solves the task assigned to it.

    The areas where networks are already finding practical use include medicine (for example, removing noise from instrument readings, analyzing the effectiveness of treatment), the Internet (associative information search), economics (forecasting exchange rates, automated trading), games (for example, Go) and others. Thanks to their versatility, neural networks can be applied to almost anything. However, they are not a magic pill: for them to start working properly, a lot of preliminary work is required.

    Neural Network Training 101


    One of the key properties of a neural network is its ability to learn. A neural network is an adaptive system that can change its internal structure based on incoming information. Typically, this is achieved by adjusting the values of the weights.

    The weights of the connections between neurons on adjacent layers are numbers that describe how significant the signal between two neurons is. If a trained neural network responds correctly to the input, there is no need to adjust the weights; otherwise, some training algorithm is used to change the weights and improve the result.

    As a rule, this is done with the error backpropagation method: for each training example, the weights are adjusted so as to reduce the error. It is believed that with a properly chosen architecture and a sufficient amount of training data, the network will sooner or later learn.
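    As a minimal sketch of this idea, here is gradient-based weight adjustment for a single linear neuron on a toy dataset (the target function, learning rate, and all other numbers below are illustrative assumptions, not something prescribed by the article):

```python
import numpy as np

# Toy data: the "true" rule to be learned is y = 2*x1 - 3*x2 (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -3.0])

w = np.zeros(2)   # weights to be adjusted
lr = 0.1          # learning rate

for epoch in range(200):
    pred = X @ w                     # forward pass
    error = pred - y                 # how wrong the current weights are
    grad = X.T @ error / len(X)      # gradient of the mean squared error
    w -= lr * grad                   # adjust weights to reduce the error

print(w)  # ends up close to [2, -3]
```

    In a real multi-layer network, backpropagation applies the same “follow the gradient to reduce the error” step to every layer via the chain rule.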

    There are several fundamentally different approaches to learning, depending on the task. The first is supervised learning. In this case, the input data come in pairs: an object and its label. This approach is used, for example, in image recognition: training is carried out on a labeled base of pictures with manually assigned labels of what is shown in them.

    The most famous of these databases is ImageNet. In this formulation, learning is not much different from, say, the emotion recognition that Neurodata Lab works on. The network is shown examples, it makes a guess, and the weights are adjusted depending on whether the guess was correct. The process is repeated until accuracy reaches the desired level.
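    A minimal supervised-learning sketch, assuming scikit-learn and its small built-in digits dataset as a stand-in for a labeled image base like ImageNet:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# A small labeled dataset (8x8 digit images) plays the role of the marked picture base.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The network is shown (image, label) pairs and its weights are adjusted during fit().
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
```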

    The second option is unsupervised learning. Typical tasks here are clustering and some formulations of anomaly detection. In this setting, the true labels of the training data are not available to us, but we still need to find patterns. Sometimes a similar approach is used to pre-train a network for a supervised task: the idea is that the initial approximation for the weights is not a random solution, but one that is already able to find patterns in the data.
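    As a small illustration of the unsupervised setting, here is clustering on synthetic data (k-means via scikit-learn; the blobs and all parameters are made up for the example):

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: two blobs whose "true" groups we pretend not to know.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(50, 2)),
               rng.normal(5, 1, size=(50, 2))])

# Clustering finds structure without any labels.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels[:5], labels[-5:])   # the two blobs end up in different clusters
```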

    Finally, the third option is reinforcement learning, a strategy based on observations. Imagine a mouse running through a maze. If it turns left, it gets a piece of cheese, and if it turns right, an electric shock. Over time, the mouse learns to turn only left. A neural network acts in exactly the same way, adjusting its weights when the final result is “painful”. Reinforcement learning is actively used in robotics (“did the robot hit the wall or stay unharmed?”). All game-related tasks, including the most famous of them, AlphaGo, are based on reinforcement learning.
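    The mouse-and-maze story can be sketched as a tiny tabular value-learning loop (a hypothetical one-step setup with two actions; this is not how AlphaGo or a real robot is trained, just the bare mechanics of learning from reward):

```python
import numpy as np

# The "maze" from the text: action 0 = turn left (cheese, +1), action 1 = turn right (shock, -1).
rewards = {0: 1.0, 1: -1.0}

q = np.zeros(2)      # the mouse's estimate of how good each action is
alpha = 0.1          # learning rate
epsilon = 0.2        # how often the mouse explores at random
rng = np.random.default_rng(0)

for step in range(500):
    # Sometimes explore, otherwise pick the action that currently looks best.
    action = int(rng.integers(2)) if rng.random() < epsilon else int(np.argmax(q))
    reward = rewards[action]
    # Move the estimate toward the reward that was actually received.
    q[action] += alpha * (reward - q[action])

print(q)  # q[0] -> ~1 (left is good), q[1] -> ~-1 (right is painful)
```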

    Overfitting: what the problem is and how to solve it


    The main problem of neural networks is overfitting. It consists in the network “memorizing” the answers instead of picking up patterns in the data. Science has produced several methods of combating overfitting: these include, for example, regularization, batch normalization, data augmentation, and others. Sometimes an overfitted model is characterized by large absolute values of the weights.

    The mechanism of this phenomenon is roughly the following: the input data are often very high-dimensional (a single point from the training set is described by a large set of numbers), and the higher the dimension, the greater the probability that a randomly chosen point is indistinguishable from an outlier. Instead of “fitting” a new point into the existing model by adjusting the weights, the neural network effectively invents an exception for itself: this point gets classified by one rule and all the others by another. And there are usually a lot of such points.

    An obvious way to deal with this kind of overfitting is weight regularization. It consists either in artificially restricting the values of the weights, or in adding a penalty to the error measure at the training stage. This approach does not completely solve the problem, but it usually improves the result.
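    A sketch of the penalty variant, reusing the toy linear model from above and adding an L2 term that punishes large weights (the strength lam is an illustrative choice):

```python
import numpy as np

# Same toy linear model as before, but with an L2 penalty on the weights:
# the "fine" added to the error at training time.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -3.0])

w = np.zeros(2)
lr, lam = 0.1, 0.5   # lam controls how strongly large weights are punished

for epoch in range(200):
    error = X @ w - y
    grad = X.T @ error / len(X) + lam * w   # extra term pulls weights toward zero
    w -= lr * grad

print(w)  # smaller in absolute value than the unregularized solution
```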

    The second method limits the output signal rather than the values of the weights; this is batch normalization. At the training stage, data are fed to the neural network in batches. The output values for a batch can be anything, and the larger their absolute values, the larger the weights become. If we subtract one value from each of them and divide the result by another value that is the same for the whole batch, we preserve the qualitative relationships (the maximum, for example, will still be the maximum), but the output becomes more convenient for the next layer to process.
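    A sketch of just that normalization step (full batch normalization also adds a learnable scale and shift, which are omitted here):

```python
import numpy as np

def batch_norm(batch, eps=1e-5):
    """Normalize a batch of layer outputs: subtract the batch mean and
    divide by the batch standard deviation, per feature."""
    mean = batch.mean(axis=0)
    std = batch.std(axis=0)
    return (batch - mean) / (std + eps)

rng = np.random.default_rng(0)
outputs = rng.normal(loc=50.0, scale=10.0, size=(8, 3))  # raw outputs of some layer

normed = batch_norm(outputs)
# The qualitative relations survive: the row that had the maximum still has it.
print(np.argmax(outputs[:, 0]), np.argmax(normed[:, 0]))
print(normed.mean(axis=0).round(3), normed.std(axis=0).round(3))  # ~0 and ~1
```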

    The third approach does not always work. As already mentioned, an overfitted neural network perceives many points as anomalous and tries to handle them separately. The idea is to augment the training sample with points that look as if they are of the same nature as the original sample, but are generated artificially. However, a number of accompanying problems arise immediately: choosing the augmentation parameters, a critical increase in training time, and others.
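    A toy augmentation sketch, assuming that jittering points with small Gaussian noise keeps them “of the same nature” as the originals (the noise level is exactly the kind of parameter that has to be chosen carefully):

```python
import numpy as np

def augment(X, y, n_copies=2, noise=0.05, rng=None):
    """Grow the training set with artificially generated points:
    jittered copies of the originals, keeping the original labels."""
    rng = rng or np.random.default_rng(0)
    X_new, y_new = [X], [y]
    for _ in range(n_copies):
        X_new.append(X + rng.normal(scale=noise, size=X.shape))
        y_new.append(y)
    return np.vstack(X_new), np.concatenate(y_new)

X = np.array([[0.0, 1.0], [1.0, 0.0]])
y = np.array([0, 1])
X_aug, y_aug = augment(X, y)
print(X_aug.shape, y_aug.shape)  # (6, 2) (6,) -- three times more data
```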


    Effect of removing anomalous values from the training data set (source)

    Finding real anomalies in the training set stands out as a separate problem; sometimes it is even treated as a task in its own right. The image above shows the effect of excluding an anomalous value from the set. With neural networks the situation is similar. That said, finding and excluding such values is non-trivial, and special techniques are used for it; you can read more about them at the links (here and here).
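    One of the simplest such techniques is a z-score filter; real anomaly-detection methods are more involved, so treat this as a sketch:

```python
import numpy as np

def drop_outliers(X, y, threshold=3.0):
    """Remove training points lying further than `threshold` standard
    deviations from the feature-wise mean (a simple z-score rule)."""
    z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
    keep = (z < threshold).all(axis=1)
    return X[keep], y[keep]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
X[0] = [50.0, -50.0]                 # one obvious anomaly
y = rng.integers(0, 2, size=200)

X_clean, y_clean = drop_outliers(X, y)
print(len(X), "->", len(X_clean))    # the anomalous point is gone
```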

    One network, one task, or the problem of catastrophic forgetting


    Working in dynamically changing environments (financial ones, for example) is difficult for neural networks. Even if you manage to train a network successfully, there is no guarantee that it will keep working in the future. Financial markets are constantly transforming, so what worked yesterday can just as easily “break” today.

    Here researchers either have to test various network architectures and pick the best of them, or use dynamic neural networks. The latter “monitor” changes in the environment and adjust their architecture accordingly. One of the algorithms used for this is the MSO (multi-swarm optimization) method.

    Moreover, neural networks have a particular property called catastrophic forgetting. It boils down to the fact that a neural network cannot be trained sequentially on several tasks: on each new training set, all the weights will be overwritten, and past experience will be “forgotten”.

    Of course, scientists are working on a solution to this problem. Developers at DeepMind recently proposed a way to combat catastrophic forgetting: the weights that matter most for performing some task A are artificially made more resistant to change while the network is trained on task B.

    The new approach is called Elastic Weight Consolidation, by analogy with an elastic spring. Technically, it is implemented as follows: each parameter of the neural network is assigned a value F that determines its significance within a specific task. The higher F is for a particular weight, the harder it will be to change it when learning a new task. This allows the network to “memorize” key skills. The technique loses to “narrowly specialized” networks on individual tasks, but shows its best side over the sum of all stages.
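    A sketch of the penalty idea only (not DeepMind's full implementation; here F is just a hand-set importance value standing in for the Fisher information used in the paper):

```python
import numpy as np

def ewc_penalty(w, w_task_a, fisher, lam=1.0):
    """Elastic-weight-consolidation-style penalty: weights that were
    important for task A (large F) are pulled back toward their old
    values while training on task B."""
    return lam / 2.0 * np.sum(fisher * (w - w_task_a) ** 2)

# Illustrative numbers: two weights, the first one mattered a lot for task A.
w_task_a = np.array([1.0, -2.0])     # weights after training on task A
fisher = np.array([10.0, 0.1])       # importance F of each weight for task A

w_candidate = np.array([0.5, 0.0])   # weights proposed while learning task B
# This penalty is added to task B's loss, so changing the first weight is "expensive".
print(ewc_penalty(w_candidate, w_task_a, fisher))
```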

    The impenetrable black box


    Another difficulty of working with neural networks is that ANNs are, in effect, black boxes. Strictly speaking, apart from the result, you cannot extract anything from a neural network, not even statistics; it is hard to understand how the network makes its decisions. The one example where this is not the case is convolutional neural networks in recognition tasks: some of the intermediate layers act as feature maps (a connection indicates whether a simple pattern occurred in the original picture), so the activation of individual neurons can be traced.

    Of course, this makes it quite difficult to use neural networks in applications where errors are critical. For example, fund managers cannot understand how a neural network makes its decisions, which makes it impossible to correctly assess the risks of trading strategies. Similarly, banks that turn to neural networks to model credit risk cannot say why this particular client has been given this particular credit rating.

    Therefore, neural network developers are looking for ways around this limitation. For example, work is underway on so-called rule-extraction algorithms, which aim to make architectures more transparent. These algorithms extract information from neural networks either in the form of mathematical expressions and symbolic logic, or in the form of decision trees.
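    One simple form of this idea is to distil the network's behaviour into a shallow decision tree, i.e. a surrogate model trained on the network's own answers (a sketch assuming scikit-learn, not a specific published rule-extraction algorithm):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X, y = data.data, data.target

# Train a small "black box" network on the data.
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X, y)

# Fit a shallow decision tree to the *network's* answers, not the true labels:
# its rules then approximate how the network decides.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, net.predict(X))
print(export_text(tree, feature_names=list(data.feature_names)))
```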

    Neural networks are just a tool


    Of course, artificial neural networks are actively helping to master new technologies and develop existing ones. Today, the programming of self-driving cars, in which neural networks analyze the environment in real time, is at the peak of popularity. Year after year, IBM Watson discovers new areas of application, including medicine. Google has a whole division that deals directly with artificial intelligence.

    However, a neural network is sometimes not the best way to solve a problem. For example, networks are “lagging behind” in areas such as generating high-resolution images, generating human speech, and in-depth analysis of video streams. Working with symbols and recursive structures is also not easy for neural systems; the same is true of question-answering systems.

    Initially, the idea of neural networks was to copy and even recreate the mechanisms of the brain's functioning. However, humanity still needs to solve the problem of neural network speed and to develop new logical-inference algorithms. Existing algorithms are at least ten times inferior to the capabilities of the brain, which is unsatisfactory in many situations.

    At the same time, scientists have still not fully decided in which direction neural networks should develop. The industry is trying both to bring neural networks as close as possible to the model of the human brain and to generate technologies and conceptual schemes that abstract away from all “aspects of human nature”. Today it is something like an “open work” (to use Umberto Eco's term), where almost any experiment is acceptable and fantasies are permissible.

    The work of scientists and developers involved in neural networks requires deep training, extensive knowledge, and the use of non-standard techniques, since a neural network is not by itself a “silver bullet” capable of solving any problem or task without human intervention. It is a versatile tool that can do amazing things in capable hands. And it still has everything ahead of it.
