How does the brain work?
This post is based on a lecture by James Smith, a professor at the University of Wisconsin-Madison who specializes in microelectronics and computer architecture.
The history of computer science as a whole comes down to scientists trying to understand how the human brain works and to recreate something comparable in capability. How exactly do they study it? Imagine that in the 21st century, aliens who have never seen our kind of computer arrive on Earth and try to work out how one is built. Most likely, they will start by measuring the voltages on the conductors and discover that data is transmitted in binary form: the exact voltage does not matter, only its presence or absence. Then, perhaps, they will realize that all the electronic circuits are composed of the same “logic gates” with inputs and an output, and that a signal inside a circuit always travels in one direction. If the aliens are smart enough, they will figure out how combinational circuits work: these alone are enough to build relatively sophisticated computing devices. Maybe they will also work out the role of the clock and of feedback; but by studying a modern processor they are unlikely to recognize in it a von Neumann architecture with shared memory, a program counter, a set of registers, and so on. The fact is that forty years of pursuing performance have given processors a whole hierarchy of “memories” with ingenious synchronization protocols between them, and several parallel pipelines equipped with branch predictors, so that the concept of a “program counter” effectively loses its meaning; each in-flight instruction carries its own copy of the register contents, and so on. A few thousand transistors are enough to implement a microprocessor; for its performance to reach the level we are used to, hundreds of millions are needed.
The human brain consists of about one hundred billion neurons. Historically, scientists investigating the brain tried to take in this colossal construction whole with their theories. Its structure is described hierarchically: the cortex consists of lobes, the lobes of “hypercolumns”, those of “minicolumns”... A minicolumn consists of about a hundred individual neurons.
By analogy with a computer, the vast majority of these neurons are needed for speed, efficiency, fault tolerance, and the like; but the basic principles of the brain's organization cannot be discovered with a microscope, just as a program counter cannot be found by examining a microprocessor under one. A more fruitful approach, therefore, is to try to understand the brain's structure at the lowest level, the level of individual neurons and their columns, and then, starting from their properties, to imagine how the whole brain could work. In much the same way, the aliens, having understood how logic gates operate, could eventually assemble the simplest processor out of them and verify that it is equivalent in capability to real processors, however much more complex and powerful those may be.
In the figure above, the body of the neuron (left) is the small red blob at the bottom; everything else is dendrites, the neuron's “inputs”, and a single axon, its “output”. The multi-colored points along the dendrites are synapses, through which the neuron is connected to the axons of other neurons. The operation of a neuron is described very simply: when a “burst” above the threshold level arrives along the axon (a typical burst lasts about 1 ms at a level of about 100 mV), the synapse “breaks through” and the voltage surge passes onto the dendrite. In doing so, the surge is “smoothed out”: the voltage rises to about 1 mV over some 5-20 ms and then decays exponentially, so the burst stretches out to roughly 50 ms.
If several synapses of one neuron are activated within a short time of each other, the “smoothed bursts” excited by each of them add up. Finally, if enough synapses are active at the same time, the voltage on the neuron rises above the threshold level, the neuron fires, and its own axon “breaks through” the synapses of the neurons connected to it.
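This summation can be sketched in a few lines of Python. The burst shape and all constants here are illustrative, loosely following the 1 mV / ~50 ms figures from the text; the threshold value is an assumption of the sketch.

```python
import math

def epsp(t, rise=10.0, decay=15.0, peak=1.0):
    """A 'smoothed burst' on the dendrite: linear rise to ~1 mV
    over ~10 ms, then exponential decay (illustrative shape)."""
    if t < 0:
        return 0.0
    if t < rise:
        return peak * t / rise
    return peak * math.exp(-(t - rise) / decay)

def membrane_voltage(spike_times, t):
    """Voltage on the neuron body at time t: the sum of the
    smoothed bursts from every synapse activation so far."""
    return sum(epsp(t - s) for s in spike_times)

THRESHOLD = 2.0  # hypothetical firing threshold, in mV

# Three synapses firing close together push the voltage over the
# threshold; the same three spread out in time do not.
assert membrane_voltage([0, 2, 4], 10) > THRESHOLD
assert membrane_voltage([0, 40, 80], 90) < THRESHOLD
```

The point of the sketch is only the mechanism: near-coincident inputs sum constructively, while the same inputs spread over tens of milliseconds decay before they can add up.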
The more powerful the initial bursts, the faster the smoothed bursts grow, and the smaller the delay before the next neurons are activated.
In addition, there are “inhibitory neurons”, whose activation lowers the overall voltage on the neurons connected to them. Inhibitory neurons make up 15-25% of the total.
Each neuron has thousands of synapses, but at any given moment no more than a tenth of them are active. A neuron's reaction time is on the order of milliseconds, and signal propagation along a dendrite incurs delays of the same order, so these delays significantly affect the neuron's operation. Finally, a pair of neighboring neurons is usually connected not by one synapse but by about ten, each at its own distance from the bodies of the two neurons, and therefore with its own delay. In the illustration on the right, the two neurons, drawn in red and blue, are connected by six synapses.
Each synapse has its own “resistance”, which attenuates the incoming signal (in the example above, from 100 mV to 1 mV). This resistance adjusts dynamically: if the synapse was activated just before the activation of the axon, then its signal apparently correlates well with the overall output, so its resistance decreases and the signal will contribute more to the voltage on the neuron. If the synapse was activated just after the activation of the axon, then its signal apparently had nothing to do with that activation, so its resistance increases. If two neurons are connected by several synapses with different delays, this adjustment of resistances lets the neuron pick out the optimal delay, or the optimal combination of delays: the signal starts arriving exactly when it is most useful.
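The adjustment rule can be hedged into a minimal Python sketch; the source gives only the direction of the change, so the multiplicative step size and the clamping bounds below are invented for illustration.

```python
def update_resistance(resistance, dt, step=0.1, r_min=1.0, r_max=100.0):
    """Adjust a synapse 'resistance' from relative spike timing.

    dt is (axon burst time) - (synapse activation time):
      dt > 0  -> the synapse fired just before the axon, so it likely
                 contributed: lower the resistance (stronger signal);
      dt <= 0 -> the synapse fired after the axon: unrelated, so raise
                 the resistance (weaker signal).
    Step size and bounds are illustrative, not from the source."""
    if dt > 0:
        resistance *= (1.0 - step)
    else:
        resistance *= (1.0 + step)
    return min(max(resistance, r_min), r_max)

assert update_resistance(50.0, dt=+2.0) < 50.0   # strengthened
assert update_resistance(50.0, dt=-2.0) > 50.0   # weakened
```

Applied per synapse, a rule of this shape automatically favors whichever of the ten or so parallel synapses happens to deliver its signal at the most useful moment.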
Thus, the neuron model adopted by neural-network researchers, with a single connection between a pair of neurons and instantaneous signal propagation from one neuron to another, is very far from the biological picture. In addition, traditional neural networks operate not on the timing of individual bursts but on their frequency: the more frequent the bursts at a neuron's inputs, the more frequent the bursts at its output. Are the details of the neuron's behavior that the traditional model discards essential for describing the operation of the brain, or not? Neuroscientists have accumulated an enormous number of observations about the structure and behavior of neurons; but which of these observations shed light on the big picture, and which are mere “implementation details” that, like the branch predictor in a processor, affect nothing but performance? James believes that it is precisely the temporal characteristics of the interaction between neurons that bring us closer to understanding the question; that asynchrony is as important for the brain's operation as synchrony is for a computer's.
Another “implementation detail” is the unreliability of the neuron: with some probability it can activate spontaneously, even when the sum of the voltages on its dendrites falls short of the threshold level. Thanks to this, the “training” of a neuron column can start with all synapse resistances set arbitrarily high: at first, no combination of synapse activations will activate the axon; then spontaneous bursts will lower the resistance of the synapses that happened to be active shortly before those bursts. The neuron will thus begin to recognize specific “patterns” of input bursts. Most importantly, patterns similar to the one the neuron was trained on will also be recognized; the burst on the axon will just be weaker and/or later, the less “sure” the neuron is of the result. Training a column of neurons is much more effective than training a conventional neural network: a neuron column needs no reference answers for the samples it is trained on; in effect, it does not so much recognize input patterns as classify them. In addition, the training of a neuron column is localized: the change in a synapse's resistance depends on the behavior of only the two neurons it connects, and of no others. As a result, training changes resistances along the signal path, whereas in training a neural network, the weights change in the opposite direction: from the neurons nearest the output toward the neurons nearest the input.
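How spontaneous bursts can bootstrap learning from uniformly high resistances can be sketched as follows. This is a minimal model: it ignores spike timing and delay selection entirely, and all constants, apart from the 100 mV burst attenuated to 1 mV by a resistance of 100, are invented for illustration.

```python
import random

def response(n_inputs, resistance):
    """Peak voltage from one input pattern: each synapse passes a
    100 mV burst attenuated by its resistance (resistance 100 ->
    1 mV, as in the text)."""
    return sum(100.0 / resistance[i] for i in range(n_inputs))

def train(n_inputs=7, epochs=30, threshold=10.0, p_spont=0.3, seed=1):
    """All resistances start high, so no input crosses the threshold;
    occasional spontaneous bursts lower the resistance of the
    synapses that were active just before them, until the pattern
    alone drives the neuron over the threshold."""
    rng = random.Random(seed)
    resistance = [100.0] * n_inputs
    for _ in range(epochs):
        fired = response(n_inputs, resistance) > threshold
        fired = fired or rng.random() < p_spont  # unreliable neuron
        if fired:
            for i in range(n_inputs):
                resistance[i] *= 0.8  # these inputs preceded the burst
    return resistance

r = train()
assert response(7, r) > 10.0  # the trained inputs now cross the threshold
```

No control answer appears anywhere in the loop: the only feedback is the (sometimes spontaneous) firing of the neuron itself.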
For example, here is a neuron column trained to recognize the burst pattern (8,6,1,6,3,2,5); the values give the burst time at each of the inputs. As a result of training, the delays have been tuned to match the recognized pattern exactly, so that the voltage on the axon caused by the correct pattern is the maximum possible (7):
The same column responds to the similar input pattern (8,5,2,6,3,3,4) with a weaker burst (6), and the voltage reaches the threshold level considerably later:
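The delay-tuning idea behind these two figures can be reproduced in miniature: choose each synapse's delay so that the spikes of the trained pattern all arrive simultaneously; then that pattern yields the highest possible peak, and a merely similar pattern a lower, later one. The burst shape and the alignment time below are assumptions of the sketch, not taken from the lecture.

```python
import math

def epsp(t, rise=10.0, decay=15.0):
    """Smoothed 1 mV burst: linear rise, then exponential decay
    (illustrative shape)."""
    if t < 0:
        return 0.0
    return t / rise if t < rise else math.exp(-(t - rise) / decay)

def peak_response(pattern, delays, t_max=60):
    """Max voltage on the neuron body when the spike at pattern[i]
    reaches it after an extra per-synapse delay delays[i]."""
    arrivals = [p + d for p, d in zip(pattern, delays)]
    return max(sum(epsp(t - a) for a in arrivals) for t in range(t_max))

trained = [8, 6, 1, 6, 3, 2, 5]
# Delays tuned so every spike of the trained pattern arrives at the
# same moment (time 9) -- the adjustment the text describes.
delays = [9 - t for t in trained]

similar = [8, 5, 2, 6, 3, 3, 4]
assert peak_response(trained, delays) > peak_response(similar, delays)
```

With seven inputs of 1 mV each, the perfectly aligned pattern peaks at exactly 7, mirroring the figure; the similar pattern's bursts arrive slightly out of phase and sum to less.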
Finally, inhibitory neurons can be used to implement “feedback”: for example, as in the illustration on the right, to suppress repeated bursts at the output when the input stays active for a long time; or to suppress a burst at the output if it comes too late relative to the input signals, making the classifier more “categorical”; or, in a neural circuit for pattern recognition, different classifier columns can be connected through inhibitory neurons so that the activation of one classifier automatically suppresses all the others.
To recognize handwritten digits from the MNIST database (28x28 pixels in shades of gray), James assembled, out of the classifier columns described above, an analog of a five-layer “convolutional neural network”. Each of the 64 columns in the first layer processes a 5x5-pixel fragment of the original image; the fragments overlap. The columns of the second layer each process four outputs of the first layer, which corresponds to an 8x8-pixel fragment of the original image. The third layer has only four columns, each corresponding to a 16x16-pixel fragment. The fourth layer, the final classifier, splits all images into 16 classes: a class is assigned according to which neuron activates first. Finally, the fifth layer is a classical perceptron that maps the 16 classes onto the 10 reference answers.
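The layer geometry and the “first neuron to fire wins” rule of the fourth layer might be summarized like this. Everything except the layer sizes quoted from the description (the names, the example spike times) is an assumption of the sketch; the second layer's column count is not given in the text.

```python
# Receptive-field geometry of the five layers, as described above.
LAYERS = [
    ("layer 1", 64,   "5x5-pixel fragments (overlapping)"),
    ("layer 2", None, "4 first-layer outputs each, i.e. 8x8 pixels"),
    ("layer 3", 4,    "16x16-pixel fragments"),
    ("layer 4", 16,   "final classifier: one class per output neuron"),
    ("layer 5", 10,   "classical perceptron: 16 classes -> 10 digits"),
]

def classify(first_spike_times):
    """Fourth-layer rule: the class is the index of the output
    neuron that fires first (smallest spike time)."""
    return min(range(len(first_spike_times)),
               key=lambda i: first_spike_times[i])

# Example: neuron 3 fires first (hypothetical times, in ms).
assert classify([5.2, 3.1, 7.0, 3.0]) == 3
```

Note how the first-to-fire rule is again temporal: the class label is carried not by a voltage level but by which burst arrives earliest.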
Classical neural networks reach accuracies of 99.5% and higher on MNIST; but according to James, his “hypercolumn” is trained in far fewer iterations, because the changes propagate along the signal path and therefore touch a smaller number of neurons. As with a classical neural network, the developer of the “hypercolumn” specifies only the configuration of the connections between neurons; all the hypercolumn's quantitative characteristics, i.e. the resistances of the synapses with their various delays, are acquired automatically during training. In addition, a hypercolumn requires an order of magnitude fewer neurons than a neural network of comparable capability. On the other hand, simulating such “analog neural circuits” on an electronic computer is somewhat complicated by the fact that, unlike digital circuits, which work with discrete signals and discrete time steps, neural circuits depend on continuous voltage changes and on the asynchrony of the neurons. James claims that a simulation step of 0.1 ms is enough for his recognizer to work correctly; but he did not specify how much “real time” the training and operation of a classical neural network take, compared with the training and operation of his simulator. He himself has long been retired, and he devotes his free time to improving his analog neural circuits.
Summary from tyomitch: what is presented is a model built on biological premises, quite simple in construction, yet possessing interesting properties that radically distinguish it both from conventional digital circuits and from neural networks. Perhaps such “analog neurons” will become the building blocks of future devices that handle certain tasks, such as pattern recognition, no worse than the human brain; just as digital circuits long ago surpassed the human brain in its ability to count.