Neurons in 5 minutes

    Give me 5-10 minutes to read and understand this short article, and I'll add the lines "machine learning" and "neural networks" to your resume. For those far from programming, I'll dispel the myths about how complex AI is and show that most machine learning projects are built on extremely simple principles. Let's go - we only have five minutes.

    Let's look at the most basic example of a neural network - the perceptron; I only truly understood how neural networks work after this example, so if I don't screw it up, you'll understand it too. Remember: there is no magic here, just simple math at about a fifth-grade level.

    Suppose we have three binary conditions at the input (yes or no) and one binary decision at the output (yes or no):

    A simple model with three inputs and one output. This model can work perfectly well for different people and give them different results, depending on how they trained the neural network. But what is a neural network? It is just separate blocks - neurons - connected to each other. Let's build a simple network out of three neurons:

    What you see between the input and the output are the neurons. For now they aren't connected to anything, but that reflects their main feature, which everyone forgets to mention: they are completely abstract things. The neurons themselves don't decide anything at all; what decides is what we draw next. For now, remember: neurons compute nothing by themselves, they only organize and simplify the concept for humans. Let's draw the most important part of the network - the connections:

    Wow, now that looks like something super cool. Time to add some magic, right? We'll teach the neuron with our left heel, spin around on the spot, laugh, throw some pepper over the right shoulder of the neighbor behind us - and everything will just work? Turns out it's even simpler than that.

    Each input on the left has a value: 0 or 1, yes or no. Let's write these values onto the inputs; say there will be no vodka at the party, there will be friends, and it will rain:

    So far so good. What do we do next? Here comes the fun part: let's use the oldest way of setting the initial state of a neural network - the great random:

    The numbers we just set are the weights of the connections. Remember how neurons are empty nothings? Well, the connections are exactly what a neural network really consists of. But what are connection weights? They are the numbers we multiply the input values by and temporarily store in the empty neurons. We don't actually store anything there, but for convenience let's imagine that something can be put into a neuron:

    How's the math so far? Managed the multiplication? Hold on, the most "difficult" part is just beginning! Next we add up the values (this is one of the possible perceptron implementations):

    Well, that's it! The neuron is built, and you can use it for whatever you need. If the sum turns out to be greater than 0.5, you should go to the party. If it is less than or equal, you shouldn't. Thanks for your attention!
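    Here is a minimal sketch of this whole neuron in Python (my own code, not taken from anywhere): the three inputs, the random weights, the multiplication, the sum, and the 0.5 threshold - nothing else.

        # A minimal sketch of the toy neuron above.
        # Inputs: vodka at the party, friends going, rain outside; 0 = no, 1 = yes.
        import random

        inputs = [0, 1, 1]                                    # no vodka, friends yes, rain yes
        weights = [random.random() for _ in inputs]           # "the great random" initial weights

        weighted = [x * w for x, w in zip(inputs, weights)]   # what we pretend to store in the neurons
        total = sum(weighted)                                 # add up the values

        go_to_party = total > 0.5                             # above 0.5 - go, otherwise stay home
        print(total, go_to_party)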

    Of course, the model above isn't very useful on its own; we need to train it. Scary phrase, "train a neuron", isn't it? Not really. Everything is as crude and simple as possible: you take some input data (as we did), run it through the three neurons, look at the answer - say it's positive (go to the party) - and check whether the neuron predicted the answer correctly or not. If it's right, you do nothing. If it's wrong, you nudge the neuron weights slightly (one at a time or all at once) in some direction. For example, like this:

    And you check again: ugh, it says to go to the party again, when I don't want to go! So you nudge the weights a little more (most likely in the same direction), run the same inputs through the neurons again, check the result again - and either leave the weights alone or move them once more. And so on, trillions and quadrillions of times, with all sorts of different inputs. Here we only have 8 possible input combinations, of course, but real tasks are different (more on that just below).
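    Here is a rough sketch of that training loop, again in plain Python; the dataset of 8 combinations and the 0.01 step size are made up purely for illustration - the point is just "check, and if wrong, nudge":

        # A sketch of "nudge the weights until it stops being wrong".
        # `dataset` is invented for illustration: all 8 input combinations and someone's decisions.
        import random

        dataset = [([0, 0, 0], 0), ([0, 0, 1], 0), ([0, 1, 0], 1), ([0, 1, 1], 1),
                   ([1, 0, 0], 1), ([1, 0, 1], 0), ([1, 1, 0], 1), ([1, 1, 1], 1)]

        weights = [random.random() for _ in range(3)]

        def predict(inputs, weights):
            return 1 if sum(x * w for x, w in zip(inputs, weights)) > 0.5 else 0

        for _ in range(100_000):                      # "trillions" of passes, scaled down
            inputs, answer = random.choice(dataset)
            guess = predict(inputs, weights)
            if guess == answer:
                continue                              # right answer: leave the weights alone
            step = 0.01 if answer > guess else -0.01  # wrong: shift the weights a tiny bit
            weights = [w + step * x for w, x in zip(weights, inputs)]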

    This is the basic principle behind how neural networks work: just as you need multiplication before you can learn differentiation, you need to understand the perceptron before you can build convolutional networks, recurrent networks and all the other exotic stuff.

    As a result, having trained the neuron on the decisions some person actually made - having run over them billions of times, trying all sorts of connection weights - you finally arrive at the golden, optimal middle: the person enters three input values, the machine runs them through an already stable, working formula with three neurons, and out comes the answer.

    The only three unknowns in our case were the weights of the connections, and those are exactly what we searched over. That's why I keep saying that neurons are empty shells that decide nothing, and the real kings of the banquet are the connection weights.

    From there it's simple: instead of one layer of neurons we take two, and again we search over everything using exactly the same principles, except now each neuron passes its value on to other neurons. If at first we had only 3 connections, now we have 3 + 9 connections with weights. And then come three layers, four layers, recurrent layers looped back onto themselves, and so on:
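    If you want to see where the 3 + 9 weights come from, here is one possible sketch of the two-layer version (just my own illustration; there are many ways to wire it):

        # A rough sketch of the two-layer idea and the "3 + 9 weights" counting.
        import random

        inputs = [0, 1, 1]

        layer1_weights = [random.random() for _ in range(3)]       # 3 weights: one per input
        layer1 = [x * w for x, w in zip(inputs, layer1_weights)]   # first layer of "neurons"

        # 9 weights: 3 second-layer neurons, each connected to all 3 first-layer neurons
        layer2_weights = [[random.random() for _ in range(3)] for _ in range(3)]
        layer2 = [sum(v * w for v, w in zip(layer1, neuron_w)) for neuron_w in layer2_weights]

        go_to_party = sum(layer2) > 0.5
        print(go_to_party)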

    But, you ask, where does anything complex come from in these neurons? Why are machine learning specialists paid so much? The whole point is exactly how the perceptrons above get put together: there are so many nuances you'd get tired listing them.

    What if your input is a picture and you need to sort all pictures into dogs and cats? The pictures are 512x512, every pixel is an input - how many neurons are we going to push those values through? That's what convolutional layers are for! A convolution is a thing that takes, say, 9 neighboring pixels and averages their RGB values. It effectively compresses the image for faster processing. Or, for example, it throws away the red channel entirely, because it doesn't matter (say we are only looking for green-and-blue dresses). That's convolutional networks: an extra layer of "neurons" at the input that processes the input into a clear, simplified form for the rest of the network.
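    A tiny sketch of that "average 9 neighboring pixels" idea - strictly speaking this is closer to average pooling or a blur than a real learned convolution, but it shows how the image shrinks:

        # Average each 3x3 block of pixels, shrinking the image by a factor of 3 in each direction.
        def average_3x3_blocks(image):
            """image: a non-empty list of rows, each pixel an (r, g, b) tuple of ints."""
            small = []
            for y in range(0, len(image) - 2, 3):
                row = []
                for x in range(0, len(image[0]) - 2, 3):
                    block = [image[y + dy][x + dx] for dy in range(3) for dx in range(3)]
                    row.append(tuple(sum(p[c] for p in block) // 9 for c in range(3)))
                small.append(row)
            return small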

    You also need to understand how much and in which direction to shift the weights - for that there are various fairly simple algorithms that compute the error from the end, from right (the result) to left (back to the first layer of neurons); one of them is called Back Propagation.
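    For our single neuron, the "how much and in which direction" step can look like this tiny gradient-style update (a simplified sketch, not full backpropagation - real backprop chains this same idea back through every layer):

        # A delta-rule style update for one linear neuron: the error is measured at the output end,
        # and each weight is pushed in proportion to its input's share of the blame.
        def update(weights, inputs, target, learning_rate=0.1):
            prediction = sum(x * w for x, w in zip(inputs, weights))
            error = target - prediction
            return [w + learning_rate * error * x for w, x in zip(weights, inputs)]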

    There are also plenty of pretty simple algorithms for normalizing values - so that at the output, or somewhere in the middle, you are adding up numbers ranging not from 0 to 500,000 but from 0 to 1, which greatly simplifies the calculations and the computational math.
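    The simplest example of such normalization is min-max scaling: squeeze whatever range you have into 0..1:

        # Min-max scaling: map any list of numbers into the range 0..1.
        def normalize(values):
            lo, hi = min(values), max(values)
            if hi == lo:
                return [0.0 for _ in values]
            return [(v - lo) / (hi - lo) for v in values]

        print(normalize([0, 125_000, 500_000]))   # -> [0.0, 0.25, 1.0]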

    As you can probably tell by now, really good machine learning specialists not only know most of the existing methods for building optimized neural networks, but also invent their own approaches, starting from a simple but deep understanding of cause and effect: how a perceptron is built and why it works, mathematically. They can not just make a neuron work, they can change the algorithm or swap in another one so that everything still runs quickly and efficiently.

    Well, that's it - I've given you the foundation for understanding what neural networks are. I also hope I've shown you that the devil is not as scary as he is painted: everything turned out to be incredibly simple, at the level of multiplication and addition. From here I'd suggest watching tutorials on YouTube or Udemy - there are some seriously sharp folks there who explain it all.

    The next time someone asks you for money for a machine learning project, shake the sketches of the neural networks out of them: what layers there are, how they are organized, why they are arranged that way and what could go wrong. All of it will be at the level of, at most, 11th grade (that's integrals and derivatives), and even that will show up in the description once, maybe twice. As long as the project has no model (which layers there are and how they are arranged), the project has no product, because that structure is the first 2-4 weeks of a machine learning specialist's work.

    P.S. I shamelessly borrowed the example for this explanation from one magnificent video about neural networks - I strongly recommend watching it, and thanks to its authors! Subscribers helped restore the link to the original video, whose example I had tried to reproduce from memory. If anyone is interested in how to code the problem above, I invite you to watch that video. Many thanks to the authors!
