Creating a bot to participate in the AI mini cup 2018 based on a recurrent neural network
Initially, I had no plans for the article, especially about speaking at a conference. But there was a conference. And after the speech on it, the viewers had questions to me regarding the implementation of some technical issues. So it happened word for word - article.
Record live broadcast below the link .
Disclaimer: This is an article trying to answer some of these questions. I would be happy if new questions arise, answering which will be an opportunity to learn something new myself. And yes, this article is not in any way an attempt to sense the ringing of copper pipes.
And so begin with the very beginning of this story. Spring 2018, mail.ru announced a programming contest based on the game Agairo . The essence of the competition is the creation of a game bot.
Competition rules for this link . But to be brief and describe the rules in your own words, then you need to create a bot to play Agairo. At one time, the popular Internet game (in the sense that it is launched from the browser), the essence of which is eating their smaller brothers and running away from larger ones. You control a colored circle in a limited two-dimensional space. To create gameplay in this space, in addition to your circle, there are other circles, managed respectively by other players. For growth there is food (colored dots), additional obstacles in the form of "viruses" are introduced; collision with which causes the division of your circle into smaller ones, which partially complicates the gameplay, since smaller ones eat larger and larger ones to be safer.
I will give a picture of the version of the game for the competition
Further, the player will be understood not as a bot developer, but as a bot.
The organizers of the contest as a whole retained the original concept of the game, but added a number of innovations from themselves. The main ones are the restriction of the review of the bot and the introduction of a simple physical law called inertia. Physics has long passed, therefore inertia is associated not only with us, but also with such quantities as mass, velocity and acceleration. We will not need anything more from the world of physics.
I hope the literary part has paid enough attention and it's time to move on to the technical component of the competition and participation in it.
Disclaimer 2: I am almost sure that what was described below could have been made thinner, sleeker and smarter, but how it happened it happened.
So write down the recipe and the main ingredients, they are not new, but you can add your own taste.
Task: Create a bot who knows how to play Agairo on his own.
The basic idea: Use neural networks (hereinafter referred to as neural network, neural network (NN)) as a controlling element of the bot.
The main production tools: Microsoft Visual Studio and c # with a smooth transition to c ++.
Then the most important thing for the author is to choose or understand his reader, for whom I am writing, whether he knows what neural networks are, genetic algorithms, and how to describe all this in detail.
Everything, I chose my reader, my reader knows all this, but not so good to use it in practice.
Therefore, in the course of writing this article I will recall simple forms such as neuron and the activation function.
For the visualization I will use personally drawn pictures, what should be read as: if you see any errors there, write, redraw to the correct ones.
So, everything is ready for sending the bot construction software to the path. Let's start:
We take the cycle, or rather the organizers of the competition have already prepared it for us as an example of the simplest bot. Let's go deeper, download from the repository for this link archive for the participant of the competition. The organizers promised that the contest server will live even after its completion, and we will be able to verify these words. We open archive and we select programming language, clear for us.
The documentation of the contest includes information about the client's communication technology (read our bot) and the server living somewhere in the depths of the data center. In short, the conversation goes through the console line, by writing and reading commands from it.
Here we must mention the magic word Serialization : it is the packing of your data in a format understandable to the serializer and the ability to read this data back into the program using the serializer. There are many serializers, the simplest sequential format for writing to the txt file is also a serializer, but simple. What we need to know about serializers is that they exist and are almost transparent in the API part. The organizers probably understood this and therefore the example of their bot contains all the necessary instructions for working with the serializer. In our case, it was JSON .
For a bot after the serializer, the game world looks like an array of data obtained at the beginning of the mentioned Cycle. Data about the game world is limited by the server according to the radius of the review equal to the 4th radius of the bot, remember that the bot is a circle. And the circle around the circle is the outer boundary of the review, although the outer circle is slightly offset in the direction of the bot's movement relative to the center of the bot, but in our case this can be neglected.
And so we have a Cycle, the Cycle measure is Tick, how many cycles have passed so many ticks ran. Most likely, we will no longer recall this tick, but it will introduce a number of important limitations and simplifications: time for calculations is limited not only by a single tick, the time of one iteration of the cycle, but also by the total time of calculations, beyond which the game server disconnects your bot from management. Tick is the time to give commands to the bot. We will consider it a quantum of Game Time, Tick is the indivisible amount of time in the game world. Pluses follow from this. Since the Tick is 1 by default, it is easier to count speeds, accelerations and other values. We in all formulas multiply by 1, instead of using in them some kind of time curve t = 0.0015 seconds. The errors of these operations are also minimal.
All of the above is somewhat like a movie, it seems that the picture on the screen is lively and mobile, but in fact we are shown 25 different shots 25 times a second and our brain just thinks it is sufficient frequency to fully understand the dynamics of the video. So the bot sees the world at intervals of time. Tick is a frame of the game world.
The neural network will also work according to Tikam.
The picture begins to clear from the bot:
The server sends the processed data about the world to the bot-> Start of the Tick-> bot receives the data-> processes them-> gives the server control commands to the bot-> The server reads the data-> End of the Tick and then in a circle or cycle almost infinite, 45000 ticks in the final the game is quite an important figure.
Important note: you can skip Tick and nothing will happen to you. There are cases of servers when every Tick needs to tell something to the server so that the server does not suspect the bot of the control program hanging.
The author decided not to touch the server, but a few words about him need to be said. On the server there is a model of the game world and in accordance with it, it makes calculations and actions. The server program includes the following blocks:
- calculation of bot collisions (handling collisions with the borders of the game world, who ate whom, bot bot or food bot, or collision with a virus, etc.)
- calculation of the physics of bots (movement of bots, change in bot speeds according to bot commands or collisions that have occurred)
- calculations related to the division of bots or their association, if possible, the addition of food and other functions to maintain the game world.
- and of course sending and receiving data bots on game Tikam.
This server is called "command". That is, all actions occur on the server side, which eliminates problems with data synchronization. The client (read the bot) only receives a ready-made snapshot of the world, like its other rivals, here equality of the bots' positions is achieved, there are no priorities and queues. Again, the change of the coordinates of the bot occurs on the server, for example, the bot gives a command to offset and only on the next Tick does it recognize its new coordinates, which do not always coincide with the command, as an option an obstacle in the form of the wall of the game world.
picture from the organizers website
We have sorted out the data reading by the bot and the managing server, now let's look into the depths of the bot device.
As the reader probably already understood there will be few words and a lot of work or vice versa.
The bot is controlled by a neural network. Short and concise, but not quite clear how.
The choice of a neural network and technology of its implementation.
At the moment we decide that the neural network for us is a black box that has an input and output.
The input and output for us will be one-dimensional arrays with dimensionality N = 16 on input and dimensionality M = 4 on output.
Since the nature of the artificial neural network is roughly copied from the natural neural network, it would be good to bring the structure of the input data to the natural one. In nature, various sensors are responsible for this. So we stick to bot sensors.
The following option was chosen experimentally (the picture is simplified to 8 sensors, and a 4x3 neural network, but this is only to confuse my reader):
The area of review of the bot is divided into equal (perhaps even unequal) sectors. Each sector sends a signal to one of the inputs of the neural network. Accordingly, if we divided the area around the bot into 16 sectors (360/12 = 22.5 degrees overview of the sector), we get 16 inputs of the neural network. Typically, the input of the neural network signal in the range from -1 to 1. Therefore, you will need to normalize the input signals.
Rationing is, simply, dividing the total signal by its maximum possible value. Of course, it is possible to do dynamic normalization, that is, every time we look for a maximum and bring the signal value to its norm, but we set the maximum value as a constant equal to the maximum number of objects in one sector and will not change it during the work on the bot.
Signals can be both positive and negative, let's agree that the signal value from 0 to 1 is a positive motive for our bot (food, food in the form of a smaller bot), a negative signal value from -1 to 0 is a negative motive for a bot (wall the gaming world, another larger bot).
A signal from a sector is a normalized value consisting of the sum of individual signals. Each individual signal has been said to have a sign and magnitude. The magnitude of the signal is determined as a proportional function of the distance. The minimum distance is the boundary of the circle defining the bot, the maximum distance is the limit of visibility of the bot. Accordingly, if we receive a positive signal from, for example, food that has only entered the sector, at a maximum distance, the signal will be weak and will increase if the bot approaches the food.
Sensors in the form of sectors most suited to this case, but there may simply be antennae-probes, here fantasy is on your side. Of course, the author tried various options for dividing into sectors from 4 sectors to 72, but practice has shown that the neural network is quite successfully trained in 16 sectors and is ready to control the bot, even if there are four sectors. This shows the flexibility of the very nature of the neural network.
At last, the word “ Learning of the neural network” was heard . But we have so far decided that the neural network is a black box with input and output, and since we have analyzed the input data, it remains to say a few words about the output data or signals from this box, what to do with them and how to process.
We set the dimension of the output signal to 4. This number is a manifestation of the dualism of Cartesian coordinates and polar ones. The author has not forgotten about the nature of his reader’s knowledge and, therefore, reminds him with his drawing of once firmly acquired knowledge.
So, server developers logically used polar coordinates to simulate the game world, and politely provided Cartesian for bots and their developers. Therefore, the author had to choose a side in this coordinate system. For a neural network, the polar coordinate system containing only the magnitude (read length) of the vector and its angle of deflection is more understandable. You see, the words “turn to the right” are also clearer than move two steps along the ordinate and three steps along the abscissa.
Therefore, 4 output signals are 2 signals of increasing or decreasing the velocity vector angle of the bot, and 2 more signals increase or decrease the magnitude of the velocity vector.
But since the game server asked to give commands to control the bot in Cartesian coordinates, a couple of formulas for transferring from polar coordinates to Cartesian and this whole mess rose in its place (this is similar to pseudocode with #).
float rotate1 = outputs; float rotate2 = outputs; float speed1 = outputs; float speed2 = outputs; if (rotate1 > 0.65 && rotate2 < 0.65) angle = angle + 35 * PI / 180; if (rotate1 < 0.65f && rotate2 > 0.65) angle = angle - 35 * PI / 180; if (speed1 > 0.65 && speed2 < 0.65) speed = speed + 2; if (speed1 < 0.65 && speed2 > 0.65) speed = speed - 2; dx = speed * Cos(angle); dy = speed * Sin(angle);
Signals are in place, like a bot. It seems to have given him the input signals, but at the output there is nothing worthwhile: either standing or moving randomly.
So it is time to open the black box and look inside.
To be continued.
But before the new series teaser:
The purple bot on the neural network plays Local Runner against standard bots that the organizers have stitched into it by default.