Using a multilayer neural network for obstacle avoidance in games
Obstacle avoidance is a classic task that every computer game developer has to face. There are a number of well-known algorithms of varying efficiency. All of them, to one degree or another, analyze the relative positions of the obstacles and the player and use the result to make a decision about movement. I tried to solve the obstacle avoidance problem with a trained neural network, and in this short article I want to share my experience of implementing this approach in the Unity3D environment.
Concept
The game space is a landscape based on the standard Terrain; collisions with the surface are not considered in this article. Each model is equipped with a set of colliders that describe the geometry of the obstacles as accurately as possible. The model that has to avoid obstacles carries four collision sensors (in the screenshot, the placement and range of the sensors are shown with turquoise lines). Each sensor is in fact a ray that reports the distance to the obstacle it hits to the analysis algorithm. The distance varies from 0 (the obstacle is as close as possible) to 1 (no hit: this direction is free of obstacles).
In general, the obstacle avoidance algorithm works as follows (a minimal code sketch of this loop follows the list):
- The four values from the collision sensors are fed to the four inputs of the trained neural network.
- The state of the neural network is calculated. At the output we get three values:
a. the turning force of the model counterclockwise (a value from 0 to 1);
b. the turning force of the model clockwise (a value from 0 to 1);
c. the braking acceleration (a value from 0 to 1).
- The forces are applied to the model with appropriate coefficients.
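To make the loop concrete, here is a minimal sketch of how it might look in Unity. This is not the author's actual code: the class, the field names, and the tuning coefficients are illustrative assumptions, and NeuroNet stands in for the article's neuroNet class (a forward-pass sketch of it appears in the Implementation section below).

```csharp
using UnityEngine;

// Illustrative sketch of the sensors -> network -> forces loop.
public class ObstacleAvoider : MonoBehaviour
{
    public float sensorRange = 20f;      // raycast length of each sensor (assumed)
    public float turnCoefficient = 0.5f; // scales the two turn outputs (assumed)
    public float brakeCoefficient = 2f;  // scales the braking output (assumed)
    public Transform[] sensorOrigins;    // the four forward-facing sensor origins

    private NeuroNet net; // trained perceptron; weight loading is omitted here
    private Rigidbody body;

    void Start()
    {
        body = GetComponent<Rigidbody>();
    }

    void FixedUpdate()
    {
        if (net == null) return;

        // 1. Read the four collision sensors: 0 = obstacle touching, 1 = clear.
        float[] inputs = new float[4];
        for (int i = 0; i < 4; i++)
        {
            RaycastHit hit;
            inputs[i] = Physics.Raycast(sensorOrigins[i].position,
                                        sensorOrigins[i].forward,
                                        out hit, sensorRange)
                      ? hit.distance / sensorRange
                      : 1f;
        }

        // 2. Evaluate the network: [ccw turn, cw turn, braking], each in 0..1.
        float[] outputs = net.Calculate(inputs);

        // 3. Apply the outputs as torque and braking with tuning coefficients.
        float turn = (outputs[1] - outputs[0]) * turnCoefficient;
        body.AddTorque(Vector3.up * turn, ForceMode.Acceleration);
        body.AddForce(-body.velocity * outputs[2] * brakeCoefficient,
                      ForceMode.Acceleration);
    }
}
```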
Implementation
Honestly, I had no idea whether anything would come of this. First of all, I implemented the neuroNet class in Unity. I will not dwell on the code of the class, since it is a classic multilayer perceptron. Along the way, the question of the number of layers immediately arose: how many are needed, on the one hand, to provide the necessary capacity and, on the other, an acceptable calculation speed? After a series of experiments I settled on twelve layers (three basic states for each of the four inputs).
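For reference, here is a hedged sketch of the forward pass such a class might perform. The bias-free layers and the sigmoid activation are my assumptions, chosen only because the article's inputs and outputs all live in the 0..1 range; the author's actual class is not shown.

```csharp
using System;

// Sketch of a classic multilayer perceptron forward pass (assumptions noted above).
public class NeuroNet
{
    private readonly float[][,] weights; // weights[layer][from, to]

    public NeuroNet(float[][,] trainedWeights)
    {
        weights = trainedWeights; // e.g. read from the training app's weight file
    }

    private static float Sigmoid(float x) => 1f / (1f + (float)Math.Exp(-x));

    // Propagates the four sensor values through every layer and returns
    // the three outputs (ccw turn, cw turn, braking), each in 0..1.
    public float[] Calculate(float[] inputs)
    {
        float[] activations = inputs;
        foreach (var layer in weights)
        {
            var next = new float[layer.GetLength(1)];
            for (int j = 0; j < next.Length; j++)
            {
                float sum = 0f;
                for (int i = 0; i < activations.Length; i++)
                    sum += activations[i] * layer[i, j];
                next[j] = Sigmoid(sum); // keeps every signal in 0..1
            }
            activations = next;
        }
        return activations;
    }
}
```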
Next, I had to implement training of the neural network. For this I created a separate application that used the same neuroNet class. Then the question of training data arose. Initially I wanted to use values obtained directly from the game application: I logged the sensor data, intending later to tell the training program the correct output values for each recorded set of the four sensor readings. But looking at the result, I lost heart. The problem is that it is not enough to assign an adequate output to each set of four sensor values; these outputs must also be consistent with one another, which is very important for successful training of the network. Besides, there was no guarantee that the resulting sample covered all possible situations.
An alternative solution was a manually compiled table of the basic combinations of sensor and output values. The following values were taken as the basic variants: 0.01 (an obstacle is close), 0.5 (an obstacle at half range), and 0.999 (the direction is free). This reduced the size of the training sample.
Sensor 1 | Sensor 2 | Sensor 3 | Sensor 4 | Clockwise | Counterclockwise | Braking |
---|---|---|---|---|---|---|
0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 |
0.01 | 0.01 | 0.01 | 0.5 | 0.01 | 0.01 | 0.01 |
0.01 | 0.01 | 0.01 | 0.999 | 0.01 | 0.01 | 0.01 |
0.01 | 0.01 | 0.5 | 0.01 | 0.999 | 0.01 | 0.01 |
0.01 | 0.01 | 0.5 | 0.5 | 0.999 | 0.01 | 0.01 |
0.01 | 0.01 | 0.5 | 0.999 | 0.999 | 0.01 | 0.5 |
0.01 | 0.01 | 0.999 | 0.01 | 0.999 | 0.01 | 0.5 |
0.01 | 0.01 | 0.999 | 0.5 | 0.999 | 0.01 | 0.999 |
0.01 | 0.01 | 0.999 | 0.999 | 0.999 | 0.01 | 0.999 |
The table shows a small fragment of the training set (81 rows in total: every combination of the three basic values across the four sensors, 3^4 = 81). The end result of the training program was a table of weights, which was saved to a separate file.
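To make the training step concrete, here is a hedged sketch of one backpropagation update over the same float[][,] weight layout as above. The squared-error loss, the learning rate, and the absence of bias terms are assumptions; the article does not show the training application's code.

```csharp
using System;

public static class Trainer
{
    // One backpropagation step on a single training row
    // (bias-free sigmoid MLP, squared-error loss).
    public static void Step(float[][,] w, float[] input, float[] target, float lr)
    {
        int layers = w.Length;

        // Forward pass, keeping every layer's activations for backprop.
        var acts = new float[layers + 1][];
        acts[0] = input;
        for (int l = 0; l < layers; l++)
        {
            acts[l + 1] = new float[w[l].GetLength(1)];
            for (int j = 0; j < acts[l + 1].Length; j++)
            {
                float sum = 0f;
                for (int i = 0; i < acts[l].Length; i++)
                    sum += acts[l][i] * w[l][i, j];
                acts[l + 1][j] = 1f / (1f + (float)Math.Exp(-sum));
            }
        }

        // Output-layer deltas, then propagate them backwards layer by layer.
        var delta = new float[acts[layers].Length];
        for (int j = 0; j < delta.Length; j++)
        {
            float a = acts[layers][j];
            delta[j] = (a - target[j]) * a * (1f - a);
        }
        for (int l = layers - 1; l >= 0; l--)
        {
            // Deltas for the previous layer must be computed with the old weights.
            var prev = new float[acts[l].Length];
            for (int i = 0; i < prev.Length; i++)
            {
                float sum = 0f;
                for (int j = 0; j < delta.Length; j++)
                    sum += w[l][i, j] * delta[j];
                prev[i] = acts[l][i] * (1f - acts[l][i]) * sum;
            }
            // Gradient step on this layer's weights.
            for (int i = 0; i < acts[l].Length; i++)
                for (int j = 0; j < delta.Length; j++)
                    w[l][i, j] -= lr * acts[l][i] * delta[j];
            delta = prev;
        }
    }
}
```

Repeated over all 81 rows for many epochs, a loop like this would converge to the weight table that the game later loads.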
Results
Rubbing my hands in anticipation, I loaded the trained weights into the demo game and started the process. As it turned out, however, what I had done was clearly not enough. From the start, the test model spun in place and bumped into every obstacle in its path like a blind kitten. In short, the result was mediocre, and I had to dig into the problem. The source of this helpless behavior was found fairly quickly: although the neural network reacted to the sensor readings correctly on the whole, the control forces it transmitted were far too strong.
Having solved this problem, I ran into a new difficulty: the range of the sensor rays. With a long detection range, the model made premature maneuvers, which led to significant distortions of the route (and even to unexpected collisions with obstacles it seemed to have already passed). A short range had only one outcome: the model helplessly bumped into every obstacle, clearly lacking time to react.
The more I fiddled with the demo model, trying to teach it to avoid obstacles, the more it felt like I was not programming but teaching a child to walk. It was an unusual feeling, and it was all the more gratifying to see my efforts bear tangible fruit. In the end, the long-suffering hovercraft floating above the surface began to quite confidently go around the structures that appeared on its route. The real tests for the algorithm began when I deliberately tried to drive the model into a dead end. Here I had to change the logic of the braking acceleration and make some amendments to the training sample. Let's look at practical examples of what came out in the end.
1. Bypassing a single obstacle
As you can see, the bypass did not cause any difficulties.
2. Two obstacles (option 1)
The model easily found a passage between the two buildings. Easy task.
3. Two obstacles (option 2)
The buildings are closer, but the model finds a passage.
4. Two obstacles (option 3)
This variant is more difficult, but the model still manages it.
5. Three obstacles
The task was solved rather quickly.
6. Deadlock
Here the model runs into problems. The first 30 seconds of the video show the model floundering helplessly in a simple building configuration. The problem most likely lies not so much in the neural network as in the main route-following algorithm: it persistently tries to return the ship to its course despite the network's desperate attempts to avoid collisions.
After several unsuccessful runs of this situation with different parameters, I managed to get a positive result. From the thirtieth second of the video you can see how a model with an increased sensor range and a stronger braking force extricates itself from the dead end. It took almost five minutes (I cut out the torment and left only the last 30 seconds of the video). This can hardly be considered a good result in a real game, so there is clearly room for improving the algorithm.
Conclusion
In general, the problem was solved. How effective this solution is remains an open question and requires more research. For example, it is not known how the model will behave when dynamic obstacles (other moving objects) appear. Another problem is the lack of rear-facing collision sensors, which makes it difficult to get around complex obstacles.
The obvious next step for the neural network obstacle avoidance idea seems to me to be training during play. To do this, one must introduce a score for the result of each decision: when successive corrections fail to change the object's position significantly, the score should deteriorate. Once it falls below a certain value, the model should switch into training mode and, say, randomly vary its decisions in order to find a way out.
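Purely as speculation about how such a score might be wired in (nothing like this exists in the article's implementation; all names and thresholds here are invented):

```csharp
using UnityEngine;

// Speculative stuck-detection score: decays while course corrections
// produce no real displacement, eventually triggering an exploration mode.
public class StuckDetector : MonoBehaviour
{
    public float minDisplacement = 0.05f; // movement below this counts as "stuck"
    public float threshold = 0.2f;        // score level that triggers exploration

    private float score = 1f;
    private Vector3 lastPosition;

    void Start() => lastPosition = transform.position;

    void FixedUpdate()
    {
        float moved = (transform.position - lastPosition).magnitude;
        lastPosition = transform.position;

        // Degrade the score while the model keeps correcting without moving;
        // let it recover while real progress is being made.
        score = moved < minDisplacement ? score * 0.95f
                                        : Mathf.Min(1f, score + 0.01f);

        if (score < threshold)
            EnterExplorationMode();
    }

    void EnterExplorationMode()
    {
        // Hypothetical: switch the network into training mode and randomize
        // its decisions until a way out is found, as suggested above.
        score = 1f;
    }
}
```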
Another interesting property of the model, it seems to me, is the variability of the initial training. It makes it possible, for example, to give different models several distinct behaviors without programming each one separately. In other words, if we have, say, a heavy tank and a light reconnaissance aircraft, their manner of avoiding obstacles can differ significantly; to achieve this effect, we use the same perceptron but train it on different samples.