AI development for turn-based games on Node.js (part 1)


    Hello!
    It has been a whole year and a half since my first article on Habré. Since then, the FOTM project has undergone a number of changes. First we will briefly go through all the upgrades, and then move on to a detailed analysis of the main feature: the AI.

    In the first part of my story, I will cover the latest changes in the project and an attempt to implement AI using a neural network. From the next article you will learn about the decision tree and plans for the future. So, let's begin!

    It would be nice to do...


    After the first article came out, I received the feedback I needed and set about implementing the things that seemed most justified and not too complicated.

    • Chat. Since the game is almost entirely built on sockets, adding a chat was quick and easy.
    • Draws. Now, after 50 moves, the game ends in a draw, so as not to waste the opponent's time and nerves.
    • Videos. A friend of mine recorded several training videos on character customization and combat mechanics. They are now available on the game's YouTube channel.
    • Gulp. Armed with newly acquired knowledge about this build tool, I lightened and slightly sped up the client.
    • Moving logic to the server side. This task turned out to be very difficult and painstaking. Most of the mechanics lived on the client side, which is fundamentally wrong, and the bulk of the feedback on the game concerned exactly this aspect. Still, the move not only made the game harder for malicious hackers to tamper with, it also made it possible to refactor some terrible things.

    The improvements above took about six months of lazy work. As soon as the habraeffect wore off, I realized that people who entered the game had no one to play with. They would write in an empty chat, queue up for the arena and... close the game. After all, if you are alone on the server, there is no one to fight :-(

    Neuro-FOTM


    Frankly, I wanted to abandon the project altogether, since it had already borne its fruit in the form of experience. But machine learning had always interested me, so I started digging for information on neural networks. After a while I came to a conclusion: I needed an environment in which to train a network. My game, an unplowed field for creating learning bots, was ideally suited for this.

    First, I implemented a function that forms the list of actions available on each turn. There are three in total:

    • Movement
    • Using ability
    • End of turn

    A resource is spent on each action: energy, the amount of which is limited per turn. Imagine that a player has 1000 energy at their disposal, moving costs 300 energy, and using the “fireball” ability costs 400. Then you can perform, for example, the following sequence: Movement -> Movement -> Fireball -> End of turn. After this, the character's energy is replenished to the maximum, and the next player takes their turn.
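    A minimal sketch of how such a list might be formed (the names getAvailableActions, moveCost and energyCost are illustrative, not the actual game code):

        function getAvailableActions(character) {
            const actions = [];
            // Movement is available while there is enough energy for one step
            if (character.energy >= character.moveCost) {
                actions.push({ type: 'move' });
            }
            // An ability is available if its energy cost can be paid
            for (const ability of character.abilities) {
                if (character.energy >= ability.energyCost) {
                    actions.push({ type: 'ability', ability: ability });
                }
            }
            // Ending the turn is always possible; it refills energy to the maximum
            actions.push({ type: 'endTurn' });
            return actions;
        }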

    For the sake of experiment, I made the choice of action random, to watch two reckless bots do all kinds of stupid things :)

    After that, the question arose of how exactly the AI should choose an action. I considered various options, but in the end came to the following idea: the neural network receives normalized information about the situation on the battlefield and outputs scores for the following behavior models:

    1. Offensive move. Move to the cell that is most advantageous in terms of attack (the point at which the maximum number of opponents is at the optimal distance for using attacking abilities).
    2. Defensive move. Move to the cell that is most advantageous in terms of defense (the point at which the maximum number of allies is at the optimal distance for using protective abilities).
    3. Offensive. Use an ability that deals damage to an enemy.
    4. Defensive. Use an ability that helps the character or an ally survive: heals them or removes negative effects.
    5. Control. Use an ability that limits the enemy's actions: stunning, slowing, preventing the use of abilities, etc.
    6. Gain. Use an ability that increases the characteristics of the character or an ally.
    7. Weakening. Use an ability that reduces enemy stats.

    For the neural network model, I chose the perceptron: simple enough to understand and suitable for my task.


    Multilayer perceptron

    At the input is an array of data about the situation. Consider a small slice of the sequence of values:

        let input = [..., 1, 1, 0.8, 1, 0.5, 0.86, ...];
    

    The six numbers in the example show the ratio of current health to maximum health for 6 characters (the active character, 2 allies and 3 opponents). At the output of the neural network you might see the following:

        let output = [0.2, 0.1, 0.7, 0.45, 0, 0.01, 0.03]; 
    

    This is an array of the 7 behavior models described above. That is, in this case the neural network assessed the situation and concluded that the best options were to attack the enemy (0.7), defend itself or an ally (0.45), or simply move to a cell closer to the enemy (0.2).

    Each ability in the game has a separate useClass property that classifies it.


    The Prowler ability deals damage and stuns the enemy for 7 turns.

    For the Prowler ability, this property looks like this:

        useClass : {
            "offensiveMove" : 0,
            "defensiveMove" : 0,
            "offensive" : 1,
            "defensive" : 0,
            "control" : 1,
            "gain" : 0,
            "weakening" : 0
        }
    

    And in the form of an array:

        let abilityUseClassArray = [0, 0, 1, 0, 1, 0, 0];
    

    To determine how well the “Prowler” ability fits the chosen model, I take the solution produced by the neural network (output) and compare the two flat arrays:

        let difference = 0;
        for (let i = 0; i < output.length; i++) {
            // Sum of absolute differences between the network's scores and the ability's classification
            difference += Math.abs(output[i] - abilityUseClassArray[i]);
        }
    

    The lower the difference, the more suitable the ability. If the arrays are completely identical (a 100% match between the behavior model and the ability's useClass properties), the difference will be 0. Then all that remains is to choose the action with the minimal difference.
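    Putting it together, the selection step could look something like this (a sketch; availableActions and its useClassArray field are assumptions based on the description above):

        function chooseAction(output, availableActions) {
            let best = null;
            let bestDifference = Infinity;
            for (const action of availableActions) {
                // Compare the network's 7 behavior scores with the action's classification
                let difference = 0;
                for (let i = 0; i < output.length; i++) {
                    difference += Math.abs(output[i] - action.useClassArray[i]);
                }
                if (difference < bestDifference) {
                    bestDifference = difference;
                    best = action;
                }
            }
            return best; // the action that best matches the network's decision
        }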

    Everything seems beautiful and clear, but there are a number of problems.

    Problem 1: Normalization


    To form the input array, the data had to be normalized to the range from 0 to 1. With the health ratios mentioned above, everything turned out to be quite easy. It is harder with inconstant values, such as the temporary effects applied to a character (buffs and debuffs). Each of them is an object with several important fields, such as remaining duration and effect multiplier (stacks). To make it clear to the neural network how one effect differs from another, I had to introduce the same useClass field as for abilities. That let me describe an effect, but the problem of their quantity remained. So I took the number of buffs and debuffs applied to the character and normalized it as:

        buffs.length / 42
    

    Such a solution tells the neural network practically nothing about the properties of the objects inside the buffs array. On average, 2-3 effects hang on a character at any time. The count can never exceed 42, since a battle has only 6 characters with 7 abilities each. As a result, the normalized description of the game situation is an array of about 500 values.
    One could allocate 42 slots to describe the effects properly (filling the missing ones with zeros). But even if each effect has, say, 10 properties, that comes out to 420 values (and that is only for the buffs). So I put this question off for a while :)
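    To illustrate the crude version I settled on, the per-character part of the encoding boils down to a few lines like these (field names are illustrative, not the actual game code):

        function encodeCharacter(character) {
            return [
                character.health / character.maxHealth, // health ratio, 0..1
                character.buffs.length / 42,            // buff count; 42 = 6 characters * 7 abilities
                character.debuffs.length / 42           // debuff count, same ceiling
                // ...position, energy, effect useClass descriptions, etc.
            ];
        }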

    Problem 2: The Training Sample


    To create a training sample, I had to fill in the output values for a number of situations by hand. I implemented a UI that showed all the actions available on the current turn; the selected action was written to a separate JSON file as the solution (output) for the given set of input values (input). In one sitting I managed to form about 500 input-output pairs, which became the training sample. But the main question continued to hang in the air: how large should the sample be?

    Moreover, if for some reason I decided to change the description of the situation (and that happened), I would have to start all over again. For example, if the input array grew from 520 to 800 values, the entire old sample could be thrown in the trash along with the network configuration.
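    The collection step itself is simple; something like this ran on every click in that UI (a sketch, with an assumed file name and structure):

        const fs = require('fs');

        function saveTrainingExample(input, output) {
            const file = 'trainingSet.json';
            // Load the accumulated sample, append the new input-output pair, save it back
            const set = fs.existsSync(file) ? JSON.parse(fs.readFileSync(file, 'utf8')) : [];
            set.push({ input: input, output: output });
            fs.writeFileSync(file, JSON.stringify(set));
        }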

    Problem 3: Network Architecture and Configuration


    So, we have about 520 values in the input array and 7 values at the output. To implement the neural network, I chose the Synaptic.js library and set the network up as follows:

        const { Architect, Trainer } = require('synaptic');

        var network = new Architect.Perceptron(520, 300, 7); // input: 520, hidden: 300, output: 7
        var trainer = new Trainer(network);
        var trainingSet = [
            {
                input: [0, ... , 1], // input.length: 520
                output: [0, 0.2, 0.4, 0.5, 0.1, 0, 0] // output.length: 7
            },
            ...
        ];
        trainer.train(trainingSet, {
            rate: .1,
            iterations: 10000,
            error: .005,
            shuffle: true,
            log: 1000,
            cost: Trainer.cost.CROSS_ENTROPY
        });
    

    This is what the first network configuration looked like. I started it and... over 10,000 iterations the network could not even come close to the target error of 0.005, while eating up two hours. I started thinking about what could be changed to reach the target value, and realized that everything was bad :(

    Let's consider the available configuration parameters:

    1. Sample size
    2. Number of hidden layers and the size of each
    3. Learning rate
    4. Number of iterations
    5. Target error value
    6. Cost function (3 built-in options, or you can write your own)

    It’s quite difficult to understand how each of them affects the result, especially when you have been doing neural networks for all of two weeks. If you make one hidden layer of only 10 neurons, an error of 0.01 is reached quite quickly (about 100 iterations), but suspicions about the flexibility of such a network creep in: most likely, if you “feed” it a non-standard game situation, it will make a completely unacceptable decision.
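    For reference, that suspiciously fast configuration differs from the original by a single number (the same Synaptic call with a smaller hidden layer):

        var network = new Architect.Perceptron(520, 10, 7); // reaches 0.01 error in ~100 iterations, but is probably too rigid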

    Problem 4: Training Speed


    With the above configuration, training the network lasted about two hours (approximately 1.38 iterations per second). That is quite a long time, given that you need to experiment with the numbers to get a result. The problem was that the calculations ran on the CPU (an Intel Core i5-4570) rather than on the video card. At this point I started wondering about porting the computation to the GPU using CUDA. I dug through a lot of material and came to the conclusion that the chances of setting up CUDA for Node.js on Windows are practically zero. Yes, I could deploy a separate Linux server that would deal only with the network calculations, or write that server in Python instead of Node.js; there were many other options too. But what if an AI built on a neural network is simply unsuitable for my problem in the first place?

    Problem 5: Features of Game Mechanics


    At the network development stage, I ran into two more problems with the chosen approach to implementing the AI.


    Description of the “Lets me take it” ability

    1. Not all abilities can be attributed to a single behavior model. The most striking example is the oracle's “Lets me take it” ability, which “steals” a random positive effect from the enemy and applies it to the caster. The problem is obvious: there are quite a few varieties of positive effects. Some heal, others protect allies, others strengthen combat characteristics, and some limit a character's movement. If we steal a strengthening effect, what kind of behavior is that? Strengthening yourself (gain) or weakening the enemy (weakening)? In fact, both. But the stolen effect can also heal, which makes it defensive behavior; and if a heal was taken away from the enemy, it is also an attack (offensive). Thus “Lets me take it” falls under every behavior model, which is, of course, very strange. And this ability is far from the only one with a random factor.
    2. The behavior is determined only for the current situation. The decision about what is best right now does not take into account the subsequent actions of either the active player or the opponents. There is no modeling of situations and no calculation of possible outcomes.

    All of the above made me doubt the chosen approach to AI development. A colleague at work who is versed in machine learning suggested using a decision tree instead of a neural network. We will talk about that in the next part of the article...

    In the meantime, thank you for your attention!
