We write AI for Vindinium on single-board computers. Part 2: Decision Logic

A series of articles on writing AI for multiplayer online game of the bagel genre.

In this part of the article, we will look at approaches to creating logic for AI, talk a little about the goal-setting of each law-abiding bot, and determine the choice of programming language and write some code.

The game world Vindinium

In order to create an AI, you need to understand the device of the game world.

Free translation of game documentation

Description

Vindinium — многопользовательский пошаговый рогалик. У каждого из четырех игроков есть один герой, который может перемещаться по карте. Цель состоит в том, чтобы игроки собрали максимальное количество золота в течение заданного количества ходов (каждый игрок делает 300 ходов за игру, таким образом, вся игра состоит из 1200 ходов). Игроки должны взять под свой контроль золотые рудники для производства золота; однако рудники защищены гоблинами. Когда игрок побеждает гоблина, он становится владельцем рудника и получает одно золото за ход. Кроме того, гоблин теперь защищает рудник от других игроков.

Герои могут сражаться друг с другом. Выживший в бою получает контроль над всеми золотыми рудниками своего противника. Убитый герой немедленно возрождается со всем своим золотом, однако все рудники переходят в руки убийце.

Наведываясь в таверну, герои могут купить пиво за 2 единицы золота, таким образом восстанавливая свои очки здоровья.

Цель состоит в том, чтобы создать компьютерную программу (бота), которая играет в игру Vindinium как можно более разумно. Рекомендуется использовать один из стартовых наборов для большого числа языков программирования в качестве отправной точки.

Карта

Карты создаются случайным образом. Каждый игровой объект на карте кодируется с использованием двух символов. Пример карты:

+----------------------------------------+
|######$-    $-############$-    $-######|
|########        ##        ######|
|####[]    ####            ####    []####|
|##      ####  ##        ##  ####      ##|
|####            $-    $-            ####|
|##########  @1            @4  ##########|
|################    ####  ############|
|$-##$-        ############        $-##$-|
|  $-      $-################$-      $-  |
|        ########################        |
|        ########################        |
|  $-      $-################$-      $-  |
|$-##$-        ############        $-##$-|
|################    ####  ############|
|##########  @2            @3  ##########|
|####            $-    $-            ####|
|##      ####  ##        ##  ####      ##|
|####[]    ####            ####    []####|
|########        ##        ######|
|######$-    $-############$-    $-######|
+----------------------------------------+

Легенда

## — Непреодолимый лес
@1 — Первый герой
[] — Таверны
$- — Золотой рудник (ничейный)
$1 — Золотой рудник (принадлежащий первому герою)

Сгенерированные карты симметричны и всегда содержат 4 таверны и 4 героя.

Герой

Герои могут перемещаться на одну клетку за каждый ход и иметь следующие показатели:

Очки здоровья (HP): Каждый "свежий" игрок начинает с максимального значения = 100. Если HP падает до нуля, герой умирает (смотрите раздел "Смерть героя").
Золото: начиная с нуля, это показатель успешности героя. В конце игры герои будут оценены на основе их количества золота.
Количество золотых рудников.

Направление движения

Бот должен отдавать один приказ за ход. Возможные приказы: Стоять на месте (Stay), На север (North), На юг (South), На восток (East) или На запад (West). Как только приказ исполнен, герой остается на своем месте или перемещается на одну клетку в заданном направлении.

Перемещение героя

Если герой:

Пытается выйти за границы карты или пройти сквозь деревья, ничего не происходит.
Заходит в золотой рудник, он остается на месте и:
- Если рудник уже принадлежит герою, ничего не происходит.
- Если шахта ничейная или принадлежит другому герою, происходит бой с гоблином-стражем, охраняющим шахту. Герой теряет 20 очков жизни. Если он выживет, шахта его.
Наступает на другого героя, он остается на месте и ничего не происходит. Поединки героев решаются в конце хода.
Заходит в таверну, он остается на месте и заказывает себе поесть. Герой платит 2 золота и восстанавливает 50 единиц здоровья. Обратите внимание, что количество здоровья не может превышать 100 единиц.
Не отправляет приказ за отведенное ему время для принятия решения (1 секунду), он остается на месте до окончания игры, отправлять новые приказы становится невозможно. Обратите внимание, что он все равно может выиграть, если в конце игры у него больше золота, чем у других игроков.

Конец хода

После того, как герой переместился (или решил остаться на месте), произойдут следующие вещи:

Сражения

Герои немного нервничают и никогда не упускают возможности ударить друг друга большими мечами. В конце хода героя, если есть враг на расстоянии одного квадрата в любом направлении, герой его атакует. Например, в этой ситуации, в конце хода первого героя (@1):

##########@1@2####  @3##########

Игрок 1 атакует второго игрока, но не трогает третьего, потому что третий стоит на расстоянии двух клеток от него.
Нападающий не теряет единиц здоровья, обороняющийся теряет 20 единиц.
Если обороняющийся умирает (см .: Смерть героя), нападающий получает контроль над всеми золотыми рудниками проигравшего.

Добыча золота

После своего хода и сражений с другими героями (если таковые были), игрок получает одну единицу золота за каждый подконтрольный рудник.

Жажда

Затем герой теряет одну единицу здоровья, ибо любое действие вызывает у него жажду.
Обратите внимание, что герои не могут умереть от жажды. В худшем случае, значение их здоровья падает до единицы.

Смерть героя

Когда здоровье героя падает до нуля, он умирает. Герой немедленно появляется на карте на своей точке возрождения, с полным запасом здоровья (100 единиц). Герой теряет контроль над всеми своими золотыми рудниками, но сохраняет все свое накопленное золото. Будьте осторожны, когда герой возвращается на точку возрождения, любой противник, который находится в этой клетке, автоматически умирает. Таким образом, вам следует избегать пребывания на клетке возрождения одного из героев ...

Герой не может умереть от жажды. Жажда может оставить героя с одной единицей здоровья, но не убить его.

Конец игры

Игра заканчивается, когда достигается максимальное количество ходов(обычно 300). Победителем является герой с наибольшим количеством золота. Если у двух игроков одинаковое количество золота, победителя нет.

Рейтинг

Система оценки относительной силы игроков использует Рейтинг Эло. Идея такова: лучше быть первым, чем вторым, лучше быть вторым, чем третьим, и так далее. Надеюсь, принцип понятен.

Использование нескольких ботов одновременно

Вы можете одновременно запускать несколько экземпляров ваших ботов и, в общем-то, использовать любые меры, которые, по вашему мнению, подходят для достижения доминирующего лидерства. Боритесь!

Link to the original

It is worth noting a couple of aspects that were not described in the rules, but identified empirically:

If we have less than 21 health units, but you attack a mine that does not belong to you, then you die. Yes, yes, there is no protection from a fool; everything is serious here, as in most real battles. If you attack a nobody's mine, all your mines become no man’s, and if you mine one of the enemies, then your mines pass into the hands of the player who owns this mine.
The game describes the following procedure: Исполнение приказа- Бьем ближайших врагов- Теряем 1 единицу здоровья от жажды. And what happens if during the execution of the order we die (in the game you can do this only by dying in a battle with the goblin)? We are reborn (and instantly kill the player who is standing now at our spawnpoint), but lose the opportunity to hit the closest enemies, and also do not lose 1 health point due to thirst.
After killing an opponent standing on our spawnpoint during our revival, we capture his mines, hehe.
The map has a square view, the length of the map takes on even values on the segment [8, 28].

"Learn from your enemies and you will understand their strengths."

Vindinium is a public game, its useful side is that we can look in the profile of any player and see the last hundred battles with his participation. "Great! It's time to use neural networks, because we have 50 top players, take 10 of them to be the strongest, each of the last 100 fights contains ~ 300 moments when the player had to make a decision, about 200-300 thousand units in total material for training! And you can also rotate every situation clockwise, mirror, etc, to get even more material for training and fix the result, it will give us as much as 4.8-7.2 million units of material "- the voice of reason rang out. Yes, indeed, such an idea has the right to exist. In addition, neural networks have many advantages.

All material for learning is easily parsed from open source.
A wide scope for reflection on computer vision is revealed:
- You can leave everything as it is, there will be 28 * 28 input neurons (if the card is smaller, we fill it with trees);
- You can center each time according to the position of the hero (maybe it will bring some amazing result);
- It is possible to present the map as a graph, thus the work of the neural network to find patterns is greatly facilitated; This option will allow the neuron to quickly find patterns of complex behavior and quickly understand why, if we have little health, we go to a far tavern, if there is another tavern just a couple of cells from us, even if it is close to the enemy;
An already trained neural network, if you pre-set the task of resource consumption, can be compactly placed in 512 MB of RAM allocated for us (in fact, it turns out about 480 MB), so much so that the capacity of a single-board computer is sufficient for calculations.

However, adolescent maximalism in me wants to go a more complicated way - not to impose a search for patterns on the neural network, but to do this work independently, head on, relying on the intuitive higher plasticity of this solution.

So. Decision trees, alpha-beta pruning, minimax ... too resource-intensive tasks! On the Vindinium subreddit, several developers, revealing the veil of secrets of their bots, have already used this solution, and certainly not in such Spartan conditions. Unfortunately, in this area it is unlikely to be able to do anything better than the rest.

Having read articles about evolutionary, genetic algorithms, decisive trees, I dug out secret knowledge - potential fields. Read more about them here and here . This idea seemed very working, because the potential field is a planar graph, a function depending on the input data is placed in each link (in particular, the distance from the object, but no one bothers to make more conditions). All this fits perfectly into the realities of Vindinium - you do not need to search for the path to the object, if it is already embedded in the algorithm.

"Quite specific tastes"

Let's watch the battles of top characters. Before we start, we will choose a favorite, we will follow him, cheer for him, reproach for the wrong decisions in the style of "but I would do that at this place ...". After a dozen battles, it is already possible to make a first sketch of what a law-abiding AI is (conditions are checked in order):

You should not walk near the enemy's spawnpoint, if the enemy has a chance to die (that is, if we can expect an inglorious death while standing at the enemy's spawnpoint);
It is foolish to fight with your enemy near his spawnpoint, for he still aki phoenix will be reborn with full health and again will try to capture our honestly looted mines;
If the enemy is close to us, and we stand near the tavern - time to get drunk. Judging by the many bloody battles near the means of subsistence and relaxation, this rule is very important;
If we can not defeat the enemy / enemies, but we have time to reach the tavern - we run;
If we can not defeat the enemy (s) And do not have time to reach the tavern, then:
- If we can commit suicide on a farm, we kill ourselves. Take a bite!
- If we can die on the miner's file of the person with the lowest amount of gold, we will get rid of it;
- If a sad end awaits us, then we need to take away as much health as possible from this reptile, let it be long to remember our mistake!
If there is an enemy whom we can kill within our two moves and he has minilki - we attack;
If there is an enemy, removed from all mine-feeds more than we, and he has 33% of mine-controls under control. And we can defeat him - we go to win, otherwise we go to drink beer;
Capture the farm, if nothing else remains.

Question answer:

What are its advantages in comparison with neural networks, which will cope with this task a hundred times better, or with trees that all your next n moves in advance know and have already developed countermeasures, all that remains is a good evaluation function to apply?
(1) Multifunctional. It is easier to change the parameters, add new features. You follow your character like this, you are happy, and then bang - and you see that at a certain moment you could have done something completely different, more prudent, - we write a new rule or change the old one. (2) We also know exactly what decision the program was guided by when choosing a particular course. (3) Potential fields have shown themselves well in bagels as the basis for artificial intelligence bots.
Prove that your approach is effective, that your intentions are worth something.
In the leaderboard on the 27th place hangs Zaraza 0.1- AI on potential fields, which is guided by only three instincts - mindlessly seize everything that gets in its path, do not dry out in bars and carefully behave with enemies. If you follow his movements, you will see how well he is fighting, although it is simply unbelievable for AI, which is based on three simple rules, and even complex dreams will not appear to him even in dreams. Moreover, I am currently working onZonko 0.11, which is a much improved version of the Zaraz's booze, you can embed a much more complex behavior into it due to improved interaction with the fields - just like in the new-fashioned GPS. But, as it turned out, it is gluttonous to the resources, so now the process of its optimization is going on ... But I digress this, now we are talking about strict restrictions, strict rules of strict (...).
Your beliefs are ridiculous, your faith is too weak! I can create an AI in the name of the method, and it will tear you!
It will be very nice to listen to other people's thoughts on this topic. Moreover, for you I have already added up all the battles of the top 10 players, only 1000 battles and about 1,000,000 moves - link (.zip - 33MB, RAW - 1.68GB). I suggest the conditions of the game:
- Register bots under their nicknames in geektimes.
- The five players who scored the most points before September 30, than me or anyone else among those who have expressed to play, I will send a postcard from Moscow).

So, now the programming language ... Personally, I’m rushing around between Python3 (fast development, easy to read, have been familiar with it for a long time, there is pypy3 (fast optimized interpreter), jupyter ("notebooks" in which you can safely write pieces of code and optimize them to infinity); but pypy / pypy3 does not work under ARM 64bit, and indeed ARM is no longer supported, and the language itself is inferior to compiled because of its nature) and Golang (also fast development, easily understood, large backend bias, multithreading and multiprocessing, runs faster than python; but when etsya to get used to the absence of an interactive environment to static typing).

The main function that communicates with the server can be represented as follows:

Code

# в глобалях находятся переменные train_url, arena_url, userkey, добытые из config.pyfrom config import train_url, arena_url, userkey
import requests, random, json, time
defstart(is_train = True, debug = True, show_decision = True):# Получаем информациюif is_train:
        r = requests.post(train_url, data={"key":userkey})
    else:
        r = requests.post(arena_url, data={"key":userkey})
    timer = time.time()
    data = json.loads(r.text)
    if debug or show_decision:
        print('viewUrl:', data['viewUrl'])
        print('Размер карты:', data['game']['board']['size'])
    #циклwhileTrue:
        if debug:
            print('Turn', data['game']['turn'])
        # Вызываем функцию принятия решения
        direction = random.choice(['North', 'South', 'East', 'West', 'Stay'])
        if show_decision or debug:
            print('Решение хода',str(data['game']['turn'])+':', direction)
        # Возвращаем ответ на сервер, проверяем коды состояния, завершаем игру.if debug:
            print('Время:',time.time()-timer)
        r = requests.post(data['playUrl'], data={'key': userkey, 'dir': direction})
        timer = time.time()
        if r.status_code != 200:
            print('Request code :', r.status_code)
            print('Reason:', r.reason)
            break
        data = json.loads(r.text)
        if data['game']['finished']:
            print('Game finished.')
            break

But it is recommended to use ready-made designs, links to which can be found on the official website of Vindinium.

Extra 1: I really want to read about the development of artificial intelligence based on Vindinium from other people, because this is how one can understand the many facets of solving this problem. In order to get a summary of the battle in json format (this can be useful for debugging the battles), you need to convert the link to the fight http://vindinium.org/fd96vc2z into a link like http://vindinium.org/events/fd96vc2z . But I do not advise you to torment the game server, trying to get hundreds of battles of top players, use the link above.

Extra 2: If someone wants to try out their work in Vindinium, drive them into the limitations of NanoPi Neo2 or Orange Pi Zero, I can provide the opportunity to work with these single-board computers.

→ Link to Vindinium
→ Link to the Vindinium subreddium is a very useful thing, there you can track my movements on the Vindinium
→ Link to my github with some insights on Vindinium

In the next part we will customize potential fields, work with potential maps, write conditions and impose all this on modern realities.

Tags: