An Overview of Techniques for Implementing Game AI
This article will introduce you to a broad range of concepts in game artificial intelligence ("game AI"), so that you understand which tools are available for solving AI problems, how they work together, and where to start implementing them in your engine of choice.
I will assume that you are familiar with video games and have some grasp of mathematical concepts such as geometry and trigonometry. Most of the code examples are written in pseudocode, so no knowledge of a particular language is required.
What is a "game AI"?
Game AI mainly deals with choosing an entity's actions given the current conditions. The traditional AI literature refers to this as controlling "intelligent agents". The agent is usually a game character, but it can be a vehicle, a robot, or even something more abstract: a whole group of entities, a country, or a civilization. In any case, it is an object that monitors its surroundings, makes decisions based on them, and acts on those decisions. This is sometimes called the Sense/Think/Act cycle:
- Sensing: the agent detects, or is told about, things in its environment that may influence its behavior (for example, nearby threats, items to collect, points of interest, and so on)
- Thinking: the agent decides how to respond (for example, whether it is safe enough to collect items, or whether to fight or to hide)
- Acting: the agent performs actions to carry out its decisions (for example, starts moving along a route toward the enemy or toward the item, and so on)
- ... then, due to the actions of the characters, the situation changes, so the cycle must be repeated with new data.
Real-world AI problems, especially the ones in fashion today, usually focus heavily on the "sensing" part. For example, self-driving cars must take in images of the road ahead, combine them with other data (radar and lidar), and try to interpret what they see. This is typically done with machine learning, which is particularly good at taking large amounts of noisy real-world data (such as photos of the road in front of a car, or a few frames of video) and extracting semantic meaning from it, for example "there is another car 20 yards ahead of me". These are known as classification problems.
Games are unusual in that they do not need a complex system to extract this information, since it is an integral part of the simulation. There is no need to run image recognition algorithms to detect the enemy in front of you; the game knows that there is an enemy there and can pass this information directly to the decision-making process. Therefore, the “perception” in this cycle is usually greatly simplified, and all the complexity arises in the implementation of “thinking” and “action”.
Game development limitations
Game AI usually takes into account the following limitations:
- Unlike a machine learning algorithm, game AI usually is not trained in advance; when developing a game it is impractical to train a neural network on tens of thousands of players to find the best way to play against them, because the game has not shipped yet and there are no players!
- The game is usually expected to entertain and challenge the player rather than be "optimal"; even if agents could be trained to oppose players in the best possible way, designers most often want something different from them.
- Agents are often required to behave "realistically", so that players feel they are competing with human-like opponents. The AlphaGo program turned out to be far better than humans, but the moves it chose were so far from the traditional understanding of the game that experienced opponents described it as playing against an alien. If the game simulates a human adversary, this is usually undesirable, so the algorithm should be tuned to make plausible decisions rather than ideal ones.
- The AI must run "in real time". In this context that means the algorithm cannot monopolize the CPU for long stretches in order to make a decision. Even 10 milliseconds per decision is far too long, because most games have only 16-33 milliseconds to perform all processing for the next frame of graphics.
- Ideally, at least part of the system should be data-driven rather than hard-coded, so that non-programmers can make changes quickly.
With all that in mind, we can start looking at extremely simple AI approaches that implement the whole Sense/Think/Act cycle efficiently, while letting designers author complex, human-like behaviors.
Simple decision making
Let's start with a very simple game, say Pong. The player's task is to move the paddle so that the ball bounces off it rather than flying past. The rules are like tennis: you lose if you miss the ball. The AI has the relatively simple job of deciding which direction to move the paddle.
Hardcoded conditionals
If we wanted to write an AI to control the paddle, there is an intuitively simple solution: just keep moving the paddle so that it stays underneath the ball. By the time the ball reaches the paddle, the paddle is already in the ideal position to return it.
A simple algorithm for this, expressed in pseudocode, might be:
```
each frame/update while the game is running:
    if the ball is to the left of the paddle:
        move the paddle left
    else if the ball is to the right of the paddle:
        move the paddle right
```
Assuming the paddle can move at least as fast as the ball, this is the ideal algorithm for an AI Pong player. When there is only a little "sensed" data to process and few actions the agent can take, nothing more complicated is needed.
This approach is so simple that the whole Sense/Think/Act cycle is barely visible in it. But it is there:
- The "sensing" is in the two if statements. The game knows where the ball and the paddle are, so the AI asks the game for their positions, thereby "sensing" whether the ball is to its left or its right.
- The "thinking" is also built into the two if statements. They embody two decisions, which in this case are mutually exclusive, resulting in one of three actions being chosen: move the paddle left, move it right, or do nothing if it is already positioned correctly.
- The "acting" is in the "move the paddle left" and "move the paddle right" statements. Depending on how the game is implemented, this might mean instantly updating the paddle's position, or setting the paddle's speed and direction so that other game code moves it properly.
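A sketch of this in runnable form (Python here for concreteness; the function and variable names such as `ball_x` and `paddle_x` are illustrative, not from any particular engine):

```python
def choose_paddle_move(ball_x, paddle_x):
    """One 'think' step of the Pong AI: pick a direction from positions."""
    if ball_x < paddle_x:
        return "left"
    elif ball_x > paddle_x:
        return "right"
    return "none"  # paddle is already lined up with the ball


def update_paddle(ball_x, paddle_x, speed):
    """One frame of the cycle: sense positions, think, then act."""
    move = choose_paddle_move(ball_x, paddle_x)
    if move == "left":
        return paddle_x - speed
    if move == "right":
        return paddle_x + speed
    return paddle_x
```

Calling `update_paddle` once per frame reproduces the pseudocode above exactly.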
Such approaches are often called "reactive", because a simple set of rules (here, the if statements in the code) reacts to the current state of the world and instantly decides how to act.
This Pong example is actually analogous to a formal AI concept called the "decision tree". It is a system in which decisions are arranged in a tree, and the algorithm must traverse it to reach a "leaf" containing the final decision about which action to take. Let's draw a graphical representation of the decision tree for the paddle algorithm as a flowchart:
You can see that it resembles a tree, only upside down!
Each part of a decision tree is usually called a "node", because AI borrows graph theory to describe these structures. Each node is one of two types:
- Decision nodes: a choice between two alternatives based on testing a condition, with each alternative represented by its own node;
- End nodes: an action to perform, representing the final decision the tree has made.
The algorithm starts at the first node, designated the "root" of the tree; at each node it either decides which child node to proceed to based on the condition, or performs the action stored in the node and then stops.
At first glance the advantage of a decision tree is not obvious, because it does exactly the same work as the if statements from the previous section. But there is a very common arrangement in which each decision has exactly one condition and two possible outcomes, which lets a developer build the AI from data representing the decisions in the tree instead of hard-coding it. It is easy to imagine a simple data format for describing such a tree:
| Node number | Decision (or "End") | Action on "yes" | Action on "no" |
|---|---|---|---|
| 1 | Is the ball to the left of the paddle? | Check node 2 | Check node 3 |
| 2 | End | Move the paddle left | |
| 3 | Is the ball to the right of the paddle? | Check node 4 | Check node 5 |
| 4 | End | Move the paddle right | |
| 5 | End | Do nothing | |
From the code's point of view, we need to make the system read each of these lines, create a node for each, attach the decision logic based on the second column, and attach the child nodes based on the third and fourth columns. We still have to hand-code the conditions and actions themselves, but now we can imagine a more complex game in which new decisions and actions are added, and the whole AI is tuned by editing a single text file containing the tree definition. We could hand that file to a game designer, who can adjust the behavior without recompiling the game or touching the code, provided the code already contains useful conditions and actions.
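A sketch of how such a data-driven tree might be stored and evaluated (Python for concreteness; the dictionary-based storage, condition names, and action names are illustrative assumptions):

```python
# Conditions are pre-written by a programmer and referenced by name in
# the data, just like the table above.
conditions = {
    "ball_left_of_paddle":  lambda world: world["ball_x"] < world["paddle_x"],
    "ball_right_of_paddle": lambda world: world["ball_x"] > world["paddle_x"],
}

# node id -> ("decision", condition name, yes-node id, no-node id)
#         or ("end", action name)
tree = {
    1: ("decision", "ball_left_of_paddle", 2, 3),
    2: ("end", "move_left"),
    3: ("decision", "ball_right_of_paddle", 4, 5),
    4: ("end", "move_right"),
    5: ("end", "do_nothing"),
}

def decide(tree, world, root=1):
    """Walk from the root until an end node is reached; return its action."""
    node = tree[root]
    while node[0] == "decision":
        _, cond, yes_node, no_node = node
        node = tree[yes_node] if conditions[cond](world) else tree[no_node]
    return node[1]
```

The `tree` dictionary is exactly what a loader would build from the rows of the data file, so a designer editing that file reshapes the AI without a recompile.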
Decision trees can be very powerful when they are built automatically from a large set of examples (for example, with the ID3 algorithm). That makes them an effective, high-performance tool for classifying a situation based on incoming data, but that topic is beyond the scope of simple designer-authored action-selection systems for agents.
Above, we looked at a decision tree system that uses pre-built conditions and actions. The AI developer can restructure the tree however they like, but they have to rely on the programmer having already provided every condition and action they need. What if we could give the designer more powerful tools that let them create some of the conditions themselves, and perhaps some of the actions too?
For example, instead of making the coder write conditions for "Is the ball to the left of the paddle?" and "Is the ball to the right of the paddle?", we could build a system in which the designer writes the conditions that check these values directly. The decision tree data might then look like this:
| Node number | Decision (or "End") | Action on "yes" | Action on "no" |
|---|---|---|---|
| 1 | ball.position.x < paddle.position.x | Check node 2 | Check node 3 |
| 2 | End | Move the paddle left | |
| 3 | ball.position.x > paddle.position.x | Check node 4 | Check node 5 |
| 4 | End | Move the paddle right | |
| 5 | End | Do nothing | |
This is the same as before, but now the decisions have their own code, much like the conditional part of an if statement. The system reads decision nodes from the second column, and instead of looking up a specific named condition (such as "is the ball to the left of the paddle?") it evaluates the conditional expression and returns true or false. This can be implemented by embedding a scripting language such as Lua or AngelScript, which lets the developer expose game objects (for example, the ball and the paddle) as variables accessible from script (for example, ball.position). A scripting language is usually easier to write than C++ and does not require a full compilation step, so it is well suited to making quick changes to game logic and lets less technically-minded team members create game features without a coder's intervention.
In the example shown above, the scripting language is used only to evaluate the conditional expression, but the final actions can be described in script as well. For example, an action like "move the paddle right" could become a script expression like paddle.position.x += 10, so that the action is also defined in script, without the programmer having to write a MovePaddleRight function.
Taking one more step, we can (and this is often done) go all the way and write the entire decision tree in the scripting language rather than as rows of data. That would be code resembling the conditional constructs shown earlier, except not "hard-coded": it lives in external script files and can therefore be changed without recompiling the whole program. It is often even possible to edit the script file while the game is running, which lets developers quickly test different AI approaches.
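A minimal sketch of evaluating designer-written condition expressions (Python's built-in `eval` stands in here for an embedded language such as Lua; the class and variable names are illustrative, and a real engine would sandbox designer-authored script far more carefully):

```python
class Vec:
    def __init__(self, x):
        self.x = x

class Entity:
    def __init__(self, x):
        self.position = Vec(x)

def evaluate_condition(expr, ball, paddle):
    """Evaluate a condition string from the data file against game objects."""
    # Expose only the game objects the designer is allowed to reference.
    scope = {"ball": ball, "paddle": paddle}
    return bool(eval(expr, {"__builtins__": {}}, scope))
```

With this in place, the "Decision" column of the table can hold expressions like `ball.position.x < paddle.position.x` directly.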
Reaction to events
The examples above were designed to run frame by frame in simple games like Pong, continuously running the Sense/Think/Act cycle and acting on the latest state of the world. But in more complex games, rather than evaluating everything every frame, it often makes more sense to react to "events", that is, to significant changes in the game's environment.
That does not really apply to Pong, so let's pick another example. Imagine a shooter in which the enemies are motionless until they detect the player, and then act according to their class: melee fighters charge at the player, while snipers keep their distance and line up a shot. In essence this is still a simple reactive system ("if we see the player, do something"), but it splits naturally into an event ("saw the player") and a reaction (choose a response and carry it out).
This brings us back to the Sense/Think/Act cycle. We might have a "sensing" code fragment that checks each frame whether the enemy can see the player. If not, nothing happens; but if he can, it raises a "player seen" event. A separate piece of code then says "when the 'player seen' event occurs, do xyz", where "xyz" is whatever response we want in order to handle the thinking and acting. For the fighter character we hook a "charge and attack" response up to the "player seen" event; for the sniper we hook up a "take cover and aim" response function instead. As in the previous examples, we can define these associations in a data file so that they can be changed quickly without rebuilding the engine. And, as before, the response functions themselves could be written in a scripting language.
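A minimal sketch of such event wiring (Python; the event name, the responses, and the per-class behaviors are illustrative):

```python
handlers = {}  # event name -> list of response callbacks

def on(event, callback):
    """Register a response for an event (could be driven from a data file)."""
    handlers.setdefault(event, []).append(callback)

def raise_event(event, **data):
    """Fire an event; every registered response runs."""
    for callback in handlers.get(event, []):
        callback(**data)

actions = []
# A melee fighter and a sniper react differently to the same event:
on("player_seen", lambda pos: actions.append("fighter: charge to %s" % (pos,)))
on("player_seen", lambda pos: actions.append("sniper: take cover and aim"))

# The per-frame visibility check only has to do this when it triggers:
raise_event("player_seen", pos=(3, 4))
```

The sensing code never needs to know who is listening; adding a new enemy class is just another `on("player_seen", ...)` registration.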
Improved decision making
Although simple reactive systems are very powerful, there are many situations where they are not enough. Sometimes we need to make different decisions depending on what the agent is already doing, and expressing that as a single condition is awkward. Sometimes there are simply too many conditions to represent effectively as a decision tree or a script. And sometimes we need to think ahead and estimate how the situation will change before deciding on the next move. Such problems call for more sophisticated solutions.
A finite state machine (FSM) is a way of saying that an object, say one of our AI agents, is currently in one of several possible states, and that it can transition from one state to another. There is a finite number of such states, hence the name. A real-world example is a set of traffic lights, cycling from red to yellow to green and back again. Different places use different light sequences, but the principle is the same: each state means something ("stop", "go", "stop if safe", and so on), only one state is active at any given time, and the transitions between them follow simple rules.
This maps well to NPCs in games. A guard might have the following clearly separated states:
- Patrol
- Attack
- Escape
And we can come up with the following rules for transitions between states:
- If the guard sees an enemy, he attacks
- If the guard is attacking but can no longer see the enemy, he returns to patrolling
- If the guard is attacking but is badly hurt, he escapes
This scheme is simple enough that we could implement it with hard-coded if statements, a variable storing the guard's state, and various checks: are there enemies nearby, what is the guard's health level, and so on. But imagine we need to add a few more states:
- Waiting (between patrols)
- Search (when a previously seen enemy has hidden)
- Find help (when an enemy is seen but is too strong to fight alone)
And the choices available in each state are usually limited; for example, the guard probably won't go looking for an enemy he lost sight of if his health is too low.
Sooner or later the long list of "if <x and y but not z> then <p>" constructs becomes too unwieldy, and a formalized approach to states and the transitions between them helps. To do this we list all the states, and under each state we list all the transitions to other states along with the conditions required for each. We also need to specify the initial state, so we know where to start before any other conditions apply.
| Current state | Transition condition | New state |
|---|---|---|
| Waiting | waited 10 seconds | Patrol |
| Waiting | enemy visible and enemy too strong | Find help |
| Waiting | enemy visible and health high | Attack |
| Waiting | enemy visible and health low | Escape |
| Patrol | finished patrol route | Waiting |
| Patrol | enemy visible and enemy too strong | Find help |
| Patrol | enemy visible and health high | Attack |
| Patrol | enemy visible and health low | Escape |
| Attack | enemy not visible | Waiting |
| Escape | enemy not visible | Waiting |
| Search | searched for 10 seconds | Waiting |
| Search | enemy visible and enemy too strong | Find help |
| Search | enemy visible and health high | Attack |
| Search | enemy visible and health low | Escape |
| Find help | friend visible | Attack |

Initial state: Waiting
This is called a state transition table. It is an unwieldy (and unattractive) way of representing a state machine. From this data we can also draw a diagram, getting a complete graphical view of how the NPC's behavior might look.
It captures the essence of decision making for an agent based on the situation it finds itself in. Each arrow marks a transition between states, taken when the condition next to the arrow is true.
On each update (or "tick") we check the agent's current state, scan the list of transitions, and if a transition's condition is met, we move to the new state. The Waiting state, for example, checks each frame or tick whether its 10-second timer has expired; if it has, it triggers the transition to the Patrol state. Similarly, the Attack state checks whether the agent's health is low, and if so, transitions to the Escape state.
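The transition table above can be driven by a very small piece of code. A sketch (Python; the world-state keys, the ordering of rows, and the "health > 50" threshold for "health high" are illustrative assumptions):

```python
# Each row is (current state, condition, next state); on a tick we scan
# the rows for the current state and take the first one whose condition
# holds, mirroring the table above.
TRANSITIONS = [
    ("waiting",   lambda w: w["waited"] >= 10,                          "patrol"),
    ("waiting",   lambda w: w["enemy_visible"] and w["enemy_strong"],   "find_help"),
    ("waiting",   lambda w: w["enemy_visible"] and w["health"] > 50,    "attack"),
    ("waiting",   lambda w: w["enemy_visible"],                         "escape"),
    ("patrol",    lambda w: w["route_done"],                            "waiting"),
    ("patrol",    lambda w: w["enemy_visible"] and w["enemy_strong"],   "find_help"),
    ("patrol",    lambda w: w["enemy_visible"] and w["health"] > 50,    "attack"),
    ("patrol",    lambda w: w["enemy_visible"],                         "escape"),
    ("attack",    lambda w: not w["enemy_visible"],                     "waiting"),
    ("escape",    lambda w: not w["enemy_visible"],                     "waiting"),
    ("find_help", lambda w: w["friend_visible"],                        "attack"),
]

def step(state, world):
    """One tick: return the new state, or the same state if nothing fired."""
    for current, condition, nxt in TRANSITIONS:
        if current == state and condition(world):
            return nxt
    return state
```

Because the table is just data, a designer could extend or reorder it without touching `step` at all.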
That is how state transitions are handled, but what about the behaviors associated with the states themselves? In terms of performing the actual actions for a state, there are usually two ways of attaching actions to an FSM:
- Actions for the current state run periodically, for example every frame or tick.
- Actions run on the transition from one state to another.
An example of the first kind: the Patrol state moves the agent along the patrol route each frame or tick; the Attack state each frame or tick tries to launch an attack, or to move into a position from which one is possible; and so on.
An example of the second kind: consider the transition "enemy visible and enemy too strong → Find help". The agent must choose where to go for help and store that information so that the Find help state knows where to head. Similarly, when help is found in the Find help state, the agent transitions back to the Attack state, but at that moment he will want to tell the friendly character about the threat, so there may be a "tell friend about the danger" action performed during that transition.
Here again we can view this system through the Sense/Think/Act lens: sensing is embedded in the data used by the transition logic; thinking is embedded in the transitions available from each state; and acting is performed by the actions run periodically within a state or on transitions between states.
This simple system works well, although constantly polling the transition conditions can sometimes get expensive. For example, if every agent has to perform complex calculations every frame to determine whether it can see any enemies, in order to decide whether to transition from Patrol to Attack, that can eat a lot of CPU time. As we saw earlier, important changes in the state of the world can instead be treated as "events" that are processed as they occur. So instead of explicitly checking a "can my agent see the player?" transition condition every frame, we can have a separate visibility system run those checks a little less often (say, 5 times per second) and raise a "player seen" event when the check succeeds. That event is passed to the state machine, which now has a "received the 'player seen' event" transition condition and responds to it accordingly. The resulting behavior is the same, apart from a barely noticeable (and arguably realism-enhancing) reaction delay, but performance improves because the "sensing" has moved into a separate part of the program.
Hierarchical finite state machines
This is all well and good, but large state machines become very awkward to work with. If we want to expand the Attack state by replacing it with separate "Melee attack" and "Ranged attack" states, we will have to change the incoming transitions from every state, present and future, that needs to be able to transition to Attack.
You have probably also noticed that our example contains a lot of duplicated transitions. Most of the transitions in the Waiting state are identical to those in the Patrol state, and it would be good to avoid duplicating that work, especially as we add more similar states. It makes sense to group Waiting and Patrol into a "Non-combat" group with a single shared set of transitions to the combat states. If we represent this group as a state, we can treat Waiting and Patrol as its "substates", letting us describe the whole system more effectively. An example using a separate transition table for the new Non-combat substate:
| Current state | Transition condition | New state |
|---|---|---|
| Non-combat | enemy visible and enemy too strong | Find help |
| Non-combat | enemy visible and health high | Attack |
| Non-combat | enemy visible and health low | Escape |
| Attack | enemy not visible | Non-combat |
| Escape | enemy not visible | Non-combat |
| Search | searched for 10 seconds | Non-combat |
| Search | enemy visible and enemy too strong | Find help |
| Search | enemy visible and health high | Attack |
| Search | enemy visible and health low | Escape |
| Find help | friend visible | Attack |

Initial state: Non-combat
| Current state | Transition condition | New state |
|---|---|---|
| Waiting | waited 10 seconds | Patrol |
| Patrol | finished patrol route | Waiting |

Initial state: Waiting
And in the form of a diagram:
In fact, it is the same system, except that there is now a Non-combat state replacing Patrol and Waiting, and that state is itself a state machine with two substates, Patrol and Waiting. If each state can potentially contain its own state machine of substates (and those substates can contain their own state machines, and so on), then we have a Hierarchical Finite State Machine (HFSM). By grouping the non-combat behaviors we cut out a pile of redundant transitions, and we can do the same for any new states that share common transitions. For example, if we later expand the Attack state into "Melee attack" and "Ranged attack" substates, they can transition between each other based on the distance to the enemy and the availability of ammunition, while sharing common exit transitions based on health levels and the like. In this way complex behaviors and sub-behaviors can be represented with a minimum of duplicated transitions.
Using HFSMs we can build quite complex sets of behaviors in a fairly intuitive way. However, notice that decision making in the form of transition rules is tightly coupled to the current state. Many games need exactly that, and careful use of the state hierarchy reduces the number of duplicated transitions. But sometimes we need rules that apply regardless of the current state, or in almost every state. For example, if an agent's health drops to 25%, he may want to flee whether he is fighting, waiting, talking, or in any other state. We do not want to have to remember to add this condition to every state we might ever add to the character, and when the designer later decides to change that threshold from 25% to 10%, we do not want to edit every transition either.
Ideally we would like a system in which the decisions about which state to be in exist separately from the states themselves, so that we can change just one element and the transitions are still handled correctly. This is where behavior trees come in.
There are several ways to implement behavior trees, but the essence is the same in most of them and very similar to the decision trees mentioned above: the algorithm starts at a "root node", and the tree contains nodes representing decisions or actions. There are some key differences, though:
- Nodes now return one of three values: "success" (if the job was completed), "failure" (if it could not be done), or "running" (if the job is still in progress and has not yet finished in success or failure).
- Instead of decision nodes choosing between two alternatives, we have "decorator" nodes with a single child. Decorator nodes typically contain a condition that determines whether they succeed (in which case they execute their child subtree) or fail (in which case there is nothing more to do). They can also return "running".
- Action nodes return "running" to indicate that their activity is still in progress.
This small set of nodes can be combined to create a large number of complex behaviors, often very concisely. For example, we can rewrite the guard's hierarchical FSM from the previous example as a behavior tree:
With this structure there is no need for explicit transitions from the Waiting or Patrol states to Attack or any other state: if the tree is traversed top to bottom and left to right, the correct decision is made based on the current situation. If the enemy is visible and the character's health is low, the tree finishes execution at the Escape node, regardless of which node was previously executing (Patrol, Waiting, Attack, and so on).
You may notice that we do not yet have a way to return to the Waiting state from Patrol, and this is where unconditional decorators come in. A standard decorator node is Repeat: it has no condition, it simply intercepts its child returning "success", runs the child again, and itself returns "running". The new tree looks like this:
Behavior trees are quite involved, because there are often many different ways to build a tree, and finding the right combination of decorator and composite nodes can be tricky. There are also questions about how often the tree should be checked (do we traverse it every frame, or only when something happens that might affect the conditions?) and about how state is stored on the nodes (how do we know we have waited for 10 seconds? How do we know which node executed last time, so we can resume a sequence correctly?). Because of this there are many different implementations. For example, in some systems, such as Unreal Engine 4's behavior trees, decorators only re-evaluate the tree when their conditions change, while "services" can be attached to nodes to provide periodic updates even when the tree is not being re-traversed. Behavior trees are a powerful tool, but learning to use them properly, especially across the many different implementations, can be daunting.
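The statuses and node types described above can be sketched very compactly (Python; the node set and the guard logic, including the health threshold of 25, are simplified illustrations rather than any engine's actual API):

```python
SUCCESS, FAILURE, RUNNING = "success", "failure", "running"

class Action:                      # leaf: do something, report success
    def __init__(self, name, log):
        self.name, self.log = name, log
    def tick(self, world):
        self.log.append(self.name)
        return SUCCESS

class Condition:                   # decorator: run the child only if the test passes
    def __init__(self, test, child):
        self.test, self.child = test, child
    def tick(self, world):
        return self.child.tick(world) if self.test(world) else FAILURE

class Repeat:                      # unconditional decorator: keep the child going
    def __init__(self, child):
        self.child = child
    def tick(self, world):
        self.child.tick(world)
        return RUNNING

class Selector:                    # composite: first non-failing child wins
    def __init__(self, *children):
        self.children = children
    def tick(self, world):
        for child in self.children:
            status = child.tick(world)
            if status != FAILURE:
                return status
        return FAILURE

log = []
guard = Selector(
    Condition(lambda w: w["enemy_visible"] and w["health"] < 25,
              Action("escape", log)),
    Condition(lambda w: w["enemy_visible"], Action("attack", log)),
    Repeat(Action("patrol", log)),
)
```

Ticking `guard` top to bottom and left to right picks Escape, Attack, or Patrol from the current world state, with no explicit transitions anywhere.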
Utility-based systems
Some games need many different actions and therefore benefit from simpler, centralized transition rules, but do not need the full power of a behavior tree. Instead of creating an explicit set of choices, or a tree of potential actions with fallbacks implied by the tree structure, why not simply score every action and pick the one that rates highest right now?
This is exactly what utility-based systems do: the agent has many actions at its disposal and chooses which one to perform based on the relative utility of each. Utility here is an arbitrary measure of how important or desirable it is for the agent to perform a given action. By writing utility functions that score an action based on the agent's current state and environment, the agent can compare the utility values and select the most relevant action for the moment.
This, too, looks a lot like a state machine, except that the transitions are determined by scoring every potential state, including the current one. Note that in general we transition to the highest-scoring action (or stay with the current one if we are already performing it), but for more variety this can be a weighted random choice (favoring the most valuable action while still allowing others), a random pick from the top five (or any other number), and so on.
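The weighted-random variant can be sketched by treating the utility scores as weights (Python; the function name and the action names are illustrative):

```python
import random

def pick_action(utilities, rng=random):
    """Weighted random pick: higher-utility actions are chosen more often."""
    total = sum(utilities.values())
    if total == 0:
        return None          # nothing is worth doing right now
    roll = rng.uniform(0, total)
    for action, utility in utilities.items():
        roll -= utility
        if roll <= 0:
            return action
    return action            # guard against floating-point edge cases
```

An action with zero utility is never chosen, while the top-scoring action merely becomes the most likely outcome rather than the only one.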
A standard utility-based system assigns an arbitrary range of utility values, say from 0 (completely undesirable) to 100 (absolutely desirable), and each action has a set of factors that influence how its value is computed. Returning to our guard example, it might look something like this:
| Action | Utility calculation |
|---|---|
| Find help | If the enemy is visible and the enemy is strong but health is low, return 100, otherwise return 0 |
| Escape | If the enemy is visible and health is low, return 90, otherwise return 0 |
| Attack | If the enemy is visible, return 80 |
| Waiting | If in the Waiting state and we have already waited 10 seconds, return 0, otherwise 50 |
| Patrol | If at the end of the patrol route, return 0, otherwise 50 |
One of the most important aspects of this scheme is that transitions between actions are expressed entirely implicitly: from any state it is perfectly legal to move to any other. Action priorities are also implied by the returned utility values. If the enemy is visible, and the enemy is strong, and the character's health is low, then both Escape and Find help return non-zero values, but Find help always scores higher. Likewise, the non-combat actions never return more than 50, so the combat actions always beat them. The actions and their utility calculations are designed with this in mind.
In our example the actions return either a constant utility value or one of two constants. A more realistic system would return values from a continuous range. For example, the Escape action could return higher utility values the lower the agent's health gets, while the Attack action could return lower values the stronger the enemy is. Escape would then take priority over Attack in any situation where the agent feels it lacks the health to win the fight. This lets the relative priorities of actions shift based on any number of criteria, which can make this approach more flexible than a behavior tree or an FSM.
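A sketch of the guard's scoring with one continuous value (Python; the numbers echo the table above, and the way Escape scales with missing health is an illustrative assumption):

```python
def utilities(w):
    """Score every action from the world state; Escape scales continuously."""
    return {
        "find_help": 100 if w["enemy_visible"] and w["enemy_strong"]
                            and w["health"] < 50 else 0,
        "escape":    90 * (1 - w["health"] / 100) if w["enemy_visible"] else 0,
        "attack":    80 if w["enemy_visible"] else 0,
        "patrol":    0 if w["route_done"] else 50,
    }

def best_action(w):
    scores = utilities(w)
    return max(scores, key=scores.get)
```

With full health, Attack (80) beats Escape (0); as health drops toward zero, Escape climbs toward 90 and overtakes it, with no explicit transition rules anywhere.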
Each action usually has several conditions that affect its utility calculation. To avoid hard-coding everything, these can be written in a scripting language or as a series of mathematical formulas combined in a comprehensible way. Much more on this topic can be found in the lectures and presentations of Dave Mark (@IADaveMark).
Some games that try to simulate a character's daily life, such as The Sims, add another layer of calculation in which the agent has "aspirations" or "motivations" that affect utility values. For example, if a character has a "Hunger" motivation, it can grow over time, and the utility calculation for the "Eat" action will return higher and higher values until the character performs that action, reducing the hunger level, at which point "Eat" drops back to a zero or near-zero utility value.
Because the idea of choosing actions based on a scoring system is so straightforward, utility-based decision making can obviously be embedded within other AI decision-making processes rather than fully replacing them. A decision tree could query the utility values of its two child nodes and select the higher one. Similarly, a behavior tree could have a composite utility node that computes utilities to choose which child to execute.
Movement and navigation
In our previous examples we had either a simple paddle that we ordered to move left and right, or a guard character who was simply told to patrol or attack. But how exactly do we manage the agent's movement over time? How do we set its speed, avoid obstacles, and plan a route when the end point cannot be reached in a straight line? We will look at this now.
At the simplest level, it is often convenient to treat each agent as having a velocity value that determines both the speed and the direction of its movement. Velocity can be measured in meters per second, miles per hour, pixels per second, and so on. Recalling our perception-thinking-action cycle, the "thinking" part can choose a velocity, after which the "action" part applies that velocity to the agent, moving it through the world. Games usually have a physics system that performs this task itself, reading each entity's velocity value and adjusting its position accordingly. So this work can often be delegated to that system, leaving the AI only the task of choosing the agent's velocity.
If we know where the agent wants to be, then we need to use our speed to move the agent in that direction. In a trivial form, we get the following equation:
desired_travel = destination_position - agent_position
Imagine a 2D world in which the agent is at coordinates (-2, -2) and the target point is roughly to the northeast, at coordinates (30, 20), so the required displacement to get there is (32, 22). Let's assume these positions are in meters. If we decide the agent can move at 5 m/s, we scale the displacement vector down to that speed and find that the velocity we need to set is approximately (4.12, 2.83). Moving at this velocity, the agent arrives at the end point in just under 8 seconds, as expected.
Calculations can be performed again at any time. For example, if the agent is halfway to the target, then the desired movement will be half as much, but after scaling to the maximum agent speed of 5 m / s, the speed remains the same. It also works for moving targets (within reason), which allows the agent to make small adjustments while moving.
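The calculation above can be sketched in a few lines of Python. This is a minimal illustration of scaling the displacement vector to a maximum speed; the function name and the zero-distance guard are my own assumptions.

```python
import math

def velocity_toward(agent_pos, destination, max_speed):
    """Scale the desired displacement down to the agent's maximum speed."""
    dx = destination[0] - agent_pos[0]
    dy = destination[1] - agent_pos[1]
    distance = math.hypot(dx, dy)
    if distance < 1e-9:            # already there; avoid dividing by zero
        return (0.0, 0.0)
    scale = max_speed / distance
    return (dx * scale, dy * scale)
```

With the numbers from the example, `velocity_toward((-2, -2), (30, 20), 5.0)` comes out to roughly (4.12, 2.83), and recomputing it each frame handles moving targets automatically.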
However, we often need finer control. For example, we may want to ramp the speed up slowly, as if the character first stood still, then walked, and only later ran; or to slow the character down as it approaches its goal. Such tasks are often handled by so-called "steering behaviors", each with its own name such as Seek, Flee, Arrival, and so on. (There is a series of articles about them on Habr: https://habr.com/post/358366/.) The idea is that acceleration forces are applied to the agent's velocity, based on comparing the agent's position and current velocity with the target, producing different ways of moving toward it.
Each behavior has its own slightly different purpose. Seek and Arrival are used to move the agent toward a destination. Obstacle Avoidance and Separation help the agent make small corrective movements to dodge minor obstacles between the agent and its destination. Alignment and Cohesion make agents move together, imitating flocking animals. Any variations of different steering behaviors can be combined, often as a weighted sum, to produce a single resulting vector that accounts for all these factors. For example, an agent might use Arrival along with Separation and Obstacle Avoidance to stay away from walls and other agents. This approach works well in open environments that are not too complex or crowded.
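The weighted-sum combination can be sketched like this. The two behaviors shown (a Seek and a simple Separation) and their falloff formula are illustrative assumptions; real steering libraries vary in the details.

```python
import math

def seek(pos, target, max_speed):
    # Steer directly toward the target at full speed
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    d = math.hypot(dx, dy)
    if d == 0:
        return (0.0, 0.0)
    return (dx / d * max_speed, dy / d * max_speed)

def separation(pos, neighbors, radius, max_speed):
    # Push away from each neighbor closer than `radius`,
    # more strongly the closer it is
    fx = fy = 0.0
    for n in neighbors:
        dx, dy = pos[0] - n[0], pos[1] - n[1]
        d = math.hypot(dx, dy)
        if 0 < d < radius:
            strength = (radius - d) / radius * max_speed
            fx += dx / d * strength
            fy += dy / d * strength
    return (fx, fy)

def combine(weighted_outputs):
    # Weighted sum of all behavior outputs into one steering vector
    x = sum(v[0] * w for v, w in weighted_outputs)
    y = sum(v[1] * w for v, w in weighted_outputs)
    return (x, y)
```

A typical frame would compute something like `combine([(seek(p, goal, 5.0), 1.0), (separation(p, others, 2.0, 5.0), 1.5)])` and feed the result to the physics system.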
However, in more complex environments simply adding up the behaviors' outputs does not work very well: movement near an object can become too slow, or the agent gets stuck when the Arrival behavior wants to pass through an obstacle while the Obstacle Avoidance behavior pushes the agent away from it. So it sometimes makes sense to consider steering variants that are more sophisticated than a plain sum of all the values. One family of approaches inverts the process: instead of having each behavior propose a direction and then combining them to reach a consensus (which may itself be inadequate), we consider movement in several candidate directions - for example, the eight compass directions, or five or six points in front of the agent - and evaluate which of those candidates is best.
However, in difficult environments with dead ends and branching turns, we will need something more sophisticated, and we will get to that shortly.
Pathfinding
Steering behaviors are excellent for simple movement across a fairly open area, such as a football field or an arena, where you can get from A to B in a straight line with minor adjustments to avoid obstacles. But what if the route to the end point is more complicated? Then we need "pathfinding": exploring the world and plotting a route through it so that the agent reaches the end point.
The simplest way is to lay a grid over the world and, for each cell next to the agent, look at the neighboring cells we can move into. If one of them is our end point, follow the cells back from each one to the previous one until we reach the start, which gives us the route. Otherwise, repeat the process with the reachable neighbors of the previous neighbors until we either find the end point or run out of cells (which means there is no route at all). Formally, this is the breadth-first search algorithm (BFS), because at each step it looks in all directions (i.e. "broadly") before moving the search outward. The search space is like a wavefront that advances until it hits the place we are looking for.
This is a simple example of a search in action. The search area is expanded at each stage until the end point is included in it, after which you can track the path to the beginning.
The result is a list of grid cells that make up the route we need to take. This is usually called the "path" (hence "pathfinding"), but you can also think of it as a plan, since it is a list of places to visit in order to achieve the goal of reaching the end point.
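The wavefront search just described can be sketched in Python on a small ASCII grid. The grid encoding (`'#'` for blocked cells) and the function name are assumptions for illustration.

```python
from collections import deque

def bfs_path(grid, start, goal):
    """grid: list of strings, '#' = blocked; start/goal: (row, col) tuples."""
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}
    frontier = deque([start])
    while frontier:
        current = frontier.popleft()
        if current == goal:
            # Walk back from the goal to the start to recover the route
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        r, c = current
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] != '#' and nxt not in came_from:
                came_from[nxt] = current
                frontier.append(nxt)
    return None  # ran out of cells: no route exists
```

On a 3x4 grid with a small wall, `bfs_path(["....", ".##.", "...."], (0, 0), (2, 3))` returns a shortest route of six cells around the obstacle.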
Now that we know the position of each cell in the world, we can use the steering behaviors described above to move along the route: first from the starting node to node 2, then from node 2 to node 3, and so on. The simplest approach is to head toward the center of the next cell, but a popular alternative is to head toward the middle of the edge between the current cell and the next one. This lets the agent cut corners, producing more natural-looking movement.
As you can see, this algorithm can be wasteful: it explores as many cells in the "wrong" direction as in the "right" one. It also cannot account for movement costs, where some cells are "more expensive" than others. This is where a more sophisticated algorithm called A* comes to the rescue. It works much like breadth-first search, except that instead of blindly exploring neighbors, then neighbors of neighbors, then neighbors of neighbors of neighbors, and so on, it puts all these nodes into a list and sorts them so that the next node explored is always the one most likely to lead to the shortest route. Nodes are ordered by a heuristic (essentially an educated guess) of how close each node is likely to be to the end point.
In this example, we show that it examines one cell at a time, each time choosing the cell with the best (or joint-best) prospects. The resulting path is the same as the breadth-first search path, but fewer cells were examined along the way, and that matters a lot for game performance on complex levels.
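Here is a sketch of A* on the same kind of grid, using the Manhattan distance as the heuristic. The heuristic choice and uniform step cost of 1 are simplifying assumptions; weighted terrain would change the `new_cost` line.

```python
import heapq

def astar_path(grid, start, goal):
    """Like BFS, but explores nodes in order of cost-so-far plus a
    heuristic estimate of the remaining distance to the goal."""
    def heuristic(node):
        # Manhattan distance: a cheap, admissible guess at remaining cost
        return abs(node[0] - goal[0]) + abs(node[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    open_heap = [(heuristic(start), 0, start)]
    came_from = {start: None}
    cost_so_far = {start: 0}
    while open_heap:
        _, g, current = heapq.heappop(open_heap)
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        if g > cost_so_far[current]:
            continue  # stale heap entry; a cheaper route was already found
        r, c = current
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != '#':
                new_cost = g + 1
                if new_cost < cost_so_far.get(nxt, float('inf')):
                    cost_so_far[nxt] = new_cost
                    came_from[nxt] = current
                    heapq.heappush(open_heap,
                                   (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```

It returns the same shortest path as BFS on the example grid, while typically expanding fewer cells on larger maps.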
Moving off the grid
In the previous examples we laid a grid over the world and plotted routes through its cells. But most games are not laid out on a grid, so overlaying one can produce unrealistic movement patterns. A grid also forces a compromise on cell size: too large and it cannot represent narrow corridors and turns; too small and searching thousands of cells becomes too slow. What are the alternatives?
The first thing to understand is that, mathematically, the grid gives us a "graph" of connected nodes. The A* (and BFS) algorithms actually work on graphs and do not care about the grid at all. So we could place nodes at arbitrary positions in the world, and as long as there is a passable straight line between any two connected nodes, our algorithm will work just as before - in fact, often better, because there are fewer nodes. This is often called a waypoint system, because each node represents a significant position in the world that can form part of any number of hypothetical paths.
Example 1: a node in each cell of the grid. The search begins with the node where the agent is located, and ends with the final cell.
Example 2: a much smaller number of nodes, or waypoints. The search starts at the agent, passes through the required waypoints, and ends at the end point. Note that moving to the first waypoint, southwest of the player, is an inefficient route, so some post-processing of the generated path is usually needed (for example, noticing that the path can cut straight across to the waypoint in the northeast).
This is a fairly flexible and powerful system, but it requires careful waypoint placement, otherwise agents may not be able to see the nearest waypoint from which to begin the route. It would be great if we could somehow generate the waypoints automatically from the world geometry.
And here the navmesh comes in. The name is short for "navigation mesh". It is (usually) a two-dimensional mesh of triangles approximately overlaid on the world geometry wherever the game allows the agent to walk. Each triangle in the mesh becomes a graph node and has up to three adjacent triangles, which become its adjacent nodes.
Below is an example from the Unity engine. The engine has analyzed the world geometry and generated a navmesh (blue) that approximates it. Each polygon in the mesh is an area where the agent can stand, and the agent can move from any polygon to any adjacent one. (In this example the polygons are narrower than the floors they lie on, to account for the agent's radius, which extends beyond the agent's nominal position.)
We can search for a route through the mesh, again using A*, and this gives us a near-ideal route through the world that accounts for all the geometry while requiring neither an excessive number of extra nodes (as a grid would) nor human effort to place waypoints.
Finding paths is an extensive topic, to which there are many approaches, especially if you need to program low-level parts yourself. One of the best sources of additional information is the site of Amit Patel (translation of an article on Habré: https://habr.com/post/331192/ ).
Planning
The pathfinding example showed that sometimes it is not enough to pick a direction and start moving: we have to choose a route and make several moves to reach the desired end point. We can generalize this idea to a whole range of concepts where the goal is not just the next step, where reaching it requires a series of steps, and where knowing the right first step may require looking several steps ahead. This approach is called planning. Pathfinding can be seen as one specific application of planning, but the concept has many more uses. In terms of the perception-thinking-action cycle, planning is a thinking phase that tries to plan several action phases into the future.
Let's take a look at the game Magic: The Gathering. It is your first turn, and you have several cards in your hand, including: "Swamp", which produces 1 black mana; "Forest", which produces 1 green mana; an "Exorcist", which costs 1 blue mana to summon; and an "Elven Mystic", which costs 1 green mana to summon. (For simplicity, we omit the other three cards.) The rules say a player may play one land card per turn, may "tap" his lands to draw mana from them, and may cast as many spells (including summoning creatures) as he has mana for. In this situation a human player would likely play the Forest, tap it for 1 green mana, and then summon the Elven Mystic. But how does the game AI know to make that decision?
A naive approach is simply to try each action in turn until no suitable ones remain. Looking at its hand, the AI sees that it can play the Swamp, so it does. Are there any actions left this turn? It cannot summon the Elven Mystic or the Exorcist, since they require green and blue mana respectively, and the Swamp it played only produces black mana. And it cannot play the Forest, because it already played the Swamp. So the AI player makes a legal move, but not a very good one. Fortunately, there is a better way.
Just as pathfinding finds a list of positions to move through to reach a desired point, our planner can find a list of actions that bring the game to a desired state. And just as each position on a path has a set of neighbors that are potential choices for the next step, each action in a plan has neighbors, or "successors", that are candidates for the next step of the plan. We can search through these actions and their successors until we reach the desired state.
Suppose that in our example the desired result is "summon a creature if possible". At the start of the turn we have only two potential actions allowed by the rules of the game:
1. Play Swamp (result: Swamp leaves the hand and enters play)
2. Play Forest (result: Forest leaves the hand and enters play)
Each action taken can open up or close off further actions, again according to the rules of the game. Imagine we chose to play the Swamp: this removes playing the Swamp as a potential successor action (it has already been played), removes playing the Forest (the rules allow only one land card per turn), and adds tapping the Swamp for 1 black mana - which is, in fact, the only successor action. If we take that one step further and choose "tap the Swamp", we receive 1 black mana that we can do nothing with, so it is pointless.
1. Play Swamp (result: Swamp leaves the hand and enters play)
   1.1. Tap Swamp (result: Swamp is tapped, +1 black mana available)
        No actions left - END
2. Play Forest (result: Forest leaves the hand and enters play)
This short list of actions did not get us far and led to a "dead end", to use the pathfinding analogy. So we repeat the process for the next branch. We choose to play the Forest. This likewise removes "play Forest" and "play Swamp" as options, and opens up tapping the Forest as the potential (and only) next step. That gives us 1 green mana, which in turn opens up a third step: summon the Elven Mystic.
1. Play Swamp (result: Swamp leaves the hand and enters play)
   1.1. Tap Swamp (result: Swamp is tapped, +1 black mana available)
        No actions left - END
2. Play Forest (result: Forest leaves the hand and enters play)
   2.1. Tap Forest (result: Forest is tapped, +1 green mana available)
        2.1.1. Summon Elven Mystic (result: Elven Mystic in play, -1 green mana available)
               No actions left - END
Now we have explored every possible action and the actions that follow from them, and found a plan that lets us summon a creature: play the Forest, tap the Forest, summon the Elven Mystic.
Obviously, this is a very simplified example, and usually you want to choose the best plan, not just any plan that meets some criterion (such as "summon a creature"). Potential plans can generally be scored by their final result or by the cumulative benefit of following them. For example, you could award yourself 1 point for playing a land card and 3 points for summoning a creature. "Play Swamp" would be a short plan worth 1 point, while "play Forest" → "tap Forest" → "summon Elven Mystic" yields 4 points: 1 for the land and 3 for the creature. That would be the most rewarding plan available, so it should be chosen if those are the points we assigned.
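The exhaustive search plus scoring described above can be sketched as a small forward planner. This models only the cards from the example; the state layout, action names ("play"/"tap"/"summon"), and data tables are my own simplifying assumptions, not real Magic: The Gathering rules code.

```python
import copy

# Minimal model of the example above: lands produce one mana color,
# and the Elven Mystic costs 1 green mana.
CARD_MANA = {"Swamp": "B", "Forest": "G"}       # land -> mana color
CREATURE_COST = {"Elven Mystic": "G"}           # creature -> required color

def legal_actions(s):
    acts = []
    if not s["land_played"]:
        acts += [("play", c) for c in s["hand"] if c in CARD_MANA]
    acts += [("tap", c) for c in s["untapped"]]
    acts += [("summon", c) for c in s["hand"]
             if c in CREATURE_COST and s["mana"].get(CREATURE_COST[c], 0) >= 1]
    return acts

def apply_action(s, action):
    s = copy.deepcopy(s)
    kind, card = action
    if kind == "play":
        s["hand"].remove(card); s["untapped"].append(card); s["land_played"] = True
    elif kind == "tap":
        s["untapped"].remove(card); s["tapped"].append(card)
        color = CARD_MANA[card]
        s["mana"][color] = s["mana"].get(color, 0) + 1
    elif kind == "summon":
        s["hand"].remove(card); s["creatures"].append(card)
        s["mana"][CREATURE_COST[card]] -= 1
    return s

def score(s):
    # 1 point per land in play, 3 per creature, as in the text
    return len(s["untapped"]) + len(s["tapped"]) + 3 * len(s["creatures"])

def best_plan(s):
    # Exhaustively explore every action sequence; return (score, plan)
    best = (score(s), [])
    for a in legal_actions(s):
        sc, plan = best_plan(apply_action(s, a))
        if sc > best[0]:
            best = (sc, [a] + plan)
    return best
```

Starting from a hand of Swamp, Forest, and Elven Mystic, `best_plan` returns the 4-point plan from the text: play Forest, tap Forest, summon the Elven Mystic.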
Above we showed how planning works within a single Magic: The Gathering turn, but it applies equally to actions across a series of turns (such as "move the pawn to make room for the bishop to develop" in chess, or "run to cover so that next turn the unit can shoot from safety" in XCOM), or to the overall strategy of the whole game (such as "build pylons before all other Protoss buildings" in StarCraft, or "drink a Fortify Health potion before attacking an enemy" in Skyrim).
Sometimes there are too many possible actions at each step, making it unreasonable to evaluate every option. Back to the Magic: The Gathering example: imagine we have several creatures in hand, many lands already in play so we could summon any of them, several creatures already on the battlefield with abilities of their own, and a couple more land cards in hand. The number of permutations of playing lands, tapping lands, summoning creatures, and using creature abilities can run into the thousands or even tens of thousands. Fortunately, there are a couple of ways to tackle this.
The first is called "backwards chaining". Instead of trying every action and seeing where it leads, we start from each desired end result and see whether we can find a direct path to it. Compare it with trying to reach a particular leaf on a tree: it is far more logical to start from that leaf and work backwards toward the trunk, laying out a route we can then follow in reverse, than to start at the trunk and guess which branch to take at every step. Starting from the end and working backwards makes building the plan much faster and easier.
For example, if the enemy has 1 health point left, it may be worth looking for a plan to "deal 1 or more points of direct damage to the enemy". Our system knows that to achieve this it needs to cast a direct damage spell, which in turn means we must have one in hand and enough mana to cast it. That, in turn, means we need to tap enough lands to get that mana, which might require playing an additional land card.
Another approach is best-first search. Instead of exhaustively walking through all the permutations, we measure how "good" each partial plan is (in the same way we chose between plans above) and always expand the best-looking one next. This often produces an optimal, or at least good enough, plan without having to examine every possible ordering. A* is a form of best-first search: it explores the most promising routes first, and can usually find a path to the goal without wandering far in other directions.
An interesting and increasingly popular variant of best-first search is Monte Carlo Tree Search. Instead of guessing which plans are better when choosing each subsequent action, it picks random follow-up actions at each step until it reaches an end state in which no further actions are possible - perhaps because the hypothetical plan led to victory or defeat - and uses that outcome to weight the previously selected options up or down. Repeating this process many times yields a good estimate of the best next move, even as the situation changes (for example, when the opponent acts to thwart our plans).
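The core idea of scoring moves by random playouts can be sketched on a toy game. This is flat Monte Carlo evaluation rather than full Monte Carlo Tree Search, and the game itself (a Nim-style pile of sticks where taking the last stick wins) is a stand-in of my own choosing, not from the article.

```python
import random

def rollout(sticks, our_turn, rng):
    # Play random moves until the pile is empty; whoever takes the
    # last stick wins. Returns 1.0 if that was us, else 0.0.
    while True:
        take = rng.randint(1, min(3, sticks))
        sticks -= take
        if sticks == 0:
            return 1.0 if our_turn else 0.0
        our_turn = not our_turn

def monte_carlo_scores(sticks, playouts, rng):
    # Estimate the win rate of each legal first move by random playouts
    scores = {}
    for move in range(1, min(3, sticks) + 1):
        remaining = sticks - move
        if remaining == 0:
            scores[move] = 1.0      # taking the last stick wins outright
        else:
            wins = sum(rollout(remaining, False, rng) for _ in range(playouts))
            scores[move] = wins / playouts
    return scores
```

With 3 sticks on the pile, taking all three scores a certain win, so the method correctly ranks it above the other moves.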
Finally, no discussion of planning in games would be complete without mentioning Goal-Oriented Action Planning (GOAP). This is a widely used and much-discussed technique, but aside from a few implementation details it is essentially a backwards-chaining planner that starts with a goal and tries to choose an action leading to that goal - or, more likely, a list of actions leading to it. For example, if the goal is "kill the player" and the player is behind cover, the plan might be: "flush the player out with a grenade" → "draw weapon" → "attack".
There are usually several goals, each with its own priority. If the highest-priority goal cannot be achieved - for example, no set of actions forms a "kill the player" plan because the player is not visible - the system falls back to lower-priority goals, such as Patrol or Hold Position.
Learning and adaptation
At the beginning of the article we mentioned that game AI generally does not use "machine learning", because it is usually unsuited to real-time control of intelligent agents in the game world. But that does not mean we cannot borrow from the field where it makes sense. We might want a computer opponent in a shooter to learn the best spots to move to in order to get the most kills. Or we might want an opponent in a fighting game, such as Tekken or Street Fighter, to learn to recognize a player's repeated combos so it can start blocking them, forcing the player to vary their tactics. So there are cases where some amount of machine learning is useful.
Statistics and probabilities
Before moving on to more complex examples, it is worth seeing how far we can get simply by taking measurements and using the data to make decisions. For example, say we have a real-time strategy game and want to know whether a player is likely to rush in the first few minutes, so we can decide whether to build more defenses. We can extrapolate the player's past behavior to predict future behavior. At first we have no data to extrapolate, but each time the AI plays against a human opponent, it can record the time of the first attack. After a few matches, these times can be averaged, giving a fairly good approximation of when the player will attack in the future.
The problem with simple averaging is that over time it tends to converge toward the middle. If the player used a rush strategy the first 20 times and then switched to a much slower strategy for the next 20, the average will land somewhere in between, telling us nothing useful. One way to improve matters is to use a sliding window average that considers only the most recent 20 data points.
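The sliding window average can be sketched in a few lines using a bounded deque; the class name and default window size of 20 are illustrative choices.

```python
from collections import deque

class RollingAverage:
    """Average over only the most recent `window` samples, so old
    behavior stops influencing the estimate."""
    def __init__(self, window=20):
        self.samples = deque(maxlen=window)

    def record(self, value):
        self.samples.append(value)   # the deque silently drops the oldest

    def average(self):
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)
```

Recording each match's first-attack time and reading `average()` then tracks the player's recent behavior rather than their entire history.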
A similar approach can be used to estimate the likelihood of particular actions, assuming the player's past preferences will carry into the future. For example, if a player attacked five times with a fireball, twice with lightning, and only once in melee, he will most likely prefer the fireball, choosing it 5 times out of 8. Extrapolating from this data, the weapon probabilities are: fireball = 62.5%, lightning = 25%, melee = 12.5%. Our AI characters would do well to find some fire-resistant armor!
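This frequency count is one line with a `Counter`; the function name is an assumption for illustration.

```python
from collections import Counter

def action_probabilities(observations):
    # Relative frequency of each observed action
    counts = Counter(observations)
    total = sum(counts.values())
    return {action: count / total for action, count in counts.items()}
```

For the numbers above, `action_probabilities(["fireball"] * 5 + ["lightning"] * 2 + ["melee"])` gives 0.625, 0.25, and 0.125 respectively.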
Another interesting method is to use a Naive Bayes classifier to study large amounts of input data and classify the current situation so the AI agent can react appropriately. Bayesian classifiers are probably best known for their use in email spam filters, where they evaluate the words in a message, compare them with the words that most often appeared in spam and legitimate messages in the past, and estimate the probability that the latest email is spam. We can do something similar with a smaller amount of input data: by recording all the useful observable information (for example, which enemy units are being built) and noting what situations it later turned out to indicate, the agent can classify new situations and react accordingly.
When using any of these learning techniques, it can be sufficient - and often preferable - to apply them to data collected during playtesting before release. That lets the AI adapt to the different strategies playtesters used, without continuing to change after the game ships. By contrast, an AI that keeps adapting to the player after release may end up too predictable, or even too difficult to beat.
Simple weight-based adaptation
Let's take another step forward. Instead of just using incoming data to choose between discrete pre-written strategies, you can change the set of values that affect decision making. If we well understand the game world and the rules of the game, we can do the following:
- Force the AI to collect data on the state of the world and key events during the game (as in the example above);
- Change values or “weights” based on the data in the process of collecting them;
- Implement solutions based on processing or calculating these weights.
Imagine a computer agent that can choose between rooms on a map in a first-person shooter. Each room has a weight determining how desirable it is to visit, and initially all rooms weigh the same. When choosing a room, the AI picks randomly, but biased by these weights. Now imagine that when the computer agent is killed, it remembers which room it was in and lowers that room's weight, making it less likely to return there. Likewise, if the computer agent scores a kill, it raises the weight of the room it was in, moving it up the preference list. So if one room becomes especially dangerous for the AI player, it starts avoiding it, and if another room lets the AI rack up kills, it will keep coming back there.
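A sketch of this weight-adjustment scheme in Python; the class name, the penalty and reward sizes, and the floor that keeps weights positive are all illustrative assumptions.

```python
import random

class RoomPreferences:
    """Per-room desirability weights, nudged by in-game outcomes."""
    def __init__(self, rooms, rng=None):
        self.weights = {room: 1.0 for room in rooms}
        self.rng = rng or random.Random()

    def choose_room(self):
        # Random choice biased by the current weights
        rooms = list(self.weights)
        return self.rng.choices(rooms,
                                weights=[self.weights[r] for r in rooms])[0]

    def record_death(self, room, penalty=0.2):
        # Dying here makes the room less attractive (but never impossible)
        self.weights[room] = max(0.05, self.weights[room] - penalty)

    def record_kill(self, room, reward=0.2):
        self.weights[room] += reward
```

After a death in the blue room and a kill in the green room, the agent's preferences shift accordingly while still leaving every room reachable.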
What if we wanted to use the collected data to create forecasts? For example, if we record every room in which we see a player for a certain period of time, then we can reasonably predict which room he can move further. By tracking both the current room in which the player is located and the previous one, and recording these pairs of values, we can calculate how often each of the previous situations leads to the subsequent situation, and use this knowledge for predictions.
Imagine that there are three rooms - red, green and blue, and that during the game session we received the following observations:
| First room the player was seen in | Total observations | Next room | Times seen | Percentage |
|---|---|---|---|---|
The number of sightings in each room is fairly even, so this does not tell us which room would make a good ambush spot. The data may also be skewed by players spawning evenly across the map, with equal probability of appearing in any of the three rooms. But the data about the next room visited can be useful, helping us predict the player's movement around the map.
We can see right away that the green room is very attractive to players: most players seen in the red room move on to the green room, and 50% of players seen in the green room are still there at the next check. We can also see that the blue room is a rather unattractive place: people rarely move into it from the red or green rooms, and nobody seems to like lingering there.
But the data tells us something more specific: when a player is in the blue room, he is most likely to go to the red room next, not the green one. Even though the green room is generally a much more popular destination than the red one, the tendency reverses slightly when the player is in the blue room. The next state (the room he moves to) evidently depends on the previous state (the room he is in now), so this data lets us make better predictions about players than counting observations independently would.
The idea that we can use knowledge of a previous state to predict a future state is called a Markov model, and examples like this, where we have precisely measurable events (such as "which room the player is in"), are called Markov chains. Since they represent the probabilities of transitions between successive states, they are often drawn as a finite state machine with the probability marked on each transition. Earlier we used a state machine to represent the behavioral state an agent was in, but the concept extends to any kind of state, whether tied to an agent or not. In our case the states are the rooms the player occupies. It looks like this:
This is a simple way to represent the relative probabilities of transitions to different states, giving the AI the ability to predict the next state. But we can go further and build a system that looks two or more steps into the future.
If the player was last seen in the green room, our data says there is a 50% chance he will still be there at the next observation. But what is the chance he will still be there at the observation after that? It is not just the probability of staying in the green room across two observations (50% × 50% = 25%), but also the probability of leaving and coming back. Here is a new table with the previous values applied across three observations: one current and two hypothetical future ones.
| Observation 1 | Hypothetical observation 2 | Probability | Hypothetical observation 3 | Probability | Cumulative probability |
|---|---|---|---|---|---|
Here we see that the probability of seeing the player in the green room after two observations is 51%: 21% from the case where he goes via the red room, 5% from the case where he visits the blue room, and 25% from the case where he stays in the green room the whole time.
The table is just a visual aid; the procedure only requires multiplying the probabilities at each step. This means we can look far into the future, with one significant caveat: we are assuming that the probability of entering a room depends entirely on the room we are currently in. This idea, that a future state depends only on the current state, is called the Markov property. While it lets us use powerful tools like Markov chains, it is usually only an approximation. Players may decide which room to visit based on other factors, such as their health or ammunition, and since we do not record that information as part of the state, our predictions will be less accurate.
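The step-by-step multiplication can be sketched directly. The transition table below is an assumption: its entries are chosen to be consistent with the figures in the text (green→green 50%, and the 21% + 25% + 5% = 51% two-step breakdown), but the remaining values are illustrative, not measured data.

```python
# Transition probabilities between rooms; each row sums to 1.0.
# Chosen to reproduce the article's numbers; off-diagonal details
# are illustrative assumptions.
TRANSITIONS = {
    "red":   {"red": 0.20, "green": 0.70, "blue": 0.10},
    "green": {"red": 0.30, "green": 0.50, "blue": 0.20},
    "blue":  {"red": 0.60, "green": 0.25, "blue": 0.15},
}

def step(distribution, transitions=TRANSITIONS):
    """Advance a probability distribution over rooms by one observation."""
    result = {room: 0.0 for room in transitions}
    for room, p in distribution.items():
        for nxt, q in transitions[room].items():
            result[nxt] += p * q
    return result

def predict(start_room, steps):
    # Repeatedly multiply through the chain to look several steps ahead
    dist = {start_room: 1.0}
    for _ in range(steps):
        dist = step(dist)
    return dist
```

Starting in the green room, `predict("green", 2)["green"]` comes out at 0.51, matching the 51% computed above.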
Let's return to our fighting-game combo example. This is a similar situation: we want to predict a future state based on the past (to decide how to block or dodge an attack), but instead of looking at a single state or event, we will look at the sequence of events that makes up a combo move.
One way to do this is to store each player input (for example, Kick, Punch, or Block) in a buffer and record the whole buffer as the event. Imagine the player repeatedly presses Kick, Kick, Punch to perform the "Super Death Fist" attack, and the AI system stores all player input in a buffer, remembering the last 3 entries at each step.
| Input | Full input sequence so far | Last 3 inputs |
|---|---|---|
| Kick | Kick, Punch, Kick | Kick, Punch, Kick |
| Kick | Kick, Punch, Kick, Kick | Punch, Kick, Kick |
| **Punch** | **Kick, Punch, Kick, Kick, Punch** | **Kick, Kick, Punch** |
| Block | Kick, Punch, Kick, Kick, Punch, Block | Kick, Punch, Block |
| Kick | Kick, Punch, Kick, Kick, Punch, Block, Kick | Punch, Block, Kick |
| Kick | Kick, Punch, Kick, Kick, Punch, Block, Kick, Kick | Block, Kick, Kick |
| **Punch** | **Kick, Punch, Kick, Kick, Punch, Block, Kick, Kick, Punch** | **Kick, Kick, Punch** |
(In the bold rows, the player has just completed the Super Death Fist attack.)
Looking at all the times the player entered Kick followed by another Kick, we notice that the next input was always Punch. This lets the AI agent predict that if the player has just entered Kick, Kick, the next input will most likely be Punch, launching the Super Death Fist. The AI can then choose an action that counters that blow, such as a block or a dodge.
Such sequences of events are called N-grams, where N is the number of stored elements. The example above uses a 3-gram, also called a trigram: the first 2 elements are used to predict the third. In a 5-gram, the first 4 elements predict the fifth, and so on.
Developers must choose the N-gram size (sometimes called its order) carefully. A smaller N needs less memory, because there are fewer possible permutations, but it keeps less history, so context is lost. For example, a 2-gram (a "bigram") would have entries for Kick, Kick and for Kick, Punch, but cannot store Kick, Kick, Punch, so it would never be able to track that combo.
On the other hand, a larger order needs more memory, and the system will most likely be harder to train, because there are far more possible permutations and we may never see the same sequence twice. For example, with three possible inputs (Kick, Punch, and Block), a 10-gram yields 3^10 = 59,049 different permutations.
A bigram model is essentially a trivial Markov chain — each "current state / future state" pair is a bigram, and we can predict the second state from the first. Trigrams and larger N-grams can also be viewed as Markov chains, in which all the elements of the N-gram except the last form the first state, and the last element is the second state. Our fighting-game example represents the probability of a transition from the "Kick, Kick" state to the "Punch" state. By treating several history elements as a single unit, we essentially turn the input sequence into one piece of state. That gives us the Markov property, letting us use Markov chains to predict the next input — that is, to guess which combo move is coming next.
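As a sketch of the above — the action names and trigram size come from the example, while the class design itself is just one possible shape — a minimal N-gram predictor might look like this:

```python
from collections import defaultdict, Counter

class NGramPredictor:
    """Predicts the next input from the previous (n-1) inputs; n=3 is a trigram."""
    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # context (n-1 inputs) -> next-input counts
        self.history = []

    def observe(self, action):
        self.history.append(action)
        if len(self.history) >= self.n:
            context = tuple(self.history[-self.n:-1])  # the n-1 inputs before `action`
            self.counts[context][action] += 1

    def predict(self):
        """Most likely next input given the last n-1 inputs, or None if unseen."""
        context = tuple(self.history[-(self.n - 1):])
        if context not in self.counts:
            return None
        return self.counts[context].most_common(1)[0][0]

ai = NGramPredictor(n=3)
# The input stream from the table: the combo Kick, Kick, Punch appears twice.
for move in ["Kick", "Punch", "Kick", "Kick", "Punch",
             "Block", "Kick", "Kick", "Punch"]:
    ai.observe(move)

ai.observe("Kick"); ai.observe("Kick")
print(ai.predict())  # Punch -- time to block or dodge!
```

Each observed (context, next-input) pair is exactly one row of the Markov transition table discussed above; `predict` just reads off the most frequent transition.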
We have discussed several ways of making decisions, building plans, and making predictions, and all of them rely on observations of the world's state. But how do we monitor the whole game world effectively? We saw earlier that the way the world's geometry is represented strongly affects navigation through it, so it is easy to imagine the same is true for other aspects of game AI. How do we gather and organize all the necessary information in a way that is both performant (so it can be updated often and accessed by many agents) and practical (so the information is easy to use in decision-making)? How do we turn plain data into information or knowledge? Different games solve this differently, but a few approaches are the most common.
Tags
Sometimes we already have a huge amount of useful data, and all we need is a good way to categorize and search it. For example, the game world may contain many objects, some of which make good cover from enemy fire. Or we might have a pile of recorded dialog lines, each applicable in specific situations, and need a way to find the right one quickly. The obvious step is to attach a small piece of extra information that can be searched on. Such fragments are called tags.
Returning to the cover example: the game world may contain lots of objects — crates, barrels, tufts of grass, wire fences. Some of them, such as the crates and barrels, are suitable for cover; others are not. So when our agent performs a "Move To Cover" action, it must search nearby objects and identify suitable candidates. It cannot simply search by name — the game may contain "Crate_01", "Crate_02", and so on up to "Crate_27", and we don't want to look up all those names in code, nor add another name to the code every time an artist creates a new crate or barrel variation. We could instead search for any name containing the word "Crate", but one day an artist might add a "Broken_Crate" with a huge hole in it, unfit for cover.
Instead, we create a "COVER" tag and ask the artists and designers to attach it to every object usable as cover. Once they tag all the barrels and (intact) crates, the AI routine only needs to find objects with that tag, and it knows they are suitable for the purpose. The tag keeps working even if objects are renamed later, and it can be added to future objects without any further code changes.
In code, tags are usually represented as strings, but if you know all the tags in use, you can convert the strings to unique numbers to save space and speed up searches. Some engines have tags as built-in functionality — for example, Unity and Unreal Engine 4 — so it's enough to decide on a set of tags and use them consistently.
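As a sketch, a tag-based query might look like the following — the `GameObject` class, the positions, and the helper function are hypothetical stand-ins for whatever your engine provides:

```python
# Hypothetical sketch of a tag query for the "Move To Cover" action.
class GameObject:
    def __init__(self, name, position, tags=()):
        self.name = name
        self.position = position   # (x, y)
        self.tags = set(tags)

def find_tagged_near(objects, tag, origin, max_dist):
    """All objects carrying `tag` within `max_dist` of `origin`."""
    ox, oy = origin
    return [o for o in objects
            if tag in o.tags
            and (o.position[0] - ox) ** 2 + (o.position[1] - oy) ** 2 <= max_dist ** 2]

world = [
    GameObject("Crate_01",     (2, 3),   tags={"COVER"}),
    GameObject("Barrel_07",    (4, 1),   tags={"COVER"}),
    GameObject("Broken_Crate", (1, 1)),              # no COVER tag -- the hole!
    GameObject("Crate_27",     (40, 40), tags={"COVER"}),  # too far away
]
cover = find_tagged_near(world, "COVER", origin=(0, 0), max_dist=10)
print([o.name for o in cover])  # ['Crate_01', 'Barrel_07']
```

Note that the query never mentions object names — renaming "Crate_01" or adding "Crate_28" requires no code change, which is the whole point of tagging.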
Tags are a way to add extra information to the agent's environment, helping it understand the options available, so that queries like "find me all the nearby places I can hide behind" or "find me all the nearby enemies who can cast spells" can be answered efficiently and keep working for newly added game assets with minimal effort. But sometimes tags don't carry enough information to be fully useful.
Imagine a medieval city simulator in which adventurers wander wherever they like, training, fighting, and resting as needed. We can place training grounds around the city and give them a TRAINING tag so the characters can easily find somewhere to train. But suppose one of them is an archery range and another is a school of wizardry. In each case we need to show a different animation, because under the umbrella term "training" they represent different activities, and not every adventurer is interested in both kinds. We could go further and create ARCHERY-TRAINING and MAGIC-TRAINING tags, with separate training routines and different animations embedded in each. That would solve the problem. But then imagine the designers announce: "Let's have a Robin Hood school, where you can learn archery and sword fighting!" And once sword fighting is in, they ask for a "Gandalf Academy of Spells and Sword Fighting". We would end up storing multiple tags per location and looking up different animations depending on which aspect of the training a character wants, and so on.
Another way is to store that information directly on the object, along with the effect it has on the user, so an AI agent can simply enumerate the available options and pick one according to its needs. It can then move to the relevant place, play the appropriate animation (or perform any other required actions) as specified by the object, and receive the corresponding reward.
| Location | Animation | Result for the user |
|---|---|---|
| Archery Range | Shoot-Arrow | +10 Archery skill |
| Magic School | Cast-Spell | +10 Magic skill |
| Robin Hood School | Shoot-Arrow | +15 Archery skill |
| Robin Hood School | Sword-Fight | +8 Swords skill |
| Gandalf Academy | Sword-Fight | +5 Swords skill |
| Gandalf Academy | Cast-Spell | +10 Magic skill |
A character who is an archer now sees 6 options across these 4 locations, 4 of which are irrelevant to him because he uses neither a sword nor magic. By matching on the result — here, the skill improvement — rather than on a name or a tag, we can easily extend the world with new behaviors. We can add hotels so characters can rest and satisfy their hunger. We can let characters visit a library and read about spells or advanced archery technique.
| Location | Animation | Result for the user |
|---|---|---|
| Hotel | Buy | -10 Hunger |
| Hotel | Sleep | -50 Fatigue |
| Library | Read-Book | +10 Spellcasting skill |
| Library | Read-Book | +5 Archery skill |
If we had kept the earlier "practice archery" behavior, then even after tagging the library as ARCHERY-TRAINING we would most likely need a special case to play the Read-Book animation instead of the usual shooting animation. By moving these associations into data, and storing that data in the world, this system gives us much more flexibility.
Objects and locations like these — libraries, hotels, schools — advertise the services they offer and what a character stands to gain from them, while letting us reuse a small number of animations. The ability to make simple decisions over advertised results produces a wide variety of interesting behavior. Rather than waiting passively to be queried, these objects actively provide the information about how and why to use them.
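A minimal sketch of this idea in Python — the location names and skill rewards come from the tables above, while the data layout and the `best_option` helper are assumptions:

```python
# Hypothetical "smart object" advertisements: each location lists
# (animation, result) options it offers, as in the tables above.
LOCATIONS = {
    "Archery Range":     [("Shoot-Arrow", {"archery": 10})],
    "Magic School":      [("Cast-Spell",  {"magic": 10})],
    "Robin Hood School": [("Shoot-Arrow", {"archery": 15}),
                          ("Sword-Fight", {"swords": 8})],
    "Gandalf Academy":   [("Sword-Fight", {"swords": 5}),
                          ("Cast-Spell",  {"magic": 10})],
}

def best_option(wanted_skill):
    """Pick the (location, animation) that most improves the desired skill."""
    best, best_gain = None, 0
    for place, options in LOCATIONS.items():
        for animation, result in options:
            gain = result.get(wanted_skill, 0)
            if gain > best_gain:
                best, best_gain = (place, animation), gain
    return best

# An archer ignores the 4 sword/magic options and heads for the best bow training.
print(best_option("archery"))  # ('Robin Hood School', 'Shoot-Arrow')
```

The agent's code never names a location or a tag — only the result it wants — so adding a hotel or a library to the data extends the agent's behavior with no code changes.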
Often there is a situation when part of the state of the world can be measured as a continuous value. Examples:
- "Percentage of health" usually has values from 0 (dead) to 100 (absolutely healthy)
- "Distance to the nearest enemy" varies from 0 to some arbitrary positive value.
Meanwhile, some part of the game's AI may need continuous values as input on some other scale. For example, a utility-based evaluation system deciding whether to flee might want to consider both the distance to the nearest enemy and the character's current health.
However, the system can't simply add these two world-state values together to get some overall "safety" level, because the two units are not comparable — it would conclude that a nearly dead character 200 meters from an enemy is exactly as safe as a perfectly healthy character 100 meters away. On top of that, while health percentage is broadly linear, distance is not: the difference between being 200 and 190 meters from an enemy matters far less than the difference between 10 meters and zero.
Ideally, we want a solution that takes the two measurements and converts them to similar ranges so they can be compared directly. And we want designers to control how these conversions are computed, so they can tune the relative importance of each value. This is what response curves are for.
The easiest way to think of a response curve is as a graph with the input on the X axis — an arbitrary value such as "distance to the nearest enemy" — and the output on the Y axis (usually a normalized value in the range 0.0 to 1.0). The line or curve on the graph defines the mapping from input to normalized output, and designers shape those curves to get the behavior they want.
For the "safety" calculation, we can keep the health percentage linear — an extra 10% of health helps about as much when a character is badly wounded as when he is lightly wounded — so we map those values onto the 0-1 interval with a straight line:
Distance to the nearest enemy is a little different: we don't care at all about enemies beyond a certain distance (say, 50 meters), and we care far more about differences at close range than at long range.
Here we see that the "safety" output for enemies at 40 and 50 meters is almost the same: 0.96 and 1.0.
However, there is a much greater difference between an enemy at 15 meters (about 0.5) and an enemy at 5 meters (about 0.2). Such a graph better reflects how much it matters when the enemy gets close.
With both values normalized into the 0-1 range, we can compute the overall safety value as the average of the two inputs. A character with 20% health and an enemy 50 meters away gets a safety score of 0.6. A character with 75% health and an enemy just 5 meters away scores 0.47. A badly wounded character with 10% health and an enemy at 5 meters scores only 0.145.
Here you should consider the following:
- Usually a weighted average is used to combine the response curve outputs into the final value — this makes it easy to reuse the same curves when calculating other values, applying different weights in each case to reflect differences in importance.
- When an input value falls outside the expected interval — for example, an enemy farther than 50 meters — it is usually clamped to the interval's boundary, and the calculation proceeds as if it were inside.
- A response curve is often implemented as a mathematical equation, typically applying the (possibly clamped) input to a linear function or a simple polynomial. But any system that lets a designer author and evaluate a curve will do — for example, Unity's AnimationCurve object lets you place arbitrary key points, choose the smoothing between them, and evaluate any point along the resulting curve.
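Putting the pieces together, here is a hedged Python sketch of the two curves and their weighted combination. The health curve is linear as described; the distance curve's exact shape is an assumption — a square root gives the right general behavior (steep near zero, flat near the clamp), though its values differ slightly from the article's hand-drawn curve:

```python
# Response curves: each maps a raw world-state value into the 0..1 range.
def health_curve(health_pct):
    """Linear: 0% health -> 0.0, 100% health -> 1.0."""
    return max(0.0, min(100.0, health_pct)) / 100.0

def enemy_distance_curve(distance):
    """Non-linear: differences up close matter far more than far away.
    Input is clamped at 50 m, where safety reaches 1.0."""
    d = max(0.0, min(50.0, distance))
    return (d / 50.0) ** 0.5   # assumed shape; steep near 0, flat near 50

def safety(health_pct, enemy_distance, w_health=0.5, w_dist=0.5):
    """Weighted average of the normalized values (weights sum to 1)."""
    return (w_health * health_curve(health_pct)
            + w_dist * enemy_distance_curve(enemy_distance))

print(safety(20, 50))  # 0.6 -- badly hurt, but the enemy is far away
```

Because both curves emit values in the same 0-1 range, the combination is a plain weighted sum, and the same curves can be reused with different weights for other decisions.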
Often the AI for an agent needs to track knowledge and information acquired during play so it can be used in later decisions. An agent may need to remember which character it last attacked, so it can focus on that target for a short while. Or it may need to remember how long ago it heard a noise, so that after some period it can give up looking for the cause and return to whatever it was doing before. Very often the system that writes this data is quite separate from the system that reads it, so the data should be easily accessible from the agent rather than built directly into individual AI systems. The read may happen some time after the write, so the data has to be stored somewhere it can be retrieved later.
In a hard-coded AI system, the solution might be to add the necessary variables as the need arises. These variables would live on the character or agent instance, either directly or in a separate structure/class created to hold such information, and the AI routines would be adapted to read and write them. In a simple system this works fine, but as more and more pieces of information are added it becomes unwieldy, and it typically requires rebuilding the game each time.
A more flexible approach is to turn the data store into a structure that lets systems read and write arbitrary data. New variables can then be added without changing the data structure, which moves more and more of the changes into data files and scripts, with no rebuild required. If each agent simply holds a list of key-value pairs, each a separate piece of knowledge, the various AI systems can cooperate by adding and reading that information as needed.
In AI development this approach is called a "blackboard", because each participant — in our case the AI routines (perception, pathfinding, decision-making, and so on) — can write to the "board", and any other participant can read from it the data it needs to do its job. Picture a team of specialists gathered around a blackboard, each jotting down something useful to share with the group, and each reading their colleagues' earlier notes, until the team converges on a joint decision or plan. A hard-coded list of shared variables is sometimes called a "static blackboard" (because the slots that hold the information are fixed during program execution), while an arbitrary list of key-value pairs is usually called a "dynamic blackboard".
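A dynamic blackboard can be sketched in a few lines — the key names below are illustrative, not a fixed schema:

```python
import time

# A minimal "dynamic blackboard": an arbitrary key-value store
# shared by the AI systems of one agent.
class Blackboard:
    def __init__(self):
        self._entries = {}

    def write(self, key, value):
        self._entries[key] = value

    def read(self, key, default=None):
        return self._entries.get(key, default)

# The perception system writes...
bb = Blackboard()
bb.write("nearest_enemy", "Goblin 412")
bb.write("nearest_enemy_distance", 35.0)
bb.write("last_damage_time", time.time())

# ...and the decision-making system reads, possibly many frames later.
if bb.read("nearest_enemy_distance", float("inf")) < 40.0:
    print("Threat nearby:", bb.read("nearest_enemy"))  # Threat nearby: Goblin 412
```

Because keys are just data, designers and scripts can introduce new entries without touching the store itself — which is exactly what makes the "dynamic" variant attractive.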
In traditional AI the emphasis is usually on many systems cooperating to reach a joint decision, whereas a game AI involves relatively few systems. Still, some degree of cooperation is often present. Imagine the following in an action RPG:
- The "perception" system regularly scans the area and writes the following records on the blackboard:
- "The closest enemy": "Goblin 412"
- "Distance to the nearest enemy": 35.0
- “Nearest Friend”: “Warrior 43”
- "Distance to the closest friend": 55.4
- "Time noise last heard": 12:45 pm
- Systems like the combat system can write to the blackboard the time of key events, for example:
- “Last hit time”: 12:34 pm
Much of this data may look redundant — after all, you can always get the distance to the nearest enemy by knowing who that enemy is and querying its position. But if that check runs several times per frame to decide whether the agent is under threat, it becomes a potentially slow operation, especially if finding the nearest enemy requires a spatial query. And timestamps like "noise last heard" or "damage last taken" can't be derived on demand at all — you have to record them when the events occur, and the blackboard is a convenient place for that.
Unreal Engine 4 uses a dynamic blackboard to hold the data used by behavior trees. Thanks to this shared data object, designers can easily write new values to the blackboard from their Blueprints (visual scripts), and a behavior tree can read those values later to choose a behavior — all without recompiling the engine.
A standard AI task is deciding where an agent should go. In a shooter we might have chosen the "Move To Cover" action, but how do we decide where the cover is, with enemies on the move? Similarly for "Flee" — where is the safest place to run? Or in an RTS we may want troops to attack a weak point in the enemy's defenses — how do we determine where that weak point is?
All of these can be treated as geographic questions, since we are asking about the geometry and shape of the environment and the positions of the entities within it. Our game almost certainly has all this data already, but making sense of it is not trivial. For example, if we want to find a weak spot in the enemy's defenses, simply picking the position of the weakest building or fortification isn't good enough if it is flanked by two powerful weapon emplacements. We need a method that takes the surrounding area into account and gives a better analysis of the situation.
This is what the influence map data structure is for. It describes the "influence" an entity exerts on the area around it; combining the influence of several entities yields a more realistic picture of the whole landscape. In implementation terms, we approximate the game world with an overlaid 2D grid; having determined which grid cell an entity occupies, we apply to that cell and the surrounding ones an influence score representing whichever aspect of the gameplay we want to model. We accumulate these values in the same grid to get the full picture, and can then run all sorts of queries against the grid to understand the world and decide on positioning and target points.
Take, for example, "the weakest point in the enemy's defenses". There is a defensive wall we want to send infantry to attack, but behind it stand 3 catapults: 2 close together on the left, 1 on the right. How do we pick a good attack position?
To begin with, we can assign +1 defense point to every grid cell within a catapult's attack range. Plotting those points on the influence map for one catapult looks like this:
The blue rectangle bounds the cells from which an attack on the wall can be launched. The red squares mark +1 catapult influence — in our case, the area it can strike, and therefore the threat to attacking units.
Now let's add the influence of the second catapult:
Now there is a darker area where the influence of the two catapults overlaps, giving those cells +2 defense. A +2 cell inside the blue zone would be a particularly dangerous place to attack the wall from! Let's add the influence of the last catapult:
(Icons: CC-BY, https://game-icons.net/heavenly-dog/originals/defensive-wall.html)
Now we have a complete picture of the area covered by the catapults. Within the potential attack zone there is one cell with +2 catapult influence, 11 cells with +1, and 2 cells with 0 influence — these last two are the prime candidates for an attack position, where we can strike the wall without fear of the catapults.
The strength of influence maps is that they turn a continuous space with an almost infinite number of possible positions into a discrete set of approximate positions we can reason about very quickly.
But haven't we gained this advantage simply by narrowing the choice to a small number of potential attack positions? Why use an influence map instead of manually checking the distance from each catapult to each of those positions?
First, computing the influence map can be very cheap. Once the influence points have been stamped onto the map, it doesn't need to change until the entities start moving. That means we don't have to keep recalculating distances or repeatedly query every possible unit — we "bake" the information into the map and can query it as many times as we like.
Second, we can overlay and combine different influence maps to answer more complex queries. For example, to find a safe place to flee to, we can take our enemies' influence map and subtract our friends' map — the grid cells with the largest negative values will be the safest.
The redder a cell, the more dangerous; the greener, the safer. Where influences overlap they partially or fully cancel out, reflecting contested areas.
Finally, influence maps are easy to visualize by drawing them in the world. They can be a valuable aid to designers, who need to tune the AI based on what they can see, and they can be watched in real time to understand why the AI makes the decisions it does.
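To make the mechanics concrete, here is a small Python sketch of an influence map on a grid. The grid size, catapult positions, and attack radius are made-up values; the stamping and combining logic is the part that matters:

```python
# A sketch of a 2D influence map. Sizes, positions, and radii are illustrative.
W, H = 10, 10

def empty_grid():
    return [[0.0] * W for _ in range(H)]

def add_influence(grid, cx, cy, radius, amount):
    """Stamp `amount` of influence onto every cell within `radius` of (cx, cy)."""
    for y in range(H):
        for x in range(W):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                grid[y][x] += amount

def combine(a, b, weight_b=1.0):
    """Overlay two maps; e.g. enemies minus friends uses weight_b = -1.0."""
    return [[a[y][x] + weight_b * b[y][x] for x in range(W)] for y in range(H)]

# Three catapults (+1 each); cells reached by two of them accumulate +2.
threat = empty_grid()
for cx, cy in [(2, 2), (3, 2), (8, 3)]:
    add_influence(threat, cx, cy, radius=2, amount=1.0)

# The safest attack cell is simply the one with the lowest accumulated threat.
safest = min(((x, y) for y in range(H) for x in range(W)),
             key=lambda c: threat[c[1]][c[0]])
```

Once the threat has been "baked" into the grid, both the minimum query and any map combination are simple array operations that can be run as often as needed.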
I hope this article has given you a general picture of the most popular tools and approaches used in game AI, as well as situations in which they can be applied. The article does not consider many other techniques (they are used less frequently, but can potentially be just as effective), including the following:
- algorithms for optimization problems, including hill climbing, gradient descent, and genetic algorithms
- adversarial search/planning algorithms such as minimax and alpha-beta pruning
- classification techniques, such as perceptrons, neural networks, and support vector machines
- agent perception and memory processing systems
- architectural approaches to AI, such as hybrid systems, reactive architectures (e.g. Brooks' subsumption architecture), and other ways of decomposing AI systems into layers
- animation tools like motion planning and motion matching
- performance concerns such as level of detail, anytime algorithms, and timeslicing
To read more about these topics, as well as about the topics discussed in the article, you can explore the following sources.
- GameDev.net has articles and tutorials on artificial intelligence, as well as an AI forum.
- On AiGameDev.com there are many presentations and articles on a wide range of topics of artificial intelligence in the context of game development.
- GDC Vault has reports from the GDC AI Summit, many of which are available for free: https://www.gdcvault.com/
- In addition, the AI Game Programmers Guild has a bunch of links to old articles and presentations from this summit: http://gameai.com/
- AI researcher and game developer Tommy Thompson has a YouTube channel dedicated to explaining and researching AI in commercial games: https://www.youtube.com/user/tthompso
Many of the highest quality materials can be found in books, including the following:
- The Game AI Pro series is a collection of short articles explaining how to implement specific features or solve specific problems. Free excerpts from the earlier books are available at http://www.gameaipro.com/.
- Game AI Pro: Collected Wisdom of Game AI Professionals
- Game AI Pro 2: Collected Wisdom of Game AI Professionals
- Game AI Pro 3: Collected Wisdom of Game AI Professionals
- The AI Game Programming Wisdom series preceded Game AI Pro. Its techniques are a little older, but almost all of them are still relevant today. The books are rare, so look for digital copies or second-hand sales!
- Artificial Intelligence: A Modern Approach is one of the standard texts for understanding the field of artificial intelligence in general. It is not games-specific and can be heavy going in places, but it offers an unrivaled overview of the area and teaches the fundamentals on which much game AI is built.
In addition, there are some good books about game AI in general, written by professionals in the industry. It is difficult to give preference to any one - read the reviews and select the one that suits you.