Why can a machine play Mario at a superhuman level, but not Pokemon?

Original author: Shayaan Jagtap
  • Translation
On the eve of our Game Overnight old-school video game tournament, we decided to talk about bots in computer games.

You have probably heard that modern game-playing bots can outperform humans. Such bots can be hard-coded, always reacting the same way to the same input. Alternatively, they can be allowed to learn and evolve, in which case they may behave differently in the same situation as they search for optimal solutions to the problems they face.

Here are some famous examples of such bots:

  • AlphaZero is a chess bot that became the strongest player on Earth after 24 hours of training.
  • AlphaGo is a program that beat Lee Sedol and Ke Jie in Go.
  • MarI/O is a Super Mario bot that learns on its own, trying to complete game levels as quickly as possible.

Chess, Go, and Super Mario are not easy games, and the bots that play them are carefully chosen combinations of algorithms that take a long time to train.

This article analyzes the MarI/O bot and explains why the approach used to build it does not help in writing a program that plays Pokemon well.

What is the difference between Mario and Pokemon?

There are three key differences between Mario and Pokemon that determine how well bots can do:

  1. Number of goals
  2. Branching factor
  3. Conflict between global and local optimization

Let's compare the two games on each of these factors.

Number of goals

Machines learn by optimizing some objective function. This can mean maximizing a reward or fitness function (in reinforcement learning and genetic algorithms), or minimizing a loss function (in supervised learning). Either way, applied to a game, it comes down to scoring as many points as possible.

Mario has one goal: reach the end of the level. Simply put, the further you get, the better. That is a single objective function, so the model's ability can be assessed simply and clearly with a single number.

The goal of Mario

Pokemon, on the other hand, has many goals. Let's try to sort them out. Is the goal to defeat the Elite Four? To catch every Pokemon? To train the strongest team? The goal could be a combination of all of these, or something else entirely. Most likely, if you ask any particular player, their goal will turn out to be a complex combination of the many achievements available in the game.

The goals of Pokemon

Analyzing the game means not only defining its ultimate goal, but also deciding how progress is measured: how each action improves or worsens the objective function, given the huge variety of actions available to the player at any single moment.

Strictly speaking, the choice of actions available in a given situation brings us to the second point of comparison.
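The contrast between one goal and many can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the state fields, the weight names, and the numbers are hypothetical and are not taken from MarI/O or from any real Pokemon bot. The point is only that a multi-goal score depends on weights that someone has to choose.

```python
from dataclasses import dataclass


@dataclass
class MarioState:
    distance_travelled: int  # hypothetical: how far right the character has moved


@dataclass
class PokemonState:
    badges: int          # hypothetical: gym badges earned
    species_caught: int  # hypothetical: distinct Pokemon captured
    team_level_sum: int  # hypothetical: combined level of the current team


def mario_fitness(state: MarioState) -> float:
    # One goal, one number: further is better.
    return float(state.distance_travelled)


def pokemon_fitness(state: PokemonState, weights: dict) -> float:
    # Many goals: the score is only meaningful relative to the chosen weights.
    return (weights["badges"] * state.badges
            + weights["species_caught"] * state.species_caught
            + weights["team_level_sum"] * state.team_level_sum)


# Two made-up players who value different things.
collector = {"badges": 1.0, "species_caught": 5.0, "team_level_sum": 0.1}
battler = {"badges": 10.0, "species_caught": 0.5, "team_level_sum": 1.0}

# Two made-up save files.
a = PokemonState(badges=2, species_caught=60, team_level_sum=90)
b = PokemonState(badges=8, species_caught=15, team_level_sum=300)

# The same two save files rank differently under different weights.
print(pokemon_fitness(a, collector) > pokemon_fitness(b, collector))  # True
print(pokemon_fitness(a, battler) > pokemon_fitness(b, battler))      # False
```

With a single objective there is nothing to argue about. With several, the bot's idea of "better" is only as good as the weights it was given.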

Branching factor

The branching factor is, in simple terms, the number of options available at each step of the game. In chess, the average branching factor is about 35; in Go, it is about 250. If a bot tries to "look into the future" by considering every move it can make right now, then every move it can make after that, and so on, each additional level makes the task dramatically harder. The number of variations grows exponentially, as (branching factor) ^ (number of levels).

In Mario, the character can move left or right, jump, or do nothing. The number of options the bot has to evaluate is small. The smaller the branching factor, the further into the future a bot can look at an acceptable computational cost.
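To get a feel for this growth, here is a minimal Python sketch. The function names and the budget of one billion move sequences are assumptions chosen for illustration. Real engines prune the search heavily, so this is an upper bound on naive look-ahead, not a description of how any actual bot works.

```python
def sequences(options_per_move: int, depth: int) -> int:
    # Number of distinct move sequences of exactly `depth` moves.
    return options_per_move ** depth


def max_depth(options_per_move: int, budget: int) -> int:
    # Deepest full look-ahead whose sequence count still fits in `budget`.
    if options_per_move < 2:
        raise ValueError("need at least two options per move")
    depth = 0
    while sequences(options_per_move, depth + 1) <= budget:
        depth += 1
    return depth


BUDGET = 10 ** 9  # hypothetical limit on sequences we can afford to examine

for game, options in [("chess", 35), ("Go", 250)]:
    print(game, sequences(options, 4), max_depth(options, BUDGET))
# chess 1500625 5
# Go 3906250000 3
```

Under the same budget, the game with fewer options per move can be searched two levels deeper.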

Action options in Mario

Pokemon has an open game world, which means the player has many options at any given time. Simply listing the directions in which the character can move is not enough to calculate the branching factor here. What matters instead are actions that mean something in the game world. Will the next action be a battle, a conversation with a character, or a move to another area of the map? And the number of options keeps growing as you progress through the game.

Simplified view of the character's options in Pokemon

To build a bot that can work out which sequence of decisions to make in a given situation, the bot has to take both its short-term and long-term goals into account. This brings us to the next point of comparison between Mario and Pokemon.

Conflict between global and local optimization

Local and global optimization can be considered in both a spatial and a temporal sense. Short-term goals and the character's immediate surroundings belong to local optimization. Long-term goals and relatively large parts of the game space (something like a "city" or the entire game world) belong to global optimization.

If we break each move in Pokemon into its component parts, the problem the bot has to solve can be seen as a series of very small pieces. Local optimization, such as getting from point A to point B, poses no difficulty. Choosing point B, that is, deciding where to go, is a much harder problem. Greedy algorithms will not help here, since locally optimal decisions do not necessarily lead to globally optimal results.

The problem of choosing the next step
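The gap between locally and globally optimal choices shows up even on a tiny map. The sketch below is an illustration under stated assumptions: the four-node graph and its step costs are made up, and the exhaustive search only works because the graph is small and has no cycles. The greedy walker always takes the cheapest next step, while the exhaustive search compares whole routes.

```python
# Hypothetical map: each node lists its neighbours and the cost of stepping to them.
graph = {
    "A": {"B": 1, "C": 4},
    "B": {"D": 10},
    "C": {"D": 1},
    "D": {},
}


def greedy_path(graph, start, goal):
    # Always take the cheapest edge out of the current node.
    path, cost, node = [start], 0, start
    while node != goal:
        if not graph[node]:
            raise ValueError("greedy walk hit a dead end at " + node)
        nxt = min(graph[node], key=graph[node].get)
        cost += graph[node][nxt]
        path.append(nxt)
        node = nxt
    return path, cost


def best_path(graph, node, goal):
    # Try every route (fine for a tiny acyclic graph) and keep the cheapest.
    if node == goal:
        return [node], 0
    best = None
    for nxt, step_cost in graph[node].items():
        result = best_path(graph, nxt, goal)
        if result is None:
            continue  # no route to the goal through this neighbour
        sub_path, sub_cost = result
        total = step_cost + sub_cost
        if best is None or total < best[1]:
            best = ([node] + sub_path, total)
    return best


print(greedy_path(graph, "A", "D"))    # (['A', 'B', 'D'], 11)
print(best_path(graph, "A", "D"))      # (['A', 'C', 'D'], 5)
```

The cheap first step to B commits the greedy walker to an expensive second step. The route through C starts worse and ends better, which is exactly the situation a bot faces when a short-term gain works against a long-term goal.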

Maps in Mario are small and linear. Maps in Pokemon are large, complex, and nonlinear. As players progress and pursue ever more important goals, they constantly face new challenges. Linking local optimizations to global goals is not easy, and existing models are not yet ready to do it.


From a bot's point of view, Pokemon is not one game. Bots are narrowly specialized: a bot that helps the player move around the map is useless when the player runs into a character who has to be fought. For a bot, moving around the map and battling are completely different tasks.

Bots are highly specialized systems.

During a battle, every turn means choosing among dozens of options. You have to decide which action to take, which Pokemon to use, and when to use which items. This alone is a complex optimization problem. Here is an article that studies the problem of building a Pokemon battle simulator. It is thorough and fairly involved, yet even it leaves out items, one of the most important factors influencing the outcome of a battle.

In the end, we should be glad that we can build bots that play our games better than we do. These games are mathematically complex, but their goals are easy to define. As artificial intelligence advances, humanity will be able to build machines that solve increasingly important real-world problems, which are themselves complex optimization problems, by learning them. For now, though, I can assure you that there are tasks we still handle better than machines, including games many of us played as children. At least, that is how things stand today.

Dear readers! We invite you to take part in Game Overnight, the first old-school video game tournament in Russia. The tournament consists of a qualifying stage and a final battle of the best of the best, which will take place on November 30 at the Museum of Soviet Gaming Machines. We look forward to seeing you at the tournament from 8 pm to 3 am. There will be Smart Admin foamy drinks and music from DJ Cucumber (Sergey Mezentsev), assisted by RUVDS admin DJ Unpushible, and we will also be trying the new Sub Zero snow burgers from our admins. So, as they say, welcome!
