kurokikaze July 11, 2011 at 16:32

Synchronous RTS Engines and a Tale of Desyncs

Translation

Have you ever played a game like StarCraft or Supreme Commander, got an error message such as “Out of sync detected”, and then watched the game shut down? Want to know why this happens? It is a consequence of the engine architecture commonly used in real-time strategy games.¹

My experience in this area comes from working on the Supreme Commander engine at Gas Powered Games. StarCraft and Warcraft 3 also had sync problems during their beta tests, so it is safe to say they work in broadly the same way. For simplicity, I will talk about the Supreme Commander engine; finding the parallels with other games is left as an exercise for the reader :)

Requirements

First, what are the requirements for our game? To give you an idea, here is a video of the original Supreme Commander (2006).

The game must support 8 players in an online game, with hundreds of units in each army. That is several thousand units in a single game. Holy cow. The client-server approach typical of shooters clearly won't work here: with that many units, it would need far more bandwidth than most players have.
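To make the bandwidth argument concrete, here is a back-of-envelope comparison between streaming every unit's state and streaming only player commands. Every constant below (unit count, bytes per update, command rate, command size) is an illustrative assumption, not a measured Supreme Commander figure.

```python
# Back-of-envelope: streaming unit state vs. streaming player commands.
# All constants are illustrative assumptions, not real SupCom numbers.

UNITS = 4000                 # "several thousand units in one game"
BYTES_PER_UNIT_UPDATE = 24   # assumed: position, facing, health, a few flags
UPDATES_PER_SECOND = 10      # assumed: one update per 10 Hz simulation step

PLAYERS = 8
COMMANDS_PER_PLAYER_PER_SECOND = 5  # assumed: a busy player issuing orders
BYTES_PER_COMMAND = 32              # assumed: selection + order type + target


def state_sync_bytes_per_second() -> int:
    """Bytes per second needed to send every unit's state to one client."""
    return UNITS * BYTES_PER_UNIT_UPDATE * UPDATES_PER_SECOND


def command_sync_bytes_per_second() -> int:
    """Bytes per second one peer receives if only other players' commands are sent."""
    return (PLAYERS - 1) * COMMANDS_PER_PLAYER_PER_SECOND * BYTES_PER_COMMAND


print(f"state sync:   {state_sync_bytes_per_second():>9,} B/s per client")
print(f"command sync: {command_sync_bytes_per_second():>9,} B/s per client")
```

Under these assumptions, state streaming needs about 960 KB/s per client, while command streaming needs about 1.1 KB/s, a difference of nearly three orders of magnitude.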

So how do you tackle this?..

Synchronous Engine Architecture

...with a fully synchronous lockstep architecture! In a synchronous engine, every client executes the same code at the same rate. Let that sink in. In an 8-player Supreme Commander game, every player stores the same game state and executes the same code. Instead of sending unit state (position, health, etc.) over the network, only the commands entered by players are transmitted². If all players start from the same game state and process the same input, the resulting state must also be the same.
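The core lockstep idea can be sketched in a few lines: two clients start from the same state, receive the same commands, and end up in the same state without ever exchanging unit data. This is a minimal sketch, assuming a hypothetical `(player, unit_id, vx, vy)` command format and integer coordinates; it is not SupCom's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    x: int  # integer coordinates keep every update exactly reproducible
    y: int
    vx: int = 0
    vy: int = 0


@dataclass
class GameState:
    tick: int = 0
    units: dict = field(default_factory=dict)


def sim_tick(state: GameState, commands: list) -> None:
    """Advance the simulation one step using only the players' commands."""
    # Apply commands in a fixed (sorted) order, so every client agrees on the
    # result regardless of the order in which packets arrived.
    for _player, unit_id, vx, vy in sorted(commands):
        unit = state.units[unit_id]
        unit.vx, unit.vy = vx, vy
    # Update units in a fixed order as well.
    for unit_id in sorted(state.units):
        unit = state.units[unit_id]
        unit.x += unit.vx
        unit.y += unit.vy
    state.tick += 1


def new_game() -> GameState:
    return GameState(units={1: Unit(0, 0), 2: Unit(10, 10)})


# The only data that crosses the network: commands, keyed by tick.
commands_by_tick = {0: [("A", 1, 1, 0)], 2: [("B", 2, 0, -1), ("A", 1, 0, 1)]}

client_a, client_b = new_game(), new_game()
for t in range(5):
    cmds = commands_by_tick.get(t, [])
    sim_tick(client_a, cmds)
    sim_tick(client_b, list(reversed(cmds)))  # packets arrived in another order

print(client_a == client_b)  # True: same start + same input -> same state
```

Sorting the commands before applying them is the important design choice: anything that can differ between machines (packet arrival order, hash-map iteration order) must not influence the simulation.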

Game replays work on the same principle, even in shooters. Have you ever wondered why replay files are so small? It is because such a file only needs to store user input. To watch it, we simply start the game, feed it the input from the replay file, and get the same result as in the original game. That is why replay files often stop working³ after a game update, and why they often cannot be rewound⁴. For the same reason, many strategy games do not let new players join a game in progress: a joining player would have to receive the full game state, and for a game with three thousand units that takes far too long.
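A replay is just the recorded input plus the number of ticks. The sketch below records an hour-long game (36,000 ticks at 10 Hz) into a tiny JSON string and rebuilds the final state by re-running the simulation. The JSON layout, `sim_tick`, and the `(unit_id, dx)` command format are hypothetical; real replay formats are engine-specific.

```python
import json


def new_state() -> dict:
    return {"tick": 0, "pos": {1: 0, 2: 100}}


def sim_tick(state: dict, commands: list) -> None:
    """Hypothetical deterministic step: each (unit_id, dx) command moves a unit."""
    for unit_id, dx in sorted(commands):
        state["pos"][unit_id] += dx
    state["tick"] += 1


def record_game(input_by_tick: dict, num_ticks: int):
    """Play a game, storing only the input, never the resulting state."""
    state, log = new_state(), []
    for t in range(num_ticks):
        cmds = input_by_tick.get(t, [])
        if cmds:
            log.append({"tick": t, "commands": cmds})
        sim_tick(state, cmds)
    return state, json.dumps({"ticks": num_ticks, "input": log})


def play_replay(replay_json: str) -> dict:
    """Rebuild the game by feeding the recorded input back into the simulation."""
    data = json.loads(replay_json)
    input_by_tick = {e["tick"]: [tuple(c) for c in e["commands"]] for e in data["input"]}
    state = new_state()
    for t in range(data["ticks"]):
        sim_tick(state, input_by_tick.get(t, []))
    return state


final, replay = record_game({0: [(1, 5)], 500: [(2, -3), (1, 1)]}, num_ticks=36_000)
print(len(replay), play_replay(replay) == final)
```

This also shows why replays break after a patch: if `sim_tick` changes, the same input produces a different game. Rewinding is hard because reaching tick N means re-simulating everything from tick 0.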

Layers

If you haven't watched the video at the beginning, do it now. How many frames per second do you think the game runs at? The correct answer is 10 fps. “Wait, what? It looks much smoother than that!” you say. Yes and no. The game actually runs at two different rates at the same time.

The SupCom engine uses two layers: simulation and user. The simulation layer runs at a fixed rate of 10 ticks per second. This is the “real” game: all units, all AI and all physics are updated in the SimTick function, which runs 10 times per second. Each SimTick must finish in under 100 ms, otherwise the game runs in slow motion. In a network game, if any player cannot finish a SimTick within 100 ms, all the other players are forced to stop and wait for the one lagging behind.
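One common way to run a fixed 10 Hz simulation inside a variable-rate render loop is a time accumulator, sketched below. This is a generic fixed-timestep pattern, not necessarily how SupCom schedules SimTick; `run_frames`, `sim_tick` and `user_tick` are hypothetical names, and time is kept in integer milliseconds to avoid floating-point drift.

```python
SIM_STEP_MS = 100  # 10 SimTicks per second


def run_frames(frame_times_ms, sim_tick, user_tick) -> int:
    """Drive a fixed-rate simulation from a variable-rate render loop.

    frame_times_ms: wall-clock milliseconds elapsed during each rendered frame.
    Returns the number of SimTicks that were run.
    """
    accumulator = 0
    sim_ticks_run = 0
    for dt in frame_times_ms:
        accumulator += dt
        # Run as many whole 100 ms SimTicks as the elapsed time allows.
        while accumulator >= SIM_STEP_MS:
            sim_tick()
            sim_ticks_run += 1
            accumulator -= SIM_STEP_MS
        # The user layer learns how far it is between two SimTicks (0.0 to 1.0).
        user_tick(accumulator / SIM_STEP_MS)
    return sim_ticks_run


# One second of rendering at roughly 60 fps (17 + 17 + 16 ms per 3 frames).
alphas = []
ticks = run_frames([17, 17, 16] * 20, lambda: None, alphas.append)
print(ticks, len(alphas))  # 10 60
```

A single slow 250 ms frame runs two SimTicks to catch up and leaves the user layer halfway to the third.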

The user layer runs at the full frame rate and can be thought of as purely graphical. The user interface, rendering, animation and even unit positions can be updated at 60 fps. Each UserTick is given a time delta, which is used to interpolate the game state to intermediate values (for example, unit positions between two SimTicks). That is why the game looks and plays smoothly even though the core of the engine runs at a fairly low rate.
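The interpolation step can be sketched as a linear blend between the two most recent simulation snapshots, weighted by how far the current frame is between them. `SimSnapshot` and `interpolate` are hypothetical names; the user layer only reads simulation state, so this code does not need to be deterministic.

```python
from dataclasses import dataclass


@dataclass
class SimSnapshot:
    """Unit positions produced by one SimTick (hypothetical layout)."""
    positions: dict  # unit_id -> (x, y)


def lerp(a: float, b: float, t: float) -> float:
    return a + (b - a) * t


def interpolate(prev: SimSnapshot, curr: SimSnapshot, alpha: float) -> dict:
    """Positions to draw this frame; alpha in [0, 1] between two SimTicks."""
    drawn = {}
    for unit_id, (x1, y1) in curr.positions.items():
        # A unit that did not exist in the previous tick is drawn where it is.
        x0, y0 = prev.positions.get(unit_id, (x1, y1))
        drawn[unit_id] = (lerp(x0, x1, alpha), lerp(y0, y1, alpha))
    return drawn


prev = SimSnapshot({1: (0.0, 0.0)})
curr = SimSnapshot({1: (10.0, 5.0), 2: (3.0, 3.0)})
print(interpolate(prev, curr, 0.5))  # {1: (5.0, 2.5), 2: (3.0, 3.0)}
```

Blending between the previous and current snapshots means the picture is drawn up to one SimTick behind the simulation; that is the usual price of smooth motion with this approach.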

Determinism

“Wait a minute!” exclaims the astute reader. “If every player updates the game state independently, does that mean the game simulation is completely deterministic?” It does. Is that hard to achieve? Yes. Especially in today's multithreaded world.

Floating-point numbers cause engine developers a great deal of trouble. Rather than covering this topic in detail, I will point you to a fantastic post by Glenn Fiedler: Floating Point Determinism.

In the comments to that post, Elijah discusses Supreme Commander. Forcing the CPU to strictly follow the IEEE 754 standard solves most of the problems. But this costs performance, and the game cannot use any calculations whose results are not well defined (which it should not be doing anyway).
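A small illustration of why floating point is a determinism hazard: addition is not associative, so two machines that sum the same values in a different order (for example, a multithreaded reduction that splits work differently) can get different bits. The second half shows fixed-point integer arithmetic, one alternative some engines use; the article itself describes strict IEEE 754 mode instead, and `FRACTION_BITS`, `to_fixed` and `fixed_mul` are hypothetical names.

```python
# Floating-point addition is not associative: order changes the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# Fixed-point alternative (illustrative): values are integers scaled by 2**16,
# so addition is exact and order-independent on every machine.
FRACTION_BITS = 16
ONE = 1 << FRACTION_BITS


def to_fixed(value: float) -> int:
    """Convert once, e.g. when loading data, then stay in integers."""
    return round(value * ONE)


def fixed_mul(x: int, y: int) -> int:
    """Multiply two fixed-point numbers using integer operations only."""
    return (x * y) >> FRACTION_BITS


fa, fb, fc = to_fixed(0.1), to_fixed(0.2), to_fixed(0.3)
print((fa + fb) + fc == fa + (fb + fc))  # True
```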

Inherent latency

Synchronous multiplayer games have certain drawbacks. Besides the difficulty of building a huge, fully deterministic simulation, there is latency in processing input. As I described above, every player in a multiplayer game updates the same game state using the same input. This means that new input can only be processed once all clients agree on which simulation step to process it in!

For example, three players (A, B and C) are running SimTick[1]. At that moment, player A orders a unit to attack. The UI responds immediately, because UserTick runs 60 times per second. In a single-player game, this command would be processed in SimTick[2] (a delay of 0-100 ms). However, in a multiplayer game all three players must process the command in the same SimTick to get the same result. So instead of processing the command in SimTick[2], player A sends network packets to players B and C telling them to execute it in SimTick[4] (a delay of 200-300 ms). This gives every player time to receive the command. Things can break down if the input is not received or acknowledged in time. I don't know what mechanism SupCom uses for this, but I will update this post if I find out. The specific number of runs of SimTick,⁵ .
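Since the post does not say how SupCom schedules input, the following is only a generic lockstep sketch of the example above: input issued during SimTick[1] is scheduled for SimTick[4], and a peer may only run a tick once every player's input for it (possibly empty) has arrived, which is what makes everyone wait for a lagging player. `LockstepPeer`, `INPUT_DELAY` and the packet tuple are hypothetical.

```python
from collections import defaultdict

INPUT_DELAY = 3  # hypothetical: input issued in SimTick[1] executes in SimTick[4]


class LockstepPeer:
    """Simplified lockstep input scheduling for one peer (illustrative only)."""

    def __init__(self, me: str, players: list):
        self.me = me
        self.players = sorted(players)
        self.tick = 1
        # tick -> {player: [commands]}; every player must report, even if empty.
        self.inbox = defaultdict(dict)
        # Nobody could have sent input for the first INPUT_DELAY ticks.
        for t in range(1, 1 + INPUT_DELAY):
            for p in self.players:
                self.inbox[t][p] = []

    def send_input(self, commands: list) -> tuple:
        """Schedule this tick's local input INPUT_DELAY ticks in the future."""
        packet = (self.tick + INPUT_DELAY, self.me, list(commands))
        self.receive(packet)  # deliver to ourselves...
        return packet         # ...and broadcast to every other peer

    def receive(self, packet: tuple) -> None:
        target_tick, player, commands = packet
        self.inbox[target_tick][player] = commands

    def ready(self) -> bool:
        """A tick may run only once input from every player has arrived."""
        return all(p in self.inbox[self.tick] for p in self.players)

    def step(self) -> list:
        """Run one SimTick; returns the executed commands in a fixed order."""
        if not self.ready():
            raise RuntimeError("waiting for input from a lagging peer")
        batch = self.inbox.pop(self.tick)
        self.tick += 1
        return [(p, c) for p in self.players for c in batch[p]]


players = ["A", "B", "C"]
peers = {n: LockstepPeer(n, players) for n in players}
executed = {n: [] for n in players}
for t in range(1, 5):
    # Player A orders an attack during SimTick[1]; B and C issue nothing.
    packets = [peers[n].send_input(["attack"] if (n == "A" and t == 1) else []) for n in players]
    for n in players:
        for packet in packets:
            if packet[1] != n:
                peers[n].receive(packet)
    for n in players:
        executed[n].append(peers[n].step())

print(executed["B"])  # [[], [], [], [('A', 'attack')]]
```

Every peer executes the attack in SimTick[4], even though only A issued it.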

The latency between the user's click and the unit's reaction is always at least 0-100 ms (until the next SimTick). This can be masked in several ways. The interface usually responds immediately: something flashes, and the matching voice line plays, such as “My life for Aiur!” or “Zug zug”.

In a single-player game this is fine, but in a multiplayer battle the latency becomes noticeable, reaching several hundred milliseconds. I have always wanted to experiment with instant-response animation in UserTick. For example, when you give a move command, the user layer starts moving the unit slowly and then blends its motion toward the position the simulation reports once the command actually executes. This could be very useful in more twitch-oriented games like DOTA or Demigod. There are some edge cases that would need special handling, so I never actually got around to implementing it. If anyone has done this, let me know in the comments. :)
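The author never implemented this idea, so the following is only a sketch of what it might look like under simple assumptions: before the order executes, the user layer nudges the drawn unit toward the clicked point at a reduced speed; once the simulation has the real position, the drawn position converges onto it exponentially. The drawn position is cosmetic; the simulation stays authoritative. `predict_step`, `blend_step`, `speed` and `rate` are hypothetical names and tuning knobs, and the edge cases mentioned above are ignored.

```python
import math


def predict_step(drawn, target, speed, dt):
    """Before the order executes: move the drawn unit toward the clicked point."""
    dx, dy = target[0] - drawn[0], target[1] - drawn[1]
    dist = math.hypot(dx, dy)
    if dist <= speed * dt:
        return target
    scale = speed * dt / dist
    return (drawn[0] + dx * scale, drawn[1] + dy * scale)


def blend_step(drawn, sim_pos, rate, dt):
    """After the order executes: converge exponentially onto the simulated position.

    rate is in 1/seconds; a larger rate snaps to the simulation faster.
    The result is frame-rate independent because the factor uses exp(-rate * dt).
    """
    k = 1.0 - math.exp(-rate * dt)
    return (drawn[0] + (sim_pos[0] - drawn[0]) * k,
            drawn[1] + (sim_pos[1] - drawn[1]) * k)


# Click at (10, 0): the unit starts creeping immediately...
drawn = predict_step((0.0, 0.0), (10.0, 0.0), speed=2.0, dt=0.5)
# ...then blends toward wherever the simulation actually put it.
drawn = blend_step(drawn, (4.0, 0.0), rate=10.0, dt=1 / 60)
print(drawn)
```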

Desync - Bugs from Hell

Desync bugs are some of the hardest bugs in the universe. They are truly evil bastards. The fundamental assumption of the engine is that all players are perfectly in sync. What if they aren't? What happens if different players' simulations diverge? Chaos. Anger. Suffering⁶.

In SupCom, the entire game state is hashed once per second. If the hashes don't match across clients, the show is over. Game over. The end. A window pops up with the message “Sync Error” and you have to quit the game. Some SimTick produced different results on different machines, and now the game states differ. The simulations have diverged, and from here it will only get worse. There is no recovery mechanism.
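Below is a minimal sketch of the once-per-second check, assuming the game state is a dict of `unit_id -> (x, y, hp)` integers. The hash function (SHA-256 over a canonical byte encoding), `state_hash`, `hashes_agree` and `check_sync` are assumptions for illustration; the post does not say which hash SupCom uses.

```python
import hashlib
import struct

HASH_INTERVAL_TICKS = 10  # once per second at 10 SimTicks per second


def state_hash(units: dict) -> str:
    """Hash the state in a canonical order, so equal states hash equally."""
    h = hashlib.sha256()
    for unit_id in sorted(units):
        x, y, hp = units[unit_id]
        # Fixed-size little-endian encoding: identical on every machine.
        h.update(struct.pack("<iiii", unit_id, x, y, hp))
    return h.hexdigest()


def hashes_agree(hashes_by_player: dict) -> bool:
    return len(set(hashes_by_player.values())) <= 1


def check_sync(tick: int, hashes_by_player: dict) -> None:
    """Any mismatch is fatal: there is no way to recover a diverged game."""
    if tick % HASH_INTERVAL_TICKS == 0 and not hashes_agree(hashes_by_player):
        raise RuntimeError(f"Sync Error at tick {tick}")


ok = state_hash({1: (0, 0, 100), 2: (5, 5, 80)})
bad = state_hash({1: (0, 0, 100), 2: (5, 5, 79)})  # one hit point of difference
check_sync(10, {"A": ok, "B": ok})                  # passes silently
print(hashes_agree({"A": ok, "B": bad}))            # False
```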

A desync is usually the result of a programmer error. A given desync might reproduce in only 5% of games lasting more than 60 minutes. Finding and fixing one usually involves a binary search over hashes of the memory state printed to the console during the game. For a desync that rarely reproduces, this means an enormous volume of console output while half a dozen machines run the simulation as fast as they can, waiting for it to break. And if that isn't enough for you, one of the most common causes of a desync is an uninitialized variable.
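The binary search works because a divergence is permanent: once two simulations differ, they never match again, so "the hashes differ at tick t" is monotone in t. The sketch assumes each machine logged one state hash per tick; `first_divergent_tick` is a hypothetical helper, not a SupCom tool.

```python
def first_divergent_tick(hashes_a: list, hashes_b: list):
    """Return the first tick whose state hash differs, or None if all agree.

    Assumes divergence is permanent, so bisection finds the boundary.
    """
    n = min(len(hashes_a), len(hashes_b))
    if n == 0 or hashes_a[n - 1] == hashes_b[n - 1]:
        return None
    lo, hi = 0, n - 1  # invariant: the hashes differ at tick hi
    while lo < hi:
        mid = (lo + hi) // 2
        if hashes_a[mid] == hashes_b[mid]:
            lo = mid + 1  # still in sync at mid: divergence is later
        else:
            hi = mid      # already diverged at mid: divergence is here or earlier
    return lo


machine_1 = ["h0", "h1", "h2", "h3", "h4"]
machine_2 = ["h0", "h1", "x2", "x3", "x4"]
print(first_divergent_tick(machine_1, machine_2))  # 2
```

Once the first bad tick is known, you can dump much finer-grained hashes for just that tick to narrow down which piece of state went wrong.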

History of the Demigod

Most of my work on the SupCom engine was done while working on Demigod, which used a modified version of the engine.

Near the very end of development, there was a long-standing, rarely reproducing desync bug, and it was assigned to me. Demigod has crowds of small cannon-fodder units running around the map. In very rare cases, the positions of individual lemmings differed by a few centimeters between machines. That sounds harmless, but it was the proverbial flap of a butterfly's wings that sets off a hurricane of problems.

I distinctly remember not being sure I could fix this bug, and our lead programmer said, “I know you can fix it. I believe in you.” No pressure, right? Every morning we had a ten-minute meeting, and every day my report was short and simple: “hunting the desync”. After nearly a week's descent into madness, I found the cause. If you have seen the trailer, you know that some heroes can knock enemies into the air. When a huge walking castle slams down its hammer, units go flying. The bug was a pointer in the pathfinding system that pointed to nowhere, and because of it the unit simply vanished after landing.

But that alone was not enough to reproduce the desync. First, a unit had to be killed by one of several specific abilities. This removed it from the pathfinder but left a dangling pointer behind. The memory manager put the freed component's memory on a linked list of free blocks without changing its contents. Then, before the unit landed, a new memory allocation had to happen, and that allocation had to reuse the block that had just been freed. Only then did the desync occur. Setting the pointer to NULL fixed the problem.
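Python has no raw pointers, so this sketch models memory as a list and pointers as indices into it, purely to show the failure mode described above: a freed block keeps its old contents, so a dangling reference still "works" until a new allocation reuses that block. `FreeListAllocator` and the dict payloads are hypothetical; this is not Demigod's memory manager.

```python
class FreeListAllocator:
    """Toy allocator: freed blocks go on a free list with contents untouched."""

    def __init__(self):
        self.memory = []     # block index -> stored object
        self.free_list = []  # indices available for reuse (LIFO)

    def alloc(self, value) -> int:
        if self.free_list:
            index = self.free_list.pop()  # reuse the most recently freed block
            self.memory[index] = value
        else:
            index = len(self.memory)
            self.memory.append(value)
        return index

    def free(self, index: int) -> None:
        # As in the story: the block is NOT cleared, just marked reusable.
        self.free_list.append(index)

    def read(self, index: int):
        return self.memory[index]


heap = FreeListAllocator()
pathfinder_ref = heap.alloc({"unit": 7, "x": 100})  # pathfinder keeps a raw reference

heap.free(pathfinder_ref)            # unit killed: block freed, reference kept
stale = heap.read(pathfinder_ref)    # still looks valid: old contents survive

heap.alloc({"unit": 42, "x": -5})    # a new allocation reuses the freed block
reused = heap.read(pathfinder_ref)   # now silently reads another object's data

print(stale["unit"], reused["unit"])  # 7 42

# The fix: clear the reference when the block is freed, and check before use.
pathfinder_ref = None
```

The bug stays invisible unless an allocation lands in the freed block before the reference is used again, which is why it needed such a specific sequence of events to show up.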

Final thoughts

This has been a very brief overview of the synchronous engine used in Supreme Commander. I know that many older games work in much the same way. Newer games probably use some new tricks, especially to combat input latency. I know that StarCraft 2 has desyncs, so it most likely works similarly. Other games worth looking at are Heroes of Newerth and League of Legends. They are not as complex as SupCom and they play quite smoothly, but I haven't taken them apart, so I don't know what techniques they use.

¹ Halo uses a synchronous lockstep model in Campaign Co-op and Firefight modes.
² In SupCom, input takes the form of commands to groups of units: move, attack, guard, use an ability, and so on.
³ Old replay files can still be supported if the old code can be run with the old data.
⁴ Rewinding in Halo was done using “savepoints” that store the game state at a specific moment. You could not rewind smoothly, but you could jump to an earlier savepoint and play forward from there, if memory serves.
⁵ SupCom uses a fully peer-to-peer network architecture.
⁶ Force lightning, unfortunately, not included.
