Age of Empires network code: 1500 archers on a 28.8 kbit/s modem

Original authors: Mark Terrano, Paul Bettner

Translation

Translator's note: this article is already 17 years old and is mainly of historical interest. Still, it is curious to learn how the developers managed to achieve smooth network play in the era of 28.8k modems and the first Pentiums.

This article describes the architecture and implementation of the multiplayer (network) code for Age of Empires 1 and 2, along with some lessons learned while creating it. It also outlines the current and future network architecture approaches that Ensemble Studios uses in its game engines.

Age of Empires multiplayer: architecture requirements

When we began work on the Age of Empires multiplayer code in 1996, we set very specific goals for the gameplay experience we wanted:

Large, epic historical battles with many different kinds of units
Support for up to 8 players in multiplayer mode
Smooth gameplay simulation over a LAN, over a direct modem-to-modem (dial-up) connection and over the Internet
Target platform: a Pentium 90 with 16 MB of RAM and a 28.8 kbit/s modem
Communications had to work with the existing engine (Genie)
A consistent 15 frames per second on the minimum machine configuration

The Genie engine was already running, and the single-player game was starting to take shape. Genie is a 2D, single-threaded game-loop engine. Sprites are rendered in 256 colors in a tile-based world. Randomly generated maps are filled with thousands of objects, from trees that can be chopped down to leaping gazelles. The rough breakdown (after optimization) of processing time for the engine's tasks was 30% graphics rendering, 30% AI and pathfinding, and 30% running the simulation and maintenance tasks.

At a fairly early stage the engine was already relatively stable, so the multiplayer communications had to work with the existing code without significant changes to the existing (working) architecture.

To complicate matters, the time needed to complete each simulation step varied greatly: rendering time depended on whether the user was watching units, scrolling or looking at unexplored terrain, and long paths or AI strategic planning had a major effect on turn processing time. Swings reached up to 200 ms.

A few quick calculations showed that passing even a small set of data about the units, and trying to update it in real time, would severely limit the number of units and objects a player could interact with. Just sending X and Y coordinates, status, action, facing and damage would have limited us to no more than 250 moving units in the game.
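The arithmetic behind a limit like this can be sketched as follows. The per-unit record size and the update rate are hypothetical assumptions for illustration (the article gives neither), and the estimate ignores packet headers, modem framing and compression.

```cpp
#include <cassert>
#include <cstdint>

// Back-of-the-envelope estimate: how many units could have their state
// streamed over a link of the given speed. Ignores packet headers, modem
// framing, compression and the fact that several players share the budget.
std::uint32_t maxStreamedUnits(std::uint32_t linkBitsPerSecond,
                               std::uint32_t bytesPerUnitRecord,
                               std::uint32_t updatesPerSecond) {
    assert(bytesPerUnitRecord > 0 && updatesPerSecond > 0);
    const std::uint32_t bytesPerSecond = linkBitsPerSecond / 8;  // 28,800 bit/s -> 3,600 B/s
    return bytesPerSecond / (bytesPerUnitRecord * updatesPerSecond);
}
```

With a hypothetical 12-byte record (X and Y, status, action, facing, damage and a unit ID) sent once per second, `maxStreamedUnits(28800, 12, 1)` gives 300 units, and two updates per second halve that to 150: the same order of magnitude as the 250-unit ceiling above.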

We wanted players to be able to storm Greek cities with catapults, archers and warriors while simultaneously besieging them with triremes from the sea. Clearly, we needed another approach.

Simultaneous Simulations

Rather than passing the status of every unit in the game, we wanted to run exactly the same simulation on each machine, giving each an identical set of commands issued by the players at the same time. The players' computers essentially had to synchronize their watches, in the best war-movie tradition: let players issue commands, then execute those commands in exactly the same way at exactly the same time, so that the games stayed identical.
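A minimal sketch of the principle, assuming a toy world with integer-only state (the `Command` and `World` types are hypothetical, not Genie code): two machines that start from the same state and apply the same commands each turn end up with identical state, which a checksum can confirm. A real engine must also keep iteration order, floating-point behavior and random numbers identical across machines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A player command: move unit `unitId` by (dx, dy). Integer-only state, so
// every machine computes bit-identical results.
struct Command {
    int unitId;
    int dx;
    int dy;
};

// Toy world that every machine simulates independently.
struct World {
    std::vector<int> x, y;  // unit positions

    explicit World(int units) : x(units, 0), y(units, 0) {}

    // Advance one turn by applying the shared commands in the agreed order.
    void step(const std::vector<Command>& commands) {
        for (const Command& c : commands) {
            assert(c.unitId >= 0 && static_cast<std::size_t>(c.unitId) < x.size());
            x[c.unitId] += c.dx;
            y[c.unitId] += c.dy;
        }
    }

    // FNV-1a-style hash over the state, so machines can compare worlds cheaply.
    std::uint64_t checksum() const {
        std::uint64_t h = 1469598103934665603ull;
        for (std::size_t i = 0; i < x.size(); ++i) {
            for (int v : {x[i], y[i]}) {
                h ^= static_cast<std::uint32_t>(v);
                h *= 1099511628211ull;
            }
        }
        return h;
    }
};
```

Only the `Command` lists ever need to cross the network; the state itself never does.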

This tricky synchronization was difficult to get working at first, but it ended up bringing unexpected benefits in other areas.

Improving on the basic model

At the simplest conceptual level, achieving a simultaneous simulation seems very easy. For some games that use lockstep simulation and fixed timing, it may even be feasible as is.

Because our approach had to handle hundreds or thousands of objects moving at once, the system had to remain viable even with latency swings from 20 to 1,000 ms, and still process changes in time within frame processing.

Sending the player's commands, acknowledging all messages and only then processing them before moving on to the next turn would have been a gameplay nightmare of stop-and-go waiting and slow command turnaround. We needed a scheme that could keep processing the game while waiting in the background for communications to complete.

Mark [Terrano] used a system of tagging commands to be executed two "communications turns" in the future (in AoE, communications turns were separate from the actual rendered frames).

That is, commands issued during turn 1000 are scheduled for execution during turn 1002 (see Figure 1). During turn 1001, the commands issued during turn 999 are executed. This let us receive, acknowledge and prepare messages for processing while the game continued to animate and run the simulation.

Figure 1. Tagging commands for execution two communications turns later.

Turns were typically 200 ms long, with commands sent out during the turn. After 200 ms, the turn was wrapped up and the next one began. At any point in the game, commands were being processed for one turn, received and stored for the next, and scheduled for execution two turns later.
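The tagging scheme can be sketched as a queue keyed by execution turn. The class and method names are hypothetical; only the fixed two-turn delay comes from the text.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Commands issued during turn T are tagged for execution on turn T + 2,
// leaving a whole turn to send, receive and acknowledge them while the
// game keeps animating and simulating.
class CommandScheduler {
public:
    static constexpr int kExecutionDelay = 2;

    // Records a command issued during `currentTurn` (locally or by a peer)
    // and returns the turn it will execute on.
    int issue(int currentTurn, const std::string& command) {
        assert(currentTurn >= 0);
        const int executeOn = currentTurn + kExecutionDelay;
        pending_[executeOn].push_back(command);
        return executeOn;
    }

    // Called when `turn` starts: removes and returns that turn's commands.
    std::vector<std::string> takeCommandsFor(int turn) {
        std::vector<std::string> ready;
        auto it = pending_.find(turn);
        if (it != pending_.end()) {
            ready = std::move(it->second);
            pending_.erase(it);
        }
        return ready;
    }

private:
    std::map<int, std::vector<std::string>> pending_;
};
```

A real implementation would also need every machine to order each turn's commands identically (for example, by player number); that ordering rule is an assumption, not something the article specifies.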

"Speed Control"

Figure 2. Speed control.

Since the simulations must always have exactly the same input, the game can run only as fast as the slowest machine can process the communications, render the turn and send out new commands. We called the system that adjusts turn length to keep animation and gameplay smooth under varying communications latency and processing speed "Speed Control".

Gameplay can feel "laggy" for two reasons. First, if one machine's frame rate drops (or is lower than the others'), the other machines process their commands, render everything in the allotted time and then have to wait for the next turn; any such pause is immediately noticeable. Second, communications latency slows the game: players have to wait until their machine has received enough data to complete the turn.

Each client calculated a frame rate it expected to sustain consistently, by averaging the processing time over a number of frames. Since this value changes during the game with the visible area, the number of units, the map size and other factors, it was sent with every "turn done" message.

Each client also measured the round-trip "ping time" from itself to the other clients, and included its average ping to its slowest peer in the "turn done" message as well (speed control used 2 bytes in total).
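A sketch of these per-client measurements, under stated assumptions: the averaging window and the split of the 2 bytes (one byte of frames per second, one byte of ping in 10 ms units) are hypothetical. The article says only that speed control used 2 bytes of each "turn done" message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Averages recent frame times to estimate a frame rate the client can sustain.
class FrameRateEstimator {
public:
    explicit FrameRateEstimator(std::size_t window) : window_(window) { assert(window > 0); }

    void addFrame(double milliseconds) {
        samples_.push_back(milliseconds);
        if (samples_.size() > window_) samples_.pop_front();  // keep only the window
    }

    // Average frame time over the window, converted to frames per second.
    double sustainableFps() const {
        if (samples_.empty()) return 0.0;
        double total = 0.0;
        for (double ms : samples_) total += ms;
        return 1000.0 / (total / samples_.size());
    }

private:
    std::size_t window_;
    std::deque<double> samples_;
};

// Hypothetical 2-byte speed-control payload appended to a "turn done" message.
struct SpeedReport {
    std::uint8_t fps;       // sustainable frames per second, clamped to 255
    std::uint8_t ping10ms;  // average ping to the slowest peer, in 10 ms units
};

SpeedReport makeReport(double fps, double worstPingMs) {
    SpeedReport r;
    r.fps = static_cast<std::uint8_t>(std::clamp(fps, 0.0, 255.0));
    r.ping10ms = static_cast<std::uint8_t>(std::clamp(worstPingMs / 10.0, 0.0, 255.0));
    return r;
}
```

Averaging over a window rather than using the last frame keeps a single slow frame (a long path search, say) from swinging the reported rate.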

Every turn, the designated host machine analyzed the "turn done" messages and calculated the required frame rate and an adjustment for Internet latency. The host then sent out the new frame rate and communications turn length. Figures 3 to 5 show how the communications turn was broken down under different conditions.

Figure 3. A typical communications turn.

Figure 4. High Internet latency with normal machine performance.

Figure 5. Slow machine performance with normal latency.

The communications turn, which was roughly the round-trip ping time for a message, was divided into the number of simulation frames that the slowest machine could complete, on average, in that period.

The communications turn length was weighted so that it could rise quickly when Internet latency got worse and settle back slowly toward the best average speed that could be consistently maintained. The game tended to pause or slow down only at the very worst spikes: command latency would rise but stay smooth (growing by only a few milliseconds per turn) as the game adjusted back down to the best possible speed. This gave the smoothest possible play while still adapting to changing conditions.
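The host-side calculation can be sketched as follows. The rules come from the text (turn length roughly equal to the worst round trip, split into as many frames as the slowest machine can run; fast increases, slow decreases), but the smoothing weights are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// One client's "turn done" report, as received by the host.
struct ClientReport {
    double sustainableFps;  // frame rate the client expects to maintain
    double worstPingMs;     // its average round trip to the slowest peer
};

struct TurnSettings {
    double turnLengthMs;
    int framesPerTurn;
};

// Host-side speed control with asymmetric smoothing.
class SpeedControl {
public:
    explicit SpeedControl(double initialTurnMs) : turnMs_(initialTurnMs) {}

    TurnSettings update(const std::vector<ClientReport>& reports) {
        assert(!reports.empty());
        constexpr double kRiseWeight = 0.5;   // hypothetical: react quickly
        constexpr double kFallWeight = 0.05;  // hypothetical: recover slowly
        double slowestFps = reports.front().sustainableFps;
        double worstPing = reports.front().worstPingMs;
        for (const ClientReport& r : reports) {
            slowestFps = std::min(slowestFps, r.sustainableFps);
            worstPing = std::max(worstPing, r.worstPingMs);
        }
        // Move the turn length toward the worst round trip.
        const double weight = worstPing > turnMs_ ? kRiseWeight : kFallWeight;
        turnMs_ += weight * (worstPing - turnMs_);

        // Split the turn into as many frames as the slowest machine can run in it.
        const int frames = std::max(1, static_cast<int>(std::floor(turnMs_ * slowestFps / 1000.0)));
        return {turnMs_, frames};
    }

private:
    double turnMs_;
};
```

Starting from a 200 ms turn, a 400 ms ping spike moves the turn to 300 ms in one step, while a return to 100 ms pings trims it by only 10 ms per turn.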

Guaranteed delivery

UDP was used at the network layer, and each client handled command ordering, drop detection and resending itself. Each message carried a couple of bytes identifying the turn its commands were scheduled to execute on and the message's sequence number. Messages received for a turn that had already passed were discarded, and incoming messages were stored for execution. Because of the nature of UDP, Mark's rule for receiving messages was: "When in doubt, assume the message was dropped. If messages arrive out of order, the receiver immediately sends resend requests for the missing ones. If an acknowledgement arrives later than predicted, the sender simply resends without waiting for a signal that the message was lost."
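The receiving side of this rule can be sketched as below. The 4-byte header layout and all names are hypothetical, and wraparound of the 16-bit fields is ignored for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical header: the turn the commands execute on, plus a per-sender
// sequence number.
struct MessageHeader {
    std::uint16_t executeTurn;
    std::uint16_t sequence;
};

enum class Verdict { Stored, DiscardedLate, Duplicate };

class Receiver {
public:
    Verdict onMessage(const MessageHeader& h, int currentTurn,
                      std::vector<std::uint16_t>& resendRequests) {
        // A message for a turn that has already run is useless: drop it.
        if (h.executeTurn <= currentTurn) return Verdict::DiscardedLate;

        if (h.sequence >= nextExpected_) {
            // "When in doubt, assume it was dropped": request every skipped message now.
            for (std::uint16_t s = nextExpected_; s < h.sequence; ++s) {
                missing_.insert(s);
                resendRequests.push_back(s);
            }
            nextExpected_ = static_cast<std::uint16_t>(h.sequence + 1);
        } else if (missing_.erase(h.sequence) == 0) {
            return Verdict::Duplicate;  // an earlier copy already arrived
        }
        ++stored_;  // a real client would queue the commands for executeTurn
        return Verdict::Stored;
    }

    std::size_t storedCount() const { return stored_; }

private:
    std::uint16_t nextExpected_ = 0;
    std::set<std::uint16_t> missing_;
    std::size_t stored_ = 0;
};
```

The sender side of the rule (resending when an acknowledgement is late) would be a timer per unacknowledged message, not shown here.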

Hidden benefits

Because the game's outcome depended on all users running identical simulations, it was extremely difficult to hack a client (or the client's communication stream) in order to cheat. Any simulation that ran differently was flagged as "out of sync" and the game stopped. Cheating locally to reveal hidden information was still possible, but such leaks were relatively easy to fix in subsequent patches and revisions. Security turned out to be a big win for us.
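How a game might notice that one simulation has diverged can be sketched as a per-turn checksum comparison. The article does not detail the mechanism at this point, so the class below is a hypothetical illustration: as soon as two players report different checksums for the same turn, the game is flagged out of sync.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Collects every player's world checksum per turn and flags the game as
// out of sync as soon as two reports for the same turn disagree.
class SyncChecker {
public:
    // Returns false if this checksum disagrees with one already reported.
    bool report(int turn, int player, std::uint64_t checksum) {
        assert(player >= 0);
        std::map<int, std::uint64_t>& perPlayer = checksums_[turn];
        for (const auto& entry : perPlayer) {
            if (entry.second != checksum) {
                outOfSync_ = true;
                return false;
            }
        }
        perPlayer[player] = checksum;
        return true;
    }

    bool outOfSync() const { return outOfSync_; }

private:
    std::map<int, std::map<int, std::uint64_t>> checksums_;  // turn -> player -> checksum
    bool outOfSync_ = false;
};
```

A long-running game would prune turns that every player has already reported; that bookkeeping is omitted here.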

Hidden problems

At first glance, getting two instances of identical code to behave identically seems easy, but it is not. Early in the project, Microsoft product manager Tim Znamenacek told Mark: "In every project there is one stubborn bug that goes all the way to the wire. I think out-of-sync is going to be it." He was right. Out-of-sync errors were hard to track down because very subtle differences multiplied over time. A deer placed slightly differently when the random map is generated will move slightly differently, and a minute later a hunter will be a little off course or miss with his spear, and come home without meat. So what sometimes showed up as nothing more than a checksum difference in the amount of food had causes that were very hard to trace.

Although we checksummed the world, objects, pathfinding, targeting and every other system, there always seemed to be something that slipped through. Wading through massive (50 MB each) message traces and world object dumps made the problem even harder. Part of the difficulty was conceptual: programmers were not used to writing code that had to make the same number of random-number-generator calls within the simulation on every machine (the random numbers themselves were also seeded and synchronized).
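The random-number pitfall can be sketched with a simulation-only generator. This is an illustration, not the AoE generator: it uses `std::minstd_rand`, whose output sequence is fixed by the C++ standard, so identically seeded machines draw identical numbers. (`std::uniform_int_distribution` is not guaranteed to produce the same values across standard libraries, so the sketch reduces raw engine output with a modulo instead.)

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Random source reserved for the simulation. Every machine seeds it
// identically and must make exactly the same sequence of calls.
class SimRandom {
public:
    explicit SimRandom(std::uint32_t seed) : engine_(seed) {}

    // Returns a value in [0, bound).
    std::uint32_t next(std::uint32_t bound) {
        assert(bound > 0);
        return static_cast<std::uint32_t>(engine_() % bound);
    }

private:
    std::minstd_rand engine_;
};
```

If one machine makes even a single extra call, for example to pick a purely cosmetic sound with the simulation generator, every later draw on that machine shifts by one position and the games drift apart. Cosmetic randomness therefore needs its own, separate generator.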

Lessons learned

In developing the Age of Empires networking, we learned several lessons that apply to developing any multiplayer game system.

Know your user. Studying the user is the most important step toward understanding their expectations for multiplayer speed, perceived lag and command latency. Each genre is different, and you need to understand what is right for your specific gameplay and controls.

Early in development, Mark and the lead designer prototyped communications latency (this prototype was revisited several times during development). Since they were playing a single-player game, it was easy to simulate different levels of command latency and get player feedback ("the controls feel good / sluggish / jerky / just horrible").
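Such a latency prototype can be sketched as a delay queue between input and simulation. This is a hypothetical illustration of the idea, not the prototype the authors built: local commands are held for an artificial delay before the simulation sees them, so designers can feel different latencies without a network.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Holds locally issued commands for an artificial delay before release.
class LatencySimulator {
public:
    explicit LatencySimulator(int delayMs) : delayMs_(delayMs) { assert(delayMs >= 0); }

    void setDelay(int delayMs) { delayMs_ = delayMs; }  // change between play sessions

    void issue(int nowMs, const std::string& command) {
        queue_.push_back({nowMs + delayMs_, command});
    }

    // Commands whose artificial delay has elapsed, in issue order.
    std::vector<std::string> due(int nowMs) {
        std::vector<std::string> ready;
        while (!queue_.empty() && queue_.front().readyAtMs <= nowMs) {
            ready.push_back(queue_.front().command);
            queue_.pop_front();
        }
        return ready;
    }

private:
    struct Pending {
        int readyAtMs;
        std::string command;
    };
    int delayMs_;
    std::deque<Pending> queue_;
};
```

Calling `setDelay` with varying values between sessions is one way to compare constant latency with jittery latency, the comparison discussed next.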

For the RTS genre, command latency of up to 250 ms was not even noticed; between 250 and 500 ms the game was very playable, and beyond 500 ms the lag became noticeable. It was also interesting that players develop a sense of the game's pace and a mental expectation of the delay between clicking and the unit responding. A consistently slow response was better than latency that jumped around (say, between 80 and 500 ms): a constant 500 ms felt playable, while varying latency felt "jerky" and made the game harder to play.

This meant programming effort had to be aimed at smoothness: it is better to pick a longer turn length and be sure everything stays smooth and consistent than to run as fast as possible and suffer regular slowdowns. All speed changes should be gradual, with increments as small as possible.

We also measured users' demands on the system: they typically issued commands (move, attack, chop trees) about every 1.5 to 2 seconds, with occasional spikes of 3 to 4 commands per second during heated battles. Since the action in our game builds steadily, the heaviest communication demands arise in the middle and late game.

If you take the time to study user behavior, you will notice other things about how they play, and this helps with network tuning. In AoE, users clicked furiously when attacking (click-click-click-click: go-go-go-go!), which caused huge spikes in the number of commands issued. They also sent large groups of units on long paths that had to be computed, again producing huge spikes in network demand. A simple filter that discards repeated commands to the same location significantly reduced the impact of this behavior.
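Such a filter can be sketched as follows. The command structure and the "same units, same target" rule are assumptions for illustration. The filter drops a command that is identical to the last one sent, so frantic clicking on one spot costs no extra bandwidth.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical move/attack order: which units, and where to go.
struct MoveCommand {
    std::vector<int> unitIds;
    int targetX;
    int targetY;

    bool operator==(const MoveCommand& o) const {
        return unitIds == o.unitIds && targetX == o.targetX && targetY == o.targetY;
    }
};

// Suppresses a command that repeats the previous one exactly.
class RepeatFilter {
public:
    bool shouldSend(const MoveCommand& cmd) {
        if (last_ && *last_ == cmd) return false;  // same units, same spot: drop it
        last_ = cmd;
        return true;
    }

private:
    std::optional<MoveCommand> last_;
};
```

Because only exact repeats are dropped, any new target or new selection still goes out immediately.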

In summary, watching your users will let you:

Learn their expectations about game latency
Prototype multiplayer aspects early in development
Spot behavior that hurts multiplayer performance

Measurement is king. If you introduce metrics early on, you will learn surprising things about your communications system. Make the metrics readable for testers, and use them to understand what is going on inside the network engine.

Lesson: part of AoE's communications problems arose because Mark took the metrics too early and did not re-check message levels (length and frequency) after the final code was in. Unnoticed things such as occasional AI races, hard-to-compute paths and poorly structured command packets can cause huge performance problems in an otherwise well-tuned system.

Have the system notify testers and developers whenever it seems to be exceeding boundary conditions. Programmers and testers will then see during development which tasks load the system, and problems can be fixed early, as soon as they appear.

Take the time to teach your testers how the communications system works, and show and explain the metrics to them; you may be surprised at what they notice when the network code inevitably has strange failures.

In summary, metrics should:

Be human-readable and understandable by testers
Point out bottlenecks, slowdowns and problems
Have a low enough performance impact to be left running all the time
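A metric with these properties can be sketched as per-message-type traffic counters checked against a budget. The budget and type names are hypothetical; a 28.8k modem's raw rate of 3,600 bytes per second is used only as an example figure. Going over the limit is reported loudly, in one line a tester can read.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Per-message-type traffic counters checked against a bytes-per-second budget.
class TrafficMetrics {
public:
    struct Report {
        bool overBudget;
        std::string text;  // one human-readable line for testers
    };

    explicit TrafficMetrics(std::size_t bytesPerSecondBudget) : budget_(bytesPerSecondBudget) {
        assert(budget_ > 0);
    }

    // Cheap enough to leave on permanently: two additions per message.
    void record(const std::string& type, std::size_t bytes) {
        Stats& s = stats_[type];
        ++s.count;
        s.bytes += bytes;
        totalBytes_ += bytes;
    }

    // Call once per second: summarizes the last second and resets the counters.
    Report flush() {
        const bool over = totalBytes_ > budget_;
        std::ostringstream out;
        out << (over ? "OVER BUDGET " : "ok ") << totalBytes_ << "/" << budget_ << " B/s";
        for (const auto& [type, s] : stats_)
            out << " | " << type << ": " << s.count << " msgs, " << s.bytes << " B";
        stats_.clear();
        totalBytes_ = 0;
        return {over, out.str()};
    }

private:
    struct Stats {
        std::size_t count = 0;
        std::size_t bytes = 0;
    };
    std::size_t budget_;
    std::size_t totalBytes_ = 0;
    std::map<std::string, Stats> stats_;
};
```

Breaking traffic down by message type is what makes a spike attributable (for example, to command packets versus "turn done" messages).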

Educating developers. It is very hard to get programmers who are used to writing single-player applications to think about the separation between issuing a command, receiving it and processing it. It is easy to forget that you may be requesting something that never happens, or that happens seconds after the command was issued. Commands must be checked for validity both when they are sent and when they are received.

With the synchronous model, programmers also had to make sure that code inside the simulation did not depend on any local factor (such as spare time, special hardware or different settings). The code path must be the same on all machines. For example, having random terrain sounds chosen inside the simulation could make the games behave differently.

Other lessons. This should be common sense, but if you depend on a third-party network library (DirectPlay in our case), write an independent test application to verify that when the vendor claims "guaranteed delivery", the messages really do get there, that "guaranteed packet order" really holds, and that the library has no hidden bottlenecks or odd behavior when handling your game's traffic.
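The receiving half of such a test application can be sketched as follows (a hypothetical design, not the authors' tool). The sending half numbers its "guaranteed, ordered" messages 0, 1, 2, and so on; this checker records what actually arrives, so losses, reordering and duplicates show up as numbers rather than as mysterious desyncs.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

struct DeliveryReport {
    std::uint32_t received = 0;
    std::uint32_t outOfOrder = 0;  // arrived after a higher-numbered message
    std::uint32_t duplicates = 0;
};

// Verifies delivery and ordering claims against what actually arrives.
class DeliveryChecker {
public:
    void onReceive(std::uint32_t sequence) {
        ++report_.received;
        if (!seen_.insert(sequence).second) {
            ++report_.duplicates;
            return;
        }
        if (sequence < nextAfterHighest_) {
            ++report_.outOfOrder;
        } else {
            nextAfterHighest_ = sequence + 1;
        }
    }

    // Messages numbered below `sentCount` that never arrived.
    std::uint32_t missing(std::uint32_t sentCount) const {
        std::uint32_t count = 0;
        for (std::uint32_t s = 0; s < sentCount; ++s)
            if (seen_.count(s) == 0) ++count;
        return count;
    }

    const DeliveryReport& report() const { return report_; }

private:
    std::set<std::uint32_t> seen_;
    std::uint32_t nextAfterHighest_ = 0;
    DeliveryReport report_;
};
```

For a transport that really guarantees delivery and order, all three counters other than `received` stay at zero no matter how hard the link is stressed.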

Be prepared to write test applications and stress-test simulators. We ended up with three different minimal test applications, each used to isolate an important problem: connection flooding, problems with simultaneous connections during matchmaking, and lost guaranteed packets.

Test with modems (and, if you are lucky, modem simulators) as early as possible, and keep testing with modems (however painful that is) throughout development. Problems are hard to isolate (is that sudden slowdown caused by the ISP, the game, the communications software, the modem, the matchmaking service or something else?), and users really do not want to fiddle with slow dial-up connections after getting used to instant LAN speeds. It is vital to test over modem connections as persistently as you test multiplayer games on the LAN.

Improvements for Age of Empires 2

In Age of Empires 2: The Age of Kings, we added new multiplayer features such as recorded games, file transfer and persistent stats tracking on The Zone website. We also refined multiplayer systems such as the DirectPlay integration and speed control to address bugs and performance issues found after the release of Age of Empires.

Recorded games were one of those features that started out as a debugging aid and ended up as a full-fledged game feature. Recorded games are incredibly popular on fan sites: they let players share and analyze strategies, watch famous battles and review games they took part in. Recorded games also became an invaluable debugging tool. Because our simulation is deterministic, and recorded games are synchronous in the same way multiplayer is, a recording gave us a great way to reproduce bugs: it was guaranteed to play back exactly the same way every time.
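Why this works can be sketched with a toy recording format (hypothetical; a real recording would also need the map settings, random seed and so on). A recording stores only the starting setup and each turn's commands, and replaying them through a deterministic simulation rebuilds the same game every time.

```cpp
#include <cassert>
#include <map>
#include <vector>

struct Command {
    int unitId;
    int dx;
};

// Toy deterministic simulation: one coordinate per unit.
struct Sim {
    std::vector<int> pos;

    explicit Sim(int units) : pos(units, 0) {}

    void runTurn(const std::vector<Command>& cmds) {
        for (const Command& c : cmds) pos[c.unitId] += c.dx;
    }
};

// A recorded game: starting setup plus every turn's commands.
struct Recording {
    int units = 0;
    std::map<int, std::vector<Command>> commandsByTurn;
};

// Plays the recording back turn by turn through a fresh simulation.
Sim replay(const Recording& rec, int lastTurn) {
    static const std::vector<Command> kNoCommands;
    Sim sim(rec.units);
    for (int turn = 0; turn <= lastTurn; ++turn) {
        auto it = rec.commandsByTurn.find(turn);
        const std::vector<Command>& cmds =
            it != rec.commandsByTurn.end() ? it->second : kNoCommands;
        sim.runTurn(cmds);
    }
    return sim;
}
```

The same property makes recordings useful for debugging: a bug captured in a recording reappears on every playback.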

In Age of Empires, our integration with The Zone's matchmaking service was limited to simply launching the game. In Age of Kings we extended it to control launch parameters and provide persistent stats reporting. This helped players find games they were interested in, because they could see the game settings at the matchmaking stage instead of waiting until the game setup screen. On the back end, we implemented persistent stats reporting and tracking: we gave The Zone a common structure that was filled in and uploaded to the server at the end of each game. The data in this structure was used to compute user ratings and display them on The Zone website.

RTS3 multiplayer: goals

RTS3 is the code name for Ensemble's next-generation strategy game (note: the game was released as Age of Mythology). The RTS3 design builds on the successful formula of the Age of Empires series and adds a number of new features and multiplayer requirements:

Builds on the Age of Empires 1 and 2 feature set: Internet play, large and varied maps, thousands of controllable units.
3D: RTS3 is a fully 3D game with interpolated animation and non-discrete unit positions and facings.
More players: support for more than eight players.
TCP/IP support: our primary target is a 56 kbit/s TCP/IP Internet connection.
Home networking support: support for end users' home network configurations, including firewalls and NAT.

Early in RTS3 development, we decided to keep the same underlying network model as Age of Empires 1 and 2, synchronous simulation, because the RTS3 design could benefit from this architecture in many ways. In AoE/AoK we relied on DirectPlay for transport and session management services, but for RTS3 we decided to write our own core network library, using only basic socket routines as a foundation.

The move to a fully 3D world meant we had to pay closer attention to frame rate and the overall smoothness of the multiplayer simulation. It also meant that simulation update times and frame rates would vary even more, and that more time would go to rendering. In the Genie engine, unit rotations were discrete and animations were tied to the frame rate; in BANG! units can rotate arbitrarily and animate smoothly, so the game is visually far more sensitive to latency and to jumps in update rate.

Coming out of Age of Kings, we wanted to tackle the critical areas where thoughtful design and investment in tools would greatly reduce debugging time. We had also learned how important iterative playtesting is to the design of our games, so getting the game playable online as early as possible was a high priority.

RTS3 communication architecture

Figure 6. The strictly object-oriented RTS3 network architecture.

Object-oriented approach. The RTS3 network architecture is strictly object-oriented (see Figure 6). The need to support a variety of network configurations played to the strengths of OO design, letting us hide the specifics of platform, protocol and topology behind a common set of objects and systems.

The protocol-specific and topology-specific versions of the network objects contain as little code as possible; most of their functionality lives in the higher-level parent objects. To implement a new protocol, we extended only the network objects that needed protocol-specific code (for example, the client and the session, which behave slightly differently depending on the protocol). No other system objects (such as Channels, TimeSync, etc.) needed changes, because they interact with the client and the session only through their high-level abstract interfaces.
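The pattern can be sketched as an abstract interface with thin protocol-specific subclasses. The names echo the article's Link object, but the interface below is hypothetical, not the actual net.lib API: protocol-neutral code (here a toy text channel) only ever sees the base class.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Protocol-neutral interface that higher-level objects talk to.
class Link {
public:
    virtual ~Link() = default;
    virtual void send(const Bytes& data) = 0;
    virtual bool receive(Bytes& out) = 0;  // false if nothing is waiting
};

// Thin protocol-specific subclass: a loopback link whose endpoints are on
// the same machine. A UDP or TCP link would override the same two methods.
class LoopbackLink : public Link {
public:
    void send(const Bytes& data) override { queue_.push_back(data); }

    bool receive(Bytes& out) override {
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    std::deque<Bytes> queue_;
};

// Protocol-independent code: works unchanged with any Link.
class TextChannel {
public:
    explicit TextChannel(Link& link) : link_(link) {}

    void post(const std::string& text) { link_.send(Bytes(text.begin(), text.end())); }

    bool poll(std::string& text) {
        Bytes data;
        if (!link_.receive(data)) return false;
        text.assign(data.begin(), data.end());
        return true;
    }

private:
    Link& link_;
};
```

Adding a protocol means adding one more `Link` subclass; `TextChannel` needs no changes.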

Peer-to-peer topology. The Genie engine supported a peer-to-peer network topology, in which all clients in a session connect to each other in a star configuration. In RTS3 we kept this topology because, combined with the synchronous simulation model, it has inherent advantages.

A peer-to-peer topology means that the clients in a session are connected in a star configuration (Figure 7); that is, every client is connected to every other client. This is the same setup used in Age 1 and 2.

Figure 7. Star configuration of peer-to-peer clients in a session.

Peer-to-peer benefits:

Reduced latency, because messages travel directly from client to client instead of from client to server to client.
No central point of failure: if a client (even the host) drops out of the session, the game can continue.

Peer-to-peer drawbacks:

More active connections in the system (the sum of n for n = 0 to k-1, i.e. k(k-1)/2 for k clients), which means more potential points of failure and potentially higher latency.
Some NAT configurations cannot be supported with this scheme.
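Worked out for k fully connected clients, the connection count from the first drawback is shown below, compared with one link per client when every client talks only to a central server:

$$
C_{\text{peer}}(k) = \sum_{n=0}^{k-1} n = \frac{k(k-1)}{2},
\qquad
C_{\text{peer}}(8) = \frac{8 \cdot 7}{2} = 28
\quad\text{vs.}\quad
C_{\text{server}}(8) = 8
$$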

Net.lib. In designing the RTS3 communications architecture, our goal was a system built specifically for strategy games that we could also use for our internal tools and extend to support future games. To get there, we built a layered architecture that supports game-level objects such as the client and the session, as well as lower-level transport objects such as links and network addresses.

Figure 8. The four service layers of our network model.

RTS3 is built on our next-generation BANG! engine, which uses a modular architecture with component libraries for sound, rendering and networking. The network subsystem plugs in as a component but is tied to the BANG! engine (as well as to various in-house tools). Our network model is divided into four service layers that are almost, but not entirely, unlike the OSI network model as it applies to games (see Figure 8).

Socks, Level 1

The first layer, Socks, provides the fundamental socket-level API in C. It is abstracted to offer a common set of low-level networking routines across several operating systems, with an interface that resembles Berkeley sockets. The Socks layer is used mainly by the higher layers of the network library and is not really intended for use by application code.

Link, Level 2

Layer 2, Link, provides transport-layer services. The objects in this layer, such as Link, Listener, NetworkAddress and Packet, are the building blocks needed to establish a connection and send messages over it (see Figure 9).

Packet: our fundamental message structure, an extensible object that automatically handles its own serialization and deserialization (using virtual methods) when sent over a Link.
Link: a connection between two network endpoints. It can also be a loopback link, in which case both endpoints are on the same machine. The link's send and receive methods know how to handle packets as well as raw void* data buffers.
Listener: a link factory. It listens for incoming connections and creates a Link once a connection is established.
Data stream: an arbitrary, metered data stream over a given link, used, for example, for file transfers.
Net Address: a protocol-independent network address object.
Ping: a simple ping class that reports the network latency of a link.

Figure 9. The Link layer.
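A Packet that serializes itself through virtual methods can be sketched as below. The interface, the little-endian encoding and the type tag are hypothetical, not the net.lib API. Each derived packet writes the base fields first and then its own, so new message types extend the format without touching the Link code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Base class: derived packets add fields by overriding two virtual methods.
class Packet {
public:
    virtual ~Packet() = default;
    virtual std::uint16_t type() const = 0;

    virtual void serialize(Bytes& out) const { putU16(out, type()); }

    // Reads from `pos`; returns the position just past this packet's data.
    virtual std::size_t deserialize(const Bytes& in, std::size_t pos) {
        assert(getU16(in, pos) == type());  // debug check of the type tag
        return pos + 2;
    }

protected:
    static void putU16(Bytes& out, std::uint16_t v) {
        out.push_back(static_cast<std::uint8_t>(v & 0xFF));  // little-endian
        out.push_back(static_cast<std::uint8_t>(v >> 8));
    }
    static std::uint16_t getU16(const Bytes& in, std::size_t pos) {
        return static_cast<std::uint16_t>(in.at(pos) | (in.at(pos + 1) << 8));
    }
};

class MovePacket : public Packet {
public:
    std::uint16_t unitId = 0, x = 0, y = 0;

    std::uint16_t type() const override { return 7; }  // arbitrary type tag

    void serialize(Bytes& out) const override {
        Packet::serialize(out);  // header first, then this class's fields
        putU16(out, unitId);
        putU16(out, x);
        putU16(out, y);
    }

    std::size_t deserialize(const Bytes& in, std::size_t pos) override {
        pos = Packet::deserialize(in, pos);
        unitId = getU16(in, pos);
        x = getU16(in, pos + 2);
        y = getU16(in, pos + 4);
        return pos + 6;
    }
};
```

Writing bytes explicitly, rather than copying structs, keeps the wire format independent of compiler padding and machine byte order.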

Multiplayer, Level 3

The multiplayer layer is the highest level of objects and routines in the net.lib API. It is the layer RTS3 interacts with: it gathers lower-level objects such as links and turns them into more useful concepts and objects, such as clients and sessions.

The most interesting objects in the BANG! network library live in the multiplayer layer. Here the API provides a set of objects that the game layer can interact with, while keeping the implementation itself game-independent.

Client: the most basic abstraction of a network endpoint. It can be configured as a remote client (over a link) or as a local client (over a loopback link). Clients are not created directly; they are spawned by a Session object.
Session: the object responsible for creating, connecting, collecting and managing clients. The session contains all the other multiplayer-layer objects. To use it, the application simply calls host() or join(), passing a local or remote address, and the session takes care of everything else. Its responsibilities include automatically creating and deleting clients, sending notifications of session events and routing traffic to the appropriate objects.
Channel and Ordered Channel: a virtual message channel. Messages sent over a channel are automatically separated out and received by the corresponding channel object on the remote clients. The ordered channel works with the TimeSync object to guarantee that messages received on that channel arrive in the same order on all clients.
Shared Data: a collection of data shared by all clients. You can extend this object to create specific instances holding your own data types, and then use built-in methods to keep those data elements automatically synchronized over the network.
Time Sync: manages smoothly adjusted, synchronized network time across all clients in a session.
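One way an ordered channel could deliver messages in the same order everywhere is sketched below; the article does not describe the algorithm, so this design is an assumption. Each message is stamped with the synchronized network time and its sender, and messages are released only once the shared clock has passed their time, sorted by (time, sender, per-sender sequence), so arrival order stops mattering.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Buffers messages and releases them in a canonical order based on
// synchronized time, so every client sees the same sequence.
class OrderedChannel {
public:
    void onArrive(std::uint32_t sendTime, int sender, std::uint32_t seq, std::string msg) {
        pending_.emplace(Key{sendTime, sender, seq}, std::move(msg));
    }

    // Delivers everything stamped at or before `syncedNow`, in canonical order.
    std::vector<std::string> release(std::uint32_t syncedNow) {
        std::vector<std::string> out;
        auto it = pending_.begin();
        while (it != pending_.end() && std::get<0>(it->first) <= syncedNow) {
            out.push_back(std::move(it->second));
            it = pending_.erase(it);
        }
        return out;
    }

private:
    using Key = std::tuple<std::uint32_t, int, std::uint32_t>;  // time, sender, seq
    std::map<Key, std::string> pending_;
};
```

This only works if the synchronized clock lags far enough behind real time that every message for a time slot has arrived before that slot is released; keeping the clock adjusted for that is the kind of job the Time Sync object would have to do in such a design.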

Game Communications, Level 4

The communications layer is part of RTS3 itself. It is the main collection of systems through which the game interacts with the network library, and it lives inside the game code. It provides many useful helper functions for creating and managing multiplayer-layer network objects, with the aim of boiling the game's multiplayer needs down to a small, easy-to-use interface.

New features and improved tools

Improved sync system. Nobody on the Age of Empires development team would argue that we didn't need better sync tools. As with any project, a post-mortem of the development process shows that a few areas consumed most of the time but would have taken far less had they been addressed up front. At the start of RTS3 development, sync debugging was at the top of that list.

The RTS3 sync tracking system is aimed primarily at identifying sync bugs quickly. Other priorities were ease of use, the ability to handle arbitrarily large amounts of synced data passed through the system, the ability to compile the sync code out of release builds entirely, and finally, the ability to change the test configuration completely by toggling variables rather than recompiling.
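A tracker meeting these goals can be sketched as follows. The `SyncLog` class, its `addDataSync` signature and the `SYNC_DATA` macro are hypothetical illustrations, not the RTS3 code. Every synced value is hashed into a running checksum and also kept as a readable entry (userinfo string, `__FILE__`, `__LINE__`), so when two machines' checksums differ, their logs can be compared to find the first divergent call. The macro compiles away entirely unless `SYNC_TRACKING` is defined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct SyncEntry {
    std::string userinfo;  // what was being synced
    std::string file;
    int line;
    std::uint64_t value;
};

class SyncLog {
public:
    void addDataSync(std::uint64_t value, const char* userinfo, const char* file, int line) {
        entries_.push_back({userinfo, file, line, value});
        hash_ = (hash_ ^ value) * 1099511628211ull;  // FNV-style running checksum
    }
    std::uint64_t checksum() const { return hash_; }
    const std::vector<SyncEntry>& entries() const { return entries_; }

private:
    std::vector<SyncEntry> entries_;
    std::uint64_t hash_ = 1469598103934665603ull;
};

// Index of the first entry where two logs disagree, or -1 if they match.
long firstDivergence(const SyncLog& a, const SyncLog& b) {
    const std::vector<SyncEntry>& x = a.entries();
    const std::vector<SyncEntry>& y = b.entries();
    const std::size_t n = std::min(x.size(), y.size());
    for (std::size_t i = 0; i < n; ++i)
        if (x[i].value != y[i].value || x[i].userinfo != y[i].userinfo)
            return static_cast<long>(i);
    return x.size() == y.size() ? -1 : static_cast<long>(n);
}

// Release builds leave SYNC_TRACKING undefined, so sync calls vanish.
#ifdef SYNC_TRACKING
#define SYNC_DATA(log, userinfo, v) (log).addDataSync((v), (userinfo), __FILE__, __LINE__)
#else
#define SYNC_DATA(log, userinfo, v) ((void)0)
#endif
```

Comparing checksums every turn is cheap; the full entry logs only need to be exchanged or dumped once a mismatch has been detected.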

Sync checking in RTS3 is done through two sets of macros:

#define syncRandCode(userinfo) gSync->addCodeSync(cRandSync, userinfo, __FILE__, __LINE__)
#define syncRandData(userinfo, v) gSync->addDataSync(cRandSync, v, userinfo, __FILE__, __LINE__)

Both macros take a userinfo string parameter, which is the name of, or a hint about, the specific item being synced. For example, a sync call might look like this:

syncRandData("syncing the random seed", seed);

Multiplayer console commands and configuration variables

As any Quake mod developer can attest, console commands and configuration variables are essential to the development process. Console commands are simple function calls, made from a startup configuration file, the in-game console or the UI, that invoke arbitrary game functionality. Configuration variables are named data types exposed through simple get, set, define and toggle functions, which we use for all sorts of testing and tuning parameters.

Paul created multiplayer-aware versions of our console command and configuration variable systems. With them, we can easily turn an ordinary configuration variable (for example, enableCheating) into a multiplayer configuration variable by adding a flag to its definition. When the flag is set, the variable is propagated within the multiplayer game, and synchronized in-game decisions (for example, whether free resource transfers are allowed) can be based on its value. Multiplayer console commands work on the same principle: calls to a multiplayer console command are sent over the network and executed synchronously on all client machines.

With these two tools, developers can use the multiplayer system without writing any code. They can quickly add new test tools and configurations and easily enable them in a networked environment.
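The flag mechanism can be sketched as follows. The `ConfigVars` class is hypothetical, and reusing the two-turn command delay described earlier is an assumption. A local variable changes immediately, while a change to a multiplayer variable becomes a command that every machine applies on the same future turn.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A change to a multiplayer variable, sent to every peer like a command.
struct ConfigChange {
    int executeTurn;
    std::string name;
    int value;
};

class ConfigVars {
public:
    void define(const std::string& name, int value, bool multiplayer) {
        assert(!name.empty());
        vars_[name] = Var{value, multiplayer};
    }

    int get(const std::string& name) const { return vars_.at(name).value; }

    // Local variables change at once; multiplayer variables only queue a
    // change that the caller would broadcast (transport not shown).
    void set(const std::string& name, int value, int currentTurn,
             std::vector<ConfigChange>& outgoing) {
        Var& v = vars_.at(name);
        if (!v.multiplayer) {
            v.value = value;
            return;
        }
        outgoing.push_back({currentTurn + 2, name, value});
    }

    // Run on every machine at the start of `turn` with the same change list.
    void apply(int turn, const std::vector<ConfigChange>& changes) {
        for (const ConfigChange& c : changes)
            if (c.executeTurn == turn) vars_.at(c.name).value = c.value;
    }

private:
    struct Var {
        int value;
        bool multiplayer;
    };
    std::map<std::string, Var> vars_;
};
```

Because even the machine that issued the change waits for the scheduled turn, synchronized decisions based on a variable such as `enableCheating` see the same value on every machine.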

Summing up

The synchronous simulation and peer-to-peer model was used successfully in the Age of Empires series. Although it is critical to invest time in tools and technology to address the key problems of this approach (such as sync and network metrics), experience has proven the viability of this architecture for the real-time strategy genre. The subsequent improvements we made for RTS3 have made multiplayer gameplay nearly indistinguishable from single-player, even under the worst network conditions.
